Efficiently Apply the Same Function to Multiple Datasets in R
🔁 Introduction
In data analysis, we often encounter situations where we need to perform the same transformation or calculation on multiple datasets — such as datasets grouped by age, treatment, or study phase.
Rather than repeating the same code block over and over, R provides a more efficient, less error-prone solution: the lapply() function.
✨ Why Use lapply()?
When you have five, ten, or more datasets to process, writing a loop by hand or manually rerunning the same code for each one quickly becomes tedious and error-prone. By placing all target datasets in a list and applying a custom function via lapply(), you can:
- Minimize code repetition
- Reduce the chance of mistakes
- Improve code scalability and clarity
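As a rough illustration of these points, the sketch below computes the same result with an explicit loop and with lapply(). The list `datasets` and its contents are made-up placeholders, not data from this post.

```r
# Three small numeric vectors standing in for datasets (hypothetical data)
datasets <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# Loop version: pre-allocate a named list, then fill it element by element
loop_means <- vector("list", length(datasets))
names(loop_means) <- names(datasets)
for (nm in names(datasets)) {
  loop_means[[nm]] <- mean(datasets[[nm]])
}

# lapply version: one call, and the list names are carried over automatically
lapply_means <- lapply(datasets, mean)

# Both approaches produce the same named list
identical(loop_means, lapply_means)
```

The loop needs bookkeeping (pre-allocation, names, indexing) that lapply() handles for you, which is where most copy-paste mistakes tend to creep in.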
🔧 Example: Adjust Units and Calculate BMI
Let’s say you have datasets containing weight (in 公斤, kilograms) and height (in 公尺, meters) for different age groups. The goal is to:
- Strip the Chinese unit characters so the values become plain numbers in standard units (kg, m)
- Calculate the Body Mass Index (BMI) using the formula:
\[ \text{BMI} = \frac{\text{weight (kg)}}{\text{height (m)}^2} \]
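To make the formula concrete, here is a quick check for a single person, using the first row of the sample data (60 kg, 1.65 m). The variable names are only for illustration.

```r
# Worked example of the BMI formula for one person
weight_kg <- 60
height_m <- 1.65

# BMI = weight (kg) / height (m)^2
bmi <- weight_kg / height_m^2

# Round to one decimal place, as the function later in this post does
round(bmi, 1)
```

The unrounded value is roughly 22.04, so it rounds to 22.0.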
📦 Step 1: Create Sample Datasets
# Sample data for three age groups
group1 <- data.frame(Height = c("1.65公尺", "1.70公尺"), Weight = c("60公斤", "65公斤"))
group2 <- data.frame(Height = c("1.60公尺", "1.75公尺"), Weight = c("55公斤", "70公斤"))
group3 <- data.frame(Height = c("1.72公尺", "1.80公尺"), Weight = c("68公斤", "75公斤"))

# Combine them into a list
groups <- list(group1, group2, group3)
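The list in this step is unnamed, so its elements can only be reached by position. One optional variation, sketched here as a suggestion rather than part of the original workflow, is to name the elements so that results can be looked up by group. The name `groups_named` is hypothetical.

```r
# Same sample data as in Step 1
group1 <- data.frame(Height = c("1.65公尺", "1.70公尺"), Weight = c("60公斤", "65公斤"))
group2 <- data.frame(Height = c("1.60公尺", "1.75公尺"), Weight = c("55公斤", "70公斤"))
group3 <- data.frame(Height = c("1.72公尺", "1.80公尺"), Weight = c("68公斤", "75公斤"))

# Name each element when building the list (groups_named is an illustrative name)
groups_named <- list(group1 = group1, group2 = group2, group3 = group3)

# Elements can now be accessed by name instead of position
names(groups_named)
groups_named[["group2"]]
```

Because lapply() preserves names, anything computed from `groups_named` stays labelled by group as well.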
🔧 Step 2: Define a Custom Function
# Function to convert units and calculate BMI
replace_function <- function(df) {
  df$Height <- as.numeric(gsub("公尺", "", df$Height)) # Remove '公尺' and convert to numeric
  df$Weight <- as.numeric(gsub("公斤", "", df$Weight)) # Remove '公斤' and convert to numeric
  df$BMI <- round(df$Weight / (df$Height^2), 1) # Calculate BMI
  return(df)
}
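It can help to see what the gsub() and as.numeric() pair does on a plain vector before trusting it inside the function. The sketch below uses a small made-up vector; the third value ("170cm") is a deliberately malformed, hypothetical entry that shows what happens when a unit is not the one being stripped.

```r
# Two well-formed values plus one hypothetical entry with an unexpected unit
heights <- c("1.65公尺", "1.70公尺", "170cm")

# Remove the literal unit string; fixed = TRUE treats the pattern as plain text
cleaned <- gsub("公尺", "", heights, fixed = TRUE)
cleaned

# Anything that still contains non-numeric text becomes NA (with a warning)
heights_num <- suppressWarnings(as.numeric(cleaned))
heights_num

# A simple check for values that failed to convert
anyNA(heights_num)
```

Checking for NA after conversion is a cheap way to catch inconsistent units before they silently propagate into the BMI column.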
🚀 Step 3: Apply the Function Using lapply()
# Apply the same function to all groups
results <- lapply(groups, replace_function)

# Preview the result for group1
results[[1]]
✅ Output (example)
  Height Weight  BMI
1   1.65     60 22.0
2   1.70     65 22.5
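For reporting, you may want the per-group results in a single table. The following is one possible way to do that, not something the workflow above requires: it repeats the sample data and function so it runs on its own, names the list elements (an assumption, since the original list is unnamed), and adds a hypothetical `Group` column before stacking with do.call(rbind, ...).

```r
# Sample data and function repeated so this block is self-contained
groups <- list(
  group1 = data.frame(Height = c("1.65公尺", "1.70公尺"), Weight = c("60公斤", "65公斤")),
  group2 = data.frame(Height = c("1.60公尺", "1.75公尺"), Weight = c("55公斤", "70公斤")),
  group3 = data.frame(Height = c("1.72公尺", "1.80公尺"), Weight = c("68公斤", "75公斤"))
)

replace_function <- function(df) {
  df$Height <- as.numeric(gsub("公尺", "", df$Height))
  df$Weight <- as.numeric(gsub("公斤", "", df$Weight))
  df$BMI <- round(df$Weight / (df$Height^2), 1)
  return(df)
}

results <- lapply(groups, replace_function)

# Tag each result with its group name (Group is an illustrative column name)
labelled <- Map(function(df, nm) {
  df$Group <- nm
  df
}, results, names(results))

# Stack all groups into one data frame and reset the row names
combined <- do.call(rbind, labelled)
rownames(combined) <- NULL
combined
```

The combined table has six rows, two per group, with the `Group` column recording where each row came from.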
💡 Final Thoughts
The combination of lapply() and gsub() demonstrates a powerful pattern in R: clean, consistent, and reproducible operations across datasets.
Whether you’re dealing with different demographic groups or multiple study arms, putting your datasets in a list and defining a reusable function can save time and prevent mistakes — especially in clinical trial data preparation or large-scale reporting tasks.
Happy coding!