I am struggling with variable labels of data.frame columns. Say I have the following data frame (part of much larger data frame):
data <- data.frame(age =
You can do this by creating a list from the named vector of var.labels and assigning that to the label values. I've used match to ensure that values of var.labels are assigned to their corresponding column in data even if the order of var.labels is different from the order of the data columns.
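To see what match contributes on its own, here is a minimal base-R sketch. The names toy and toy.labels and their values are made up for illustration, and a plain "label" attribute is used as a stand-in for what label() sets, so treat it as a sketch under those assumptions rather than the Hmisc approach itself.

```r
# Hypothetical toy data; names and values are illustrative only
toy <- data.frame(age = c(23, 35, 41),
                  sex = factor(c("f", "m", "f")))

# Labels deliberately given in a different order from the columns
toy.labels <- c(sex = "Sex of the participant", age = "Age in Years")

# Position of each column name within names(toy.labels): positions 2 and 1
idx <- match(names(toy), names(toy.labels))
idx

# Reordering by idx lines the labels up with the columns of toy
ordered.labels <- toy.labels[idx]

# Stand-in for label(): attach each label as a plain "label" attribute
for (i in seq_along(toy)) {
  attr(toy[[i]], "label") <- unname(ordered.labels[i])
}

# Collect the attribute from every column to check the result
sapply(toy, attr, "label")
```

Without the match step, the labels would be assigned by position, so sex would end up with the label meant for age.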
library(Hmisc)
var.labels = c(age="Age in Years", sex="Sex of the participant")
label(data) = as.list(var.labels[match(names(data), names(var.labels))])
label(data)
age sex
"Age in Years" "Sex of the participant"
Original Answer
My original answer used lapply, which isn't actually necessary. Here's the original answer for archival purposes:
You can assign the labels using lapply:
label(data) = lapply(names(data), function(x) var.labels[match(x, names(var.labels))])
lapply applies a function to each element of a list or vector. In this case the function is applied to each value of names(data) and it picks out the label value from var.labels that corresponds to the current value of names(data).
Reading through a few tutorials is a good way to get the general idea, but you'll really get the hang of it if you start using lapply in different situations and see how it behaves.
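As a small illustration of how lapply walks over the column names, the sketch below runs the same anonymous function on a plain character vector. The vector col.names is a hypothetical stand-in for names(data), so no data frame or package is needed.

```r
var.labels <- c(sex = "Sex of the participant", age = "Age in Years")

# Hypothetical stand-in for names(data)
col.names <- c("age", "sex")

# For each column name, look up the label with the matching name
labels.list <- lapply(col.names, function(x) {
  var.labels[match(x, names(var.labels))]
})

# One list element per column name, in the order of col.names
str(labels.list)
```

Each element of the result is a length-one character vector, and the list follows the order of col.names rather than the order of var.labels.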
If your vector of labels matches the order of your data.frame columns, but isn't a named vector (so can't be used to subset data.frame columns by name like the lapply approach in the other answer), you can use a for-loop:
for (i in seq_along(data)) {
  Hmisc::label(data[, i]) <- var.labels[i]
}
label(data)
#> age sex
#> "Age in Years" "Sex of the participant"
Instead of {Hmisc} you can use the package {labelled}:
data <- labelled::set_variable_labels(data, .labels = var.labels)
I highly recommend using the Hmisc::upData() function.
Here is a reprex:
set.seed(22)
data <- data.frame(age = floor(rnorm(6, 25, 10)),
                   sex = gl(2, 1, 6, labels = c("f", "m")))
var.labels <- c(age = "Age in Years",
                sex = "Sex of the participant")
tibble::as_tibble(data) # as tibble (dplyr::as.tbl() is deprecated) ---------
#> # A tibble: 6 × 2
#> age sex
#> <dbl> <fctr>
#> 1 19 f
#> 2 49 m
#> 3 35 f
#> 4 27 m
#> 5 22 f
#> 6 43 m
data <- Hmisc::upData(data, labels = var.labels) # update data --------------
#> Input object size: 1328 bytes; 2 variables 6 observations
#> New object size: 2096 bytes; 2 variables 6 observations
Hmisc::label(data) # check new labels ---------------------------------------
#> age sex
#> "Age in Years" "Sex of the participant"
Hmisc::contents(data) # data dictionary -------------------------------------
#>
#> Data frame:data 6 observations and 2 variables Maximum # NAs:0
#>
#>
#>                     Labels Levels   Class Storage
#> age           Age in Years        integer integer
#> sex Sex of the participant      2         integer
#>
#> +--------+------+
#> |Variable|Levels|
#> +--------+------+
#> | sex | f,m |
#> +--------+------+