Question
I'm trying to create a REDCap data dictionary from an SPSS output. SPSS lists the allowed values, or factors, for each variable like this:
SEX       0 Male
          1 Female
LANGUAGE  1 English
          2 Spanish
          3 Other
          6 Unknown
How can I convert the above to this format for REDCap:
Variable  Values
SEX       0, Male | 1, Female
LANGUAGE  1, English | 2, Spanish | 3, Other | 6, Unknown
The language I'm best with is R.
Answer 1:
Here's one approach that relies on sub() and tidyr::fill(). It returns a dataset that you may want to write to disk (with something like readr::write_csv()) or paste from the R console directly into the REDCap data dictionary.
Step 1: read the plain text as a single-column dataset.
In your scenario, raw_text could be a file path.
raw_text <- "
SEX       0 Male
          1 Female
LANGUAGE  1 English
          2 Spanish
          3 Other
          6 Unknown"
# trim_ws = FALSE keeps the leading blanks that mark continuation rows.
ds_raw <- readr::read_csv(
  file      = raw_text,
  col_names = FALSE,
  trim_ws   = FALSE
)
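If the SPSS listing lives in a file, a base-R variant of Step 1 might look like the sketch below. It swaps readr::read_csv() for readLines(), on the assumption that some value labels could contain commas, which a CSV reader would treat as column separators. The temporary file and its contents (including the "Spanish, Castilian" label) are made up for illustration; only the column name X1 is taken from the steps that follow.

```r
# Write a small example to a temporary file (a stand-in for the real SPSS export).
path <- tempfile(fileext = ".txt")
writeLines(
  c(
    "SEX       0 Male",
    "          1 Female",
    "LANGUAGE  1 English",
    "          2 Spanish, Castilian"   # hypothetical label containing a comma
  ),
  path
)

# readLines() returns one string per line and leaves leading whitespace intact.
lines <- readLines(path)
lines <- lines[nzchar(trimws(lines))]   # drop blank lines, if any

# Same shape as the single-column dataset used in the later steps.
ds_raw <- data.frame(X1 = lines, stringsAsFactors = FALSE)
```

The resulting ds_raw has one X1 column, so the later steps should apply to it unchanged.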
Step 2: extract the implied structure from the single column
- regexes identify & separate the columns. (The initial \\s*? can probably be dropped if you're reading from a file.)
- blanks in Variable are replaced with NAs.
- ID and Value are smushed to create Values.
- tidyr::fill() carries forward the missing Variable cells.
library(magrittr)
pattern <- "^\\s*?(\\w+)?\\s+(\\d{1,3})\\s+(.+?)$"
ds_completed <- ds_raw %>%
  dplyr::mutate(
    Variable = sub(pattern, "\\1", X1),
    ID       = as.integer(sub(pattern, "\\2", X1)),
    Value    = sub(pattern, "\\3", X1),
    Variable = dplyr::na_if(Variable, ""),
    Values   = paste0(ID, ", ", Value)
  ) %>%
  tidyr::fill(Variable) %>%
  dplyr::select(-X1)
Intermediate Result:
# A tibble: 6 x 4
  Variable    ID Value   Values
  <chr>    <int> <chr>   <chr>
1 SEX          0 Male    0, Male
2 SEX          1 Female  1, Female
3 LANGUAGE     1 English 1, English
4 LANGUAGE     2 Spanish 2, Spanish
5 LANGUAGE     3 Other   3, Other
6 LANGUAGE     6 Unknown 6, Unknown
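To see what the Step 2 pattern does to individual lines, here is a small base-R sketch. The three input strings are illustrative; the third one has no leading whitespace, which shows why the blanks in front of continuation rows need to survive the import. Because sub() returns its input unchanged when the pattern does not match, the sketch checks with grepl() first.

```r
pattern <- "^\\s*?(\\w+)?\\s+(\\d{1,3})\\s+(.+?)$"

lines <- c(
  "SEX       0 Male",     # variable name present
  "          1 Female",   # continuation row: leading blanks, no variable name
  "1 Female"              # leading blanks stripped: the pattern cannot match
)

# Flag the lines the pattern understands before extracting anything.
matched <- grepl(pattern, lines)

ok       <- lines[matched]
variable <- sub(pattern, "\\1", ok)               # "" when the name is absent
id       <- as.integer(sub(pattern, "\\2", ok))
label    <- sub(pattern, "\\3", ok)
```

Lines where matched is FALSE are worth inspecting by hand rather than passing on to the later steps.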
Step 3: determine & record the initial order of Variable
ds_order <- ds_completed %>%
  dplyr::distinct(Variable) %>%
  tibble::rowid_to_column("variable_order")
Step 4: output one line per unique Variable
- collapse Values, separated by a pipe.
- restore Variable order by joining on ds_order and arrange()ing.
ds_completed %>%
  dplyr::group_by(Variable) %>%
  dplyr::summarize(
    Values = paste(Values, collapse = " | ")
  ) %>%
  dplyr::ungroup() %>%
  dplyr::left_join(ds_order, by = "Variable") %>%
  dplyr::arrange(variable_order) %>%
  dplyr::select(-variable_order)
Result
# A tibble: 2 x 2
  Variable Values
  <chr>    <chr>
1 SEX      0, Male | 1, Female
2 LANGUAGE 1, English | 2, Spanish | 3, Other | 6, Unknown
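Steps 3 and 4 exist because group_by() followed by summarize() returns the groups sorted by Variable, which would put LANGUAGE ahead of SEX. As an alternative sketch (not the approach used above), base R can keep the original order by grouping on a factor whose levels follow first appearance. The small data frame below is typed in by hand to mirror the intermediate result; the names groups, collapsed, and ds_result are illustrative.

```r
# Hand-typed copy of the Variable and Values columns from the intermediate result.
ds <- data.frame(
  Variable = c("SEX", "SEX", "LANGUAGE", "LANGUAGE", "LANGUAGE", "LANGUAGE"),
  Values   = c("0, Male", "1, Female",
               "1, English", "2, Spanish", "3, Other", "6, Unknown"),
  stringsAsFactors = FALSE
)

# Factor levels in order of first appearance, so SEX stays ahead of LANGUAGE.
groups <- factor(ds$Variable, levels = unique(ds$Variable))

# split() returns one character vector per level; paste() collapses each with a pipe.
collapsed <- vapply(split(ds$Values, groups), paste, character(1), collapse = " | ")

ds_result <- data.frame(
  Variable = names(collapsed),
  Values   = unname(collapsed),
  stringsAsFactors = FALSE
)
```

This avoids the join on ds_order, at the cost of leaving the dplyr pipeline.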
Formalizing in a package function.
I've never needed to go from an SPSS format to a REDCap data dictionary, but it makes sense that you need to here. If this is a frequent need for SPSS users (who know a little R), I'm willing to move this into a REDCapR function and write unit tests if you'll create a new issue and save some example input datasets and expected datasets (for the unit tests).
If you ever need to translate in the opposite direction, consider REDCapR::checkbox_choices().
Other resources
REDCapR and redcapAPI are the two R packages developed around the REDCap API. There are roughly a dozen packages written in various languages for the REDCap API, but SPSS isn't currently one of them.
Source: https://stackoverflow.com/questions/50949896/r-convert-values-into-pipe-delimited-format