R: Convert values into pipe-delimited format

Posted by 懵懂的女人 on 2019-12-11 06:49:56

Question


I'm trying to create a REDCap data dictionary from an SPSS output. SPSS lists the allowed values, or factors, for each variable like this:

SEX       0 Male
          1 Female

LANGUAGE  1 English
          2 Spanish
          3 Other
          6 Unknown

How can I convert the above to this format for REDCap:

Variable        Values
SEX             0, Male | 1, Female
LANGUAGE        1, English | 2, Spanish | 3, Other | 6, Unknown

The language I'm best with is R.


Answer 1:


Here's one approach that relies on sub() and tidyr::fill(). It returns a dataset that you can either write to disk (with something like readr::write_csv()) or paste from the R console directly into the REDCap data dictionary.

Step 1: read the plain-text as a single-column dataset.

In your scenario, raw_text could instead be a file path.

raw_text <- "
  SEX       0 Male
            1 Female

  LANGUAGE  1 English
            2 Spanish
            3 Other
            6 Unknown"

ds_raw <- readr::read_csv(
  file      = raw_text,
  col_names = FALSE,
  trim_ws   = FALSE
)
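
If the SPSS output lives in a file, one possible variation is to read it line by line. This is only a sketch: the temp file stands in for your real file, and ds_raw_file is a name I made up for illustration. I use readr::read_lines() here rather than read_csv() because it never splits a line on commas, which matters if a value label happens to contain one.

```r
# Hypothetical input: a temp file stands in for the real SPSS output.
path <- tempfile(fileext = ".txt")
writeLines(
  c(
    "SEX       0 Male",
    "          1 Female",
    "",
    "LANGUAGE  1 English",
    "          2 Spanish"
  ),
  path
)

# One string per line; leading whitespace is preserved.
lines <- readr::read_lines(path)

# Drop blank lines and store the rest in a single column named X1,
# so the later steps can refer to the same column name.
ds_raw_file <- tibble::tibble(X1 = lines[nzchar(trimws(lines))])
```

The resulting single-column dataset has the same shape as ds_raw, so the remaining steps should apply unchanged.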

Step 2: extract the implied structure from the single column

  • regexes identify & separate the columns. (The initial \\s*? can probably be dropped if you're reading from a file.)
  • blanks in Variable are replaced with NAs.
  • ID and Value are smushed to create Values.
  • tidyr::fill() carries forward the missing Variable cells.

library(magrittr)
pattern <- "^\\s*?(\\w+)?\\s+(\\d{1,3})\\s+(.+?)$"
ds_completed <- ds_raw %>%
  dplyr::mutate(
    Variable    = sub(pattern, "\\1", X1),
    ID          = as.integer(sub(pattern, "\\2", X1)),
    Value       = sub(pattern, "\\3", X1),
    Variable    = dplyr::na_if(Variable, ""),

    Values      = paste0(ID, ", ", Value)
  ) %>% 
  tidyr::fill(Variable) %>% 
  dplyr::select(-X1)

Intermediate Result:

# A tibble: 6 x 4
  Variable    ID Value   Values    
  <chr>    <int> <chr>   <chr>     
1 SEX          0 Male    0, Male   
2 SEX          1 Female  1, Female 
3 LANGUAGE     1 English 1, English
4 LANGUAGE     2 Spanish 2, Spanish
5 LANGUAGE     3 Other   3, Other  
6 LANGUAGE     6 Unknown 6, Unknown
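
To see what the regex is doing in isolation, here is a small sketch that applies the same pattern to two of the sample lines with base R only. The vector names (lines, variable, id, label) are just for illustration.

```r
# Same pattern as in Step 2: optional variable name, numeric code, label.
pattern <- "^\\s*?(\\w+)?\\s+(\\d{1,3})\\s+(.+?)$"

# One line that names the variable and one continuation line that doesn't.
lines <- c("  SEX       0 Male", "            1 Female")

# Each sub() call replaces the whole line with a single capture group.
variable <- sub(pattern, "\\1", lines)  # empty string when no name is present
id       <- sub(pattern, "\\2", lines)
label    <- sub(pattern, "\\3", lines)

# The empty variable name is what gets converted to NA and then filled down.
variable[variable == ""] <- NA_character_
```

The continuation line yields an empty first group, which is why the blank-to-NA conversion has to happen before tidyr::fill().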

Step 3: determine & record the initial order of Variable

ds_order <- ds_completed %>% 
  dplyr::distinct(Variable) %>% 
  tibble::rowid_to_column("variable_order")
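
The reason for recording the order is that grouping and summarizing sorts the groups by their key, so LANGUAGE would come out before SEX. A toy sketch of this step (toy and toy_order are made-up names):

```r
library(magrittr)

# Toy stand-in for the Variable column, in its original order.
toy <- tibble::tibble(Variable = c("SEX", "SEX", "LANGUAGE", "LANGUAGE"))

# distinct() keeps the first occurrence of each value, in order of appearance;
# rowid_to_column() then numbers those rows 1, 2, ...
toy_order <- toy %>%
  dplyr::distinct(Variable) %>%
  tibble::rowid_to_column("variable_order")
```

Here toy_order has one row per variable, with SEX numbered 1 and LANGUAGE numbered 2, which is what the later arrange() relies on.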

Step 4: output one line per unique Variable

  • collapse Values, separated by a pipe.
  • restore Variable order by joining on ds_order and arrange()ing.

ds_completed %>% 
  dplyr::group_by(Variable) %>% 
  dplyr::summarize(
    Values  = paste(Values, collapse = " | ")
  ) %>% 
  dplyr::ungroup() %>% 
  dplyr::left_join(ds_order, by="Variable") %>% 
  dplyr::arrange(variable_order) %>% 
  dplyr::select(-variable_order)

Result

# A tibble: 2 x 2
  Variable Values                                         
  <chr>    <chr>                                          
1 SEX      0, Male | 1, Female                            
2 LANGUAGE 1, English | 2, Spanish | 3, Other | 6, Unknown
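
If you'd rather write the result to disk than paste it, something like the following sketch should work. The names ds_dictionary, path_out, and ds_check are assumptions of mine (the pipeline above doesn't assign its result to anything), and the column names may need to be renamed to match your REDCap data dictionary template.

```r
# Stand-in for the final result above, typed out by hand for this sketch.
ds_dictionary <- tibble::tibble(
  Variable = c("SEX", "LANGUAGE"),
  Values   = c(
    "0, Male | 1, Female",
    "1, English | 2, Spanish | 3, Other | 6, Unknown"
  )
)

# Hypothetical output location; replace with a real path.
path_out <- tempfile(fileext = ".csv")

# write_csv() quotes fields that contain commas, so Values stays in one cell.
readr::write_csv(ds_dictionary, path_out)

# Read it back as two character columns to confirm nothing was split.
ds_check <- readr::read_csv(path_out, col_types = "cc")
```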

Formalizing in a package function.

I've never needed to go from an SPSS format to a REDCap data dictionary, but it makes sense that you need to here. If this is a frequent need for SPSS users (who know a little R), I'm willing to move this into a REDCapR function and write unit tests if you'll create a new issue and save some example input datasets and expected datasets (for the unit tests).

If you ever need to translate in the opposite direction, consider REDCapR::checkbox_choices().
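
To illustrate what the opposite direction involves, here is a rough base R sketch. It is not the REDCapR implementation, only an approximation of the idea, and the names choices, pieces, and ds_choices are made up. It assumes that no label contains a pipe and that each id precedes the first comma.

```r
# A pipe-delimited choices string, as it appears in a REDCap data dictionary.
choices <- "1, English | 2, Spanish | 3, Other | 6, Unknown"

# Split on the literal pipe, then trim the padding around each piece.
pieces <- trimws(strsplit(choices, "|", fixed = TRUE)[[1]])

# Everything before the first comma is the id; everything after is the label.
ds_choices <- data.frame(
  id    = sub("^([^,]+),.*$", "\\1", pieces),
  label = trimws(sub("^[^,]+,(.*)$", "\\1", pieces)),
  stringsAsFactors = FALSE
)
```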

Other resources

REDCapR and redcapAPI are the two R packages developed around the REDCap API. There are roughly a dozen packages written in various languages for the REDCap API, but SPSS isn't currently one of them.



Source: https://stackoverflow.com/questions/50949896/r-convert-values-into-pipe-delimited-format