R: SAS (if/then statement) in R

遥遥无期 2021-01-16 17:28

I previously worked with SAS and then decided to switch to R for academic reasons. My data (healthdemo) are health data containing some health diagnostic co

4 Answers
  •  鱼传尺愫
    2021-01-16 17:50

    I had a similar struggle when I transitioned from SAS to R for health-related research. My solution was to let go of the "if...then" approach as much as possible and take advantage of R's native vectorized operations. Here are two approaches to your problem.

    First, you can use indexing to find and replace elements. Here is some hospital discharge data of the kind you describe:

    hosp<-read.csv(file="http://www.columbia.edu/~cjd11/charles_dimaggio/DIRE/resources/R/sparcsShort.csv",stringsAsFactors=FALSE)
    head(hosp)
    

    Say I want to identify every birth-related diagnosis in Manhattan. I first create a logical vector that holds TRUE or FALSE for each row according to my search criteria, then I index my data frame by that logical vector. In this case I am also restricting the columns (variables) I want returned:

    myObs<-hosp$county==59 & hosp$pdx=="V3000 " #note space
    myVars<-c("age", "sex", "disp")
    myFile<-hosp[myObs,myVars]
    head(myFile)
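
    If the CSV above is not reachable, the same indexing pattern can be tried on a small made-up data frame. This is only a sketch: `toy`, `isBirth`, `toyFile` and all the values are invented for illustration and only borrow the column names used above. `trimws()` is used here as one possible way to avoid depending on the trailing space in the code, and `ifelse()` is shown as the closest analogue of a SAS if/then/else assignment.

    ```r
    # Hypothetical stand-in for the discharge data; all values are invented.
    toy <- data.frame(
      county = c(59, 59, 47, 59),
      pdx    = c("V3000 ", "4280  ", "V3000 ", "V3000 "),
      age    = c(0, 71, 0, 0),
      sex    = c("F", "M", "M", "M"),
      disp   = c(1, 1, 1, 6),
      stringsAsFactors = FALSE
    )

    # Logical vector: TRUE where both conditions hold.
    # trimws() strips the padding so the code compares without the space.
    isBirth <- toy$county == 59 & trimws(toy$pdx) == "V3000"

    # Rows selected by the logical vector, columns selected by name.
    toyFile <- toy[isBirth, c("age", "sex", "disp")]
    toyFile

    # Vectorized if/then/else: 1 where the condition is TRUE, 0 otherwise.
    toy$birth <- ifelse(isBirth, 1, 0)
    ```

    The logical vector can be reused, so the same condition drives both the subset and the new 0/1 variable.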
    

    The second, and perhaps more computationally elegant, approach is to use a pattern-matching function like "grep". Say you're interested in identifying all substance abuse diagnoses, e.g. alcohol abuse (291, 303, 305 and sub-codes), opioids, cannabis, amphetamines, hallucinogens, and cocaine (304 and related sub-codes), or non-specific substance abuse-related diagnoses (292). In SAS you would write out a long if-then statement (or a more efficient array) of some kind:

    #/*********************** SUBSTANCE ABUSE *****************/
    #if pdx in /* use ICD9 codes to create diagnoses */ ('2910','2911','2912','2913','2914','2915',
    #   '29181','29189', '2919','2920','29211','29212','2922','29281','29282','29283', #........etc....,'30592','30593')
    #Then subst_ab=1; 
    #Else subst_ab=0;
    

    In R, you can instead write:

    # match codes that start with 291, 292, 303, 304 or 305, followed by at least one more digit
    substance<-grep("^(291|292|303|304|305)[0-9]", hosp$pdx)
    hosp$pdx[substance]
    hosp$subsAb<-"No"
    hosp$subsAb[substance]<-"Yes"
    hosp$subsAb[1:100]
    
    table(hosp$subsAb)
    plot(table(hosp$subsAb))
    
    library(ggplot2)
    # qplot() is deprecated in recent ggplot2 releases; ggplot() draws the same scatter
    ggplot(hosp, aes(x=subsAb, y=age)) + geom_point(alpha=1/50)
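
    For readers coming from SAS, two variations may map more directly onto the if/then/else habit. The sketch below runs on an invented vector `codes` rather than the real data, and the names `inList`, `isSubst` and `subsAb` are illustrative only. `%in%` mirrors the SAS `in (...)` list, while `grepl()` returns a logical vector (where `grep()` returns positions) that can be passed straight to `ifelse()`.

    ```r
    # Invented ICD-9 style codes, for illustration only.
    codes <- c("2910", "30500", "4280", "V3000", "29281", "3049")

    # Exact-match list, analogous to SAS `if pdx in (...)`.
    inList <- codes %in% c("2910", "29281", "30500")

    # Prefix match: 291, 292, 303, 304 or 305 followed by at least one digit.
    isSubst <- grepl("^(291|292|303|304|305)[0-9]", codes)

    # if/then/else in one vectorized step.
    subsAb <- ifelse(isSubst, "Yes", "No")
    table(subsAb)
    ```

    The `%in%` form suits a short, fixed list of codes; the pattern form is shorter when whole families of sub-codes share a prefix.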
    

    Tomas Aragon has written a wonderful introduction to R for epidemiologists that goes into these approaches in detail. (http://www.medepi.net/docs/ph251d_fall2012_epir-chap01-04.pdf)
