How to copy specific values from one data column to another while matching other columns in R?

对着背影说爱祢 posted on 2019-12-12 16:13:06

Question


I've searched a number of places (Stack Overflow, R-bloggers, etc.), but haven't found a good way to do this in R. Hopefully someone has some ideas.

I have a set of environmental sampling data. The data includes a variety of fields (visit date, region, location, sample medium, sample component, result, etc.).

Here's a subset of the pertinent fields. This is where I start...

visit_date   region    location     media      component     result
1990-08-20   LAKE      555723       water      Mg            *Nondetect
1999-07-01   HILL      432422       water      Ca            3.2
2010-09-12   LAKE      555723       water      pH            6.8
2010-09-12   LAKE      555723       water      Mg            2.1
2010-09-12   HILL      432423       water      pH            7.2
2010-09-12   HILL      432423       water      N             0.8
2010-09-12   HILL      432423       water      NH4           112

What I hope to end up with is a table/data frame like this:

visit_date   region    location     media      component     result        pH
1990-08-20   LAKE      555723       water      Mg            *Nondetect    *Not recorded
1999-07-01   HILL      432422       water      Ca            3.2           *Not recorded
2010-09-12   LAKE      555723       water      pH            6.8           6.8
2010-09-12   LAKE      555723       water      Mg            2.1           6.8
2010-09-12   HILL      432423       water      pH            7.2           7.2
2010-09-12   HILL      432423       water      N             0.8           7.2
2010-09-12   HILL      432423       water      NH4           112           7.2

I attempted to use the method from the question "R finding rows of a data frame where certain columns match those of another", but unfortunately didn't get the result I wanted. Instead, the pH column held either my pre-populated value (-999) or NA, not the pH value for that particular visit date where one was collected. Since the result data set is around 500k records, I'm using unique(tResults$pH) to check the values of the pH column.

Here's that attempt. res is the original result data.frame and component is the pH subset (the pH sample results taken from the main results table).

keys <- c("region", "location", "visit_date", "media")

tResults <- data.table(res, key=keys)
tComponent <- data.table(component, key=keys)

tResults[tComponent, pH>0]

I've attempted using match, merge, and within on the original data frame without success. Since then I've generated a subset for the components (pH in this example) where I copied the result column to a new "pH" column, thinking I could match the keys and update a new "pH" column in the main result set.

Since not all result values are numeric (there are values like *Not recorded), I tried substituting numeric placeholders such as -888 so I could force at least the result and pH columns to be numeric. Aside from the dates, which are POSIXct values, the remaining columns are character columns. The original data frame was created using stringsAsFactors=FALSE.

Once I can do this, I'll be able to generate similar columns for other components that can be used to populate and calculate other values for a given sample. At least that's my goal.

So I'm stumped on this one. In my mind it should be easy but I'm certainly NOT seeing it!

Your help and ideas are certainly welcome and appreciated!


Answer 1:


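# df1 is your first data set, as a data.frame.
# Copy the result into a helper column on pH rows only; other rows get NA.
df1$phtem<-with(df1,ifelse(component=="pH",result,NA))

library(data.table)
library(zoo) # provides na.locf (last observation carried forward)

# Fill each NA with the most recent pH value above it.
# Note: the fill is not grouped by visit, so it relies on row order.
setDT(df1)[,pH:=na.locf(phtem,na.rm = FALSE)]
df1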
    visit_date region location media component     result phtem  pH
1: 1990-08-20   LAKE   555723 water        Mg *Nondetect    NA  NA
2: 1999-07-01   HILL   432422 water        Ca        3.2    NA  NA
3: 2010-09-12   LAKE   555723 water        pH        6.8   6.8 6.8
4: 2010-09-12   LAKE   555723 water        Mg        2.1    NA 6.8
5: 2010-09-12   HILL   432423 water        pH        7.2   7.2 7.2
6: 2010-09-12   HILL   432423 water         N        0.8    NA 7.2
7: 2010-09-12   HILL   432423 water       NH4        112    NA 7.2

# You can delete phtem if you no longer need it, e.g. df1[, phtem := NULL]
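
One caveat with the carry-forward approach above: because the fill is not grouped, a visit with no pH sample can inherit the value from whatever row sits above it. The sketch below uses a small hypothetical three-row table (not the question's data, and the column names pH_all and pH_grp are made up for illustration) to show the difference between an ungrouped and a grouped fill. It assumes the pH row comes first within each visit.

```r
library(data.table)
library(zoo) # provides na.locf

# Hypothetical example: the second visit (2011-05-01) has no pH sample.
df <- data.table(
  visit_date = c("2010-09-12", "2010-09-12", "2011-05-01"),
  location   = c("555723", "555723", "432422"),
  component  = c("pH", "Mg", "Ca"),
  result     = c("6.8", "2.1", "3.2")
)
df[, phtem := ifelse(component == "pH", result, NA)]

# Ungrouped fill: row 3 inherits 6.8 from a different visit.
df[, pH_all := na.locf(phtem, na.rm = FALSE)]

# Grouped fill: the carry-forward stops at the visit boundary,
# so row 3 stays NA.
df[, pH_grp := na.locf(phtem, na.rm = FALSE), by = .(visit_date, location)]
```

Even the grouped fill leaves rows that come before the pH row in their visit as NA, which is one reason the grouped assignment in the edit below is the safer choice.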

Edit:

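library(data.table)
# Within each visit (same region, location, visit_date, media),
# assign the result of the pH row to every row of that visit.
setDT(df1)[,pH:=result[component=="pH"],by="region,location,visit_date,media"]
df1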

   visit_date region location media component     result  pH
1: 1990-08-20   LAKE   555723 water        Mg *Nondetect  NA
2: 1999-07-01   HILL   432422 water        Ca        3.2  NA
3: 2010-09-12   LAKE   555723 water        pH        6.8 6.8
4: 2010-09-12   LAKE   555723 water        Mg        2.1 6.8
5: 2010-09-12   HILL   432423 water        pH        7.2 7.2
6: 2010-09-12   HILL   432423 water         N        0.8 7.2
7: 2010-09-12   HILL   432423 water       NH4        112 7.2
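
Since the question was originally attempting a keyed join, here is a minimal sketch of the same result using a data.table update join. It is one possible variant, not the answerer's method. The names res and keys follow the question; ph is a helper table introduced here for illustration. All columns are kept as character for simplicity (the question's dates are POSIXct), and taking the first pH value with [1L] when a visit has several is an assumption.

```r
library(data.table)

# Sample data copied from the question.
res <- data.table(
  visit_date = c("1990-08-20", "1999-07-01", rep("2010-09-12", 5)),
  region     = c("LAKE", "HILL", "LAKE", "LAKE", "HILL", "HILL", "HILL"),
  location   = c("555723", "432422", "555723", "555723",
                 "432423", "432423", "432423"),
  media      = "water",
  component  = c("Mg", "Ca", "pH", "Mg", "pH", "N", "NH4"),
  result     = c("*Nondetect", "3.2", "6.8", "2.1", "7.2", "0.8", "112")
)

keys <- c("region", "location", "visit_date", "media")

# One pH value per visit; [1L] keeps the first if a visit has several.
ph <- res[component == "pH", .(pH = result[1L]), by = keys]

# Update join: copy pH into every row of res that shares the same keys.
res[ph, on = keys, pH := i.pH]

# Visits without a pH sample are left NA; label them as in the desired output.
res[is.na(pH), pH := "*Not recorded"]
```

The same pattern could be repeated with a different component value (for example "Mg") to build further columns.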


Source: https://stackoverflow.com/questions/29000289/how-to-copy-specific-values-from-one-data-column-to-another-while-matching-other
