问题
I've searched a number of places (stackoverflow, r-blogger, etc), but haven't quite found a good option for doing this in R. Hopefully someone has some ideas.
I have a set of environmental sampling data. The data includes a variety of fields (visit date, region, location, sample medium, sample component, result, etc.).
Here's a subset of the pertinent fields. This is where I start...
visit_date region location media component result
1990-08-20 LAKE 555723 water Mg *Nondetect
1999-07-01 HILL 432422 water Ca 3.2
2010-09-12 LAKE 555723 water pH 6.8
2010-09-12 LAKE 555723 water Mg 2.1
2010-09-12 HILL 432423 water pH 7.2
2010-09-12 HILL 432423 water N 0.8
2010-09-12 HILL 432423 water NH4 112
What I hope to reach is a table/dataframe like this:
visit_date region location media component result pH
1990-08-20 LAKE 555723 water Mg *Nondetect *Not recorded
1999-07-01 HILL 432422 water Ca 3.2 *Not recorded
2010-09-12 LAKE 555723 water pH 6.8 6.8
2010-09-12 LAKE 555723 water Mg 2.1 6.8
2010-09-12 HILL 432423 water pH 7.2 7.2
2010-09-12 HILL 432423 water N 0.8 7.2
2010-09-12 HILL 432423 water NH4 112 7.2
I attempted to use the method here --
R finding rows of a data frame where certain columns match those of another -- but unfortunately didn't get to the result I wanted. Instead the pH column was either my pre-populated value -999
or NA
and not the pH value for that particular visit date if it was collected. Since the result data set is around 500k records, I'm using unique(tResult$pH)
to determine the values of the pH column.
Here's that attempt. res
is the original result data.frame and component
would be the pH result subset (the pH sample results from the main results table).
keys <- c("region", "location", "visit_date", "media")
tResults <- data.table(res, key=keys)
tComponent <- data.table(component, key=keys)
tResults[tComponent, pH>0]
I've attempted using match
, merge
, and within
on the original data frame without success. Since then I've generated a subset for the components (pH in this example) where I copied over the results column to a new "pH" column, thinking I could match the keys and update a new "pH" column in the main result set.
Since not all result values are numeric (with values like *Not recorded
) I attempted to use numerics like -888
or other values which could substitute so I could force at least the result and pH columns to be numeric. Aside from the dates which are POSIXct
values, the remaining columns are character
columns. Original dataframe was created using StringsAsFactors=FALSE
.
Once I can do this, I'll be able to generate similar columns for other components that can be used to populate and calculate other values for a given sample. At least that's my goal.
So I'm stumped on this one. In my mind it should be easy but I'm certainly NOT seeing it!
Your help and ideas are certainly welcome and appreciated!
回答1:
#df1 is your first data set and is dataframe
df1$phtem<-with(df1,ifelse(component=="pH",result,NA))
library(data.table)
library(zoo) # locf function
setDT(df1)[,pH:=na.locf(phtem,na.rm = FALSE)]
visit_date region location media component result phtem pH
1: 1990-08-20 LAKE 555723 water Mg *Nondetect NA NA
2: 1999-07-01 HILL 432422 water Ca 3.2 NA NA
3: 2010-09-12 LAKE 555723 water pH 6.8 6.8 6.8
4: 2010-09-12 LAKE 555723 water Mg 2.1 NA 6.8
5: 2010-09-12 HILL 432423 water pH 7.2 7.2 7.2
6: 2010-09-12 HILL 432423 water N 0.8 NA 7.2
7: 2010-09-12 HILL 432423 water NH4 112 NA 7.2
# you can delete phtem if you don't need.
Edit:
library(data.table)
setDT(df1)[,pH:=result[component=="pH"],by="region,location,visit_date,media"]
df1
visit_date region location media component result pH
1: 1990-08-20 LAKE 555723 water Mg *Nondetect NA
2: 1999-07-01 HILL 432422 water Ca 3.2 NA
3: 2010-09-12 LAKE 555723 water pH 6.8 6.8
4: 2010-09-12 LAKE 555723 water Mg 2.1 6.8
5: 2010-09-12 HILL 432423 water pH 7.2 7.2
6: 2010-09-12 HILL 432423 water N 0.8 7.2
7: 2010-09-12 HILL 432423 water NH4 112 7.2
来源:https://stackoverflow.com/questions/29000289/how-to-copy-specific-values-from-one-data-column-to-another-while-matching-other