Question
I am trying to convert long format wind data into wide format. Both wind speed and wind direction are listed within the Parameter.Name column. These values need to be cast by both the Local.Site.Name and Date.Local variables.
If there are multiple observations per unique Local.Site.Name + Date.Local row, then I want the mean value of those observations. The built-in argument 'fun.aggregate = mean' works just fine for wind speed, but mean wind direction cannot be computed this way because the values are in degrees. For example, the arithmetic mean of two wind directions near north, 350 and 10, comes out as south: (350 + 10)/2 = 180, even though the polar average is 360 (or 0).
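A quick base R sketch of that arithmetic, assuming directions are measured in degrees clockwise from north (the variable names here are illustrative only):

```r
dirs <- c(350, 10)          # degrees, both close to north
rad  <- dirs * pi / 180     # R's trig functions work in radians

mean(dirs)                  # 180: the arithmetic mean points south

# Average the unit-vector components, then convert back to an angle.
east  <- mean(sin(rad))     # essentially 0: the east/west parts cancel
north <- mean(cos(rad))     # positive: the mean vector points north
vec_mean <- atan2(east, north) * 180 / pi   # approximately 0 degrees
```

Averaging the components rather than the raw angles is what keeps the result near north instead of flipping it to south.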
The 'circular' package will allow us to compute the mean wind direction without having to perform any trigonometry, but I am having trouble trying to nest this additional function within the 'fun.aggregate' argument. I thought a simple else if statement would do the trick, but I am running into the following error:
Error in vaggregate(.value = value, .group = overall, .fun = fun.aggregate, : could not find function ".fun"
In addition: Warning messages:
1: In if (wind$Parameter.Name == "Wind Direction - Resultant") { :
the condition has length > 1 and only the first element will be used
2: In if (wind$Parameter.Name == "Wind Speed - Resultant") { :
the condition has length > 1 and only the first element will be used
3: In mean.default(wind$"Wind Speed - Resultant") :
argument is not numeric or logical: returning NA
The goal is to be able to use fun.aggregate = mean for Wind Speed, but mean(circular(Wind Direction, units = 'degrees')) for Wind Direction.
Here's the original data (>100MB): https://drive.google.com/open?id=0By6o_bZ8CGwuUUhGdk9ONTgtT0E
Here's a subset of the data (1st 100 rows): https://drive.google.com/open?id=0By6o_bZ8CGwucVZGT0pBQlFzT2M
Here's my script:
library(reshape2)
library(dplyr)
library(circular)
#read in the long format data:
wind <- read.csv("<INSERT_FILE_PATH_HERE>", header = TRUE)
#cast into wide format:
wind.w <- dcast(wind,
Local.Site.Name + Date.Local ~ Parameter.Name,
value.var = "Arithmetic.Mean",
fun.aggregate = (
if (wind$Parameter.Name == "Wind Direction - Resultant") {
mean(circular(wind$"Wind Direction - Resultant", units = 'degrees'))
}
else if (wind$Parameter.Name == "Wind Speed - Resultant") {
mean(wind$"Wind Speed - Resultant")
}),
na.rm = TRUE)
Any help would be greatly appreciated!
-spacedSparking
EDIT: HERE'S THE SOLUTION:
library(reshape2)
library(plyr) # provides the .() helper used in the subset arguments below
library(SDMTools)
library(dplyr)
#read in the EPA wind data:
#This data is publicly accessible, and can be found here: https://aqsdr1.epa.gov/aqsweb/aqstmp/airdata/download_files.html
wind <- read.csv("daily_WIND_2016.csv", sep = ',', header = TRUE, stringsAsFactors = FALSE)
#convert long format wind speed data by date and site id:
wind_speed <- dcast(wind,
Local.Site.Name + Date.Local ~ Parameter.Name,
value.var = "Arithmetic.Mean",
fun.aggregate = function(x) {
mean(x, na.rm=TRUE)
},
subset = .(Parameter.Name == "Wind Speed - Resultant")
)
#convert long format wind direction data into wide format by date and local site id:
wind_direction <- dcast(wind,
Local.Site.Name + Date.Local ~ Parameter.Name,
value.var = "Arithmetic.Mean",
fun.aggregate = function(x) {
if(length(x) > 0)
circular.averaging(x, deg = TRUE)
else
-1
},
subset= .(Parameter.Name == "Wind Direction - Resultant")
)
#join the wide format split wind_speed and wind_direction dataframes
wind.w <- merge(wind_speed, wind_direction)
Answer 1:
You can use the subset argument of dcast to apply the two functions separately, which gives two data frames, and then merge them:
library(reshape2)
library(plyr) # provides the .() helper used in the subset arguments below
library(dplyr)
library(circular)
#cast into wide format:
wind_speed <- dcast(wind,
Local.Site.Name + Date.Local ~ Parameter.Name,
value.var = "Arithmetic.Mean",
fun.aggregate = function(x) {
mean(x, na.rm=TRUE)
},
subset=.(Parameter.Name == "Wind Speed - Resultant")
)
wind_direction <- dcast(wind,
Local.Site.Name + Date.Local ~ Parameter.Name,
value.var = "Arithmetic.Mean",
fun.aggregate = function(x) {
if(length(x) > 0)
mean(circular(c(x), units="degrees"), na.rm=TRUE)
else
-1
},
subset=.(Parameter.Name == "Wind Direction - Resultant")
)
wind.w <- merge(wind_speed, wind_direction)
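The same split, aggregate, merge idea can also be sketched in base R on toy data. This is only an illustration under stated assumptions: the data values are made up, and vector_mean_deg and summarise_param are hypothetical helpers, not functions from reshape2, circular, or SDMTools.

```r
# Toy long-format data; column names follow the question, values are made up.
wind <- data.frame(
  Local.Site.Name = c("A", "A", "A", "A", "B", "B"),
  Date.Local      = rep("2016-01-01", 6),
  Parameter.Name  = c("Wind Speed - Resultant", "Wind Speed - Resultant",
                      "Wind Direction - Resultant", "Wind Direction - Resultant",
                      "Wind Speed - Resultant", "Wind Direction - Resultant"),
  Arithmetic.Mean = c(2, 4, 340, 10, 5, 90),
  stringsAsFactors = FALSE
)

# Hypothetical helper: vector (unit-circle) mean of angles in degrees,
# wrapped so that negative results map into the 0-360 range.
vector_mean_deg <- function(deg) {
  rad <- deg * pi / 180
  (atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi) %% 360
}

# Hypothetical helper: aggregate one parameter with its own summary function.
summarise_param <- function(data, param, fun, new_name) {
  rows <- data[data$Parameter.Name == param, ]
  out <- aggregate(Arithmetic.Mean ~ Local.Site.Name + Date.Local,
                   data = rows, FUN = fun)
  names(out)[names(out) == "Arithmetic.Mean"] <- new_name
  out
}

speed     <- summarise_param(wind, "Wind Speed - Resultant", mean, "wind_speed")
direction <- summarise_param(wind, "Wind Direction - Resultant",
                             vector_mean_deg, "wind_direction")

# Join on the shared key columns (Local.Site.Name, Date.Local).
wind.w <- merge(speed, direction)
```

With this toy input, site A ends up with a mean speed of 3 and a mean direction of about 355 degrees. Note that merge() drops site/date pairs that have only one of the two parameters unless all = TRUE is passed.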
Answer 2:
You're using wind.w inside of the code which defines wind.w, and that's not going to work!
You're also using backticks (`) instead of straight quote marks ('). Straight quote marks should be used to delimit a string.
Answer 3:
Alright, thanks to all of your help I managed to solve this pesky wind direction problem. Sometimes solving problems is just a matter of knowing the right questions to ask. In my case, learning the term 'vector-averaging' was all I needed! There is a built-in vector-averaging function called circular.averaging() from the SDMTools package that averages wind direction and produces an output that is still between 0-359 degrees! What I ended up doing was modifying tjjjohnson's script: I changed the fun.aggregate argument from mean(circular(c(x), units = "degrees"), na.rm = TRUE) to circular.averaging(x, deg = TRUE).
Here are histograms of the raw and aggregated data! Everything is looking good, thanks everyone!
Source: https://stackoverflow.com/questions/42147914/reshaping-epa-wind-speed-direction-data-with-dcast-in-r