问题
I'm trying to calculate sums and means on a very large dataset (~22000 records) for several parameters (e.g. Er_Count, Mn_Count) by month, year , Survey ID and Grid ID. I tried this code initially to get overall sums:
dlply(Effort_All,c("Er_Count","Mn_Count","Bp_Count"),sum)
And received the following error: Error: only defined on a data frame with all numeric variables Since I cannot even get overall sums, I am unable to get statistics by the specific variables either. Do I need to split the data in some manner?
I have included a sample dataset of 25 records below.
structure(list(Grid_ID = structure(c(527L, 92L, 331L, 395L, 934L,
93L), .Label = c("1", "1,000", "1,001", "1,002", "1,003", "1,004",
"1,005", "1,006", "1,007", "1,008", "1,009", "1,010", "1,011",
"1,012", "1,013", "1,014", "1,015", "1,016", "1,017", "1,018",
"1,019", "1,020", "1,021", "1,022", "1,023", "1,024", "1,025",
"1,026", "1,027", "1,028", "1,029", "1,030", "1,031", "1,032",
"1,033", "1,034", "1,035", "1,036", "1,037", "1,038", "1,039",
"1,040", "1,041", "1,042", "1,043", "1,044", "1,045", "1,046",
"1,047", "1,048", "1,049", "1,050", "1,051", "1,052", "1,053",
"1,054", "1,055", "1,056", "1,057", "1,058", "1,059", "1,060",
"1,061", "10", "100", "101", "102", "103", "104", "105", "106",
"107", "108", "109", "11", "110", "111", "112", "113", "114",
"115", "116", "117", "118", "119", "12", "120", "121", "122",
"123", "124", "125", "126", "127", "128", "129", "13", "130",
"131", "132", "133", "134", "135", "136", "137", "138", "139",
"14", "140", "141", "142", "143", "144", "145", "146", "147",
"148", "149", "15", "150", "151", "152", "153", "154", "155",
"156", "157", "158", "159", "16", "160", "161", "162", "163",
"164", "165", "166", "167", "168", "169", "17", "170", "171",
"172", "173", "174", "175", "176", "177", "178", "179", "18",
"180", "181", "182", "183", "184", "185", "186", "187", "188",
"189", "19", "190", "191", "192", "193", "194", "195", "196",
"197", "198", "199", "2", "20", "200", "201", "202", "203", "204",
"205", "206", "207", "208", "209", "21", "210", "211", "212",
"213", "214", "215", "216", "217", "218", "219", "22", "220",
"221", "222", "223", "224", "225", "226", "227", "228", "229",
"23", "230", "231", "232", "233", "234", "235", "236", "237",
"238", "239", "24", "240", "241", "242", "243", "244", "245",
"246", "247", "248", "249", "25", "250", "251", "252", "253",
"254", "255", "256", "257", "258", "259", "26", "260", "261",
"262", "263", "264", "265", "266", "267", "268", "269", "27",
"270", "271", "272", "273", "274", "275", "276", "277", "278",
"279", "28", "280", "281", "282", "283", "284", "285", "286",
"287", "288", "289", "29", "290", "291", "292", "293", "294",
"295", "296", "297", "298", "299", "3", "30", "300", "301", "302",
"303", "304", "305", "306", "307", "308", "309", "31", "310",
"311", "312", "313", "314", "315", "316", "317", "318", "319",
"32", "320", "321", "322", "323", "324", "325", "326", "327",
"328", "329", "33", "330", "331", "332", "333", "334", "335",
"336", "337", "338", "339", "34", "340", "341", "342", "343",
"344", "345", "346", "347", "348", "349", "35", "350", "351",
"352", "353", "354", "355", "356", "357", "358", "359", "36",
"360", "361", "362", "363", "364", "365", "366", "367", "368",
"369", "37", "370", "371", "372", "373", "374", "375", "376",
"377", "378", "379", "38", "380", "381", "382", "383", "384",
"385", "386", "387", "388", "389", "39", "390", "391", "392",
"393", "394", "395", "396", "397", "398", "399", "4", "40", "400",
"401", "402", "403", "404", "405", "406", "407", "408", "409",
"41", "410", "411", "412", "413", "414", "415", "416", "417",
"418", "419", "42", "420", "421", "422", "423", "424", "425",
"426", "427", "428", "429", "43", "430", "431", "432", "433",
"434", "435", "436", "437", "438", "439", "44", "440", "441",
"442", "443", "444", "445", "446", "447", "448", "449", "45",
"450", "451", "452", "453", "454", "455", "456", "457", "458",
"459", "46", "460", "461", "462", "463", "464", "465", "466",
"467", "468", "469", "47", "470", "471", "472", "473", "474",
"475", "476", "477", "478", "479", "48", "480", "481", "482",
"483", "484", "485", "486", "487", "488", "489", "49", "490",
"491", "492", "493", "494", "495", "496", "497", "498", "499",
"5", "50", "500", "501", "502", "503", "504", "505", "506", "507",
"508", "509", "51", "510", "511", "512", "513", "514", "515",
"516", "517", "518", "519", "52", "520", "521", "522", "523",
"524", "525", "526", "527", "528", "529", "53", "530", "531",
"532", "533", "534", "535", "536", "537", "538", "539", "54",
"540", "541", "542", "543", "544", "545", "546", "547", "548",
"549", "55", "550", "551", "552", "553", "554", "555", "556",
"557", "558", "559", "56", "560", "561", "562", "563", "564",
"565", "566", "567", "568", "569", "57", "570", "571", "572",
"573", "574", "575", "576", "577", "578", "579", "58", "580",
"581", "582", "583", "584", "585", "586", "587", "588", "589",
"59", "590", "591", "592", "593", "594", "595", "596", "597",
"598", "599", "6", "60", "600", "601", "602", "603", "604", "605",
"606", "607", "608", "609", "61", "610", "611", "612", "613",
"614", "615", "616", "617", "618", "619", "62", "620", "621",
"622", "623", "624", "625", "626", "627", "628", "629", "63",
"630", "631", "632", "633", "634", "635", "636", "637", "638",
"639", "64", "640", "641", "642", "643", "644", "645", "646",
"647", "648", "649", "65", "650", "651", "652", "653", "654",
"655", "656", "657", "658", "659", "66", "660", "661", "662",
"663", "664", "665", "666", "667", "668", "669", "67", "670",
"671", "672", "673", "674", "675", "676", "677", "678", "679",
"68", "680", "681", "682", "683", "684", "685", "686", "687",
"688", "689", "69", "690", "691", "692", "693", "694", "695",
"696", "697", "698", "699", "7", "70", "700", "701", "702", "703",
"704", "705", "706", "707", "708", "709", "71", "710", "711",
"712", "713", "714", "715", "716", "717", "718", "719", "72",
"720", "721", "722", "723", "724", "725", "726", "727", "728",
"729", "73", "730", "731", "732", "733", "734", "735", "736",
"737", "738", "739", "74", "740", "741", "742", "743", "744",
"745", "746", "747", "748", "749", "75", "750", "751", "752",
"753", "754", "755", "756", "757", "758", "759", "76", "760",
"761", "762", "763", "764", "765", "766", "767", "768", "769",
"77", "770", "771", "772", "773", "774", "775", "776", "777",
"778", "779", "78", "780", "781", "782", "783", "784", "785",
"786", "787", "788", "789", "79", "790", "791", "792", "793",
"794", "795", "796", "797", "798", "799", "8", "80", "800", "801",
"802", "803", "804", "805", "806", "807", "808", "809", "81",
"810", "811", "812", "813", "814", "815", "816", "817", "818",
"819", "82", "820", "821", "822", "823", "824", "825", "826",
"827", "828", "829", "83", "830", "831", "832", "833", "834",
"835", "836", "837", "838", "839", "84", "840", "841", "842",
"843", "844", "845", "846", "847", "848", "849", "85", "850",
"851", "852", "853", "854", "855", "856", "857", "858", "859",
"86", "860", "861", "862", "863", "864", "865", "866", "867",
"868", "869", "87", "870", "871", "872", "873", "874", "875",
"876", "877", "878", "879", "88", "880", "881", "882", "883",
"884", "885", "886", "887", "888", "889", "89", "890", "891",
"892", "893", "894", "895", "896", "897", "898", "899", "9",
"90", "900", "901", "902", "903", "904", "905", "906", "907",
"908", "909", "91", "910", "911", "912", "913", "914", "915",
"916", "917", "918", "919", "92", "920", "921", "922", "923",
"924", "925", "926", "927", "928", "929", "93", "930", "931",
"932", "933", "934", "935", "936", "937", "938", "939", "94",
"940", "941", "942", "943", "944", "945", "946", "947", "948",
"949", "95", "950", "951", "952", "953", "954", "955", "956",
"957", "958", "959", "96", "960", "961", "962", "963", "964",
"965", "966", "967", "968", "969", "97", "970", "971", "972",
"973", "974", "975", "976", "977", "978", "979", "98", "980",
"981", "982", "983", "984", "985", "986", "987", "988", "989",
"99", "990", "991", "992", "993", "994", "995", "996", "997",
"998", "999"), class = "factor"), ER_Groups = c(2, 2, 2, 3, 5,
6), Er_Count = c(60, 75, 14, 12, 8, 26), Mn_Count = c(30, 9, 6, 33,
7, 12), Bp_Groups = c(1, 2, 1, 1, 0, 1), Bp_Count = c(3, 3, 2,
5, 0, 6), Mn_Groups = c(1, 1, 3, 1, 0, 0), Month = c(10L, 6L,
12L, 4L, 2L, 4L), Year = c(2000L, 2001L, 2009L, 2004L, 2002L,
2001L), SurveyID = structure(c(16L, 24L, 93L, 56L, 34L, 22L), .Label = c("199708HS",
"199808HS", "199908HS", "199909SSLQ", "199910SSL", "199911SSL",
"200001SSLQ", "200002SSL", "200003SSLQ", "200004SSLQ", "200005SSL",
"200006SSL", "200007SSL", "200008HS", "200008SSL", "200009SSL",
"200010SSL", "200011SSL", "200101SSL", "200102SSL", "200103SSL",
"200104SSL", "200105SSL", "200106SSL", "200107SSL", "200108HS",
"200108SSL", "200109SSL", "200110SSL", "200111SSL", "200112SSL",
"200201SSL", "200202SSL", "200203SSL", "200204SSL", "200205SSL",
"200206SSL", "200207SSL", "200208HS", "200208SSL", "200210SSL",
"200211SSL", "200212SSL", "200301SSL", "200302SSL", "200303SSL",
"200304SSL", "200305SSL", "200306SSL", "200307SSL", "200309SSL",
"200310SSL", "200311SSL", "200312SSL", "200403SSL", "200404SSL",
"200405SSL", "200406SSL", "200407SSL", "200408HS", "200408SSL",
"200409SSL", "200505SSL", "200506SSL", "200507SSL", "200510SSL",
"200512SSL", "200603SSL", "200609SSL", "200612SSL", "200709GAP07",
"200710GAP07", "200712GAP07", "200802GAP07", "200803GAP07", "200804GAP07",
"200805GAP07", "200806GAP07", "200807GAP07", "200808GAP07", "200809GAP08",
"200810GAP08", "200812GAP08", "200901GAP08", "200903GAP08", "200904GAP08",
"200905GAP08", "200906GAP08", "200907GAP08", "200908GAP08", "200909GAP08",
"200910GAP09", "200912GAP09", "201001GAP09", "201002GAP09", "201003GAP09",
"201004GAP09", "201005GAP09", "201006GAP09", "201007GAP09", "201008GAP09",
"201009GAP09", "201010GAP09", "201011GAP09", "201101GAP09", "201102GAP09",
"201103GAP09", "201104GAP09", "201106GAP09", "201108GAP09", "201109GAP09",
"201111GAP09", "201201GAP09", "201203GAP09", "201205GAP09", "201207GAP09",
"201208GAP09", "201211GAP09", "201301GAP09", "201303GAP09", "201305GAP09",
"201307GAP09", "201309GAP09", "201311GAP09"), class = "factor"),
Er_Group_Density = c(4, 9, 12, 4, 1, 0), Mn_Group_Density = c(3,
1, 1, 1, 0, 2), Bp_Group_Density = c(1, 2, 1, 0, 1, 0), Er_Count_Density = c(50,
14, 12, 9, 6, 4), Mn_Count_Density = c(9, 5, 2, 3, 2, 0), Bp_Count_Density = c(2,
3, 0, 4, 1, 0)), .Names = c("Grid_ID", "ER_Groups", "Er_Count",
"Mn_Count", "Bp_Groups", "Bp_Count", "Mn_Groups", "Month", "Year",
"SurveyID", "Er_Group_Density", "Mn_Group_Density", "Bp_Group_Density",
"Er_Count_Density", "Mn_Count_Density", "Bp_Count_Density"), row.names = c(2770L,
4421L, 17348L, 11263L, 6736L, 3974L), class = "data.frame")
回答1:
There are a number of ways to get statistics by group. I'll assume you have a bias for plyr, since your example uses it.
Remember that dlply() splits the data into smaller dataframes by the grouping variables, then it applies the requested function to each of the smaller dataframes. Therefore the function you pass should operate on a whole dataframe. sum() does not do this. You can write your own function, though.
Based on your description, what you want is something like this
myfun <- function(x) colSums(x[, c("Er_Count", "Mn_Count", "Bp_Count")])
dlply(Effort_All, c("Month", "Year", "Grid_ID", "SurveyID"), myfun)
Remember that the second argument to dlply() is the set of variables used for grouping. Not sure why you want the output as a list. Would it be easier to read if you used ddply (with the same arguments)?
Other approaches include using sqldf() or something like lapply().
=============== EDIT: Other approaches =============
sqldf is always very easy to read and understand:
output <- sqldf('select Month,Year,Grid_ID,SurveyID,
sum(Er_Count) as ercount,
sum(Mn_Count) as mncount,
sum(Bp_Count) as bpcount
from Effort_All
group by Month, Year, Grid_ID, SurveyID')
lapply works pretty much the same way as dlply. Just different arguments.
回答2:
Also, you could use colwise
from plyr
dlply(Effort_All, .(Month, Year, Grid_ID, SurveyID), colwise(sum, .(Er_Count, Mn_Count, Bp_Count)))
Or summarise_each
from dplyr
library(dplyr)
Effort_All%>%
group_by(Month, Year, Grid_ID, SurveyID) %>%
summarise_each(funs(sum), Er_Count, Mn_Count, Bp_Count)
#Source: local data frame [6 x 7]
#Groups: Month, Year, Grid_ID
# Month Year Grid_ID SurveyID Er_Count Mn_Count Bp_Count
# 1 2 2002 884 200203SSL 8 7 0
# 2 4 2001 126 200104SSL 26 12 6
# 3 4 2004 399 200404SSL 12 33 5
# 4 6 2001 125 200106SSL 75 9 3
# 5 10 2000 517 200009SSL 60 30 3
# 6 12 2009 340 200912GAP09 14 6 2
来源:https://stackoverflow.com/questions/25089665/error-only-defined-on-a-data-frame-with-all-numeric-variables-with-ddply-on-lar