Error: only defined on a data frame with all numeric variables with ddply on large dataset

冷暖自知 提交于 2019-12-25 03:53:56

问题


I'm trying to calculate sums and means on a very large dataset (~22000 records) for several parameters (e.g. Er_Count, Mn_Count) by month, year , Survey ID and Grid ID. I tried this code initially to get overall sums:

dlply(Effort_All,c("Er_Count","Mn_Count","Bp_Count"),sum)

And received the following error: Error: only defined on a data frame with all numeric variables Since I cannot even get overall sums, I am unable to get statistics by the specific variables either. Do I need to split the data in some manner?

I have included a sample dataset of 25 records below.

    structure(list(Grid_ID = structure(c(527L, 92L, 331L, 395L, 934L, 
    93L), .Label = c("1", "1,000", "1,001", "1,002", "1,003", "1,004", 
"1,005", "1,006", "1,007", "1,008", "1,009", "1,010", "1,011", 
"1,012", "1,013", "1,014", "1,015", "1,016", "1,017", "1,018", 
"1,019", "1,020", "1,021", "1,022", "1,023", "1,024", "1,025", 
"1,026", "1,027", "1,028", "1,029", "1,030", "1,031", "1,032", 
"1,033", "1,034", "1,035", "1,036", "1,037", "1,038", "1,039", 
"1,040", "1,041", "1,042", "1,043", "1,044", "1,045", "1,046", 
"1,047", "1,048", "1,049", "1,050", "1,051", "1,052", "1,053", 
"1,054", "1,055", "1,056", "1,057", "1,058", "1,059", "1,060", 
"1,061", "10", "100", "101", "102", "103", "104", "105", "106", 
"107", "108", "109", "11", "110", "111", "112", "113", "114", 
"115", "116", "117", "118", "119", "12", "120", "121", "122", 
"123", "124", "125", "126", "127", "128", "129", "13", "130", 
"131", "132", "133", "134", "135", "136", "137", "138", "139", 
"14", "140", "141", "142", "143", "144", "145", "146", "147", 
"148", "149", "15", "150", "151", "152", "153", "154", "155", 
"156", "157", "158", "159", "16", "160", "161", "162", "163", 
"164", "165", "166", "167", "168", "169", "17", "170", "171", 
"172", "173", "174", "175", "176", "177", "178", "179", "18", 
"180", "181", "182", "183", "184", "185", "186", "187", "188", 
"189", "19", "190", "191", "192", "193", "194", "195", "196", 
"197", "198", "199", "2", "20", "200", "201", "202", "203", "204", 
"205", "206", "207", "208", "209", "21", "210", "211", "212", 
"213", "214", "215", "216", "217", "218", "219", "22", "220", 
"221", "222", "223", "224", "225", "226", "227", "228", "229", 
"23", "230", "231", "232", "233", "234", "235", "236", "237", 
"238", "239", "24", "240", "241", "242", "243", "244", "245", 
"246", "247", "248", "249", "25", "250", "251", "252", "253", 
"254", "255", "256", "257", "258", "259", "26", "260", "261", 
"262", "263", "264", "265", "266", "267", "268", "269", "27", 
"270", "271", "272", "273", "274", "275", "276", "277", "278", 
"279", "28", "280", "281", "282", "283", "284", "285", "286", 
"287", "288", "289", "29", "290", "291", "292", "293", "294", 
"295", "296", "297", "298", "299", "3", "30", "300", "301", "302", 
"303", "304", "305", "306", "307", "308", "309", "31", "310", 
"311", "312", "313", "314", "315", "316", "317", "318", "319", 
"32", "320", "321", "322", "323", "324", "325", "326", "327", 
"328", "329", "33", "330", "331", "332", "333", "334", "335", 
"336", "337", "338", "339", "34", "340", "341", "342", "343", 
"344", "345", "346", "347", "348", "349", "35", "350", "351", 
"352", "353", "354", "355", "356", "357", "358", "359", "36", 
"360", "361", "362", "363", "364", "365", "366", "367", "368", 
"369", "37", "370", "371", "372", "373", "374", "375", "376", 
"377", "378", "379", "38", "380", "381", "382", "383", "384", 
"385", "386", "387", "388", "389", "39", "390", "391", "392", 
"393", "394", "395", "396", "397", "398", "399", "4", "40", "400", 
"401", "402", "403", "404", "405", "406", "407", "408", "409", 
"41", "410", "411", "412", "413", "414", "415", "416", "417", 
"418", "419", "42", "420", "421", "422", "423", "424", "425", 
"426", "427", "428", "429", "43", "430", "431", "432", "433", 
"434", "435", "436", "437", "438", "439", "44", "440", "441", 
"442", "443", "444", "445", "446", "447", "448", "449", "45", 
"450", "451", "452", "453", "454", "455", "456", "457", "458", 
"459", "46", "460", "461", "462", "463", "464", "465", "466", 
"467", "468", "469", "47", "470", "471", "472", "473", "474", 
"475", "476", "477", "478", "479", "48", "480", "481", "482", 
"483", "484", "485", "486", "487", "488", "489", "49", "490", 
"491", "492", "493", "494", "495", "496", "497", "498", "499", 
"5", "50", "500", "501", "502", "503", "504", "505", "506", "507", 
"508", "509", "51", "510", "511", "512", "513", "514", "515", 
"516", "517", "518", "519", "52", "520", "521", "522", "523", 
"524", "525", "526", "527", "528", "529", "53", "530", "531", 
"532", "533", "534", "535", "536", "537", "538", "539", "54", 
"540", "541", "542", "543", "544", "545", "546", "547", "548", 
"549", "55", "550", "551", "552", "553", "554", "555", "556", 
"557", "558", "559", "56", "560", "561", "562", "563", "564", 
"565", "566", "567", "568", "569", "57", "570", "571", "572", 
"573", "574", "575", "576", "577", "578", "579", "58", "580", 
"581", "582", "583", "584", "585", "586", "587", "588", "589", 
"59", "590", "591", "592", "593", "594", "595", "596", "597", 
"598", "599", "6", "60", "600", "601", "602", "603", "604", "605", 
"606", "607", "608", "609", "61", "610", "611", "612", "613", 
"614", "615", "616", "617", "618", "619", "62", "620", "621", 
"622", "623", "624", "625", "626", "627", "628", "629", "63", 
"630", "631", "632", "633", "634", "635", "636", "637", "638", 
"639", "64", "640", "641", "642", "643", "644", "645", "646", 
"647", "648", "649", "65", "650", "651", "652", "653", "654", 
"655", "656", "657", "658", "659", "66", "660", "661", "662", 
"663", "664", "665", "666", "667", "668", "669", "67", "670", 
"671", "672", "673", "674", "675", "676", "677", "678", "679", 
"68", "680", "681", "682", "683", "684", "685", "686", "687", 
"688", "689", "69", "690", "691", "692", "693", "694", "695", 
"696", "697", "698", "699", "7", "70", "700", "701", "702", "703", 
"704", "705", "706", "707", "708", "709", "71", "710", "711", 
"712", "713", "714", "715", "716", "717", "718", "719", "72", 
"720", "721", "722", "723", "724", "725", "726", "727", "728", 
"729", "73", "730", "731", "732", "733", "734", "735", "736", 
"737", "738", "739", "74", "740", "741", "742", "743", "744", 
"745", "746", "747", "748", "749", "75", "750", "751", "752", 
"753", "754", "755", "756", "757", "758", "759", "76", "760", 
"761", "762", "763", "764", "765", "766", "767", "768", "769", 
"77", "770", "771", "772", "773", "774", "775", "776", "777", 
"778", "779", "78", "780", "781", "782", "783", "784", "785", 
"786", "787", "788", "789", "79", "790", "791", "792", "793", 
"794", "795", "796", "797", "798", "799", "8", "80", "800", "801", 
"802", "803", "804", "805", "806", "807", "808", "809", "81", 
"810", "811", "812", "813", "814", "815", "816", "817", "818", 
"819", "82", "820", "821", "822", "823", "824", "825", "826", 
"827", "828", "829", "83", "830", "831", "832", "833", "834", 
"835", "836", "837", "838", "839", "84", "840", "841", "842", 
"843", "844", "845", "846", "847", "848", "849", "85", "850", 
"851", "852", "853", "854", "855", "856", "857", "858", "859", 
"86", "860", "861", "862", "863", "864", "865", "866", "867", 
"868", "869", "87", "870", "871", "872", "873", "874", "875", 
"876", "877", "878", "879", "88", "880", "881", "882", "883", 
"884", "885", "886", "887", "888", "889", "89", "890", "891", 
"892", "893", "894", "895", "896", "897", "898", "899", "9", 
"90", "900", "901", "902", "903", "904", "905", "906", "907", 
"908", "909", "91", "910", "911", "912", "913", "914", "915", 
"916", "917", "918", "919", "92", "920", "921", "922", "923", 
"924", "925", "926", "927", "928", "929", "93", "930", "931", 
"932", "933", "934", "935", "936", "937", "938", "939", "94", 
"940", "941", "942", "943", "944", "945", "946", "947", "948", 
"949", "95", "950", "951", "952", "953", "954", "955", "956", 
"957", "958", "959", "96", "960", "961", "962", "963", "964", 
"965", "966", "967", "968", "969", "97", "970", "971", "972", 
"973", "974", "975", "976", "977", "978", "979", "98", "980", 
"981", "982", "983", "984", "985", "986", "987", "988", "989", 
"99", "990", "991", "992", "993", "994", "995", "996", "997", 
"998", "999"), class = "factor"), ER_Groups = c(2, 2, 2, 3, 5, 
6), Er_Count = c(60, 75, 14, 12, 8, 26), Mn_Count = c(30, 9, 6, 33, 
7, 12), Bp_Groups = c(1, 2, 1, 1, 0, 1), Bp_Count = c(3, 3, 2, 
5, 0, 6), Mn_Groups = c(1, 1, 3, 1, 0, 0), Month = c(10L, 6L, 
12L, 4L, 2L, 4L), Year = c(2000L, 2001L, 2009L, 2004L, 2002L, 
2001L), SurveyID = structure(c(16L, 24L, 93L, 56L, 34L, 22L), .Label = c("199708HS", 
"199808HS", "199908HS", "199909SSLQ", "199910SSL", "199911SSL", 
"200001SSLQ", "200002SSL", "200003SSLQ", "200004SSLQ", "200005SSL", 
"200006SSL", "200007SSL", "200008HS", "200008SSL", "200009SSL", 
"200010SSL", "200011SSL", "200101SSL", "200102SSL", "200103SSL", 
"200104SSL", "200105SSL", "200106SSL", "200107SSL", "200108HS", 
"200108SSL", "200109SSL", "200110SSL", "200111SSL", "200112SSL", 
"200201SSL", "200202SSL", "200203SSL", "200204SSL", "200205SSL", 
"200206SSL", "200207SSL", "200208HS", "200208SSL", "200210SSL", 
"200211SSL", "200212SSL", "200301SSL", "200302SSL", "200303SSL", 
"200304SSL", "200305SSL", "200306SSL", "200307SSL", "200309SSL", 
"200310SSL", "200311SSL", "200312SSL", "200403SSL", "200404SSL", 
"200405SSL", "200406SSL", "200407SSL", "200408HS", "200408SSL", 
"200409SSL", "200505SSL", "200506SSL", "200507SSL", "200510SSL", 
"200512SSL", "200603SSL", "200609SSL", "200612SSL", "200709GAP07", 
"200710GAP07", "200712GAP07", "200802GAP07", "200803GAP07", "200804GAP07", 
"200805GAP07", "200806GAP07", "200807GAP07", "200808GAP07", "200809GAP08", 
"200810GAP08", "200812GAP08", "200901GAP08", "200903GAP08", "200904GAP08", 
"200905GAP08", "200906GAP08", "200907GAP08", "200908GAP08", "200909GAP08", 
"200910GAP09", "200912GAP09", "201001GAP09", "201002GAP09", "201003GAP09", 
"201004GAP09", "201005GAP09", "201006GAP09", "201007GAP09", "201008GAP09", 
"201009GAP09", "201010GAP09", "201011GAP09", "201101GAP09", "201102GAP09", 
"201103GAP09", "201104GAP09", "201106GAP09", "201108GAP09", "201109GAP09", 
"201111GAP09", "201201GAP09", "201203GAP09", "201205GAP09", "201207GAP09", 
"201208GAP09", "201211GAP09", "201301GAP09", "201303GAP09", "201305GAP09", 
"201307GAP09", "201309GAP09", "201311GAP09"), class = "factor"), 
    Er_Group_Density = c(4, 9, 12, 4, 1, 0), Mn_Group_Density = c(3, 
    1, 1, 1, 0, 2), Bp_Group_Density = c(1, 2, 1, 0, 1, 0), Er_Count_Density = c(50, 
    14, 12, 9, 6, 4), Mn_Count_Density = c(9, 5, 2, 3, 2, 0), Bp_Count_Density = c(2, 
    3, 0, 4, 1, 0)), .Names = c("Grid_ID", "ER_Groups", "Er_Count", 
"Mn_Count", "Bp_Groups", "Bp_Count", "Mn_Groups", "Month", "Year", 
"SurveyID", "Er_Group_Density", "Mn_Group_Density", "Bp_Group_Density", 
"Er_Count_Density", "Mn_Count_Density", "Bp_Count_Density"), row.names = c(2770L, 
4421L, 17348L, 11263L, 6736L, 3974L), class = "data.frame")

回答1:


There are a number of ways to get statistics by group. I'll assume you have a bias for plyr, since your example uses it.

Remember that dlply() splits the data into smaller dataframes by the grouping variables, then it applies the requested function to each of the smaller dataframes. Therefore the function you pass should operate on a whole dataframe. sum() does not do this. You can write your own function, though.

Based on your description, what you want is something like this

myfun <- function(x) colSums(x[, c("Er_Count", "Mn_Count", "Bp_Count")])
dlply(Effort_All, c("Month", "Year", "Grid_ID", "SurveyID"), myfun)

Remember that the second argument to dlply() is the set of variables used for grouping. Not sure why you want the output as a list. Would it be easier to read if you used ddply (with the same arguments)?

Other approaches include using sqldf() or something like lapply().

=============== EDIT: Other approaches =============

sqldf is always very easy to read and understand:

output <- sqldf('select Month,Year,Grid_ID,SurveyID,
                        sum(Er_Count) as ercount, 
                        sum(Mn_Count) as mncount,
                        sum(Bp_Count) as bpcount
                 from Effort_All 
                 group by Month, Year, Grid_ID, SurveyID')

lapply works pretty much the same way as dlply. Just different arguments.




回答2:


Also, you could use colwise from plyr

 dlply(Effort_All, .(Month, Year, Grid_ID, SurveyID), colwise(sum, .(Er_Count, Mn_Count, Bp_Count)))

Or summarise_each from dplyr

library(dplyr)
Effort_All%>%
group_by(Month, Year, Grid_ID, SurveyID) %>% 
summarise_each(funs(sum), Er_Count, Mn_Count, Bp_Count)
#Source: local data frame [6 x 7]
#Groups: Month, Year, Grid_ID

#    Month Year Grid_ID    SurveyID Er_Count Mn_Count Bp_Count
#  1     2 2002     884   200203SSL        8        7        0
#  2     4 2001     126   200104SSL       26       12        6
#  3     4 2004     399   200404SSL       12       33        5
#  4     6 2001     125   200106SSL       75        9        3
#  5    10 2000     517   200009SSL       60       30        3
#  6    12 2009     340 200912GAP09       14        6        2


来源:https://stackoverflow.com/questions/25089665/error-only-defined-on-a-data-frame-with-all-numeric-variables-with-ddply-on-lar

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!