Calculating subtotals in R

北荒 2021-02-15 11:28

I have a data frame with 900,000 rows and 11 columns in R. The column names and types are as follows:

Column names: date / mcode / mname / ycode / yname / yissue

7 answers
  •  既然无缘
    2021-02-15 12:01

    OK. Assuming your data are in a data frame named foo:

    > head(foo)
                 date mcode      mname ycode yname   yissue bsent breturn tsent
    417572 2010/07/28 45740 ENDPOINT A  5772  XMAG 20100800     7       0     7
    417573 2010/07/31 45740 ENDPOINT A  5772  XMAG 20100800     0       0     0
    417574 2010/08/04 45740 ENDPOINT A  5772  XMAG 20100800     0       0     0
    417575 2010/08/14 45740 ENDPOINT A  5772  XMAG 20100800     0       0     0
    417576 2010/08/26 45740 ENDPOINT A  5772  XMAG 20100800     0       4     0
    417577 2010/07/28 45741 ENDPOINT L  5772  XMAG 20100800     2       0     2
           treturn csales
    417572       0      0
    417573       0      1
    417574       0      1
    417575       0      1
    417576       0      0
    417577       0      0
    

    Then this will do the aggregation of the numeric columns in your data:

    > aggregate(cbind(bsent, breturn, tsent, treturn, csales) ~ yname, data = foo, 
    +           FUN = sum)
      yname bsent breturn tsent treturn csales
    1  XMAG    14       8    14       0      6
    2  YMAG    11       6    11       6      5
    

    That output comes from the snippet of data you included in your question. I used the formula interface to aggregate(), which is a bit nicer in this instance because you don't need all the foo$ bits on the names of the variables you wish to aggregate. If you have missing data (NA) in your full data set, then you'll need to add an extra argument, na.rm = TRUE, which will get passed on to sum(), like so:

    > aggregate(cbind(bsent, breturn, tsent, treturn, csales) ~ yname, data = foo, 
    +           FUN = sum, na.rm = TRUE)
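
    One thing worth checking on your own data: the formula method of aggregate() also has an na.action argument, which defaults to na.omit. With that default, any row with an NA in one of the listed columns is dropped entirely before sum() sees it, so its non-missing values in the other columns are lost too. Here is a minimal sketch of the difference. The data frame toy and its values are made up for illustration and are not taken from your data.

    ```r
    # Hypothetical toy data (not the poster's data): one NA in bsent for XMAG
    toy <- data.frame(
      yname   = c("XMAG", "XMAG", "YMAG", "YMAG"),
      bsent   = c(7, NA, 2, 3),
      breturn = c(1, 4, 0, 5)
    )

    # Default na.action = na.omit: the whole second row is removed first,
    # so its breturn value of 4 never reaches sum()
    dropped <- aggregate(cbind(bsent, breturn) ~ yname, data = toy,
                         FUN = sum, na.rm = TRUE)

    # na.action = na.pass keeps incomplete rows; na.rm = TRUE then makes
    # sum() skip the NA within each column separately
    kept <- aggregate(cbind(bsent, breturn) ~ yname, data = toy,
                      FUN = sum, na.rm = TRUE, na.action = na.pass)

    # The non-formula (data frame) interface has no na.action step,
    # so na.rm = TRUE alone gives the per-column behaviour
    by_list <- aggregate(toy[c("bsent", "breturn")],
                         by = list(yname = toy$yname),
                         FUN = sum, na.rm = TRUE)

    print(dropped)
    print(kept)
    print(by_list)
    ```

    In this sketch the XMAG total for breturn is 1 in dropped but 5 in kept and by_list. Which behaviour you want depends on whether a row with one missing count should still contribute its other counts.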
    
