Simple cross-tabulation in pandas

忘掉有多难 2021-02-04 00:01

I stumbled across pandas and it looks ideal for simple calculations that I'd like to do. I have a SAS background and was thinking it'd replace proc freq -- it looks like it'l

2 answers
  •  滥情空心
    2021-02-04 00:44

    Assuming that you have a file called 2010.csv with the following contents:

    category,value
    AB,100.00
    AB,200.00
    AC,150.00
    AD,500.00
    

    Then, using the ability to apply multiple aggregation functions following a groupby, you can say:

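    import pandas

    # Load the CSV into a DataFrame
    data_2010 = pandas.read_csv("/path/to/2010.csv")
    # Group rows by category, then compute the row count (len) and total (sum) per group
    data_2010.groupby("category").agg([len, sum])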
    

    You should get a result that looks something like

              value     
                len  sum
    category            
    AB            2  300
    AC            1  150
    AD            1  500
    

    Note that Wes (McKinney, the author of pandas) will likely come by to point out that pandas has an optimized sum and that you should probably pass np.sum rather than the built-in sum.
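As a rough sketch of the point about using an optimized sum: one way to request pandas' own aggregations is to pass their names as strings to agg. The example below inlines the sample rows with io.StringIO so it runs without a file on disk. The inlined text and the variable name summary are assumptions made for illustration and are not part of the original answer.

```python
import io

import pandas as pd

# Same sample rows as the 2010.csv shown above, inlined so the sketch is self-contained.
csv_text = """category,value
AB,100.00
AB,200.00
AC,150.00
AD,500.00
"""

data_2010 = pd.read_csv(io.StringIO(csv_text))

# Select the "value" column and aggregate by name.
# "count" counts non-missing values per group; it matches len here
# only because the sample data has no missing values.
summary = data_2010.groupby("category")["value"].agg(["count", "sum"])
print(summary)
```

Selecting the column before aggregating gives flat column names (count, sum) instead of the two-level header shown in the output above.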
