Pandas groupby to to_csv

前端 未结 5 536
隐瞒了意图╮
隐瞒了意图╮ 2020-12-10 03:42

Want to output a Pandas groupby dataframe to CSV. Tried various StackOverflow solutions but they have not worked.

Python 3.6.1, Pandas 0.20.1

groupby result

相关标签:
5条回答
  • 2020-12-10 03:55

    Try changing your second line to week_grouped = week_grouped.sum() and re-running all three lines.

    If you run week_grouped.sum() in its own Jupyter notebook cell, you'll see how the statement returns the output to the cell's output, instead of assigning the result back to week_grouped. Some pandas methods have an inplace=True argument (e.g., df.sort_values(by=col_name, inplace=True)), but sum does not.

    EDIT: does each week number only appear once in your CSV? If so, here's a simpler solution that doesn't use groupby:

    df = pd.read_csv('input.csv')
    df[['id', 'count']].to_csv('output.csv')
    
    0 讨论(0)
  • 2020-12-10 04:05

    Try doing this:

    week_grouped = df.groupby('week')
    week_grouped.sum().reset_index().to_csv('week_grouped.csv')
    

    That'll write the entire dataframe to the file. If you only want those two columns then,

    week_grouped = df.groupby('week')
    week_grouped.sum().reset_index()[['week', 'count']].to_csv('week_grouped.csv')
    

    Here's a line by line explanation of the original code:

    # This creates a "groupby" object (not a dataframe object) 
    # and you store it in the week_grouped variable.
    week_grouped = df.groupby('week')
    
    # This instructs pandas to sum up all the numeric type columns in each 
    # group. This returns a dataframe where each row is the sum of the 
    # group's numeric columns. You're not storing this dataframe in your 
    # example.
    week_grouped.sum() 
    
    # Here you're calling the to_csv method on a groupby object... but
    # that object type doesn't have that method. Dataframes have that method. 
    # So we should store the previous line's result (a dataframe) into a variable 
    # and then call its to_csv method.
    week_grouped.to_csv('week_grouped.csv')
    
    # Like this:
    summed_weeks = week_grouped.sum()
    summed_weeks.to_csv('...')
    
    # Or with less typing simply
    week_grouped.sum().to_csv('...')
    
    0 讨论(0)
  • 2020-12-10 04:08

    Group By returns key, value pairs where key is the identifier of the group and the value is the group itself, i.e. a subset of an original df that matched the key.

    In your example week_grouped = df.groupby('week') is set of groups (pandas.core.groupby.DataFrameGroupBy object) which you can explore in detail as follows:

    for k, gr in week_grouped:
        # do your stuff instead of print
        print(k)
        print(type(gr)) # This will output <class 'pandas.core.frame.DataFrame'>
        print(gr)
        # You can save each 'gr' in a csv as follows
        gr.to_csv('{}.csv'.format(k))
    

    Or alternatively you can compute aggregation function on your grouped object

    result = week_grouped.sum()
    # This will be already one row per key and its aggregation result
    result.to_csv('result.csv') 
    

    In your example you need to assign the function result to some variable as by default pandas objects are immutable.

    some_variable = week_grouped.sum() 
    some_variable.to_csv('week_grouped.csv') # This will work
    

    basically result.csv and week_grouped.csv are meant to be same

    0 讨论(0)
  • 2020-12-10 04:08

    I feel that there is no need to use a groupby, you can just drop the columns you do not want too.

    df = df.drop(['month','year'], axis=1)
    df.reset_index()
    df.to_csv('Your path')
    
    0 讨论(0)
  • 2020-12-10 04:11

    Pandas groupby generates a lot of information (count, mean, std, ...). If you want to save all of them in a csv file, first you need to convert it to a regular Dataframe:

    import pandas as pd
    ...
    ...
    MyGroupDataFrame = MyDataFrame.groupby('id')
    pd.DataFrame(MyGroupDataFrame.describe()).to_csv("myTSVFile.tsv", sep='\t', encoding='utf-8')
    
    0 讨论(0)
提交回复
热议问题