Pandas groupby to to_csv

隐瞒了意图╮ 2020-12-10 03:42

I want to write the result of a Pandas groupby to a CSV file. I tried various StackOverflow solutions, but none of them worked.

Python 3.6.1, Pandas 0.20.1

groupby result

  • 2020-12-10 03:55

    Try changing your second line to week_grouped = week_grouped.sum() and re-running all three lines.

    If you run week_grouped.sum() in its own Jupyter notebook cell, you'll see that the statement returns a new DataFrame, shown as the cell's output, instead of assigning the result back to week_grouped. Some pandas methods have an inplace=True argument (e.g., df.sort_values(by=col_name, inplace=True)), but sum does not.
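A minimal sketch of the point above, assuming a small made-up frame with week and count columns (the asker's real data is not shown). It only illustrates that sum() hands back a new DataFrame and leaves the groupby object as it was.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; the column names are assumed.
df = pd.DataFrame({"week": [1, 1, 2], "count": [10, 5, 7]})

week_grouped = df.groupby("week")

# sum() returns a new DataFrame; week_grouped itself is still a GroupBy object.
summed = week_grouped.sum()

print(type(week_grouped).__name__)
print(type(summed).__name__)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = summed.to_csv()
print(csv_text)
```

Passing a file name to to_csv instead (as in the answers here) writes the same text to disk.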

    EDIT: does each week number only appear once in your CSV? If so, here's a simpler solution that doesn't use groupby:

    df = pd.read_csv('input.csv')
    df[['id', 'count']].to_csv('output.csv')
  • 2020-12-10 04:05

    Try doing this:

    week_grouped = df.groupby('week')
    week_grouped.sum().reset_index().to_csv('week_grouped.csv')

    That'll write the entire dataframe to the file. If you only want those two columns then,

    week_grouped = df.groupby('week')
    week_grouped.sum().reset_index()[['week', 'count']].to_csv('week_grouped.csv')
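One detail worth seeing in isolation: after reset_index() the frame has a fresh 0, 1, 2, ... index, and to_csv writes that index as an extra unnamed column unless index=False is passed. The sketch below uses made-up data with assumed week, month and count columns.

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration.
df = pd.DataFrame({"week": [1, 1, 2], "month": [1, 1, 1], "count": [10, 5, 7]})

totals = df.groupby("week").sum().reset_index()[["week", "count"]]

# Default: the row index is written as an extra, unnamed first column.
with_index = totals.to_csv()

# index=False keeps only the 'week' and 'count' columns.
without_index = totals.to_csv(index=False)

print(with_index)
print(without_index)
```

Whether the extra index column matters depends on what will read the file afterwards.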

    Here's a line by line explanation of the original code:

    # This creates a "groupby" object (not a dataframe object)
    # and you store it in the week_grouped variable.
    week_grouped = df.groupby('week')
    # This instructs pandas to sum up all the numeric type columns in each
    # group. This returns a dataframe where each row is the sum of the
    # group's numeric columns. You're not storing this dataframe in your
    # example.
    week_grouped.sum()
    # Here you're calling the to_csv method on a groupby object... but
    # that object type doesn't have that method. Dataframes have that method.
    # So we should store the previous line's result (a dataframe) into a variable
    # and then call its to_csv method.
    # week_grouped.to_csv('week_grouped.csv')  # raises AttributeError
    # Like this:
    summed_weeks = week_grouped.sum()
    summed_weeks.to_csv('week_grouped.csv')
    # Or with less typing simply
    week_grouped.sum().to_csv('week_grouped.csv')
  • 2020-12-10 04:08

    Iterating over a groupby yields (key, group) pairs, where the key identifies the group and the value is the group itself, i.e. the subset of the original df that matched the key.

    In your example, week_grouped = df.groupby('week') is a set of groups (a pandas.core.groupby.DataFrameGroupBy object), which you can explore in detail as follows:

    for k, gr in week_grouped:
        # do your stuff instead of print
        print(type(gr)) # This will output <class 'pandas.core.frame.DataFrame'>
        # You can save each 'gr' in a csv as follows
        gr.to_csv('{}.csv'.format(k))
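As a rough, self-contained illustration of saving each group separately, the sketch below writes one file per week into a temporary directory. The data, the week_N.csv naming scheme and the use of a temp directory are all assumptions made for the example, not part of the original question.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data; the column names are assumed.
df = pd.DataFrame({"week": [1, 1, 2], "count": [10, 5, 7]})

# Temporary directory so the example does not clutter the working directory.
out_dir = tempfile.mkdtemp()

written = []
for week, group in df.groupby("week"):
    # 'group' is an ordinary DataFrame holding only the rows for this week.
    file_name = "week_{}.csv".format(week)
    group.to_csv(os.path.join(out_dir, file_name), index=False)
    written.append(file_name)

print(written)
```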

    Alternatively, you can apply an aggregation function to your grouped object:

    result = week_grouped.sum()
    # result already has one row per key, holding that group's aggregated values

    In your example you need to assign the result to a variable, because sum() returns a new DataFrame rather than modifying week_grouped in place.

    some_variable = week_grouped.sum() 
    some_variable.to_csv('week_grouped.csv') # This will work

    Basically, result and some_variable hold the same DataFrame, so a CSV written from either one would be identical.

  • 2020-12-10 04:08

    I feel that there is no need to use a groupby; you can just drop the columns you do not want.

    df = df.drop(['month','year'], axis=1)
    df.to_csv('Your path')
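Dropping columns only matches the groupby result when each week appears exactly once, since drop does no aggregation. The following sketch, on made-up frames with assumed week, month, year and count columns, compares the two approaches in both situations.

```python
import pandas as pd

def by_drop(frame):
    # Remove the unwanted columns; rows are left untouched.
    return frame.drop(["month", "year"], axis=1)

def by_groupby(frame):
    # Aggregate 'count' per week, then turn the group key back into a column.
    return frame.groupby("week")[["count"]].sum().reset_index()

# Hypothetical data: each week appears once.
unique_weeks = pd.DataFrame(
    {"week": [1, 2], "month": [1, 1], "year": [2017, 2017], "count": [15, 7]}
)
# Hypothetical data: week 1 appears twice.
repeated_weeks = pd.DataFrame(
    {"week": [1, 1, 2], "month": [1, 1, 1], "year": [2017] * 3, "count": [10, 5, 7]}
)

same_when_unique = (
    by_drop(unique_weeks).values.tolist() == by_groupby(unique_weeks).values.tolist()
)
rows_dropped = len(by_drop(repeated_weeks))
rows_grouped = len(by_groupby(repeated_weeks))
print(same_when_unique, rows_dropped, rows_grouped)
```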
  • 2020-12-10 04:11

    Calling describe() on a Pandas groupby generates a lot of summary statistics per group (count, mean, std, ...). If you want to save all of them in a file, get them as a regular DataFrame first and then write it out (here as a tab-separated file):

    import pandas as pd
    MyGroupDataFrame = MyDataFrame.groupby('id')
    pd.DataFrame(MyGroupDataFrame.describe()).to_csv("myTSVFile.tsv", sep='\t', encoding='utf-8')
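For a feel of what ends up in that file, here is a small sketch on made-up data (the id and value column names are assumptions). In recent pandas versions, describe() on a groupby already returns a DataFrame, with one row per group and a two-level column header of (column, statistic).

```python
import pandas as pd

# Hypothetical data; the column names are assumed.
df = pd.DataFrame({"id": [1, 1, 2, 2], "value": [1.0, 3.0, 2.0, 6.0]})

stats = df.groupby("id").describe()

# One row per id; columns are (column name, statistic) pairs.
print(stats.columns.nlevels)
print(stats.loc[1, ("value", "mean")])

# sep="\t" produces tab-separated text; with no path, the text is returned.
tsv_text = stats.to_csv(sep="\t")
print(tsv_text)
```

Because of the two-level header, the file starts with more than one header row, which is worth keeping in mind when reading it back.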