Split (explode) pandas dataframe string entry to separate rows

后端 未结 22 3551
一向
一向 2020-11-21 05:03

I have a pandas dataframe in which one column of text strings contains comma-separated values. I want to split each CSV field and create a new row per entry (as

22条回答
  •  北荒
    北荒 (楼主)
    2020-11-21 05:23

    Here's a function I wrote for this common task. It's more efficient than the Series/stack methods. Column order and names are retained.

    def tidy_split(df, column, sep='|', keep=False):
        """
        Split the values of a column and expand so the new DataFrame has one split
        value per row. Filters rows where the column is missing.
    
        Params
        ------
        df : pandas.DataFrame
            dataframe with the column to split and expand
        column : str
            the column to split and expand
        sep : str
            the string used to split the column's values
        keep : bool
            whether to retain the presplit value as it's own row
    
        Returns
        -------
        pandas.DataFrame
            Returns a dataframe with the same columns as `df`.
        """
        indexes = list()
        new_values = list()
        df = df.dropna(subset=[column])
        for i, presplit in enumerate(df[column].astype(str)):
            values = presplit.split(sep)
            if keep and len(values) > 1:
                indexes.append(i)
                new_values.append(presplit)
            for value in values:
                indexes.append(i)
                new_values.append(value)
        new_df = df.iloc[indexes, :].copy()
        new_df[column] = new_values
        return new_df
    

    With this function, the original question is as simple as:

    tidy_split(a, 'var1', sep=',')
    

提交回复
热议问题