Combine two columns of text in pandas dataframe

-上瘾入骨i 2020-11-22 01:32

I have a 20 x 4000 dataframe in Python using pandas. Two of these columns are named Year and quarter. I'd like to create a variable called p

18 answers
  •  挽巷 (original poster)
    2020-11-22 01:59

    The method cat() of the .str accessor works really well for this:

    >>> import pandas as pd
    >>> df = pd.DataFrame([["2014", "q1"], 
    ...                    ["2015", "q3"]],
    ...                   columns=('Year', 'Quarter'))
    >>> print(df)
       Year Quarter
    0  2014      q1
    1  2015      q3
    >>> df['Period'] = df.Year.str.cat(df.Quarter)
    >>> print(df)
       Year Quarter  Period
    0  2014      q1  2014q1
    1  2015      q3  2015q3
    

    cat() also accepts a separator through its sep parameter. For example, if Year and Quarter hold integers, convert both columns to strings first and join them with sep:

    >>> import pandas as pd
    >>> df = pd.DataFrame([[2014, 1],
    ...                    [2015, 3]],
    ...                   columns=('Year', 'Quarter'))
    >>> print(df)
       Year Quarter
    0  2014       1
    1  2015       3
    >>> df['Period'] = df.Year.astype(str).str.cat(df.Quarter.astype(str), sep='q')
    >>> print(df)
       Year Quarter  Period
    0  2014       1  2014q1
    1  2015       3  2015q3
    

    To join more than two columns, call str.cat() on the first column (a Series) and pass the remaining columns, either as a list of Series or as a DataFrame:

    >>> df = pd.DataFrame(
    ...     [['USA', 'Nevada', 'Las Vegas'],
    ...      ['Brazil', 'Pernambuco', 'Recife']],
    ...     columns=['Country', 'State', 'City'],
    ... )
    >>> df['AllTogether'] = df['Country'].str.cat(df[['State', 'City']], sep=' - ')
    >>> print(df)
      Country       State       City                   AllTogether
    0     USA      Nevada  Las Vegas      USA - Nevada - Las Vegas
    1  Brazil  Pernambuco     Recife  Brazil - Pernambuco - Recife
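
    The example above passes a DataFrame. As a minimal sketch of the other option mentioned, the same columns can be passed as a list of Series; the variable name all_together is only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [["USA", "Nevada", "Las Vegas"],
     ["Brazil", "Pernambuco", "Recife"]],
    columns=["Country", "State", "City"],
)

# Pass the remaining columns as a list of Series instead of a DataFrame.
all_together = df["Country"].str.cat([df["State"], df["City"]], sep=" - ")
print(all_together.tolist())
```

    Both forms should give the same strings here; the list form is handy when the columns to join come from different places.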
    

    Note that if any of the columns contains null values, you need to pass the na_rep parameter to replace NaN with a string; otherwise every row that has a missing value ends up as NaN in the combined column.
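
    A small sketch of the difference, using made-up data with one missing Quarter and "??" as an arbitrary placeholder string (both are assumptions for illustration, not part of the original question):

```python
import numpy as np
import pandas as pd

# Illustrative data: the second row has no Quarter.
df = pd.DataFrame({"Year": ["2014", "2015"], "Quarter": ["q1", np.nan]})

# Without na_rep, a row with a missing value produces NaN in the result.
default_period = df["Year"].str.cat(df["Quarter"])

# With na_rep, the missing value is replaced by the given string before joining.
filled_period = df["Year"].str.cat(df["Quarter"], na_rep="??")

print(default_period.tolist())
print(filled_period.tolist())
```

    Which placeholder makes sense depends on the data; an empty string (na_rep='') is another common choice when the missing part should simply be dropped from the result.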
