Strip / trim all strings of a dataframe

夕颜 2020-11-27 13:04

While cleaning the values of a mixed-type DataFrame in Python/pandas, I want to trim the strings. I am currently doing it in two instructions:

import pandas as pd         


        
6 Answers
  • 2020-11-27 13:07

    If you really want to use regex, then

    >>> df.replace(r'(^\s+|\s+$)', '', regex=True, inplace=True)
    >>> df
       0   1
    0  a  10
    1  c   5
    

    But it should be faster to do it like this:

    >>> df[0] = df[0].str.strip()
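
The speed claim above can be checked on your own data. The following is a minimal sketch, not a benchmark: the test Series `s` and its size are assumptions made for illustration, and the timings will vary with the pandas version and the machine.

```python
import timeit

import pandas as pd

# Hypothetical test data: one string column with padded values
s = pd.Series(['  a  ', '  c  '] * 10_000)

# Both approaches should produce the same stripped values
by_regex = s.replace(r'(^\s+|\s+$)', '', regex=True)
by_strip = s.str.strip()
assert by_regex.equals(by_strip)

# Time each approach over a few repetitions
t_regex = timeit.timeit(lambda: s.replace(r'(^\s+|\s+$)', '', regex=True), number=5)
t_strip = timeit.timeit(lambda: s.str.strip(), number=5)
print(f"regex replace: {t_regex:.3f}s, str.strip: {t_strip:.3f}s")
```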
    
  • 2020-11-27 13:14
    def trim(x):
        # Strip leading/trailing whitespace in object (string) columns only
        if x.dtype == object:
            x = x.str.strip()
        return x
    
    df = df.apply(trim)
    
  • 2020-11-27 13:15

    Money Shot

    Here's a compact version that uses applymap with a simple lambda expression to call strip only when the value is a string:

    df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
    

    Full Example

    A more complete example:

    import pandas as pd
    
    
    def trim_all_columns(df):
        """
        Trim whitespace from ends of each value across all series in dataframe
        """
        trim_strings = lambda x: x.strip() if isinstance(x, str) else x
        return df.applymap(trim_strings)
    
    
    # simple example of trimming whitespace from data elements
    df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
    df = trim_all_columns(df)
    print(df)
    
    
    >>>
       0   1
    0  a  10
    1  c   5
    

    Working Example

    Here's a working example hosted by trinket: https://trinket.io/python3/e6ab7fb4ab
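
One caveat for newer installations: pandas 2.1 deprecated `DataFrame.applymap` in favor of `DataFrame.map`, so the calls above may emit a warning there. The following is a minimal sketch of a version-tolerant variant; the helper name `strip_strings` is hypothetical and not part of pandas.

```python
import pandas as pd


def strip_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame with whitespace stripped from every string cell."""
    def strip(value):
        # Leave non-string values (numbers, None, ...) untouched
        return value.strip() if isinstance(value, str) else value

    # Check the class, not the instance, so a column named "map" cannot interfere
    if hasattr(pd.DataFrame, "map"):
        return df.map(strip)
    return df.applymap(strip)


df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
cleaned = strip_strings(df)
print(cleaned)
```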

  • 2020-11-27 13:17

    You can try:

    df[0] = df[0].str.strip()
    

    or more specifically for all string columns

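    # Columns that are not numeric (assumed here to hold only strings)
    non_numeric_columns = list(set(df.columns) - set(df._get_numeric_data().columns))
    # Use the vectorized .str accessor so each value is stripped individually
    df[non_numeric_columns] = df[non_numeric_columns].apply(lambda x: x.str.strip())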
    
  • 2020-11-27 13:20

    You can use the apply function of the Series object:

    >>> df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
    >>> df[0][0]
    '  a  '
    >>> df[0] = df[0].apply(lambda x: x.strip())
    >>> df[0][0]
    'a'
    

    Note that this uses strip rather than a regex, which makes it much faster.

    Another option - use the apply function of the DataFrame object:

    >>> df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
    >>> df.apply(lambda x: x.apply(lambda y: y.strip() if isinstance(y, str) else y), axis=0)
    
       0   1
    0  a  10
    1  c   5
    
  • 2020-11-27 13:27

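    You can use DataFrame.select_dtypes to select the string columns and then apply str.strip to each of them.

    Note: this only works if the object columns contain nothing but strings. Values such as dicts or lists are also stored with dtype object, so their columns would be selected too.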

    df_obj = df.select_dtypes(['object'])
    print (df_obj)
    0    a  
    1    c  
    
    df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
    print (df)
    
       0   1
    0  a  10
    1  c   5
    

    But if there are only a few columns, use str.strip directly:

    df[0] = df[0].str.strip()
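
If an object column may hold non-string values such as lists or dicts, an element-wise type check keeps them intact. The following is a minimal sketch under that assumption; the helper name `strip_object_columns` and the sample column names are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def strip_object_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Strip whitespace from string values; keep every other value as it is."""
    out = df.copy()
    for col in out.columns:
        # Only visit columns that can hold strings
        if is_object_dtype(out[col]) or is_string_dtype(out[col]):
            out[col] = out[col].map(
                lambda v: v.strip() if isinstance(v, str) else v
            )
    return out


# Hypothetical sample: the "mixed" column holds a string and a list
df = pd.DataFrame({
    "name": ["  a  ", "  c  "],
    "count": [10, 5],
    "mixed": ["  x ", ["keep", "me"]],
})
result = strip_object_columns(df)
print(result)
```

This is slower than the vectorized `str.strip`, so it is mainly useful when the column contents are not guaranteed to be strings.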
    