How do you read in a dataframe with lists using pd.read_clipboard?

无人及你 2021-01-05 00:33

Here's some data from another question:

                          positive                 negative          neutral
1   [marvel, moral, bold, destiny]              


        
5 answers
  • 2021-01-05 00:48

    Another alternative is

    In [43]:  df.applymap(lambda x: x[1:-1].split(', '))
    Out[43]: 
                             positive                negative         neutral
    1  [marvel, moral, bold, destiny]                      []  [view, should]
    2                     [beautiful]     [complicated, need]              []
    3                     [celebrate]  [crippling, addiction]           [big]
    

    Note that this assumes the first and last character in each cell is [ and ]. It also assumes there is exactly one space after the commas.
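
    Both assumptions can be relaxed with a small helper. The sketch below is illustrative only: `parse_cell` is a hypothetical name (not part of pandas), and it assumes the items themselves contain no commas or quotes.

```python
import re

def parse_cell(text):
    """Turn a string such as '[a, b]' into ['a', 'b'] (hypothetical helper)."""
    # Drop surrounding whitespace and at most one bracket on each side.
    inner = text.strip()
    if inner.startswith('['):
        inner = inner[1:]
    if inner.endswith(']'):
        inner = inner[:-1]
    inner = inner.strip()
    # An empty cell becomes a truly empty list rather than [''].
    if not inner:
        return []
    # Split on commas with any amount of whitespace around them.
    return re.split(r'\s*,\s*', inner)

print(parse_cell('[marvel, moral, bold, destiny]'))  # ['marvel', 'moral', 'bold', 'destiny']
print(parse_cell('[]'))                              # []
```

    It could be applied with df.applymap(parse_cell). One difference from the slicing approach: ''.split(', ') returns [''], so empty cells there hold a one-element list that only prints as [].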

  • 2021-01-05 00:50

    TL;DR

    For basic structures you can use yaml without having to add quotes:

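    import pandas as pd
    import yaml
    # yaml.safe_load avoids the Loader argument that newer PyYAML requires for yaml.load
    df = pd.read_clipboard(sep=r'\s{2,}', engine='python').applymap(yaml.safe_load)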
    
    type(df.iloc[0, 0])
    Out: list
    

    Full Answer

    Under certain conditions, you can read your lists as strings and then convert them using ast.literal_eval (or pd.eval, if they are simple lists).

    For example,

               A   B
    0  [1, 2, 3]  11
    1  [4, 5, 6]  12
    

    First, ensure there are at least two spaces between the columns, then copy your data and run the following:

    import ast 
    
    df = pd.read_clipboard(sep=r'\s{2,}', engine='python')
    df['A'] = df['A'].map(ast.literal_eval)    
    df
        
               A   B
    0  [1, 2, 3]  11
    1  [4, 5, 6]  12
    
    df.dtypes
    
    A    object
    B     int64
    dtype: object
    

    Notes

    • for multiple columns, use applymap in the conversion step:

      df[['A', 'B', ...]] = df[['A', 'B', ...]].applymap(ast.literal_eval)
      
    • if your columns can contain NaNs, define a function that can handle them appropriately:

      parser = lambda x: x if pd.isna(x) else ast.literal_eval(x)
      df[['A', 'B', ...]] = df[['A', 'B', ...]].applymap(parser)
      
    • if your columns contain lists of strings, you will need something like yaml.safe_load (requires installing PyYAML) to parse them, unless you want to add quotes to the data manually. See the TL;DR above.
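
    The NaN-handling note above can be generalized a little: rather than testing for NaN specifically, the parser can leave every non-string value alone, which also lets it run over numeric columns. A minimal sketch using only the standard library; `parse_or_keep` is a hypothetical name introduced for illustration:

```python
import ast

def parse_or_keep(value):
    """Parse string cells as Python literals; return anything else unchanged."""
    # NaN is a float and already-parsed numbers are not strings, so both pass through.
    if not isinstance(value, str):
        return value
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a valid literal (e.g. unquoted words): keep the raw text.
        return value

cells = ['[1, 2, 3]', float('nan'), 11, '[marvel, moral]']
parsed = [parse_or_keep(c) for c in cells]
print(parsed[0])  # [1, 2, 3]
print(parsed[3])  # [marvel, moral]  (still a string)
```

    Silently keeping unparseable text is a design choice that suits exploratory work; re-raising the exception may be preferable when malformed cells should not go unnoticed.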

  • 2021-01-05 00:54

    Per help from @MaxU

    df = pd.read_clipboard(sep=r'\s{2,}', engine='python')
    

    Then:

    >>> df.apply(lambda col: col.str[1:-1].str.split(', '))
                             positive                negative         neutral
    1  [marvel, moral, bold, destiny]                      []  [view, should]
    2                     [beautiful]     [complicated, need]              []
    3                     [celebrate]  [crippling, addiction]           [big]
    
    >>> df.apply(lambda col: col.str[1:-1].str.split(', ')).loc[3, 'negative']
    ['crippling', 'addiction']
    

    And per the notes from @unutbu who came up with a similar solution:

    assumes the first and last character in each cell is [ and ]. It also assumes there is exactly one space after the commas.

  • 2021-01-05 00:55

    I did it this way:

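    df = pd.read_clipboard(sep=r'\s{2,}', engine='python')
    # regex=True is needed on pandas >= 2.0, where str.replace defaults to literal matching
    df = df.apply(lambda x: x.str.replace(r'[\[\]]', '', regex=True).str.split(r',\s*', expand=False))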
    

    P.S. I'm sure there must be a better way to do that...

  • 2021-01-05 00:55

    Another version:

    import ast
    import re
    df.applymap(lambda x:
                ast.literal_eval("[" + re.sub(r"[\[\]]", "'",
                                              re.sub(r"[,\s]+", "','", x)) + "]"))
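
    To see why the nested substitutions work, it can help to run them one at a time on a single cell. The sketch below only restates the expression above with intermediate variables (`step1`, `step2` and `literal` are names introduced here), and it assumes each item is a single word without quotes.

```python
import ast
import re

cell = '[complicated, need]'

# 1. Replace every run of commas/whitespace with a comma wrapped in quotes.
step1 = re.sub(r"[,\s]+", "','", cell)
# 2. Turn the outer brackets into the opening and closing quotes.
step2 = re.sub(r"[\[\]]", "'", step1)
# 3. Re-wrap in brackets so the text is a valid Python list literal.
literal = "[" + step2 + "]"

print(step1)    # [complicated','need]
print(step2)    # 'complicated','need'
print(literal)  # ['complicated','need']
print(ast.literal_eval(literal))  # ['complicated', 'need']
```

    Because whitespace is treated as a separator, a multi-word item such as "ice cream" would be split into two elements.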
    