How do you handle column names having spaces in them when using pd.read_clipboard?

-上瘾入骨i 2021-01-21 00:37

This is a real problem I've faced for a long time.

Take this dataframe:

         A         B  THRESHOLD
       NaN       NaN        NaN
 -0.041158 -0.16         


        
2 Answers
  • 2021-01-21 01:19

    In this situation I make sure all my columns are two or more spaces apart, then use sep='\s\s+' as the delimiter. That way a column heading containing a single space, such as Col #3 below, is treated as one column.

             A         B     Col #3
           NaN       NaN        NaN
     -0.041158  -0.161571   0.329038
      0.238156   0.525878   0.110370
      0.606738   0.854177  -0.095147
      0.200166   0.385453   0.166235
    
    df = pd.read_clipboard(sep=r'\s\s+')
    

    You do get this warning, but you can ignore it since the data has been parsed correctly. Or you could pass engine='python' if the warning bothers you. :)

    C:\Program Files\Anaconda3\lib\site-packages\pandas\io\clipboards.py:63: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
      return read_table(StringIO(text), sep=sep, **kwargs)

    print(df)
    
              A         B    Col #3
    0       NaN       NaN       NaN
    1 -0.041158 -0.161571  0.329038
    2  0.238156  0.525878  0.110370
    3  0.606738  0.854177 -0.095147
    4  0.200166  0.385453  0.166235
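
    The same parse can be tried without a clipboard. This is only a sketch: it assumes the copied text reaches the parser as a plain string, as the read_table(StringIO(text), ...) call in the warning above suggests, and it substitutes pd.read_csv on an io.StringIO for pd.read_clipboard. The sample text and the parser_warnings name are illustrative, not from the question.

```python
import io
import warnings

import pandas as pd

# Stand-in for the clipboard contents: columns are two or more spaces apart,
# while the heading "Col #3" contains only a single space.
text = """\
        A          B     Col #3
      NaN        NaN        NaN
-0.041158  -0.161571   0.329038
 0.238156   0.525878   0.110370
"""

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # engine="python" is given explicitly, so pandas has no reason to fall
    # back from the C engine and emit a ParserWarning.
    df = pd.read_csv(io.StringIO(text), sep=r"\s\s+", engine="python")

# Keep only parser warnings; other unrelated warnings are ignored here.
parser_warnings = [w for w in caught if issubclass(w.category, pd.errors.ParserWarning)]

print(df.columns.tolist())  # ['A', 'B', 'Col #3']
print(len(parser_warnings))  # 0
```

    If the real data comes from the clipboard, the same sep and engine keywords should be usable with pd.read_clipboard, since it forwards them to the parser.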
    
  • 2021-01-21 01:27

    To illustrate the point I was making in the comments, I used re, io and pd.read_table. I copied the exact text from your post and applied a first re.sub to remove leading whitespace. Then I replaced every run of spaces preceded by a digit with two spaces; this works for the case at hand only because the column names consist mostly of letters. Once that was done, I wrapped the resulting string in an io.StringIO object and passed it to pd.read_table. This is essentially the same as pasting the text into Sublime Text, applying two search-and-replace operations, and then copying the result and feeding it to pd.read_clipboard.

    The following snippet of code illustrates the point:

    import pandas as pd
    import re
    import io
    
    
    text = """         A         B     Col #3
            NaN       NaN        NaN
      -0.041158 -0.161571   0.329038
       0.238156  0.525878   0.110370
       0.606738  0.854177  -0.095147
       0.200166  0.385453   0.166235"""
    
    
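    # Strip leading spaces on every line, then widen any run of spaces after a digit to two spaces.
    cleaned = re.sub(r"(?<=[0-9]) +", "  ", re.sub(r"^ +", "", text, flags=re.MULTILINE))
    with io.StringIO(cleaned) as fs:
        df = pd.read_table(fs, header=0, sep=r"\s{2,}", engine="python")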
    
    
    #           A         B    Col #3
    # 0       NaN       NaN       NaN
    # 1 -0.041158 -0.161571  0.329038
    # 2  0.238156  0.525878  0.110370
    # 3  0.606738  0.854177 -0.095147
    # 4  0.200166  0.385453  0.166235
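
    If this comes up often, the two substitutions can be wrapped in a small function. The sketch below is one possible packaging, not an existing pandas feature: parse_spaced_table is a hypothetical name, and it assumes that no column heading contains a digit followed by a single space and that non-numeric cells such as NaN are already two or more spaces away from the next column.

```python
import io
import re

import pandas as pd


def parse_spaced_table(text: str) -> pd.DataFrame:
    """Hypothetical helper: parse space-aligned text whose headings may contain single spaces."""
    # Strip leading spaces from every line (re.MULTILINE makes ^ match at each line start).
    cleaned = re.sub(r"^ +", "", text, flags=re.MULTILINE)
    # Replace any run of spaces that follows a digit with exactly two spaces,
    # so numeric cells separated by a single space still split apart.
    cleaned = re.sub(r"(?<=[0-9]) +", "  ", cleaned)
    # Two or more whitespace characters now mark a column boundary.
    return pd.read_csv(io.StringIO(cleaned), sep=r"\s{2,}", engine="python")


sample = """         A         B     Col #3
       NaN       NaN        NaN
 -0.041158 -0.161571   0.329038
  0.238156  0.525878   0.110370"""

result = parse_spaced_table(sample)
print(result.columns.tolist())  # ['A', 'B', 'Col #3']
```

    The digit lookbehind is what makes this specific to numeric tables; text columns with single-space gaps would need a different rule.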
    

    Thanks for asking the question.
