How do you handle column names having spaces in them when using pd.read_clipboard?

-上瘾入骨i 2021-01-21 00:37

This is a real problem I've faced for a long time.

Take this dataframe:

         A         B  THRESHOLD
       NaN       NaN        NaN
 -0.041158 -0.16         


        
2 Answers
  •  不知归路
    2021-01-21 01:27

    To drive home the point I was making in the comments, I used re, io and pd.read_table. I copied the exact text you have in the post and applied a first round of re.sub to remove any leading whitespace. Then I replaced any run of spaces that is preceded by a digit with exactly 2 spaces. This trick is specific to the case at hand: it works because the column names are mostly letters, so a space inside a column name is not preceded by a digit. Once all that was done, I converted the resulting string into an io.StringIO object and fed it to pd.read_table, using two or more whitespace characters as the separator. This is essentially the same thing as copying the text, pasting it into Sublime Text, applying two search-and-replace operations, and then copying the resulting string and feeding it to pd.read_clipboard.

    The following snippet of code illustrates the point:

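    import io
    import re

    import pandas as pd


    text = """         A         B     Col #3
            NaN       NaN        NaN
      -0.041158 -0.161571   0.329038
       0.238156  0.525878   0.110370
       0.606738  0.854177  -0.095147
       0.200166  0.385453   0.166235"""

    # 1) Strip leading spaces from every line (re.MULTILINE makes ^ match per line).
    # 2) Widen any run of spaces that follows a digit to exactly two spaces.
    cleaned = re.sub(r"(?<=[0-9]) +", "  ", re.sub(r"^ +", "", text, flags=re.MULTILINE))

    # Treat two or more whitespace characters as the column separator.
    with io.StringIO(cleaned) as fs:
        df = pd.read_table(fs, header=0, sep=r"\s{2,}", engine="python")


    #           A         B    Col #3
    # 0       NaN       NaN       NaN
    # 1 -0.041158 -0.161571  0.329038
    # 2  0.238156  0.525878  0.110370
    # 3  0.606738  0.854177 -0.095147
    # 4  0.200166  0.385453  0.166235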
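
If you need this more than once, the same two substitutions can be wrapped in a small function. The following is only a sketch: `read_spaced_table` is a hypothetical helper name (not part of pandas), and it assumes that single spaces inside column names are never preceded by a digit, as in the data above.

```python
import io
import re

import pandas as pd


def read_spaced_table(text: str) -> pd.DataFrame:
    """Parse space-aligned text whose column names may contain single spaces.

    Hypothetical helper. Assumes a space inside a column name is never
    preceded by a digit; otherwise that name would be split in two.
    """
    # Strip leading spaces on every line (re.MULTILINE makes ^ match per line).
    cleaned = re.sub(r"^ +", "", text, flags=re.MULTILINE)
    # Widen any run of spaces that follows a digit to exactly two spaces,
    # so that values separated by a single space are still split apart.
    cleaned = re.sub(r"(?<=[0-9]) +", "  ", cleaned)
    # Two or more whitespace characters act as the column separator.
    return pd.read_csv(io.StringIO(cleaned), sep=r"\s{2,}", engine="python")


sample = """\
    A         B     Col #3
  NaN       NaN        NaN
-0.041158 -0.161571   0.329038
 0.238156  0.525878   0.110370"""

df = read_spaced_table(sample)
print(df)
```

A regular-expression separator needs the Python parsing engine, which is why `engine="python"` is passed explicitly. The `text` argument can be any string you copied; how the clipboard contents are obtained is left out of this sketch.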
    

    Thanks for asking the question.
