Read tab-delimited fields with pandas when some lines contain more than one tab between values

粉色の甜心 2021-01-22 03:20

I am trying to read a tab-separated text file using pandas. The file looks like this:

data file sample

14.38   14.21   0.8951  5.386   3.312   2.462   4.9         


        
2 Answers
  • 2021-01-22 03:23

    If I use this code:

    import pandas as pd
    parsed_csv_txt = pd.read_csv("tabbed.txt",sep="\t")
    print(parsed_csv_txt)
    

    On this file:

    a   b   c   d   e
    14.69   2452    982 234 12
    14.11   5435    234     12
    16.63   1       12  66
    

    I get:

           a     b      c      d   e
    0  14.69  2452  982.0  234.0  12
    1  14.11  5435  234.0    NaN  12
    2  16.63     1    NaN   12.0  66
    

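    Is this output a problem for you? With sep="\t" every single tab is a field boundary, so two consecutive tabs enclose an empty field, which pandas reports as NaN.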

    If you would like a different output such as:

           a     b    c    d     e
    0  14.69  2452  982  234  12.0
    1  14.11  5435  234   12   NaN
    2  16.63     1   12   66   NaN
    

    Use this code:

    import pandas as pd
    parsed_csv_txt = pd.read_csv("tabbed.txt",delim_whitespace=True)
    print(parsed_csv_txt)
    

    Note

    For a longer discussion of variable amounts of whitespace between values, see: Can pandas handle variable-length whitespace as column delimiters
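
To make the difference between the two calls above easy to reproduce, here is a minimal sketch that parses the same rows from an in-memory string instead of a file. The `raw` string is a hypothetical stand-in for `tabbed.txt`, and `sep=r"\s+"` is used in place of `delim_whitespace=True` on the assumption that a recent pandas is installed (pandas 2.2 and later mark `delim_whitespace` as deprecated in favor of `sep=r"\s+"`).

```python
import io

import pandas as pd

# Hypothetical stand-in for tabbed.txt: rows 2 and 3 each contain a doubled tab.
raw = (
    "a\tb\tc\td\te\n"
    "14.69\t2452\t982\t234\t12\n"
    "14.11\t5435\t234\t\t12\n"
    "16.63\t1\t\t12\t66\n"
)

# Every tab is a boundary, so a doubled tab encloses an empty field (NaN).
strict = pd.read_csv(io.StringIO(raw), sep="\t")

# A run of whitespace counts as one boundary, so the values shift left
# and the short rows are padded with NaN at the end.
collapsed = pd.read_csv(io.StringIO(raw), sep=r"\s+")

print(strict)
print(collapsed)
```

Which result is correct depends on the data: if a doubled tab means "value missing", the first form keeps the columns aligned; if it is only padding, the second form is probably what you want.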

  • 2021-01-22 03:37

    Pandas read_csv is very versatile. Option 1: pass delim_whitespace=True so that any run of whitespace (spaces or tabs) is treated as a single delimiter.

    df = pd.read_csv(filename, delim_whitespace=True)
    

    Option 2: Pass a regular expression as the separator, so that one or more consecutive tabs count as a single delimiter

    df = pd.read_csv(filename, sep='\t+')
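
One case where the tab-only regex may be preferable to whitespace splitting is when a field itself contains spaces. The sketch below assumes a small made-up table (the `name` and `value` columns are hypothetical, not from the question) and passes `engine="python"` explicitly, since regex separators are handled by the Python parsing engine.

```python
import io

import pandas as pd

# Hypothetical data: the first row has a doubled tab and a name with a space.
raw = (
    "name\tvalue\n"
    "New York\t\t1\n"
    "Boston\t2\n"
)

# r"\t+" collapses runs of tabs but leaves spaces inside a field untouched.
df = pd.read_csv(io.StringIO(raw), sep=r"\t+", engine="python")

print(df)
```

Splitting on any whitespace would instead break "New York" into two fields, so the choice between the two options depends on whether spaces can occur inside values.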
    