I am trying to read a tab-separated .txt file using pandas. The file looks like this:
14.38 14.21 0.8951 5.386 3.312 2.462 4.9
If I use this code:
import pandas as pd
parsed_csv_txt = pd.read_csv("tabbed.txt", sep="\t")
print(parsed_csv_txt)
On this file:
a b c d e
14.69 2452 982 234 12
14.11 5435 234 12
16.63 1 12 66
I get:
       a     b      c      d   e
0  14.69  2452  982.0  234.0  12
1  14.11  5435  234.0    NaN  12
2  16.63     1    NaN   12.0  66
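Note the NaN values in the middle of rows 1 and 2. With sep="\t", every tab counts as its own delimiter, so two consecutive tabs are read as an empty field between them. Is that the output you want?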
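To make that behaviour easy to reproduce, here is a minimal sketch that parses an in-memory copy of the sample instead of tabbed.txt. The exact placement of the doubled tabs in raw is an assumption, chosen so that the result matches the output shown above.

```python
import io

import pandas as pd

# Hypothetical stand-in for tabbed.txt. The doubled tabs ("\t\t") are an
# assumption about where the original file has a missing value.
raw = (
    "a\tb\tc\td\te\n"
    "14.69\t2452\t982\t234\t12\n"
    "14.11\t5435\t234\t\t12\n"
    "16.63\t1\t\t12\t66\n"
)

# Each tab is a separate delimiter, so "\t\t" yields an empty field (NaN).
strict = pd.read_csv(io.StringIO(raw), sep="\t")
print(strict)

# Count the missing values per column to see where the empty fields landed.
print(strict.isna().sum())
```

Columns c and d become float because each of them holds one NaN, while e stays integer.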
If you would rather have each row's values packed to the left, with the NaN values at the end of the row, such as:
       a     b    c    d     e
0  14.69  2452  982  234  12.0
1  14.11  5435  234   12   NaN
2  16.63     1   12   66   NaN
Use this code:
import pandas as pd
parsed_csv_txt = pd.read_csv("tabbed.txt", delim_whitespace=True)
print(parsed_csv_txt)
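As a self-contained check of the whitespace-collapsing behaviour, the sketch below parses an in-memory copy of the sample. It uses sep=r"\s+", which the pandas documentation describes as equivalent to delim_whitespace=True. The contents of raw, including the doubled tabs, are an assumption about the original file.

```python
import io

import pandas as pd

# Hypothetical stand-in for tabbed.txt; the doubled tabs are an assumption.
raw = (
    "a\tb\tc\td\te\n"
    "14.69\t2452\t982\t234\t12\n"
    "14.11\t5435\t234\t\t12\n"
    "16.63\t1\t\t12\t66\n"
)

# Any run of whitespace (one tab, two tabs, spaces) is a single delimiter,
# so short rows are padded with NaN on the right instead of in the middle.
loose = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(loose)
```

Keep in mind that this shifts values to the left: in row 1 the value 12 ends up in column d rather than e, which is only correct if the doubled tab was not meant to mark a missing value.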
Note
For a longer discussion of variable amounts of whitespace between values, see: Can pandas handle variable-length whitespace as column delimiters
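pandas read_csv is very versatile.
Option 1: Use delim_whitespace=True, which treats any run of whitespace as a single delimiter. In pandas 2.2 and later this argument is deprecated in favor of sep=r"\s+", which behaves the same way.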
df = pd.read_csv(filename, delim_whitespace=True)
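Option 2: Use the sep argument with a regular expression. Unlike Option 1, this splits only on runs of tabs, so values that contain spaces stay intact. A regex separator other than \s+ requires the Python parsing engine; passing engine='python' explicitly avoids the ParserWarning about the fallback.
df = pd.read_csv(filename, sep=r'\t+', engine='python')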
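The difference between splitting on tabs only and splitting on any whitespace matters when a field contains a space. The sketch below illustrates this with made-up data (the city and count columns are hypothetical and not from the original file).

```python
import io

import pandas as pd

# Hypothetical data: one value contains a space, and one row has a doubled tab.
raw = "city\tcount\nNew York\t\t3\nOslo\t5\n"

# Split on runs of tabs only. A regex separator other than \s+ needs the
# Python engine, so it is requested explicitly.
df = pd.read_csv(io.StringIO(raw), sep=r"\t+", engine="python")
print(df)
```

With sep=r"\s+" the same input would split "New York" into two fields, so the tab-only pattern is the safer choice when values may contain spaces.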