Pandas seems to ignore first column name when reading tab-delimited data, gives KeyError

[愿得一人] 2021-01-04 17:37

I am using pandas 0.12.0 in ipython3 on Ubuntu 13.10, in order to wrangle large tab-delimited datasets in txt files. Using read_table to create a DataFrame from the txt app

4 Answers

太阳男子 (original poster)

2021-01-04 18:16
I think the issue you're having is just that the "tabs" in datafile.txt aren't actually tabs; the columns appear to be separated by runs of spaces instead. (When I read it in using your code, the dataframe has 1 column and 15 rows.) You could do a regex search-and-replace or, alternatively, just parse it as-is:
```
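import os

import pandas as pd
from numpy import transpose

# open() does not expand '~', so resolve the home directory first
with open(os.path.expanduser('~/datafile.txt'), 'r') as datafile:
    data = datafile.read()
# Collapse runs of spaces so a single space separates fields
while '  ' in data:
    data = data.replace('  ', ' ')
# Split into rows of fields, then transpose to get one sequence per column
data = transpose([row.split(' ') for row in data.strip().split('\n')])
# The first entry of each column is its header; the rest are its values
datadict = {}
for col in data:
    datadict[col[0]] = col[1:]
samples = pd.DataFrame(datadict)
print(samples['RECORDING_SESSION_LABEL'])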
```
This works ok for me on your datafile.txt: the resulting dataframe has 15 rows x 7 columns.
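If the separators really are runs of whitespace, it may be simpler to let pandas do the splitting by passing a regex separator. The snippet below is only a sketch: the inline sample and every column name other than `RECORDING_SESSION_LABEL` are made up for illustration, and it assumes no field contains whitespace itself.

```python
import io

import pandas as pd

# Hypothetical sample: columns separated by runs of spaces rather than tabs.
sample = (
    "RECORDING_SESSION_LABEL   TRIAL_INDEX   VALUE\n"
    "s01    1    0.5\n"
    "s01    2    0.7\n"
    "s02    1    0.9\n"
)

# sep=r"\s+" treats any run of whitespace (spaces or tabs) as one delimiter.
samples = pd.read_csv(io.StringIO(sample), sep=r"\s+")
print(samples["RECORDING_SESSION_LABEL"])
```

For a real file, you would pass its path in place of the `io.StringIO` object. If some fields can contain spaces, this approach will split them incorrectly, and the file would need a true tab or another unambiguous delimiter.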