Pandas read “delimited” file

Submitted by 早过忘川 on 2019-12-24 07:27:42

Question


Hi, I have a .txt file in which the first column is an index, followed by three values inside a pair of parentheses "()" that represent the x, y and z coordinates.

I want to load the first four columns of this file into a pandas DataFrame. However, I found this hard because the delimiter is first a space " ", then "(", and inside the parentheses it is ",".

Could someone give me a hint on how to deal with this situation?

Thank you! Shawn


Answer 1:


It is possible to write your own parser. Something like:

Code:

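import re

def parse_my_file(filename):
    # Yield the first four whitespace-separated fields of each line,
    # with the surrounding commas and parentheses stripped.
    with open(filename) as f:
        for line in f:
            yield [x.strip(',()')
                   for x in re.split(r'\s+', line.strip())[:4]]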

Test Code:

import pandas as pd

df = pd.DataFrame(parse_my_file('file1'))
print(df)

Results:

    0       1       2  3
0  g1     -16       0  0
1  gr      10       0  0
2  D1  -6.858  2.7432  0
3  D2  -2.286  2.7432  0

I created this data file by typing in the first four rows of your data.
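
Note that a parser like this yields strings, so the coordinate columns are not numeric yet. The following is a minimal sketch of one way to name the columns and convert the coordinates to floats. The sample text, the trailing `extra` field and the column names `label`, `x`, `y`, `z` are assumptions made for illustration; the real file layout is not shown in the question.

```python
import io
import re

import pandas as pd

# Hypothetical sample in the assumed layout: label, "(x, y, z)", then more columns.
SAMPLE = """\
g1 (-16, 0, 0) extra
gr (10, 0, 0) extra
D1 (-6.858, 2.7432, 0) extra
D2 (-2.286, 2.7432, 0) extra
"""


def parse_lines(lines):
    """Yield [label, x, y, z] as strings for each non-empty line."""
    for line in lines:
        if line.strip():
            yield [tok.strip(',()')
                   for tok in re.split(r'\s+', line.strip())[:4]]


# Column names are illustrative, not taken from the original file.
df = pd.DataFrame(parse_lines(io.StringIO(SAMPLE)),
                  columns=['label', 'x', 'y', 'z'])

# The parser produces strings, so convert the coordinates explicitly.
df[['x', 'y', 'z']] = df[['x', 'y', 'z']].astype(float)
df = df.set_index('label')
print(df.dtypes)
```

Reading from `io.StringIO` keeps the sketch self-contained; an open file handle can be passed to `parse_lines` in the same way.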




Answer 2:


You can use a regex pattern as the separator in pandas.read_csv.

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

Like this:

import pandas as pd

# A regex separator requires the Python parsing engine.
df = pd.read_csv('Initial_Coordinate.txt', sep=r'[()]', header=None, engine='python')
print(df)
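
A separator of `[()]` splits only at the parentheses, so the three coordinates would still share one column. As a rough sketch, a character class that also matches whitespace and commas can split every field in one pass. The sample text, the trailing `extra` field and the column names here are assumptions for illustration, since the real file is not shown.

```python
import io

import pandas as pd

# Hypothetical sample in the assumed layout: label, "(x, y, z)", then more columns.
SAMPLE = """\
g1 (-16, 0, 0) extra
gr (10, 0, 0) extra
D1 (-6.858, 2.7432, 0) extra
D2 (-2.286, 2.7432, 0) extra
"""

# Treat any run of whitespace, parentheses or commas as a single delimiter.
# Regex separators need engine='python'; usecols keeps the first four fields.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r'[\s(),]+', header=None,
                 engine='python', usecols=[0, 1, 2, 3])
df.columns = ['label', 'x', 'y', 'z']  # illustrative names
print(df)
```

This relies on something following the closing parenthesis on every line. If a line ended with ")", the split would likely produce an extra empty trailing field.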

However, rather than building a complex delimiter, it is better to first normalize the file to a single simple delimiter and then read it with pandas.
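
One possible way to do that normalization, sketched under the assumption that parentheses and commas never appear inside a value, is to replace them with spaces and then read the result as whitespace-delimited text. The sample text and column names are hypothetical.

```python
import io
import re

import pandas as pd

# Hypothetical sample in the assumed layout: label, "(x, y, z)", then more columns.
SAMPLE = """\
g1 (-16, 0, 0) extra
gr (10, 0, 0) extra
D1 (-6.858, 2.7432, 0) extra
D2 (-2.286, 2.7432, 0) extra
"""

# Step 1: turn every parenthesis and comma into a space.
cleaned = re.sub(r'[(),]', ' ', SAMPLE)

# Step 2: read the cleaned text as plain whitespace-delimited data.
df = pd.read_csv(io.StringIO(cleaned), sep=r'\s+', header=None,
                 usecols=[0, 1, 2, 3])
df.columns = ['label', 'x', 'y', 'z']  # illustrative names
print(df)
```

For a real file, the text could be read with `open(...).read()` before the substitution, or the cleaned text could be written back to disk so that later reads need no special handling.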

thx



Source: https://stackoverflow.com/questions/44103290/pandas-read-delimited-file
