Question
Hi, I have a .txt file in which the first column is an index, followed by three values enclosed in a pair of "()" that represent the x, y and z coordinates.
I want to load the first four columns of this file into a pandas DataFrame. However, I found this pretty hard because the delimiter is first " ", then "(", and inside the parentheses it is ",".
Could someone give me a hint on how to deal with this situation?
Thank you! Shawn
Answer 1:
It is possible to write your own parser. Something like:
Code:
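import re
import pandas as pd

def parse_my_file(filename):
    # Yield the first four whitespace-separated fields of each line,
    # with any surrounding commas and parentheses stripped.
    with open(filename) as f:
        for line in f:
            yield [x.strip(',()')
                   for x in re.split(r'\s+', line.strip())[:4]]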
Test Code:
df = pd.DataFrame(parse_my_file('file1'))
print(df)
Results:
    0       1       2  3
0  g1     -16       0  0
1  gr      10       0  0
2  D1  -6.858  2.7432  0
3  D2  -2.286  2.7432  0
I created the data file for this test by typing in the first four rows of your file.
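Note that the parser above leaves every value as a string. As a rough sketch of one way to get numeric coordinates, the same splitting logic can be followed by pd.to_numeric. The sample text is reconstructed from the results shown above, and its trailing fifth column, the function name parse_lines, and the column names name/x/y/z are all assumptions made for illustration.

```python
import io
import re

import pandas as pd

# Sample text reconstructed from the results above; the trailing "1"
# column is hypothetical and only stands in for the file's other columns.
SAMPLE = """\
g1 (-16, 0, 0) 1
gr (10, 0, 0) 1
D1 (-6.858, 2.7432, 0) 1
D2 (-2.286, 2.7432, 0) 1
"""


def parse_lines(lines):
    """Yield the first four fields of each non-empty line as clean strings."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield [tok.strip(',()') for tok in re.split(r'\s+', line)[:4]]


# Column names are illustrative, not taken from the original file.
df = pd.DataFrame(list(parse_lines(io.StringIO(SAMPLE))),
                  columns=['name', 'x', 'y', 'z'])

# Convert the three coordinate columns from strings to numbers.
df[['x', 'y', 'z']] = df[['x', 'y', 'z']].apply(pd.to_numeric)
print(df.dtypes)
```

For a real file, passing an open file handle to parse_lines instead of io.StringIO(SAMPLE) should behave the same way.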
Answer 2:
You can use a regex pattern as the separator in pandas.read_csv:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
For example:
import pandas as pd
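# A regex separator requires the Python parsing engine; this pattern splits each line at "(" and ")".
df = pd.read_csv('Initial_Coordinate.txt', sep=r'[()]', header=None, engine='python')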
print(df)
However, rather than building a complex delimiter pattern, it is better to first rewrite the data with a single simple delimiter and then read it with pandas.
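A minimal sketch of that idea, assuming the file looks like the sample below: replace the parentheses and commas with spaces, then let read_csv split on whitespace. The sample rows are reconstructed from the results in Answer 1; the trailing fifth column and the column names name/x/y/z are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the contents of the .txt file.
RAW = """\
g1 (-16, 0, 0) 1
gr (10, 0, 0) 1
D1 (-6.858, 2.7432, 0) 1
D2 (-2.286, 2.7432, 0) 1
"""

# Replace "(", ")" and "," with spaces so whitespace is the only delimiter.
cleaned = RAW.translate(str.maketrans("(),", "   "))

# Read only the first four whitespace-separated columns.
coords = pd.read_csv(io.StringIO(cleaned), sep=r"\s+", header=None,
                     usecols=[0, 1, 2, 3])
coords.columns = ["name", "x", "y", "z"]  # illustrative names
print(coords)
```

With a real file, RAW would come from open(path).read() instead. Because the cleaned text uses plain whitespace as its delimiter, pandas infers numeric dtypes for the coordinate columns on its own.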
thx
Source: https://stackoverflow.com/questions/44103290/pandas-read-delimited-file