Question
I am new to Python and would like to use pandas to read some data. I have searched and tried to solve this myself, but I am still struggling. Thanks in advance for your help!
I have an a.txt file that looks like this:
skip1
A1| A2 |A3 |A4# A5# A6 A7| A8 , A9
1,2,3,4,5,6,7,8,9
1,2,3,4,5,6,7,8,9
1,2,3,4,5,6,7,8,9
END***
Some other data starts from here
The first task: I would like to assign A1, A2, A3, A4, A5, A6, A7, A8 and A9 as column names. However, the header line mixes several separators (' ', '|', '#' and ','), which makes it awkward to specify a separator when reading the file. I tried this:
import pandas as pd
import glob
filelist=glob.glob('*.txt')
print(filelist)
df = pd.read_csv(filelist,skiprows=1,skipfooter=2,skipinitialspace=True, header=0, sep=r'\| |,|#',engine='python')
But nothing seems to have happened when I check df in Spyder's data explorer.
The second task: while reading, I want to drop everything from the row starting with END*** onwards, since I don't need it. The header always has the same length. However, skipfooter needs the number of lines to skip, and that number changes from file to file.
Several similar questions have already been asked, but I can't seem to make their answers work for my case:
how-to-read-txt-file-in-pandas-with-multiple-delimiters
pandas-read-delimited-file?
import-text-to-pandas-with-multiple-delimiters
pandas-ignore-all-lines-following-a-specific-string-when-reading-a-file-into-a
EDIT: about dropping the data from the row starting with END onwards.
If the file b.txt looks like this:
skip1
A1| A2 |A3 |A4# A5# A6 A7| A8 , A9
1,2,3,4,5,6,7,8,9
1,2,3,4,5,6,7,8,9
1,2,3,4,5,6,7,8,9
END123
Some other data starts from here
and I use the second solution below:
import io
import re
import pandas as pd

txt = open('b.txt').read().split('\nEND')[0]
_, h, txt = txt.split('\n', 2)
pat = r'[\|, ,#,\,]+'
names = re.split(pat, h.strip())
pd.read_csv(
    io.StringIO(txt),
    names=names, header=None,
    engine='python')
I get this:
   A1  A2  A3  A4  A5  A6  A7  A8  A9
0   1   2   3   4   5   6   7   8   9
1   1   2   3   4   5   6   7   8   9
2   1   2   3   4   5   6   7   8   9
Answer 1:
Split the file, then read from string
import io
import pandas as pd

# keep only the text that comes before the END*** line
txt = open('test.txt').read().split('\nEND***')[0]
pd.read_csv(
    io.StringIO(txt),
    sep=r'\W+',
    skiprows=1, engine='python')
   A1  A2  A3  A4  A5  A6  A7  A8  A9
0   1   2   3   4   5   6   7   8   9
1   1   2   3   4   5   6   7   8   9
2   1   2   3   4   5   6   7   8   9
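Since the question collects several files with glob, the split-then-read approach above could be wrapped in a small function and applied to one file at a time (read_csv takes a single path or buffer, not a list of paths). This is only a sketch: the names parse_block and read_all are made up for illustration, and the marker '\nEND' is an assumption chosen so that both END*** and END123 are caught. Note also that sep=r'\W+' splits on every non-word character, so it would break up values such as 1.5 or -3; it suits the integer sample shown here.

```python
import glob
import io

import pandas as pd


def parse_block(text, end_marker="\nEND"):
    """Parse the table between the first line and the first line starting with END.

    parse_block is a hypothetical helper, not part of pandas.
    """
    # Everything from the END line onwards is discarded.
    body = text.split(end_marker)[0]
    # skiprows=1 drops the 'skip1' line; \W+ treats any run of
    # non-word characters as one separator (header and data alike).
    return pd.read_csv(io.StringIO(body), sep=r"\W+", skiprows=1, engine="python")


def read_all(pattern="*.txt"):
    """Return a dict {path: DataFrame} for every file matching the glob pattern."""
    frames = {}
    for path in glob.glob(pattern):
        with open(path) as fh:
            frames[path] = parse_block(fh.read())
    return frames
```

Calling read_all('*.txt') in the data directory would then give one DataFrame per file, regardless of how many lines follow the END row in each.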
We can also parse the header explicitly and read the rest of the file as ordinary CSV:
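import io
import re
import pandas as pd

txt = open('test.txt').read().split('\nEND***')[0]
# drop the first line, keep the header line, leave the data rows in txt
_, h, txt = txt.split('\n', 2)
# any run of '|', ',', ' ' or '#' counts as a single separator
pat = r'[\|, ,#,\,]+'
names = re.split(pat, h.strip())
pd.read_csv(
    io.StringIO(txt),
    names=names, header=None,
    engine='python')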
   A1  A2  A3  A4  A5  A6  A7  A8  A9
0   1   2   3   4   5   6   7   8   9
1   1   2   3   4   5   6   7   8   9
2   1   2   3   4   5   6   7   8   9
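A possible variation on the explicit-header approach, for files where a lot of unwanted data follows the END row: instead of reading the whole file and splitting the string, the lines can be consumed one by one and reading can stop at the first line that starts with END. The sketch below assumes the layout from the question (one line to skip, one header line, comma-separated data rows); read_table and end_prefix are hypothetical names used only for illustration.

```python
import io
import re
from itertools import takewhile

import pandas as pd


def read_table(lines, end_prefix="END"):
    """Build a DataFrame from an iterable of lines, stopping at the END line.

    read_table is a hypothetical helper; `lines` can be an open text file.
    """
    lines = iter(lines)
    next(lines)                       # discard the 'skip1' line
    header = next(lines).strip()      # header line with mixed separators
    # any run of '|', ',', '#' or ' ' counts as a single separator
    names = re.split(r"[|,# ]+", header)
    # keep data rows only until the first line that starts with end_prefix
    rows = takewhile(lambda line: not line.startswith(end_prefix), lines)
    # the data rows are plain comma-separated values
    return pd.read_csv(io.StringIO("".join(rows)), names=names, header=None)
```

With a file this would be used as: with open('b.txt') as fh: df = read_table(fh). Because the prefix test is 'END', both END*** and END123 end the table.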
Answer 2:
Answering the first question:
In [182]: df = pd.read_csv(filename, sep=r'\s*(?:\||\#|\,)\s*',
                           skiprows=1, engine='python')
In [183]: df
Out[183]:
A1 A2 A3 A4 A5 A6 A7 A8 A9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
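Before passing a separator pattern to read_csv, it may be worth checking how it tokenises the header line, because A6 and A7 in the sample are separated only by a space. The sketch below uses re.split directly on the header from the question. The second pattern is an assumed variant (not part of this answer) that also treats whitespace as a separator.

```python
import re

header = "A1| A2 |A3 |A4# A5# A6 A7| A8 , A9"

# Separator from this answer: '|', '#' or ',' with optional surrounding whitespace.
strict = re.split(r"\s*(?:\||\#|\,)\s*", header)

# Assumed variant: any run of '|', ',', '#' or whitespace is one separator.
loose = re.split(r"[|,#\s]+", header)

print(len(strict), strict)
# 8 ['A1', 'A2', 'A3', 'A4', 'A5', 'A6 A7', 'A8', 'A9']
print(len(loose), loose)
# 9 ['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9']
```

With the first pattern the header yields eight names ('A6 A7' stays together) while each data row has nine fields. In that situation pandas will typically take the first data column as the index, which may be why the output above shows no separate 0, 1, 2 index. If nine separate columns are wanted, a pattern along the lines of the second one is probably closer to the intent.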
Source: https://stackoverflow.com/questions/45695040/reading-files-with-multiple-delimiter-in-column-headers-and-skipping-some-rows-a