Multiple delimiters in column headers also separates the row values

亡梦爱人 提交于 2019-12-06 05:41:05

Updated Answer
It was just easier to get rid of spaces... Let me know if this works

txt = open('b.txt').read().split('\nEND')[0] \
    .replace(' ', '').replace('|\n', '\n').split('\n', 1)[1]

pd.read_csv(
    pd.io.common.StringIO(txt),
    sep=r'#\||\||#|,',
    engine='python')

   A1  A2  A3  A4  A5    A6     A7  A8   A9  A10
0   1   2   3   4   5  6.cd  7.dvd NaN  NaN   10
1   1   2   3   4   5  6.cd    NaN NaN  9.0   10
2   1   2   3   4   5   NaN  7.dvd NaN  NaN   10

Old Answer

I used \W+ as a fast and easy way to parse what you showed. Below I used something more specific to the actual delimiters you need.

txt = open('b.txt').read().split('\nEND')[0]
pd.read_csv(
    pd.io.common.StringIO(txt),
    sep=r'[\|, ,#,\,]+',
    skiprows=1,index_col=False, engine='python')

   A1  A2  A3  A4    A5  A6  A7     A8  A9
0   1   2   3   4  5.cd   6   7  8.dvd   9
1   1   2   3   4  5.cd   6   7  8.dvd   9
2   1   2   3   4  5.cd   6   7  8.dvd   9

However, I still think this is a cleaner way to do it. Here, I separate the parsing of the header from the parsing of the rest of the data. That way, I assume the data should only be using , as a separator.

txt = open('b.txt').read().split('END')[0]
_, h, txt = txt.split('\n', 2)
pat = r'[\|, ,#,\,]+'
names = re.split(pat, h.strip())

pd.read_csv(
    pd.io.common.StringIO(txt),
    names=names, header=None,
    engine='python')

   A1  A2  A3  A4    A5  A6  A7     A8  A9
0   1   2   3   4  5.cd   6   7  8.dvd   9
1   1   2   3   4  5.cd   6   7  8.dvd   9
2   1   2   3   4  5.cd   6   7  8.dvd   9
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!