Python: chain a list from a TSV file

Submitted by 烂漫一生 on 2019-12-02 03:29:33

If the separator between the columns is only a single space, and not a series of spaces or a tab, you could do this:

with open('file_name') as f:
    lines = f.readlines()
for line in lines:
    e_line = line.rstrip('\n').split(' ')  # split on single spaces
    real_line = e_line[3]                  # 4th column (index 3)
    print(real_line.split(';'))
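If the columns might instead be separated by a mix of spaces and tabs, one possible sketch is to call split() with no argument, which splits on any run of whitespace. The sample line below is made up for illustration, and this only works if no column contains whitespace itself.

```python
# Hypothetical sample line: four columns separated by a mix of spaces and tabs.
sample_line = "id1 \t 1297740409   166\t14th_century;Europe;Africa\n"

# str.split() with no argument splits on any run of whitespace and
# ignores leading/trailing whitespace, including the final newline.
columns = sample_line.split()
path = columns[3].split(';')
print(path)
```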

Answer to your updated question.

But the problem is that it is not deleting the first 3 columns?

There are several mistakes.

Your code:

import re
with open('test.tsv') as f:
    lines = f.readlines()
for line in lines[22:len(lines)]:
    re.sub(r"^\s+", " ", line, flags = re.MULTILINE)
    e_line = line.split(' ')
    real_line = e_line[0]
    print real_line.split(';')

This line does nothing...

re.sub(r"^\s+", " ", line, flags = re.MULTILINE)

That is because re.sub does not modify your line variable; it returns a new string with the replacements made. So you need to assign the result back:

line = re.sub(r"^\s+", " ", line, flags = re.MULTILINE)
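A small sketch of why the assignment matters: Python strings are immutable, so re.sub can only hand back a new string. The sample value here is invented for the demonstration.

```python
import re

line = "   a  b"  # hypothetical sample with leading spaces

# re.sub returns a new string; the original is left untouched.
result = re.sub(r"^\s+", " ", line)

print(repr(line))    # original, unchanged
print(repr(result))  # substituted copy
```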

Also, your regexp ^\s+ only matches whitespace at the start of the string, because of the ^ anchor (with re.MULTILINE, at the start of each line). But I think you just want to replace consecutive spaces or tabs with a single space. In that case, just remove the ^ from the regexp:

line = re.sub(r"\s+", " ", line, flags = re.MULTILINE)

Now the fields in line are separated by exactly one space, so line.split(' ') will work as you want.
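To see the difference between the anchored and unanchored patterns, here is a minimal comparison on a made-up line (the sample string is an assumption, not taken from your file):

```python
import re

sample = "a \t b   c\n"  # hypothetical line with no leading whitespace

# Anchored: only whitespace at the very start can match, so nothing changes here.
anchored = re.sub(r"^\s+", " ", sample)

# Unanchored: every run of whitespace (including the trailing newline)
# is collapsed into a single space.
unanchored = re.sub(r"\s+", " ", sample)

fields = unanchored.split(' ')
print(repr(anchored))
print(repr(unanchored))
print(fields)
```

Note that the trailing newline also becomes a space, so split(' ') leaves an empty string as the last element. That does not affect indexing by column number from the front.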

Next, e_line[0] returns the first element of e_line, which is the 1st column of the line. But you want to skip the first 3 columns and get the 4th column. You can do it like this:

e_line = line.split(' ')
real_line = e_line[3]

OK. Now the entire code looks like this:

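import re
with open('test.tsv') as f:
    lines = f.readlines()
# I also changed the loop: there is no need to skip the first 22 lines in your example.
for line in lines:
    line = re.sub(r"\s+", " ", line)  # collapse each run of whitespace into one space
    e_line = line.split(' ')
    real_line = e_line[3]             # 4th column (index 3)
    print(real_line)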

output:

14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;Accra;Africa;Atlantic_slave_trade;African_slave_trade
14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade
14th_century;Niger;Nigeria;British_Empire;Slavery;Africa;Atlantic_slave_trade;African_slave_trade

P.S:

This line can be made more Pythonic.

before:

for line in lines[22:len(lines)]:

after:

for line in lines[22:]:

Also, you don't need flags = re.MULTILINE, because each line in the for-loop is a single line, and the flag only changes how ^ and $ match.
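Both points can be checked quickly. This is only a sketch; the list and the sample line are hypothetical stand-ins for your data.

```python
import re

# Hypothetical stand-in for the list returned by f.readlines().
lines = ["line %d\n" % i for i in range(30)]

# Omitting the end index slices to the end of the list.
same_slice = lines[22:] == lines[22:len(lines)]

# re.MULTILINE only affects ^ and $, so it makes no difference for \s+.
line = "a \t b\n"
with_flag = re.sub(r"\s+", " ", line, flags=re.MULTILINE)
without_flag = re.sub(r"\s+", " ", line)

print(same_slice, with_flag == without_flag)
```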

You don't need to use regex for this. The csv module can handle tab-separated files too:

import csv

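# Python 3: open in text mode with newline=''; on Python 2, use open('test.tsv', 'rb') instead.
with open('test.tsv', newline='') as f:
    filereader = csv.reader(f, delimiter='\t')
    path_list = [row[3].split(';') for row in filereader]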

print(path_list)
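To try the csv approach without a file on disk, here is a self-contained sketch that reads from an in-memory string. The two sample rows and their first three columns are invented for illustration. The length check is an added assumption to skip short rows, such as header or comment lines.

```python
import csv
import io

# Hypothetical two-row TSV; column values are made up for this example.
sample_tsv = (
    "id1\t1297740409\t166\t14th_century;Europe;Africa\tNULL\n"
    "id2\t1297741000\t88\t14th_century;Niger;Nigeria\t3\n"
)

reader = csv.reader(io.StringIO(sample_tsv), delimiter='\t')

# Take the 4th column of each row and split it on ';'.
# Rows with fewer than 4 columns are skipped.
path_list = [row[3].split(';') for row in reader if len(row) > 3]

print(path_list)
```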