Parse a plain text file into a CSV file using Python

I have a series of HTML files that are parsed into a single text file using Beautiful Soup. The HTML files are formatted such that their output is always three lines within

2 Answers

甜味超标

2020-12-09 23:49
I'm not entirely sure what CSV library you're using, but it doesn't look like Python's built-in one. Anyway, here's how I'd do it:
```
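import csv

with open('extracted.txt', 'r') as in_file:
    # Strip surrounding whitespace from each line, then drop the empty ones.
    stripped = (line.strip() for line in in_file)
    lines = (line for line in stripped if line)
    # Repeating the same generator three times makes zip() pull three
    # consecutive lines into each tuple (Python 3; Python 2 used itertools.izip).
    grouped = zip(*[lines] * 3)
    # newline='' lets the csv module control the line endings itself.
    with open('extracted.csv', 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('title', 'intro', 'tagline'))
        writer.writerows(grouped)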
```
This forms a pipeline of sorts. It reads the lines from the file, strips the leading and trailing whitespace from each one, discards any empty lines, groups what is left into threes, and then (after writing the CSV header) writes those groups to the CSV file. Note that any leftover lines that do not fill a complete group of three are silently dropped.
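To see the grouping step in isolation, here is a minimal sketch that runs the same strip, filter, and group stages on an in-memory list instead of a file. The function names `group_in_threes` and `group_in_threes_padded` and the sample data are made up for illustration; the padded variant, which uses `itertools.zip_longest`, is an alternative I am assuming you might want if a trailing partial group should be kept rather than dropped.

```python
import itertools


def group_in_threes(lines):
    """Strip lines, drop empty ones, and group the rest into 3-tuples."""
    stripped = (line.strip() for line in lines)
    non_empty = (line for line in stripped if line)
    # The same generator appears three times, so zip() takes three
    # consecutive items for each tuple. An incomplete final group is dropped.
    return list(zip(*[non_empty] * 3))


def group_in_threes_padded(lines, fill=""):
    """Same as above, but pad an incomplete final group with `fill`."""
    stripped = (line.strip() for line in lines)
    non_empty = (line for line in stripped if line)
    return list(itertools.zip_longest(*[non_empty] * 3, fillvalue=fill))


# Hypothetical sample input, including blank lines and stray whitespace.
sample = [
    "Title A\n", "\n", "  Intro A\n", "Tagline A\n",
    "Title B\n", "Intro B\n", "\n", "Tagline B\n",
]
rows = group_in_threes(sample)
```

Here `rows` holds one tuple per record, ready to be handed to `writer.writerows`.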

To combine the last two columns as you mentioned in the comments, you could change the header passed to writerow to match (two column names instead of three) and change the writerows call to:
```
writer.writerows((title, intro + tagline) for title, intro, tagline in grouped)
```
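Note that `intro + tagline` joins the two strings with nothing in between. A rough sketch of the two-column variant with a separator might look like the following; the helper name `write_combined`, the `description` header, and the single-space separator are all my assumptions, not something from the question, and it writes to an `io.StringIO` buffer only so the result is easy to inspect.

```python
import csv
import io


def write_combined(grouped, out_file, sep=" "):
    """Write (title, intro, tagline) groups as two-column CSV rows."""
    writer = csv.writer(out_file)
    # Two-column header; "description" is a placeholder name.
    writer.writerow(("title", "description"))
    writer.writerows(
        (title, intro + sep + tagline) for title, intro, tagline in grouped
    )


# Hypothetical sample data standing in for the grouped lines.
buffer = io.StringIO()
write_combined([("Title A", "Intro A", "Tagline A")], buffer)
output = buffer.getvalue()
```

With a real file, you would pass the object returned by `open('extracted.csv', 'w', newline='')` in place of the buffer.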

渐次进展

2020-12-10 00:11

Perhaps I didn't understand you correctly, but you can do:

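```
import csv

with open("extracted.txt") as in_file:
    # Strip each line once up front so .strip() is not called twice per line.
    stripped = [line.strip() for line in in_file]
lines = [line for line in stripped if line]

rows = []
for i, line in enumerate(lines):
    # i % 3 is the column index; column 0 starts a new row.
    if i % 3 == 0:
        rows.append([])
    rows[-1].append(line)

with open("extracted.csv", "w", newline="") as out_file:
    csv.writer(out_file).writerows(rows)
```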
