Parse a plain text file into a CSV file using Python

星月不相逢 2020-12-09 23:08

I have a series of HTML files that are parsed into a single text file using Beautiful Soup. The HTML files are formatted such that their output is always three lines within

2 Answers
  • 2020-12-09 23:49

    I'm not entirely sure what CSV library you're using, but it doesn't look like Python's built-in one. Anyway, here's how I'd do it:

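    import csv

    with open('extracted.txt', 'r') as in_file:
        # Strip surrounding whitespace from each line, then drop blank lines.
        stripped = (line.strip() for line in in_file)
        lines = (line for line in stripped if line)
        # The same iterator is passed three times, so zip() pulls three
        # consecutive lines per row. On Python 2, use itertools.izip instead.
        grouped = zip(*[lines] * 3)
        # newline='' keeps the csv module from emitting blank rows on Windows.
        with open('extracted.csv', 'w', newline='') as out_file:
            writer = csv.writer(out_file)
            writer.writerow(('title', 'intro', 'tagline'))
            writer.writerows(grouped)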
    

    This forms a pipeline: it reads lines from the file, strips leading and trailing whitespace from each one, drops any empty lines, groups the remaining lines into threes, and then (after writing the CSV header) writes those groups to the CSV file.
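    The grouping step is the least obvious part of that pipeline, so here is a minimal in-memory sketch of it. The helper name `group_in_threes` and the sample text are made up for illustration, and the built-in `zip` is used on the assumption of Python 3 (it plays the role of `itertools.izip` there).

```python
import csv
import io

def group_in_threes(lines):
    """Return an iterator of 3-tuples of consecutive non-blank, stripped lines."""
    stripped = (line.strip() for line in lines)
    non_blank = (line for line in stripped if line)
    # The same iterator object appears three times in the argument list,
    # so each tuple consumes three consecutive items from it.
    return zip(*[non_blank] * 3)

# Hypothetical sample input with blank lines and stray whitespace.
sample = io.StringIO(
    "Title A\n\n  Intro A\nTagline A\n\nTitle B\nIntro B\nTagline B\n"
)
rows = list(group_in_threes(sample))

# Write to an in-memory buffer instead of a file to keep the sketch self-contained.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(("title", "intro", "tagline"))
writer.writerows(rows)
csv_text = out.getvalue()
```

    Note that `zip` stops at the shortest input, so a trailing group of only one or two lines is silently dropped. If that matters, `itertools.zip_longest` with a `fillvalue` would pad the last row instead.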

    To combine the last two columns as you mentioned in the comments, change the writerow call so the header has two columns instead of three, and change the writerows call to:

    writer.writerows((title, intro + tagline) for title, intro, tagline in grouped)
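    As a rough end-to-end sketch of that two-column variant: the sample rows, the header name `description`, and the `SEPARATOR` constant are all assumptions added for illustration. The one-liner above joins `intro` and `tagline` with nothing between them, so a separator is probably wanted in practice.

```python
import csv
import io

# Hypothetical rows standing in for the grouped lines from the file.
grouped = [
    ("Title A", "Intro A", "Tagline A"),
    ("Title B", "Intro B", "Tagline B"),
]

# Assumption: a single space between the merged fields; use "" to match
# the plain intro + tagline concatenation exactly.
SEPARATOR = " "

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
# Assumed two-column header; the second column name is illustrative only.
writer.writerow(("title", "description"))
writer.writerows(
    (title, intro + SEPARATOR + tagline) for title, intro, tagline in grouped
)
combined_csv = out.getvalue()
```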
    
  • 2020-12-10 00:11

    Perhaps I didn't understand you correctly, but you can do:

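    # Note: `csv` below is presumably the CSV object from the question's own
    # library; Python's built-in csv module has no SetCell function.
    with open("extracted.txt") as file:
        # Keep only non-blank lines, stripped of surrounding whitespace.
        # If you don't want to call .strip() twice, build a list of the
        # stripped lines first.
        lines = [line.strip() for line in file if line.strip()]

    for i, line in enumerate(lines):
        # i % 3 cycles through column indices 0, 1, 2.
        csv.SetCell(i % 3, line)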
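    For readers without that library, the same index-based idea can be sketched with Python's standard csv module, which has no `SetCell`. The function name `rows_by_index` and the sample lines are hypothetical; this is one possible translation, not the answer's own code.

```python
import csv
import io

def rows_by_index(lines, width=3):
    """Build rows of up to `width` cells from each line's position (i % width)."""
    rows = []
    for i, line in enumerate(lines):
        if i % width == 0:
            rows.append([])  # column 0 starts a new row
        rows[-1].append(line)
    return rows

# Hypothetical, already-stripped lines.
lines = ["Title A", "Intro A", "Tagline A", "Title B", "Intro B", "Tagline B"]
table = rows_by_index(lines)

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(table)
index_csv = out.getvalue()
```

    Unlike the zip-based grouping, this keeps a short final row if the number of lines is not a multiple of three.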
    