CSV writing strings of text that need a unique delimiter

无人及你 2021-01-25 06:14

I wrote an HTML parser in Python that extracts data and writes it to a CSV file that looks like this:

    itemA, itemB, itemC, Sentence that might contain commas, or colons: li         


        
3 Answers
  • 2021-01-25 06:46

    CSV files usually wrap a field in double quotes (") when it might contain the field separator, such as a comma. A double quote inside such a field is normally escaped by doubling it (""); some dialects use a backslash (\") instead.
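
    As a rough illustration of the quoting rule, here is a minimal Python 3 sketch using the standard csv module. The helper name to_csv_line and the sample sentence are made up for this example; the second call shows the backslash style, which the module only produces when asked via doublequote=False and escapechar.

```python
import csv
import io

def to_csv_line(fields, **fmt):
    """Serialize one row to a string; fmt is passed through to csv.writer."""
    buf = io.StringIO()
    csv.writer(buf, **fmt).writerow(fields)
    return buf.getvalue()

fields = ["itemA", 'He said "hi", then left']

# Default dialect: the field is wrapped in quotes and inner quotes are doubled.
default_line = to_csv_line(fields)
print(default_line)  # itemA,"He said ""hi"", then left"

# Backslash style: must be requested explicitly on both the writer and the reader.
backslash_line = to_csv_line(fields, doublequote=False, escapechar="\\")
print(backslash_line)
```

    Whichever style is written, the reader has to be configured the same way, otherwise the quotes will not round-trip.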

  • 2021-01-25 06:54

    Yes, delimiters separate the values within each line of a CSV file. There are two strategies for delimiting text that contains a lot of punctuation. First, you can quote the values, e.g.:

    Value 1, Value 2, "This value has a comma, <- right there", Value 4
    

    The second strategy is to use tabs (i.e., '\t').

    Python's built-in csv module can both read and write CSV files that use quotes. Check out the example code under the csv.reader function. The module handles quotes correctly, e.g. it will escape quotes that appear in the value itself.
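
    To compare the two strategies side by side, here is a small Python 3 sketch that writes the same row once with commas and once with tabs and then reads it back. The roundtrip helper is a hypothetical name for this example, and it uses in-memory buffers instead of real files.

```python
import csv
import io

row = ["Value 1", "Value 2", "This value has a comma, <- right there", "Value 4"]

def roundtrip(fields, delimiter):
    """Write one row with the given delimiter, then parse it back."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter).writerow(fields)
    text = buf.getvalue()
    parsed = next(csv.reader(io.StringIO(text), delimiter=delimiter))
    return text, parsed

# Strategy 1: comma delimiter, so the writer quotes the field that contains a comma.
comma_text, comma_row = roundtrip(row, ",")
print(repr(comma_text))

# Strategy 2: tab delimiter, so the comma is ordinary text and needs no quotes.
tab_text, tab_row = roundtrip(row, "\t")
print(repr(tab_text))
```

    Both variants give back the original list; the tab version only avoids quoting as long as the values themselves contain no tabs.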

  • 2021-01-25 07:04

    As I suggested informally in a comment, unique just means you need to use some character that won't be in the data — chr(255) might be a good choice. For example:

    Note: The code shown is for Python 2.x — see comments for a Python 3 version.

    import csv
    
    DELIMITER = chr(255)
    data = ["itemA", "itemB", "itemC",
            "Sentence that might contain commas, colons: or even \"quotes\"."]
    
    with open('data.csv', 'wb') as outfile:
        writer = csv.writer(outfile, delimiter=DELIMITER)
        writer.writerow(data)
    
    with open('data.csv', 'rb') as infile:
        reader = csv.reader(infile, delimiter=DELIMITER)
        for row in reader:
            print row
    

    Output:

    ['itemA', 'itemB', 'itemC', 'Sentence that might contain commas, colons: or even "quotes".']
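
    For Python 3, the same idea might look roughly like the sketch below. It is an adaptation rather than the version from the comments: files are opened in text mode with newline='' (as the csv documentation recommends), the UTF-8 encoding and the temporary directory are assumptions made for this example, and print is a function.

```python
import csv
import os
import tempfile

DELIMITER = chr(255)  # a single character ('ÿ') assumed not to occur in the data
data = ["itemA", "itemB", "itemC",
        "Sentence that might contain commas, colons: or even \"quotes\"."]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.csv")

    # Text mode with newline='' lets the csv module control line endings.
    with open(path, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile, delimiter=DELIMITER)
        writer.writerow(data)

    with open(path, newline="", encoding="utf-8") as infile:
        rows = list(csv.reader(infile, delimiter=DELIMITER))

for row in rows:
    print(row)
```

    The row read back should be identical to the one written, including the embedded quotes.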
    

    If you're not using the csv module and instead are writing and/or reading the data manually, then it would go something like this:

    with open('data.csv', 'wb') as outfile:
        outfile.write(DELIMITER.join(data) + '\n')
    
    with open('data.csv', 'rb') as infile:
        row = infile.readline().rstrip().split(DELIMITER)
        print row
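
    A Python 3 sketch of the manual approach is below. The helper names encode_row and decode_row are invented for this example. Since nothing is quoted here, the sketch refuses fields that contain the delimiter or a newline, and it strips only the trailing newline so that trailing spaces in the last field survive.

```python
DELIMITER = chr(255)  # assumed never to appear in the data

def encode_row(fields, delimiter=DELIMITER):
    """Join fields into one line; fail loudly if a field would break the format."""
    for field in fields:
        if delimiter in field or "\n" in field:
            raise ValueError("field contains the delimiter or a newline: %r" % field)
    return delimiter.join(fields) + "\n"

def decode_row(line, delimiter=DELIMITER):
    """Split one line back into fields, removing only the trailing newline."""
    return line.rstrip("\n").split(delimiter)

data = ["itemA", "itemB", "itemC",
        "Sentence that might contain commas, colons: or even \"quotes\"."]

line = encode_row(data)
row = decode_row(line)
print(row)
```

    The explicit check matters because, without quoting, a stray delimiter in the data would silently shift every following column.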
    