pandas.errors.ParserError: ',' expected after '"'

逝去的感伤 2021-01-22 03:36

I am trying to read this dataset from Kaggle: Amazon sales rank data for print and Kindle books

The file amazon_com_extras.csv has a column named "Title"

2 Answers
  • 2021-01-22 03:57

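    Detecting the dialect with csv.Sniffer works for me:

    import csv
    import pandas as pd

    # Let csv.Sniffer infer the dialect from a sample of the file
    with open('spotify_dataset.csv') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(14734))

    # on_bad_lines='warn' (pandas >= 1.3) skips malformed rows and reports them;
    # older pandas versions used error_bad_lines=False for the same effect
    df = pd.read_csv('spotify_dataset.csv', engine='python', dialect=dialect, on_bad_lines='warn')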
    
  • 2021-01-22 04:10

    This happens because some fields in the file contain unescaped quotes inside quoted text.

    I am not aware of a way to instruct the csv parser to handle that without preprocessing.

    If you don't care about the affected rows, you can use

    # on_bad_lines="warn" (pandas >= 1.3) replaces the older error_bad_lines=False
    pd.read_csv("amazon_com_extras.csv", engine="python", sep=',', quotechar='"', on_bad_lines="warn")
    

    That suppresses the exception, but the affected lines are dropped (pandas reports them in the console).

    An example of such a line:

    "1405246510","book","hardcover",""Hannah Montana" Annual 2010","Unknown","Egmont Books Ltd"
    

    Notice the unescaped quotes around Hannah Montana inside the quoted title field.

    A more standard CSV dialect would instead have doubled the inner quotes:

    1405246510,"book","hardcover","""Hannah Montana"" Annual 2010","Unknown","Egmont Books Ltd"
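
    To illustrate the doubled-quote convention, here is a minimal sketch using only Python's standard csv module. The variable names are just for illustration; the line itself is the standard form shown above.

```python
import csv
import io

# The standard-dialect form of the example line: inner quotes are doubled
standard_line = (
    '1405246510,"book","hardcover",'
    '"""Hannah Montana"" Annual 2010","Unknown","Egmont Books Ltd"\n'
)

# csv.reader collapses each doubled quote back into a single literal quote
row = next(csv.reader(io.StringIO(standard_line)))
title = row[3]
print(len(row), title)
```

    The row parses into six fields and the title keeps its literal quotes, which is what the pandas parser expects as well.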
    

    You can, for example, load the file with LibreOffice and re-save it as CSV to get a working CSV dialect, or use other preprocessing techniques.
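
    One possible preprocessing approach, sketched below, rewrites each line into the standard dialect before handing it to pandas. It rests on two assumptions that hold for the example line but may not hold for the whole file: every field is wrapped in double quotes, and the sequence "," never occurs inside a field. The helper names repair_line and rewrite_as_standard_csv are hypothetical, not part of pandas or the csv module.

```python
import csv
import io


def repair_line(line: str) -> list:
    """Split a fully quoted line into raw field values.

    Assumption: every field is quoted and '","' never appears inside a field.
    """
    body = line.rstrip("\r\n")
    if len(body) < 2 or not (body.startswith('"') and body.endswith('"')):
        raise ValueError(f"line is not fully quoted: {body!r}")
    # Drop the outermost quotes, then split on the quote-comma-quote separator
    return body[1:-1].split('","')


def rewrite_as_standard_csv(lines) -> str:
    """Re-emit the lines with csv.writer, which doubles embedded quotes."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(repair_line(line))
    return out.getvalue()


sample = (
    '"1405246510","book","hardcover",'
    '""Hannah Montana" Annual 2010","Unknown","Egmont Books Ltd"\n'
)
fixed = rewrite_as_standard_csv([sample])
rows = list(csv.reader(io.StringIO(fixed)))
```

    For the real file you would pass the open file object instead of the sample list, then load the result with something like pd.read_csv(io.StringIO(fixed)). Lines that break the assumptions raise ValueError rather than being silently mangled.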
