Retrieve delimiter inferred by read_csv in pandas

無奈伤痛 2021-02-13 04:18

When using the configuration for automatic separator detection to read csv files (pd.read_csv(file_path, sep=None)), pandas tries to infer the delimiter (or separat

3 Answers
  • 2021-02-13 04:55

    I think you can do this without having to import csv:

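    import pandas as pd
    reader = pd.read_csv(file_path, sep=None, iterator=True, engine='python')
    # _engine is a private attribute, so this may break between pandas versions
    inferred_sep = reader._engine.data.dialect.delimiter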
    

    EDIT:

    Forgot the iterator = True argument.

  • 2021-02-13 05:02

    If all you want to do is detect the dialect of a CSV file (without loading your data), you can use the built-in csv.Sniffer class from the standard library:

    The Sniffer class is used to deduce the format of a CSV file.

    In particular, the sniff method:

    sniff(sample, delimiters=None)
    

    Analyze the given sample and return a Dialect subclass reflecting the parameters found. If the optional delimiters parameter is given, it is interpreted as a string containing possible valid delimiter characters.

    Here's an example of its usage:

    import csv

    with open('example.csv', 'r', newline='') as csvfile:
        # Sniffs the first line only; a longer sample is usually more reliable
        dialect = csv.Sniffer().sniff(csvfile.readline())
        print(dialect.delimiter)
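
    Sniffing a single line can be ambiguous, so a slightly more defensive variant may be useful. The sketch below reads a multi-line sample, passes the optional delimiters argument described above, and falls back to a default when the sniffer raises csv.Error. The helper name detect_delimiter, the candidate set, the sample size and the default are assumptions made for illustration, not part of csv or pandas.

```python
import csv


def detect_delimiter(path, candidates=",;\t|", sample_size=4096, default=","):
    """Return the delimiter csv.Sniffer infers for the file at `path`.

    Hypothetical helper: the name, the candidate set, the sample size and the
    fallback value are illustrative choices, not part of csv or pandas.
    """
    with open(path, "r", newline="") as csvfile:
        # Read a multi-line sample so the sniffer can compare several rows.
        sample = csvfile.read(sample_size)
    try:
        # Restrict the search to the delimiters we consider plausible.
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Raised when no consistent delimiter is found, e.g. a single column.
        return default
```

    The returned value can then be passed as sep to pd.read_csv.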
    
  • 2021-02-13 05:10

    csv.Sniffer

    The Sniffer class is used to deduce the format of a CSV file.

    sniff(sample, delimiters=None)

    Analyze the given sample and return a Dialect subclass reflecting the parameters found. If the optional delimiters parameter is given, it is interpreted as a string containing possible valid delimiter characters.


    Dialect.delimiter

    A one-character string used to separate fields. It defaults to ','.

    import csv
    
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff('first, second, third, fourth')
    print(dialect.delimiter)
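
    The dialect returned by sniff carries more than the delimiter, and it can be handed straight to csv.reader. A minimal sketch, assuming a small in-memory sample (the sample string and variable names are made up for illustration):

```python
import csv
import io

# Illustrative two-line sample; every comma is followed by a space.
sample = "first, second, third, fourth\n1, 2, 3, 4\n"

dialect = csv.Sniffer().sniff(sample)

# Parse the same text with the sniffed dialect.
rows = list(csv.reader(io.StringIO(sample), dialect))

print(repr(dialect.delimiter))   # the inferred field separator
print(dialect.skipinitialspace)  # whether spaces after the delimiter are skipped
print(rows)
```

    Because the sniffer also detects the space after each comma, the parsed fields come back without leading whitespace.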
    