Multiple delimiters in single CSV file

情深已故 2020-12-10 18:22

I have a CSV file that uses three different delimiters between its columns: '|', ',' and ';'.

How can I parse this CSV using Python?

My

3 answers
  • 2020-12-10 18:39

    My sample data looked something like this:

    2017-01-24|05:19:30+0000|TRANSACTIONDelim_secondUSER_LOGINDelim_firstCONSUMERIDDelim_secondc4115f53-3798-4c9e-9bfd-506c842aff96Delim_firstTRANSACTIONDATEDelim_second17-01-24 05:19:30Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondnullDelim_firstAIRINGDATEDelim_second|**
    2017-01-24|05:19:30+0000|TRANSACTIONDelim_secondUSER_LOGOUTDelim_firstCONSUMERIDDelim_second1583e83882b8e7Delim_firstTRANSACTIONDATEDelim_second17-01-24 05:19:26Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondbu002Delim_firstAIRINGDATEDelim_second24-Jan-2017|**
    2017-01-24|05:21:59+0000|TRANSACTIONDelim_secondVIEW_PRIVACY_POLICYDelim_firstCONSUMERIDDelim_secondnullDelim_firstTRANSACTIONDATEDelim_second17-01-24 05:21:59Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondnullDelim_firstAIRINGDATEDelim_second|**
    2017-01-24|05:59:25+0000|TRANSACTIONDelim_secondUSER_LOGOUTDelim_firstCONSUMERIDDelim_second1586a2aa4bc18fDelim_firstTRANSACTIONDATEDelim_second17-01-24 05:59:21Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondbu002Delim_firstAIRINGDATEDelim_second24-Jan-2017|**
    2017-01-24|05:59:36+0000|TRANSACTIONDelim_secondUSER_LOGOUTDelim_firstCONSUMERIDDelim_second1583e83882b8e7Delim_firstTRANSACTIONDATEDelim_second17-01-24 05:59:31Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondbu002Delim_firstAIRINGDATEDelim_second24-Jan-2017|**
    2017-01-24|06:04:25+0000|TRANSACTIONDelim_secondUSER_LOGOUTDelim_firstCONSUMERIDDelim_secondc4115f53-3798-4c9e-9bfd-506c842aff96Delim_firstTRANSACTIONDATEDelim_second17-01-24 06:04:24Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondbu002Delim_firstAIRINGDATEDelim_second|**
    2017-01-24|06:05:07+0000|TRANSACTIONDelim_secondUSER_LOGINDelim_firstCONSUMERIDDelim_secondc4115f53-3798-4c9e-9bfd-506c842aff96Delim_firstTRANSACTIONDATEDelim_second17-01-24 06:05:07Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondnullDelim_firstAIRINGDATEDelim_second|**
    2017-01-24|06:05:07+0000|TRANSACTIONDelim_secondUSER_LOGINDelim_firstCONSUMERIDDelim_secondc4115f53-3798-4c9e-9bfd-506c842aff96Delim_firstTRANSACTIONDATEDelim_second17-01-24 06:05:07Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondbu002Delim_firstAIRINGDATEDelim_second|**
    

    So the data used three delimiters: '|', 'Delim_first' and 'Delim_second'.

    I needed the data to be split at all three delimiters.

    I created a pandas DataFrame from the data and then used:

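    import pandas as pd

    # df10 is assumed to hold the '|'-split data: column 1 is the time,
    # column 2 is the payload containing the Delim_first/Delim_second markers.

    # Split the payload into its eight KEY<Delim_second>VALUE fields (columns 6..13).
    fields = df10[2].str.split('Delim_first', expand=True).astype(str)
    for i in range(8):
        df10[i + 6] = fields[i]

    # Split each field into its key (columns 14, 16, ...) and value (columns 15, 17, ...).
    for j in range(8):
        pair = df10[j + 6].str.split('Delim_second', expand=True).astype(str)
        df10[2 * j + 14] = pair[0]
        df10[2 * j + 15] = pair[1]

    # Drop the trailing "+0000" offset from the time column.
    df10[1] = df10[1].str[:-5]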
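
    As an alternative to positional column numbers, the same three-way split can be sketched with plain string methods that turn each line into a dictionary keyed by field name. This is only an illustration: the helper name parse_line and the shortened sample lines are assumptions, not part of the original data set.

```python
# Shortened sample lines in the same shape as the data above (fields abbreviated).
lines = [
    "2017-01-24|05:19:30+0000|TRANSACTIONDelim_secondUSER_LOGIN"
    "Delim_firstCONSUMERIDDelim_secondc4115f53"
    "Delim_firstAIRINGDATEDelim_second|**",
    "2017-01-24|05:19:30+0000|TRANSACTIONDelim_secondUSER_LOGOUT"
    "Delim_firstCONSUMERIDDelim_second1583e83882b8e7"
    "Delim_firstAIRINGDATEDelim_second24-Jan-2017|**",
]


def parse_line(line):
    """Hypothetical helper: turn one raw line into a flat dict."""
    # The first three '|'-separated parts are date, time and the payload.
    date, time, payload = line.rstrip("\n").split("|")[:3]
    record = {"date": date, "time": time[:-5]}  # drop the "+0000" offset
    # Each payload field looks like KEY<Delim_second>VALUE.
    for field in payload.split("Delim_first"):
        key, _, value = field.partition("Delim_second")
        record[key] = value
    return record


records = [parse_line(line) for line in lines]
for record in records:
    print(record)
```

    If pandas is available, pd.DataFrame(records) should give one named column per key, which avoids tracking column numbers by hand.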
    
  • 2020-12-10 18:43

    Sticking with the standard library, re.split() can split a line at any of these delimiters:

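    import re

    file_name = 'multi_delim.csv'  # assumed path; replace with your own file

    with open(file_name) as fobj:
        for line in fobj:
            # Strip the newline first so it does not end up in the last field.
            line_data = re.split(r'Delim_first|Delim_second|[|]', line.rstrip('\n'))
            print(line_data)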
    

    This will split at the delimiters |, Delim_first, and Delim_second.
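
    To see what the pattern does without needing a file, here is a small sketch run on one shortened line. The sample string is an abbreviated stand-in for the real data. Note that adjacent delimiters produce an empty string in the result.

```python
import re

# Alternation of the two word delimiters plus a literal pipe;
# writing the pipe as [|] avoids having to escape it.
DELIMS = re.compile(r"Delim_first|Delim_second|[|]")

sample = (
    "2017-01-24|05:19:30+0000|TRANSACTIONDelim_secondUSER_LOGIN"
    "Delim_firstAIRINGDATEDelim_second|**"
)

parts = DELIMS.split(sample)
print(parts)
# ['2017-01-24', '05:19:30+0000', 'TRANSACTION', 'USER_LOGIN', 'AIRINGDATE', '', '**']
```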

    Or with pandas:

    import pandas as pd
    df = pd.read_csv('multi_delim.csv', sep='Delim_first|Delim_second|[|]', 
                      engine='python', header=None)
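
    The same call can be tried on in-memory text, which makes it easy to check the column layout before pointing it at a real file. This is a sketch with two shortened sample lines (an assumption); a regular-expression separator requires engine='python'.

```python
from io import StringIO

import pandas as pd

# Two shortened lines standing in for the real file contents.
raw = StringIO(
    "2017-01-24|05:19:30+0000|TRANSACTIONDelim_secondUSER_LOGIN"
    "Delim_firstAIRINGDATEDelim_second|**\n"
    "2017-01-24|05:19:30+0000|TRANSACTIONDelim_secondUSER_LOGOUT"
    "Delim_firstAIRINGDATEDelim_second24-Jan-2017|**\n"
)

df = pd.read_csv(
    raw,
    sep=r"Delim_first|Delim_second|[|]",  # regex: any of the three delimiters
    engine="python",                       # regex separators need the Python engine
    header=None,                           # the data has no header row
)
print(df.shape)
print(df)
```

    Empty fields between adjacent delimiters come back as NaN, so a follow-up such as df.fillna('') may be wanted depending on how the values are used.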
    

  • 2020-12-10 18:52

    One easy way to achieve what you want is to use the pandas package. Here is a small example:

    import pandas as pd
    from io import StringIO  # Python 3 location; the Python 2 "import StringIO" no longer works

    data = StringIO("""a;b|c;
    2016-09-05 10:47:00|1,foo;
    2016-09-06 10:47:00;2;foo2;
    2016-09-07 10:47:00;3;foo3;""")
    
    df = pd.read_csv(data, sep='[;,|]', engine='python')
    for c in ['a', 'b', 'c']:
        print('-' * 80)
        print(df[c])
    