Read CSV file using Pandas: complex separator

北海茫月 2020-12-11 19:28

I have a CSV file which I want to read using Python pandas. The header and lines look like the following:

 A           ^B^C^D^E  ^F          ^G           ^H^I^J^K         


        
5 answers
  • 2020-12-11 19:44

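    Read the file as you have done, then strip the extra whitespace from each column that holds strings. Note that apply passes each column as a Series, so the check has to look at the column's dtype; isinstance(x, str) is never true for a Series and nothing would be stripped:

    import pandas as pd
    df = (pd.read_csv('input.csv', sep="^")
          # x is a whole column (Series); only text columns have a .str accessor
          .apply(lambda x: x.str.strip() if x.dtype == "object" else x))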
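
    As a rough sketch of the same idea, the snippet below also strips the column labels, since the padding in the header line ends up in the column names as well. The sample text in raw is made up to mirror the layout in the question, and the dtype check is only one reasonable way to pick out text columns:

```python
import io
import pandas as pd

# Hypothetical sample mirroring the padded, caret-separated layout in the question.
raw = "A           ^B^C  \nfoo         ^1^bar \nbaz  ^2^qux\n"

df = pd.read_csv(io.StringIO(raw), sep="^")

# The header padding is part of the column labels, so strip those first.
df.columns = df.columns.str.strip()

# Strip only columns that hold text; numeric columns are left unchanged.
for name in df.columns:
    if df[name].dtype == object or pd.api.types.is_string_dtype(df[name]):
        df[name] = df[name].str.strip()

print(df)
```

    With a single-character separator such as "^" the default C parser can be used, so this route avoids the regex engine entirely.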
    
  • 2020-12-11 19:51

    Use the regex \s*\^, which matches zero or more whitespace characters followed by a literal ^. You have to specify the Python engine here to avoid a warning about regex support:

    In [152]:
    
    import io
    import pandas as pd
    t="""A           ^B^C^D^E  ^F          ^G           ^H^I^J^K^L^M^N"""
    df = pd.read_csv(io.StringIO(t), sep=r'\s*\^', engine='python')
    df.columns
    Out[152]:
    Index(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N'], dtype='object')
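
    The header-only example above extends to data rows in the same way. Here is a minimal sketch, assuming rows padded like the header; the contents of raw are invented for illustration:

```python
import io
import pandas as pd

# Hypothetical rows in the same padded, caret-separated layout as the header above.
raw = (
    "A           ^B^C  ^D\n"
    "first       ^1^x  ^2.5\n"
    "second row  ^2^y  ^3.5\n"
)

# \s*\^ consumes the padding before each caret together with the caret itself,
# so values and column labels come out without trailing whitespace.
df = pd.read_csv(io.StringIO(raw), sep=r"\s*\^", engine="python")
print(df)
```

    Because only the whitespace directly in front of a caret is consumed, a space inside a value (as in "second row") is preserved.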
    
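  • Can't you supply a regex as the separator? As far as I know, read_csv expects the pattern as a plain string rather than a compiled re object, and regex separators need the Python engine:

    df = pd.read_csv('input.csv', sep=r'[\^\s]+', engine='python')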
    
  • 2020-12-11 20:02

    Your separator can be a regular expression, so try something like this:

    df = pd.read_csv('input.csv', sep="[ ^]+", engine="python")
    

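    The regular expression treats any run of consecutive spaces or carets (^) as a single separator. Be aware that this also splits values that contain spaces and collapses empty fields (two carets in a row).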
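
    To see how the character-class pattern behaves on edge cases, here is a small sketch using re.split directly. The sample line is made up, and the comparison pattern \s*\^ is just one stricter alternative; the assumption is that the pandas Python engine splits each line with the regex in much the same way:

```python
import re

# Hypothetical line: a value containing a space, followed by an empty field.
line = "second row  ^^y"

# Any run of spaces or carets counts as one separator.
loose = re.split(r"[ ^]+", line)

# Optional whitespace followed by exactly one caret.
strict = re.split(r"\s*\^", line)

print(loose)   # ['second', 'row', 'y']
print(strict)  # ['second row', '', 'y']
```

    If the data never contains spaces inside values or empty fields, both patterns give the same columns.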

  • 2020-12-11 20:02

    If the only whitespace in your file is the extra whitespace between columns (i.e. no columns have raw text with spaces), an easy fix would be to simply remove all the spaces in the file. An example command to do that would be:

    <input.csv tr -d '[:blank:]' > new_input.txt
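
    If you would rather stay in Python, the same preprocessing can be sketched with str.translate before handing the text to pandas. The sample text is invented, and this assumes, as above, that no value legitimately contains a space or tab:

```python
import io
import pandas as pd

# Hypothetical sample in the padded, caret-separated layout from the question.
raw = "A           ^B^C  \nfoo         ^1^bar \n"

# Delete spaces and tabs (what tr calls [:blank:]) but keep the newlines.
cleaned = raw.translate(str.maketrans("", "", " \t"))

# A plain single-character separator is enough once the padding is gone.
df = pd.read_csv(io.StringIO(cleaned), sep="^")
print(df)
```

    For a real file you would read its contents into raw first, or write cleaned back out as new_input.txt.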
    