Ignore character while importing with pandas

刺人心 2021-01-16 19:52

I could not find such an option in the documentation. A measuring device spits out everything in Excel:

    <>         


        
4 Answers
  • 2021-01-16 20:22

    I have the same problem. My first line is:

    # id ra dec ...
    

    Here # is the comment character in Python. read_csv treats # as a column header, but it is not one. The workaround I used was to define the headers manually:

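    import pandas as pd
    headerlist = ['id', 'ra', 'dec', ...]
    # header=0 discards the file's own header line; names supplies the replacement column names
    df = pd.read_csv('data.txt', index_col=False, header=0, names=headerlist)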
    

    Note that index_col is not needed to solve this problem.

    If there is an option to ignore a certain character in the header line, I haven't found it. I hope this solution can be improved upon.

  • 2021-01-16 20:26

    Another option would be:

    import pandas as pd

    with open(fname) as f:
        # Read the header line ourselves and drop the '#' to get the column names
        names = f.readline().replace('#', '').split()
        data1 = pd.read_csv(f, sep=r'\s+', names=names, dtype=float)
    

    You might have a different separator though.

  • 2021-01-16 20:34

    I have the same problem. My first line is

    # id x y ...
    

    So the pandas header keyword doesn't work. I worked around it by reading the file twice:

    # First pass: read only the header line as data, then drop the '#' column
    cos_phot_header = pd.read_csv(table, sep=r'\s+', header=None, engine='python', nrows=1)
    cos_plot_text_header = cos_phot_header.drop(0, axis=1).values.tolist()
    # Second pass: skip the '#' line as a comment and pass the recovered names
    cos_phot_data = pd.read_csv(table, skip_blank_lines=True, comment='#',
                   sep=r'\s+', header=None, engine='python', names=cos_plot_text_header[0])
    

    I don't understand why pandas has no option for this; it is a very common problem. You can also read the table with no data rows (nrows=0) and use .columns, but honestly I think that is an equally ugly solution.
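The nrows=0 variant mentioned above could look roughly like the sketch below. The in-memory sample and the names `sample`, `header` and `names` are made up for illustration; a real file path would replace the `io.StringIO` objects.

```python
import io
import pandas as pd

# Hypothetical stand-in for the file: a '#'-prefixed header, then data rows.
sample = "# id x y\n1 0.5 2.0\n2 1.5 3.0\n"

# First pass: parse only the header line. nrows=0 yields an empty frame
# whose columns are the header tokens, including the stray '#'.
header = pd.read_csv(io.StringIO(sample), sep=r"\s+", nrows=0)
names = [c for c in header.columns if c != "#"]

# Second pass: treat the '#' line as a comment and supply the names.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", comment="#",
                 header=None, names=names)
print(df)
```

It still reads the file twice, so it is no prettier than the version above, only a little shorter.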

  • 2021-01-16 20:43

    The sep argument of read_csv() accepts a regular expression (this uses the Python parsing engine). You can avoid splitting on whitespace that is preceded by a particular character (in your case #). As an example, this avoids splitting after "!":

    sep=r'(?<!\!)\s+'
    

    If you want, you can then rename the column to remove the initial character and whitespace.
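For the # case from the question, the lookbehind and the rename step might be combined as in this sketch. The sample data is invented, and it assumes exactly one space follows the # in the header line.

```python
import io
import pandas as pd

# Hypothetical stand-in for the file contents.
sample = "# id ra dec\n1 10.5 -3.2\n2 11.0 4.1\n"

# The negative lookbehind (?<!#) refuses to split on whitespace that directly
# follows '#', so the header tokens become '# id', 'ra', 'dec'.
# A regex separator requires the Python parsing engine.
df = pd.read_csv(io.StringIO(sample), sep=r"(?<!#)\s+", engine="python")

# Strip the leading '#' and the whitespace after it from the column names.
df.columns = [c.lstrip("#").strip() for c in df.columns]
print(df)
```

If more than one space follows the #, the pattern can still match from the second space onward, so the header would be split there and this sketch would need adjusting.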

    cheers
