pandas.read_csv from string or package data

旧巷少年郎 2020-12-09 14:50

I have some CSV text data in a package that I want to read using read_csv. I was doing this with:

from pkgutil import get_data
from StringIO import StringIO

         


        
2 Answers
  • 2020-12-09 15:27

    The following worked for me in Python 3.3:

    >>> import numpy as np, pandas as pd
    >>> import io, pkgutil
    >>> wells = pkgutil.get_data('pymc.examples', 'data/wells.dat')
    >>> type(wells)
    <class 'bytes'>
    >>> df = pd.read_csv(io.BytesIO(wells), encoding='utf8', sep=" ", index_col="id", dtype={"switch": np.int8})
    >>> df.head()
        switch  arsenic       dist  assoc  educ
    id                                         
    1        1     2.36  16.826000      0     0
    2        1     0.71  47.321999      0     0
    3        0     2.07  20.966999      0    10
    4        1     1.15  21.486000      0    12
    5        1     1.10  40.874001      1    14
    
    [5 rows x 5 columns]
    

    N.B. I had to put wells.dat in that location manually, so I can't swear that I copied it correctly or that there is no trailing whitespace, because I deleted some. Still, passing read_csv a BytesIO object and an encoding parameter should work. (You can probably get away without the encoding, but specifying it is a good habit. io.TextIOWrapper might be another option.)
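
A minimal sketch of the two options mentioned above, assuming the package data is UTF-8 text. The `raw` bytes are a made-up stand-in for whatever `pkgutil.get_data` returns, not the real wells.dat file, and the column names are only borrowed from the output shown earlier.

```python
import io

import pandas as pd

# Hypothetical stand-in for the bytes returned by pkgutil.get_data(...).
raw = b"id switch arsenic\n1 1 2.36\n2 1 0.71\n3 0 2.07\n"

# Option 1: pass the bytes buffer directly and name the encoding.
df_bytes = pd.read_csv(io.BytesIO(raw), encoding="utf-8", sep=" ", index_col="id")

# Option 2: wrap the bytes buffer in a text layer, so pandas receives str data.
text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
df_text = pd.read_csv(text_stream, sep=" ", index_col="id")

print(df_bytes)
print(df_bytes.equals(df_text))
```

Both calls should yield the same frame. Which one reads better is largely a matter of taste; the wrapper may be handy when other code also expects a text stream.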

  • 2020-12-09 15:27

    To pass a string to pandas read_csv(), you can use io.StringIO, for example:

    import pandas as pd
    from io import StringIO
    df = pd.read_csv(StringIO("csv string..."))
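
For a slightly fuller illustration of the same idea, here is a sketch with a small made-up CSV string. The column names and values are invented for the example.

```python
from io import StringIO

import pandas as pd

# Invented sample data; any str containing CSV text works the same way.
csv_text = """id,name,score
1,alice,3.5
2,bob,4.0
"""

# StringIO turns the str into a file-like object that read_csv can consume.
df = pd.read_csv(StringIO(csv_text), index_col="id")
print(df)
```

Note that io.StringIO accepts only str. If the data arrives as bytes, as it does from pkgutil.get_data in Python 3, it has to be decoded first or passed through io.BytesIO instead.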
    