How to get Text from b'Text' in the pandas object type after using read_sas?

死守一世寂寞 2021-02-04 08:21

I'm trying to read data from a SAS .sas7bdat file using the pandas function read_sas:

import pandas as pd
df = pd.read_sas('D:/input/houses.sas7bdat', form


        
3 Answers
  • 2021-02-04 08:54

    Add the argument encoding="utf-8" to the call, so that the line reads as follows:

    df = pd.read_sas('D:/input/houses.sas7bdat', format = 'sas7bdat', encoding="utf-8")
    
  • 2021-02-04 09:07

    First, find out which encoding your SAS dataset uses. In SAS, run PROC CONTENTS on the dataset and check the "Encoding" field. In my case, the encoding was "latin1 Western (ISO)". Then pass the matching encoding to read_sas:

    df = pd.read_sas('filename', format = 'sas7bdat', encoding = 'latin-1')
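
    To see why the encoding has to match the file, here is a minimal sketch in plain Python. The byte value is made up for illustration and does not come from any real dataset; it only shows that latin-1 bytes decode cleanly with the right codec and may fail with the wrong one.

```python
# Hypothetical raw value, as it might be stored in a latin-1 encoded SAS file.
raw = "café".encode("latin-1")  # the bytes b'caf\xe9'

# Decoding with the codec the data was written in recovers the text.
text = raw.decode("latin-1")

# Decoding with the wrong codec fails here, because a lone 0xE9 byte
# is not a valid UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(text, utf8_failed)
```

    If read_sas raises a UnicodeDecodeError with one encoding, that is usually a hint that the dataset was written with a different one.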
    
  • 2021-02-04 09:10

    When I passed the encoding argument to pd.read_sas(), I ended up with very large dataframes, which led to memory-related errors.

    Another way to deal with the problem is to load the data without the encoding argument and then decode the byte strings into regular strings yourself (e.g. as utf8).

    Example dataframe:

    
    import pandas as pd

    df = pd.DataFrame({"A": [1, 2, 3], 
                       "B": [b"a", b"b", b"c"], 
                       "C": ["a", "b", "c"]})
    

    Transform byte strings to strings:

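    for col in df:
        # Only object columns can hold bytes; look at the first value by position
        if df[col].dtype == object and isinstance(df[col].iloc[0], bytes):
            print(col, "will be transformed from bytestring to string")
            df[col] = df[col].str.decode("utf8")  # or whichever encoding the file uses
    print(df)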
    

    Output:

    B will be transformed from bytestring to string
       A  B  C
    0  1  a  a
    1  2  b  b
    2  3  c  c
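
    The loop above decides per column by looking only at the first row, so a column whose first value is missing would be skipped. A possible variant, sketched here under the assumption that byte strings only occur in object columns, decodes value by value instead. The helper name decode_bytes_columns and the sample data are made up for illustration; they are not part of pandas.

```python
import pandas as pd


def decode_bytes_columns(df, encoding="utf-8"):
    """Return a copy of df with bytes values in object columns decoded to str."""
    out = df.copy()
    for col in out.select_dtypes(include=["object"]):
        # Decode only bytes values; leave str, missing values and others untouched.
        out[col] = out[col].map(
            lambda v: v.decode(encoding) if isinstance(v, bytes) else v
        )
    return out


# Hypothetical sample with a missing value mixed in with byte strings.
sample = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [b"a", None, b"c"],
    "C": ["a", "b", "c"],
})
decoded = decode_bytes_columns(sample)
print(decoded)
```

    Working on a copy keeps the original dataframe intact, at the cost of some extra memory; on very large data it may be preferable to assign the decoded columns in place.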
    

    Useful links:

    1. Pandas Series.str.decode() page on GeeksforGeeks (where I found my solution)

    2. What is the difference between a string and a byte string?
