Access specifics of ValueError in pandas.read_excel() converters

Submitted by 社会主义新天地 on 2019-12-08 12:52:44

Question


I'm using the following to ensure a dataframe column has the correct data type before I proceed with operations:

>>> import pandas as pd
>>> cfun = lambda x: float(x)
>>> df = pd.read_excel(xl, converters={'column1': cfun})

I'm using converters instead of dtype so that the traceback tells me explicitly which value caused the issue:

ValueError: could not convert string to float: '100%'

What I would like to do is take that information (that the string "100%" was the problem) and tell the user where it occurred in the dataframe/file. How can I get that information from the exception in order to get a row index and, say, print the entire row?

Note: Adding the percent sign isn't the only mistake my users make, otherwise I'd just replace any '%' with ''.


Answer 1:


I think you can check by first reading in the Excel file without converters, and then checking which rows won't convert. This finds all the bad rows at once, instead of one at a time via the ValueError.

Just remember that Python starts numbering at 0 and the index doesn't include the header row, so the row indices of the df will be off from the row numbers in the spreadsheet (by 1 or 2).
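The offset can be made explicit with a small helper. This is only a sketch: `spreadsheet_row`, `header_rows`, and `skipped_rows` are hypothetical names, and it assumes the default 0-based integer index that read_excel produces when no index column is requested.

```python
def spreadsheet_row(df_index, header_rows=1, skipped_rows=0):
    """Translate a 0-based DataFrame index into a 1-based spreadsheet row.

    Assumes a default integer index, with the header occupying
    `header_rows` rows after `skipped_rows` skipped rows.
    """
    return df_index + header_rows + skipped_rows + 1


# With the header in row 1 of the sheet, index 1 corresponds to row 3.
print(spreadsheet_row(1))
```

If the sheet has no header row, passing `header_rows=0` reduces the offset to 1.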

import pandas as pd
df = pd.read_excel(xl)

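# Example df (as returned by read_excel):
#    column1 column2
# 0      100       A
# 1     100%       B
# 2  112,312       C
# 3      171       D
# 4  123.123       E
# 5      NaN       F

# Coerce to numeric; anything that cannot be parsed becomes NaN
df['column1_num'] = pd.to_numeric(df.column1, errors='coerce')
# Bad rows: failed to convert, but were not NaN (missing) to begin with
bad_mask = (df.column1_num.isnull()) & ~(df.column1.astype('str').str.lower().isin(['nan']))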

bad_rows = df[bad_mask].index.values
#array([1, 2], dtype=int64)

df[bad_mask]
#   column1 column2  column1_num
#1     100%       B          NaN
#2  112,312       C          NaN

I updated the mask because float() accepts the string 'NaN', so it won't actually show up as an error in your read, even though pd.to_numeric with errors='coerce' still coerces it to NaN.

float('NaN')
#nan
pd.to_numeric('NaN')
#ValueError: Unable to parse string "NaN" at position 0
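Since the original converter was float(), one alternative is to run float() itself over the column after reading, which sidesteps the 'NaN' difference and keeps the exact error message for each bad value. The sketch below is not part of the answer above; `find_unconvertible` and its dictionary keys are hypothetical names, and a plain list stands in for `df['column1'].items()` so the example runs on its own.

```python
def find_unconvertible(pairs, header_rows=1):
    """Collect every (index, value) pair whose value float() rejects.

    `pairs` is any iterable of (index, value) tuples, for example
    df['column1'].items() on a DataFrame read without converters.
    Assumes integer indices starting at 0.
    """
    problems = []
    for idx, value in pairs:
        try:
            float(value)
        except (TypeError, ValueError) as exc:
            problems.append({
                "index": idx,
                "sheet_row": idx + header_rows + 1,  # 1-based row in the sheet
                "value": value,
                "error": str(exc),
            })
    return problems


# Stand-in for df['column1'].items(), using values like those in the example df.
column1 = [100, "100%", "112,312", 171, 123.123, "NaN"]
problems = find_unconvertible(enumerate(column1))

for p in problems:
    print(f"row {p['sheet_row']} (index {p['index']}): {p['error']}")
```

With a real DataFrame, something like `df.loc[[p['index'] for p in problems]]` would then print the complete offending rows.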


Source: https://stackoverflow.com/questions/49902930/access-specifics-of-valueerror-in-pandas-read-excel-converters
