Question
I have this data set with 78 columns and 5707 rows. Almost every column has missing values and I would like to impute them with IterativeImputer. If I understood it correctly, it will make a "smarter" imputation on each column based on the information from other columns.
However, I do not want the imputed values to fall below the observed minimum or above the observed maximum. I realize there are max_value and min_value parameters, but I do not want to impose a single "global" limit on the imputations. Instead, I want each column to have its own max_value and min_value (the maximum and minimum already observed in that column). Otherwise the values in some columns do not make sense (negative headcounts, negative rates, etc.).
Is there a way to implement that?
Answer 1:
If you want a different max and min for each column, you can loop over the columns. In each iteration, select the column with sklearn.compose.make_column_selector or sklearn.compose.make_column_transformer, then apply IterativeImputer to it, passing that column's observed min and max as the min_value and max_value parameters.
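One caveat with the loop: an imputer fitted on a single column has no other features to learn from, so the cross-column modelling the question asks about would be lost. As a possible alternative, assuming scikit-learn 0.23 or later, min_value and max_value also accept an array-like of shape (n_features,), so per-column bounds can be passed to a single IterativeImputer. The sketch below uses made-up toy data; the names X_missing, col_min and col_max are illustrative and not from the original post.

```python
import numpy as np
# IterativeImputer is experimental; this import is required to expose it.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical toy data standing in for the real 5707 x 78 data set:
# three non-negative columns on different scales.
rng = np.random.default_rng(0)
X = np.abs(rng.normal(loc=[10.0, 0.5, 100.0],
                      scale=[3.0, 0.1, 20.0],
                      size=(200, 3)))

# Knock out roughly 20% of the entries at random.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan

# Observed per-column bounds, ignoring the NaNs.
col_min = np.nanmin(X_missing, axis=0)
col_max = np.nanmax(X_missing, axis=0)

# One bound per feature: imputed values for column j are clipped
# to the interval [col_min[j], col_max[j]].
imputer = IterativeImputer(min_value=col_min, max_value=col_max,
                           random_state=0)
X_imputed = imputer.fit_transform(X_missing)
```

With a pandas DataFrame of numeric columns, df.min() and df.max() skip NaN by default, so df.min().to_numpy() and df.max().to_numpy() could play the roles of col_min and col_max. On scikit-learn versions older than 0.23 the two parameters take only a scalar, and this sketch would not apply.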
Source: https://stackoverflow.com/questions/60228714/max-value-and-min-value-for-each-column-in-scikit-iterativeimputer