Specify list of possible values for Pandas get_dummies

半阙折子戏 2021-02-14 17:36

Suppose I have a Pandas DataFrame like the one below and I'm encoding categorical_1 for training in scikit-learn:

data = {'numeric_1':[12.1, 3.2, 5.5, 6.8, 9.9],


        
4 Answers
  •  [愿得一人]
    2021-02-14 17:55

    Isn't this a better answer?

    import pandas as pd

    data = pd.DataFrame({
        "values": [1, 2, 3, 4, 5, 6, 7],
        "categories": ["A", "A", "B", "B", "C", "C", "D"]
    })

    possibilities = ["A", "B", "C", "D", "E", "F"]

    # Categories that are possible but do not occur in the data
    exists = set(data["categories"])
    difference = pd.Series(
        [item for item in possibilities if item not in exists], dtype=object
    )

    # Series.append was removed in pandas 2.0, so concatenate instead
    target = pd.concat([data["categories"], difference], ignore_index=True)

    dummies = pd.get_dummies(target)

    # Drop the placeholder rows that were appended for the missing categories
    dummies = dummies.iloc[:len(data)]
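
As an alternative to appending placeholder rows and dropping them again, the column can be cast to a categorical dtype that lists every possible value, so that `get_dummies` emits one column per listed category. This is only a sketch under the assumption that the full set of values is known up front; it reuses the answer's sample data, and the name `cat_type` is introduced here purely for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "values": [1, 2, 3, 4, 5, 6, 7],
    "categories": ["A", "A", "B", "B", "C", "C", "D"],
})

# Full list of values the column may take, including ones absent from the data
possibilities = ["A", "B", "C", "D", "E", "F"]

# Declare the possible categories explicitly (cat_type is an illustrative name)
cat_type = pd.CategoricalDtype(categories=possibilities)

# get_dummies creates a column for each declared category, observed or not
dummies = pd.get_dummies(data["categories"].astype(cat_type))

print(dummies.columns.tolist())
```

Note that with this approach any value not listed in `possibilities` becomes missing during the cast and ends up as an all-zero row, so the list needs to be complete.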
    
