ValueError: Length of values does not match length of index | Pandas DataFrame.unique()

余生分开走 2020-12-02 14:27

I am trying to get a new dataset, or change the values of the current dataset's columns to their unique values. Here is an example of what I am trying to get:



        
1 Answer
  • 2020-12-02 14:42

    The error is raised when you assign a list or NumPy array to a data frame column and its length differs from the number of rows in the data frame. It can be reproduced as follows:

    A data frame of four rows:

    import pandas as pd
    df = pd.DataFrame({'A': [1,2,3,4]})
    

    Now try to assign a list or array of two elements to it:

    df['B'] = [3,4]   # or df['B'] = np.array([3,4])
    

    Both assignments raise:

    ValueError: Length of values does not match length of index

    This happens because the data frame has four rows, but the list or array has only two elements.
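
The mismatch can also be detected before pandas raises. The following is only a minimal sketch: the names `values`, `lengths_match` and `error_message` are illustrative and not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4]})
values = [3, 4]  # hypothetical data: two elements for a four-row frame

# Compare the lengths up front instead of waiting for pandas to raise.
lengths_match = len(values) == len(df)

try:
    df['B'] = values
    error_message = None
except ValueError as exc:
    # The failed assignment leaves df unchanged; column 'B' is not created.
    error_message = str(exc)

print(lengths_match)
print(error_message)
```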

    Workaround (use with caution): convert the list or array to a pandas Series first. On assignment the Series is aligned to the data frame by index, and any row whose index label is missing from the Series is filled with NaN:

    df['B'] = pd.Series([3,4])
    
    df
    #   A     B
    #0  1   3.0
    #1  2   4.0
    #2  3   NaN          # NaN because index labels 2 and 3 don't exist in the Series
    #3  4   NaN
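
The reason for the caution is that the assignment matches on index labels, not on position. As a rough illustration (the custom index values below are made up for this example), a Series with different labels lands in different rows, and labels that are absent from the data frame are silently dropped:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4]})

# A Series labelled 2 and 3 instead of the default 0 and 1:
# the values go to rows 2 and 3, and rows 0 and 1 become NaN.
df['B'] = pd.Series([3, 4], index=[2, 3])

# A Series whose only label (10) does not exist in df:
# nothing matches, so the whole column is NaN and the value 9 is lost.
df['C'] = pd.Series([9], index=[10])

print(df)
```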
    

    For your specific problem, if you don't care about the index or about which values sit in the same row across columns, you can drop the duplicates in each column and then reset that column's index:

    df.apply(lambda col: col.drop_duplicates().reset_index(drop=True))
    
    #   A     B
    #0  1   1.0
    #1  2   5.0
    #2  7   9.0
    #3  8   NaN
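
The question's original data is not shown here, so the following sketch uses a hypothetical input frame chosen only to reproduce output of the shape above. It also shows why assigning `unique()` straight back to a column fails with the error in the title.

```python
import pandas as pd

# Hypothetical input: five rows, with duplicates in both columns.
df = pd.DataFrame({'A': [1, 2, 7, 8, 8],
                   'B': [1, 5, 5, 9, 9]})

# Direct assignment fails: 'A' has 4 unique values but df has 5 rows.
try:
    df['A'] = df['A'].unique()
    direct_assignment_failed = False
except ValueError:
    direct_assignment_failed = True

# Each column is de-duplicated independently and re-indexed from 0,
# so shorter columns are padded with NaN at the bottom.
result = df.apply(lambda col: col.drop_duplicates().reset_index(drop=True))

print(result)
```

Note that a row of `result` no longer corresponds to a row of the original frame; each column is just a list of its own distinct values.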
    