Can sub-columns be created in a pandas data frame?

梦毁少年i 2021-02-06 17:43

Data frame

I am working with a data frame in Jupyter Notebooks and I am having some difficulty with it. The data frame consists of locations and these are represented by

3 Answers
  •  抹茶落季
    2021-02-06 17:45

    I think you need a MultiIndex here, created with MultiIndex.from_product:

    mux = pd.MultiIndex.from_product([['Start','Intermediary','End'], ['lat','lng']])
    df = pd.DataFrame(data, columns=mux)
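
For illustration, here is a minimal runnable sketch of the `MultiIndex.from_product` approach. The `data` array is hypothetical (the values are made up for the example, not taken from the question), and it assumes each location has exactly one lat/lng pair.

```python
import numpy as np
import pandas as pd

# Three top-level groups, each with 'lat' and 'lng' sub-columns
mux = pd.MultiIndex.from_product([["Start", "Intermediary", "End"], ["lat", "lng"]])

# Hypothetical sample data: 2 rows x 6 columns (illustrative values only)
data = np.array([
    [54.95, -7.74, 54.96, -7.75, 54.95, -7.74],
    [54.89, -7.57, 54.86, -7.65, 54.89, -7.57],
])
df = pd.DataFrame(data, columns=mux)

# Selecting a top-level name returns its sub-columns as a regular DataFrame
start = df["Start"]

# A (top, sub) tuple selects a single Series
start_lat = df[("Start", "lat")]
```

Indexing with the top-level label is what makes the columns behave like sub-columns: `df["Start"]` has only `lat` and `lng`.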
    

    EDIT:

    Setup:

    temp=u"""                          start                                   intermediary                           end
    ('54.957055',' -7.740156')        ('54.956915136264', ' -7.753690062122')     ('54.957055','-7.740156')
    ('54.8913208', '-7.5740475')    ('54.864402885577', '-7.653445692445'),('54','0')   ('54.8913208','-7.5740475')
    ('55.2375819', '-7.2357427')     ('55.253936739337', '-7.259624609577'), ('54','2'),('54','1')   ('55.2375819','-7.2357427')
    ('54.5298806', '-8.1350247')    ('54.504374314741', '-8.188334960168')      ('54.5298806','-8.1350247')
    ('54.2810187',  ' -7.896937')   ('54.303836850038', '-8.180136033695'), ('54','3')       ('54.2810187','-7.896937')
    
    """
    from io import StringIO

    # after testing, replace StringIO(temp) with 'filename.csv'
    df = pd.read_csv(StringIO(temp), sep=r"\s{3,}", engine="python")
    

    print (df)
                               start  \
    0     ('54.957055',' -7.740156')   
    1   ('54.8913208', '-7.5740475')   
    2   ('55.2375819', '-7.2357427')   
    3   ('54.5298806', '-8.1350247')   
    4  ('54.2810187',  ' -7.896937')   
    
                                            intermediary  \
    0            ('54.956915136264', ' -7.753690062122')   
    1  ('54.864402885577', '-7.653445692445'),('54','0')   
    2  ('55.253936739337', '-7.259624609577'), ('54',...   
    3             ('54.504374314741', '-8.188334960168')   
    4  ('54.303836850038', '-8.180136033695'), ('54',...   
    
                               end  
    0    ('54.957055','-7.740156')  
    1  ('54.8913208','-7.5740475')  
    2  ('55.2375819','-7.2357427')  
    3  ('54.5298806','-8.1350247')  
    4   ('54.2810187','-7.896937') 
    

    import ast
    
    # convert string values to tuples
    # (on pandas >= 2.1, DataFrame.map replaces the deprecated applymap)
    df = df.applymap(ast.literal_eval)
    # wrap single pairs in a list so every cell is a list of (lat, lng) tuples
    df['intermediary'] = df['intermediary'].apply(lambda x: list(x) if isinstance(x[1], tuple) else [x])
    
    # DataFrame from the start column, split into lat and lng
    df1 = pd.DataFrame(df['start'].values.tolist(), columns=['lat','lng'])
    
    # one lat/lng DataFrame per row of intermediary pairs, stacked and keyed by the original row index
    df2 = (pd.concat([pd.DataFrame(x, columns=['lat','lng']) for x in df['intermediary']], keys=df.index)
           .reset_index(level=1, drop=True)
           .add_prefix('intermediary_'))
    print (df2)
    
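    # DataFrame from the end column, split into lat and lng
    df_end = pd.DataFrame(df['end'].values.tolist(), columns=['lat','lng'])

    # join all DataFrames together; start/end rows repeat for each intermediary pair
    df3 = df1.add_prefix('start_').join(df2).join(df_end.add_prefix('end_'))

    # create MultiIndex columns by splitting the names on '_'
    df3.columns = df3.columns.str.split('_', expand=True)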
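
The parsing step above depends on how `ast.literal_eval` treats the two cell shapes: a single pair parses to a tuple of strings, while several comma-separated pairs parse to a tuple of tuples. A small sketch of that normalization on its own, where `to_pairs` is a hypothetical helper name used only for this example:

```python
import ast

def to_pairs(cell):
    """Parse a cell string into a list of (lat, lng) string pairs."""
    value = ast.literal_eval(cell)
    # One pair -> tuple of strings; several pairs -> tuple of tuples
    return list(value) if isinstance(value[0], tuple) else [value]

single = to_pairs("('54.957055', '-7.740156')")
several = to_pairs("('54.86', '-7.65'),('54','0')")

print(single)
print(several)
```

After this step every cell is a list of pairs, so each one can be passed to `pd.DataFrame(..., columns=['lat','lng'])` in the same way.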
    

    print (df3)
    
            start                 intermediary                           end  \
              lat         lng              lat               lng         lat   
    0   54.957055   -7.740156  54.956915136264   -7.753690062122   54.957055   
    1  54.8913208  -7.5740475  54.864402885577   -7.653445692445  54.8913208   
    1  54.8913208  -7.5740475               54                 0  54.8913208   
    2  55.2375819  -7.2357427  55.253936739337   -7.259624609577  55.2375819   
    2  55.2375819  -7.2357427               54                 2  55.2375819   
    2  55.2375819  -7.2357427               54                 1  55.2375819   
    3  54.5298806  -8.1350247  54.504374314741   -8.188334960168  54.5298806   
    4  54.2810187   -7.896937  54.303836850038   -8.180136033695  54.2810187   
    4  54.2810187   -7.896937               54                 3  54.2810187   
    
    
              lng  
    0   -7.740156  
    1  -7.5740475  
    1  -7.5740475  
    2  -7.2357427  
    2  -7.2357427  
    2  -7.2357427  
    3  -8.1350247  
    4   -7.896937  
    4   -7.896937  
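
Note that the values in the result are still strings. As a hedged follow-up, the sketch below builds a small stand-in for `df3` (two illustrative rows, not the full output above), converts it to floats, and pulls out one sub-column across all groups.

```python
import pandas as pd

# Small stand-in for df3: string values under two-level columns
cols = pd.MultiIndex.from_product([["start", "intermediary", "end"], ["lat", "lng"]])
df3 = pd.DataFrame(
    [
        ["54.957055", " -7.740156", "54.956915136264", " -7.753690062122", "54.957055", "-7.740156"],
        ["54.8913208", "-7.5740475", "54", "0", "54.8913208", "-7.5740475"],
    ],
    columns=cols,
)

# Convert every column from string to float
df3 = df3.astype(float)

# All 'lat' sub-columns, one per top-level group
lats = df3.xs("lat", axis=1, level=1)
```

`xs` with `level=1` drops the sub-column level, so `lats` has the plain columns `start`, `intermediary`, and `end`.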
    
