PYTHON: Splitting One Column into Multiple After Deleting some rows

轮回少年 2021-01-27 05:12

I am pretty new to Python and I am trying to cleanse some data. I've attached a link to the data file (two tabs: Raw data and desired outcome). Please help!

What I am t

3 Answers
  • 2021-01-27 05:49

    To split a column into two columns with pandas:

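    import pandas as pd

    # file.csv contains:
    #   col_1
    #   "val1-val2"
    #   "valA-valB"
    d = pd.read_csv('file.csv')

    # n=1 splits on the first '-' only; recent pandas versions require n as a keyword argument
    df = pd.DataFrame(d.col_1.str.split("-", n=1).tolist(), columns=['A', 'B'])

          A     B
    0  val1  val2
    1  valA  valB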
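    A variant of the same idea, sketched with an in-memory DataFrame standing in for file.csv (an assumption, since the file itself isn't shown): passing expand=True makes str.split return a DataFrame directly, so the .tolist() round-trip isn't needed.

```python
import pandas as pd

# Hypothetical stand-in for the contents of file.csv shown above
d = pd.DataFrame({'col_1': ['val1-val2', 'valA-valB']})

# expand=True returns a DataFrame directly; n=1 splits on the first '-' only
df = d['col_1'].str.split('-', n=1, expand=True)
df.columns = ['A', 'B']
print(df)
```

    Because n=1 limits the number of splits, a value such as "x-y-z" would become "x" and "y-z", so exactly two columns always come back.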
    
  • 2021-01-27 06:14

    Use:

    import pandas as pd

    # Read the 'Raw data' sheet, skipping the first 23 rows, which are not needed
    data_xls = pd.read_excel("Example2.xlsx", sheet_name='Raw data', skiprows=23)
    
    # Column names for the split parts, matching the desired output
    dummy_col_names = ['Internal Link Tracking (non','Campaign Name','Creative','Action','Action 2']
    # Use str.split with expand=True to get a DataFrame with one column per part
    dummy_df = data_xls['Internal Link Tracking (non-promotions) - ENT (c20)'].str.split('-', expand=True)
    # Rename the columns according to the list above (assumes the longest value has exactly 5 parts)
    dummy_df.columns = dummy_col_names
    
    # Drop the original column, which is no longer needed
    data_xls.drop('Internal Link Tracking (non-promotions) - ENT (c20)', axis=1, inplace=True)
    
    # Concatenate data_xls and dummy_df side by side (axis=1)
    data_xls = pd.concat((data_xls, dummy_df), sort=False, axis=1)
    
    # Reorder to match the desired output: first column, split columns, then the remaining columns
    col_names = data_xls.columns.tolist()
    data_xls = data_xls[col_names[:1] + dummy_col_names + col_names[1:-5]]
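    Since Example2.xlsx isn't reproduced here, the following sketch runs the same split/drop/concat steps on a small hypothetical frame. The 'Date' and 'Visits' columns and the sample values are made up (the values mimic the output in the next answer). Instead of reordering by list slicing, it slices the frame around the original column's position, so the split columns land exactly where that column was.

```python
import pandas as pd

SPLIT_COL = 'Internal Link Tracking (non-promotions) - ENT (c20)'
new_cols = ['Internal Link Tracking (non', 'Campaign Name', 'Creative', 'Action', 'Action 2']

# Hypothetical sample standing in for the 'Raw data' sheet after skiprows=23;
# 'Date' and 'Visits' are made-up column names
data = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02'],
    SPLIT_COL: ['us-campaign-article1-scrolldown-findoutnow',
                'us-campaign-article1-scrollright'],
    'Visits': [10, 20],
})

# One column per '-'-separated part; shorter values are padded with missing values
parts = data[SPLIT_COL].str.split('-', expand=True)
parts.columns = new_cols  # assumes the longest value has exactly 5 parts

# Replace the original column with the new ones, keeping its position
pos = data.columns.get_loc(SPLIT_COL)
result = pd.concat([data.iloc[:, :pos], parts, data.iloc[:, pos + 1:]], axis=1)
print(result)
```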
    
  • 2021-01-27 06:15

    Try this:

    1.) Delete rows 1-23

    import pandas as pd

    df = pd.read_excel('/home/mayankp/Downloads/Example2.xlsx', sheet_name=0, index_col=None, header=None, skiprows=23)
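    To see what header=None and skiprows do without the workbook, here is a sketch using read_csv on hypothetical in-memory text, where two junk lines stand in for the 23 report rows; read_excel accepts the same skiprows and header arguments. With header=None the columns are labelled 0, 1, 2, ..., which is why column B is df[1]. Note that a title row left over after skipping is kept as an ordinary data row; header=0 would use it as the header instead.

```python
import io
import pandas as pd

# Hypothetical CSV text: 2 junk lines stand in for the 23 report-header rows
raw = io.StringIO(
    "Report generated,2021-01-27\n"
    "Notes,ignore\n"
    "Date,Internal Link Tracking,Visits\n"
    "2021-01-01,us-campaign-article1,10\n"
)

# header=None: columns are labelled 0, 1, 2, ... so column B is df[1]
df = pd.read_csv(raw, header=None, skiprows=2)
print(df)
```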
    

    2.) Split column B into multiple columns using '-' as the delimiter, and 3.) assign names to the new columns

    Both steps can be done in one go:

    sub_df = df[1].str.split('-', expand=True).rename(columns=lambda x: "string" + str(x + 1))
    
    In [179]: sub_df
    Out[179]: 
                           string1       string2             string3      string4     string5
    1                           us      campaign            article1   scrolldown  findoutnow
    2                           us      campaign            article1  scrollright        None
    3                           us      campaign            article1   findoutnow        None
    4                           us      campaign  payablesmanagement   findoutnow        None
    

    Above is what the sample looks like after splitting on '-'.
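    A small sketch of the split-and-rename step on two hypothetical values in the same format. rename(columns=...) accepts a function that is applied to every column label (0 becomes "string1", and so on), and values with fewer parts are padded with missing values, as in the None entries above.

```python
import pandas as pd

# Hypothetical values in the format shown in the output above
s = pd.Series(['us-campaign-article1-scrolldown-findoutnow',
               'us-campaign-article1-scrollright'])

# rename(columns=callable) applies the function to each label: 0 -> 'string1', ...
sub_df = s.str.split('-', expand=True).rename(columns=lambda x: "string" + str(x + 1))
print(sub_df)
```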

    Now drop the original column from df and append the new columns to it:

    df = df.drop(1, axis=1)
    df = pd.concat([df,sub_df], axis=1)
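    pd.concat with axis=1 appends the new columns at the end of df. If you would rather have them where column B was, one option is DataFrame.insert. The frame below is a hypothetical stand-in for the header=None result, with made-up values.

```python
import pandas as pd

# Hypothetical frame read with header=None, so columns are labelled 0, 1, 2
df = pd.DataFrame({0: ['2021-01-01', '2021-01-02'],
                   1: ['us-campaign-article1', 'us-campaign-payablesmanagement'],
                   2: [10, 20]})

sub_df = df[1].str.split('-', expand=True).rename(columns=lambda x: "string" + str(x + 1))

# Drop column 1, then insert the split columns one by one where it used to be
df = df.drop(1, axis=1)
for offset, name in enumerate(sub_df.columns):
    df.insert(1 + offset, name, sub_df[name])
print(df)
```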
    

    4.) Keep the numeric columns

    The remaining columns are already intact, so no change is needed here.
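    One caveat, assuming the column-title row survives skiprows=23 and is read as data because of header=None: the numeric columns would then also contain that title text, and pandas would store them as object dtype. A sketch of coercing such a column back to numbers with pd.to_numeric (the 'Visits' values are hypothetical):

```python
import pandas as pd

# Hypothetical numeric column whose title row was read in as data
visits = pd.Series(['Visits', 10, 20])

# errors='coerce' turns anything non-numeric (the stray title) into NaN
numeric = pd.to_numeric(visits, errors='coerce')
print(numeric)
```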

    Let me know if this helps.
