Split column into separate columns based on separator strings

野性不改 2021-01-28 19:40

For example, we have a CSV file with:

    name age address
    john 25 koramangala banglore #@ sales maneger

2 Answers
  • 2021-01-28 20:01

    Pass a regular expression as the sep argument of read_csv:

    import io
    import pandas as pd

    t = """name ,age , address
    john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india
    harshuth rao ,36,belandur banglore +ACMAQA-  maneger +ACUAJA- india 
    vijay kumar,45,ulsoor banglore +ACMAQA- sales maneger +ACUAJA- india
    suhas,25,koramangala banglore +ACMAQA-analist +ACUAJA- india
    mithun,22,venkatapura banglore +ACMAQA- execitive +ACUAJA- india"""
    
    # Split on either separator token or a comma, ignoring surrounding whitespace.
    # A regex separator requires the python engine.
    df = pd.read_csv(io.StringIO(t),
                     sep=r'\s*\+ACMAQA-\s*|\s*\+ACUAJA-\s*|\s*,\s*', engine='python')
    # The header has 3 fields but each row has 5, so move the index back into columns.
    df = df.reset_index()
    df.columns = ["name", "age", "city", "position", "country"]
    
    
        name          age   city                   position        country
    0   john           25   koramangala banglore   sales maneger   india
    1   harshuth rao   36   belandur banglore      maneger         india
    2   vijay kumar    45   ulsoor banglore        sales maneger   india
    3   suhas          25   koramangala banglore   analist         india
    4   mithun         22   venkatapura banglore   execitive       india
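
To see what the separator pattern does on its own, here is a minimal sketch that applies the same alternation with Python's `re` module to the header line and one data row from the answer. The names `SEP`, `header`, and `row` are illustrative only. The header yields three fields while the data row yields five, which is presumably why the answer calls `reset_index()` before renaming the columns.

```python
import re

# Same alternation as the sep argument: either token or a comma,
# each with optional surrounding whitespace.
SEP = re.compile(r"\s*\+ACMAQA-\s*|\s*\+ACUAJA-\s*|\s*,\s*")

header = "name ,age , address"
row = "suhas,25,koramangala banglore +ACMAQA-analist +ACUAJA- india"

header_fields = SEP.split(header)
row_fields = SEP.split(row)

print(header_fields)  # ['name', 'age', 'address']
print(row_fields)     # ['suhas', '25', 'koramangala banglore', 'analist', 'india']
```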
    
  • 2021-01-28 20:12

    First, load your data using pd.read_csv:

    import pandas as pd
    
    df = pd.read_csv("/home/vipul/Desktop/example.csv", sep=',')
    

    print(df)
               name   age                                             address
    0           john    25  koramangala banglore +ACMAQA- sales maneger +A...
    1  harshuth rao     36  belandur banglore +ACMAQA-  maneger +ACUAJA- i...
    2    vijay kumar    45  ulsoor banglore +ACMAQA- sales maneger +ACUAJA...
    3          suhas    25  koramangala banglore +ACMAQA-analist +ACUAJA- ...
    4         mithun    22  venkatapura banglore +ACMAQA- execitive +ACUAJ...
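
A side note on the `+ACMAQA-` and `+ACUAJA-` tokens: decoded as UTF-7 they give `#@` and `%$`, which matches the `#@` separator shown in the question. The sketch below only demonstrates that decoding. Whether the file was actually saved as UTF-7 is an assumption, since the thread does not say. If it was, `read_csv` has an `encoding` parameter, so reading with `encoding='utf-7'` may be worth trying.

```python
# Decode the two tokens as UTF-7 to see which characters they stand for.
tokens = ["+ACMAQA-", "+ACUAJA-"]
decoded = [t.encode("ascii").decode("utf-7") for t in tokens]
print(decoded)  # ['#@', '%$']

# The same decoding applied to a whole address value from the sample data.
line = "koramangala banglore +ACMAQA- sales maneger +ACUAJA- india"
decoded_line = line.encode("ascii").decode("utf-7")
print(decoded_line)  # koramangala banglore #@ sales maneger %$ india
```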
    

    Next, use str.split to separate the data, then pd.concat to join the result with the original DataFrame:

    v = df.pop('address').str.split(r'\s*\+.*?-\s*', expand=True)
    v.columns = ['city', 'position', 'country']
    
    df = pd.concat([df, v], axis=1)
    

    print(df)
               name   age                   city       position country
    0           john    25  koramangala banglore  sales maneger   india
    1  harshuth rao     36     belandur banglore        maneger  india 
    2    vijay kumar    45       ulsoor banglore  sales maneger   india
    3          suhas    25  koramangala banglore        analist   india
    4         mithun    22  venkatapura banglore      execitive   india
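
The split-and-concat step can also be tried without a file on disk. The following is a sketch under a few assumptions: the rows are read from an in-memory string with a clean header, the pattern names the two tokens explicitly instead of matching any `+...-` run (so a stray `+` or `-` elsewhere in an address would not trigger a split), and the resulting columns are stripped of leftover whitespace. The variable names are illustrative.

```python
import io
import pandas as pd

# Two sample rows from the thread; the second ends with a trailing space.
rows = [
    "name,age,address",
    "john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india",
    "harshuth rao ,36,belandur banglore +ACMAQA-  maneger +ACUAJA- india ",
]
df = pd.read_csv(io.StringIO("\n".join(rows)))

# Split on either literal token, swallowing whitespace around it.
parts = df.pop("address").str.split(r"\s*(?:\+ACMAQA-|\+ACUAJA-)\s*", expand=True)
parts.columns = ["city", "position", "country"]

# Whitespace at the very end of a value is not next to a separator, so strip it.
parts = parts.apply(lambda col: col.str.strip())

result = pd.concat([df, parts], axis=1)
print(result)
```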
    

    Finally, save to CSV:

    df.to_csv("/home/vipul/Desktop/new.csv")
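
Note that `to_csv` writes the row index as an extra first column by default. If that column is not wanted in the output file, `index=False` omits it. A small sketch with a made-up one-row frame, returning the CSV text as a string instead of writing to a path:

```python
import pandas as pd

# Hypothetical one-row frame, used only to compare the two outputs.
df = pd.DataFrame({"name": ["john"], "age": [25]})

with_index = df.to_csv()                # index written as an unnamed first column
without_index = df.to_csv(index=False)  # only the data columns

print(with_index.splitlines()[0])     # ,name,age
print(without_index.splitlines()[0])  # name,age
```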
    