How to extract the first 8 characters from a string in pandas

清酒与你 2020-11-30 15:01

I have a column in a DataFrame and I am trying to extract the first 8 digits from each string. How can I do it?

    Input
 Shipment ID
20180504-S-20000
20180514-S-20537
201         


        
3 Answers
  • 2020-11-30 15:47

    You are close. You need indexing with the str accessor, which is applied to each value of the Series:

    data['Order_Date'] = data['Shipment ID'].str[:8]
    

    For better performance, if the column contains no NaN values, use a list comprehension:

    data['Order_Date'] = [x[:8] for x in data['Shipment ID']]
    

    print (data)
            Shipment ID Order_Date
    0  20180504-S-20000   20180504
    1  20180514-S-20537   20180514
    2  20180514-S-20541   20180514
    3  20180514-S-20644   20180514
    4  20180514-S-20644   20180514
    5  20180516-S-20009   20180516
    6  20180516-S-20009   20180516
    7  20180516-S-20009   20180516
    8  20180516-S-20009   20180516
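
The NaN caveat on the list comprehension above can be checked with a small sketch. The three-row sample and the variable names here are hypothetical, chosen only to show how the two approaches behave when a value is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the second ID is missing (NaN).
data = pd.DataFrame(
    {"Shipment ID": ["20180504-S-20000", np.nan, "20180514-S-20537"]}
)

# .str[:8] propagates the missing value instead of raising.
via_str = data["Shipment ID"].str[:8]

# The plain comprehension fails on NaN, because a float cannot be sliced.
try:
    [x[:8] for x in data["Shipment ID"]]
    comprehension_failed = False
except TypeError:
    comprehension_failed = True

# A guarded comprehension slices strings only and passes other values through.
guarded = [x[:8] if isinstance(x, str) else x for x in data["Shipment ID"]]

print(via_str.tolist())
print(comprehension_failed)
print(guarded)
```

If missing values are possible, the guarded form keeps most of the comprehension's speed, at the cost of an extra type check per row.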
    

    If you omit str, the slice selects rows by position (the first N values) instead of characters:

    print (data['Shipment ID'][:2])
    0    20180504-S-20000
    1    20180514-S-20537
    Name: Shipment ID, dtype: object
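
To make the contrast concrete, here is a minimal side-by-side sketch of the two slices. The three-row frame and the names first_rows and first_chars are illustrative assumptions:

```python
import pandas as pd

data = pd.DataFrame(
    {"Shipment ID": ["20180504-S-20000", "20180514-S-20537", "20180514-S-20541"]}
)

# Without .str, the slice works on the Series itself: it keeps the first two rows.
first_rows = data["Shipment ID"][:2]

# With .str, the slice is applied to each string: the first two characters of every row.
first_chars = data["Shipment ID"].str[:2]

print(len(first_rows))
print(first_chars.tolist())
```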
    
  • 2020-11-30 15:50

    You can also use str.extract

    Ex:

    import pandas as pd
    
    df = pd.DataFrame({'Shipment ID': ['20180504-S-20000', '20180514-S-20537', '20180514-S-20541', '20180514-S-20644', '20180514-S-20644', '20180516-S-20009', '20180516-S-20009', '20180516-S-20009', '20180516-S-20009']})
    df["Order_Date"] = df["Shipment ID"].str.extract(r"(\d{8})")
    print(df)
    

    Output:

           Shipment ID Order_Date
    0  20180504-S-20000   20180504
    1  20180514-S-20537   20180514
    2  20180514-S-20541   20180514
    3  20180514-S-20644   20180514
    4  20180514-S-20644   20180514
    5  20180516-S-20009   20180516
    6  20180516-S-20009   20180516
    7  20180516-S-20009   20180516
    8  20180516-S-20009   20180516
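
One detail worth noting: the pattern above is not anchored, so it returns the first run of 8 digits anywhere in the string, not necessarily the first 8 characters. The sketch below compares it with a `^`-anchored variant on made-up IDs (the second and third values are hypothetical edge cases, not from the question). It also passes expand=False so that a Series is returned rather than a one-column DataFrame:

```python
import pandas as pd

# Hypothetical IDs: a normal one, one with a prefix, one with too few leading digits.
ids = pd.Series(["20180504-S-20000", "S-20180514-20537", "2018-S-20000"])

# Unanchored: first run of 8 consecutive digits, wherever it occurs.
unanchored = ids.str.extract(r"(\d{8})", expand=False)

# Anchored: only matches when the string starts with 8 digits.
anchored = ids.str.extract(r"^(\d{8})", expand=False)

print(unanchored.tolist())
print(anchored.tolist())
```

Rows that do not match come back as missing values, which can be useful for spotting malformed IDs.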
    
  • 2020-11-30 15:50

    You can also delete everything from the first hyphen (the -S part) to the end:

    df["Order_Date"]=df['Shipment ID'].replace(regex=r"\-.*",value="")
    df
            Shipment ID Order_Date
    0  20180504-S-20000   20180504
    1  20180514-S-20537   20180514
    2  20180514-S-20541   20180514
    3  20180514-S-20644   20180514
    4  20180514-S-20644   20180514
    5  20180516-S-20009   20180516
    6  20180516-S-20009   20180516
    7  20180516-S-20009   20180516
    8  20180516-S-20009   20180516
    

    You can also capture the first 8 digits, match the rest of the string, and replace the whole match with a backreference to the captured group:

    df['Shipment ID'].replace(regex=r"(\d{8}).*",value="\\1")
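
The two patterns in this answer agree on well-formed IDs but differ on malformed ones. The sketch below shows this on a made-up ID with only 4 leading digits. Note that it uses Series.str.replace with regex=True rather than the Series.replace call above; that swap, the sample values, and the variable names are assumptions for illustration:

```python
import pandas as pd

# Second value is a hypothetical malformed ID with fewer than 8 leading digits.
ids = pd.Series(["20180504-S-20000", "2018-S-20000"])

# Delete from the first hyphen to the end of the string.
cut_at_hyphen = ids.str.replace(r"-.*", "", regex=True)

# Keep the captured 8 digits; strings with no 8-digit run are left unchanged.
keep_8_digits = ids.str.replace(r"(\d{8}).*", r"\1", regex=True)

print(cut_at_hyphen.tolist())
print(keep_8_digits.tolist())
```

Cutting at the hyphen returns whatever precedes it, even if it is shorter than 8 characters, while the backreference version leaves non-matching strings as they were.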
    