Format phone number in CSV using pandas

心在旅途 2021-01-21 11:20

I'm new to Python and pandas. I have code that processes event data stored in CSV files. Data from df["CONTACT PHONE NUMBER"] is outputting the phone number as `5555551

2 Answers
  •  南笙 (original poster)
    2021-01-21 11:50

    I think the problem is that the phone numbers are stored as float64, so a few small changes will fix your inner loop:

    In [75]:
    
    df['Phone_no']
    Out[75]:
    0    5554443333
    1    1114445555
    Name: Phone_no, dtype: float64
    In [76]:
    
    for phone_no in df['Phone_no']:
        contactphone = "(%c%c%c)%c%c%c-%c%c%c%c" % tuple(map(ord,list(str(phone_no)[:10])))
        print(contactphone)
    (555)444-3333
    (111)444-5555
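
To see why the column ends up as float64 and why the `[:10]` slice is needed, here is a minimal sketch. The CSV text is made up for illustration, and the blank cell is an assumption about one common way a numeric column picks up a missing value when it is read.

```python
import io
import pandas as pd

# Made-up CSV text standing in for the real file; the blank cell is an assumption.
csv_text = "Phone_no,Name\n5554443333,John\n1114445555,Jane\n,Betty\n"

df = pd.read_csv(io.StringIO(csv_text))

# The missing value is stored as NaN, which forces the whole column to float64.
print(df["Phone_no"].dtype)

# Converting a float to text keeps the fractional part, hence the trailing ".0".
first_as_text = str(df["Phone_no"].iloc[0])
print(first_as_text)

# Slicing the first 10 characters keeps only the digits.
print(first_as_text[:10])
```

The slice only works because these numbers have exactly 10 digits; anything shorter would pull the decimal point into the result.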
    

    However, I think it is easier to store the phone numbers as strings. (@Andy_Hayden made a good point about missing values, so I made up the following dataset.)

    In [121]:
    
    print(df)
         Phone_no   Name
    0  5554443333   John
    1  1114445555   Jane
    2         NaN  Betty
    
    [3 rows x 2 columns]
    In [122]:
    
    df.dtypes
    Out[122]:
    Phone_no    float64
    Name         object
    dtype: object
    #In [123]: You don't need to convert the entire DataFrame, only the 'Phone_no' needs to be converted.
    #
    #df=df.astype('S4')
    In [124]:
    
    df['PhoneNumber']=df['Phone_no'].astype(str).apply(lambda x: '('+x[:3]+')'+x[3:6]+'-'+x[6:10])
    In [125]:
    
    print(df)
           Phone_no   Name    PhoneNumber
    0  5554443333.0   John  (555)444-3333
    1  1114445555.0   Jane  (111)444-5555
    2           NaN  Betty         (nan)-
    
    [3 rows x 3 columns]
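
If the data comes from a CSV file, one way to get strings from the start is to ask `read_csv` for them, so the numbers never pass through float64. This is only a sketch: the CSV text and column names are made up to match the dataset above, and the placeholder message is borrowed for illustration.

```python
import io
import pandas as pd

# Made-up CSV text standing in for the real file.
csv_text = "Phone_no,Name\n5554443333,John\n1114445555,Jane\n,Betty\n"

# Read the phone column as text; the blank cell still comes back as a missing value.
df = pd.read_csv(io.StringIO(csv_text), dtype={"Phone_no": str})

# Vectorized slicing with the .str accessor; missing values stay missing.
phone = df["Phone_no"]
formatted = "(" + phone.str[:3] + ")" + phone.str[3:6] + "-" + phone.str[6:10]

# Replace the missing entries with a placeholder message.
df["PhoneNumber"] = formatted.fillna("Phone number not in record")
print(df)
```

Reading as text also keeps leading zeros, which a float column cannot preserve.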
    

    In [134]:
    # A plain conditional expression is enough here; np.where on a scalar returns a 0-d array.
    valid = set('.0123456789')
    df['PhoneNumber'] = df['Phone_no'].astype(str).apply(
        lambda x: '('+x[:3]+')'+x[3:6]+'-'+x[6:10]
        if len(x) >= 10 and set(x) <= valid
        else 'Phone number not in record')
    In [135]:
    
    print(df)
         Phone_no   Name                 PhoneNumber
    0  5554443333   John               (555)444-3333
    1  1114445555   Jane               (111)444-5555
    2         NaN  Betty  Phone number not in record
    
    [3 rows x 3 columns]
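
The same validation can also be pulled out into a small named function, which may be easier to read and test than a long lambda. `format_phone` is a hypothetical helper, not part of pandas, and the 10-digit rule is an assumption based on the sample numbers.

```python
import pandas as pd

def format_phone(value, missing="Phone number not in record"):
    """Format a 10-digit number as (XXX)XXX-XXXX (hypothetical helper)."""
    if pd.isna(value):
        return missing
    try:
        # Going through float and int drops the trailing ".0" of float64 values.
        digits = str(int(float(value)))
    except (TypeError, ValueError, OverflowError):
        return missing
    # Assumption: only plain 10-digit numbers count as valid.
    if len(digits) != 10 or not digits.isdigit():
        return missing
    return "({}){}-{}".format(digits[:3], digits[3:6], digits[6:])

# Same made-up dataset as above, with the phone column stored as float64.
df = pd.DataFrame({
    "Phone_no": [5554443333.0, 1114445555.0, float("nan")],
    "Name": ["John", "Jane", "Betty"],
})
df["PhoneNumber"] = df["Phone_no"].apply(format_phone)
print(df)
```

Because the function accepts strings as well as floats, it should work whether or not the column was converted to text first.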
    
