Python/pandas n00b. I have code that is processing event data stored in CSV files. Data from df["CONTACT PHONE NUMBER"]
is outputting the phone number as `5555551
I think the problem is that the phone numbers are stored as float64, so adding a few things will fix your inner loop:
In [75]:
df['Phone_no']
Out[75]:
0    5554443333
1    1114445555
Name: Phone_no, dtype: float64
In [76]:
for phone_no in df['Phone_no']:
    # str() gives e.g. '5554443333.0'; the first 10 characters are the digits
    contactphone = "(%c%c%c)%c%c%c-%c%c%c%c" % tuple(map(ord,list(str(phone_no)[:10])))
    print(contactphone)
(555)444-3333
(111)444-5555
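The same idea can be wrapped in a small helper. This is only a sketch: `format_phone` is a hypothetical name, and it assumes every valid number has exactly 10 digits. Converting through `int()` drops the trailing `.0` instead of relying on slicing, and missing values are returned as `None`.

```python
import math

def format_phone(value):
    """Format a 10-digit phone number stored as a float, e.g. 5554443333.0."""
    # NaN is how pandas marks a missing float, so handle it first.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    digits = str(int(value))  # int() drops the trailing ".0"
    if len(digits) != 10:
        # Assumption: anything that is not exactly 10 digits is invalid.
        return None
    return "(%s)%s-%s" % (digits[:3], digits[3:6], digits[6:])

print(format_phone(5554443333.0))
print(format_phone(float("nan")))
```

Note that a number whose first digit is 0 would already have lost that digit when it was parsed as a float, so this helper would reject it as too short.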
However, I think it is easier to just store the phone numbers as strings.
(@Andy_Hayden made a good point about missing values, so I made up the following dataset:)
In [121]:
print(df)
     Phone_no   Name
0  5554443333   John
1  1114445555   Jane
2         NaN  Betty
[3 rows x 2 columns]
In [122]:
df.dtypes
Out[122]:
Phone_no    float64
Name         object
dtype: object
# In [123] (skipped): you don't need to convert the entire DataFrame; only the 'Phone_no' column needs to be converted.
# df=df.astype('S4')
In [124]:
df['PhoneNumber']=df['Phone_no'].astype(str).apply(lambda x: '('+x[:3]+')'+x[3:6]+'-'+x[6:10])
In [125]:
print(df)
       Phone_no   Name    PhoneNumber
0  5554443333.0   John  (555)444-3333
1  1114445555.0   Jane  (111)444-5555
2           NaN  Betty         (nan)-
[3 rows x 3 columns]
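A missing value gets formatted as '(nan)-', so check each value before formatting it:
In [134]:
# Format only strings that are at least 10 characters long and made up of digits and '.';
# anything else (such as 'nan') is reported as missing.
df['PhoneNumber']=df['Phone_no'].astype(str).apply(lambda x: '('+x[:3]+')'+x[3:6]+'-'+x[6:10]
                                                   if len(x)>=10 and set(x).issubset('.0123456789')
                                                   else 'Phone number not in record')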
In [135]:
print(df)
     Phone_no   Name                 PhoneNumber
0  5554443333   John               (555)444-3333
1  1114445555   Jane               (111)444-5555
2         NaN  Betty  Phone number not in record
[3 rows x 3 columns]
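If you control how the CSV is read, one option is to keep the column as text from the start so it never becomes float64. The sketch below is an illustration under assumptions: the CSV contents are made up to match the dataset above, and the regular expression assumes valid numbers are exactly 10 digits.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for the real event file.
csv_text = "Phone_no,Name\n5554443333,John\n1114445555,Jane\n,Betty\n"

# dtype=str keeps the digits as text; empty cells are still read as missing values.
df = pd.read_csv(io.StringIO(csv_text), dtype={"Phone_no": str})

# Rewrite strings of exactly 10 digits; missing values stay missing.
formatted = df["Phone_no"].str.replace(
    r"^(\d{3})(\d{3})(\d{4})$", r"(\1)\2-\3", regex=True
)
df["PhoneNumber"] = formatted.fillna("Phone number not in record")
print(df)
```

Reading the column as text also preserves leading zeros, and the missing-value check reduces to a single `fillna` call.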