I have an array of data in Pandas and I'm trying to print the second character of every string in col1. I can't figure out how to do it. I can easily print the second character
As of Pandas 0.23.0, if your data is clean, you will find that Pandas "vectorised" string methods via pd.Series.str will generally underperform simple iteration via a list comprehension or use of map.
For example:
import pandas as pd
from operator import itemgetter
df = pd.DataFrame(['foo', 'bar', 'baz'], columns=['col1'])
df = pd.concat([df]*100000, ignore_index=True)
%timeit pd.Series([i[1] for i in df['col1']]) # 33.7 ms
%timeit pd.Series(list(map(itemgetter(1), df['col1']))) # 42.2 ms
%timeit df['col1'].str[1] # 214 ms
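The "if your data is clean" condition matters: the plain-Python approaches index each value directly, so a missing value or a string that is too short raises an error, while .str[1] returns NaN for those rows. The following is only a sketch of that difference; the "dirty" series s and the guard in the list comprehension are illustrative assumptions, not part of the benchmark above.

```python
import numpy as np
import pandas as pd

# Hypothetical "dirty" column: one missing value and one single-character string.
s = pd.Series(['foo', np.nan, 'b'], name='col1')

# The .str accessor propagates missing values and gives NaN when the
# position is out of range, instead of raising.
via_str = s.str[1]

# A list comprehension needs explicit guards to behave the same way;
# without them, i[1] raises TypeError on NaN and IndexError on 'b'.
via_loop = pd.Series(
    [x[1] if isinstance(x, str) and len(x) > 1 else np.nan for x in s],
    index=s.index,
    name=s.name,
)

print(via_str.tolist())
print(via_loop.tolist())
```

If your column may contain such values, the extra checks eat into the speed advantage of the loop, so it is worth timing both on your own data.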
A special case is when you have a large number of repeated strings, in which case you can benefit from converting your series to a categorical:
df['col1'] = df['col1'].astype('category')
%timeit df['col1'].str[1] # 4.9 ms
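The %timeit lines above only run inside IPython or Jupyter. As a rough sketch, the same comparison can be run as a plain script with the standard-library timeit module. The smaller repeat count (1000 instead of 100000), the candidates dictionary and the all_equal check are assumptions made here to keep the script quick; the timings you get will depend on your machine and Pandas version.

```python
import timeit
from operator import itemgetter

import pandas as pd

df = pd.DataFrame(['foo', 'bar', 'baz'], columns=['col1'])
df = pd.concat([df] * 1000, ignore_index=True)  # smaller than above, to keep it quick

col = df['col1']
cat = col.astype('category')  # the repeated-strings special case

# Each candidate returns a Series holding the second character of every string.
candidates = {
    'list comprehension': lambda: pd.Series([i[1] for i in col]),
    'map + itemgetter': lambda: pd.Series(list(map(itemgetter(1), col))),
    '.str[1]': lambda: col.str[1],
    '.str[1] on category': lambda: cat.str[1],
}

# Check that all approaches agree before comparing their speed.
results = {name: fn() for name, fn in candidates.items()}
expected = results['list comprehension'].tolist()
all_equal = all(r.tolist() == expected for r in results.values())
print('all approaches agree:', all_equal)

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=10)
    print(f'{name}: {seconds / 10 * 1000:.2f} ms per call')
```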
You can use str to access the string methods for the column/Series and then slice the strings as normal:
>>> df = pd.DataFrame(['foo', 'bar', 'baz'], columns=['col1'])
>>> df
  col1
0  foo
1  bar
2  baz
>>> df.col1.str[1]
0    o
1    a
2    a
This str attribute also gives you access to a variety of very useful vectorised string methods, many of which are instantly recognisable from Python's own assortment of built-in string methods (split, replace, etc.).
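As a short illustration of those methods, here is a sketch using the same three-row frame. The variable names (upper, replaced, parts, sliced) are chosen here for the example only.

```python
import pandas as pd

df = pd.DataFrame(['foo', 'bar', 'baz'], columns=['col1'])

# Each call applies the string operation to every element of the column.
upper = df['col1'].str.upper()
replaced = df['col1'].str.replace('a', 'A', regex=False)  # literal, not regex
parts = df['col1'].str.split('a')   # each element becomes a list of pieces
sliced = df['col1'].str[:2]         # slices work as well as single indices

print(upper.tolist())
print(replaced.tolist())
print(parts.tolist())
print(sliced.tolist())
```

Each of these returns a new Series aligned with the original index, so the result can be assigned straight back to a column of df.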