I have a Pandas Dataframe with a column that looks like this:
Car_Make
0 2017 Abarth 124 Spider ManualConvertible
1 2017 Abarth 124 Spider AutoConver
Here's a shot at your three questions:
1) Why does df['Car_Make'].apply(lambda x: pd.Series(x.split()[::-1])) work?
Break it down:
1. df['Car_Make'] - the column with the data you want to operate on.
2. .apply() - a pandas DataFrame and Series method that applies a function to every column or every row of a DataFrame, or to every value of a Series.
3. lambda x: - the function that .apply() will call on every value of the Series. x represents the record object, which in your case is the string held in each Car_Make entry.
4. pd.Series() - this converts the value inside it into a pandas Series.
5. x.split() - As mentioned in point 3, x is your string object, and split() is a string method that, when called with no arguments, splits the string on whitespace and returns the pieces as a list.
6. [::-1] - a handy slice that reverses a list, such as the one returned by x.split(). The slice syntax is [start_index:end_index:step], and a step of -1 walks through the list backwards.

Put that all together, and the code goes through every record in df['Car_Make'], splits it, reverses the order of the split items, and returns the reversed list as a pandas Series object. Because the function returns a Series for each record, .apply() assembles the results into a DataFrame with one column per item.
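To make the breakdown concrete, here is a minimal sketch that runs each step on one string and then on the whole column. The first sample value comes from the question; the second row and the variable names (parts, reversed_parts, result) are made up for illustration.

```python
import pandas as pd

# First value is from the question; the second is a hypothetical extra row.
df = pd.DataFrame({
    "Car_Make": [
        "2017 Abarth 124 Spider ManualConvertible",
        "2016 Fiat 500 Pop ManualHatch",
    ]
})

# Step by step on a single record.
x = df["Car_Make"].iloc[0]
parts = x.split()              # split on whitespace into a list of strings
reversed_parts = parts[::-1]   # slice with step -1 to reverse the list
row = pd.Series(reversed_parts)

# The same steps applied to every record in the column.
result = df["Car_Make"].apply(lambda x: pd.Series(x.split()[::-1]))
print(result)
```

Each record here splits into five pieces, so result has five columns labelled 0 to 4. If records split into different numbers of pieces, the shorter rows would be padded with NaN.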
2) Replicating that with a defined function.
You are really close. The function just needs to take a single record as its argument, and it needs to be passed to the .apply() method by name rather than called yourself. What you want to replace is the lambda, not the way it is applied.
Using what you have so far:
def f(x):
    return pd.Series(x.split()[::-1])

df['Car_Make'].apply(f)
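If you want to convince yourself that the named function and the lambda are interchangeable, a quick check along these lines should do it. The one-row frame and the names via_function, via_lambda and same are only for illustration.

```python
import pandas as pd

# One-row frame using the sample value from the question.
df = pd.DataFrame({"Car_Make": ["2017 Abarth 124 Spider ManualConvertible"]})

def f(x):
    # x is one string from the column: split it, reverse it, wrap it in a Series.
    return pd.Series(x.split()[::-1])

via_function = df["Car_Make"].apply(f)   # pass f itself, without parentheses
via_lambda = df["Car_Make"].apply(lambda x: pd.Series(x.split()[::-1]))

# DataFrame.equals compares shape and values of the two results.
same = via_function.equals(via_lambda)
print(same)
```

Note that f is passed without parentheses: .apply() does the calling, once per record.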
3) Is there a better way?
If you want to split a string and then reverse the order of the items, no, this is a great way. If you only want to split a certain part of a string starting from the right, then rsplit()
is a good method.
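As a rough sketch of the rsplit() idea, assuming you only care about peeling the last word off each entry: the plain string method takes a maxsplit argument, and the pandas string accessor has a matching .str.rsplit() with n and expand parameters. The variable names below are hypothetical.

```python
import pandas as pd

text = "2017 Abarth 124 Spider ManualConvertible"

# Split once, starting from the right: everything else stays in one piece.
head, tail = text.rsplit(maxsplit=1)

# The same idea on a whole column, returning the pieces as DataFrame columns.
df = pd.DataFrame({"Car_Make": [text]})
split_df = df["Car_Make"].str.rsplit(n=1, expand=True)
print(split_df)
```

Here head holds everything before the last space and tail holds the last word, so no reversing is needed to get at the right-hand end of the string.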