I have a Pandas Dataframe with a column that looks like this:
Car_Make
0 2017 Abarth 124 Spider ManualConvertible
1 2017 Abarth 124 Spider AutoConver
Is this what you are looking for:
df['Car_Make'].str.rsplit(' ', n=1, expand=True)
# returns:
                        0                  1
0  2017 Abarth 124 Spider  ManualConvertible
1  2017 Abarth 124 Spider    AutoConvertible
2  2017 Abarth 124 Spider  ManualConvertible
3  2017 Abarth 124 Spider    AutoConvertible
4         2017 Abarth 595        ManualHatch
5         2017 Abarth 595          AutoHatch
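For anyone who wants to try this end to end, here is a minimal sketch. The sample frame and the column names "model" and "variant" are made up for illustration; only Car_Make comes from the question. Note that n is passed as a keyword, which recent pandas releases expect.

```python
import pandas as pd

# Hypothetical sample data mirroring the rows shown above.
df = pd.DataFrame({
    "Car_Make": [
        "2017 Abarth 124 Spider ManualConvertible",
        "2017 Abarth 124 Spider AutoConvertible",
        "2017 Abarth 595 ManualHatch",
        "2017 Abarth 595 AutoHatch",
    ]
})

# Split each string once, starting from the right, on a single space.
# expand=True spreads the two pieces into separate DataFrame columns.
parts = df["Car_Make"].str.rsplit(" ", n=1, expand=True)

# Illustrative column names (not from the original post).
parts.columns = ["model", "variant"]
print(parts)
```

Because the split is limited to one, everything before the last space stays together in the first column regardless of how many words it contains.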
The code you're asking about is:
df['Car_Make'].apply(lambda x:pd.Series(x.split()[::-1]))
There are several things going on here:
1.) First, lambdas are basically impromptu (anonymous) functions. In this case, it's an unnamed function that takes the argument x and returns pd.Series(x.split()[::-1]). More on x later.
2.) pd.Series(...), as you know, creates a pandas Series object much like your original data.
3.) x.split() splits the string x into a list, using whitespace as the separator by default.
4.) The [::-1] bit is a slice. Much like range(), it takes 3 params, [start:end:step]. In this case, it's saying to take the list from start to end, but with a step of -1, i.e. in reverse. Note that in a slice all three params are optional (unlike range(), where only the end param is mandatory).
5.) The main function here is apply() on your df['Car_Make'] series, which is essentially a list of strings. apply() takes a function (much like map()) and applies it to each element of the df['Car_Make'] series. In this case, it's applying the lambda, which receives each value in your series as its argument x.
6.) Putting everything back together, the statement does the following:
- Each string in df['Car_Make'] is passed as x to the lambda.
- The lambda then calls x.split() to split the string into a list.
- [::-1] reverses that list.
- pd.Series() converts the reversed list into a Series object.
- That Series object is returned by the lambda to your apply() call.
- apply() then combines the returned Series objects into a DataFrame, one row per original string, which conveniently holds the words of each string in reverse order.
If all you care about is the very last piece of the split, though, you really don't need to reverse anything. You could do the following instead, and it would return the last item of the split right away:
df['Car_Make'].apply(lambda x: pd.Series({'Car_Make': x.split()[-1]}))
Car_Make
0 ManualConvertible
1 AutoConvertible
2 ManualConvertible
3 AutoConvertible
4 ManualHatch
5 AutoHatch
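As a possible alternative (not what the snippet above does), the same last-word result can usually be had without apply() by using the .str accessor. This is only a sketch on made-up sample data; the variable name last_word is illustrative.

```python
import pandas as pd

# Hypothetical sample data in the same shape as the question's column.
df = pd.DataFrame({
    "Car_Make": [
        "2017 Abarth 124 Spider ManualConvertible",
        "2017 Abarth 124 Spider AutoConvertible",
        "2017 Abarth 595 ManualHatch",
        "2017 Abarth 595 AutoHatch",
    ]
})

# .str.split() splits every string on whitespace into a list;
# .str[-1] then picks the last element of each list.
last_word = df["Car_Make"].str.split().str[-1]
print(last_word)
```

The result is a plain Series of strings rather than a one-column DataFrame, so wrap it with to_frame() if a DataFrame is needed.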
Thank you for asking this question; I learned a few things about pandas while writing this answer as well.
Here's a shot at your three questions:
1) Why does df['Car_Make'].apply(lambda x: pd.Series(x.split()[::-1])) work?
Break it down:
1. df['Car_Make'] - the column with the data you want to operate on.
2. apply() - a pandas DataFrame and Series method that applies a function to every column or every row of a DataFrame, or to every value in a Series.
3. lambda x: - the function that will be applied by the .apply() method to every row of the Series. x represents the record object, which in your case is the string containing the Car_Make entry.
4. pd.Series() - this converts the value inside it into a pandas Series.
5. x.split() - As mentioned in point 3, x is your string object, and split() is a string method that, when called with no parameters, splits the string on whitespace and returns the pieces as a list.
6. [::-1] - A handy slice that reverses a list, such as the one returned by x.split(). The syntax for slicing is [start_index:end_index:step]. Using a step of -1 walks through the list backwards.
Put that all together, and that code is iterating through every record in df['Car_Make'], splitting each one, reversing the order of the split items, and returning the reversed list as a pandas Series object.
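To make the breakdown concrete, here is a small sketch that runs the same steps by hand on a single value. The sample string is taken from the output shown earlier in the thread; the variable names are made up for illustration.

```python
import pandas as pd

# One value from the column, standing in for the lambda's argument x.
x = "2017 Abarth 595 ManualHatch"

# Step 1: split on whitespace into a list of words.
words = x.split()

# Step 2: a slice with step -1 returns the list in reverse order.
reversed_words = words[::-1]

# Step 3: wrap the reversed list in a Series (index 0, 1, 2, ...).
row = pd.Series(reversed_words)

print(words)
print(reversed_words)
print(row)
```

apply() simply repeats these three steps for every string in the column.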
2) Replicating that with a defined function.
You are really close; the only changes are that the function needs to take a single row/record as its argument, and that it needs to be passed to the .apply() method. What you want to replace is the lambda x: part, not the way it is applied.
Using what you have so far:
def f(x):
    return pd.Series(x.split()[::-1])

df['Car_Make'].apply(f)
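If you want to convince yourself that the named function and the lambda are interchangeable, a quick check along these lines should do it. The sample data is made up, and by_lambda / by_def are just illustrative names.

```python
import pandas as pd

# Hypothetical sample data in the same shape as the question's column.
df = pd.DataFrame({
    "Car_Make": [
        "2017 Abarth 124 Spider ManualConvertible",
        "2017 Abarth 595 AutoHatch",
    ]
})

def f(x):
    # Same body as the lambda: split, reverse, wrap in a Series.
    return pd.Series(x.split()[::-1])

by_lambda = df["Car_Make"].apply(lambda x: pd.Series(x.split()[::-1]))
by_def = df["Car_Make"].apply(f)

# equals() treats NaNs in matching positions as equal, which matters here
# because the rows contain different numbers of words.
print(by_lambda.equals(by_def))
print(by_def)
```

Column 0 of the result holds the last word of each string, since the word lists were reversed before being turned into Series.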
3) Is there a better way?
If you want to split a string and then reverse the order of the items, no, this is a great way. If you only want to split off part of a string, working from the right, then rsplit() is a good method.
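For reference, this is roughly how rsplit() behaves on a plain Python string when the number of splits is limited. The sample string comes from the question's data; head and tail are illustrative names.

```python
text = "2017 Abarth 124 Spider ManualConvertible"

# At most one split, counted from the right-hand end of the string.
head, tail = text.rsplit(" ", 1)
print(head)  # 2017 Abarth 124 Spider
print(tail)  # ManualConvertible

# For comparison, split() with the same limit works from the left.
left_split = text.split(" ", 1)
print(left_split)  # ['2017', 'Abarth 124 Spider ManualConvertible']
```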