Pandas Split Function in Reverse

遥遥无期 2021-01-15 08:46

I have a pandas DataFrame with a column that looks like this:

    Car_Make
0   2017 Abarth 124 Spider ManualConvertible
1   2017 Abarth 124 Spider AutoConver         


        
3 Answers
  • 2021-01-15 09:12

    Is this what you are looking for:

    df['Car_Make'].str.rsplit(' ', n=1, expand=True)
    # returns:
                            0                  1
    0  2017 Abarth 124 Spider  ManualConvertible
    1  2017 Abarth 124 Spider    AutoConvertible
    2  2017 Abarth 124 Spider  ManualConvertible
    3  2017 Abarth 124 Spider    AutoConvertible
    4         2017 Abarth 595        ManualHatch
    5         2017 Abarth 595          AutoHatch
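
A self-contained sketch of the call above, assuming a small sample frame built by hand. The rows are copied from the output shown; the frame and the column names `model` and `variant` are hypothetical. Recent pandas releases require `n` to be passed by keyword, so it is written as `n=1` here.

```python
import pandas as pd

# Sample rows copied from the output above; the frame itself is an assumption.
df = pd.DataFrame({
    "Car_Make": [
        "2017 Abarth 124 Spider ManualConvertible",
        "2017 Abarth 124 Spider AutoConvertible",
        "2017 Abarth 595 ManualHatch",
        "2017 Abarth 595 AutoHatch",
    ]
})

# Split once, starting from the right: everything before the last space
# goes to column 0, the final token goes to column 1.
parts = df["Car_Make"].str.rsplit(" ", n=1, expand=True)

# Hypothetical column names, chosen only for readability.
parts.columns = ["model", "variant"]
print(parts)
```

Because the split is limited to one cut from the right, names with a different number of words ("124 Spider" vs. "595") still end up in exactly two columns.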
    
  • 2021-01-15 09:28

    The code you're asking about:

    df['Car_Make'].apply(lambda x:pd.Series(x.split()[::-1]))
    

    There are several things going on here:

    1.) First, lambdas are basically impromptu functions. In this case, it's an unnamed function that takes the argument x and returns pd.Series(x.split()[::-1]). More on x later.

    2.) pd.Series(...) as you know creates a pandas Series object much like your original data.

    3.) x.split() splits the string x into a list, using whitespace as the separator by default.

    4.) The [::-1] bit is a slice. Much like range(), it takes three parameters, [start:stop:step]. In this case it says to take the list from start to end with a step of -1, i.e. in reverse. Note that all three parts of a slice are optional (unlike range(), where stop is required).

    5.) The main function here is apply() on your df['Car_Make'] Series, which is essentially a sequence of strings. apply() takes a function (much like map()) and applies it to each element of the Series. In this case it applies the lambda, passing each string as the argument x.

    6.) Putting everything back together. The statement is:

    • apply() passes each string in df['Car_Make'] to the lambda as x.
    • The lambda calls x.split() to split the string into a list.
    • The slice [::-1] reverses the list (it reverses the order; it does not sort).
    • pd.Series() converts the list into a Series object.
    • The lambda returns that Series to apply().
    • apply() collects the Series returned for each row into a DataFrame, which conveniently holds the words of each string in reverse order.
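
The steps above can be traced on a single value. This is a minimal sketch: the first sample string is taken from the question, the two-row frame is made up for illustration, and the variable names are hypothetical.

```python
import pandas as pd

x = "2017 Abarth 124 Spider ManualConvertible"

tokens = x.split()                # split on whitespace -> list of 5 strings
reversed_tokens = tokens[::-1]    # step of -1 walks the list backwards
row = pd.Series(reversed_tokens)  # list positions become the Series index

print(tokens)
print(reversed_tokens)
print(row[0])  # the last word of the original string

# Applying the same lambda to a whole column returns a DataFrame:
# each returned Series becomes one row of the result.
df = pd.DataFrame({"Car_Make": [x, "2017 Abarth 595 ManualHatch"]})
result = df["Car_Make"].apply(lambda s: pd.Series(s.split()[::-1]))
print(result)
```

Rows with fewer words than the longest one are padded with NaN in the trailing columns, which is worth keeping in mind before relying on a fixed column position.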

    If all you care about is the very last split though, you really don't need to do the reverse split and all that. You could easily have done the following and it would have returned the very last item in the split right away:

    df['Car_Make'].apply(lambda x: pd.Series({'Car_Make': x.split()[-1]}))

                Car_Make
    0  ManualConvertible
    1    AutoConvertible
    2  ManualConvertible
    3    AutoConvertible
    4        ManualHatch
    5          AutoHatch
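
As an alternative sketch (not part of the answer above), the `.str` accessor can pick the last word without `apply()`. It assumes every value in the column is a string; the sample frame is made up from two of the rows shown.

```python
import pandas as pd

# Hypothetical two-row frame for illustration.
df = pd.DataFrame({
    "Car_Make": [
        "2017 Abarth 124 Spider ManualConvertible",
        "2017 Abarth 595 AutoHatch",
    ]
})

# .str.split() yields a list per row; .str[-1] indexes into each list
# and keeps only its last element.
last_word = df["Car_Make"].str.split().str[-1]
print(last_word.tolist())
```

The result is a Series rather than a one-column DataFrame; `last_word.to_frame()` converts it if a DataFrame is needed.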
    

    Thank you for asking this question; I learned a few things about pandas while writing this answer as well.

  • 2021-01-15 09:33

    Here's a shot at your three questions:

    1) Why does df['Car_Make'].apply(lambda x:pd.Series(x.split()[::-1])) work?

    Break it down:

    1. df['Car_Make'] - the column with the data you want to operate on
    2. .apply() - a pandas DataFrame and Series method. On a DataFrame it applies a function to every column or every row; on a Series it applies it to every element.
    3. lambda x: - the function that .apply() calls for every element of the Series. x is the element itself, which in your case is the string in Car_Make.
    4. pd.Series() - this will convert the value inside it into a pandas Series.
    5. x.split() - As mentioned in point 3, x is a string, and split() is a string method that, when called with no arguments, splits on whitespace and returns the pieces as a list.
    6. [::-1] - A slice that reverses a list, such as the one returned by x.split(). The slice syntax is [start:stop:step]; a step of -1 walks the list backwards.

    Put that all together, and that code is iterating through every record in df['Car_Make'], splitting them, reversing the order of the split items, and returning the reversed list as a pandas Series object.

    2) Replicating that with a defined function.

    You are really close. The function needs to take a single value (one element of the Series) as its argument, and it needs to be passed to the .apply() method. What you want to replace is the lambda, not the way it is applied.

    Using what you have so far:

    def f(x):
        return pd.Series(x.split()[::-1])
    
    df['Car_Make'].apply(f)
    

    3) Is there a better way?

    If you want to split a string and then reverse the order of the items, no, this is a great way. If you only want to split off a limited number of pieces starting from the right, then rsplit() is a good method.
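
To make the difference concrete, here is a small sketch in plain Python (no pandas needed) that compares `split()` and `rsplit()` when the number of splits is limited to one. The sample string comes from the question.

```python
s = "2017 Abarth 124 Spider ManualConvertible"

# With a limit of 1, split() cuts at the first space from the left...
left_first = s.split(" ", 1)
# ...while rsplit() cuts at the first space from the right.
right_first = s.rsplit(" ", 1)

print(left_first)   # ['2017', 'Abarth 124 Spider ManualConvertible']
print(right_first)  # ['2017 Abarth 124 Spider', 'ManualConvertible']

# Without a limit, both methods return the same list of words.
print(s.split() == s.rsplit())  # True
```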
