Slice/split string Series at various positions

借酒劲吻你 2021-01-18 21:48

I'm looking to split a string Series at different points depending on the length of certain substrings:

In [47]: df = pd.DataFrame(['group9class1', 'grou


        
3 Answers
  • 2021-01-18 22:27

    You can also use zip together with a list comprehension.

    df['group'], df['class'] = zip(
        *[(string[:n], string[n:]) 
          for string, n in zip(df.group_class, split_locations)])
    
    >>> df
          group_class    group    class
    0    group9class1   group9   class1
    1   group10class2  group10   class2
    2  group11class20  group11  class20
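
For reference, a self-contained version of the approach above might look like the following sketch. The question's setup is truncated, so the frame contents are taken from the output shown in this answer, and the split_locations values (6, 7, 7) are an assumption: the length of each groupN prefix.

```python
import pandas as pd

# Assumed setup: reconstructed from the output shown in the answer,
# since the original question's DataFrame definition is cut off.
df = pd.DataFrame(
    {"group_class": ["group9class1", "group10class2", "group11class20"]}
)

# Hypothetical split positions: length of the "groupN" prefix per row.
split_locations = pd.Series([6, 7, 7])

# Pair each string with its split position and slice on both sides.
pairs = [(s[:n], s[n:]) for s, n in zip(df["group_class"], split_locations)]

# zip(*pairs) transposes the list of (group, class) pairs into two tuples.
df["group"], df["class"] = zip(*pairs)

print(df)
```

Because zip pairs values by position, this relies on split_locations being in the same row order as the frame.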
    
  • 2021-01-18 22:33

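    This works: selecting with double brackets (df[['group_class']]) returns a DataFrame, so apply with axis=1 passes each row as a Series whose name attribute is the row's index label. That label can be used to look up the matching entry in the split_locations Series. Note that the result lists the class part first and the group part second: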

    In [119]:    
    df[['group_class']].apply(lambda x: pd.Series([x.str[split_locations[x.name]:].iloc[0], x.str[:split_locations[x.name]].iloc[0]]), axis=1)
    Out[119]:
             0        1
    0   class1   group9
    1   class2  group10
    2  class20  group11
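
A more explicit variant of the row-wise apply, offered as a sketch: split_row is a hypothetical helper name, the split_locations values are assumed (they are not shown in the thread), and the result uses named columns in group, class order rather than the integer columns above.

```python
import pandas as pd

# Assumed data and split positions; neither is fully shown in the thread.
df = pd.DataFrame(
    {"group_class": ["group9class1", "group10class2", "group11class20"]}
)
split_locations = pd.Series([6, 7, 7])


def split_row(row):
    """Split one row's string at the position stored for that row's label."""
    n = split_locations[row.name]  # row.name is the row's index label
    s = row["group_class"]
    return pd.Series({"group": s[:n], "class": s[n:]})


# Double brackets keep a DataFrame, so axis=1 hands each row to split_row.
result = df[["group_class"]].apply(split_row, axis=1)

print(result)
```

Looking up split_locations by label rather than by position means the two objects only need matching index labels, not matching order.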
    

    Or, as @ajcr suggested, you can use str.extract with named groups:

    In [106]:
    
    df['group_class'].str.extract(r'(?P<group>group[0-9]+)(?P<class>class[0-9]+)')
    Out[106]:
         group    class
    0   group9   class1
    1  group10   class2
    2  group11  class20
    

    EDIT

    Regex explanation:

    The regex came from @ajcr (thanks!). It uses str.extract to pull out the capture groups, and each group becomes a new column.

    Here ?P<group> gives the capture group a name, which str.extract uses as the column label. If the name is omitted, the column is labeled with an integer instead.

    The rest should be self-explanatory: group[0-9]+ matches the literal string group followed by one or more digits. The [] denotes a character class, here the range 0-9, and + means one or more. For this data it is equivalent to group\d+, where \d means digit.

    So it could be rewritten as:

    df['group_class'].str.extract(r'(?P<group>group\d+)(?P<class>class\d+)')
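
To illustrate the column-naming point, here is a small sketch comparing named and unnamed capture groups. The sample Series is an assumption, taken from the outputs shown in this thread.

```python
import pandas as pd

# Assumed sample data, matching the values shown in the answers.
s = pd.Series(["group9class1", "group10class2", "group11class20"])

# Named groups: the group names become the column labels.
named = s.str.extract(r"(?P<group>group\d+)(?P<class>class\d+)")

# Unnamed groups: the columns are labeled with integers instead.
unnamed = s.str.extract(r"(group\d+)(class\d+)")

print(list(named.columns))
print(list(unnamed.columns))
```

Both calls extract the same values; only the column labels differ.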
    
  • 2021-01-18 22:40

    Use a regular expression to split the string:

     import re

     regex = re.compile(r"(class)")
     s = "group1class23"
     # Insert a space before "class", then split on that space.
     split_string = re.sub(regex, r" \1", s).split(" ")


    This will return the list:

     ['group1', 'class23']
    

    So to append two new columns to your DataFrame you can do:

    new_cols = [re.sub(regex, " \\1", x).split(" ") for x in df.group_class]
    df['group'], df['class'] = zip(*new_cols)
    

    Which results in:

          group_class    group    class
    0    group9class1   group9   class1
    1   group10class2  group10   class2
    2  group11class20  group11  class20
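
As an alternative sketch (not part of the original answer), re.split with a zero-width lookahead avoids inserting and then splitting on a temporary space, which matters if the strings could themselves contain spaces. This assumes Python 3.7 or later, where re.split can split on empty matches. The names boundary and values are illustrative only.

```python
import re

# Assumed sample values, matching the frame shown in the answers.
values = ["group9class1", "group10class2", "group11class20"]

# Zero-width lookahead: matches the empty position just before "class".
boundary = re.compile(r"(?=class)")

# maxsplit=1 keeps everything after the first boundary in one piece.
parts = [boundary.split(v, maxsplit=1) for v in values]

# Transpose the [group, class] pairs into two tuples, one per new column.
groups, classes = zip(*parts)

print(groups)
print(classes)
```

The two tuples can be assigned to DataFrame columns in the same way as new_cols above.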
    