Remove Part of String Before the Last Forward Slash

终归单人心 2021-02-05 19:04

The program I am currently working on retrieves URLs from a website and puts them into a list. What I want to get is the last section of the URL.

So, if the first elemen

6 Answers
  • 2021-02-05 19:39

    That doesn't need regex.

    import os
    
    for link in link_list:
        file_names.append(os.path.basename(link))
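
One caveat: os.path.basename treats the URL as a plain path, so a query string or fragment stays attached to the result. Below is a minimal sketch of one way around that, assuming the links may carry such suffixes. The helper name url_basename and the sample URL are made up for illustration and are not part of the original answer.

```python
import posixpath
from urllib.parse import urlsplit


def url_basename(link):
    """Return the last path segment of a URL, ignoring query and fragment."""
    # urlsplit separates the path from "?query" and "#fragment".
    path = urlsplit(link).path
    # posixpath always splits on "/", regardless of the host OS.
    return posixpath.basename(path)


# Hypothetical link, not from the original question.
example = "https://docs.python.org/3.4/tutorial/interpreter.html?highlight=args#top"
print(url_basename(example))
```

For links without a query string or fragment, this should give the same result as the os.path.basename loop above.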
    
  • 2021-02-05 19:40

    Have a look at str.rsplit.

    >>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
    >>> s.rsplit('/',1)
    ['https://docs.python.org/3.4/tutorial', 'interpreter.html']
    >>> s.rsplit('/',1)[1]
    'interpreter.html'
    

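    To do it with a regex instead:

    >>> import re
    >>> re.search(r'(.*)/(.*)', s).group(2)
    'interpreter.html'
    

    The second group captures everything between the last / and the end of the string. This works because the first .* is greedy: it consumes as much as it can, so the / in the pattern lines up with the last slash in the URL.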
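
To see why greediness matters here, the following sketch compares the pattern with a lazy variant (.*? in place of .*). The variable names greedy and lazy are only illustrative.

```python
import re

s = 'https://docs.python.org/3.4/tutorial/interpreter.html'

# Greedy: the first group runs up to the last slash.
greedy = re.search(r'(.*)/(.*)', s).group(2)

# Lazy: the first group stops at the first slash instead.
lazy = re.search(r'(.*?)/(.*)', s).group(2)

print(greedy)  # interpreter.html
print(lazy)    # /docs.python.org/3.4/tutorial/interpreter.html
```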

    A small note: link.rpartition('//')[-1] in your code fails because it searches for // rather than /. Drop the extra slash and use link.rpartition('/')[-1].
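
The difference between the two separators can be checked directly. This is a small sketch using the same URL as above; the variable names are illustrative.

```python
link = 'https://docs.python.org/3.4/tutorial/interpreter.html'

# The only '//' is the one after 'https:', so everything after it comes back.
with_double = link.rpartition('//')[-1]

# A single '/' splits at the last slash.
with_single = link.rpartition('/')[-1]

print(with_double)  # docs.python.org/3.4/tutorial/interpreter.html
print(with_single)  # interpreter.html
```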

  • 2021-02-05 19:48

    You can use rpartition():

    >>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
    >>> s.rpartition('/')
    ('https://docs.python.org/3.4/tutorial', '/', 'interpreter.html')
    

    Then take the last element of the 3-tuple that is returned:

    >>> s.rpartition('/')[2]
    'interpreter.html'
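
Applied to a whole list, this might look like the sketch below. The contents of link_list are hypothetical stand-ins for whatever the question's program collects. The third entry shows that a string without any / comes back unchanged.

```python
# Hypothetical stand-in for the list built by the question's program.
link_list = [
    'https://docs.python.org/3.4/tutorial/interpreter.html',
    'https://docs.python.org/3.4/tutorial/index.html',
    'no-slash-here.html',
]

# With no '/', rpartition returns ('', '', s), so [2] is the whole string.
file_names = [link.rpartition('/')[2] for link in link_list]
print(file_names)
```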
    
  • 2021-02-05 19:50

    If you want to use a regex, note that str.replace only matches literal text, so use re.sub instead:

    import re

    for link in link_list:
        file_names.append(re.sub(r'.*/', '', link))
    print(file_names)
    
  • 2021-02-05 19:52

    Just use str.split:

    url = "/some/url/with/a/file.html"
    
    print(url.split("/")[-1])
    
    # Result should be "file.html"
    

    split gives you a list of the strings that were separated by "/". The [-1] index picks the last element of that list, which is what you want.
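
One edge case worth knowing: if the URL ends with a slash, the last element is an empty string. Below is a minimal sketch of a workaround, assuming trailing slashes should be ignored. The helper name last_segment and the second sample path are hypothetical.

```python
def last_segment(url):
    """Return the last non-empty '/'-separated piece of url."""
    # Strip trailing slashes first so the final piece is not ''.
    return url.rstrip("/").split("/")[-1]


plain = "/some/url/with/a/file.html".split("/")[-1]
trailing = "/some/url/with/a/dir/".split("/")[-1]
fixed = last_segment("/some/url/with/a/dir/")

print(repr(plain))     # 'file.html'
print(repr(trailing))  # ''
print(repr(fixed))     # 'dir'
```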

  • 2021-02-05 20:00

    Here's a more general, regex way of doing this:

        >>> import re
        >>> re.sub(r'^.+/([^/]+)$', r'\1', "http://test.org/3/files/interpreter.html")
        'interpreter.html'
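
Since re.sub leaves the input untouched when the pattern does not match, it may help to see how this pattern behaves on other inputs. The sketch below uses the same pattern. The compiled name LAST_SEGMENT and the two extra inputs are illustrative only.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
LAST_SEGMENT = re.compile(r'^.+/([^/]+)$')

with_file = LAST_SEGMENT.sub(r'\1', "http://test.org/3/files/interpreter.html")
# No match (nothing after the last slash), so the string is returned unchanged.
trailing_slash = LAST_SEGMENT.sub(r'\1', "http://test.org/3/files/")
# No match (no slash at all), so the string is returned unchanged.
no_slash = LAST_SEGMENT.sub(r'\1', "interpreter.html")

print(with_file)       # interpreter.html
print(trailing_slash)  # http://test.org/3/files/
print(no_slash)        # interpreter.html
```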
    