Slicing URL with Python

予麋鹿 2020-12-15 01:17

I am working with a huge list of URLs. Just a quick question: I am trying to slice a part of the URL out, see below:

http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3

so that I am left with:

http://www.domainname.com/page?CONTENT_ITEM_ID=1234
10 Answers
  • 2020-12-15 01:27
    import re
    url = 'http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3'
    # Non-greedy match: capture everything up to the first '&'.
    m = re.search('(.*?)&', url)
    print(m.group(1))
    
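    Since the question mentions a huge list of URLs, here is a small sketch (the list below is hypothetical) that compiles the pattern once and falls back to the whole URL when there is no '&':

    import re

    first_param = re.compile(r'(.*?)&')  # everything up to the first '&'
    urls = ['http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3']  # hypothetical list
    for u in urls:
        m = first_param.search(u)
        print(m.group(1) if m else u)  # no '&' at all -> keep the URL unchanged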
  • 2020-12-15 01:31

    I figured it out; below is what I needed to do:

    url = "http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3"
    url = url[: url.find("&")]
    print(url)
    http://www.domainname.com/page?CONTENT_ITEM_ID=1234
    
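    Applied to a whole list of URLs, a minimal sketch could look like the following (the second URL is hypothetical); it only slices when an '&' is actually present, since str.find returns -1 otherwise and would chop off the last character:

    urls = [
        'http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3',
        'http://www.domainname.com/page?CONTENT_ITEM_ID=5678',  # hypothetical: no extra params
    ]
    trimmed = [u[:u.find('&')] if '&' in u else u for u in urls]
    print(trimmed)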
  • 2020-12-15 01:32

    This method isn't dependent on the position of the parameter within the URL string. It could be refined, I'm sure, but it gets the point across.

    url = 'http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3'
    parts = url.split('?')
    # Keep only name=value pairs; bare flags such as param2/param3 have no '='.
    params = dict(p.split('=', 1) for p in parts[1].split('&') if '=' in p)
    new_url = parts[0] + '?CONTENT_ITEM_ID=' + params['CONTENT_ITEM_ID']
    
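    For instance, with the parameters in a different order (a hypothetical variant of the example URL), the same lookup still finds the value:

    url = 'http://www.domainname.com/page?param2&CONTENT_ITEM_ID=1234&param3'
    parts = url.split('?')
    params = dict(p.split('=', 1) for p in parts[1].split('&') if '=' in p)
    print(parts[0] + '?CONTENT_ITEM_ID=' + params['CONTENT_ITEM_ID'])
    # http://www.domainname.com/page?CONTENT_ITEM_ID=1234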
  • 2020-12-15 01:33

    Another option would be to use the split function with '&' as the separator. That way, you'd extract the base URL (with its first parameter still attached) as well as the remaining parameters.

       url = "http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3"
       url.split("&")
    

    returns a list with

      ['http://www.domainname.com/page?CONTENT_ITEM_ID=1234', 'param2', 'param3']
    
  • 2020-12-15 01:35

    The quick and dirty solution is this:

    >>> "http://something.com/page?CONTENT_ITEM_ID=1234&param3".split("&")[0]
    'http://something.com/page?CONTENT_ITEM_ID=1234'
    
  • 2020-12-15 01:37

    Parsing a URL is never as simple as it seems to be; that's why there are the urlparse and urllib modules.

    E.g.:

    import urllib  # Python 2: splitquery lives directly in urllib
    url = "http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3"
    query = urllib.splitquery(url)  # (base, query string)
    result = "?".join((query[0], query[1].split("&")[0]))
    print(result)
    http://www.domainname.com/page?CONTENT_ITEM_ID=1234
    

    This is still not 100% reliable, but much more so than splitting it yourself, because there are a lot of valid URL formats that you and I don't know about and only discover one day in the error logs.
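
    On Python 3, URL handling lives in urllib.parse; a sketch of the same idea (keeping only CONTENT_ITEM_ID, which is an assumption taken from the question) could look like this:

    from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

    url = 'http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3'
    parts = urlsplit(url)
    # parse_qs drops valueless fields such as param2/param3 by default.
    query = parse_qs(parts.query)
    new_query = urlencode({'CONTENT_ITEM_ID': query['CONTENT_ITEM_ID'][0]})
    print(urlunsplit((parts.scheme, parts.netloc, parts.path, new_query, parts.fragment)))
    # http://www.domainname.com/page?CONTENT_ITEM_ID=1234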
