Slicing URL with Python

Backend · Unresolved · 10 replies · 1110 views
予麋鹿 2020-12-15 01:17

I am working with a huge list of URLs. Just a quick question: I am trying to slice a part of each URL out, see below:

http://www.domainname.com/page?CONTEN         


        
10 Answers
  • 2020-12-15 01:37

    An ancient question, but still, I'd like to remark that query string parameters can also be separated by ';', not only '&'.
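
    A minimal sketch of handling both separators with the standard re module; the query string here is an illustrative example, not taken from the question:

    ```python
    import re

    query = "CONTENT_ITEM_ID=1234;other_value=xx&param3=y"  # assumed example input
    # Split on either '&' or ';' in one pass
    pairs = re.split(r"[&;]", query)
    ```

    Note that since Python 3.9.2, urllib.parse.parse_qs/parse_qsl default to '&' only and take a single `separator` argument, so a manual split like this is one way to accept both.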

  • 2020-12-15 01:39

    Look at the urllib2 file name question for some discussion of this topic.

    Also see the "Python Find Question" question.

  • 2020-12-15 01:41

    Besides urlparse there is also furl, which IMHO has a nicer API.

  • 2020-12-15 01:46

    Use the urlparse module (urllib.parse in Python 3). Check this function:

    import urllib.parse as urlparse  # on Python 2: import urlparse
    
    def process_url(url, keep_params=('CONTENT_ITEM_ID=',)):
        # Split the URL into (scheme, netloc, path, query, fragment)
        parsed = urlparse.urlsplit(url)
        # Keep only the query items that start with one of keep_params
        filtered_query = '&'.join(
            qry_item
            for qry_item in parsed.query.split('&')
            if qry_item.startswith(keep_params))
        # Reassemble the URL with the filtered query string
        return urlparse.urlunsplit(parsed[:3] + (filtered_query,) + parsed[4:])
    

    In your example:

    >>> a = 'http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2=xyz'
    >>> process_url(a)
    'http://www.domainname.com/page?CONTENT_ITEM_ID=1234'
    

    This function has the added bonus that it's easier to use if you later decide you want to keep some more query parameters, or if the order of the parameters is not fixed, as in:

    >>> url='http://www.domainname.com/page?other_value=xx&param3&CONTENT_ITEM_ID=1234&param1'
    >>> process_url(url, ('CONTENT_ITEM_ID', 'other_value'))
    'http://www.domainname.com/page?other_value=xx&CONTENT_ITEM_ID=1234'
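
    The same filtering can also be sketched with the standard parse_qsl/urlencode helpers, which decode each name=value pair so you can match on the parameter name exactly rather than on a string prefix. This is an alternative sketch, not the original answer's code; the function name is illustrative:

    ```python
    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    def keep_query_params(url, keep=('CONTENT_ITEM_ID',)):
        parts = urlsplit(url)
        # parse_qsl yields (name, value) tuples; keep only the wanted names
        kept = [(k, v) for k, v in parse_qsl(parts.query) if k in keep]
        # _replace works because SplitResult is a namedtuple
        return urlunsplit(parts._replace(query=urlencode(kept)))

    url = 'http://www.domainname.com/page?other_value=xx&CONTENT_ITEM_ID=1234&junk=1'
    print(keep_query_params(url, ('CONTENT_ITEM_ID', 'other_value')))
    ```

    One caveat: parse_qsl silently drops bare parameters with no '=' (like param3 above) unless you pass keep_blank_values appropriately, so the prefix-matching version and this one are not identical for such inputs.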
    