I am working with a huge list of URL\'s. Just a quick question I have trying to slice a part of the URL out, see below:
http://www.domainname.com/page?CONTEN
An ancient question, but still, I'd like to remark that query string paramenters can also be separated by ';' not only '&'.
Look at the urllib2 file name question for some discussion of this topic.
Also see the "Python Find Question" question.
beside urlparse there is also furl, which has IMHO better API.
Use the urlparse module. Check this function:
import urlparse
def process_url(url, keep_params=('CONTENT_ITEM_ID=',)):
parsed= urlparse.urlsplit(url)
filtered_query= '&'.join(
qry_item
for qry_item in parsed.query.split('&')
if qry_item.startswith(keep_params))
return urlparse.urlunsplit(parsed[:3] + (filtered_query,) + parsed[4:])
In your example:
>>> process_url(a)
'http://www.domainname.com/page?CONTENT_ITEM_ID=1234'
This function has the added bonus that it's easier to use if you decide that you also want some more query parameters, or if the order of the parameters is not fixed, as in:
>>> url='http://www.domainname.com/page?other_value=xx¶m3&CONTENT_ITEM_ID=1234¶m1'
>>> process_url(url, ('CONTENT_ITEM_ID', 'other_value'))
'http://www.domainname.com/page?other_value=xx&CONTENT_ITEM_ID=1234'