How do you parse and match the keyword in a search engine URL using the Python re module?

 ̄綄美尐妖づ submitted on 2019-12-12 05:29:42

Question


Example from Google:

http://www.google.com.co/url?sa=t&rct=j&q=pedro%20gomez%20proyecto%20en%20la%20ciudad%20de%20valledupar&source=web&cd=10&ved=0CFsQFjAJ&url=http%3A%2F%2Fwww.21molino.com%2F1410%2F8911.html

or from Bing search:

http://www.bing.com/search?q=10%2F30+Sand&src=IE-SearchBox&FORM=IE8SRC

I want to parse the URL and match the ?q= or q= keyword, using a lookbehind such as (?<=) with the Python re module. How can I handle the multiple parameters and decode the percent-encoded ASCII URL to UTF-8 so that it can be read?

Need some help here, thanks very much : )


Answer 1:


Try this:

[?&]q=([^&#]*)
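As a minimal sketch of how that pattern might be used, assuming Python 3: the helper name extract_query and the constant Q_PATTERN are made up for illustration, and unquote_plus from urllib.parse is one way to undo the %-escapes, since the regex alone returns the raw encoded text.

```python
import re
from urllib.parse import unquote_plus

# Capture everything after "?q=" or "&q=" up to the next "&" or "#".
Q_PATTERN = re.compile(r"[?&]q=([^&#]*)")


def extract_query(url):
    """Hypothetical helper: return the decoded q parameter, or None."""
    match = Q_PATTERN.search(url)
    if match is None:
        return None
    # unquote_plus turns %XX escapes into characters and "+" into spaces.
    return unquote_plus(match.group(1))


bing_url = "http://www.bing.com/search?q=10%2F30+Sand&src=IE-SearchBox&FORM=IE8SRC"
print(extract_query(bing_url))  # 10/30 Sand
```

Requiring [?&] in front of q= keeps the pattern from matching parameters that merely end in q, such as aq=. No lookbehind is needed because the capture group already isolates the value.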

Or, better yet:

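# Python 2 module; on Python 3 use "from urllib import parse as urlparse"
import urlparse
pr = urlparse.urlparse(url)  # url is the search-engine URL string
qs = urlparse.parse_qs(pr.query)['q']  # list of decoded values for q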

The latter automatically decodes %-escapes, too.
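For reference, a rough Python 3 equivalent might look like the sketch below. The function name search_terms is hypothetical. In Python 3 the same functions live in urllib.parse, and parse_qs decodes percent-escapes as UTF-8 by default, which covers the UTF-8 part of the question.

```python
from urllib.parse import urlparse, parse_qs


def search_terms(url):
    """Hypothetical helper: return the decoded q values, or [] if absent."""
    query = urlparse(url).query
    # parse_qs maps each parameter name to a list of decoded values.
    return parse_qs(query).get("q", [])


google_url = (
    "http://www.google.com.co/url?sa=t&rct=j"
    "&q=pedro%20gomez%20proyecto%20en%20la%20ciudad%20de%20valledupar"
    "&source=web&cd=10&ved=0CFsQFjAJ"
    "&url=http%3A%2F%2Fwww.21molino.com%2F1410%2F8911.html"
)
print(search_terms(google_url))
# ['pedro gomez proyecto en la ciudad de valledupar']

# Non-ASCII text arrives as UTF-8 percent-escapes and is decoded the same way.
print(search_terms("http://www.bing.com/search?q=caf%C3%A9"))  # ['café']
```

Using .get("q", []) rather than ['q'] avoids a KeyError on URLs that carry no q parameter.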



Source: https://stackoverflow.com/questions/12831537/how-do-you-parse-and-match-the-keyword-in-search-engine-url-using-python-re-modu
