How to pull out CSS attributes from inline styles with BeautifulSoup

無奈伤痛 2020-12-20 15:01

I have something like this:

 


        
1 answer
  • 2020-12-20 15:45

    You've got a couple of options: quick and dirty, or the Right Way. The quick and dirty way (which will break easily if the markup changes) looks like this:

    >>> from bs4 import BeautifulSoup
    >>> import re
    >>> soup = BeautifulSoup('<html><body><img style="background:url(/theRealImage.jpg) no-repeat 0 0; height:90px; width:92px;" src="notTheRealImage.jpg"/></body></html>', 'html.parser')
    >>> style = soup.find('img')['style']
    >>> urls = re.findall(r'url\((.*?)\)', style)
    >>> urls
    ['/theRealImage.jpg']
    

    Obviously, you'll have to play with that to get it to work with multiple img tags: loop over every img tag that has a style attribute and apply the same regex to each one.
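
    A rough sketch of the multiple-tag case, assuming bs4 is installed. The function name background_urls and the sample markup are made up for illustration, and the regex is loosened slightly to tolerate quoted URLs, which the original pattern did not handle:

```python
import re
from bs4 import BeautifulSoup

# Capture whatever sits inside url(...), with optional quotes and padding.
URL_PATTERN = re.compile(r"url\(\s*['\"]?(.*?)['\"]?\s*\)")

def background_urls(html):
    """Collect every url(...) target found in inline styles on img tags."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # style=True skips img tags that have no style attribute at all.
    for img in soup.find_all("img", style=True):
        urls.extend(URL_PATTERN.findall(img["style"]))
    return urls

# Hypothetical sample markup, not taken from the question.
html = """
<html><body>
  <img style="background:url(/first.jpg) no-repeat 0 0;" src="a.jpg"/>
  <img style="background:url('/second.jpg'); height:90px;" src="b.jpg"/>
  <img src="no-style.jpg"/>
</body></html>
"""

print(background_urls(html))
```

    This is still the quick and dirty approach, so it shares the same fragility: it knows nothing about CSS syntax beyond the url(...) pattern.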

    The Right Way, since I'd feel awful suggesting someone use regex on a CSS string :), uses a CSS parser. cssutils, a library I just found on Google and available on PyPI, looks like it might do the job.
