How to pull out CSS attributes from inline styles with BeautifulSoup

無奈伤痛 2020-12-20 15:01

I have something like this:

 
         


        
1 answer
  •  生来不讨喜
    2020-12-20 15:45

    You've got a couple of options: quick and dirty, or the Right Way. The quick and dirty way (which will break easily if the markup changes) looks like this:

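    >>> from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3; with bs4 use "from bs4 import BeautifulSoup"
    >>> import re
    >>> # Illustrative markup (assumed): any <img> whose inline style contains url(...)
    >>> soup = BeautifulSoup('<img style="background: url(/theRealImage.jpg)" />')
    >>> style = soup.find('img')['style']
    >>> urls = re.findall(r'url\((.*?)\)', style)
    >>> urls
    [u'/theRealImage.jpg']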
    

    Obviously, you'll have to play with that to get it to work with multiple img tags.
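
For multiple img tags, one way to extend the quick and dirty approach is to loop over every tag that has a style attribute. This is only a sketch: it assumes the newer bs4 package rather than BeautifulSoup 3, and the helper name `inline_style_urls` and the sample markup are made up for illustration.

```python
import re

from bs4 import BeautifulSoup

# Matches url(...) and tolerates optional quotes around the address.
URL_RE = re.compile(r"""url\(\s*['"]?(.*?)['"]?\s*\)""")


def inline_style_urls(html):
    """Return every url(...) target found in the style attribute of <img> tags."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # style=True skips <img> tags that have no inline style attribute.
    for img in soup.find_all("img", style=True):
        urls.extend(URL_RE.findall(img["style"]))
    return urls


# Hypothetical sample markup: two styled images and one without a style.
html = (
    '<img style="background: url(/first.jpg) no-repeat" src="a.jpg">'
    '<img src="b.jpg">'
    "<img style=\"background: url('/second.jpg')\" src=\"c.jpg\">"
)
print(inline_style_urls(html))
```

This is still regex on CSS, so it shares the fragility mentioned above; it only removes the single-tag limitation.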

    The Right Way, since I'd feel awful suggesting someone use regex on a CSS string :), uses a CSS parser. cssutils, a library I just found on Google that is available on PyPI, looks like it might do the job.
