html-parsing | 易学教程

Find elements which have a specific child with BeautifulSoup

Question: With BeautifulSoup, how do I access an `<li>` that has a specific `<div>` as a child? For example, how do I get the text (i.e. info@blah.com) of the `<li>` whose child `<div>` is Email?

```html
<li> <div>Country</div> Germany </li>
<li> <div>Email</div> info@blah.com </li>
```

I tried to do it manually: looping over all `<li>` elements and, for each of them, looping again over all child `<div>` elements to check whether the text is Email, and so on. But I'm sure there is a cleverer way to do this with BeautifulSoup.

Answer 1: There are multiple ways to approach the
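
A minimal sketch of one possible approach, assuming the beautifulsoup4 package is installed: instead of looping over every `<li>`, search directly for the `<div>` whose text is the label and step to the text node that follows it. The helper name `value_for_label` is hypothetical, and the code assumes the value is a bare text node right after the `<div>`, as in the example markup.

```python
from typing import Optional

from bs4 import BeautifulSoup

HTML = """
<ul>
  <li> <div>Country</div> Germany </li>
  <li> <div>Email</div> info@blah.com </li>
</ul>
"""


def value_for_label(html: str, label: str) -> Optional[str]:
    """Return the text that follows <div>label</div> inside its <li> (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # string= matches a <div> whose only text is exactly `label`.
    label_div = soup.find("div", string=label)
    if label_div is None or label_div.find_parent("li") is None:
        return None
    # Assumption: the value is the text node right after the <div>, e.g. " info@blah.com ".
    sibling = label_div.next_sibling
    return sibling.strip() if isinstance(sibling, str) else None


print(value_for_label(HTML, "Email"))  # info@blah.com
```

If the value could be wrapped in another tag, `label_div.find_parent("li").get_text(" ", strip=True)` would return the whole item text (label included) instead.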

Beautiful Soup 4: How to replace a tag with text and another tag?

Question: I want to replace a tag with another tag and put the contents of the old tag before the new one. For example, I want to change this:

```html
<html>
<body>
<p>This is the <span id="1">first</span> paragraph</p>
<p>This is the <span id="2">second</span> paragraph</p>
</body>
</html>
```

into this:

```html
<html>
<body>
<p>This is the first<sup>1</sup> paragraph</p>
<p>This is the second<sup>2</sup> paragraph</p>
</body>
</html>
```

I can easily find all spans with `find_all()`, get the number from the `id` attribute and
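
A minimal sketch, assuming beautifulsoup4 is installed: put a new `<sup>` tag (holding the span's `id`) right after each span, then call `unwrap()` so the span is replaced by its own contents. The function name `spans_to_footnotes` is hypothetical, and the code assumes every span that carries an `id` should be converted.

```python
from bs4 import BeautifulSoup

HTML = """<html><body>
<p>This is the <span id="1">first</span> paragraph</p>
<p>This is the <span id="2">second</span> paragraph</p>
</body></html>"""


def spans_to_footnotes(html: str) -> BeautifulSoup:
    soup = BeautifulSoup(html, "html.parser")
    # list() so the loop does not iterate over a tree it is mutating.
    for span in list(soup.find_all("span", id=True)):
        sup = soup.new_tag("sup")
        sup.string = span["id"]  # the footnote number comes from the id attribute
        span.insert_after(sup)   # ... <span id="1">first</span><sup>1</sup> ...
        span.unwrap()            # ... first<sup>1</sup> ...
    return soup


for p in spans_to_footnotes(HTML).find_all("p"):
    print(p)
# <p>This is the first<sup>1</sup> paragraph</p>
# <p>This is the second<sup>2</sup> paragraph</p>
```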

Django: Parse HTML (containing form) to dictionary

Question: I create an HTML form on the server side.

```html
<form action="." method="POST">
  <input type="text" name="foo" value="bar">
  <textarea name="area">long text</textarea>
  <select name="your-choice">
    <option value="a" selected>A</option>
    <option value="b">B</option>
  </select>
</form>
```

Desired result:

```python
{
    "foo": "bar",
    "area": "long text",
    "your-choice": "a",
}
```

The method (`parse_form()`) I am looking for could be used like this:

```python
response = client.get('/foo/')
# response contains <form> ...</form>
data =
```
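
A minimal sketch of such a `parse_form()` using only the standard-library `html.parser` module; it does not rely on any Django helper, and the internal `_FormParser` class is a hypothetical name. It covers exactly the field types in the example: checkboxes, radio buttons, multiple selects, disabled fields and `<option>` elements without a `value` attribute are deliberately not handled.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class _FormParser(HTMLParser):
    """Collect name -> value pairs from <input>, <textarea> and <select> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.data: Dict[str, str] = {}
        self._textarea: Optional[str] = None  # name of the <textarea> being read
        self._select: Optional[str] = None    # name of the <select> being read

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # valueless attributes such as `selected` map to None
        if tag == "input" and a.get("name"):
            self.data[a["name"]] = a.get("value") or ""
        elif tag == "textarea" and a.get("name"):
            self._textarea = a["name"]
            self.data[self._textarea] = ""
        elif tag == "select" and a.get("name"):
            self._select = a["name"]
        elif tag == "option" and self._select is not None:
            # Like a browser, fall back to the first option when none is selected.
            if "selected" in a or self._select not in self.data:
                self.data[self._select] = a.get("value") or ""

    def handle_data(self, data):
        if self._textarea is not None:
            self.data[self._textarea] += data

    def handle_endtag(self, tag):
        if tag == "textarea":
            self._textarea = None
        elif tag == "select":
            self._select = None


def parse_form(html: str) -> Dict[str, str]:
    parser = _FormParser()
    parser.feed(html)
    parser.close()
    return parser.data


FORM_HTML = """<form action="." method="POST">
<input type="text" name="foo" value="bar">
<textarea name="area">long text</textarea>
<select name="your-choice">
<option value="a" selected>A</option>
<option value="b">B</option>
</select>
</form>"""

print(parse_form(FORM_HTML))
# {'foo': 'bar', 'area': 'long text', 'your-choice': 'a'}
```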

Parsing an HTML Document with python

Question: I am totally new to Python and I am trying to parse an HTML document to remove the tags. I just want to keep the title and the body of a page from a newspaper website that I previously downloaded to my computer. I am using the HTMLParser class I found in the documentation, but I don't know how to use it very well, and I don't understand the language very well :( This is my code:

```python
# import the HTMLParser class
from html.parser import HTMLParser

class HTMLCleaner(HTMLParser):
    container = ""
    def handle_data
```
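
A minimal sketch of how an `HTMLParser` subclass can keep only the text: the parser calls `handle_starttag`, `handle_endtag` and `handle_data` as it walks the document, so the subclass tracks whether it is inside `<title>` or `<body>` and skips the contents of `<script>` and `<style>`. The class name `TextExtractor` and the sample page are hypothetical; for a saved page you would read the file (for example with `open(path, encoding="utf-8")`, assuming the file is UTF-8) and pass its contents to `feed()`.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Keep the <title> text and the visible text of <body>, dropping every tag."""

    SKIP = {"script", "style"}  # elements whose contents are not readable text

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.body_parts: List[str] = []
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_body and not self._skip_depth:
            text = data.strip()
            if text:  # ignore whitespace-only runs between tags
                self.body_parts.append(text)


PAGE = """<html><head><title>Headline</title><style>p {color: red}</style></head>
<body><h1>Headline</h1><p>First paragraph.</p><script>var x = 1;</script><p>Second.</p></body></html>"""

parser = TextExtractor()
parser.feed(PAGE)
parser.close()
print(parser.title)  # Headline
print("\n".join(parser.body_parts))
# Headline
# First paragraph.
# Second.
```

Keeping the collected text in instance attributes set in `__init__` gives every parser object its own, independent state.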

Using beautifulsoup to parse string efficiently

Question: I am trying to parse this HTML to get the item title (e.g. Big Boss Air Fryer - Healthy 1300-Watt Super Sized 16-Quart, Fryer 5 Colors -NEW):

```html
<div style="" class="">
  <h1 class="it-ttl" itemprop="name" id="itemTitle"><span class="g-hdn">Details about </span>Big Boss Air Fryer - Healthy 1300-Watt Super Sized 16-Quart, Fryer 5 Colors -NEW</h1>
  <h2 id="subTitle" class="it-sttl"> Brand New + Free Shipping, Satisfaction Guaranteed! </h2>
  <!-- DO NOT change linkToTagId="rwid" as the catalog response
```
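
A minimal sketch, assuming beautifulsoup4 is installed: a `SoupStrainer` passed as `parse_only` builds a tree for the title `<h1>` only, and `find_all(string=True, recursive=False)` keeps the heading's own text while skipping the hidden "Details about " prefix inside the child `<span>`. The function name `item_title` is hypothetical, and the sample markup is the question's snippet with the truncated comment dropped and the `<div>` closed. Note that `parse_only` does not work with the html5lib parser.

```python
from bs4 import BeautifulSoup, SoupStrainer

HTML = """<div style="" class="">
<h1 class="it-ttl" itemprop="name" id="itemTitle"><span class="g-hdn">Details about </span>Big Boss Air Fryer - Healthy 1300-Watt Super Sized 16-Quart, Fryer 5 Colors -NEW</h1>
<h2 id="subTitle" class="it-sttl"> Brand New + Free Shipping, Satisfaction Guaranteed! </h2>
</div>"""


def item_title(html: str) -> str:
    # Only the <h1 id="itemTitle"> element (and its children) ends up in the tree.
    only_title = SoupStrainer("h1", attrs={"id": "itemTitle"})
    soup = BeautifulSoup(html, "html.parser", parse_only=only_title)
    h1 = soup.find("h1")
    if h1 is None:
        return ""
    # recursive=False returns the h1's direct text nodes, so the text inside
    # <span class="g-hdn"> is left out.
    return "".join(h1.find_all(string=True, recursive=False)).strip()


print(item_title(HTML))
```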
