html-parsing

Find elements which have a specific child with BeautifulSoup

拈花ヽ惹草 submitted on 2021-01-23 04:49:46
Question: With BeautifulSoup, how can I access an <li> that has a specific <div> as a child? Example: how do I get the text (i.e. info@blah.com) of the <li> whose child <div> contains Email? <li> <div>Country</div> Germany </li> <li> <div>Email</div> info@blah.com </li> I tried to do it manually, looping over all <li> elements and, for each of them, looping again over all child <div> elements to check whether the text is Email, but I'm sure there is a cleverer way to do this with BeautifulSoup.
Answer 1: There are multiple ways to approach the
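Since the answer above is cut off, here is one possible approach as a minimal sketch, not necessarily what the original answer proposed. It assumes the label <div> contains only the label text, so that `find(..., string=...)` can match it exactly; the helper name `text_for_label` and the wrapping <ul> are made up for illustration.

```python
from bs4 import BeautifulSoup

html = """
<ul>
  <li> <div>Country</div> Germany </li>
  <li> <div>Email</div> info@blah.com </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")


def text_for_label(soup, label):
    """Return the text of the <li> whose child <div> reads exactly `label`.

    Hypothetical helper: returns None when no such <li> exists.
    """
    # Find the <div> by its text instead of looping over every <li>.
    div = soup.find("div", string=label)
    if div is None or div.parent.name != "li":
        return None
    li = div.parent
    # Keep only the text nodes that sit directly inside the <li>,
    # which excludes the label text inside the <div>.
    return "".join(li.find_all(string=True, recursive=False)).strip()


email = text_for_label(soup, "Email")
print(email)  # info@blah.com
```

Searching for the <div> first and then stepping up to its parent avoids the nested loop described in the question.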

Beautiful Soup 4: How to replace a tag with text and another tag?

三世轮回 submitted on 2021-01-21 03:58:06
Question: I want to replace a tag with another tag and put the contents of the old tag before the new one. For example, I want to change this:
<html> <body> <p>This is the <span id="1">first</span> paragraph</p> <p>This is the <span id="2">second</span> paragraph</p> </body> </html>
into this:
<html> <body> <p>This is the first<sup>1</sup> paragraph</p> <p>This is the second<sup>2</sup> paragraph</p> </body> </html>
I can easily find all spans with find_all() , get the number from the id attribute and
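The question is truncated here, so the following is only a sketch of one way to get the desired output with Beautiful Soup 4. It assumes every <span> to convert carries an id attribute, and it uses `new_tag()`, `insert_after()` and `unwrap()`.

```python
from bs4 import BeautifulSoup

html = (
    '<html> <body> '
    '<p>This is the <span id="1">first</span> paragraph</p> '
    '<p>This is the <span id="2">second</span> paragraph</p> '
    '</body> </html>'
)

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like snapshot, so editing the tree while
# iterating over it is safe.
for span in soup.find_all("span", id=True):
    sup = soup.new_tag("sup")
    sup.string = span["id"]   # the number taken from the id attribute
    span.insert_after(sup)    # <span>first</span><sup>1</sup>
    span.unwrap()             # drop the <span> tag but keep its text

result = str(soup)
print(result)
```

Inserting the <sup> first and unwrapping afterwards keeps the old contents in place, directly before the new tag.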

Django: Parse HTML (containing form) to dictionary

前提是你 submitted on 2021-01-16 03:55:01
Question: I create an HTML form on the server side. <form action="." method="POST"> <input type="text" name="foo" value="bar"> <textarea name="area">long text</textarea> <select name="your-choice"> <option value="a" selected>A</option> <option value="b">B</option> </select> </form>
Desired result: { "foo": "bar", "area": "long text", "your-choice": "a", }
The method ( parse_form() ) I am looking for could be used like this: response = client.get('/foo/') # response contains <form> ...</form> data =
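A rough sketch of what such a parse_form() could look like is shown below. The function name comes from the question, but the implementation is an assumption: it uses BeautifulSoup (which the question does not mention), takes the HTML as a string, and only handles plain inputs, textareas and single-choice selects. Checkboxes, radio buttons and multi-selects would need extra rules.

```python
from bs4 import BeautifulSoup

form_html = """
<form action="." method="POST">
  <input type="text" name="foo" value="bar">
  <textarea name="area">long text</textarea>
  <select name="your-choice">
    <option value="a" selected>A</option>
    <option value="b">B</option>
  </select>
</form>
"""


def parse_form(html):
    """Return a dict of field name -> current value for the first <form>."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")
    data = {}
    if form is None:
        return data
    for field in form.find_all(["input", "textarea", "select"]):
        name = field.get("name")
        if not name:
            continue  # fields without a name are never submitted
        if field.name == "input":
            data[name] = field.get("value", "")
        elif field.name == "textarea":
            data[name] = field.get_text()
        else:
            # Use the selected <option>, falling back to the first one.
            option = field.find("option", selected=True) or field.find("option")
            data[name] = option.get("value", option.get_text()) if option else None
    return data


data = parse_form(form_html)
print(data)
```

In a Django test, the input would typically be the decoded response body, for example `parse_form(response.content.decode())`.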

Parsing an HTML Document with Python

喜欢而已 submitted on 2021-01-07 01:31:10
Question: I am totally new to Python and I am trying to parse an HTML document to remove the tags. I just want to keep the title and the body of a newspaper web page that I previously downloaded to my computer. I am using the HTMLParser class I found in the documentation, but I don't know how to use it very well, and I don't understand the language very well yet :( This is my code:
# import the HTMLParser class
from html.parser import HTMLParser
class HTMLCleaner(HTMLParser):
    container = ""
    def handle_data
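The asker's code is cut off above, so what follows is not a completion of it but an independent sketch of how an HTMLParser subclass could keep only the title and the body text. The attribute names (`title`, `body_parts`, `body`) and the sample document are made up for illustration, and the contents of <script> and <style> are assumed to be unwanted.

```python
from html.parser import HTMLParser


class HTMLCleaner(HTMLParser):
    """Collect the <title> text and the visible text inside <body>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.body_parts = []
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        elif self._in_body:
            self.body_parts.append(text)

    @property
    def body(self):
        return " ".join(self.body_parts)


sample = """<html><head><title>Example headline</title></head>
<body><h1>Example headline</h1><p>First paragraph.</p>
<script>var x = 1;</script><p>Second paragraph.</p></body></html>"""

cleaner = HTMLCleaner()
cleaner.feed(sample)
cleaner.close()
print(cleaner.title)
print(cleaner.body)
```

For a downloaded page, the string passed to `feed()` would come from reading the saved file, for example with `open(path, encoding="utf-8").read()`.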

Using beautifulsoup to parse string efficiently

心不动则不痛 submitted on 2021-01-04 07:27:31
Question: I am trying to parse this HTML to get the item title (e.g. Big Boss Air Fryer - Healthy 1300-Watt Super Sized 16-Quart, Fryer 5 Colors -NEW) <div style="" class=""> <h1 class="it-ttl" itemprop="name" id="itemTitle"><span class="g-hdn">Details about  </span>Big Boss Air Fryer - Healthy 1300-Watt Super Sized 16-Quart, Fryer 5 Colors -NEW</h1> <h2 id="subTitle" class="it-sttl"> Brand New + Free Shipping, Satisfaction Guaranteed! </h2> <!-- DO NOT change linkToTagId="rwid" as the catalog response
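One way to pull out just the title, sketched under the assumption that the title is always the text sitting directly inside the <h1 id="itemTitle"> and that the "Details about" prefix always lives in a nested <span>:

```python
from bs4 import BeautifulSoup

html = (
    '<div style="" class="">'
    '<h1 class="it-ttl" itemprop="name" id="itemTitle">'
    '<span class="g-hdn">Details about </span>'
    'Big Boss Air Fryer - Healthy 1300-Watt Super Sized 16-Quart, '
    'Fryer 5 Colors -NEW</h1>'
    '<h2 id="subTitle" class="it-sttl"> Brand New + Free Shipping, '
    'Satisfaction Guaranteed! </h2>'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")
h1 = soup.find("h1", id="itemTitle")

# recursive=False keeps only the text nodes that are direct children of
# the <h1>, so the hidden "Details about" <span> is left out.
title = "".join(h1.find_all(string=True, recursive=False)).strip()
print(title)
```

Looking the heading up by its id is a single lookup, and no string slicing is needed to remove the prefix.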
