Beautiful Soup: extracting a table built from multiple spans

Submitted by 别等时光非礼了梦想 on 2021-01-29 12:31:03

Question


I am currently working on my class assignment. I have to extract the data from the SPECS table from this webpage.

https://www.consumerreports.org/products/drip-coffee-maker/behmor-connected-alexa-enabled-temperature-control-396982/overview/

The data I need is stored as

<h2 class="crux-product-title">Specs</h2>
</div>
</div>
<div class="row">
<div class="col-xs-12">
<div class="product-model-features-specs-item">
<div class="row">
<div class='col-lg-6 col-md-6 col-sm-6 col-xs-12 product-model-features-specs-item-key'>
<span class="crux-body-copy crux-body-copy--small--bold">
Programmable
<span class="product-model-tooltip">
<span class="crux-icons crux-icons-help-information" aria-hidden="true"></span>
<span class="product-model-tooltip-window">
<span class="crux-icons crux-icons-close" aria-hidden="true"></span>
<span class="crux-body-copy crux-body-copy--small--bold">Programmable</span>
<span class="crux-body-copy crux-body-copy--small">Programmable models have a clock and can be set to brew at a specified time.</span>
</span>
</span>
</span>
</div>
<div class="col-lg-6 col-md-6 col-sm-6 col-xs-12 product-model-features-specs-item-value">
<span class='crux-body-copy crux-body-copy--small'>Yes</span>
</div>
</div>
</div>
</div>
</div>
<div class="row">
<div class="col-xs-12">
<div class="product-model-features-specs-item">
<div class="row">
<div class='col-lg-6 col-md-6 col-sm-6 col-xs-12 product-model-features-specs-item-key'>
<span class="crux-body-copy crux-body-copy--small--bold">
Thermal carafe/mug
<span class="product-model-tooltip">
<span class="crux-icons crux-icons-help-information" aria-hidden="true"></span>
<span class="product-model-tooltip-window">
<span class="crux-icons crux-icons-close" aria-hidden="true"></span>
<span class="crux-body-copy crux-body-copy--small--bold">Thermal carafe/mug</span>
<span class="crux-body-copy crux-body-copy--small">Keeps coffee warm for about four hours; thermal mugs don&#039;t hold heat as well.</span>
</span>
</span>
</span>

I need to create lists for the three span classes:

crux-body-copy crux-body-copy--small--bold
crux-body-copy crux-body-copy--small
crux-body-copy crux-body-copy--small

The difficulty in extracting the table is that each row is built from multiple nested spans.

I used Beautiful Soup, calling find_all and find with the span name.

I always got only the first value.

How do I do this?


Answer 1:


I don't know if this will work for you.

from simplified_scrapy import SimplifiedDoc,req,utils
html = ''' ''' # Your html
doc = SimplifiedDoc(html)
spans = doc.selects('span.crux-body-copy crux-body-copy--small--bold')
for span in spans:
    # print (span.firstText())
    print (span.select('span.crux-body-copy crux-body-copy--small--bold').text)
    print (span.select('span.crux-body-copy crux-body-copy--small').unescape())

Result:

Programmable
Programmable models have a clock and can be set to brew at a specified time.
Thermal carafe/mug
Keeps coffee warm for about four hours; thermal mugs don't hold heat as well.
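
Since the question asks about Beautiful Soup specifically, here is a minimal sketch of the same extraction with bs4. It is not the answerer's method. It assumes the live page uses the same class names as the excerpt in the question, and the helper names text_or_none and extract_specs are made up for illustration. The idea is to loop over each spec item and select inside it, rather than calling find on the whole document, which returns only the first match.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question. The second item has no
# value cell because the excerpt in the question is cut off there.
html = """
<div class="product-model-features-specs-item">
  <div class="row">
    <div class="col-xs-12 product-model-features-specs-item-key">
      <span class="crux-body-copy crux-body-copy--small--bold">
        Programmable
        <span class="product-model-tooltip">
          <span class="product-model-tooltip-window">
            <span class="crux-body-copy crux-body-copy--small--bold">Programmable</span>
            <span class="crux-body-copy crux-body-copy--small">Programmable models have a clock and can be set to brew at a specified time.</span>
          </span>
        </span>
      </span>
    </div>
    <div class="col-xs-12 product-model-features-specs-item-value">
      <span class="crux-body-copy crux-body-copy--small">Yes</span>
    </div>
  </div>
</div>
<div class="product-model-features-specs-item">
  <div class="row">
    <div class="col-xs-12 product-model-features-specs-item-key">
      <span class="crux-body-copy crux-body-copy--small--bold">
        Thermal carafe/mug
        <span class="product-model-tooltip">
          <span class="product-model-tooltip-window">
            <span class="crux-body-copy crux-body-copy--small--bold">Thermal carafe/mug</span>
            <span class="crux-body-copy crux-body-copy--small">Keeps coffee warm for about four hours; thermal mugs don&#039;t hold heat as well.</span>
          </span>
        </span>
      </span>
    </div>
  </div>
</div>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def extract_specs(markup):
    """Build three parallel lists: spec names, tooltip descriptions, values."""
    soup = BeautifulSoup(markup, "html.parser")
    names, descriptions, values = [], [], []
    # One iteration per spec row, so every lookup is scoped to that row.
    for item in soup.select("div.product-model-features-specs-item"):
        # The tooltip window repeats the spec name in a bold span ...
        names.append(text_or_none(item.select_one(
            ".product-model-tooltip-window span.crux-body-copy--small--bold")))
        # ... followed by the description in a non-bold span.
        descriptions.append(text_or_none(item.select_one(
            ".product-model-tooltip-window span.crux-body-copy--small")))
        # The value lives in the sibling "value" column of the same row.
        values.append(text_or_none(item.select_one(
            ".product-model-features-specs-item-value span.crux-body-copy--small")))
    return names, descriptions, values


names, descriptions, values = extract_specs(html)
print(names)
print(descriptions)
print(values)
```

The selectors rely on crux-body-copy--small and crux-body-copy--small--bold being separate class tokens, so span.crux-body-copy--small does not match the bold spans. A row with no value cell produces None in the values list instead of raising an error.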


Source: https://stackoverflow.com/questions/60556113/beautiful-soup-multiple-span-extract-table
