Question
I'm trying to extract prices from a website.
The code I've written can do that, but when a listing also shows the old price, it returns None instead of the price as a string.
Here is the HTML of a price without an old price (my code returns this one as a string):
<div class="xl-price rangePrice">
535.000 €
</div>
And here is the HTML of a price WITH an old price (my code returns None for this one):
<div class="xl-price rangePrice">
487.000 €
<span class="old-price">497.000 €<br></span>
</div>
The page I'm trying to extract prices from: pagelink
My code:
prices = []
for items in soup.find_all("div", {"class": "xl-price rangePrice"}):
    prices.append(items.string)
print(prices)
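The None comes from how `.string` works: BeautifulSoup only returns a value when the tag has exactly one child, and the div with an old price has three (the price text, the `<span>`, and a trailing newline). Below is a minimal offline reproduction. It assumes the live markup matches the two snippets above, which are inlined here instead of fetching the page. It also shows why `get_text()` on its own is not a fix: it merges both prices.

```python
from bs4 import BeautifulSoup

# The question's two snippets, inlined so no request to the live page is needed
html = """
<div class="xl-price rangePrice">
535.000 €
</div>
<div class="xl-price rangePrice">
487.000 €
<span class="old-price">497.000 €<br></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
divs = soup.find_all("div", {"class": "xl-price rangePrice"})

# .string needs exactly one child; the second div has three, so it gives None
print([div.string for div in divs])

# get_text() returns something for both divs, but it merges the old price in
print([div.get_text(" ", strip=True) for div in divs])  # → ['535.000 €', '487.000 € 497.000 €']
```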
Another issue I'm having is that it returns the values with all the surrounding whitespace, like this:
'\r\n\t\t\t\t\t\t\t\t298.000 € \r\n\t\t\t\t\t\t\t', '\r\n\t\t\t\t\t\t\t\t145.000 € \r\n\t\t\t\t\t\t\t'
when I only want the numbers.
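For the whitespace, `str.strip()` removes the tabs and line breaks. To get an actual number, the thousands separators also have to go. The sketch below uses a hypothetical helper, `parse_price`, and assumes European formatting where `.` groups thousands (so `298.000` means 298000), as the prices in the question suggest:

```python
import re


def parse_price(raw):
    """Hypothetical helper: turn a scraped price string into an int.

    Assumes "." is a thousands separator (298.000 € = 298000).
    Returns None if no number is found.
    """
    match = re.search(r"\d{1,3}(?:\.\d{3})*", raw)
    if match is None:
        return None
    # Remove the thousands separators before converting
    return int(match.group().replace(".", ""))


raw_values = [
    "\r\n\t\t\t\t\t\t\t\t298.000 € \r\n\t\t\t\t\t\t\t",
    "\r\n\t\t\t\t\t\t\t\t145.000 € \r\n\t\t\t\t\t\t\t",
]
print([raw.strip() for raw in raw_values])       # → ['298.000 €', '145.000 €']
print([parse_price(raw) for raw in raw_values])  # → [298000, 145000]
```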
Would appreciate the help!
Answer 1:
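import requests
from bs4 import BeautifulSoup

r = requests.get(
    'https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000')
soup = BeautifulSoup(r.text, 'html.parser')
for item in soup.find_all('div', attrs={'class': 'xl-price rangePrice'}):
    # contents[0] is the div's first child: the text node holding the
    # current price, which comes before any <span class="old-price">
    item = item.contents[0]
    # strip the surrounding whitespace, then slice off the trailing "€"
    print(item.strip()[0:-1])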
Output:
298.000
145.000
275.000
535.000
487.000
159.000
325.000
189.000
139.000
499.000
520.000
249.500
448.000
215.000
225.000
210.000
215.000
218.000
232.000
689.000
228.000
299.500
169.000
135.000
549.000
125.000
160.000
395.000
430.000
210.000
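Note that the `[0:-1]` slice above removes only the last character. The space before the euro sign is therefore kept, even though it doesn't show in the printed output. Here is a small comparison on one of the raw strings from the question:

```python
raw = "\r\n\t\t\t\t\t\t\t\t298.000 € \r\n\t\t\t\t\t\t\t"

sliced = raw.strip()[0:-1]                 # drops only the last character, "€"
cleaned = raw.strip().rstrip("€").strip()  # drops "€" and the space before it

print(repr(sliced))   # → '298.000 '
print(repr(cleaned))  # → '298.000'
```

Either way the result is still text. Calling `float()` on it would read `298.000` as 298.0, so remove the `.` separators first if you need the numeric value.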
Answer 2:
Here is some sample code for your question.
import re
import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000")
soup = BeautifulSoup(page.content, 'html.parser')
prices = []
for items in soup.find_all("div", {"class": "xl-price rangePrice"}):
    if items.string:
        # Only one child (the plain price text): take the number from it
        result = re.findall(r'\d+\.\d+', items.string)
        prices.append(result[0])
    else:
        # A nested <span class="old-price"> makes .string None, so check the
        # div's children in order; the first text with a number is the current price
        for item in items.children:
            if item.string:
                result = re.findall(r'\d+\.\d+', item.string)
                if len(result) > 0:
                    prices.append(result[0])
                    break
print(prices)
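One caveat applies to a pattern like `\d+.\d+` (or its escaped form `\d+\.\d+`). It allows only one separator, so a price of a million euros or more is cut short. The sketch below assumes `.` is always the thousands separator, and the sample string is made up for illustration:

```python
import re

text = "\r\n\t\t1.250.000 € \r\n\t"

# One separator only: the match stops after the first thousands group
print(re.findall(r"\d+\.\d+", text))  # → ['1.250']

# Any number of ".ddd" groups after the leading digits
print(re.findall(r"\d{1,3}(?:\.\d{3})*", text))  # → ['1.250.000']
```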
Answer 3:
I don’t have access to a computer right now, so consider this quasi-pseudocode:
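for div_elem in soup.find_all('div', attrs={'class': 'xl-price rangePrice'}):
    # first text node directly inside the div = the current price
    new_price = div_elem.find(string=True, recursive=False).strip()
    find_res = div_elem.find('span', attrs={'class': 'old-price'})
    if find_res:
        # the old price, when present, sits in the nested span
        old_price = find_res.get_text(strip=True)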
I tried to keep things as easy to understand as possible.
Let me know if you have any questions :)
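Answer 3's approach can be checked offline against the question's two snippets. They are inlined below, on the assumption that the live page uses the same markup. This sketch collects each listing as a `(current, old)` pair, with `None` when there is no old price:

```python
from bs4 import BeautifulSoup

# The question's sample markup, inlined instead of fetching the live page
html = """
<div class="xl-price rangePrice">
535.000 €
</div>
<div class="xl-price rangePrice">
487.000 €
<span class="old-price">497.000 €<br></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
pairs = []
for div_elem in soup.find_all("div", attrs={"class": "xl-price rangePrice"}):
    # recursive=False: only text nodes that are direct children of the div
    new_price = div_elem.find(string=True, recursive=False).strip()
    old_span = div_elem.find("span", attrs={"class": "old-price"})
    # get_text(strip=True) joins the span's strings and trims whitespace
    old_price = old_span.get_text(strip=True) if old_span else None
    pairs.append((new_price, old_price))

print(pairs)  # → [('535.000 €', None), ('487.000 €', '497.000 €')]
```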
Source: https://stackoverflow.com/questions/59123337/how-can-i-get-the-first-string-from-a-div-that-has-a-div-embedded-beautifulsoup4