I want to extract the "SNG_TITLE" and "ART_NAME" values from the code in a "script" tag using BeautifulSoup in Python (the whole script is too long to paste).
If my understanding is correct, you want only the script element with "SNG_TITLE" in it.
You can use re to select only the script element that contains the fields you are interested in, as follows:
import requests
from bs4 import BeautifulSoup
import re
base_url = 'https://www.deezer.com/en/profile/1589856782/loved'
r = requests.get(base_url)
soup = BeautifulSoup(r.text, 'html.parser')
user_name = soup.find(class_='user-name')
print(user_name.text)
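# Find every text node that mentions SNG_TITLE and print its enclosing <script> tag
# (string= is the current name of the deprecated text= argument)
for script in soup(string=re.compile(r'SNG_TITLE')):
    print(script.parent)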
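As a small offline check of the same idea, the sketch below runs the search against a made-up HTML snippet instead of the live page. The snippet and its values are invented for illustration and are not Deezer's actual markup. Passing `string=` together with the tag name returns the matching script tags themselves, so `.parent` is not needed.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page (invented markup and values)
html = """
<html><body>
<script>var unrelated = 1;</script>
<script>window.__DZR_APP_STATE__ = {"TAB":{"loved":{"data":[{"SNG_TITLE":"Example Song","ART_NAME":"Example Artist"}]}}}</script>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')

# Keep only <script> tags whose text mentions SNG_TITLE
matches = soup.find_all('script', string=re.compile(r'SNG_TITLE'))

for script in matches:
    # Show only the beginning of each matching script
    print(script.string[:60])
```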
EDIT:
@furas's answer is the complete solution: it uses json to extract 'SNG_TITLE' and 'ART_NAME'. My answer only helps you find the script that contains 'SNG_TITLE'. You can combine both to get better code.
The scripts normally keep the same order in the page source, so you can count them and use an index to get the correct script.
all_scripts[6]
A script's content is a normal string, so you can also use standard string operations, e.g.
if '{"loved"' in script.text:
Code with both methods (I use [:100] to display only part of the string):
import requests
from bs4 import BeautifulSoup
base_url = 'https://www.deezer.com/en/profile/1589856782/loved'
r = requests.get(base_url)
soup = BeautifulSoup(r.text, 'html.parser')
all_scripts = soup.find_all('script')
print('--- first method ---')
print(all_scripts[6].text[:100])
print('--- second method ---')
for number, script in enumerate(all_scripts):
    if '{"loved"' in script.text:
        print(number, script.text[:100])
Result:
--- first method ---
window.__DZR_APP_STATE__ = {"TAB":{"loved":{"data":[{"SNG_ID":"126884459","PRODUCT_TRACK_ID":"360276
--- second method ---
6 window.__DZR_APP_STATE__ = {"TAB":{"loved":{"data":[{"SNG_ID":"126884459","PRODUCT_TRACK_ID":"360276
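A fixed index such as 6 may stop working if the site adds or removes a script, so selecting by content is probably the safer of the two methods. Here is a minimal sketch of that idea as a helper. `pick_script` and the sample strings are hypothetical, and the function works on plain strings such as the values of `script.text`.

```python
def pick_script(script_texts, marker):
    """Return (index, text) of the first script text that contains marker."""
    for number, text in enumerate(script_texts):
        if marker in text:
            return number, text
    # Fail loudly instead of silently returning the wrong script
    raise ValueError(f'no script contains {marker!r}')


# Invented sample data standing in for [s.text for s in all_scripts]
sample = [
    'var a = 1;',
    'window.__DZR_APP_STATE__ = {"TAB":{"loved":{"data":[]}}}',
]

number, text = pick_script(sample, '{"loved"')
print(number, text[:40])
```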
EDIT: Once you have the correct script, you can use slicing to keep only the JSON string and the json module to convert it to a Python dictionary, from which you can then read the data.
import requests
from bs4 import BeautifulSoup
import json
base_url = 'https://www.deezer.com/en/profile/1589856782/loved'
r = requests.get(base_url)
soup = BeautifulSoup(r.text, 'html.parser')
all_scripts = soup.find_all('script')
# skip the 27-character prefix 'window.__DZR_APP_STATE__ = ' so only the JSON remains
data = json.loads(all_scripts[6].get_text()[27:])
print('key:', data.keys())
print('key:', data['TAB'].keys())
print('key:', data['DATA'].keys())
print('---')
for item in data['TAB']['loved']['data']:
    print('ART_NAME:', item['ART_NAME'])
    print('SNG_TITLE:', item['SNG_TITLE'])
    print('---')
Result:
key: dict_keys(['TAB', 'DATA'])
key: dict_keys(['loved'])
key: dict_keys(['USER', 'FOLLOW', 'FOLLOWING', 'HAS_BLOCKED', 'IS_BLOCKED', 'IS_PUBLIC', 'CURATOR', 'IS_PERSONNAL', 'NB_FOLLOWER', 'NB_FOLLOWING'])
---
ART_NAME: Twenty One Pilots
SNG_TITLE: Heathens
---
ART_NAME: Twenty One Pilots
SNG_TITLE: Stressed Out
---
ART_NAME: Linkin Park
SNG_TITLE: Numb
---
ART_NAME: Three Days Grace
SNG_TITLE: Animal I Have Become
---
ART_NAME: Three Days Grace
SNG_TITLE: Painkiller
---
ART_NAME: Slipknot
SNG_TITLE: Before I Forget
---
ART_NAME: Slipknot
SNG_TITLE: Duality
---
ART_NAME: Skrillex
SNG_TITLE: Make It Bun Dem
---
ART_NAME: Skrillex
SNG_TITLE: Bangarang (feat. Sirah)
---
ART_NAME: Limp Bizkit
SNG_TITLE: Break Stuff
---
ART_NAME: Three Days Grace
SNG_TITLE: I Hate Everything About You
---
ART_NAME: Three Days Grace
SNG_TITLE: Time of Dying
---
ART_NAME: Three Days Grace
SNG_TITLE: I Am Machine
---
ART_NAME: Three Days Grace
SNG_TITLE: Riot
---
ART_NAME: Three Days Grace
SNG_TITLE: So What
---
ART_NAME: Three Days Grace
SNG_TITLE: Pain
---
ART_NAME: Three Days Grace
SNG_TITLE: Tell Me Why
---
ART_NAME: Three Days Grace
SNG_TITLE: Chalk Outline
---
ART_NAME: Three Days Grace
SNG_TITLE: Gone Forever
---
ART_NAME: Slipknot
SNG_TITLE: The Devil In I
---
ART_NAME: Linkin Park
SNG_TITLE: No More Sorrow
---
ART_NAME: Linkin Park
SNG_TITLE: Bleed It Out
---
ART_NAME: The Doors
SNG_TITLE: Roadhouse Blues
---
ART_NAME: The Doors
SNG_TITLE: Riders On The Storm
---
ART_NAME: The Doors
SNG_TITLE: Break On Through (To The Other Side)
---
ART_NAME: The Doors
SNG_TITLE: Alabama Song (Whisky Bar)
---
ART_NAME: The Doors
SNG_TITLE: People Are Strange
---
ART_NAME: My Chemical Romance
SNG_TITLE: Welcome to the Black Parade
---
ART_NAME: My Chemical Romance
SNG_TITLE: Teenagers
---
ART_NAME: My Chemical Romance
SNG_TITLE: Na Na Na [Na Na Na Na Na Na Na Na Na]
---
ART_NAME: My Chemical Romance
SNG_TITLE: Famous Last Words
---
ART_NAME: The Doors
SNG_TITLE: Soul Kitchen
---
ART_NAME: The Black Keys
SNG_TITLE: Lonely Boy
---
ART_NAME: Katy Perry
SNG_TITLE: I Kissed a Girl
---
ART_NAME: Katy Perry
SNG_TITLE: Hot N Cold
---
ART_NAME: Katy Perry
SNG_TITLE: E.T.
---
ART_NAME: Linkin Park
SNG_TITLE: Given Up
---
ART_NAME: My Chemical Romance
SNG_TITLE: Dead!
---
ART_NAME: My Chemical Romance
SNG_TITLE: Mama
---
ART_NAME: My Chemical Romance
SNG_TITLE: The Sharpest Lives
---
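The `[27:]` slice depends on the exact length of the `window.__DZR_APP_STATE__ = ` prefix. As an alternative sketch, you could start at the first `{` and let `json.JSONDecoder.raw_decode` stop at the end of the JSON object, which also tolerates trailing text such as a semicolon. The sample string below is invented, and `extract_state` is a hypothetical helper name.

```python
import json


def extract_state(script_text):
    """Parse the first JSON object embedded in a script's text."""
    start = script_text.index('{')  # raises ValueError if there is no '{'
    # raw_decode parses one JSON value beginning at `start`
    # and ignores whatever text follows it
    data, _end = json.JSONDecoder().raw_decode(script_text, start)
    return data


# Invented sample standing in for all_scripts[6].get_text()
sample = ('window.__DZR_APP_STATE__ = {"TAB":{"loved":{"data":['
          '{"ART_NAME":"Example Artist","SNG_TITLE":"Example Song"}]}}};')

data = extract_state(sample)
for item in data['TAB']['loved']['data']:
    print('ART_NAME:', item['ART_NAME'])
    print('SNG_TITLE:', item['SNG_TITLE'])
```

This assumes the first `{` in the script opens the JSON object you want. If the script contains other braces before it, search for the variable name first and start from there.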