I am trying to scrape data from this public Tableau dashboard. The ineterest is in the time series plotted data. If i click in a spcific state in the map, the time series change
When you click on the map, it triggers a call on :
POST https://public.tableau.com/{vizql_root}/sessions/{session_id}/commands/tabdoc/select
with some form data like the following :
worksheet: map_state_mobile
dashboard: Visão Geral
selection: {"objectIds":[17],"selectionType":"tuples"}
selectOptions: select-options-simple
It has the state index (here 17) and the worksheet name. I've noticed that the sheet name is either map_state_mobile
or map_state (2)
when you click a state.
So, it's necessary to :
The state are sorted alphabetically (reversed) so the method below is not necessary if you are ok with hardcoding them and sort them like this :
['Tocantins', 'Sergipe', 'São Paulo', 'Santa Catarina', 'Roraima', 'Rondônia', 'Rio Grande do Sul', 'Rio Grande do Norte', 'Rio de Janeiro', 'Piauí', 'Pernambuco', 'Paraná', 'Paraíba', 'Pará', 'Minas Gerais', 'Mato Grosso do Sul', 'Mato Grosso', 'Maranhão', 'Goiás', 'Espírito Santo', 'Distrito Federal', 'Ceará', 'Bahia', 'Amazonas', 'Amapá', 'Alagoas', 'Acre']
In other case when we don't want to harcode them (for other tableau usecase), execute the method below :
Extracting the state name list is not straightforward since the data is presented as following :
{
"secondaryInfo": {
"presModelMap": {
"dataDictionary": {
"presModelHolder": {
"genDataDictionaryPresModel": {
"dataSegments": {
"0": {
"dataColumns": []
}
}
}
}
},
"vizData": {
"presModelHolder": {
"genPresModelMapPresModel": {
"presModelMap": {
"map_state (2)": {},
"map_state_mobile": {},
"time_line_BR": {},
"time_line_BR_mobile": {},
"total de casos": {},
"total de mortes": {}
}
}
}
}
}
}
}
My method is to get into "vizData" and into a worksheet inside presModelMap
which has the following structure :
"presModelHolder": {
"genVizDataPresModel": {
"vizColumns": [],
"paneColumnsData": {
"vizDataColumns": [],
"paneColumnsList": []
}
}
}
vizDataColumns
has a collection of object with property localBaseColumnName
. Find the localBaseColumnName
with value [state_name]
with fieldRole
as measure
:
{
"fn": "[federated.124ags61tmhyti14im1010h1elsu].[attr:state_name:nk]",
"fnDisagg": "",
"localBaseColumnName": "[state_name]", <============================= MATCH THIS
"baseColumnName": "[federated.124ags61tmhyti14im1010h1elsu].[state_name]",
"fieldCaption": "ATTR(State Name)",
"formatStrings": [],
"datasourceCaption": "federated.124ags61tmhyti14im1010h1elsu",
"dataType": "cstring",
"aggregation": "attr",
"stringCollation": {
"name": "LEN_RUS_S2",
"charsetId": 0
},
"fieldRole": "measure", <=========================================== MATCH THIS
"isAutoSelect": true,
"paneIndices": [
0 <=========================================== EXTRACT THIS
],
"columnIndices": [
7 <=========================================== EXTRACT THIS
]
}
paneIndices
match the index in the paneColumnsList
array. And columnIndices
match the index in the vizPaneColumns
array. vizPaneColumns
array is located just in the item selected in paneColumnsList
array
From there you get the index to search like this :
[222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248]
In the dataDictionary
object, get the dataValues (like you've extracted in your question) and extract the state name from the range above
Then you get the state list :
['Tocantins', 'Sergipe', 'São Paulo', 'Santa Catarina', 'Roraima', 'Rondônia', 'Rio Grande do Sul', 'Rio Grande do Norte', 'Rio de Janeiro', 'Piauí', 'Pernambuco', 'Paraná', 'Paraíba', 'Pará', 'Minas Gerais', 'Mato Grosso do Sul', 'Mato Grosso', 'Maranhão', 'Goiás', 'Espírito Santo', 'Distrito Federal', 'Ceará', 'Bahia', 'Amazonas', 'Amapá', 'Alagoas', 'Acre']
You just need the worksheet name and the index of the field (state index in the list above)
r = requests.post(f'{data_host}{tableauData["vizql_root"]}/sessions/{tableauData["sessionid"]}/commands/tabdoc/select',
data = {
"worksheet": "map_state (2)",
"dashboard": "Visão Geral",
"selection": json.dumps({
"objectIds":[int(selected_index)],
"selectionType":"tuples"
}),
"selectOptions": "select-options-simple"
})
The code below extract the tableau data, extract the state name with the method above (not necessary if you prefer to hardcode the list), prompt user to enter state index, call the select endpoint and extract the data for this state :
import requests
from bs4 import BeautifulSoup
import json
import re
data_host = "https://public.tableau.com"
# get the second tableau link
r = requests.get(
f"{data_host}/views/MKTScoredeisolamentosocial/VisoGeral",
params= {
":showVizHome":"no"
}
)
soup = BeautifulSoup(r.text, "html.parser")
tableauData = json.loads(soup.find("textarea",{"id": "tsConfigContainer"}).text)
dataUrl = f'{data_host}{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'
r = requests.post(dataUrl, data= {
"sheet_id": tableauData["sheetId"],
})
dataReg = re.search('\d+;({.*})\d+;({.*})', r.text, re.MULTILINE)
info = json.loads(dataReg.group(1))
data = json.loads(dataReg.group(2))
stateIndexInfo = [
(t["fieldRole"], {
"paneIndices": t["paneIndices"][0],
"columnIndices": t["columnIndices"][0],
"dataType": t["dataType"]
})
for t in data["secondaryInfo"]["presModelMap"]["vizData"]["presModelHolder"]["genPresModelMapPresModel"]["presModelMap"]["map_state (2)"]["presModelHolder"]["genVizDataPresModel"]["paneColumnsData"]["vizDataColumns"]
if t.get("localBaseColumnName") and t["localBaseColumnName"] == "[state_name]"
]
stateNameIndexInfo = [t[1] for t in stateIndexInfo if t[0] == 'dimension'][0]
panelColumnList = data["secondaryInfo"]["presModelMap"]["vizData"]["presModelHolder"]["genPresModelMapPresModel"]["presModelMap"]["map_state (2)"]["presModelHolder"]["genVizDataPresModel"]["paneColumnsData"]["paneColumnsList"]
stateNameIndices = panelColumnList[stateNameIndexInfo["paneIndices"]]["vizPaneColumns"][stateNameIndexInfo["columnIndices"]]["valueIndices"]
# print [222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248]
#print(stateNameIndices)
dataValues = [
t
for t in data["secondaryInfo"]["presModelMap"]["dataDictionary"]["presModelHolder"]["genDataDictionaryPresModel"]["dataSegments"]["0"]["dataColumns"]
if t["dataType"] == stateNameIndexInfo["dataType"]
][0]["dataValues"]
stateNames = [dataValues[t] for t in stateNameIndices]
# print ['Tocantins', 'Sergipe', 'São Paulo', 'Santa Catarina', 'Roraima', 'Rondônia', 'Rio Grande do Sul', 'Rio Grande do Norte', 'Rio de Janeiro', 'Piauí', 'Pernambuco', 'Paraná', 'Paraíba', 'Pará', 'Minas Gerais', 'Mato Grosso do Sul', 'Mato Grosso', 'Maranhão', 'Goiás', 'Espírito Santo', 'Distrito Federal', 'Ceará', 'Bahia', 'Amazonas', 'Amapá', 'Alagoas', 'Acre']
#print(stateNames)
for idx, val in enumerate(stateNames):
print(f"{val} - {idx+1}")
selected_index = input("Please select a state by indices : ")
print(f"selected : {stateNames[int(selected_index)-1]}")
r = requests.post(f'{data_host}{tableauData["vizql_root"]}/sessions/{tableauData["sessionid"]}/commands/tabdoc/select',
data = {
"worksheet": "map_state (2)",
"dashboard": "Visão Geral",
"selection": json.dumps({
"objectIds":[int(selected_index)],
"selectionType":"tuples"
}),
"selectOptions": "select-options-simple"
})
dataSegments = r.json()["vqlCmdResponse"]["layoutStatus"]["applicationPresModel"]["dataDictionary"]["dataSegments"]
print(dataSegments[max([*dataSegments])]["dataColumns"])
Try this on repl.it
The code with hardcoding of the state name list is more straightforward :
import requests
from bs4 import BeautifulSoup
import json
data_host = "https://public.tableau.com"
r = requests.get(
f"{data_host}/views/MKTScoredeisolamentosocial/VisoGeral",
params= {
":showVizHome":"no"
}
)
soup = BeautifulSoup(r.text, "html.parser")
tableauData = json.loads(soup.find("textarea",{"id": "tsConfigContainer"}).text)
dataUrl = f'{data_host}{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'
r = requests.post(dataUrl, data= {
"sheet_id": tableauData["sheetId"],
})
stateNames = ['Tocantins', 'Sergipe', 'São Paulo', 'Santa Catarina', 'Roraima', 'Rondônia', 'Rio Grande do Sul', 'Rio Grande do Norte', 'Rio de Janeiro', 'Piauí', 'Pernambuco', 'Paraná', 'Paraíba', 'Pará', 'Minas Gerais', 'Mato Grosso do Sul', 'Mato Grosso', 'Maranhão', 'Goiás', 'Espírito Santo', 'Distrito Federal', 'Ceará', 'Bahia', 'Amazonas', 'Amapá', 'Alagoas', 'Acre']
for idx, val in enumerate(stateNames):
print(f"{val} - {idx+1}")
selected_index = input("Please select a state by indices : ")
print(f"selected : {stateNames[int(selected_index)-1]}")
r = requests.post(f'{data_host}{tableauData["vizql_root"]}/sessions/{tableauData["sessionid"]}/commands/tabdoc/select',
data = {
"worksheet": "map_state (2)",
"dashboard": "Visão Geral",
"selection": json.dumps({
"objectIds":[int(selected_index)],
"selectionType":"tuples"
}),
"selectOptions": "select-options-simple"
})
dataSegments = r.json()["vqlCmdResponse"]["layoutStatus"]["applicationPresModel"]["dataDictionary"]["dataSegments"]
print(dataSegments[max([*dataSegments])]["dataColumns"])
Try this on repl.it
Note that, in this case, even if we don't care about the output of the first call (/bootstrapSession/sessions/{tableauData["sessionid"]}
). It's needed to validate the session_id and call the select call afterwards (otherwise the select doesn't return anything)