web-scraping

Specifying multiple conditions in xpath

旧时模样 submitted on 2021-02-18 11:42:05
Question: I want to select all the tags with <td class='blob-code blob-code-addition'> and <td class='blob-code blob-code-deletion'>. So I am trying to include an or condition between the two predicates, but it does not work. However, if I include only one of the two classes, it works. What is the problem here? Something is wrong with the syntax. By getChanges = By.xpath("//td[@class='blob-code blob-code-addition'] or //td[@class='blob-code blob-code-deletion']"); Answer 1: You want to specify that like the
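The excerpted answer is cut off. The usual fix is to move `or` inside a single predicate: `//td[...] or //td[...]` is a boolean expression that evaluates to true or false rather than to a node-set, so it cannot locate elements. A union (`|`) of the two paths also works. The Python sketch below assumes neither Selenium nor lxml is installed, so it defines the corrected expression and mirrors its predicate with the standard library's `xml.etree.ElementTree`, whose limited XPath subset has no `or`. With Selenium's Python bindings the string would be passed as `driver.find_elements(By.XPATH, CHANGES_XPATH)`. The names `CHANGES_XPATH`, `select_changes`, and the sample table are illustrative.

```python
import xml.etree.ElementTree as ET

# Corrected XPath: one location path, with `or` inside the predicate.
CHANGES_XPATH = (
    "//td[@class='blob-code blob-code-addition' "
    "or @class='blob-code blob-code-deletion']"
)

# Equivalent alternative: the union of two node-sets.
CHANGES_XPATH_UNION = (
    "//td[@class='blob-code blob-code-addition'] | "
    "//td[@class='blob-code blob-code-deletion']"
)

WANTED = {"blob-code blob-code-addition", "blob-code blob-code-deletion"}


def select_changes(markup: str) -> list[str]:
    """Return the text of <td> cells whose class matches either value.

    ElementTree supports only a small XPath subset (no `or`, no `|`),
    so the predicate from CHANGES_XPATH is applied in Python instead.
    """
    root = ET.fromstring(markup)
    return [td.text or "" for td in root.iter("td") if td.get("class") in WANTED]


# Hypothetical diff table, made up for this illustration.
SAMPLE = """<table><tr>
  <td class='blob-code blob-code-addition'>+added</td>
  <td class='blob-code blob-code-context'>unchanged</td>
  <td class='blob-code blob-code-deletion'>-removed</td>
</tr></table>"""

print(select_changes(SAMPLE))
```

Exact `@class=` comparison only matches when the attribute string is identical; if other classes may be present, `contains(@class, 'blob-code-addition')` is the looser variant.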

Scrapy parse javascript

眉间皱痕 submitted on 2021-02-18 11:22:39
Question: I have some JavaScript on the page like the snippet below: new Shopify.OptionSelectors("product-select", { product: {"id":185310341,"title":"10. Design | Siyah \u0026 beyaz kalpli", I want to get "185310341". I have been searching on Google for a few hours but couldn't find anything; I hope you can help me. How can I scrape that JavaScript and get that id? I tried this code: id = sel.search('"id":(.*?),',text).group(1) print id but I got: exceptions.AttributeError: 'Selector' object has no attribute 'search' Answer 1:
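In the question, `search` is called on the Scrapy `Selector`, but `search` is a function of Python's `re` module (`re.search(pattern, text)`), which explains the `AttributeError`. Scrapy selectors expose their own regex helpers, `.re()` and `.re_first()`, for example `response.xpath('//script/text()').re_first(r'"id":(\d+)')`. The standard-library sketch below assumes the script body is already available as a string; the `\d+` pattern assumes the id is always numeric, and the helper name `extract_product_id` is illustrative.

```python
import re
from typing import Optional

# The script text as it appears in the question (truncated after "title").
SCRIPT_TEXT = (
    'new Shopify.OptionSelectors("product-select", { product: '
    '{"id":185310341,"title":"10. Design | Siyah \\u0026 beyaz kalpli",'
)

# Capture the digits after the first "id": key; assumes the id is numeric.
ID_PATTERN = re.compile(r'"id":(\d+)')


def extract_product_id(text: str) -> Optional[str]:
    """Return the first product id found in the script text, or None."""
    match = ID_PATTERN.search(text)  # re module method, not Selector.search
    return match.group(1) if match else None


print(extract_product_id(SCRIPT_TEXT))  # prints 185310341
```

Returning `None` instead of calling `.group(1)` unconditionally avoids a second `AttributeError` when a page has no matching script.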

Web scraping in Investing.com with Excel vba

自古美人都是妖i submitted on 2021-02-18 08:48:34
Question: I have no knowledge of VBA; I have only used the macro recorder. I need to download data from a web page into an Excel spreadsheet, and with my knowledge of VBA I am not able to. Specifically, I want a macro that downloads into Excel the data table on this page: https://www.investing.com/equities/cellnex-telecom-historical-data The download would have to be configurable in terms of time, date range, and ordering. The steps would be the following: 1.- The objective is to copy the data from the

Clicking a drop down option in iFrame with VBA excel

淺唱寂寞╮ submitted on 2021-02-17 02:15:09
Question: I'm trying to click the drop-down option "Approve the Deal" in the code below. I have a solid VBA background, but this is my first attempt at HTML automation with VBA in Excel. So far I'm able to navigate to and log in to the website, select the deal that needs to be approved, and show the drop-down. However, I cannot figure out how to click the drop-down option or trigger its event. I believe my gaps are in dealing with the iFrame and/or table (something I did not have to deal with in the code to this

Why isn't the replace() function working?

别来无恙 submitted on 2021-02-17 01:57:07
Question: I'm scraping a website using Selenium. When I get the text of a list of elements (headers), this is what it prints: ['Countyarrow_upward Reportingarrow_upward Totalarrow_upward Bennet (D)arrow_upward Biden (D)arrow_upward Bloomberg (D)arrow_upward Booker (D)arrow_upward Boyd (D)arrow_upward Buttigieg (D)arrow_upward Castro (D)arrow_upward De La Fuente III (D)arrow_upward Delaney (D)arrow_upward Ellinger (D)arrow_upward Gabbard (D)arrow_upward Greenstein (D)arrow_upward Klobuchar (D)arrow
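The answer is not shown, so the cause is an assumption, but two common ones fit the printed output: `replace()` is called on the list rather than on the string inside it, or its return value is discarded (Python strings are immutable, so `str.replace` returns a new string). A sketch that splits on the `arrow_upward` marker and strips whitespace, which keeps multi-word names such as `De La Fuente III (D)` intact; the sample list is shortened and `clean_headers` is an illustrative name.

```python
# The list printed in the question holds a single string (shortened here).
headers = [
    "Countyarrow_upward Reportingarrow_upward Totalarrow_upward "
    "Bennet (D)arrow_upward Biden (D)arrow_upward De La Fuente III (D)arrow_upward"
]

MARKER = "arrow_upward"


def clean_headers(raw: list[str]) -> list[str]:
    """Split each string on the marker and return the non-empty, stripped names."""
    names = []
    for text in raw:  # call str methods on each element, not on the list
        names.extend(part.strip() for part in text.split(MARKER) if part.strip())
    return names


# str.replace returns a new string; assign the result or the change is lost.
as_csv = headers[0].replace(MARKER, ",")

print(clean_headers(headers))
print(as_csv)
```

Splitting on the marker is preferred over splitting on spaces because several candidate names contain spaces themselves.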

Condition to check if Selenium is done scrolling based on web element?

℡╲_俬逩灬. submitted on 2021-02-17 01:54:08
Question: Currently I have a script that goes to TripAdvisor and tries to scrape every image in a particular filter. I was wondering what condition I should use in my if statement so that it breaks out of the while loop and then parses the list of URLs to give me clear URL links to each image. I am confused about how to tell whether I have reached the end once the last web element has been reached. The if statement is right at the end, before the last printing loop. Any help is greatly
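Since the question's code is not shown, this is an assumed approach rather than the accepted answer: a common stopping rule is to end the loop once scrolling no longer increases the number of loaded elements (or the page's `document.body.scrollHeight`). The sketch keeps the loop independent of Selenium by taking two callables; the hypothetical `count_items` and `scroll_once` would wrap calls such as `len(driver.find_elements(By.CSS_SELECTOR, ...))` and `driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")`, with a short wait after each scroll. The `patience` parameter tolerates slow lazy loading, and `FakePage` is a stand-in for a real browser.

```python
from typing import Callable


def scroll_until_exhausted(
    count_items: Callable[[], int],
    scroll_once: Callable[[], None],
    patience: int = 2,
    max_rounds: int = 100,
) -> int:
    """Scroll until `patience` consecutive scrolls add no new items.

    Returns the final item count; `max_rounds` guards against endless feeds.
    In Selenium, scroll_once should also wait briefly so content can load.
    """
    last_count = count_items()
    stalled = 0
    for _ in range(max_rounds):
        scroll_once()
        current = count_items()
        if current > last_count:
            last_count = current
            stalled = 0
        else:
            stalled += 1
            if stalled >= patience:  # nothing new loaded: assume the end
                break
    return last_count


class FakePage:
    """Hypothetical page that loads 10 more items per scroll up to `total`."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.loaded = 10

    def count(self) -> int:
        return self.loaded

    def scroll(self) -> None:
        self.loaded = min(self.loaded + 10, self.total)


page = FakePage(total=35)
print(scroll_until_exhausted(page.count, page.scroll))  # prints 35
```

Once the loop exits, the collected image URLs can be parsed in a separate step, outside the scrolling logic.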

Click a button defined as a DIV with a SVG linked to a path with VBA

大憨熊 submitted on 2021-02-16 19:50:18
Question: I am making an Excel sheet that takes pictures from a webshop and posts them on Pinterest. When I try to submit the image URL, I can't find the element to perform a click event on. Full HTML code from Pinterest: <div data-test-id="website-link-submit-button" class="_50 _5a _6e _h _z7 _4q _j" style="height: 100%;"> <svg class="_u0 _3a _u1 _45" height="20" width="20" viewBox="0 0 24 24" aria-label="Verzenden" role="img"> <title>Verzenden</title> <path d="M6.72 24c.57 0 1.14-.22 1.57-.66L19.5 12 8

Looping through rows of a table while clicking links in selenium (python)

送分小仙女□ submitted on 2021-02-16 15:27:45
Question: The sample page source looks like this: <div class='div1'> <table class="foot-market"> <tbody data-live="false"> <td class='today-name'/> </tbody> <tbody data-live="false"> <td class='today-name'/> </tbody> <tbody data-live="false"> <td class='today-name'/> </tbody> </table> <table class="foot-market"> <tbody data-live="false"> <td class='today-name'/> </tbody> <tbody data-live="false"> <td class='today-name'/> </tbody> <tbody data-live="false"> <td class='today-name'/> </tbody> </table> </div>
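One way to walk this structure is table by table, then `tbody` by `tbody`. The sketch uses the standard library's `xml.etree.ElementTree` on a well-formed copy of the sample markup, so it shows only the nested iteration; `cell_positions` is an illustrative name. For the Selenium case (an assumption, since the question body is truncated): clicking a link usually changes or reloads the page, so previously found elements go stale. Re-locating the cells by index on every iteration, for example `driver.find_elements(By.CSS_SELECTOR, "table.foot-market td.today-name")[i]`, avoids holding stale references.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the sample page source from the question.
SAMPLE = """<div class='div1'>
  <table class="foot-market">
    <tbody data-live="false"><td class='today-name'/></tbody>
    <tbody data-live="false"><td class='today-name'/></tbody>
    <tbody data-live="false"><td class='today-name'/></tbody>
  </table>
  <table class="foot-market">
    <tbody data-live="false"><td class='today-name'/></tbody>
    <tbody data-live="false"><td class='today-name'/></tbody>
    <tbody data-live="false"><td class='today-name'/></tbody>
  </table>
</div>"""


def cell_positions(markup: str) -> list[tuple[int, int]]:
    """Return (table index, tbody index) for every td.today-name cell."""
    root = ET.fromstring(markup)
    positions = []
    for t, table in enumerate(root.findall("table[@class='foot-market']")):
        for b, tbody in enumerate(table.findall("tbody")):
            for _cell in tbody.findall("td[@class='today-name']"):
                # In Selenium, re-find the cell by index here before clicking.
                positions.append((t, b))
    return positions


print(cell_positions(SAMPLE))
```

Working from indices rather than element references keeps the loop valid even when each click rebuilds the DOM.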

Setting proxy in Goutte

我的未来我决定 submitted on 2021-02-16 08:56:16
Question: I've tried using Guzzle's docs to set a proxy, but it's not working. The official GitHub page for Goutte is pretty dead, so I can't find anything there. Does anyone know how to set a proxy? This is what I've tried: $client = new Client(); $client->setHeader('User-Agent', $user_agent); $crawler = $client->request('GET', $request, ['proxy' => $proxy]); Answer 1: You are thinking right, but in Goutte\Client::doRequest(), when it creates the Guzzle client $guzzleRequest = $this->getClient()->createRequest( $request-
