Scrape an AngularJS website with Java

Backend · Unresolved · 2 answers · 773 views
梦谈多话 · 2021-01-03 16:47

I need to scrape a website with content 'inserted' by Angular, and it needs to be done with Java.

I have tried Selenium WebDriver (as I have used Selenium before …

2 Answers
  •  时光说笑
    2021-01-03 17:12

    In the end, I followed Madusudanan's excellent advice and looked into the PhantomJS / Selenium combination. And there actually is a solution! It's called PhantomJSDriver.

    You can find the Maven dependency here. Here is more info on GhostDriver.

    The setup in Maven: I have added the following dependencies:

        <dependency>
            <groupId>net.sourceforge.htmlunit</groupId>
            <artifactId>htmlunit</artifactId>
            <version>2.41.0</version>
        </dependency>
        <dependency>
            <groupId>com.github.detro</groupId>
            <artifactId>phantomjsdriver</artifactId>
            <version>1.2.0</version>
        </dependency>

    It also runs with Selenium version 2.45, which is the latest version as of this writing. I mention this because in some articles I read, people say the PhantomJS driver isn't compatible with every version of Selenium, but I guess that problem has been addressed in the meantime.

    If you are already using a Selenium/PhantomJSDriver combination and you are getting 'strict JavaScript errors' on a certain site, update your version of Selenium; that will fix it.
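
    For reference, here is a minimal sketch of what pinning the Selenium version in the same pom.xml could look like. The selenium-java artifact is the standard Selenium binding and 2.45.0 matches the version mentioned above, but verify whichever version actually fits your setup:

        <!-- assumption for illustration: pin Selenium explicitly rather than rely on a transitive version -->
        <dependency>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-java</artifactId>
            <version>2.45.0</version>
        </dependency>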

    And here is some sample code:

    import java.util.List;

    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.phantomjs.PhantomJSDriver;
    import org.openqa.selenium.phantomjs.PhantomJSDriverService;
    import org.openqa.selenium.remote.DesiredCapabilities;

    public void testPhantomDriver() throws Exception {
        DesiredCapabilities options = new DesiredCapabilities();
        // the website I am scraping uses SSL, but I don't know which version
        options.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS,
                new String[] { "--ssl-protocol=any" });

        PhantomJSDriver driver = new PhantomJSDriver(options);

        driver.get("https://www.mywebsite");

        // find every element rendered with the class "media-title"
        List<WebElement> elements = driver.findElementsByClassName("media-title");

        for (WebElement element : elements) {
            System.out.println(element.getText());
        }

        driver.quit();
    }
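
    Since Angular renders content after the initial page load, the elements may not exist yet the moment driver.get() returns. Below is a minimal, self-contained sketch of the same idea with an explicit wait added; the class name AngularScrapeExample, the 10-second timeout, and the PhantomJS binary path are assumptions for illustration, not part of the original answer:

    import java.util.List;

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.phantomjs.PhantomJSDriver;
    import org.openqa.selenium.phantomjs.PhantomJSDriverService;
    import org.openqa.selenium.remote.DesiredCapabilities;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    public class AngularScrapeExample {

        public static void main(String[] args) {
            DesiredCapabilities options = new DesiredCapabilities();
            // assumed path: point the driver at a local PhantomJS binary
            // instead of relying on it being on the PATH
            options.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY,
                    "/path/to/phantomjs");
            options.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS,
                    new String[] { "--ssl-protocol=any" });

            PhantomJSDriver driver = new PhantomJSDriver(options);
            try {
                driver.get("https://www.mywebsite");

                // wait up to 10 seconds (an arbitrary choice) for Angular
                // to insert the elements into the DOM
                List<WebElement> elements = new WebDriverWait(driver, 10)
                        .until(ExpectedConditions.presenceOfAllElementsLocatedBy(
                                By.className("media-title")));

                for (WebElement element : elements) {
                    System.out.println(element.getText());
                }
            } finally {
                driver.quit();
            }
        }
    }

    The try/finally ensures the PhantomJS process is shut down even if the wait times out or the scrape fails.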
    
