Selendroid as a web scraper

雨燕双飞 posted on 2020-08-22 04:24:07

Question


I intend to create an Android application that performs a headless login to a website and then scrapes some content from the subsequent page while maintaining the logged-in session.

I first used HtmlUnit in a normal Java project and it worked just fine, but I later found that HtmlUnit is not compatible with Android.

Then I tried the JSoup library, sending an HTTP POST request to the login form. But the resulting page does not load completely, since JSoup does not execute JavaScript.

It was then suggested that I have a look at Selendroid, which is actually an Android test automation framework. What I really need is an HTML parser that supports JavaScript and runs on Android. I find Selendroid quite difficult to understand; I can't even figure out which of these dependencies to use:

  • selendroid-client
  • selendroid-standalone
  • selendroid-server

With Selenium WebDriver, the code would be as simple as the following. But can somebody show me a similar code example for Selendroid as well?

    WebDriver driver = new FirefoxDriver();
    driver.get("https://mail.google.com/");

    driver.findElement(By.id("email")).sendKeys(myEmail);
    driver.findElement(By.id("pass")).sendKeys(pass);

    // Click on 'Sign In' button
    driver.findElement(By.id("signIn")).click();

And also,

  1. What dependencies should I add to my build.gradle file?
  2. Which Selendroid libraries should I import?

Answer 1:


Unfortunately, I couldn't get Selendroid to work. But I found a workaround: scraping dynamic content using just Android's built-in WebView with JavaScript enabled.

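    // WebView needs a Context; `context` here is the hosting Activity or Application
    mWebView = new WebView(context);
    mWebView.getSettings().setJavaScriptEnabled(true);
    mWebView.addJavascriptInterface(new HtmlHandler(), "HtmlHandler");

    mWebView.setWebViewClient(new WebViewClient() {
        @Override
        public void onPageFinished(WebView view, String url) {
            super.onPageFinished(view, url);

            // Compare strings with equals(), not ==
            if (urlToLoad.equals(url)) {
                // Pass the HTML source to the HtmlHandler
                view.loadUrl("javascript:HtmlHandler.handleHtml(document.documentElement.outerHTML);");
            }
        }
    });

    // Start loading the page; onPageFinished fires once it has loaded
    mWebView.loadUrl(urlToLoad);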

The JavaScript property document.documentElement.outerHTML returns the full HTML of the loaded page. The retrieved HTML string is then passed to the handleHtml method of the HtmlHandler class.

    class HtmlHandler {
        @JavascriptInterface
        @SuppressWarnings("unused")
        public void handleHtml(String html) {
            // Scrape the content here
        }
    }

You can then use a library like Jsoup to extract the necessary content from the HTML string.
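
WebView invokes @JavascriptInterface methods on its own background thread, not on the UI thread, so the HTML usually has to be handed over to whatever code does the scraping. The following is only a sketch of one way to do that, written as plain Java so it can be run outside Android. The names HtmlBridgeSketch and awaitHtml are hypothetical, and the extra thread in main merely stands in for the WebView bridge thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class HtmlBridgeSketch {

    /** Plain-Java stand-in for the HtmlHandler above (hypothetical variant). */
    static class HtmlHandler {
        private final CompletableFuture<String> html = new CompletableFuture<>();

        // On Android this method would also carry @JavascriptInterface.
        public void handleHtml(String source) {
            // complete() is thread-safe; calls after the first one are ignored
            html.complete(source);
        }

        /** Waits up to timeoutSeconds for the page source to arrive. */
        public String awaitHtml(long timeoutSeconds) throws Exception {
            return html.get(timeoutSeconds, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        HtmlHandler handler = new HtmlHandler();

        // Simulates the WebView bridge thread delivering the page source
        Thread bridge = new Thread(
                () -> handler.handleHtml("<html><body>Inbox</body></html>"));
        bridge.start();

        String html = handler.awaitHtml(5);
        System.out.println(html.contains("Inbox"));
    }
}
```

In a real app, awaitHtml should be called from a worker thread, since blocking the UI thread would freeze the WebView that is supposed to deliver the result.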




Answer 2:


I have never used Selendroid, so I'm not really sure about this, but while searching the net I found this example. Based on it, I suppose your code translated from Selenium to Selendroid would be:

Translation code (in my opinion)

public class MobileWebTest {
  private SelendroidLauncher selendroidServer = null;
  private WebDriver driver = null;

  @Test
  public void doTest() {
    
    driver.get("https://mail.google.com/");

    // sendKeys() and click() return void, so their result cannot be assigned
    driver.findElement(By.id("email")).sendKeys(myEmail);
    driver.findElement(By.id("pass")).sendKeys(pass);

    // Click on 'Sign In' button
    driver.findElement(By.id("signIn")).click();

    // The driver is closed in stopSelendroidServer(), so no quit() here
  }

  @Before
  public void startSelendroidServer() throws Exception {
    if (selendroidServer != null) {
      selendroidServer.stopSelendroid();
    }

    SelendroidConfiguration config = new SelendroidConfiguration();

    selendroidServer = new SelendroidLauncher(config);
    selendroidServer.launchSelendroid();

    DesiredCapabilities caps = SelendroidCapabilities.android();

    driver = new SelendroidDriver(caps);
  }

  @After
  public void stopSelendroidServer() {
    if (driver != null) {
      driver.quit();
    }
    if (selendroidServer != null) {
      selendroidServer.stopSelendroid();
    }
  }
}

What you have to add to your project

It seems that you have to add the Selendroid standalone jar file to your project. If you are unsure how to add an external jar to an Android project, see this question: How can I use external JARs in an Android project?

You can download the jar file here: jar file

Adding the standalone jar alone does not seem to be enough, though. You should also add the selendroid-client jar file that matches the version of your standalone jar.

You can download it here: client jar file

I hope this helps!




Answer 3:


I would suggest using WebdriverIO, since you want to use JavaScript. It runs on Node.js, so it is easy to require other plugins to scrape the HTML.

Appium is also an alternative, but it is more focused on front-end testing.



Source: https://stackoverflow.com/questions/30058692/selendroid-as-a-web-scraper
