Question
I am brand new to Java and need to write several Java applications that scrape web pages and interact with them.
I started with Selenium, but because it drives a real browser it is not practical for my use case.
I need to do the following:
1. Go to a specific URL
2. Enter a post code in an input field
3. Click the submit button
4. Parse and save the results from a specific div tag, or re-query the page
I am using HtmlUnit and Eclipse. I can load a web page and enter a post code by looking up the form and then the input by name. However, when I try to click the submit button, I get an ElementNotFoundException.
Here is a sample of how the submit button is implemented on the page:
type="submit" value="submit" name="submit">Enter post code
Here is what my code looks like:
package htmlunittest;
import java.io.IOException;
import java.net.URL;
import junit.framework.TestCase;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.RefreshHandler;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlDivision;
import com.gargoylesoftware.htmlunit.html.HtmlButtonInput;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlImage;
import com.gargoylesoftware.htmlunit.html.HtmlInput;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
public class htmlunittest extends TestCase {
    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws Exception
    {
        final WebClient webClient = new WebClient();
        final HtmlPage startPage = webClient.getPage("http://www.testpage.com");
        final HtmlForm form = (HtmlForm) startPage.getForms().get(2);
        final HtmlTextInput textField = form.getInputByName("address");
        textField.setValueAttribute("my post code");
        // throws ElementNotFoundException
        final HtmlSubmitInput button = form.getInputByName("submit");
        // Now submit the form by clicking the button and get back the second page.
        final HtmlPage page2 = button.click();
        System.out.println(page2.getHtmlElementById("mainContent"));
        webClient.closeAllWindows();
    }
}
Can someone please point me in the right direction on how to click the submit button with HtmlUnit?
Thanks
Answer 1:
It is difficult to say why that is not working without seeing the whole page you are trying to fetch.
I bet you are not getting the right form with .get(2). Fetching forms by index is usually a bad idea anyway: if the target page changes even slightly, for example by adding a form above that one, the index shifts and your scraper stops working.
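As a rough sketch of the alternative to index-based lookup, the form can be fetched by name and the submit control located by its name attribute, whatever tag it uses. Everything about the page here is assumed: the form name postcodeForm, the markup in START_PAGE, and the results page are hypothetical, and MockWebConnection (HtmlUnit's canned-response connection) stands in for the real site so the example runs offline. It also assumes a reasonably recent HtmlUnit 2.x release on the classpath, where WebClient can be used in try-with-resources.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

import java.net.URL;

public class SubmitByNameSketch {

    // Hypothetical start page. The form name "postcodeForm" and the extra
    // "search" form placed above it are assumptions made for illustration.
    static final String START_PAGE =
        "<html><body>"
        + "<form name='search' action='other' method='get'>"
        + "<input type='text' name='q'></form>"
        + "<form name='postcodeForm' action='results' method='get'>"
        + "<input type='text' name='address'>"
        + "<button type='submit' value='submit' name='submit'>Enter post code</button>"
        + "</form></body></html>";

    // Hypothetical results page containing the div the question wants to read.
    static final String RESULT_PAGE =
        "<html><body><div id='mainContent'>results for the post code</div></body></html>";

    static String submitPostCode(WebClient webClient, String url, String postCode)
            throws Exception {
        HtmlPage startPage = webClient.getPage(url);

        // Look the form up by name, so forms added above it do not matter.
        HtmlForm form = startPage.getFormByName("postcodeForm");

        HtmlTextInput textField = form.getInputByName("address");
        textField.setValueAttribute(postCode);

        // Match any element inside the form with name="submit", so the lookup
        // works whether the control is an <input> or a <button>.
        HtmlElement submit = form.getFirstByXPath(".//*[@name='submit']");
        if (submit == null) {
            throw new IllegalStateException("No element named 'submit' in the form");
        }

        // Clicking the control submits the form and returns the next page.
        HtmlPage resultPage = submit.click();
        HtmlElement mainContent = resultPage.getHtmlElementById("mainContent");
        return mainContent.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            // Serve canned pages instead of contacting a real site.
            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(new URL("http://www.testpage.com/"), START_PAGE);
            connection.setDefaultResponse(RESULT_PAGE);
            webClient.setWebConnection(connection);

            System.out.println(
                submitPostCode(webClient, "http://www.testpage.com/", "my post code"));
        }
    }
}
```

The XPath lookup is used because getInputByName only matches input elements. If the real control turns out to be a button element, form.getButtonByName("submit") would be the more direct call. Against a live site, drop the MockWebConnection lines and pass the real URL.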
Source: https://stackoverflow.com/questions/16061533/java-and-htmlunit-how-to-click-on-submit-button