Using Mechanize, is it possible to find a phrase in the HTML of a page, for example, "email", find the next <input> after it, and fill in that input?
For a well-formed HTML page, an <input> element should have a <label> showing what the input is for. In this case, you can iterate over all <label> elements, find the one containing the text "email", and get the associated <input> through the label's for attribute.
However, not all HTML pages are well-formed: there may be no <label>, no for attribute, or other malformed markup.
If you mean the <input> right after some element in the DOM, you can do some DOM traversal to check whether an element containing "email" has an <input> element next to it.
If you mean the <input> next to an element in the rendered page, you should define what "next to" means. I don't think you can get what you want without great effort: an element located after the "email" element in the markup might be placed before it with a CSS trick. You would need some graphical API to find that input, but I don't see one in watir's API documentation.
Mechanize uses Nokogiri internally to handle its DOM parsing, which is the basis of its ability to locate different elements in a page.
It's possible to access the parsed DOM and, through it, use Nokogiri to locate elements Mechanize doesn't normally let us find. For instance:
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://www.example.com')
# Use Nokogiri to find the content of the <h1> tag...
puts page.at('h1').content # => "Example Domain"
For your search you'd want to use an XPath accessor to locate where "email" is in the page. Once you've done that you can locate the next <input> tag.
Starting from a simple HTML fragment, we'll pretend this comes from Mechanize:
page = Nokogiri::HTML('<div><form><p>email</p><input name="email"></form></div>')
puts page.to_html
Which looks like:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div><form>
<p>email</p>
<input name="email">
</form></div></body></html>
Searching for "email":
page.at("//*[contains(text(),'email')]")
#<Nokogiri::XML::Element:0x3ff50d0c4bc0 name="p" children=[#<Nokogiri::XML::Text:0x3ff50d0c497c "email">]>
Building upon that, this gets the <input> tag:
input_tag = page.at("//*[contains(text(),'email')]/following-sibling::input")
#<Nokogiri::XML::Element:0x3ff50d09b75c name="input" attributes=[#<Nokogiri::XML::Attr:0x3ff50d09b5f4 name="name" value="email">]>
Once you've found that input tag, you can get the "name" from the tag using Nokogiri, and then tell Mechanize to locate and fill in that particular input field:
input_tag['name']
=> "email"
For a web form to function correctly, it has to have names for the elements. Those get passed to the server when the form is submitted. Without the names it'd take a lot of work to determine which input sent a particular piece of data, and, programmers being lazy, we don't want to work hard, so you can count on having a name to work with.
See "Ruby Mechanize, Nokogiri and Net::HTTP" for more information. Searching Stack Overflow and reading the Nokogiri documentation and tutorials will give you much of what you need to figure out the rest.
First find the element with the phrase text:
el = page.at('*[text()*="some phrase"]')
From there you can get the first following input:
input = el.at('./following::input')
Now, find the ancestor form node of that input:
form_node = input.ancestors('form')[0]
Then use that to get the Mechanize::Form object:
form = page.form_with(:form_node => form_node)
And now you can fill out the value:
form[input[:name]] = 'foo'