Question
I need to parse field names and values from an HTML form so I can add them to my database. I know I can do a find for "input name='", then another find for the closing "'", pull the data out with the Mid function, and then do the same for the value with a find for "value='". I was wondering whether there is an easier way to loop over the document and extract all input names and their associated values?
Below is a sample of what the page I need to parse looks like:
<input name='a_glare' value='B' class='inputbox-highlighted-false' size='1' maxlength='1'>
</td>
<td align="center">
<input name='a_testani' value='' class='inputbox-highlighted-false' size='1' maxlength='1'>
</td>
<td align="center">
<input name='a_tksig' value='EC' class='inputbox-highlighted-false' size='2' maxlength='2'>
</td>
<td align="center">
<input name='a_sacnon' value='' class='inputbox-highlighted-false' size='1' maxlength='1'>
</td>
<td align="center">
<input name='a_ot' value='' class='inputbox-highlighted-false' size='1' maxlength='1'>
</td>
<td align="center">
<input name='a_ovlp' value='' class='inputbox-highlighted-false' size='1' maxlength='1'>
Answer 1:
For parsing html, I would recommend using JSoup instead of regular expressions. I just started using JSoup and found it extremely simple to use. Just download the jar and add it to your application class path.
I am not an expert by any means, but was able to print all of the "input" fields from your sample html page using this snippet:
<cfscript>
    // parse html string into document
    jsoup = createObject("java", "org.jsoup.Jsoup");
    doc = jsoup.parse( yourHTMLContentString );
    // grab all "input" fields
    fields = doc.select("input");
    for (elem in fields) {
        // get attributes of each field
        fieldName = elem.attr("name");
        fieldValue = elem.attr("value");
        fieldType = elem.attr("type");
        // display values
        WriteOutput("<br>type: "& fieldType
            &" name: "& fieldName
            &" value: "& fieldValue
        );
    }
</cfscript>
(.. and yes, despite your moniker, I am suggesting "JSoup4You" )
Update:
The fields variable behaves like an array (it is a Java list of elements), so you can loop through it in CFML the same way. It seems like double work, but if you prefer, you can extract the input names and values into your own array of structures (or whatever CF construct you like). For example:
// initialize storage array
yourArray = [];
for (elem in fields) {
    // extract field properties into a structure
    data = { name=elem.attr("name")
           , value=elem.attr("value")
           , type=elem.attr("type")
           };
    // store in array
    arrayAppend(yourArray, data);
}
// display array contents
WriteDump(yourArray);
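If a name-to-value lookup is more convenient for the database insert than an array, the same loop can fill a struct instead. This is only a sketch: it assumes the jsoup jar is already on the class path, the inline HTML string is a made-up sample, and formData is a hypothetical variable name. The "input[name]" selector is a standard jsoup CSS attribute selector that skips inputs without a name attribute.

```cfml
<cfscript>
    // Assumption: the jsoup jar is on the class path, as described above
    jsoup = createObject("java", "org.jsoup.Jsoup");

    // Made-up sample; the second input has no name and should be skipped
    doc = jsoup.parse("<input name='a_glare' value='B'><input value='orphan'><input name='a_tksig' value='EC'>");

    // "input[name]" matches only <input> elements that carry a name attribute
    named = doc.select("input[name]");

    // hypothetical struct keyed by field name
    formData = {};
    for (elem in named) {
        // attr() returns an empty string when the attribute is absent
        formData[elem.attr("name")] = elem.attr("value");
    }

    WriteDump(formData);
</cfscript>
```

Note that a struct keeps only one value per key, so if the form repeats a field name the later value overwrites the earlier one.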
Answer 2:
You could try parsing it using two regular expressions to get the field names and field values. This is what I came up with using your example HTML.
<cfsavecontent variable="foo">
<input name='a_glare' value='B' class='inputbox-highlighted-false' size='1' maxlength='1'>
</td>
<td align="center">
<input name='a_testani' value='' class='inputbox-highlighted-false' size='1' maxlength='1'>
</td>
<td align="center">
<input name='a_tksig' value='EC' class='inputbox-highlighted-false' size='2' maxlength='2'>
</td>
<td align="center">
<input name='a_sacnon' value='' class='inputbox-highlighted-false' size='1' maxlength='1'>
</td>
<td align="center">
<input name='a_ot' value='' class='inputbox-highlighted-false' size='1' maxlength='1'>
</td>
<td align="center">
<input name='a_ovlp' value='' class='inputbox-highlighted-false' size='1' maxlength='1'>
</cfsavecontent>
<!--- extract the fieldnames and field values attributes --->
<cfset fieldnames = rematch("name='[a-z_]+'", foo)>
<cfset fieldvalues = rematch("value='[^']*'", foo)>
<!--- extract the values and build a struct of fieldname : value --->
<cfset keys = {}>
<cfloop from="1" to="#arraylen(fieldnames)#" index="index">
    <cfset keys[rereplace(fieldnames[index], "name='|'", "", "all")] = rereplace(fieldvalues[index], "value='|'", "", "all")>
</cfloop>
<cfdump var="#keys#">
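One thing to watch with the code above is that names and values are paired purely by position, so a single input without a value attribute would shift every later pair. A possible variation, sketched below, matches each whole input tag first and then pulls the name and value out of that tag only. It still assumes single-quoted attributes as in the sample, and the html string and the inputTag, nameMatch and valueMatch variable names are made up for illustration.

```cfml
<cfscript>
    // Made-up sample; the middle input deliberately has no value attribute
    html = "<input name='a_glare' value='B' size='1'>"
         & "<input name='a_note' size='1'>"
         & "<input name='a_tksig' value='EC' size='2'>";

    // one array entry per complete <input ...> tag
    inputTags = reMatchNoCase("<input[^>]*>", html);

    keys = {};
    for (inputTag in inputTags) {
        nameMatch = reMatchNoCase("name='[^']*'", inputTag);
        if (arrayLen(nameMatch) == 0) {
            continue; // skip inputs that have no name
        }
        valueMatch = reMatchNoCase("value='[^']*'", inputTag);

        // strip the attribute label and the quotes, as in the loop above
        fieldName = reReplaceNoCase(nameMatch[1], "name='|'", "", "all");
        fieldValue = arrayLen(valueMatch)
            ? reReplaceNoCase(valueMatch[1], "value='|'", "", "all")
            : "";

        keys[fieldName] = fieldValue;
    }

    writeDump(keys);
</cfscript>
```

Because each tag is handled on its own, the input without a value ends up with an empty string instead of borrowing the next field's value.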
Answer 3:
Well, here's one idea that may not be any better than simply regexing everything out.
1) Add a closing slash to each of your input tags so they look like so:
<input name='a_ot'
value=''
class='inputbox-highlighted-false'
size='1'
maxlength='1'/>
2) Extract the whole table, starting with the <table> tag and ending with the </table> tag.
3) Parse the table into an XML object using XmlParse.
Now you have an XML object with an array of TD tags, each of which has an INPUT child with name and value attributes. You could use cfdump and a loop to extract the values or clean them up.
Again, this may not save you any time depending on how messy the HTML is and how hard you have to work to figure out the XML. Good luck.
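The steps above could look roughly like the following. It is a minimal sketch, not the original poster's code: tableXml is a hand-written, already well-formed stand-in for the extracted table, and the fields struct is a hypothetical name. XmlParse will throw on real-world HTML that is not well-formed XML (unclosed tags, unquoted attributes, entities such as &nbsp;), so the cleanup in step 1 has to come first.

```cfml
<cfscript>
    // Assumption: every <input> has already been self-closed (step 1)
    tableXml = "<table><tr>"
        & "<td align='center'><input name='a_glare' value='B' size='1' maxlength='1'/></td>"
        & "<td align='center'><input name='a_tksig' value='EC' size='2' maxlength='2'/></td>"
        & "</tr></table>";

    // step 3: parse the markup into an XML document object
    xmlDoc = xmlParse(tableXml);

    // XPath: every <input> element anywhere in the document
    inputs = xmlSearch(xmlDoc, "//input");

    fields = {};
    for (node in inputs) {
        attrs = node.XmlAttributes;
        // this sketch assumes every input has a name; value may be missing
        fields[attrs.name] = structKeyExists(attrs, "value") ? attrs.value : "";
    }

    writeDump(fields);
</cfscript>
```

Using xmlSearch with an XPath expression avoids walking the TR and TD children by hand, which helps when the table nesting is not uniform.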
Source: https://stackoverflow.com/questions/26917042/how-can-i-extract-field-name-and-values-on-a-form-easily