Question:
I read about DOMParser and SAXParser in Java. I have no doubts about DOMParser, and I know people prefer SAXParser over DOMParser because of the memory DOM parsing takes. Although I understand the concept of SAXParser, I am not able to understand this code:
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class ReadXMLFileSAX {
    public static void main(String[] args) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            DefaultHandler handler = new DefaultHandler() {
                boolean bfname = false;
                boolean blname = false;
                boolean bnname = false;
                boolean bsalary = false;
                @Override
                public void startElement(String uri, String localName,
                        String qName, Attributes attributes)
                        throws SAXException {
                    System.out.println("Start Element :" + qName);
                    if (qName.equalsIgnoreCase("FIRSTNAME")) {
                        bfname = true;
                    }
                    if (qName.equalsIgnoreCase("LASTNAME")) {
                        blname = true;
                    }
                    if (qName.equalsIgnoreCase("NICKNAME")) {
                        bnname = true;
                    }
                    if (qName.equalsIgnoreCase("SALARY")) {
                        bsalary = true;
                    }
                }
                @Override
                public void endElement(String uri, String localName,
                        String qName)
                        throws SAXException {
                    System.out.println("End Element :" + qName);
                }
                @Override
                public void characters(char[] ch, int start, int length)
                        throws SAXException {
                    if (bfname) {
                        System.out.println("First Name : "
                                + new String(ch, start, length));
                        bfname = false;
                    }
                    if (blname) {
                        System.out.println("Last Name : "
                                + new String(ch, start, length));
                        blname = false;
                    }
                    if (bnname) {
                        System.out.println("Nick Name : "
                                + new String(ch, start, length));
                        bnname = false;
                    }
                    if (bsalary) {
                        System.out.println("Salary : "
                                + new String(ch, start, length));
                        bsalary = false;
                    }
                }
            };
            saxParser.parse("/home/anto/Groovy/Java/file.xml", handler);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
And the XML file is:
<?xml version="1.0"?>
<company>
    <staff>
        <firstname>yong</firstname>
        <lastname>mook kim</lastname>
        <nickname>mkyong</nickname>
        <salary>100000</salary>
    </staff>
    <staff>
        <firstname>low</firstname>
        <lastname>yin fong</lastname>
        <nickname>fong fong</nickname>
        <salary>200000</salary>
    </staff>
</company>
When I run the program, I get this output:
Start Element :company
Start Element :staff
Start Element :firstname
First Name : yong
End Element :firstname
Start Element :lastname
Last Name : mook kim
End Element :lastname
Start Element :nickname
Nick Name : mkyong
End Element :nickname
Start Element :salary
Salary : 100000
End Element :salary
End Element :staff
Start Element :staff
Start Element :firstname
First Name : low
End Element :firstname
Start Element :lastname
Last Name : yin fong
End Element :lastname
Start Element :nickname
Nick Name : fong fong
End Element :nickname
Start Element :salary
Salary : 200000
End Element :salary
End Element :staff
End Element :company
The output looks fine, but I'm confused by it! How is the order of the output determined? What handles this? Since this is the first time I have read about SAX and DOM, I could not figure it out. Kindly help me.
Answer 1:
SAX is event-based. So, each time it sees a start tag (together with its attributes), the characters within a tag, an end tag, and so on, it calls the appropriate method of the handler.
So the flow here is:
- See the company tag, call startElement for it
- See the staff tag, call startElement for it
- See the firstname tag, call startElement for it (which sets a boolean)
- See the characters ("yong"), call the characters function for them (which checks which boolean is set, prints the appropriate message and clears the flag)
- See the closing firstname tag, call the endElement function
- ...
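The flow above can be observed directly. Below is a minimal sketch (the class name SaxEventTrace and the inline XML string are illustrative assumptions, not part of the question) that records each callback in a list instead of printing it, so the order comes entirely from the parser walking the document:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventTrace {

    // Parses the given XML string and returns one entry per SAX callback, in call order.
    static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("startElement " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length);
                // Skip whitespace-only text between tags to keep the trace short.
                if (!text.trim().isEmpty()) {
                    events.add("characters \"" + text + "\"");
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement " + qName);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<company><staff><firstname>yong</firstname></staff></company>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

For the small document in main, the trace should list the three start tags, then the text "yong", then the three end tags in reverse order. The handler never decides the order; it only reacts.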
Answer 2:
By calling saxParser.parse("/home/anto/Groovy/Java/file.xml", handler);, you make the SAX parser use the DefaultHandler you implemented (the handler you passed as a parameter) while it parses the XML.
SAX is event-based: events are raised as the parser traverses your XML document. When the SAX parser encounters the start of an element, for example <firstname>, it calls the startElement method. It then moves on to the body of that element and sees yong. Since that text is not enclosed in a <> tag, it is treated as a text node, so the parser calls the characters method. If there were another XML element inside, it would call startElement again for the new element.
Finally, the SAX parser continues until it sees the end element </firstname> and calls the endElement method.
All three of these methods, startElement, characters and endElement, are implemented by the developer (in your case, you).
Keep in mind that SAX only walks through your XML document. It does not keep a record of which node is the parent or child of which other node.
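Because SAX keeps no record of parents, a handler that needs that information has to track it itself. A minimal sketch, assuming an inline XML string and a hypothetical class SaxPathTracker, keeps its own stack of open element names:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxPathTracker {

    // Returns the full path (e.g. "/company/staff/firstname") of every element, in document order.
    static List<String> elementPaths(String xml) throws Exception {
        List<String> paths = new ArrayList<>();
        // SAX does not remember ancestors, so the handler keeps a stack of currently open elements.
        Deque<String> open = new ArrayDeque<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                open.addLast(qName);
                // ArrayDeque iterates from first to last, i.e. from the root down to this element.
                paths.add("/" + String.join("/", open));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.removeLast(); // the element being closed is always the most recently opened one
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return paths;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<company><staff><firstname>yong</firstname><salary>100000</salary></staff></company>";
        elementPaths(xml).forEach(System.out::println);
    }
}
```

Any parent/child relationship you need in a SAX handler is reconstructed this way from the order of startElement and endElement calls.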
Hope this helps!
Answer 3:
The power of a SAX parser is its events. All you need to do is override/implement the appropriate methods; the onus is on the parsing library to call them in the right order.
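As an illustration of overriding only the methods you need, here is a sketch with a hypothetical StaffCounter class. DefaultHandler's other callbacks have empty bodies, so they can simply be left alone:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class StaffCounter extends DefaultHandler {
    private int count;

    // Only startElement is overridden; characters, endElement, etc. keep DefaultHandler's no-op versions.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equalsIgnoreCase("staff")) {
            count++;
        }
    }

    int getCount() {
        return count;
    }

    static int countStaff(String xml) throws Exception {
        StaffCounter counter = new StaffCounter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), counter);
        return counter.getCount();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<company><staff><firstname>yong</firstname></staff>"
                + "<staff><firstname>low</firstname></staff></company>";
        System.out.println(countStaff(xml)); // 2
    }
}
```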
Answer 4:
The order looks fine to me. What's the issue?
If you're talking about the start and end elements, that just shows the XML tag nesting. You see that "company" comes before "staff", and "staff" before "firstname".
Finally, the data itself sits inside the individual tags, and the tags are closed in the reverse order in which they were opened. That's why the last three lines are:
End Element :salary
End Element :staff
End Element :company
The parser is leaving salary, which is the last element of staff, and that staff is the last one in company.
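The nesting is easier to see if the output is indented by depth. This sketch (the class name NestingPrinter is illustrative) produces the same "Start Element :"/"End Element :" lines as the question, assuming you only care about start and end tags, indented two spaces per level:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NestingPrinter {

    // Builds the question's "Start Element"/"End Element" lines, indented by nesting depth.
    static List<String> outline(String xml) throws Exception {
        List<String> lines = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // number of elements currently open

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                lines.add(indent(depth) + "Start Element :" + qName);
                depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--; // closing tag: step back out to the parent's level
                lines.add(indent(depth) + "End Element :" + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return lines;
    }

    private static String indent(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<company><staff><salary>100000</salary></staff></company>";
        outline(xml).forEach(System.out::println);
    }
}
```

The last three lines then mirror the explanation above: salary closes first, then its staff, then company.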
Answer 5:
As the parser reads the input XML, it calls startElement on every opening tag and endElement on every closing tag. When it meets the contents of a tag, like yong, it calls characters.
The code you posted tracks which tag is currently being parsed by using the state variables bfname, bsalary, etc. When characters is called, your code knows which entity it is being called for (first name, last name or salary), so it can interpret the raw character string properly.
So, when writing a SAX handler, you are in fact writing callbacks that track where the parser is inside the XML, that is, which part of the XML it is currently reading.
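One caveat with the question's approach: the SAX ContentHandler documentation allows a parser to deliver the text of one element through several characters() calls, so printing inside characters() could, in principle, split a value. A common variation, sketched here with a hypothetical StaffFieldReader class, keeps the state in a buffer instead of boolean flags: clear it in startElement, append in characters, and read it in endElement:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class StaffFieldReader {

    // Returns "name = value" for every firstname/lastname/nickname/salary element, in document order.
    static List<String> readFields(String xml) throws Exception {
        List<String> fields = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Parser state: the text seen since the most recent start tag.
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                text.setLength(0); // a new element begins, so forget earlier text
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may run several times for one text node
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // For a leaf element, the buffer now holds its complete text.
                if (isField(qName)) {
                    fields.add(qName + " = " + text);
                }
                text.setLength(0);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return fields;
    }

    private static boolean isField(String qName) {
        return qName.equalsIgnoreCase("firstname") || qName.equalsIgnoreCase("lastname")
                || qName.equalsIgnoreCase("nickname") || qName.equalsIgnoreCase("salary");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<company><staff><firstname>yong</firstname><lastname>mook kim</lastname>"
                + "<nickname>mkyong</nickname><salary>100000</salary></staff></company>";
        readFields(xml).forEach(System.out::println);
    }
}
```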
By contrast, with a DOM parser the whole XML document is converted into a tree, so you can navigate from its root down to the nodes, or backwards from the nodes up to the root, in any way you like.
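For comparison, a DOM sketch of the same kind of document (DomStaffReader is a hypothetical name, and the XML is inlined for brevity) loads the whole tree first and then navigates it in both directions:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomStaffReader {

    // Parses the whole document into an in-memory tree before any of it is used.
    static Document load(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Downwards: getElementsByTagName returns matching descendants in document order.
    static String textOf(Document doc, String tag, int index) {
        return doc.getElementsByTagName(tag).item(index).getTextContent();
    }

    // Upwards: climb parent links until the Document node (not an element) is reached.
    static String pathToRoot(Node node) {
        StringBuilder path = new StringBuilder();
        for (Node n = node; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            path.insert(0, "/" + n.getNodeName());
        }
        return path.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<company><staff><firstname>yong</firstname><salary>100000</salary></staff>"
                + "<staff><firstname>low</firstname><salary>200000</salary></staff></company>";
        Document doc = load(xml);
        System.out.println(textOf(doc, "salary", 1)); // 200000
        Node secondSalary = doc.getElementsByTagName("salary").item(1);
        System.out.println(pathToRoot(secondSalary)); // /company/staff/salary
    }
}
```

No state flags are needed here, at the cost of holding the entire tree in memory.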
Answer 6:
Near the end, you'll notice that the saxParser.parse() method is given handler as a parameter. The handler is an instance of an anonymous subclass of DefaultHandler defined earlier in the code. The SAXParser calls the appropriate method on the handler as it parses the XML document. See the Javadoc for DefaultHandler and SAXParser (in particular the documentation of the parse methods). As the XML document is parsed and each handler method is called in turn, that method prints out the values that were processed.
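The question passes a path string to parse, but SAXParser also has overloads that take a File, an InputStream or an InputSource; in every case the parser drives the same handler callbacks. A small sketch, assuming a hypothetical RootNameHandler that just remembers the first start tag it sees:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParseOverloads {

    // Hypothetical handler: remembers the name of the first element it is told about.
    static class RootNameHandler extends DefaultHandler {
        String rootName;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if (rootName == null) {
                rootName = qName;
            }
        }
    }

    // parse(InputStream, DefaultHandler): the parser reads raw bytes.
    static String rootFromStream(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RootNameHandler handler = new RootNameHandler();
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            parser.parse(in, handler);
        }
        return handler.rootName;
    }

    // parse(InputSource, DefaultHandler): here the InputSource wraps a character Reader.
    static String rootFromReader(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RootNameHandler handler = new RootNameHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.rootName;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<company><staff><firstname>yong</firstname></staff></company>";
        System.out.println(rootFromStream(xml)); // company
        System.out.println(rootFromReader(xml)); // company
    }
}
```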
Answer 7:
A SAX parser just reads through a document sequentially, from beginning to end. The parse() method of the parser takes a handler object. Various methods of this object get called by the parser when the parser encounters certain constructs in the document (an "event"). So every time the parser encounters a start tag, it calls the startElement method of the handler; when it encounters an end tag, it calls the endElement method, and so on. These methods in DefaultHandler are empty. It is up to you to subclass this class and provide your own implementation of these methods (in your code example above, DefaultHandler has been subclassed anonymously).
Unlike a DOM parser, a SAX parser does not construct elements; it just fires the various handler methods as it encounters start and end tags and content characters. It is up to you, within these methods, to provide the logic that maps an end tag to a start tag and so on, which is what the conditional statements in the startElement and characters methods are doing. The fields blname etc. just keep track of which element the parser is currently in, so that you know which element the characters passed into the characters() method belong to.
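Instead of an anonymous subclass, the handler can also be a named class. This sketch (StaffHandler and its nested Staff class are hypothetical, and only two fields are kept to stay short) shows the end-tag logic described above: each closing staff tag completes the object that the matching opening staff tag created:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// A named (rather than anonymous) DefaultHandler subclass that turns each <staff> into an object.
class StaffHandler extends DefaultHandler {

    // Hypothetical value class for this example; the question's code has no such type.
    static class Staff {
        String firstName;
        String salary;

        @Override
        public String toString() {
            return firstName + " earns " + salary;
        }
    }

    private final List<Staff> staffList = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private Staff current; // the <staff> element we are inside, or null

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equalsIgnoreCase("staff")) {
            current = new Staff();
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return; // outside any <staff>, e.g. the closing company tag
        }
        if (qName.equalsIgnoreCase("firstname")) {
            current.firstName = text.toString();
        } else if (qName.equalsIgnoreCase("salary")) {
            current.salary = text.toString();
        } else if (qName.equalsIgnoreCase("staff")) {
            staffList.add(current); // matching end tag: the object is complete
            current = null;
        }
    }

    static List<Staff> readStaff(String xml) throws Exception {
        StaffHandler handler = new StaffHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.staffList;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<company><staff><firstname>yong</firstname><salary>100000</salary></staff>"
                + "<staff><firstname>low</firstname><salary>200000</salary></staff></company>";
        for (Staff s : readStaff(xml)) {
            System.out.println(s);
        }
    }
}
```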
Source: https://stackoverflow.com/questions/5540998/how-does-this-java-program-run