Piccolo claims to be pretty fast, though I can't say I've used it myself. You might also try JDOM. As ever, benchmark with data that's representative of your real load.
It partly depends on what you're trying to do. Do you need to pull the whole document into memory, or can you operate in a streaming manner? Different approaches have different trade-offs and are better for different situations.
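To make the in-memory vs. streaming distinction concrete, here's a rough sketch that counts elements both ways. It uses the parsers bundled with the JDK (DOM and StAX) rather than Piccolo or JDOM, purely so it runs without extra dependencies. The class name, method names and sample XML are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class XmlApproaches {

    // In-memory: the whole document is parsed into a tree first, then queried.
    // Convenient for random access, but memory use grows with document size.
    static int countWithDom(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tag).getLength();
    }

    // Streaming: events are pulled one at a time and nothing is retained,
    // so memory use stays roughly flat regardless of document size.
    static int countWithStax(String xml, String tag) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(tag)) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input.
        String xml = "<orders><order id=\"1\"/><order id=\"2\"/></orders>";
        System.out.println(countWithDom(xml, "order"));
        System.out.println(countWithStax(xml, "order"));
    }
}
```

For a tiny input like this the two are interchangeable. The difference only shows up with large documents or when you need to navigate back and forth in the tree, which is why it's worth benchmarking both against your real data.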