I use a JFlex lexer to parse large files (~150 GB). As parsing progresses, small documents are extracted from the files and passed as an argument to an external method.
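To make the setup concrete, here is a minimal sketch of the extract-and-hand-off pattern in plain Java. It is not the JFlex-generated scanner: the class `DocumentStreamer`, the method `stream`, and the blank-line document delimiter are all hypothetical stand-ins chosen for illustration. The point it shows is that only the current document is held in memory and each one is passed to an external callback as soon as it is complete.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical stand-in for the generated lexer: reads the input as a stream
// and hands each extracted document to an external handler.
class DocumentStreamer {

    // Assumption: documents are separated by blank lines. The real lexer
    // would use its own rules to decide where a document starts and ends.
    static void stream(Reader in, Consumer<String> handler) {
        try (BufferedReader reader = new BufferedReader(in)) {
            StringBuilder current = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    // End of a document: pass it on and drop the buffer contents.
                    if (current.length() > 0) {
                        handler.accept(current.toString());
                        current.setLength(0);
                    }
                } else {
                    if (current.length() > 0) {
                        current.append('\n');
                    }
                    current.append(line);
                }
            }
            // Flush the last document if the input does not end with a blank line.
            if (current.length() > 0) {
                handler.accept(current.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "doc one\nline two\n\ndoc two\n";
        stream(new StringReader(sample), doc -> System.out.println("[" + doc + "]"));
    }
}
```

With a real input, the `StringReader` would be replaced by a reader over the large file, and the lambda by the external method that receives each document.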