Question
I'm using ANTLR for a simple CSV parser. I'd like to run it on a 29 GB file, but it runs out of memory in the ANTLRInputStream call:
CharStream cs = new ANTLRInputStream(new BufferedInputStream(input,8192));
CSVLexer lexer = new CSVLexer(cs);
CommonTokenStream tokens = new CommonTokenStream(lexer);
CSVParser parser = new CSVParser(tokens);
ParseTree tree = parser.file();
ParseTreeWalker walker = new ParseTreeWalker();
walker.walk(myListener, tree);
I tried changing it to use unbuffered streams:
CharStream cs = new UnbufferedCharStream(input);
CSVLexer lexer = new CSVLexer(cs);
lexer.setTokenFactory(new CommonTokenFactory(true));
TokenStream tokens = new UnbufferedTokenStream(lexer);
CSVParser parser = new CSVParser(tokens);
When I run walker.walk() it does not process any records. If I try something like
parser.setBuildParseTree(false);
parser.addParseListener(myListener);
it also fails. It seems I have to drive the parse differently when I don't build a parse tree, so I would like documentation or examples of how to do this.
If I use an unbuffered token stream without the unbuffered char stream, it gives the error "Unbuffered stream cannot know its size". I tried different permutations, but I usually end up with a Java heap error or "GC overhead limit exceeded".
I'm using this CSV grammar.
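For reference, the detail that seems to be missing here is that with setBuildParseTree(false) there is no tree for walker.walk() to visit; the listener has to be attached with addParseListener() before invoking the start rule, and the call to parser.file() itself drives the callbacks. A minimal sketch, assuming the CSVLexer/CSVParser generated from the CSV grammar and a parse listener named myListener (both names are from the question, not verified here):

```java
import org.antlr.v4.runtime.*;

// Sketch only: CSVLexer, CSVParser, and myListener are assumed to exist.
CharStream cs = new UnbufferedCharStream(input);
CSVLexer lexer = new CSVLexer(cs);
lexer.setTokenFactory(new CommonTokenFactory(true)); // copy token text out of the rolling buffer
TokenStream tokens = new UnbufferedTokenStream(lexer);
CSVParser parser = new CSVParser(tokens);
parser.setBuildParseTree(false);      // nothing is accumulated in memory
parser.addParseListener(myListener);  // callbacks fire during the parse itself
parser.file();                        // invoking the start rule drives the parse; no ParseTreeWalker
```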
Answer 1:
I already answered a similar question here: https://stackoverflow.com/a/26120662/4094678
It seems like I have to parse the file differently if I don't build a parse tree, so I would like documentation or examples of how to do this.
Look up grammar actions in the ANTLR book - as said in the linked answer, forget listeners, visitors, and building a parse tree. If even that is not enough, split the file into a number of smaller ones and then parse each of them.
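The splitting suggestion can be sketched with plain JDK I/O: cut the big file into pieces on line boundaries so that each piece is small enough to parse on its own. The class name, chunk-file naming, and chunk size below are illustrative choices, not part of the original answer:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Split a huge CSV into smaller files on line boundaries so each
// piece can be parsed independently. Returns the number of chunks.
public class CsvSplitter {
    public static int split(File input, File outDir, int linesPerChunk) throws IOException {
        int chunk = 0;
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(input), StandardCharsets.UTF_8))) {
            String line = in.readLine();
            while (line != null) {
                File out = new File(outDir, "chunk-" + chunk + ".csv");
                try (BufferedWriter w = new BufferedWriter(
                        new OutputStreamWriter(new FileOutputStream(out), StandardCharsets.UTF_8))) {
                    // Write up to linesPerChunk lines into this chunk file.
                    for (int i = 0; i < linesPerChunk && line != null; i++) {
                        w.write(line);
                        w.newLine();
                        line = in.readLine();
                    }
                }
                chunk++;
            }
        }
        return chunk;
    }
}
```

Note this only works if the grammar treats each row independently (true for a typical CSV grammar), since a split point must not fall inside a multi-line quoted field.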
And of course, as mentioned in the comments, increase the Java VM memory.
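The heap ceiling is raised with the standard -Xmx JVM flag; the 8 GB value, jar name, and class name below are placeholders, not from the original answer:

```shell
# Run with an 8 GB max heap (adjust -Xmx to your machine).
java -Xmx8g -cp antlr-4.x-complete.jar:. MyCsvTool big.csv
```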
Source: https://stackoverflow.com/questions/36602692/are-there-any-good-examples-to-references-where-setbuildparsetree-false