Question
I have a 10 GB file that I need to parse in Java, but the following error is thrown when I try:
java.lang.NegativeArraySizeException
at java.util.Arrays.copyOf(Arrays.java:2894)
at org.antlr.v4.runtime.ANTLRInputStream.load(ANTLRInputStream.java:123)
at org.antlr.v4.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:86)
at org.antlr.v4.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:82)
at org.antlr.v4.runtime.ANTLRInputStream.<init>(ANTLRInputStream.java:90)
How can I solve this problem properly? How can I adjust such an input stream to handle this error?
Answer 1:
It looks like ANTLR v4 has a pervasive, hard-wired limitation: the input stream must be smaller than 2^31 characters. Removing this limitation would not be a small task.
Take a look at the source code for the ANTLRInputStream class.
As you can see, it attempts to hold the entire stream contents in a single char[]. That isn't going to work for huge input files. But simply fixing that by buffering the data in a larger data structure isn't the answer either. If you look further down the file, a number of other methods use int as the type for indexing the stream. They would need to be changed to use long, and the changes would ripple out.
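To make the failure mode concrete, here is a minimal sketch of how an int-sized buffer length can turn negative. It assumes the buffer grows by doubling its length, which is what the Arrays.copyOf frame in the stack trace suggests; the class and the helper doubledCapacity are hypothetical names for illustration, not ANTLR code.

```java
class IntOverflowDemo {
    // Hypothetical helper: doubles an int capacity the way a naive
    // growth strategy would. The multiplication wraps silently on overflow.
    static int doubledCapacity(int capacity) {
        return capacity * 2;
    }

    public static void main(String[] args) {
        int capacity = 1 << 30;                // 1,073,741,824 chars
        int next = doubledCapacity(capacity);  // 2^31 does not fit in an int
        System.out.println(next);              // prints -2147483648

        try {
            char[] grown = new char[next];     // negative length is rejected
            System.out.println(grown.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("caught NegativeArraySizeException");
        }
    }
}
```

Once the character count passes 2^30, the next doubling wraps to a negative number, so the error shows up long before the file has been read completely.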
How can I solve this problem properly? How can I adjust such an input stream to handle this error?
Two approaches spring to mind:
1. Create your own version of ANTLR that supports large input files. This is a non-trivial project. I expect that the 32-bit assumption reaches into the code that ANTLR generates, etc.
2. Split your input files into smaller files before you attempt to parse them. Whether this is viable depends on the input syntax.
My recommendation would be the 2nd alternative. The problem with "supporting" huge input files (by in-memory buffering) is that it is going to be inefficient and memory wasteful ... and it ultimately doesn't scale.
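As a rough sketch of the splitting approach, the following assumes the input is line-oriented, i.e. that cutting the file at any line boundary still leaves each piece parseable on its own. That is an assumption about your grammar, not something the question confirms. The class name LineChunkSplitter, the chunk-N.txt naming, and the UTF-8 encoding are all made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class LineChunkSplitter {
    /**
     * Copies the input into files chunk-0.txt, chunk-1.txt, ... inside outDir,
     * each holding at most maxLinesPerChunk lines. Returns the chunk paths in order.
     * Only one line is held in memory at a time.
     */
    static List<Path> split(Path input, Path outDir, int maxLinesPerChunk) throws IOException {
        if (maxLinesPerChunk <= 0) {
            throw new IllegalArgumentException("maxLinesPerChunk must be positive");
        }
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            BufferedWriter writer = null;
            try {
                int linesInChunk = 0;
                String line;
                while ((line = reader.readLine()) != null) {
                    // Start a new chunk file when none is open or the current one is full.
                    if (writer == null || linesInChunk == maxLinesPerChunk) {
                        if (writer != null) {
                            writer.close();
                        }
                        Path chunk = outDir.resolve("chunk-" + chunks.size() + ".txt");
                        chunks.add(chunk);
                        writer = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                        linesInChunk = 0;
                    }
                    writer.write(line);
                    writer.newLine();
                    linesInChunk++;
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return chunks;
    }
}
```

Each returned path could then be parsed separately, provided every chunk stays well below the 2^31 character limit. If your syntax has constructs that span many lines, you would need a smarter cut point than a plain line count.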
You could also create an issue here, or ask on antlr-discussion.
Answer 2:
I have never run into this error, but my guess is that your array gets too big and its index overflows (i.e., the integer wraps around and becomes negative). Use another data structure and, most importantly, don't load the whole file at once. Use lazy loading instead, that is, load only the parts that are being accessed.
Answer 3:
I hope this helps: http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
You might want to have some kind of buffer to read big files.
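For illustration, here is a minimal sketch of reading through a fixed-size buffer with BufferedReader. It only counts characters, using a long so the total cannot wrap the way an int would; the class name StreamingScan and the method countChars are hypothetical. Note that this shows the buffering idea only. Judging by the stack trace, handing such a reader to ANTLRInputStream would still load everything into one array, so the buffering would have to replace that step rather than feed it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class StreamingScan {
    /**
     * Reads the source through a fixed-size buffer and returns the total
     * number of chars seen. Memory use is bounded by bufferSize, not by input size.
     */
    static long countChars(Reader source, int bufferSize) throws IOException {
        long total = 0;                       // long: safe beyond 2^31 - 1
        char[] buffer = new char[bufferSize];
        try (BufferedReader reader = new BufferedReader(source, bufferSize)) {
            int n;
            while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
                total += n;                   // process buffer[0..n) here
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a FileReader on the real 10 GB file.
        System.out.println(countChars(new StringReader("hello world"), 4)); // prints 11
    }
}
```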
Source: https://stackoverflow.com/questions/24225568/negativearraysizeexception-antlrv4