I am reading a very large file and extracting a small portion of text from each line. However, at the end of the operation, I am left with very little memory to work with.
Make sure to not keep references you don't need any more.
You still have references to al and in.
Try adding al = null; in = null; before calling the garbage collector.
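As a rough sketch of what that looks like (the class name, the usedBytes() helper and the dummy data are made up for illustration; only al comes from the question), you could drop the reference and then compare heap usage. The numbers vary from run to run and the JVM is free not to collect at all, so treat the output as a hint rather than a measurement:

```java
import java.util.ArrayList;
import java.util.List;

final class ReleaseReferences {
    // Approximate heap currently in use, in bytes (hypothetical helper).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Stand-in for the list filled while reading the file.
        List<String> al = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            al.add("line " + i);
        }
        System.out.println("elements held: " + al.size());
        long before = usedBytes();

        al = null;    // drop the last reference so the list becomes unreachable
        System.gc();  // only a request; the JVM may or may not collect now

        long after = usedBytes();
        System.out.println("used before: " + before + ", after: " + after);
    }
}
```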
Also, you need to be aware of how substring is implemented. substring keeps the original string and just uses a different offset and length into the same char[] array. Wrapping the result in a new String forces a copy:
al.add(new String(s.substring(0,1)));
Not sure if there is a more elegant way of copying a substring. Maybe s.getChars() is more useful for you, too.
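For what it's worth, a minimal sketch of the getChars() route could look like this. SubstringCopy and copyRange are hypothetical names, not part of the JDK; the only library calls are String.getChars(int, int, char[], int) and the String(char[]) constructor:

```java
final class SubstringCopy {
    // Copies the characters in [begin, end) into a fresh array, so the
    // result shares no storage with the (possibly very long) source line.
    static String copyRange(String s, int begin, int end) {
        char[] buf = new char[end - begin];
        s.getChars(begin, end, buf, 0);
        return new String(buf);
    }

    public static void main(String[] args) {
        System.out.println(copyRange("hello world", 0, 5)); // prints "hello"
    }
}
```

This does the same job as new String(s.substring(...)), just without going through the intermediate substring.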
As of Java 8, substring copies the characters. You can verify for yourself that the constructor calls Arrays.copyOfRange.
When you make a substring, it keeps a reference to the char array of the original string (this optimization makes handling many substrings of a string very fast). So, as you keep your substrings in the al list, you're keeping your whole file in memory. To avoid this, create a new String using the constructor that takes a string as its argument.
So basically I'd suggest you do:
while (in.hasNextLine()) {
    String s = in.nextLine();
    al.add(new String(s.substring(0, 1))); // extracts the first character
}
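If you want something you can run as-is, here is one way the loop could be wrapped up. The class and method names are invented for the example, the input is a small in-memory string instead of your file, and the empty-line check is my own addition, because substring(0, 1) throws on an empty line:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class FirstChars {
    // Collects an independent copy of the first character of every non-empty line.
    static List<String> firstChars(Scanner in) {
        List<String> al = new ArrayList<>();
        while (in.hasNextLine()) {
            String s = in.nextLine();
            if (s.isEmpty()) {
                continue; // substring(0, 1) would throw on an empty line
            }
            // new String(...) makes sure the stored value does not keep
            // the whole line alive on JVMs where substring shares the array.
            al.add(new String(s.substring(0, 1)));
        }
        return al;
    }

    public static void main(String[] args) {
        try (Scanner in = new Scanner("alpha\nbeta\n\ngamma\n")) {
            System.out.println(firstChars(in)); // prints [a, b, g]
        }
    }
}
```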
The source code of the String(String) constructor explicitly states that its usage is to trim "the baggage":
public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
    if (originalValue.length > size) {
        // The array representing the String is bigger than the new
        // String itself. Perhaps this constructor is being called
        // in order to trim the baggage, so make a copy of the array.
        int off = original.offset;
        v = Arrays.copyOfRange(originalValue, off, off+size);
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}
Update: this problem is gone as of OpenJDK 7, Update 6. People on a more recent version don't have the problem.
System.gc() does not guarantee that the JVM will garbage collect; it is only a hint to the JVM that it can try to garbage collect. As there is a lot of memory already available, the JVM may ignore the hint and keep running until it feels the need to collect.
Read more in the documentation: http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#gc()
Another question that discusses this is "When does System.gc() do anything".