final Multimap terms = getTerms(bq);
for (Term t : terms.keySet()) {
Collection C = new HashSet(t
The Iterator
for the HashSet
class is a fail-fast iterator. From the documentation of the HashSet class:
The iterators returned by this class's iterator method are fail-fast: if the set is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the Iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
Note the last sentence - the fact that you are catching a ConcurrentModificationException
implies that another thread is modifying the collection. The same Javadoc API page also states:
If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set:
Set s = Collections.synchronizedSet(new HashSet(...));
I believe the references to the Javadoc are self explanatory in what ought to be done next.
Additionally, in your case, I do not see why you are not using the ImmutableSet, instead of creating a HashSet on the terms
object (which could possibly be modified in the interim; I cannot see the implementation of the getTerms
method, but I have a hunch that the underlying keyset is being modified). Creating a immutable set will allow the current thread to have it's own defensive copy of the original key-set.
Note, that although a ConcurrentModificationException
can be prevented by using a synchronized Set (as noted in the Java API documentation), it is a prerequisite that all threads access the synchronized collection and not the backing collection directly (which might be untrue in your case as the HashSet
is probably created in one thread, while the underlying collection for the MultiMap
is modified by other threads). The synchronized collection classes actually maintain an internal mutex for threads to acquire access to; since you cannot access the mutex directly from other threads (and it would be quite ridiculous to do so here), you ought to look at using a defensive copy of either the keyset or of the MultiMap itself using the unmodifiableMultimap method of the MultiMaps class (you'll need to return an unmodifiable MultiMap from the getTerms method). You could also investigate the necessity of returning a synchronized MultiMap, but then again, you'll need to ensure that the mutex must be acquired by any thread to protect the underlying collection from concurrent modifications.
Note, I have deliberately omitted mentioning the use of a thread-safe HashSet for the sole reason that I'm unsure of whether concurrent access to the actual collection will be ensured; it most likely will not be the case.
Edit: ConcurrentModificationException
s thrown on Iterator.next
in a single-threaded scenario
This is with respect to the statement: if(c.isSomething()) C.remove(c);
that was introduced in the edited question.
Invoking Collection.remove
changes the nature of the question, for it now becomes possible to have ConcurrentModificationException
s thrown even in a single-threaded scenario.
The possibility arises out of the use of the method itself, in conjunction with the use of the Collection
's iterator, in this case the variable it
that was initialized using the statement : Iterator<BooleanClause> it = C.iterator();
.
The Iterator
it
that iterates over Collection
C
stores state pertinent to the current state of the Collection
. In this particular case (assuming a Sun/Oracle JRE), a KeyIterator
(an internal inner class of the HashMap
class that is used by the HashSet
) is used to iterate through the Collection
. A particular characteristic of this Iterator
is that it tracks the number of structural modifications performed on the Collection
(the HashMap
in this case) via it's Iterator.remove
method.
When you invoke remove
on the Collection
directly, and then follow it up with an invocation of Iterator.next
, the iterator throws a ConcurrentModificationException
, as Iterator.next
verifies whether any structural modifications of the Collection
have occurred that the Iterator
is unaware of. In this case, Collection.remove
causes a structural modification, that is tracked by the Collection
, but not by the Iterator
.
To overcome this part of the problem, you must invoke Iterator.remove
and not Collection.remove
, for this ensures that the Iterator
is now aware of the modification to the Collection
. The Iterator
in this case, will track the structural modification occurring through the remove
method. Your code should therefore look like the following:
final Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term t : terms.keySet()) {
Collection<BooleanClause> C = new HashSet(terms.get(t));
if (!C.isEmpty()) {
for (Iterator<BooleanClause> it = C.iterator(); it.hasNext();) {
BooleanClause c = it.next();
if(c.isSomething()) it.remove(); // <-- invoke remove on the Iterator. Removes the element returned by it.next.
}
}
}
Vineet Reynolds has explained in great details the reasons why collections throw a ConcurrentModificationException
(thread-safety, concurrency). Swagatika has explained in great details the implementation details of this mechanism (how collection and iterator keep count of the number of modifications).
Their answers were interesting, and I upvoted them. But, in your case, the problem does not come from concurrency (you have only one thread), and implementation details, while interesting, should not be considered here.
You should only consider this part of the HashSet
javadoc:
The iterators returned by this class's iterator method are fail-fast: if the set is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the Iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
In your code, you iterate over your HashSet using its iterator, but you use the HashSet's own remove method to remove elements ( C.remove(c)
), which causes the ConcurrentModificationException
. Instead, as explained in the javadoc, you should use the Iterator
's own remove()
method, which removes the element being currently iterated from the underlying collection.
Replace
if(c.isSomething()) C.remove(c);
with
if(c.isSomething()) it.remove();
If you want to use a more functional approach, you could create a Predicate and use Guava's Iterables.removeIf() method on the HashSet
:
Predicate<BooleanClause> ignoredBooleanClausePredicate = ...;
Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term term : terms.keySet()) {
Collection<BooleanClause> booleanClauses = Sets.newHashSet(terms.get(term));
Iterables.removeIf(booleanClauses, ignoredBooleanClausePredicate);
}
PS: note that in both cases, this will only remove elements from the temporary HashSet
. The Multimap
won't be modified.
The reason is that you are trying to modify the collection outside iterator.
How it works :
When you create an iterator the collection maintains a modificationNum-variable for both the collection and the iterator independently. 1. The variable for collection is being incremented for each change made to the collection and and iterator. 2. The variable for iterator is being incremented for each change made to the iterator.
So when you call it.remove()
through iterator that increases the value of both the modification-number-variable by 1.
But again when you call collection.remove()
on collection directly, that increments only the value of the modification-numbervariable for the collection, but not the variable for the iterator.
And rule is : whenever the modification-number value for the iterator does not match with the original collection modification-number value, it gives ConcurrentModificationException.