I have a quite large List named items (>= 1,000,000 items) and some condition denoted by <cond>.
Since speed is the most important metric, there's the possibility of using more memory and doing less recreation of lists (as mentioned in my comment). Actual performance impact would be fully dependent on how the functionality is used, though.
The algorithm assumes that at least one of the following is true:
Disclaimer: There are probably syntax errors - I didn't try compiling anything.
First, subclass the ArrayList
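import java.util.ArrayList;
import java.util.Iterator;
import java.util.ListIterator;

public class ConditionalArrayList extends ArrayList {

    public Iterator iterator(Condition condition) {
        return listIterator(condition);
    }

    public ListIterator listIterator(Condition condition) {
        // Wrap ArrayList's own iterator; this.iterator() would hit the
        // throwing override below.
        return new ConditionalArrayListIterator(super.listIterator(), condition);
    }

    public ListIterator listIterator() {
        throw new IllegalArgumentException("You must specify a condition for the iterator");
    }

    public Iterator iterator() {
        throw new IllegalArgumentException("You must specify a condition for the iterator");
    }
}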
Then we need the helper classes:
import java.util.ListIterator;
import java.util.NoSuchElementException;

public class ConditionalArrayListIterator implements ListIterator {
    private ListIterator listIterator;
    Condition condition;

    // The two following flags are used as a quick optimization so that
    // we don't repeat tests on known-good elements unnecessarily.
    boolean nextKnownGood = false;
    boolean prevKnownGood = false;

    public ConditionalArrayListIterator(ListIterator listIterator, Condition condition) {
        this.listIterator = listIterator;
        this.condition = condition;
    }

    public void add(Object o) {
        listIterator.add(o);
        // The new element sits just behind the cursor and has not been tested.
        prevKnownGood = false;
    }

    /**
     * Note that it is extremely inefficient to call hasNext() and
     * hasPrevious() alternately when there's a bunch of non-matching
     * elements between two matching elements.
     */
    public boolean hasNext() {
        if (nextKnownGood) return true;
        // The cursor is about to move, so the element behind it may change.
        prevKnownGood = false;
        /* find the next object in the list that
         * matches our condition, if any. */
        while (listIterator.hasNext()) {
            Object next = listIterator.next();
            if (condition.matches(next)) {
                listIterator.previous(); // step back so next() returns the match
                nextKnownGood = true;
                return true;
            }
        }
        // no matching element was found.
        return false;
    }

    /**
     * See hasNext() for efficiency notes.
     * Mirror image of hasNext().
     */
    public boolean hasPrevious() {
        if (prevKnownGood) return true;
        // The cursor is about to move, so the element ahead of it may change.
        nextKnownGood = false;
        /* find the previous object in the list that
         * matches our condition, if any. */
        while (listIterator.hasPrevious()) {
            Object prev = listIterator.previous();
            if (condition.matches(prev)) {
                listIterator.next(); // step forward so previous() returns the match
                prevKnownGood = true;
                return true;
            }
        }
        // no matching element was found.
        return false;
    }

    /** see hasNext() for efficiency note **/
    public Object next() {
        if (nextKnownGood || hasNext()) {
            prevKnownGood = nextKnownGood;
            nextKnownGood = false;
            return listIterator.next();
        }
        throw new NoSuchElementException("No more matching elements");
    }

    /** see hasNext() for efficiency note; mirror image of next() **/
    public Object previous() {
        if (prevKnownGood || hasPrevious()) {
            nextKnownGood = prevKnownGood;
            prevKnownGood = false;
            return listIterator.previous();
        }
        throw new NoSuchElementException("No more matching elements");
    }

    /**
     * Note that nextIndex() and previousIndex() return the array index
     * of the value, not the number of results that this class has returned.
     * If this isn't good for you, just maintain your own current index and
     * increment or decrement it in next() and previous().
     */
    public int nextIndex() {
        return listIterator.nextIndex();
    }

    public int previousIndex() {
        return listIterator.previousIndex();
    }

    // Call remove() and set() right after next() or previous(); an intervening
    // hasNext() or hasPrevious() moves the underlying cursor.
    public void remove() {
        listIterator.remove();
        nextKnownGood = false;
        prevKnownGood = false;
    }

    public void set(Object o) {
        listIterator.set(o);
    }
}
and, of course, we need the condition interface:
/** much like a comparator... **/
public interface Condition {
    public boolean matches(Object obj);
}
And a condition with which to test
public class IsEvenCondition implements Condition {
    public boolean matches(Object obj) {
        return ((Number) obj).intValue() % 2 == 0;
    }
}
and we're finally ready for some test code
Condition condition = new IsEvenCondition();
long startMillis, endMillis;

System.out.println("preparing items");
startMillis = System.currentTimeMillis();
// Must be a ConditionalArrayList so that listIterator(condition) exists.
ConditionalArrayList items = new ConditionalArrayList(); // holds Integers for the demo
for (int i = 0; i < 1000000; i++) {
    items.add(i * 3); // just for demo
}
endMillis = System.currentTimeMillis();
System.out.println("It took " + (endMillis - startMillis) + " to prepare the list. ");

System.out.println("deleting items");
startMillis = System.currentTimeMillis();
// we don't actually ever remove from this list, so
// removeMany is effectively "instantaneous"
// items = removeMany(items);
endMillis = System.currentTimeMillis();
System.out.println("after remove: items.size=" + items.size() + " and it took "
        + (endMillis - startMillis) + " milli(s)");
System.out.println("--> NOTE: Nothing is actually removed. This algorithm uses extra"
        + " memory to avoid modifying or duplicating the original list.");

System.out.println("About to iterate through the list");
startMillis = System.currentTimeMillis();
int count = iterate(items, condition);
endMillis = System.currentTimeMillis();
System.out.println("after iteration: items.size=" + items.size() + " count=" + count
        + " and it took " + (endMillis - startMillis) + " milli(s)");
System.out.println("--> NOTE: this should be somewhat inefficient,"
        + " mostly due to overhead of multiple classes."
        + " This algorithm is designed (hoped) to be faster than"
        + " an algorithm where all elements of the list are used.");

System.out.println("About to total the first 30 matching elements");
startMillis = System.currentTimeMillis();
int total = addFirst(30, items, condition);
endMillis = System.currentTimeMillis();
System.out.println("after totalling first 30 elements: total=" + total
        + " and it took " + (endMillis - startMillis) + " milli(s)");

...

private int iterate(ConditionalArrayList items, Condition condition) {
    // the i++ and return value are really to prevent JVM optimization
    // - just to be safe.
    Iterator iter = items.listIterator(condition);
    int i = 0;
    for (; iter.hasNext(); i++) {
        iter.next();
    }
    return i;
}

private int addFirst(int n, ConditionalArrayList items, Condition condition) {
    int total = 0;
    Iterator iter = items.listIterator(condition);
    for (int i = 0; i < n; i++) {
        total += ((Integer) iter.next()).intValue();
    }
    return total;
}
Rather than muddying my first answer, which is already rather long, here's a second, related option: you can create your own ArrayList, and flag things as "removed". This algorithm makes the assumptions:
Also, this is, again, not tested, so there are probably syntax errors.
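import java.util.ArrayList;
import java.util.List;

public class FlaggedList extends ArrayList {
    // One flag per element: IN means still present, OUT means flagged as removed.
    private List<Boolean> flags;
    private static final Boolean IN = Boolean.TRUE;   // not removed
    private static final Boolean OUT = Boolean.FALSE; // removed
    private int removed = 0;

    public FlaggedList() {
        this(1000000);
    }

    public FlaggedList(int estimate) {
        super(estimate);
        flags = new ArrayList<Boolean>(estimate);
    }

    // Keep the flags in step with the elements; the other mutators
    // (add(int, Object), addAll, clear, ...) would need the same treatment.
    public boolean add(Object o) {
        flags.add(IN);
        return super.add(o);
    }

    // Flags the element instead of shifting the backing array.
    public Object remove(int idx) {
        if (flags.get(idx)) {
            flags.set(idx, OUT);
            removed++;
        }
        return get(idx);
    }

    public boolean isRemoved(int idx) {
        return !flags.get(idx);
    }
}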
and the iterator - more work may be needed to keep it synchronized, and many methods are left out, this time:
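import java.util.Iterator;
import java.util.NoSuchElementException;

// Only forward iteration is shown, so this implements Iterator;
// the remaining ListIterator methods are left out.
public class FlaggedListIterator implements Iterator {
    int idx = 0;
    public FlaggedList list;

    public FlaggedListIterator(FlaggedList list) {
        this.list = list;
    }

    public boolean hasNext() {
        // skip over elements that are flagged as removed
        while (idx < list.size() && list.isRemoved(idx)) {
            idx++;
        }
        return idx < list.size();
    }

    public Object next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return list.get(idx++);
    }
}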
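The flagging idea can also be tried without subclassing ArrayList, by keeping the flags in a java.util.BitSet next to the list. What follows is only a sketch of that variation, not the FlaggedList above: the class FlaggedView and all of its method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper: wraps a list and marks elements as removed in a BitSet
// instead of shifting the backing array.
class FlaggedView<T> {
    private final List<T> items;
    private final BitSet removed; // bit i set => items.get(i) is logically removed

    FlaggedView(List<T> items) {
        this.items = items;
        this.removed = new BitSet(items.size());
    }

    // O(1): only a bit is flipped, the list itself is not touched.
    void markRemoved(int idx) {
        if (idx < 0 || idx >= items.size()) {
            throw new IndexOutOfBoundsException("index " + idx);
        }
        removed.set(idx);
    }

    boolean isRemoved(int idx) {
        return removed.get(idx);
    }

    // Number of elements that are not flagged as removed.
    int liveSize() {
        return items.size() - removed.cardinality();
    }

    // Copies the surviving elements out, in order, in one pass.
    List<T> toList() {
        List<T> out = new ArrayList<T>(liveSize());
        // nextClearBit jumps over the flagged indices.
        for (int i = removed.nextClearBit(0); i < items.size(); i = removed.nextClearBit(i + 1)) {
            out.add(items.get(i));
        }
        return out;
    }
}
```

A BitSet costs roughly one bit per element, so the extra memory for a million items is small compared with a list of Boolean objects.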
I'm sorry, but all these answers are missing the point, I think: You probably don't have to, and probably shouldn't, use a List.
If this kind of "query" is common, why not build an ordered data structure that eliminates the need to traverse all the data nodes? You don't tell us enough about the problem, but given the example you provide, a simple tree could do the trick. There's an insertion overhead per item, but you can very quickly find the subtree containing nodes that match the condition, and you therefore avoid most of the comparisons you're doing now.
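To make the idea concrete, here is a rough sketch using java.util.TreeSet. It assumes, purely for illustration, that the condition is a range test on the value ("less than some threshold"); the question doesn't say what the real condition is, and the class and method names are hypothetical.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

class RangeQueryDemo {
    // Returns a live view of the elements strictly below the threshold.
    // The tree locates the boundary in O(log n) instead of testing every element.
    static NavigableSet<Integer> below(NavigableSet<Integer> items, int threshold) {
        return items.headSet(threshold, false);
    }

    public static void main(String[] args) {
        NavigableSet<Integer> items = new TreeSet<Integer>();
        for (int i = 0; i < 1000000; i++) {
            items.add(i * 3);
        }
        NavigableSet<Integer> matches = below(items, 30);
        System.out.println(matches); // the values 0, 3, ..., 27
    }
}
```

Keep in mind that a TreeSet drops duplicates and orders by value, so this only fits if the data doesn't depend on list order or repeated elements.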
Furthermore:
Depending on the exact problem, and the exact data structure you set up, you can speed up deletion -- if the nodes you want to kill do reduce to a subtree or something of the sort, you just drop that subtree, rather than updating a whole slew of list nodes.
Each time you remove a list item, you are updating pointers -- e.g. lastNode.next and nextNode.prev or something -- but if it turns out you also want to remove the nextNode, then the pointer update you just caused is thrown away by a new update.
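As a rough illustration of deleting a whole range at once, the sketch below clears a range view of a java.util.TreeSet. It assumes the nodes to kill form a contiguous value range, which may not hold for the real problem, and the class and method names are hypothetical. Note that TreeSet does not literally detach a subtree: clearing the view still removes the elements one by one, so a hand-written tree would be needed to get the full benefit described above.

```java
import java.util.TreeSet;

class RangeRemoveDemo {
    // Removes every element in [from, to) by clearing the range view,
    // and returns how many elements were removed.
    static int removeRange(TreeSet<Integer> items, int from, int to) {
        int before = items.size();
        // The view is backed by the set, so clearing it removes from the set.
        items.subSet(from, true, to, false).clear();
        return before - items.size();
    }
}
```

The caller never has to visit the elements outside the range, which is the part a plain list cannot avoid.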
Removing a lot of elements from an ArrayList is an O(n^2) operation. I would recommend simply using a LinkedList that's more optimized for insertion and removal (but not for random access). LinkedList has a bit of a memory overhead.
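A minimal sketch of the LinkedList route, assuming "is even" as a stand-in for the real condition (the class and method names are made up for the example):

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class LinkedRemoveDemo {
    // Removes the even numbers in place. For a LinkedList, Iterator.remove()
    // unlinks the current node in O(1), so the whole pass is O(n).
    static void removeEven(List<Integer> items) {
        Iterator<Integer> iter = items.iterator();
        while (iter.hasNext()) {
            if (iter.next() % 2 == 0) {
                iter.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> items = new LinkedList<Integer>();
        for (int i = 0; i < 10; i++) {
            items.add(i);
        }
        removeEven(items);
        System.out.println(items); // [1, 3, 5, 7, 9]
    }
}
```

The removal has to go through the iterator; calling items.remove(index) inside a loop would walk the list from the start each time.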
If you do need to keep ArrayList, then you are better off creating a new list.
Update: Comparing with creating a new list:
Reusing the same list, the main cost is coming from deleting the node and updating the appropriate pointers in LinkedList. This is a constant-time operation for any node.
When constructing a new list, the main cost is coming from creating the list and initializing array entries. Both are cheap operations. You might also incur the cost of resizing the new list's backing array, assuming that the final array is larger than half of the incoming array.
So if you were to remove only one element, then the LinkedList approach is probably faster. If you were to delete all nodes except for one, the new list approach is probably faster.
There are more complications when you bring memory management and GC. I'd like to leave these out.
The best option is to implement the alternatives yourself and benchmark the results when running your typical load.
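A crude starting point for such a benchmark might look like the sketch below. It assumes "is even" as the removal condition and a million Integers as the load, both stand-ins for your real workload, and all names are hypothetical. A single System.nanoTime() measurement ignores JIT warm-up and GC noise, so treat the numbers as a rough hint only; a harness such as JMH is better suited for serious measurements.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class RemovalBenchmark {
    // Strategy 1: remove matching elements in place through the iterator.
    static List<Integer> removeInPlace(List<Integer> items) {
        for (Iterator<Integer> it = items.iterator(); it.hasNext();) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return items;
    }

    // Strategy 2: copy the elements to keep into a new list.
    static List<Integer> copyKept(List<Integer> items) {
        List<Integer> kept = new ArrayList<Integer>(items.size());
        for (Integer item : items) {
            if (item % 2 != 0) {
                kept.add(item);
            }
        }
        return kept;
    }

    // Appends 0..n-1 to the given list and returns it.
    static List<Integer> fill(List<Integer> target, int n) {
        for (int i = 0; i < n; i++) {
            target.add(i);
        }
        return target;
    }

    public static void main(String[] args) {
        int n = 1000000;
        List<Integer> linked = fill(new LinkedList<Integer>(), n);
        List<Integer> array = fill(new ArrayList<Integer>(n), n);

        long t0 = System.nanoTime();
        removeInPlace(linked);
        long t1 = System.nanoTime();
        List<Integer> kept = copyKept(array);
        long t2 = System.nanoTime();

        System.out.println("LinkedList in place: " + (t1 - t0) / 1000000 + " ms");
        System.out.println("ArrayList copy:      " + (t2 - t1) / 1000000 + " ms");
        System.out.println("same result: " + kept.equals(linked));
    }
}
```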
I would imagine that building a new list, rather than modifying the existing list, would be more performant, especially when the number of items is as large as you indicate. This assumes your list is an ArrayList, not a LinkedList. For a non-circular LinkedList, insertion is O(n), but removal at an existing iterator position is O(1), in which case your naive algorithm should be sufficiently performant.
Unless the list is a LinkedList, the cost of shifting the list each time you call remove() is likely one of the most expensive parts of the implementation. For array lists, I would consider using:
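import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public static <T> List<T> removeMany(List<T> items) {
    int i = 0; // only used by the placeholder condition below
    List<T> newList = new ArrayList<T>(items.size());
    Iterator<T> iter = items.iterator();
    while (iter.hasNext()) {
        T item = iter.next();
        // <cond> goes here; the item is kept when the test below is true
        if (/*<cond>: */i++ % 2 != 0) {
            newList.add(item);
        }
    }
    return newList;
}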
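For a variant you can run as-is, the condition can be passed in as a java.util.function.Predicate (Java 8 or later). This is only a sketch: the class name and the shouldRemove parameter are made up here, and "is even" again stands in for the real condition. On Java 8+, the built-in Collection.removeIf is another option when modifying the list in place is acceptable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class RemoveManyDemo {
    // Returns a new list holding the items for which shouldRemove is false.
    // The input list is left unmodified.
    static <T> List<T> removeMany(List<T> items, Predicate<? super T> shouldRemove) {
        List<T> kept = new ArrayList<T>(items.size());
        for (T item : items) {
            if (!shouldRemove.test(item)) {
                kept.add(item);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<Integer>();
        for (int i = 0; i < 10; i++) {
            items.add(i * 3);
        }
        // Remove the even values; the odd multiples of 3 remain.
        List<Integer> kept = removeMany(items, x -> x % 2 == 0);
        System.out.println(kept); // [3, 9, 15, 21, 27]

        // Built-in alternative (Java 8+): mutates the list in place.
        items.removeIf(x -> x % 2 == 0);
        System.out.println(items.equals(kept)); // true
    }
}
```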
Use Apache Commons Collections. Specifically this function. This is implemented in essentially the same way that people are suggesting that you implement it (i.e. create a new list and then add to it).