How to efficiently (performance) remove many items from List in Java?

迷失自我 2021-01-31 09:00

I have quite a large List named items (>= 1,000,000 items) and some condition denoted by that selects items to be deleted and is true for many (maybe hal

12 answers
  •  既然无缘
    2021-01-31 09:28

    Since speed is the most important metric, one option is to spend more memory and avoid recreating lists (as mentioned in my comment). The actual performance impact depends entirely on how the functionality is used, though.

    The algorithm assumes that at least one of the following is true:

    • not all elements of the original list need to be tested. This could happen if we're really looking for the first N elements that match our condition, rather than all elements that match our condition.
    • copying the list into new memory is more expensive than working in place. This could happen if the original list uses more than 50% of the allocated memory, or if memory operations turn out to be slower than expected (that would be a surprising result).
    • the speed penalty of removing elements from the list is too large to accept all at once, but spreading that penalty across multiple operations is acceptable, even if the overall penalty is larger than taking it all at once. This is like taking out a $200K mortgage: paying $1000 per month for 30 years is affordable on a monthly basis and has the benefits of owning a home and equity, even though the overall payment is $360K over the life of the loan.
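
    For comparison, here is a minimal sketch of the two eager alternatives that the assumptions above are weighed against: compacting the list in place and copying the survivors into a new list. The class and method names are hypothetical, and java.util.function.Predicate (Java 8+) stands in for the condition; this is an illustration, not a benchmark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper class, for illustration only.
class EagerRemovalSketch {

    // Option A: remove matches in place. In OpenJDK, ArrayList.removeIf runs in
    // linear time instead of shifting the tail of the array once per removal.
    static <T> void removeInPlace(ArrayList<T> items, Predicate<T> shouldRemove) {
        items.removeIf(shouldRemove);
    }

    // Option B: copy the survivors into a new list. The original list is left
    // untouched, at the cost of a second backing array.
    static <T> List<T> copyWithout(List<T> items, Predicate<T> shouldRemove) {
        List<T> kept = new ArrayList<>(items.size());
        for (T item : items) {
            if (!shouldRemove.test(item)) {
                kept.add(item);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        ArrayList<Integer> items = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            items.add(i * 3);
        }
        Predicate<Integer> isEven = n -> n % 2 == 0;

        System.out.println(copyWithout(items, isEven)); // [3, 9, 15, 21, 27]
        removeInPlace(items, isEven);
        System.out.println(items);                      // [3, 9, 15, 21, 27]
    }
}
```

    Both options pay the whole cost up front, which is exactly what the lazy approach below tries to avoid.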

    Disclaimer: there are probably syntax errors; I didn't try compiling anything.

    First, subclass ArrayList:

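    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.ListIterator;

    public class ConditionalArrayList extends ArrayList {

      public Iterator iterator(Condition condition)
      { 
        return listIterator(condition);
      }

      public ListIterator listIterator(Condition condition)
      {
        // wrap ArrayList's own unfiltered iterator; this.iterator() would throw
        return new ConditionalArrayListIterator(super.listIterator(), condition); 
      }

      // the unconditional iterators are disabled on purpose; note that this
      // also breaks for-each loops and toString() on this list
      public ListIterator listIterator(){ 
        throw new UnsupportedOperationException("You must specify a condition for the iterator"); 
      }
      public Iterator iterator(){ 
        throw new UnsupportedOperationException("You must specify a condition for the iterator"); 
      }
    }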
    

    Then we need the helper classes:

    import java.util.ListIterator;
    import java.util.NoSuchElementException;

    public class ConditionalArrayListIterator implements ListIterator
    {
      private ListIterator listIterator;
      Condition condition;
    
      // the following two flags are used as a quick optimization so that 
      // we don't repeat tests on known-good elements unnecessarily.
      boolean nextKnownGood = false;
      boolean prevKnownGood = false;
    
      public ConditionalArrayListIterator(ListIterator listIterator, Condition condition)
      {
        this.listIterator = listIterator;
        this.condition = condition;
      }
    
      // the added element lands just behind the cursor and may not match
      public void add(Object o){ listIterator.add(o); prevKnownGood = false; }
    
      /**
       * Note that it is extremely inefficient to 
       * call hasNext() and hasPrevious() alternately when
       * there's a bunch of non-matching elements between
       * two matching elements.
       */
      public boolean hasNext()
      { 
         if( nextKnownGood ) return true;
    
         /* find the next object in the list that 
          * matches our condition, if any.
          */
         while( listIterator.hasNext() )
         {
           Object next = listIterator.next();
           if( condition.matches(next) ) {
             // step back so that next() returns this element
             listIterator.previous();
             nextKnownGood = true;
             return true;
           }
           // we skipped a non-matching element, so the element
           // behind the cursor is no longer a known match
           prevKnownGood = false;
         }
    
         nextKnownGood = false;
         // no matching element was found.
         return false;
      }
    
      /**
       *  See hasNext() for efficiency notes.
       *  Mirror image of hasNext().
       */
      public boolean hasPrevious()
      { 
         if( prevKnownGood ) return true;
    
         /* find the previous object in the list that 
          * matches our condition, if any.
          */
         while( listIterator.hasPrevious() )
         {
           Object prev = listIterator.previous();
           if( condition.matches(prev) ) {
             // step forward so that previous() returns this element
             listIterator.next();
             prevKnownGood = true;
             return true;
           }
           // we skipped a non-matching element, so the element
           // ahead of the cursor is no longer a known match
           nextKnownGood = false;
         }
    
         // no matching element was found.
         prevKnownGood = false;
         return false;
      }
    
      /** see hasNext() for efficiency note **/
      public Object next()
      {
         if( nextKnownGood || hasNext() ) 
         { 
           // the element returned here matches, so once the cursor has
           // moved past it, it is a known-good previous element
           prevKnownGood = true;
           nextKnownGood = false;
           return listIterator.next();
         }
    
         throw new NoSuchElementException("No more matching elements");
      }
    
      /** see hasNext() for efficiency note; mirror image of next() **/
      public Object previous()
      {
         if( prevKnownGood || hasPrevious() ) 
         { 
           // the element returned here matches, so once the cursor has
           // moved before it, it is a known-good next element
           nextKnownGood = true;
           prevKnownGood = false;
           return listIterator.previous();                        
         }
         throw new NoSuchElementException("No more matching elements");
      }
    
      /** 
       * Note that nextIndex() and previousIndex() return the array index
       * of the value, not the number of results that this class has returned.
       * If this isn't good for you, just maintain your own current index and
       * increment or decrement it in next() and previous().
       */
      public int nextIndex(){ return listIterator.nextIndex(); }
      public int previousIndex(){ return listIterator.previousIndex(); }
    
      // removing or replacing an element invalidates what we know about the
      // neighbours of the cursor, so both flags are reset
      public void remove(){ listIterator.remove(); nextKnownGood = false; prevKnownGood = false; }
      public void set(Object o) { listIterator.set(o); nextKnownGood = false; prevKnownGood = false; }
    }
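
    As a smaller, self-contained variation on the same idea, here is a forward-only filtering iterator that buffers the next match in a look-ahead field instead of repositioning a ListIterator. This is a sketch under assumptions, not the class above: FilteringIterator is a hypothetical name, and java.util.function.Predicate stands in for the condition.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical name; a forward-only cousin of ConditionalArrayListIterator.
class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> condition;
    private T lookahead;          // the next matching element, once found
    private boolean hasLookahead; // true while lookahead holds a valid element

    FilteringIterator(Iterator<T> source, Predicate<T> condition) {
        this.source = source;
        this.condition = condition;
    }

    @Override
    public boolean hasNext() {
        // Skip non-matching elements until a match is buffered or the source ends.
        while (!hasLookahead && source.hasNext()) {
            T candidate = source.next();
            if (condition.test(candidate)) {
                lookahead = candidate;
                hasLookahead = true;
            }
        }
        return hasLookahead;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException("No more matching elements");
        }
        T result = lookahead;
        lookahead = null; // do not keep the element reachable
        hasLookahead = false;
        return result;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(0, 3, 6, 9, 12, 15);
        Iterator<Integer> evens = new FilteringIterator<>(items.iterator(), n -> n % 2 == 0);
        while (evens.hasNext()) {
            System.out.println(evens.next()); // prints 0, 6 and 12 on separate lines
        }
    }
}
```

    The trade-off is that this version supports neither remove() nor backward traversal, which is what the ListIterator-based class is for.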
    

    and, of course, we need the condition interface:

    /** much like a comparator... **/
    public interface Condition
    {
      public boolean matches(Object obj);
    }
    

    And a condition with which to test

    public class IsEvenCondition implements Condition
    {
      public boolean matches(Object obj){ return ((Number) obj).intValue() % 2 == 0; }
    }
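
    On Java 8 or later, java.util.function.Predicate can play the role of the Condition interface, and a sequential stream gives the same "only test as many elements as needed" behaviour for the first-N case. The following is a minimal sketch under that assumption; FirstMatchesSketch and sumFirstMatches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper class, for illustration only.
class FirstMatchesSketch {

    // Sum the first n elements that satisfy the predicate. The stream is lazy,
    // so a sequential pipeline stops testing elements once n matches are found.
    static int sumFirstMatches(List<Integer> items, Predicate<Integer> condition, int n) {
        return items.stream()
                .filter(condition)
                .limit(n)
                .mapToInt(Integer::intValue)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            items.add(i * 3); // same demo data as in the test code
        }
        Predicate<Integer> isEven = value -> value % 2 == 0;
        System.out.println(sumFirstMatches(items, isEven, 30)); // 2610
    }
}
```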
    

    and we're finally ready for some test code

    
        Condition condition = new IsEvenCondition();
        long startMillis, endMillis;
    
        System.out.println("preparing items");
        startMillis = System.currentTimeMillis();
        // must be a ConditionalArrayList so that listIterator(condition) exists
        ConditionalArrayList items = new ConditionalArrayList(); // Integer is for demo
        for (int i = 0; i < 1000000; i++) {
            items.add(i * 3); // just for demo
        }
        endMillis = System.currentTimeMillis();
        System.out.println("It took " + (endMillis - startMillis) + " milli(s) to prepare the list.");
    
        System.out.println("deleting items");
        startMillis = System.currentTimeMillis();
        // we don't actually ever remove from this list, so 
        // removeMany is effectively "instantaneous"
        // items = removeMany(items);
        endMillis = System.currentTimeMillis();
        System.out.println("after remove: items.size=" + items.size() + 
                " and it took " + (endMillis - startMillis) + " milli(s)");
        System.out.println("--> NOTE: Nothing is actually removed.  This algorithm uses extra"
                           + " memory to avoid modifying or duplicating the original list.");
    
        System.out.println("About to iterate through the list");
        startMillis = System.currentTimeMillis();
        int count = iterate(items, condition);
        endMillis = System.currentTimeMillis();
        System.out.println("after iteration: items.size=" + items.size() + 
                " count=" + count + " and it took " + (endMillis - startMillis) + " milli(s)");
        System.out.println("--> NOTE: this should be somewhat inefficient,"
                           + " mostly due to overhead of multiple classes."
                           + " This algorithm is designed (hoped) to be faster than"
                           + " an algorithm where all elements of the list are used.");
    
        System.out.println("About to total the first 30 matching elements");
        startMillis = System.currentTimeMillis();
        int total = addFirst(30, items, condition);
        endMillis = System.currentTimeMillis();
        System.out.println("after totalling first 30 elements: total=" + total + 
                " and it took " + (endMillis - startMillis) + " milli(s)");
    
    ...
    
    private int iterate(List items, Condition condition)
    {
      // the counter and return value are really to prevent JVM optimization
      // - just to be safe.
      // listIterator(Condition) only exists on ConditionalArrayList, hence the cast
      Iterator iter = ((ConditionalArrayList) items).listIterator(condition);
      int i = 0;
      for( ; iter.hasNext(); i++){ iter.next(); }
      return i;
    }
    
    private int addFirst(int n, List items, Condition condition)
    {
      int total = 0;
      Iterator iter = ((ConditionalArrayList) items).listIterator(condition);
      for(int i=0; i
        
