How to process logically related rows after ItemReader in SpringBatch?

Asked 2020-12-15 05:51

Scenario

To keep it simple, let's suppose I have an ItemReader that returns 25 rows.

  1. The first 10 rows belong to student A

  2. The

5 Answers
  • 2020-12-15 06:16

    Because you changed your question, I am adding a new answer.

    If the students are ordered, there is no need for a list/map: you can keep exactly one student object in the processor as the "current" one and aggregate onto it until a new one shows up (read: the id changes).

    If the students are not ordered, you will never know when a specific student is "finished", and you would have to keep all students in a map which can't be written until the end of the complete read sequence.

    Beware:

    • the processor needs to know when the reader is exhausted
    • it's hard to get this working with an arbitrary commit-rate and "id" concept: if you aggregate items that share an id, the processor alone just can't know whether the currently processed item is the last one
    • basically the use case is either solved completely at reader level or at writer level (see the other answer)

    Here is the processor as a self-contained class (using @BeforeStep as one way to obtain the StepExecution):

    import org.springframework.batch.core.StepExecution;
    import org.springframework.batch.core.annotation.BeforeStep;
    import org.springframework.batch.item.ItemProcessor;
    
    public class AggregatingItemProcessor implements ItemProcessor<SimpleItem, SimpleItem> {
    
        private SimpleItem currentItem;
        private StepExecution stepExecution;
    
        @BeforeStep
        public void saveStepExecution(StepExecution stepExecution) {
            this.stepExecution = stepExecution;
        }
    
        @Override
        public SimpleItem process(SimpleItem newItem) throws Exception {
            SimpleItem returnItem = null;
    
            if (currentItem == null) {
                // very first item: start a new aggregate
                currentItem = new SimpleItem(newItem.getId(), newItem.getValue());
            } else if (currentItem.getId() == newItem.getId()) {
                // same id: aggregate somehow
                String value = currentItem.getValue() + newItem.getValue();
                currentItem.setValue(value);
            } else {
                // id changed: emit a copy of the finished aggregate
                returnItem = new SimpleItem(currentItem.getId(), currentItem.getValue());
                // and start aggregating under the new id
                currentItem = newItem;
            }
    
            // reader exhausted? then the current aggregate is the last one
            if (stepExecution.getExecutionContext().containsKey("readerExhausted")
                    && (Boolean) stepExecution.getExecutionContext().get("readerExhausted")
                    && currentItem.getId() == stepExecution.getExecutionContext().getInt("lastItemId")) {
                returnItem = new SimpleItem(currentItem.getId(), currentItem.getValue());
            }
    
            return returnItem;
        }
    }
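    
    The snippet above relies on a "readerExhausted" flag (plus the last item's id) in the step's ExecutionContext. Spring Batch sets no such flag by itself; below is a minimal sketch of a delegating reader that could set it. The class name ExhaustionAwareReader and the context keys are illustrative assumptions, not part of the original answer.
    
    import org.springframework.batch.core.StepExecution;
    import org.springframework.batch.item.ItemReader;
    
    // hypothetical delegating reader: flips the "readerExhausted" flag
    // (and records the last seen id) when the delegate returns null
    public class ExhaustionAwareReader implements ItemReader<SimpleItem> {
    
        private final ItemReader<SimpleItem> delegate;
        private final StepExecution stepExecution;
        private SimpleItem lastItem;
    
        public ExhaustionAwareReader(ItemReader<SimpleItem> delegate, StepExecution stepExecution) {
            this.delegate = delegate;
            this.stepExecution = stepExecution;
        }
    
        @Override
        public SimpleItem read() throws Exception {
            SimpleItem item = delegate.read();
            if (item == null) {
                // reader is exhausted: publish the flag and the last seen id
                stepExecution.getExecutionContext().put("readerExhausted", Boolean.TRUE);
                if (lastItem != null) {
                    stepExecution.getExecutionContext().putInt("lastItemId", lastItem.getId());
                }
            } else {
                lastItem = item;
            }
            return item;
        }
    }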
    
  • 2020-12-15 06:17

    I always follow this pattern:

    1. I make my reader step-scoped, and in a @PostConstruct method I fetch the results and put them in a Map (sketched below)
    2. In the processor, I convert the associated collection into a writable list and send that list along
    3. In the ItemWriter, I persist the writable item(s), depending on the case
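    
    A minimal sketch of step 1, assuming a hypothetical StudentRow model and StudentDao query; all names here are illustrative, not from the original answer:
    
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    
    import javax.annotation.PostConstruct;
    
    import org.springframework.batch.item.ItemReader;
    
    // hypothetical step-scoped reader: loads all rows once, groups them by
    // student id, then returns one complete group per read() call
    public class GroupingStudentReader implements ItemReader<List<StudentRow>> {
    
        private final StudentDao studentDao;
        private Iterator<List<StudentRow>> groups;
    
        public GroupingStudentReader(StudentDao studentDao) {
            this.studentDao = studentDao;
        }
    
        @PostConstruct
        public void fetchAndGroup() {
            // LinkedHashMap keeps the original row order of the query
            Map<Long, List<StudentRow>> byStudent = new LinkedHashMap<>();
            for (StudentRow row : studentDao.findAllOrderedByStudentId()) {
                byStudent.computeIfAbsent(row.getStudentId(), id -> new ArrayList<>()).add(row);
            }
            groups = byStudent.values().iterator();
        }
    
        @Override
        public List<StudentRow> read() {
            // returning null tells Spring Batch the reader is exhausted
            return groups.hasNext() ? groups.next() : null;
        }
    }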
  • 2020-12-15 06:19

    In my application I created a CollectingJdbcCursorItemReader that extends the standard JdbcCursorItemReader and performs exactly what you need. Internally it uses my CollectingRowMapper: an extension of the standard RowMapper that maps multiple related rows to one object.

    Here is the code of the ItemReader; the code of the CollectingRowMapper interface, and of an abstract implementation of it, is available in another answer of mine.

    import java.sql.ResultSet;
    import java.sql.SQLException;
    
    import org.springframework.batch.item.ReaderNotOpenException;
    import org.springframework.batch.item.database.JdbcCursorItemReader;
    import org.springframework.jdbc.core.RowMapper;
    
    /**
     * A JdbcCursorItemReader that uses a {@link CollectingRowMapper}.
     * Like the superclass this reader is not thread-safe.
     * 
     * @author Pino Navato
     **/
    public class CollectingJdbcCursorItemReader<T> extends JdbcCursorItemReader<T> {
    
        private CollectingRowMapper<T> rowMapper;
        private boolean firstRead = true;
    
    
        /**
         * Accepts a {@link CollectingRowMapper} only.
         **/
        @Override
        public void setRowMapper(RowMapper<T> rowMapper) {
            this.rowMapper = (CollectingRowMapper<T>)rowMapper;
            super.setRowMapper(rowMapper);
        }
    
    
        /**
         * Read next row and map it to item.
         **/
        @Override
        protected T doRead() throws Exception {
            if (rs == null) {
                throw new ReaderNotOpenException("Reader must be open before it can be read.");
            }
    
            try {
                if (firstRead) {
                    if (!rs.next()) {  //Subsequent calls to next() will be executed by rowMapper
                        return null;
                    }
                    firstRead = false;
                } else if (!rowMapper.hasNext()) {
                    return null;
                }
                T item = readCursor(rs, getCurrentItemCount());
                return item;
            }
            catch (SQLException se) {
                throw getExceptionTranslator().translate("Attempt to process next row failed", getSql(), se);
            }
        }
    
        @Override
        protected T readCursor(ResultSet rs, int currentRow) throws SQLException {
            T result = super.readCursor(rs, currentRow);
            setCurrentItemCount(rs.getRow());
            return result;
        }
    
    }
    

    You can use it just like the classic JdbcCursorItemReader: the only requirement is that you provide it with a CollectingRowMapper instead of a classic RowMapper.
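    
    For illustration, a possible wiring in Java config; StudentRowMapper (an implementation of CollectingRowMapper<Student>), the Student type and the SQL are assumptions, not part of the answer:
    
    import javax.sql.DataSource;
    
    import org.springframework.context.annotation.Bean;
    
    // hypothetical bean definition: configured exactly like a classic
    // JdbcCursorItemReader, only the row mapper differs
    @Bean
    public CollectingJdbcCursorItemReader<Student> studentReader(DataSource dataSource) {
        CollectingJdbcCursorItemReader<Student> reader = new CollectingJdbcCursorItemReader<>();
        reader.setDataSource(dataSource);
        reader.setSql("SELECT * FROM student_rows ORDER BY student_id");  // illustrative SQL
        reader.setRowMapper(new StudentRowMapper());  // maps all related rows to one Student
        reader.setName("studentReader");
        return reader;
    }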

  • 2020-12-15 06:29

    Use a StepExecutionListener and store the records as a map in the StepExecutionContext; you can then group them in the writer or in a writer listener and write them all at once (see the sketch below).
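    
    A minimal sketch of that idea, assuming a hypothetical StudentRow type; the context key "rowsByStudent" is an assumption as well:
    
    import java.util.List;
    import java.util.concurrent.ConcurrentHashMap;
    
    import org.springframework.batch.core.ExitStatus;
    import org.springframework.batch.core.StepExecution;
    import org.springframework.batch.core.StepExecutionListener;
    
    // hypothetical listener: seeds an empty map in the step context before
    // the step runs; the processor can then file each row under its student
    // id, and the writer (or a writer listener) flushes the groups at the end
    public class GroupingStepListener implements StepExecutionListener {
    
        @Override
        public void beforeStep(StepExecution stepExecution) {
            stepExecution.getExecutionContext().put("rowsByStudent",
                    new ConcurrentHashMap<Long, List<StudentRow>>());
        }
    
        @Override
        public ExitStatus afterStep(StepExecution stepExecution) {
            // groups are complete at this point and could be written out here
            return stepExecution.getExitStatus();
        }
    }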

  • 2020-12-15 06:33

    Basically you are talking about batch processing with changing IDs(1), where the batch has to keep track of the change.

    For Spring/Spring Batch this means:

    • an ItemWriter which checks the list of items for an id change
    • before the change, the items are stored in a temporary datastore(2) (List, Map, whatever) and are not written out
    • when the id changes, the aggregating/flattening business code runs on the items in the datastore and one item is written; the datastore can then be reused for the items with the next id
    • this concept needs a reader which tells the step "I'm exhausted" to properly flush the temporary datastore at the end of the input (file/database)

    Here is a rough and simple code example:

    import java.util.ArrayList;
    import java.util.List;
    
    import org.springframework.batch.core.StepExecution;
    import org.springframework.batch.core.annotation.BeforeStep;
    import org.springframework.batch.item.ItemWriter;
    
    public class AggregatingItemWriter implements ItemWriter<SimpleItem> {
    
        // the delegate does the flattening/aggregating and the actual writing
        private ItemWriter<SimpleItem> delegate;
        // temporary datastore for the items sharing the current id
        private final List<SimpleItem> tempData = new ArrayList<>();
        private Integer currentId;
        private StepExecution stepExecution;
    
        public void setDelegate(ItemWriter<SimpleItem> delegate) {
            this.delegate = delegate;
        }
    
        @BeforeStep
        public void saveStepExecution(StepExecution stepExecution) {
            this.stepExecution = stepExecution;
        }
    
        @Override
        public void write(List<? extends SimpleItem> items) throws Exception {
    
            // setup with the first sharedId at startup
            if (currentId == null) {
                currentId = items.get(0).getSharedId();
            }
    
            // check for a change of sharedId in the input:
            // keep items in the temporary datastore until the id changes,
            // then call the delegate, which does the flattening/aggregating
            for (SimpleItem item : items) {
                if (currentId.equals(item.getSharedId())) {
                    // already known sharedId, add to tempData
                    tempData.add(item);
                } else {
                    // new sharedId: write tempData, empty it, keep the new id
                    delegate.write(tempData);
                    tempData.clear();
                    currentId = item.getSharedId();
                    tempData.add(item);
                }
            }
    
            // if the reader is exhausted, flush the remaining tempData
            if (stepExecution.getExecutionContext().containsKey("readerExhausted")
                    && (Boolean) stepExecution.getExecutionContext().get("readerExhausted")
                    && !tempData.isEmpty()) {
                delegate.write(tempData);
                tempData.clear();
            }
        }
    }
    

    (1) assuming the items are ordered by an id (which can be composite, too)

    (2) e.g. a HashMap Spring bean, for thread safety
