How to persist data to a MySQL database from a 32k-row Excel file using JPA pagination?

Submitted by 徘徊边缘 on 2021-01-24 09:45:10

Question


I have a large Excel file with 32k rows, and Java/Spring code that persists the Excel data to a MySQL database. My code works for about 6k rows, but not for the entire Excel file, due to a JPA limitation. I read that this can be done with JPA pagination, but so far I have only found information about reading data from a DB (already populated) and rendering it to a UI. The Excel file contains 32k medicines, and these rows need to be persisted into the DB.

I have this Controller layer with the following method:

    public ResponseEntity<ResponseMessage> uploadFile(@RequestParam("file") MultipartFile file,
                                                      @RequestParam(defaultValue = "0") int page,
                                                      @RequestParam(defaultValue = "6000") int size) {

        String message = "";

        if (ExcelHelper.hasExcelFormat(file)) {
            try {
                // the following lines are my pathetic attempt to resolve this with pagination
                List<Medicine> medicines = new ArrayList<>();
                Pageable paging = PageRequest.of(page, size);
                Page<Medicine> pageMedicamente = medicineRepositoryDao.save(paging);

                medicines = pageMedicamente.getContent();
                medicineService.save(file);

                message = "Uploaded the file successfully: " + file.getOriginalFilename();
                return ResponseEntity.status(HttpStatus.OK).body(new ResponseMessage(message));
            } catch (Exception e) {
                message = "Could not upload the file: " + file.getOriginalFilename() + "!";
                return ResponseEntity.status(HttpStatus.EXPECTATION_FAILED).body(new ResponseMessage(message));
            }
        }

And the Repository layer:

@Repository
public interface MedicineRepositoryDao extends JpaRepository<Medicine, Long> {


    Page<Medicine> save(Pageable pageable);

}

And also the Service layer:

    public void save(MultipartFile file) {
        try {
            List<Medicine> medicines = ExcelHelper.excelToMedicine(file.getInputStream());
            medicineRepositoryDao.saveAll(medicines);
        } catch (IOException e) {
            throw new RuntimeException("fail to store excel data: " + e.getMessage());
        }
    }


Answer 1:


I think you have a couple of things mixed up here.

  1. I don't think Spring has any relevant limitation on the number of rows you may persist here, but JPA does. JPA keeps a reference to every entity that you save in its first-level cache, so for a large number of rows/entities this hogs memory and also makes some operations slower, since entities get looked up or processed one by one.

  2. Pagination is for reading entities, not for saving.

You have a couple of options in this situation.

  1. Don't use JPA. For simply reading data from a file and writing it into a database, JPA hardly offers any benefit. This can be done almost trivially using just a JdbcTemplate or NamedParameterJdbcTemplate and will be much faster, since you skip the overhead of JPA, which you don't benefit from in this scenario anyway (a sketch follows this list). If you want to use an ORM, you might take a look at Spring Data JDBC, which is conceptually simpler, doesn't keep references to entities, and therefore should behave better in this scenario. I recommend not using an ORM here: you don't seem to benefit from having entities, so creating them and then having the ORM extract the data from them again is really a waste of time.

  2. Break your import into batches. This means you persist, e.g., 1000 rows at a time, write them to the database, and commit the transaction before continuing with the next 1000 rows. For JPA this is pretty much a necessity, for the reasons laid out above. With JDBC (i.e. JdbcTemplate & Co.) this probably isn't necessary for 32k rows, but it might improve performance and can be useful for recoverability if an insert fails (see the chunked sketch below). Spring Batch will help you implement that.

  3. While the previous point talks about batching in the sense of breaking your import into chunks, you should also look into batching on the JDBC side, where you send multiple statements, or a single statement with multiple sets of parameters, to the database in one go; this again should improve performance (see the batchUpdate sketch below).

  4. Finally, there are often alternatives outside of the Javaverse that might be more suitable for the job. Some databases have tools that load flat files extremely efficiently; MySQL, for example, ships LOAD DATA INFILE for exactly this purpose.
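
For point 1, here is a minimal sketch of the plain-JDBC route. The medicine table and its name/code columns, and the matching getters on Medicine, are assumptions (adjust them to your real mapping); ExcelHelper.excelToMedicine is reused from the question.

    // Sketch only: table and column names (medicine, name, code) are assumptions.
    import java.io.IOException;
    import java.util.List;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.stereotype.Service;
    import org.springframework.web.multipart.MultipartFile;

    @Service
    public class MedicineJdbcService {

        private final JdbcTemplate jdbcTemplate;

        public MedicineJdbcService(JdbcTemplate jdbcTemplate) {
            this.jdbcTemplate = jdbcTemplate;
        }

        public void save(MultipartFile file) throws IOException {
            // parse the upload exactly as in the question's service layer
            List<Medicine> medicines = ExcelHelper.excelToMedicine(file.getInputStream());
            for (Medicine m : medicines) {
                jdbcTemplate.update(
                        "INSERT INTO medicine (name, code) VALUES (?, ?)",
                        m.getName(), m.getCode());
            }
        }
    }

This inserts row by row, which is already enough for 32k rows; the sketch for point 3 below shows how to turn it into a real batch.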
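For point 2, a sketch of chunked persisting with JPA: a TransactionTemplate commits each chunk on its own, and flushing plus clearing the EntityManager keeps the first-level cache from growing without bound. The chunk size of 1000 is just a starting point.

    import java.util.List;
    import javax.persistence.EntityManager;
    import javax.persistence.PersistenceContext;
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.PlatformTransactionManager;
    import org.springframework.transaction.support.TransactionTemplate;

    @Service
    public class MedicineBatchService {

        private static final int BATCH_SIZE = 1000; // tune to taste

        @PersistenceContext
        private EntityManager entityManager;

        private final TransactionTemplate transactionTemplate;

        public MedicineBatchService(PlatformTransactionManager txManager) {
            this.transactionTemplate = new TransactionTemplate(txManager);
        }

        public void saveInChunks(List<Medicine> medicines) {
            for (int from = 0; from < medicines.size(); from += BATCH_SIZE) {
                List<Medicine> chunk =
                        medicines.subList(from, Math.min(from + BATCH_SIZE, medicines.size()));
                // one transaction per chunk: a failure rolls back only this chunk
                transactionTemplate.executeWithoutResult(status -> {
                    chunk.forEach(entityManager::persist);
                    entityManager.flush(); // push the INSERTs to MySQL
                    entityManager.clear(); // evict the chunk from the first-level cache
                });
            }
        }
    }

If you stay with Hibernate, setting spring.jpa.properties.hibernate.jdbc.batch_size to the same value lets it batch the INSERTs on the JDBC level as well, which ties into the next point.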
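And for point 3, JDBC-side batching with NamedParameterJdbcTemplate.batchUpdate, which sends all parameter sets in one go. SqlParameterSourceUtils.createBatch maps bean properties to the named parameters, so :name and :code again assume matching getters on Medicine. With MySQL it is worth adding rewriteBatchedStatements=true to the JDBC URL so the driver rewrites the batch into multi-row INSERTs.

    import java.util.List;
    import javax.sql.DataSource;
    import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
    import org.springframework.jdbc.core.namedparam.SqlParameterSource;
    import org.springframework.jdbc.core.namedparam.SqlParameterSourceUtils;

    public class MedicineBatchInsert {

        // column names are assumptions, as in the sketches above
        private static final String SQL =
                "INSERT INTO medicine (name, code) VALUES (:name, :code)";

        public static void insertAll(DataSource dataSource, List<Medicine> medicines) {
            // one SqlParameterSource per Medicine, built from its bean properties
            SqlParameterSource[] batch = SqlParameterSourceUtils.createBatch(medicines);
            new NamedParameterJdbcTemplate(dataSource).batchUpdate(SQL, batch);
        }
    }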



Source: https://stackoverflow.com/questions/65116990/how-to-persist-data-to-a-mysql-database-from-a-32k-row-excel-using-jpa-paginatio
