Insert 1000s of records with relationship and ignore duplicates using JDBC & MySQL

Submitted by 让人想犯罪 on 2019-12-24 08:16:36

Question


I am refactoring some code that was horribly inefficient, but I am still seeing huge load on both my MySQL and Java servers. We have an endpoint that allows a user to upload a CSV file containing contacts with a first name, last name, phone number, and email address. The phone number and email address need to be unique for a location. The phone number, however, is stored in a separate table from the contact, since a contact can have more than one. The CSV only allows one, but users can update a contact manually to add more. Our users will likely upload files as big as 50,000 records.

This is my pertinent SQL structure:

Contact Table
+----+-----------+----------+------------------+------------+
| id | firstName | lastName | email            | locationId |
+----+-----------+----------+------------------+------------+
|  1 |      John |      Doe | jdoe@noemail.com |          1 |
+----+-----------+----------+------------------+------------+

Contact Phone Table
+----+-----------+--------------+---------+
| id | contactId |       number | primary |
+----+-----------+--------------+---------+
|  1 |         1 | +15555555555 |       1 |
+----+-----------+--------------+---------+
|  2 |         1 | +11231231234 |       0 |
+----+-----------+--------------+---------+

There are unique composite constraints on email & locationId in the contact table and contactId & number in the contact phone table.

The original programmer just looped through the CSV in Java, ran two separate queries per row (one for the phone number, one for the email), and inserted the row one at a time if there was no match. It was horrible and would just kill our server.


This is my latest attempt:

Stored Procedure:

DELIMITER $$
CREATE PROCEDURE save_bulk_contact(IN last_name VARCHAR(128), IN first_name VARCHAR(128), IN email VARCHAR(320), IN location_id BIGINT, IN phone_number VARCHAR(15))
  BEGIN
    DECLARE insert_id BIGINT;
    INSERT INTO contact
    (`lastName`, `firstName`, `primaryEmail`, `locationId`, `firstActiveDate`)
      VALUES (last_name, first_name, email, location_id, UNIX_TIMESTAMP() * 1000);
    SET insert_id = LAST_INSERT_ID();
    INSERT INTO contact_phone
    (`contactId`, `number`, `type`, `primary`)
      VALUES (insert_id, phone_number, 'CELL', 1);
  END$$

DELIMITER ;

Then in Java I query for all of the contacts (and their phone numbers) for the location, loop through the uploaded rows, remove any that already exist, and then use a batch update to insert the rest.

Service Layer:

private ContactUploadJSON uploadContacts(ContactUploadJSON contactUploadJSON) throws HandledDataAccessException {
    List<ContactUploadData> returnList = new ArrayList<>();
    if (contactUploadJSON.getContacts() != null) {
        List<Contact> existingContacts = contactRepository.getContactsByLocationId(contactUploadJSON.getLocationId());
        List<ContactUploadData> uploadedContacts = contactUploadJSON.getContacts();

        Iterator<ContactUploadData> uploadedContactsIterator = uploadedContacts.iterator();

        while (uploadedContactsIterator.hasNext()) {
            ContactUploadData current = uploadedContactsIterator.next();

            boolean anyMatch = existingContacts.stream().anyMatch(existingContact -> {
                try {
                    boolean contactFound = contactEqualsContactUploadData(existingContact, current);
                    if(contactFound) {
                        contactUploadJSON.incrementExisted();
                        current.setError("Duplicate Contact: " + StringUtils.joinWith(" ", existingContact.getFirstName(), existingContact.getLastName()));
                        returnList.add(current);
                    }
                    return contactFound;
                } catch (PhoneParsingException | PhoneNotValidException e) {
                    contactUploadJSON.incrementFailed();
                    current.setError("Failed with error: " + e.getMessage());
                    returnList.add(current);
                    return true;
                }
            });

            if(anyMatch) {
                uploadedContactsIterator.remove();
            }
        }

        contactUploadJSON.setCreated(uploadedContacts.size());

        if(!uploadedContacts.isEmpty()){
            contactRepository.insertBulkContacts(uploadedContacts, contactUploadJSON.getLocationId());
        }
    }
    contactUploadJSON.setContacts(returnList);
    return contactUploadJSON;
}

private static boolean contactEqualsContactUploadData(Contact contact, ContactUploadData contactUploadData) throws PhoneParsingException, PhoneNotValidException {
    if(contact == null || contactUploadData == null) {
        return false;
    }

    String normalizedPhone = PhoneUtils.validatePhoneNumber(contactUploadData.getMobilePhone());

    List<ContactPhone> contactPhones = contact.getPhoneNumbers();
    if(contactPhones != null && contactPhones.stream().anyMatch(contactPhone -> StringUtils.equals(contactPhone.getNumber(), normalizedPhone))) {
        return true;
    }

    return (StringUtils.isNotBlank(contactUploadData.getEmail()) &&
            StringUtils.equals(contact.getPrimaryEmail(), contactUploadData.getEmail())) ||
            (contact.getPrimaryPhoneNumber() != null &&
                    StringUtils.equals(contact.getPrimaryPhoneNumber().getNumber(), normalizedPhone));
}
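
As an aside, since the check above rescans every existing contact for every uploaded row, one variation I have been considering is building a lookup set from the existing contacts once and testing each uploaded row against it. This is a rough sketch only, reusing the same accessors as the code above (plus java.util.Set/HashSet):

// Build the lookup once per upload.
Set<String> existingKeys = new HashSet<>();
for (Contact existing : existingContacts) {
    if (StringUtils.isNotBlank(existing.getPrimaryEmail())) {
        existingKeys.add("email:" + existing.getPrimaryEmail());
    }
    if (existing.getPhoneNumbers() != null) {
        existing.getPhoneNumbers().forEach(phone -> existingKeys.add("phone:" + phone.getNumber()));
    }
}

// Then, per uploaded row (validatePhoneNumber can still throw, handled as before):
String normalizedPhone = PhoneUtils.validatePhoneNumber(current.getMobilePhone());
boolean duplicate = existingKeys.contains("phone:" + normalizedPhone)
        || (StringUtils.isNotBlank(current.getEmail())
            && existingKeys.contains("email:" + current.getEmail()));

I have not benchmarked it, so I do not know how much it actually helps compared to the cost of the database round trips.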

Repository Code:

public void insertBulkContacts(List<ContactUploadData> contacts, long locationId) throws HandledDataAccessException {

    String sql = "CALL save_bulk_contact(:last_name, :first_name, :email, :location_id, :phone_number)";

    try {
        List<Map<String, Object>> contactsList = new ArrayList<>();

        contacts.forEach(contact -> {
            Map<String, Object> contactMap = new HashMap<>();
            contactMap.put("last_name", contact.getLastName());
            contactMap.put("first_name", contact.getFirstName());
            contactMap.put("email", contact.getEmail());
            contactMap.put("location_id", locationId);
            contactMap.put("phone_number", contact.getMobilePhone());
            contactsList.add(contactMap);
        });

        Map<String, Object>[] paramList = contactsList.toArray(new Map[0]);

        namedJdbcTemplate.batchUpdate(sql, paramList);
    } catch (DataAccessException e) {
        log.severe("Failed to insert contacts:\n" + ExceptionUtils.getStackTrace(e));
        throw new HandledDataAccessException("Failed to insert contacts");
    }
}
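
For what it is worth, the same batch could presumably also be built with Spring's MapSqlParameterSource instead of raw Maps, which avoids the unchecked Map[] conversion. A minimal sketch, assuming Spring 5+ (org.springframework.jdbc.core.namedparam); I would not expect it to change performance:

// Same CALL, with the parameters built as SqlParameterSource[] instead of Map[].
SqlParameterSource[] batch = contacts.stream()
        .map(contact -> new MapSqlParameterSource()
                .addValue("last_name", contact.getLastName())
                .addValue("first_name", contact.getFirstName())
                .addValue("email", contact.getEmail())
                .addValue("location_id", locationId)
                .addValue("phone_number", contact.getMobilePhone()))
        .toArray(SqlParameterSource[]::new);

namedJdbcTemplate.batchUpdate(sql, batch);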

The returned ContactUploadJSON contains the contact list, the locationId, and counts for created, already existing, and failed records.
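
For context, this is roughly what ContactUploadJSON looks like, reconstructed from the calls above; the field names are an approximation, not the exact class:

import java.util.List;

// Approximate shape only.
public class ContactUploadJSON {
    private long locationId;
    private List<ContactUploadData> contacts;
    private int created;   // newly inserted
    private int existed;   // skipped as duplicates
    private int failed;    // rejected (e.g. invalid phone number)

    public List<ContactUploadData> getContacts() { return contacts; }
    public void setContacts(List<ContactUploadData> contacts) { this.contacts = contacts; }
    public long getLocationId() { return locationId; }
    public void setCreated(int created) { this.created = created; }
    public void incrementExisted() { existed++; }
    public void incrementFailed() { failed++; }
    // getters for the counts omitted
}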

This solution works, but I am wondering if there are better approaches. In the future we are going to want a mechanism for updating contacts, not just inserting new ones, so I have to plan for that as well. Is it possible to do this all in MySQL? Would it be more efficient? I think the one-to-many relationship with the compound unique constraints makes it more difficult.
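
To make the "all in MySQL" idea concrete, the closest thing I can picture is batch-inserting the CSV rows into a hypothetical contact_staging table and then letting the existing unique keys drop duplicates with INSERT IGNORE ... SELECT. This is a sketch only, and it still does not enforce the phone-number-per-location rule, which is exactly the part I find awkward:

// Sketch: assumes a hypothetical contact_staging(locationId, lastName, firstName, email, number)
// table that the CSV rows have already been batch-inserted into for this upload.
namedJdbcTemplate.getJdbcOperations().update(
        "INSERT IGNORE INTO contact (`lastName`, `firstName`, `primaryEmail`, `locationId`, `firstActiveDate`) " +
        "SELECT s.lastName, s.firstName, s.email, s.locationId, UNIX_TIMESTAMP() * 1000 " +
        "FROM contact_staging s WHERE s.locationId = ?", locationId);

namedJdbcTemplate.getJdbcOperations().update(
        "INSERT IGNORE INTO contact_phone (`contactId`, `number`, `type`, `primary`) " +
        "SELECT c.id, s.number, 'CELL', 1 " +
        "FROM contact_staging s " +
        "JOIN contact c ON c.primaryEmail = s.email AND c.locationId = s.locationId " +
        "WHERE s.locationId = ?", locationId);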

Source: https://stackoverflow.com/questions/50865897/insert-1000s-of-records-with-relationship-and-ignore-duplicates-using-jdbc-mys
