Question
I am refactoring some code that was horribly inefficient but am still seeing huge load on both my MySQL and Java servers. We have an endpoint that allows a user to upload a CSV file containing contacts with a first name, last name, phone number, and email address. The phone number and email address need to be unique per location. The phone number, however, is stored in a separate table from the contact, because a contact can have more than one. The CSV only allows one, but users can edit a contact manually to add more. Our users will likely upload files as large as 50,000 records.
This is my pertinent SQL structure:
Contact Table
+----+-----------+----------+------------------+------------+
| id | firstName | lastName | email | locationId |
+----+-----------+----------+------------------+------------+
| 1 | John | Doe | jdoe@noemail.com | 1 |
+----+-----------+----------+------------------+------------+
Contact Phone Table
+----+-----------+--------------+---------+
| id | contactId | number | primary |
+----+-----------+--------------+---------+
| 1 | 1 | +15555555555 | 1 |
+----+-----------+--------------+---------+
| 2 | 1 | +11231231234 | 0 |
+----+-----------+--------------+---------+
There are unique composite constraints on email & locationId in the contact table and contactId & number in the contact phone table.
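For clarity, those constraints are roughly equivalent to the following (the index names here are made up):

```sql
-- Assumed shape of the unique composite constraints (index names illustrative)
ALTER TABLE contact
    ADD UNIQUE KEY uq_contact_email_location (email, locationId);
ALTER TABLE contact_phone
    ADD UNIQUE KEY uq_contact_phone_number (contactId, number);
```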
The original programmer just wrote a loop in Java that walked the CSV, queried for the phone number and the email (two separate queries), and inserted one row at a time if there was no match. It was horrible and would just kill our server.
This is my latest attempt:
Stored Procedure:
DELIMITER $$
CREATE PROCEDURE save_bulk_contact(IN last_name VARCHAR(128), IN first_name VARCHAR(128), IN email VARCHAR(320), IN location_id BIGINT, IN phone_number VARCHAR(15))
BEGIN
    DECLARE insert_id BIGINT;
    INSERT INTO contact
        (`lastName`, `firstName`, `primaryEmail`, `locationId`, `firstActiveDate`)
    VALUES (last_name, first_name, email, location_id, UNIX_TIMESTAMP() * 1000);
    SET insert_id = LAST_INSERT_ID();
    INSERT INTO contact_phone
        (`contactId`, `number`, `type`, `primary`)
    VALUES (insert_id, phone_number, 'CELL', 1);
END$$
DELIMITER ;
Then in Java I query for all of the contacts (with their phone numbers) for the location, remove any uploaded rows that duplicate them, and use a batch update to insert the rest.
Service Layer:
private ContactUploadJSON uploadContacts(ContactUploadJSON contactUploadJSON) throws HandledDataAccessException {
    List<ContactUploadData> returnList = new ArrayList<>();
    if (contactUploadJSON.getContacts() != null) {
        List<Contact> existingContacts = contactRepository.getContactsByLocationId(contactUploadJSON.getLocationId());
        List<ContactUploadData> uploadedContacts = contactUploadJSON.getContacts();
        Iterator<ContactUploadData> uploadedContactsIterator = uploadedContacts.iterator();
        while (uploadedContactsIterator.hasNext()) {
            ContactUploadData current = uploadedContactsIterator.next();
            boolean anyMatch = existingContacts.stream().anyMatch(existingContact -> {
                try {
                    boolean contactFound = contactEqualsContactUploadData(existingContact, current);
                    if (contactFound) {
                        contactUploadJSON.incrementExisted();
                        current.setError("Duplicate Contact: " + StringUtils.joinWith(" ", existingContact.getFirstName(), existingContact.getLastName()));
                        returnList.add(current);
                    }
                    return contactFound;
                } catch (PhoneParsingException | PhoneNotValidException e) {
                    contactUploadJSON.incrementFailed();
                    current.setError("Failed with error: " + e.getMessage());
                    returnList.add(current);
                    return true;
                }
            });
            if (anyMatch) {
                uploadedContactsIterator.remove();
            }
        }
        contactUploadJSON.setCreated(uploadedContacts.size());
        if (!uploadedContacts.isEmpty()) {
            contactRepository.insertBulkContacts(uploadedContacts, contactUploadJSON.getLocationId());
        }
    }
    contactUploadJSON.setContacts(returnList);
    return contactUploadJSON;
}

private static boolean contactEqualsContactUploadData(Contact contact, ContactUploadData contactUploadData) throws PhoneParsingException, PhoneNotValidException {
    if (contact == null || contactUploadData == null) {
        return false;
    }
    String normalizedPhone = PhoneUtils.validatePhoneNumber(contactUploadData.getMobilePhone());
    List<ContactPhone> contactPhones = contact.getPhoneNumbers();
    if (contactPhones != null && contactPhones.stream().anyMatch(contactPhone -> StringUtils.equals(contactPhone.getNumber(), normalizedPhone))) {
        return true;
    }
    return (StringUtils.isNotBlank(contactUploadData.getEmail()) &&
            StringUtils.equals(contact.getPrimaryEmail(), contactUploadData.getEmail())) ||
           (contact.getPrimaryPhoneNumber() != null &&
            StringUtils.equals(contact.getPrimaryPhoneNumber().getNumber(), normalizedPhone));
}
Repository Code:
public void insertBulkContacts(List<ContactUploadData> contacts, long locationId) throws HandledDataAccessException {
    String sql = "CALL save_bulk_contact(:last_name, :first_name, :email, :location_id, :phone_number)";
    try {
        List<Map<String, Object>> contactsList = new ArrayList<>();
        contacts.forEach(contact -> {
            Map<String, Object> contactMap = new HashMap<>();
            contactMap.put("last_name", contact.getLastName());
            contactMap.put("first_name", contact.getFirstName());
            contactMap.put("email", contact.getEmail());
            contactMap.put("location_id", locationId);
            contactMap.put("phone_number", contact.getMobilePhone());
            contactsList.add(contactMap);
        });
        Map<String, Object>[] paramList = contactsList.toArray(new Map[0]);
        namedJdbcTemplate.batchUpdate(sql, paramList);
    } catch (DataAccessException e) {
        log.severe("Failed to insert contacts:\n" + ExceptionUtils.getStackTrace(e));
        throw new HandledDataAccessException("Failed to insert contacts");
    }
}
The returned ContactUploadJSON contains the contact list, the locationId, and metrics for added, already-existing, and failed records.
This solution works, but I am wondering if there are better approaches. In the future we are going to want a mechanism for updating contacts, not just inserting new ones, so I have to plan accordingly. Is it possible to do all of this in MySQL, and would it be more efficient? I think the one-to-many relationship with the compound unique constraint makes it more difficult.
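To make the "all in MySQL" idea concrete, I'm picturing something like loading the CSV into a staging table and then doing set-based inserts (sketch only; the staging table and the LOAD DATA step are hypothetical, column names as above):

```sql
-- Hypothetical sketch: after LOAD DATA INFILE into a staging table whose
-- columns mirror the CSV, insert only rows with no email collision.
INSERT INTO contact (lastName, firstName, primaryEmail, locationId, firstActiveDate)
SELECT s.lastName, s.firstName, s.email, :locationId, UNIX_TIMESTAMP() * 1000
FROM staging s
LEFT JOIN contact c
       ON c.primaryEmail = s.email AND c.locationId = :locationId
WHERE c.id IS NULL;
-- A second INSERT ... SELECT would then populate contact_phone by joining
-- staging back to contact on email + locationId to recover the new ids.
```

I'm not sure how this shape would handle the phone-number uniqueness check across the one-to-many contact_phone table, which is part of why I'm asking.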
Source: https://stackoverflow.com/questions/50865897/insert-1000s-of-records-with-relationship-and-ignore-duplicates-using-jdbc-mys