Unique key with NULLs

一整个雨季 2020-12-03 00:48

This question requires some hypothetical background. Let's consider an employee table that has columns name, date_of_birth, tit

10 answers
  • 2020-12-03 01:05

    There is another way to do it: add a non-nullable column that holds the string value of the date_of_birth column. The new column contains "" (the empty string) when date_of_birth is NULL.

    Name the column date_of_birth_str and create a unique constraint on employee(name, date_of_birth_str). When two records arrive with the same name and a NULL date_of_birth, both carry the empty string in the new column, so the unique constraint still rejects the second one.

    However, the effort of maintaining two columns with the same meaning, and the performance cost of the extra column, should be weighed carefully.
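
    The idea can be sketched as follows. This is a minimal example assuming MySQL; the column sizes and the key name uq_name_dob are assumptions made for illustration, and the application (or a trigger) is assumed to keep date_of_birth_str in sync with date_of_birth.

    ```sql
    -- Sketch only: employee table reduced to the columns discussed here.
    CREATE TABLE employee (
      name              VARCHAR(50) NOT NULL,
      date_of_birth     DATE,
      -- Shadow column: string form of date_of_birth, '' when the date is unknown.
      date_of_birth_str VARCHAR(10) NOT NULL DEFAULT '',
      UNIQUE KEY uq_name_dob (name, date_of_birth_str)
    );

    INSERT INTO employee (name, date_of_birth, date_of_birth_str)
    VALUES ('John Smith', NULL, '');

    -- Same name, date still unknown: this row collides on uq_name_dob
    -- and is rejected with a duplicate-key error.
    INSERT INTO employee (name, date_of_birth, date_of_birth_str)
    VALUES ('John Smith', NULL, '');
    ```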

  • 2020-12-03 01:06

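    You can add a generated column in which the NULL value is replaced by an unused constant, e.g. the zero date. Then you can apply the unique constraint to this column together with name, in place of the nullable date_of_birth:

    CREATE TABLE employee (
      name VARCHAR(50) NOT NULL,
      date_of_birth DATE,
      -- NULL maps to a sentinel date; MySQL may reject '0000-00-00' when strict
      -- SQL mode and NO_ZERO_DATE are enabled, so pick another unused date if needed.
      uq_date_of_birth DATE AS (IFNULL(date_of_birth, '0000-00-00')),
      UNIQUE (name, uq_date_of_birth)
    );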
    
  • 2020-12-03 01:07

    Your problem of preventing duplicates based on name is not solvable, because you do not have a natural key. Putting in a fake date for people whose date of birth is unknown will not solve it. John Smith born 1900/01/01 is still going to be a different person from John Smith born 1960/03/09.

    I work with name data from large and small organizations every day, and I can assure you they have two different people with the same name all the time, sometimes with the same job title. Birthdate is no guarantee of uniqueness either; there are plenty of John Smiths born on the same date. When we work with physicians' office data, we often have two doctors with the same name, address and phone number (father and son combinations).

    Your best bet is to have an employee ID that identifies each employee uniquely when you insert employee data. Then check for an existing name in the user interface; if there are one or more matches, ask the user whether one of them is the person meant, and insert the record only if the answer is no. Then build a deduping process to fix the cases where someone is assigned two IDs by accident.
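
    The workflow described here might look roughly like this in SQL. This is a sketch under assumptions: MySQL, and an employee table with an auto-incrementing employee_id column (a hypothetical name) next to name and date_of_birth. The literal values stand in for what the user interface would supply.

    ```sql
    -- 1. Before inserting, list the existing rows the user might mean.
    SELECT employee_id, name, date_of_birth
    FROM employee
    WHERE name = 'John Smith';

    -- 2. If the user says none of them is the same person, insert a new row;
    --    the database assigns a fresh employee_id.
    INSERT INTO employee (name, date_of_birth)
    VALUES ('John Smith', NULL);

    -- 3. Input for the deduping process: names that occur more than once,
    --    to be reviewed by a person rather than merged automatically.
    SELECT name, COUNT(*) AS occurrences
    FROM employee
    GROUP BY name
    HAVING COUNT(*) > 1;
    ```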

  • 2020-12-03 01:08

    I recommend creating an additional table column, checksum, which contains the MD5 hash of name and date_of_birth. Drop the unique key on (name, date_of_birth), because it doesn't solve the problem, and create a unique key on checksum instead.

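    -- Add the column that will hold the hash.
    ALTER TABLE employee
        ADD COLUMN checksum CHAR(32) NOT NULL;

    -- Backfill existing rows; a NULL date_of_birth is hashed as the empty string.
    UPDATE employee
    SET checksum = MD5(CONCAT(name, IFNULL(date_of_birth, '')));

    -- Enforce uniqueness on the hash (this fails if existing rows already collide).
    ALTER TABLE employee
        ADD UNIQUE (checksum);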
    

    This solution adds a small technical overhead, because you need to generate the hash for every inserted pair (and likewise for every search query). As a further improvement, you can add a trigger that generates the hash for you on every insert:

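    -- Compute the hash on every insert. It is set unconditionally: when the column is
    -- omitted, NEW.checksum may hold an implicit default ('') rather than NULL.
    CREATE TRIGGER before_insert_employee
    BEFORE INSERT ON employee
    FOR EACH ROW
        SET NEW.checksum = MD5(CONCAT(NEW.name, IFNULL(NEW.date_of_birth, '')));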
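
    On the search side, a lookup has to hash its arguments the same way the stored checksum was built. A minimal sketch, assuming MySQL and a checksum column populated as MD5(CONCAT(name, IFNULL(date_of_birth, ''))); the literal values are examples only:

    ```sql
    -- Employee with a known date of birth: the DATE was concatenated as 'YYYY-MM-DD'.
    SELECT name, date_of_birth
    FROM employee
    WHERE checksum = MD5(CONCAT('John Smith', '1960-03-09'));

    -- Employee whose date of birth is unknown: NULL was hashed as the empty string.
    SELECT name, date_of_birth
    FROM employee
    WHERE checksum = MD5(CONCAT('John Smith', ''));
    ```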
    
  • 2020-12-03 01:13

    In simple terms, the role of a unique constraint is to make the field or column unique. NULL destroys this property, because the database treats NULL as unknown.

    In order to avoid duplicates and allow NULL:

    Make the unique key the primary key.

  • 2020-12-03 01:22

    A fundamental property of a unique key is that it must be unique. Making part of that key nullable destroys this property.
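
    To illustrate the point, here is a minimal sketch, assuming MySQL; the table name employee_demo and the key name uq_name_dob are made up for this example. A unique key that includes a nullable column accepts rows that look identical, because NULL is never considered equal to NULL:

    ```sql
    CREATE TABLE employee_demo (
      name          VARCHAR(50) NOT NULL,
      date_of_birth DATE,
      UNIQUE KEY uq_name_dob (name, date_of_birth)
    );

    INSERT INTO employee_demo (name, date_of_birth) VALUES ('John Smith', NULL);

    -- Accepted as well: the NULL in date_of_birth keeps the key from matching.
    INSERT INTO employee_demo (name, date_of_birth) VALUES ('John Smith', NULL);

    -- Both rows are now present despite the unique key.
    SELECT name, date_of_birth FROM employee_demo WHERE name = 'John Smith';
    ```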

    There are two possible solutions to your problem:

    • One way, the wrong way, would be to use some magic date to represent unknown. This just gets you past the DBMS "problem" but does not solve the problem in a logical sense. Expect problems with two "John Smith" entries having unknown dates of birth. Are these guys one and the same or are they unique individuals? If you know they are different then you are back to the same old problem - your Unique Key just isn't unique. Don't even think about assigning a whole range of magic dates to represent "unknown" - this is truly the road to hell.

    • A better way is to create an EmployeeId attribute as a surrogate key. This is just an arbitrary identifier that you assign to individuals that you know are unique. This identifier is often just an integer value. Then create an Employee table to relate the EmployeeId (unique, non-nullable key) to what you believe are the dependent attributes, in this case Name and Date of Birth (any of which may be nullable). Use the EmployeeId surrogate key everywhere that you previously used the Name/Date-of-Birth. This adds a new table to your system but solves the problem of unknown values in a robust manner.
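
    A minimal sketch of the surrogate-key approach, assuming MySQL with InnoDB. The column sizes and the salary table are assumptions made for illustration; only the employee identifier, name and date of birth come from the description above.

    ```sql
    CREATE TABLE employee (
      employee_id   INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- surrogate key
      name          VARCHAR(50),
      date_of_birth DATE,                                  -- may stay unknown
      PRIMARY KEY (employee_id)
    );

    -- Hypothetical dependent table: it refers to a person by employee_id,
    -- never by name and date of birth.
    CREATE TABLE salary (
      employee_id INT UNSIGNED NOT NULL,
      amount      DECIMAL(10, 2) NOT NULL,
      FOREIGN KEY (employee_id) REFERENCES employee (employee_id)
    );

    -- Two different people with the same name and unknown birth dates
    -- each receive their own employee_id.
    INSERT INTO employee (name, date_of_birth)
    VALUES ('John Smith', NULL), ('John Smith', NULL);
    ```

    Whether two such rows really are different people is then decided when the identifier is assigned, not by the key itself.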
