ID Best Practices for Databases

↘锁芯ラ submitted on 2019-11-30 04:03:43

Question


I was wondering what the best practices are for building and storing IDs. A few years ago, a professor told me about the dangers of a poorly constructed ID system, using the Social Security Number as an example. In particular, because SSNs have no error detection, it is impossible to tell an arbitrary 9-digit string from a valid SSN. As a result, government agencies now need combinations like Last Name + SSN or Birthday + SSN to keep track of your data and verify it. On top of that, your Social Security number is somewhat predictable based on where you were born.

Now I'm building a User database, and based on this advice "userid mediumint auto_increment" would be unacceptable, especially if I plan to use this ID as the primary identification for the user. (For example, if I allow users to change their username, then the username would be harder to keep track of than the numerical userid, requiring cascading foreign keys and whatnot.) Emails change, usernames can change, passwords change, but a userid should remain constant forever.

Clearly, auto_increment is only designed for surrogate keys. That is, it's a useful shortcut only when you already have a primary identification mechanism, but it shouldn't be used as an "innate identifier" for the data. Creating a random UUID looks interesting, but the randomness turns me off.

And so I ask: what are the best practices for creating a "primary key" identification number?


Answer 1:


You are confusing internal database functionality with external search criteria.

Auto-increment surrogate keys are useful for internal application use. Never pass those on to the user. Business objects, whether a user or an invoice, are identified by unique information about the object, such as SSN, CCN, or DOB. Use as much information as necessary to uniquely identify the object.

If you must supply some newly invented ID value to each customer, I highly recommend that it NOT be the field you link all the customer data tables on.
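One way to read this recommendation is to keep two columns: an internal key that other tables join on, and a separate customer-facing identifier. The sketch below uses Python's built-in sqlite3 module; the table names, column names, and helper functions are hypothetical, not something the answer specifies.

```python
import sqlite3
import uuid

# Hypothetical schema: `id` is the internal join key, `public_id` is what customers see.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (
    id        INTEGER PRIMARY KEY,   -- internal surrogate key, never shown to users
    public_id TEXT NOT NULL UNIQUE,  -- newly invented ID handed to the customer
    name      TEXT NOT NULL
);
CREATE TABLE invoices (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),  -- links on the internal key
    total_cents INTEGER NOT NULL
);
""")


def add_customer(name):
    """Insert a customer and return (internal id, public id)."""
    public_id = uuid.uuid4().hex
    cur = conn.execute(
        "INSERT INTO customers (public_id, name) VALUES (?, ?)", (public_id, name)
    )
    return cur.lastrowid, public_id


def invoices_for(public_id):
    """Look up invoice totals starting from the customer-facing ID."""
    rows = conn.execute(
        "SELECT i.total_cents FROM invoices i "
        "JOIN customers c ON c.id = i.customer_id "
        "WHERE c.public_id = ?",
        (public_id,),
    ).fetchall()
    return [row[0] for row in rows]


internal_id, public_id = add_customer("Alice")
conn.execute(
    "INSERT INTO invoices (customer_id, total_cents) VALUES (?, ?)", (internal_id, 1250)
)

# Reissuing the customer-facing ID touches one row; no invoice rows change.
new_public_id = uuid.uuid4().hex
conn.execute(
    "UPDATE customers SET public_id = ? WHERE id = ?", (new_public_id, internal_id)
)
```

Because invoices reference the internal key only, the customer-facing value can be reissued or reformatted later without cascading updates.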




Answer 2:


The best practice is to use an auto-increment integer. There's no real reason it shouldn't be used as an "innate identifier". It'll provide the most compact usage in foreign keys and fastest searches. Almost any other value can change and is inappropriate for use as a key.
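As a rough illustration of why the integer key is convenient, the sketch below (Python's sqlite3, with a hypothetical users/posts schema) renames a user without touching any row that references them. SQLite assigns an INTEGER PRIMARY KEY automatically, which stands in here for MySQL's auto_increment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    userid   INTEGER PRIMARY KEY,  -- assigned automatically on insert
    username TEXT NOT NULL UNIQUE  -- free to change later
);
CREATE TABLE posts (
    postid INTEGER PRIMARY KEY,
    userid INTEGER NOT NULL REFERENCES users(userid),
    body   TEXT NOT NULL
);
""")

userid = conn.execute(
    "INSERT INTO users (username) VALUES (?)", ("old_name",)
).lastrowid
conn.execute("INSERT INTO posts (userid, body) VALUES (?, ?)", (userid, "hello"))

# The username changes; the key that posts refer to does not.
conn.execute("UPDATE users SET username = ? WHERE userid = ?", ("new_name", userid))

author = conn.execute(
    "SELECT u.username FROM posts p JOIN users u ON u.userid = p.userid"
).fetchone()[0]
```

Had posts referenced the username instead, the rename would have required updating every referencing row or relying on cascading foreign keys.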




Answer 3:


Comparing SSNs to auto-incremented integers is apples and oranges. Personally, I avoid GUIDs / UUIDs / UIDs unless there will be so many records in the table that it becomes inefficient or unreasonable to use an integer.

It's very rare that you will find a true natural key. What seems unique today may change tomorrow based on business requirements / laws.




Answer 4:


Based on our conversation above in the comments, I'm posting this as an answer. It seems as though you believe that having a random, unique ID assigned to your users would provide them with enough security that you could forgo normal methods of authentication.

At any rate, I'm confused by your comparisons between secured data and auto-incrementing, integer-based ID columns in user tables. These two types of data should never ever be intermingled. Your credit card company shouldn't be using a CCN as a primary key in a database table, and the government shouldn't be using your name or SSN as a primary key in its database tables either.

Why should you (or anyone) authenticate users with only knowledge of some secured data? Corporations are no longer allowed to authenticate users based on their SSNs, and I know my credit card company doesn't identify me based on my CCN (especially since I have more than one, and have had the card numbers on the accounts changed several times).

Even if you implemented a UUID and generated some arbitrary random number, it's still just that: a number. Active Directory authentication uses GUIDs for its IDs, but also requires users to provide usernames and passwords. Using a larger or smaller data type as an ID column doesn't mean I can wash my hands of some other type of authentication or security.




Answer 5:


This is what sequences were designed to solve. Create an object that can be atomically incremented on each insert. In some databases that is an auto-incremented integer and in others it is a sequence object, but the idea is the same: create a key that is unique and cannot conflict.
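The guarantee a sequence provides can be sketched in plain Python. The `Sequence` class below is a toy, in-process stand-in (a hypothetical name, not a database API): a lock makes "read the next value and advance" a single atomic step, so concurrent callers never receive the same number.

```python
import threading


class Sequence:
    """Toy in-process stand-in for a database sequence (illustration only)."""

    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()

    def nextval(self):
        # Reading and advancing happen under one lock, so the step is atomic.
        with self._lock:
            value = self._next
            self._next += 1
            return value


seq = Sequence()
issued = []
issued_lock = threading.Lock()


def worker(count):
    """Simulate one client performing `count` inserts."""
    for _ in range(count):
        value = seq.nextval()
        with issued_lock:
            issued.append(value)


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A real database does the same job across connections and processes, typically without guaranteeing that the issued values are gap-free.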

Using UUIDs as IDs is also fine, and I have used them before for special reasons. Why does the randomness "turn you off"? There is virtually no chance of a conflict.
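To put a rough number on "virtually no chance", the sketch below applies the standard birthday approximation to random (version 4) UUIDs, which carry 122 random bits. It assumes the generator is uniformly random; `collision_probability` is a hypothetical helper name.

```python
import math
import uuid

# A version-4 UUID has 128 bits, of which 6 are fixed version/variant bits.
RANDOM_BITS = 122


def collision_probability(n, bits=RANDOM_BITS):
    """Approximate chance that n uniformly random IDs contain at least one duplicate.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    expm1 keeps precision when the probability is extremely small.
    """
    space = 2 ** bits
    return -math.expm1(-n * (n - 1) / (2 * space))


new_id = uuid.uuid4()                      # a random UUID from the standard library
p_billion = collision_probability(10**9)   # chance of any duplicate among a billion IDs
```

Under these assumptions, a billion random UUIDs have a collision probability on the order of 1e-19.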




Answer 6:


At the end of the day, the way to verify whether a given user's identifier is valid is the system itself. That is, your system is the authoritative source for those identifiers. Is 555-45-9999 a valid SSN? The only way to know for sure is to have Social Security look it up and match it to the name of the person claiming to have that number. Sure, we can use the SSN identifier scheme to make a preliminary guess as to whether it is valid. However, only a lookup in their system will tell us for sure.

The need for check digits would arise in highly distributed systems where, for example, you might want to allow other people to generate numbers honored by your system (e.g. shipping companies that let customers generate their own tracking numbers). Since it is your system that is going to generate the identifiers in an automated fashion, the best a check digit does for you is to help, in a rudimentary fashion, with validation on data entry or searches.
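For a concrete sense of what a check digit buys you, here is a minimal sketch using the Luhn algorithm. Luhn is chosen here only as a familiar example; the answer does not name a specific scheme, and the function names are hypothetical.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the doubled value
        total += d
    return (10 - total % 10) % 10


def with_check_digit(payload: str) -> str:
    """Append the check digit to form the identifier handed out to users."""
    return payload + str(luhn_check_digit(payload))


def is_well_formed(number: str) -> bool:
    """True if the last digit matches the check digit of the preceding digits."""
    return (
        number.isdigit()
        and len(number) > 1
        and luhn_check_digit(number[:-1]) == int(number[-1])
    )
```

A check like `is_well_formed` can reject most mistyped IDs before any query runs, but passing it only means the number is plausible. Whether the ID actually exists is still decided by a lookup in the authoritative system.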



Source: https://stackoverflow.com/questions/4350369/id-best-practices-for-databases
