Should I obscure primary key values?

借酒劲吻你 2021-02-13 02:33

I'm building a web application where the front end is a highly-specialized search engine. Searching is handled at the main URL, and the user is passed off to a sub-directory wh

10 answers
  • 2021-02-13 02:50

    What you're doing is basically obfuscation. A reversibly encrypted primary key is still a primary key (and base64 doesn't really count as encryption).

    What you were reading comes down to this: you generally don't want your primary keys to have any kind of meaning outside the system. A key with no outside meaning is called a technical (or surrogate) primary key, as opposed to a natural primary key. That's why you might use an auto-number field for Patient ID rather than the SSN (which would be a natural primary key).

    Technical primary keys are generally favoured over natural primary keys because things that seem constant do change and this can cause problems. Even countries can come into existence and cease to exist.

    If you do have technical primary keys you don't want to make them de facto natural primary keys by giving them meaning they didn't otherwise have. I think it's fine to put a primary key in a URL but security is a separate topic. If someone can change that URL and get access to something they shouldn't have access to then it's a security problem and needs to be handled by authentication and authorization.
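
A minimal sketch of the kind of authorization check meant here, assuming a hypothetical in-memory `RECORDS` table and a made-up `owner_id` rule (neither comes from the question). The point is only that the id from the URL is checked against the current user, so guessing another id gets you nothing:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    id: int        # technical primary key, fine to show in a URL
    owner_id: int  # the only user allowed to read this record (assumed rule)


# Hypothetical in-memory table standing in for the real database.
RECORDS = {
    1: Record(id=1, owner_id=100),
    2: Record(id=2, owner_id=200),
}


class Forbidden(Exception):
    """Raised when the current user may not read the requested record."""


def load_record(record_id: int, current_user_id: int) -> Record:
    """Return the record only if it exists and belongs to the current user."""
    record = RECORDS.get(record_id)
    # Treat "missing" and "not yours" the same way, so the response does not
    # reveal which ids exist.
    if record is None or record.owner_id != current_user_id:
        raise Forbidden(f"user {current_user_id} cannot read record {record_id}")
    return record
```

With this in place, user 100 can load record 1, while changing the id in the URL to 2 raises `Forbidden` whether or not the id is obscured.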

    Some will argue they should never be seen by users. I don't think you need to go that far.

  • 2021-02-13 02:50

    Just send the primary keys. As long as your database operations are sealed off from the user interface, this is no problem.

  • 2021-02-13 02:50

    For your purposes (building a search engine), the security benefit of encrypting database primary keys is negligible. Base64 encoding isn't encryption. It's security through obscurity and won't even be a speed bump to an attacker.
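
To make that concrete, here is a small sketch using Python's standard `base64` module. The helper names `encode_id` and `decode_id` are made up for illustration. Decoding needs no key or secret, and neighbouring ids produce visibly neighbouring tokens:

```python
import base64


def encode_id(primary_key: int) -> str:
    """Encode an integer id as URL-safe base64 (obfuscation only)."""
    return base64.urlsafe_b64encode(str(primary_key).encode("ascii")).decode("ascii")


def decode_id(token: str) -> int:
    """Reverse encode_id; anyone can do this without a key."""
    return int(base64.urlsafe_b64decode(token.encode("ascii")).decode("ascii"))


print(encode_id(347))     # MzQ3
print(encode_id(348))     # MzQ4
print(decode_id("MzQ3"))  # 347
```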

  • 2021-02-13 02:56

    If you're trying to secure database query input just use parametrized queries. There's no reason at all to hide primary keys if they are manipulated by the public.
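
A minimal sketch of a parameterized lookup, using Python's built-in `sqlite3` module as a stand-in for whatever database is actually behind the site. The `items` table and the `find_item` helper are assumptions for illustration:

```python
import sqlite3

# In-memory database with a hypothetical table, just for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO items (id, title) VALUES (?, ?)",
    [(1, "first"), (2, "second")],
)


def find_item(conn: sqlite3.Connection, raw_id: str):
    """Look up one row by an id taken from untrusted input (e.g. the URL)."""
    try:
        item_id = int(raw_id)
    except ValueError:
        return None  # not a number at all, so there is nothing to look up
    # The ? placeholder sends the value separately from the SQL text,
    # so the input is never spliced into the statement.
    return conn.execute(
        "SELECT id, title FROM items WHERE id = ?", (item_id,)
    ).fetchone()


print(find_item(conn, "2"))         # (2, 'second')
print(find_item(conn, "1 OR 1=1"))  # None
```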

    When you see base64 in the URL, you are pretty much guaranteed the developers of that site don't know what they are doing and the site is vulnerable.

  • 2021-02-13 02:57

    When I need a query string parameter to identify a single row in a table, I normally add a GUID column to that table, and then pass the GUID in the query string instead of the row's primary key value.
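
A rough sketch of that setup, again with Python's `sqlite3` and `uuid` modules. The `documents` table, the `public_id` column name, and both helpers are hypothetical names chosen for the example:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " id INTEGER PRIMARY KEY,"          # internal key, never leaves the server
    " public_id TEXT NOT NULL UNIQUE,"  # GUID that goes in the query string
    " title TEXT NOT NULL)"
)


def add_document(conn: sqlite3.Connection, title: str) -> str:
    """Insert a row and return the random GUID to expose publicly."""
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO documents (public_id, title) VALUES (?, ?)",
        (public_id, title),
    )
    return public_id


def find_by_public_id(conn: sqlite3.Connection, public_id: str):
    """Resolve a GUID from the query string back to the internal row."""
    try:
        canonical = str(uuid.UUID(public_id))  # rejects malformed input
    except ValueError:
        return None
    return conn.execute(
        "SELECT id, title FROM documents WHERE public_id = ?", (canonical,)
    ).fetchone()
```

A URL would then carry something like `?doc=<public_id>`, and the integer `id` stays internal. The GUID is hard to guess but is still just an identifier, so access checks are needed as usual.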

  • 2021-02-13 03:00

    PostgreSQL offers multiple solutions for this problem, which could be adapted to other RDBMSs:

    • hashids : https://hashids.org/postgresql/

      Hashids is a small open-source library that generates short, unique, non-sequential ids from numbers. It converts numbers like 347 into strings like “yr8”, or array of numbers like [27, 986] into “3kTMd”. You can also decode those ids back. This is useful in bundling several parameters into one or simply using them as short UIDs.

    • optimus is similar to hashids but provides only integers as output: https://github.com/jenssegers/optimus

    • skip32 at https://wiki.postgresql.org/wiki/Skip32_(crypt_32_bits):

      It may be used to generate series of unique values that look random, or to obfuscate a SERIAL primary key without losing its uniqueness property.

    • pseudo_encrypt() at https://wiki.postgresql.org/wiki/Pseudo_encrypt:

      pseudo_encrypt(int) can be used as a pseudo-random generator of unique values. It produces an integer output that is uniquely associated to its integer input (by a mathematical permutation), but looks random at the same time, with zero collisions. This is useful to communicate numbers generated sequentially without revealing their ordinal position in the sequence (for ticket numbers, URL shorteners, promo codes...).

    • this article gives details on how this is done at Instagram: https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c and it boils down to:

      We’ve delegated ID creation to each table inside each shard, by using PL/PGSQL, Postgres’ internal programming language, and Postgres’ existing auto-increment functionality. Each of our IDs consists of: 41 bits for time in milliseconds (gives us 41 years of IDs with a custom epoch); 13 bits that represent the logical shard ID; and 10 bits that represent an auto-incrementing sequence, modulus 1024. This means we can generate 1024 IDs, per shard, per millisecond.
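
The 41 + 13 + 10 bit layout quoted in the last item can be sketched as plain bit arithmetic. This is an illustration in Python, not Instagram's PL/pgSQL code. The `CUSTOM_EPOCH_MS` value and the function names are assumptions, since the quote does not give them:

```python
TIME_BITS = 41   # milliseconds since the custom epoch
SHARD_BITS = 13  # logical shard id
SEQ_BITS = 10    # auto-incrementing sequence, modulus 1024

# Arbitrary placeholder epoch (2011-01-01 UTC, in ms); not from the source.
CUSTOM_EPOCH_MS = 1_293_840_000_000


def make_id(now_ms: int, shard_id: int, sequence_value: int) -> int:
    """Pack time, shard and sequence into one 64-bit integer."""
    elapsed = now_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < (1 << TIME_BITS):
        raise ValueError("timestamp outside the 41-bit range")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard id outside the 13-bit range")
    seq = sequence_value % (1 << SEQ_BITS)  # "modulus 1024"
    return (elapsed << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq


def split_id(id_value: int) -> tuple[int, int, int]:
    """Unpack an id back into (timestamp_ms, shard_id, sequence)."""
    seq = id_value & ((1 << SEQ_BITS) - 1)
    shard_id = (id_value >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    elapsed = id_value >> (SHARD_BITS + SEQ_BITS)
    return elapsed + CUSTOM_EPOCH_MS, shard_id, seq


example = make_id(CUSTOM_EPOCH_MS + 1, shard_id=5, sequence_value=1025)
print(example)            # 8393729
print(split_id(example))  # (1293840000001, 5, 1)
```

Because the time bits sit in the highest positions, ids built this way sort by creation time, and anyone who knows the layout can read the shard and timestamp back out, as `split_id` shows.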
