How to generate a random, unique, alphanumeric ID of length N in Postgres 9.6+?

走了就别回头了 2020-12-25 14:59

I've seen a bunch of different solutions on StackOverflow that span many years and many Postgres versions, but with some of the newer features like gen_random_bytes I want

6 answers
  • 2020-12-25 15:15

    Figured this out. Here's a function that does it (gen_random_bytes comes from the pgcrypto extension, so that must be installed first):

    CREATE OR REPLACE FUNCTION generate_uid(size INT) RETURNS TEXT AS $$
    DECLARE
      characters TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
      bytes BYTEA := gen_random_bytes(size);
      l INT := length(characters);
      i INT := 0;
      output TEXT := '';
    BEGIN
      WHILE i < size LOOP
        output := output || substr(characters, get_byte(bytes, i) % l + 1, 1);
        i := i + 1;
      END LOOP;
      RETURN output;
    END;
    $$ LANGUAGE plpgsql VOLATILE;
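
One detail of the `get_byte(bytes, i) % l` step is worth checking: a byte has 256 possible values and 256 is not a multiple of 62, so some characters come up slightly more often than others. A minimal Python sketch (not part of the original answer) that counts how the 256 byte values fall onto the 62 alphabet positions:

```python
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

# Count how many of the 256 possible byte values land on each alphabet index.
hits = Counter(byte % len(ALPHABET) for byte in range(256))

# Characters that are reachable from the largest number of byte values.
favoured = [ALPHABET[i] for i, n in hits.items() if n == max(hits.values())]

print(max(hits.values()), min(hits.values()))  # 5 4
print("".join(favoured))                       # ABCDEFGH
```

The first eight characters are each hit by 5 byte values and the rest by 4. For IDs this skew is usually tolerable; if it matters, one option is to discard bytes of 248 or above and draw again.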
    

    And then to run it simply do:

    SELECT generate_uid(10);
    -- '3Rls4DjWxJ'
    

    Warning

    When doing this, make sure the IDs are long enough to avoid collisions as the number of objects you've created grows. This can be counter-intuitive because of the Birthday Paradox: collisions become likely long before the number of rows approaches the number of possible IDs. You will likely want a length greater (or much greater) than 10 for any reasonably commonly created object; I used 10 only as a simple example.
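
To put rough numbers on that warning, here is a small Python sketch using the standard birthday approximation p ≈ 1 − exp(−n(n−1) / (2·62^length)). The function name and the row counts are illustrative assumptions, not something from the original answer:

```python
import math

def collision_probability(n_ids: int, length: int, alphabet_size: int = 62) -> float:
    """Approximate chance that at least two of n_ids random IDs coincide."""
    space = alphabet_size ** length
    # Birthday approximation; expm1 keeps precision when the result is tiny.
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))

for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, f"{collision_probability(n, 10):.3g}")
```

With length 10 the risk is negligible for thousands of rows, but at a billion rows the approximation gives a chance of at least one collision of roughly 45%, which is why longer IDs are the safer default.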


    Usage

    With the function defined, you can use it in a table definition, like so:

    CREATE TABLE collections (
      id TEXT PRIMARY KEY DEFAULT generate_uid(10),
      name TEXT NOT NULL,
      ...
    );
    

    And then when inserting data, like so:

    INSERT INTO collections (name) VALUES ('ian');
    INSERT INTO collections (name) VALUES ('victor');
    SELECT * FROM collections;
    

    It will automatically generate the id values:

        id     |  name  | ...
    -----------+--------+-----
    owmCAx552Q | ian    |
    ZIofD6l3X9 | victor |
    

    Usage with a Prefix

    Or maybe you want to add a prefix for convenience when looking at a single ID in the logs or in your debugger (similar to how Stripe does it), like so:

    CREATE TABLE collections (
      id TEXT PRIMARY KEY DEFAULT ('col_' || generate_uid(10)),
      name TEXT NOT NULL,
      ...
    );
    
    INSERT INTO collections (name) VALUES ('ian');
    INSERT INTO collections (name) VALUES ('victor');
    SELECT * FROM collections;
    
          id       |  name  | ...
    ---------------+--------+-----
    col_wABNZRD5Zk | ian    |
    col_ISzGcTVj8f | victor |
    
  • 2020-12-25 15:20

    This query generates the required string. Change the second parameter of generate_series to choose the length of the random string.

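    SELECT
         string_agg(c, '')
    FROM (
         SELECT
              -- shift r (0..61) into the ASCII ranges 0-9 (48..57), A-Z (65..90), a-z (97..122)
              chr(r + CASE WHEN r > 35 THEN 97 - 36 WHEN r > 9 THEN 65 - 10 ELSE 48 END) AS c
         FROM (
               SELECT
                     i,
                     -- floor() gives each of the 62 values the same chance; a plain ::int cast rounds
                     floor(random() * 62)::int AS r
               FROM
                     generate_series(1, 63) AS i  -- second argument = length of the string
              ) AS a
          ORDER BY i
         ) AS A;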
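
The CASE expression in the query shifts a random number into one of three ASCII ranges. Offsets like these are easy to get wrong by one, so here is a minimal Python sketch, assuming r runs over 0..61 and the target ranges are 0-9, A-Z and a-z, that enumerates such a mapping and checks that no character is skipped. The helper name `to_char` is hypothetical:

```python
import string

def to_char(r: int) -> str:
    """Shift r in 0..61 into the ASCII ranges for 0-9, A-Z and a-z."""
    if r > 35:
        return chr(r + 97 - 36)   # 36..61 -> 'a'..'z'
    if r > 9:
        return chr(r + 65 - 10)   # 10..35 -> 'A'..'Z'
    return chr(r + 48)            # 0..9   -> '0'..'9'

mapped = "".join(to_char(r) for r in range(62))
print(mapped == string.digits + string.ascii_uppercase + string.ascii_lowercase)  # True
```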
    
  • 2020-12-25 15:24

    Thanks to Evan Carroll's answer, I took a look at hashids.org. For Postgres you have to compile the extension or run some PL/pgSQL functions. For my needs, though, I created something simpler based on the hashids ideas (short, unguessable, unique, custom alphabet, avoids curse words).

    Shuffle alphabet:

    CREATE OR REPLACE FUNCTION consistent_shuffle(alphabet TEXT, salt TEXT) RETURNS TEXT AS $$
    DECLARE
        SALT_LENGTH INT := length(salt);
        integer INT = 0;
        temp TEXT = '';
        j INT = 0;
        v INT := 0;
        p INT := 0;
        i INT := length(alphabet) - 1;
        output TEXT := alphabet;
    BEGIN
        IF salt IS NULL OR length(LTRIM(RTRIM(salt))) = 0 THEN
            RETURN alphabet;
        END IF;
        WHILE i > 0 LOOP
            v := v % SALT_LENGTH;
            integer := ASCII(substr(salt, v + 1, 1));
            p := p + integer;
            j := (integer + v + p) % i;
    
            temp := substr(output, j + 1, 1);
            output := substr(output, 1, j) || substr(output, i + 1, 1) || substr(output, j + 2);
            output := substr(output, 1, i) || temp || substr(output, i + 2);
    
            i := i - 1;
            v := v + 1;
        END LOOP;
        RETURN output;
    END;
    $$ LANGUAGE plpgsql VOLATILE;
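
To make the shuffle easier to follow, here is a line-by-line Python port of the function above (a sketch for illustration only, not a replacement for the SQL). It shows that the result is always a permutation of the input alphabet and that the same salt always yields the same order:

```python
def consistent_shuffle(alphabet: str, salt: str) -> str:
    """Python port of the PL/pgSQL function: a salt-driven, repeatable shuffle."""
    if salt is None or salt.strip(" ") == "":
        return alphabet
    chars = list(alphabet)
    v = 0  # position in the salt, wraps around
    p = 0  # running sum of the salt character codes used so far
    for i in range(len(chars) - 1, 0, -1):
        v %= len(salt)
        code = ord(salt[v])
        p += code
        j = (code + v + p) % i  # always 0 <= j < i
        chars[i], chars[j] = chars[j], chars[i]
        v += 1
    return "".join(chars)

letters = "abcdefghijklmnopqrstuvwxyz"
shuffled = consistent_shuffle(letters, "salt")
print(sorted(shuffled) == sorted(letters))               # True: same characters
print(shuffled == consistent_shuffle(letters, "salt"))   # True: repeatable
```

Because the order depends only on the salt, anyone who knows the salt can reproduce the alphabet, so the salt should be treated as a secret if the IDs are meant to be hard to guess.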
    

    The main function:

    CREATE OR REPLACE FUNCTION generate_uid(id INT, min_length INT, salt TEXT) RETURNS TEXT AS $$
    DECLARE
        clean_alphabet TEXT := 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
        curse_chars TEXT := 'csfhuit';
        curse TEXT := curse_chars || UPPER(curse_chars);
        alphabet TEXT := regexp_replace(clean_alphabet, '[' || curse  || ']', '', 'gi');
        shuffle_alphabet TEXT := consistent_shuffle(alphabet, salt);
        char_length INT := length(alphabet);
        output TEXT := '';
    BEGIN
        WHILE id != 0 LOOP
            output := output || substr(shuffle_alphabet, (id % char_length) + 1, 1);
            id := trunc(id / char_length);
        END LOOP;
        curse := consistent_shuffle(curse, output || salt);
        output := RPAD(output, min_length, curse);
        RETURN output;
    END;
    $$ LANGUAGE plpgsql VOLATILE;
    
    

    Usage examples:

    -- 3: min-length
    select generate_uid(123, 3, 'salt'); -- output: "0mH"
    
    -- or as default value in a table
    CREATE SEQUENCE IF NOT EXISTS my_id_serial START 1;
    CREATE TABLE collections (
        id TEXT PRIMARY KEY DEFAULT generate_uid(CAST (nextval('my_id_serial') AS INTEGER), 3, 'salt')
    );
    INSERT INTO collections DEFAULT VALUES;
    
  • 2020-12-25 15:31

    So I had my own use-case for something like this. I am not proposing a solution to the top question, but if you are looking for something similar like I am, then try this out.

    My use-case was that I needed to create a random external UUID (as a primary key) with as few characters as possible. Thankfully, the scenario did not require a large number of these (probably only in the thousands). A simple solution was therefore to combine the approach of generate_uid() with a check that the generated ID is not already in use.

    Here is how I put it together:

    CREATE OR REPLACE FUNCTION generate_id (
        in length INT
    ,   in for_table text
    ,   in for_column text
    ,   OUT next_id TEXT
    ) AS
    $$
    DECLARE
        id_is_used BOOLEAN;
        loop_count INT := 0;
        characters TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
        loop_length INT;
        bytes BYTEA;
    BEGIN
    
    LOOP
        next_id := '';
        loop_length := 0;
        -- draw the random bytes for this attempt once (gen_random_bytes needs pgcrypto)
        bytes := gen_random_bytes(length);
        WHILE loop_length < length LOOP
            next_id := next_id || substr(characters, get_byte(bytes, loop_length) % length(characters) + 1, 1);
            loop_length := loop_length + 1;
        END LOOP;
    
        -- ::regclass, %I and %L quote the table name, column name and value safely
        EXECUTE format('SELECT TRUE FROM %s WHERE %I = %L LIMIT 1', for_table::regclass, for_column, next_id) INTO id_is_used;
    
        EXIT WHEN id_is_used IS NULL;
    
        loop_count := loop_count + 1;
    
        IF loop_count > 100 THEN
            RAISE EXCEPTION 'Too many loops. Might be reaching the practical limit for the given length.';
        END IF;
    END LOOP;
    
    END
    $$
    LANGUAGE plpgsql
    VOLATILE -- the result is random and depends on the table contents, so it is not STABLE
    ;
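
The control flow of this function (generate, check, retry, give up after a fixed number of attempts) can be sketched in a few lines of Python. This is an illustration under stated assumptions: the `existing` set stands in for the table lookup, and the function and parameter names are hypothetical.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters

def generate_unique_id(existing: set, length: int,
                       alphabet: str = ALPHABET, max_attempts: int = 100) -> str:
    """Draw random IDs until one is not in `existing`, or give up."""
    for _ in range(max_attempts):
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        if candidate not in existing:
            existing.add(candidate)  # stands in for the INSERT
            return candidate
    raise RuntimeError("Too many attempts; the ID space may be nearly full.")

used = set()
ids = [generate_unique_id(used, 6) for _ in range(1000)]
print(len(set(ids)))  # 1000
```

In the database, the check and the insert are separate steps, so two concurrent sessions could in principle pick the same free ID; the PRIMARY KEY constraint would still reject the second insert.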
    

    Here is an example table definition:

    create table some_table (
        id
            TEXT
            DEFAULT generate_id(6, 'some_table', 'id')
            PRIMARY KEY
    )
    ;
    

    And a test to see where it breaks:

    DO
    $$
    DECLARE
        loop_count INT := 0;
    
    BEGIN
    
    -- WHILE LOOP
    WHILE loop_count < 1000000
    LOOP
    
        INSERT INTO some_table VALUES (DEFAULT);
        loop_count := loop_count + 1;
    END LOOP;
    
    END
    $$ LANGUAGE plpgsql
    ;
    
  • 2020-12-25 15:40

    > I'm looking for something that gives me "shortcodes" (similar to what Youtube uses for video IDs) that are as short as possible while still containing only alphanumeric characters.

    This is a fundamentally different question from what you first asked. What you want here then is to put a serial type on the table, and to use hashids.org code for PostgreSQL.

    • It maps 1:1 to the unique number (serial).
    • It never repeats, so there is no chance of collision.
    • It is also base62 [a-zA-Z0-9].

    The code looks like this:

    SELECT id, hash_encode(foo.id)
    FROM foo; -- Result: jNl for 1001
    
    SELECT hash_decode('jNl'); -- returns 1001
    

    This module also supports salts.
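
The reason a scheme like this cannot collide is that it is a reversible encoding of the serial number. As an illustration of that property only, here is a plain base62 encode/decode pair in Python. It is not the hashids algorithm (there is no salt and no alphabet shuffling), so its output will not match hash_encode, and the function names are hypothetical:

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
BASE = len(ALPHABET)  # 62

def encode(n: int) -> str:
    """Write a non-negative integer in base 62."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode(code: str) -> int:
    """Inverse of encode()."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n

print(decode(encode(1001)))  # 1001
```

Plain base62 is trivially guessable (consecutive numbers give consecutive codes), which is the gap that the salt in hashids is meant to address.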

  • 2020-12-25 15:41

    Review:

    1. 26 characters in [a-z]
    2. 26 characters in [A-Z]
    3. 10 characters in [0-9]
    4. 62 characters in [a-zA-Z0-9] (base62)
    5. The function substring(string [from int] [for int]) looks useful.

    So it looks something like this. First we demonstrate that we can pull a single character out of the alphabet by position.

    SELECT substring(
      'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
      1, -- 1 is 'a', 62 is '9'
      1
    );
    

    Now we need a random integer between 1 and 62:

    SELECT trunc(random()*62)::int+1
    FROM generate_series(1,1e2) AS gs(x);
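
Since random() returns a value in [0, 1), the bounds of an index expression can be checked by plugging in the two extremes. A small Python sketch (the helper name and the 0.999999 stand-in for "just below 1" are assumptions) comparing trunc(r*62)+1 with a variant that adds 1 twice:

```python
import math

ALMOST_ONE = 0.999999  # stand-in for the largest value random() can return

def index_range(expr):
    """Smallest and largest index an expression yields for r in [0, 1)."""
    return expr(0.0), expr(ALMOST_ONE)

print(index_range(lambda r: math.trunc(r * 62) + 1))      # (1, 62)
print(index_range(lambda r: math.trunc(r * 62 + 1) + 1))  # (2, 63)
```

Only the first form stays inside the 62-character alphabet; the second never selects position 1 and can point one past the end.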
    

    This gets us there. Now we just have to join the two:

    SELECT substring(
      'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
      trunc(random()*62)::int+1,
      1
    )
    FROM generate_series(1,1e2) AS gs(x);
    

    Then we wrap it in an ARRAY constructor (because this is fast):

    SELECT ARRAY(
      SELECT substring(
        'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
        trunc(random()*62)::int+1,
        1
      )
      FROM generate_series(1,1e2) AS gs(x)
    );
    

    And we call array_to_string() to get a text value:

    SELECT array_to_string(
      ARRAY(
          SELECT substring(
            'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
            trunc(random()*62)::int+1,
            1
          )
          FROM generate_series(1,1e2) AS gs(x)
      )
      , ''
    );
    

    From here we can even turn it into a function:

    CREATE FUNCTION random_string(randomLength int)
    RETURNS text AS $$
    SELECT array_to_string(
      ARRAY(
          SELECT substring(
            'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
            trunc(random()*62)::int+1,
            1
          )
          FROM generate_series(1,randomLength) AS gs(x)
      )
      , ''
    )
    $$ LANGUAGE SQL
    RETURNS NULL ON NULL INPUT
    VOLATILE LEAKPROOF; -- note: only a superuser can mark a function LEAKPROOF
    

    And then call it:

    SELECT * FROM random_string(10);
    