I've seen a bunch of different solutions on StackOverflow that span many years and many Postgres versions, but with some of the newer features like gen_random_bytes I want
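Figured this out. Here's a function that does it (gen_random_bytes is provided by the pgcrypto extension, so that extension has to be installed first):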
CREATE OR REPLACE FUNCTION generate_uid(size INT) RETURNS TEXT AS $$
DECLARE
    characters TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    bytes BYTEA := gen_random_bytes(size);
    l INT := length(characters);
    i INT := 0;
    output TEXT := '';
BEGIN
    WHILE i < size LOOP
        -- Map each random byte onto one of the 62 characters.
        output := output || substr(characters, get_byte(bytes, i) % l + 1, 1);
        i := i + 1;
    END LOOP;
    RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;
And then to run it, simply do:
SELECT generate_uid(10);
-- '3Rls4DjWxJ'
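One detail worth knowing about the "% l" step: a byte has 256 possible values and 256 is not a multiple of 62, so some characters are very slightly more likely than others. The query below is only a sketch to make that visible. It uses no random data; it counts how many byte values land on each character of the same alphabet.

```sql
-- Count how many of the 256 possible byte values map to each character
-- when a byte is reduced with "% 62", as generate_uid(size) does.
WITH alphabet AS (
    SELECT 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789' AS characters
)
SELECT substr(characters, b % length(characters) + 1, 1) AS ch,
       count(*) AS byte_values
FROM alphabet, generate_series(0, 255) AS b
GROUP BY ch
ORDER BY byte_values DESC, ch
LIMIT 10;
```

Since 256 = 4 * 62 + 8, the first eight characters (A through H) are each reached by 5 byte values and the remaining 54 characters by 4. For IDs that only need to be unique this is probably harmless, but it is a reason not to treat the output as perfectly uniform.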
When doing this you need to be sure that the IDs you create are long enough to avoid collisions as the number of objects grows. The required length can be counter-intuitive because of the Birthday Paradox, so you will likely want a length greater (or much greater) than 10 for any reasonably commonly created object. I just used 10 as a simple example.
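To get a feel for the numbers, here is a rough sketch that evaluates the standard birthday bound n(n-1) / (2 * 62^len), which is an upper bound on the probability of at least one collision among n IDs. It assumes the IDs are drawn uniformly from the 62-character alphabet; the lengths and row counts are arbitrary illustrations.

```sql
-- Upper bound on the chance of at least one collision among n random IDs
-- of a given length, assuming 62 equally likely characters per position.
SELECT len,
       n AS ids_created,
       LEAST(1, n * (n - 1) / (2 * power(62::float8, len))) AS collision_probability_upper_bound
FROM (VALUES (10), (16), (22)) AS lengths(len),
     (VALUES (1e6::float8), (1e9::float8)) AS counts(n)
ORDER BY len, n;
```

Each extra character divides the bound by 62, which is why a few more characters go a long way.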
With the function defined, you can use it in a table definition, like so:
CREATE TABLE collections (
id TEXT PRIMARY KEY DEFAULT generate_uid(10),
name TEXT NOT NULL,
...
);
And then when inserting data, like so:
INSERT INTO collections (name) VALUES ('One');
INSERT INTO collections (name) VALUES ('Two');
INSERT INTO collections (name) VALUES ('Three');
SELECT * FROM collections;
It will automatically generate the id values:
id         | name   | ...
-----------+--------+-----
owmCAx552Q | ian    |
ZIofD6l3X9 | victor |
Or maybe you want to add a prefix for convenience when looking at a single ID in the logs or in your debugger (similar to how Stripe does it), like so:
CREATE TABLE collections (
id TEXT PRIMARY KEY DEFAULT ('col_' || generate_uid(10)),
name TEXT NOT NULL,
...
);
INSERT INTO collections (name) VALUES ('One');
INSERT INTO collections (name) VALUES ('Two');
INSERT INTO collections (name) VALUES ('Three');
SELECT * FROM collections;
id             | name   | ...
---------------+--------+-----
col_wABNZRD5Zk | ian    |
col_ISzGcTVj8f | victor |
This query generates the required string. Just change the second parameter of generate_series to choose the length of the random string (the series starts at 0, so an upper bound of 62 yields 63 characters).
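SELECT
    string_agg(c, '')
FROM (
    SELECT
        -- r in 0..9 maps to '0'..'9', 10..35 to 'A'..'Z', 36..61 to 'a'..'z'
        chr(r + CASE WHEN r > 9 + 26 THEN 97 - 10 - 26 WHEN r > 9 THEN 65 - 10 ELSE 48 END) AS c
    FROM (
        SELECT
            i,
            -- floor() gives a uniform integer in 0..61; a plain ::int cast rounds
            -- and the original offsets never produced 'Z'
            floor(random() * 62)::int AS r
        FROM
            generate_series(0, 62) AS i
    ) AS a
    ORDER BY i
) AS b;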
Thanks to Evan Carroll's answer, I took a look at hashids.org. For Postgres you have to compile the extension or run some PL/pgSQL functions. But for my needs, I created something simpler based on the hashids ideas (short, unguessable, unique, custom alphabet, avoids curse words).
Shuffle the alphabet:
CREATE OR REPLACE FUNCTION consistent_shuffle(alphabet TEXT, salt TEXT) RETURNS TEXT AS $$
DECLARE
    salt_length INT := length(salt);
    char_code INT := 0;  -- ASCII code of the current salt character (was named "integer")
    temp TEXT := '';
    j INT := 0;
    v INT := 0;
    p INT := 0;
    i INT := length(alphabet) - 1;
    output TEXT := alphabet;
BEGIN
    IF salt IS NULL OR length(trim(salt)) = 0 THEN
        RETURN alphabet;
    END IF;
    WHILE i > 0 LOOP
        v := v % salt_length;
        char_code := ascii(substr(salt, v + 1, 1));
        p := p + char_code;
        j := (char_code + v + p) % i;
        -- Swap the characters at 0-based positions i and j.
        temp := substr(output, j + 1, 1);
        output := substr(output, 1, j) || substr(output, i + 1, 1) || substr(output, j + 2);
        output := substr(output, 1, i) || temp || substr(output, i + 2);
        i := i - 1;
        v := v + 1;
    END LOOP;
    RETURN output;
END;
-- The result depends only on the arguments, so IMMUTABLE fits better than VOLATILE.
$$ LANGUAGE plpgsql IMMUTABLE;
The main function:
CREATE OR REPLACE FUNCTION generate_uid(id INT, min_length INT, salt TEXT) RETURNS TEXT AS $$
DECLARE
clean_alphabet TEXT := 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
curse_chars TEXT := 'csfhuit';
curse TEXT := curse_chars || UPPER(curse_chars);
alphabet TEXT := regexp_replace(clean_alphabet, '[' || curse || ']', '', 'gi');
shuffle_alphabet TEXT := consistent_shuffle(alphabet, salt);
char_length INT := length(alphabet);
output TEXT := '';
BEGIN
WHILE id != 0 LOOP
output := output || substr(shuffle_alphabet, (id % char_length) + 1, 1);
id := trunc(id / char_length);
END LOOP;
curse := consistent_shuffle(curse, output || salt);
output := RPAD(output, min_length, curse);
RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;
Usage examples:
-- 3: min-length
select generate_uid(123, 3, 'salt'); -- output: "0mH"
-- or as default value in a table
CREATE SEQUENCE IF NOT EXISTS my_id_serial START 1;
CREATE TABLE collections (
id TEXT PRIMARY KEY DEFAULT generate_uid(CAST (nextval('my_id_serial') AS INTEGER), 3, 'salt')
);
INSERT INTO collections DEFAULT VALUES;
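A quick sanity check can help here. The query below is a sketch that assumes both functions above have been created and that the ids are positive integers, such as sequence values. It encodes the first 10,000 ids with the same min-length and salt as in the example and compares the number of inputs with the number of distinct codes.

```sql
-- If both counts are equal, no two ids in this range produced the same code.
SELECT count(*) AS ids_encoded,
       count(DISTINCT generate_uid(n, 3, 'salt')) AS distinct_codes
FROM generate_series(1, 10000) AS n;
```

The two numbers should match, because each code is the id written in the shuffled alphabet and the padding characters come from outside that alphabet. Zero and negative ids are not covered by this check.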
So I had my own use-case for something like this. I am not proposing a solution to the top question, but if you are looking for something similar, as I was, then try this out.
My use-case was that I needed to create a random external unique ID (as a primary key) with as few characters as possible. Thankfully, the scenario did not require a large number of these (probably only in the thousands). Therefore a simple solution was to combine the approach of generate_uid() with a check that the generated ID was not already in use.
Here is how I put it together:
CREATE OR REPLACE FUNCTION generate_id (
    IN length INT
  , IN for_table TEXT
  , IN for_column TEXT
  , OUT next_id TEXT
) AS
$$
DECLARE
    id_is_used BOOLEAN;
    loop_count INT := 0;
    characters TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    bytes BYTEA;
    loop_length INT;
BEGIN
    LOOP
        next_id := '';
        loop_length := 0;
        -- Draw the random bytes once per attempt instead of once per character.
        bytes := gen_random_bytes(length);
        WHILE loop_length < length LOOP
            next_id := next_id || substr(characters, get_byte(bytes, loop_length) % length(characters) + 1, 1);
            loop_length := loop_length + 1;
        END LOOP;
        -- %I quotes the identifiers and %L the literal, so the arguments cannot inject SQL.
        EXECUTE format('SELECT TRUE FROM %I WHERE %I = %L LIMIT 1', for_table, for_column, next_id) INTO id_is_used;
        EXIT WHEN id_is_used IS NULL;
        loop_count := loop_count + 1;
        IF loop_count > 100 THEN
            RAISE EXCEPTION 'Too many loops. Might be reaching the practical limit for the given length.';
        END IF;
    END LOOP;
END
$$
LANGUAGE plpgsql
VOLATILE  -- returns a different value on every call, so STABLE was the wrong label
;
Here is an example table usage:
CREATE TABLE some_table (
    id TEXT DEFAULT generate_id(6, 'some_table', 'id') PRIMARY KEY
);
And a test to see how it breaks:
DO
$$
DECLARE
loop_count INT := 0;
BEGIN
-- WHILE LOOP
WHILE loop_count < 1000000
LOOP
INSERT INTO some_table VALUES (DEFAULT);
loop_count := loop_count + 1;
END LOOP;
END
$$ LANGUAGE plpgsql
;
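To estimate when the retry loop starts to matter, the following sketch compares the number of possible IDs per length with a table of 1,000,000 rows, the size used in the test above. It assumes every character is equally likely, so the chance that a freshly generated candidate is already taken is roughly rows / 62^len.

```sql
-- Possible IDs per length, and the approximate chance that one new candidate
-- collides with an existing row once the table holds 1,000,000 rows.
SELECT len,
       power(62::numeric, len) AS possible_ids,
       LEAST(1, round(1000000 / power(62::numeric, len), 12)) AS chance_next_id_is_taken
FROM generate_series(3, 8) AS len;
```

Under that assumption a single retry is already rare at length 6, and the 100-retry limit should only be reached when the table is close to holding every possible ID of the chosen length.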
I'm looking for something that gives me "shortcodes" (similar to what Youtube uses for video IDs) that are as short as possible while still containing only alphanumeric characters.
This is a fundamentally different question from what you first asked. What you want here then is to put a serial type on the table, and to use the hashids.org code for PostgreSQL. The resulting codes use the base62 alphabet [a-zA-Z0-9].
Code looks like this,
SELECT id, hash_encode(foo.id)
FROM foo; -- Result: jNl for 1001
SELECT hash_decode('jNl') -- returns 1001
This module also supports salts.
Review: there are 26 characters in [a-z], 26 in [A-Z], and 10 in [0-9], which makes 62 characters in [a-zA-Z0-9] (base62).
So it looks something like this. First we demonstrate that we can take a position in that range and pull the character from it.
SELECT substring(
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
1, -- 1 is 'a', 62 is '9'
1
);
Now we need a random position between 1 and 62 (inclusive):
SELECT trunc(random()*62)::int+1
FROM generate_series(1,1e2) AS gs(x);
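When in doubt about which range an expression like this produces, it is easy to check empirically. The sketch below samples trunc(random()*62)::int+1 many times and reports the smallest and largest value seen; the sample size of 100,000 is arbitrary.

```sql
-- random() is in [0, 1), so trunc(random() * 62) is 0..61 and adding 1 gives 1..62.
SELECT min(pos) AS lowest, max(pos) AS highest
FROM (
    SELECT trunc(random() * 62)::int + 1 AS pos
    FROM generate_series(1, 100000)
) AS samples;
```

Both values should stay within 1 and 62, which matches the valid positions of the 62-character string.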
This gets us there. Now we just have to join the two.
SELECT substring(
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
trunc(random()*62)::int+1,
1
)
FROM generate_series(1,1e2) AS gs(x);
Then we wrap it in an ARRAY constructor (because this is fast)
SELECT ARRAY(
SELECT substring(
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
trunc(random()*62)::int+1,
1
)
FROM generate_series(1,1e2) AS gs(x)
);
And we call array_to_string() to get a text value.
SELECT array_to_string(
ARRAY(
SELECT substring(
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
trunc(random()*62)::int+1,
1
)
FROM generate_series(1,1e2) AS gs(x)
)
, ''
);
From here we can even turn it into a function..
CREATE FUNCTION random_string(randomLength int)
RETURNS text AS $$
SELECT array_to_string(
ARRAY(
SELECT substring(
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
trunc(random()*62)::int+1,
1
)
FROM generate_series(1,randomLength) AS gs(x)
)
, ''
)
$$ LANGUAGE SQL
RETURNS NULL ON NULL INPUT
VOLATILE LEAKPROOF;
and then
SELECT * FROM random_string(10);