MySQL: index json arrays of variable length?

轻奢々 2021-02-04 06:58

I want to make a tags column of type json:

e.g.,

id  |  tags
=========================================
1   |  \'["tag1"         


        
3 Answers
  •  无人及你
    2021-02-04 07:33

    By "extracts a scalar value", does this mean I must extract & index each item in the arrays individually [...]?

    You can extract as many items as you want. They will be stored as scalars (e.g. string), rather than as compound values (which JSON is).

    CREATE TABLE mytags (
        id INT NOT NULL AUTO_INCREMENT,
        tags JSON,
        PRIMARY KEY (id)
    );
    
    INSERT INTO mytags (tags) VALUES
        ('["tag1", "tag2", "tag3"]'),
        ('["tag1", "tag3", "tag5", "tag7"]'),
        ('["tag2", "tag5"]');
    
    SELECT * FROM mytags;
    
    +----+----------------------------------+
    | id | tags                             |
    +----+----------------------------------+
    |  1 | ["tag1", "tag2", "tag3"]         |
    |  2 | ["tag1", "tag3", "tag5", "tag7"] |
    |  3 | ["tag2", "tag5"]                 |
    +----+----------------------------------+
    

    Let's create an index on one item only (the first value of the JSON array):

    ALTER TABLE mytags
        ADD COLUMN tags_scalar VARCHAR(255) GENERATED ALWAYS AS (json_extract(tags, '$[0]')),
        ADD INDEX tags_index (tags_scalar);
    
    SELECT * FROM mytags;
    
    +----+----------------------------------+-------------+
    | id | tags                             | tags_scalar |
    +----+----------------------------------+-------------+
    |  1 | ["tag1", "tag2", "tag3"]         | "tag1"      |
    |  2 | ["tag1", "tag3", "tag5", "tag7"] | "tag1"      |
    |  3 | ["tag2", "tag5"]                 | "tag2"      |
    +----+----------------------------------+-------------+
    

    Now you have an index on the VARCHAR column tags_scalar. The stored value still contains the JSON double quotes, which json_unquote can strip:

    ALTER TABLE mytags DROP COLUMN tags_scalar, DROP INDEX tags_index;
    
    ALTER TABLE mytags
        ADD COLUMN tags_scalar VARCHAR(255) GENERATED ALWAYS AS (json_unquote(json_extract(tags, '$[0]'))),
        ADD INDEX tags_index (tags_scalar);
    
    SELECT * FROM mytags;
    
    +----+----------------------------------+-------------+
    | id | tags                             | tags_scalar |
    +----+----------------------------------+-------------+
    |  1 | ["tag1", "tag2", "tag3"]         | tag1        |
    |  2 | ["tag1", "tag3", "tag5", "tag7"] | tag1        |
    |  3 | ["tag2", "tag5"]                 | tag2        |
    +----+----------------------------------+-------------+
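
    To see what the index buys you, a lookup along these lines could be run against the unquoted tags_scalar column defined above. This is only a sketch: whether the optimizer actually picks tags_index depends on table size and statistics, so on a three-row table EXPLAIN may still report a full scan.

    ```sql
    -- Equality lookup on the generated column; the value is a plain
    -- string because json_unquote removed the JSON quotes.
    SELECT id, tags FROM mytags WHERE tags_scalar = 'tag1';

    -- Inspect the plan; look for tags_index in the "key" column.
    EXPLAIN SELECT id, tags FROM mytags WHERE tags_scalar = 'tag1';
    ```

    With the sample data the SELECT returns rows 1 and 2. Only the first array element is indexed here, so a search for 'tag5' through this column finds nothing.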
    

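    As you can already imagine, the generated column can include more items from the JSON. With several paths, json_extract returns a JSON array of the values it finds, and paths that do not exist (such as '$[2]' in row 3) are simply left out: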

    ALTER TABLE mytags DROP COLUMN tags_scalar, DROP INDEX tags_index;
    
    ALTER TABLE mytags
        ADD COLUMN tags_scalar VARCHAR(255) GENERATED ALWAYS AS (json_extract(tags, '$[0]', '$[1]', '$[2]')),
        ADD INDEX tags_index (tags_scalar);
    
    SELECT * FROM mytags;
    
    +----+----------------------------------+--------------------------+
    | id | tags                             | tags_scalar              |
    +----+----------------------------------+--------------------------+
    |  1 | ["tag1", "tag2", "tag3"]         | ["tag1", "tag2", "tag3"] |
    |  2 | ["tag1", "tag3", "tag5", "tag7"] | ["tag1", "tag3", "tag5"] |
    |  3 | ["tag2", "tag5"]                 | ["tag2", "tag5"]         |
    +----+----------------------------------+--------------------------+
    

    Or use any other valid expression that builds a string out of the JSON structure, so that you end up with something easy to index and search, such as "tag1tag3tag5tag7".

    [...] (meaning I must know the maximum length of the array to index them all)?

    As shown above, you don't need to know: array positions that do not exist produce no value (or NULL), and a suitable expression can skip them. But of course it's always better to know.
    Now comes the architecture decision: is the JSON data type the most appropriate tool for this particular problem, and is it going to speed up searching?
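
    For comparison, the conventional relational alternative stores one row per tag. The following is a sketch only: the table name mytags_normalized and its columns are hypothetical and not part of the original question.

    ```sql
    -- Hypothetical junction table: one row per (item, tag) pair.
    CREATE TABLE mytags_normalized (
        item_id INT NOT NULL,
        tag VARCHAR(64) NOT NULL,
        PRIMARY KEY (item_id, tag),
        INDEX tag_lookup (tag, item_id)
    );

    -- Same data as the JSON examples above.
    INSERT INTO mytags_normalized (item_id, tag) VALUES
        (1, 'tag1'), (1, 'tag2'), (1, 'tag3'),
        (2, 'tag1'), (2, 'tag3'), (2, 'tag5'), (2, 'tag7'),
        (3, 'tag2'), (3, 'tag5');

    -- All items carrying 'tag5', however many tags each item has.
    SELECT item_id FROM mytags_normalized WHERE tag = 'tag5';
    ```

    With this layout the array length never matters, and the last query returns items 2 and 3 through an ordinary index lookup.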

    How do I index a variable length array?

    If you insist, cast the JSON to a string and strip the brackets and quotes:

    ALTER TABLE mytags DROP COLUMN tags_scalar, DROP INDEX tags_index;
    
    ALTER TABLE mytags
        ADD COLUMN tags_scalar VARCHAR(255) GENERATED ALWAYS AS (replace(replace(replace(cast(tags as char), '"', ''), '[', ''), ']', '')),
        ADD INDEX tags_index (tags_scalar);
    
    SELECT * FROM mytags;
    
    +----+----------------------------------+------------------------+
    | id | tags                             | tags_scalar            |
    +----+----------------------------------+------------------------+
    |  1 | ["tag1", "tag2", "tag3"]         | tag1, tag2, tag3       |
    |  2 | ["tag1", "tag3", "tag5", "tag7"] | tag1, tag3, tag5, tag7 |
    |  3 | ["tag2", "tag5"]                 | tag2, tag5             |
    +----+----------------------------------+------------------------+
    

    One way or another, you end up with a VARCHAR or TEXT column, to which you apply the most suitable index structure (some options).
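
    How well the flattened column can be searched depends on the pattern. A sketch, assuming the comma-separated tags_scalar column from the last ALTER TABLE above and its ordinary B-tree index tags_index:

    ```sql
    -- Prefix match: the pattern has a fixed beginning, so the
    -- B-tree index can be used for a range lookup.
    SELECT id FROM mytags WHERE tags_scalar LIKE 'tag1%';

    -- Substring match: finds 'tag5' at any position, but a leading
    -- wildcard prevents an index range lookup.
    SELECT id FROM mytags WHERE tags_scalar LIKE '%tag5%';
    ```

    With the sample rows, the first query matches ids 1 and 2 and the second matches ids 2 and 3. Note that '%tag5%' would also match a longer tag such as tag50, so the separators need care if tags can share prefixes.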

    Further reading:

    • Indexing a Generated Column to Provide a JSON Column Index
    • Functions That Search JSON Values
