ORDER BY alphanumeric characters only in SQLite

再見小時候 2021-02-05 15:24

I am sorting songs in SQLite (on Android). I want to order them:

  1. Case-insensitive
  2. With leading-digits at the end, by integer value.
  3. Without punctuation
5 Answers
  • 2021-02-05 15:38

    In my opinion, the best-performing approach is to add a new column named sort_key and fill it with a trigger. The table needs a primary key so the trigger can locate the row that was just inserted.

    CREATE TABLE songs (n INTEGER, name TEXT,
                        sort_key TEXT,
                        ID INTEGER PRIMARY KEY AUTOINCREMENT);

    CREATE TRIGGER songs_key_trigger
        AFTER INSERT ON songs FOR EACH ROW
        BEGIN
            -- SQLite triggers cannot declare variables, so the key is
            -- computed inline. lower(new.name) is only a placeholder:
            -- call your own slugify function here to build sort_key
            -- from new.n and new.name.
            UPDATE songs
              SET sort_key = lower(new.name)
              WHERE ID = new.ID;
        END;
    

    Note that this approach is index-friendly: you can create an index on the new column to avoid full table scans.
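A minimal sketch of the trigger approach above, using Python's built-in sqlite3 module as a stand-in for the Android binding. The `slugify` helper, its key format (a "0"/"1" prefix plus zero-padded leading digits), the index name, and the sample titles are all assumptions made for illustration; the answer itself leaves the slugify function unspecified.

```python
import re
import sqlite3

def slugify(name):
    """Hypothetical sort-key builder: lower-case, strip punctuation, and
    push names with leading digits to the end in integer order."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()
    match = re.match(r"(\d+)(.*)", cleaned, re.S)
    if match:
        # "1" sorts after "0"; zero-padding makes text order match integer order.
        return "1" + match.group(1).zfill(10) + match.group(2)
    return "0" + cleaned

conn = sqlite3.connect(":memory:")
# Register slugify on this connection so the trigger body can call it.
conn.create_function("slugify", 1, slugify)

conn.execute("""
    CREATE TABLE songs (n INTEGER, name TEXT, sort_key TEXT,
                        ID INTEGER PRIMARY KEY AUTOINCREMENT)""")
conn.execute("""
    CREATE TRIGGER songs_key_trigger
    AFTER INSERT ON songs FOR EACH ROW
    BEGIN
        UPDATE songs SET sort_key = slugify(new.name) WHERE ID = new.ID;
    END""")
# Index on the precomputed key so ORDER BY sort_key can use it.
conn.execute("CREATE INDEX songs_sort_key_idx ON songs (sort_key)")

names = ["911 is a Joke", "(I'm) a Rainbow Too", "99 Problems", "zebra", "Apple"]
conn.executemany("INSERT INTO songs (n, name) VALUES (?, ?)",
                 list(enumerate(names, start=1)))

ordered = [row[0] for row in
           conn.execute("SELECT name FROM songs ORDER BY sort_key")]
print(ordered)
```

Because the trigger calls an application-defined function, every connection that inserts into the table has to register `slugify` first; that is a trade-off of this sketch, not something the answer prescribes.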

  • 2021-02-05 15:45

    You can use the sqlite3 Android NDK bindings to gain access to the full sqlite3 C API through JNI calls.

    Then you can define new collating sequences with sqlite3_create_collation_v2() and related functions.

    This approach does not change the database, because the collation is registered only on the current database connection. It therefore also works when the database is read-only.

    Notice I say you can. I'm not saying you SHOULD! Weigh the pros and cons of this approach; in most cases it is probably not worth the extra effort.
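To illustrate the idea of a per-connection collation without the JNI plumbing, here is a rough sketch using Python's sqlite3 module, whose `create_collation` method is the binding's counterpart of the C-level collation registration. The collation name `SONGSORT`, the `sort_key` helper, and the sample titles are assumptions for the example.

```python
import re
import sqlite3

def sort_key(name):
    """Hypothetical key: (has leading digits, integer value, cleaned name)."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()
    match = re.match(r"\d+", cleaned)
    if match:
        return (1, int(match.group()), cleaned)
    return (0, 0, cleaned)

def song_collation(a, b):
    # A collation callback returns a negative, zero, or positive integer.
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)

conn = sqlite3.connect(":memory:")
# The collation lives on this connection only; the database file is untouched.
conn.create_collation("SONGSORT", song_collation)

conn.execute("CREATE TABLE songs (n INTEGER, name TEXT)")
names = ["911 is a Joke", "(I'm) a Rainbow Too", "99 Problems", "zebra", "Apple"]
conn.executemany("INSERT INTO songs (n, name) VALUES (?, ?)",
                 list(enumerate(names, start=1)))

ordered = [row[0] for row in
           conn.execute("SELECT name FROM songs ORDER BY name COLLATE SONGSORT")]
print(ordered)
```

The comparison runs in application code for every pair SQLite needs to compare, so sorting a large table this way is likely to be slower than sorting on a precomputed column.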

  • 2021-02-05 15:51

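    If you're allowed to create functions, this is what I'd create (taken from "How to strip all non-alphabetic characters from string in SQL Server?" and modified a bit). Note that this is T-SQL: SQLite has no CREATE FUNCTION statement, so there the same logic has to be registered as a user-defined function through the C API or a language binding.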

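    Create Function [dbo].[RemoveNonAlphaNumericCharacters](@Temp VarChar(1000))
    Returns VarChar(1000)
    AS
    Begin

        -- Match any single character that is not a letter, a digit, or a space.
        -- (PATINDEX patterns have no \s class, so the space is listed literally.)
        Declare @KeepValues as varchar(50)
        Set @KeepValues = '%[^a-zA-Z0-9 ]%'
        -- Delete offending characters one at a time until none remain.
        While PatIndex(@KeepValues, @Temp) > 0
            Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')

        Return @Temp
    End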
    

    This meets requirement #3 by stripping all the junk out of the string. Your query would then look like this:

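    -- Assumes RemoveNonAlphaNumericCharacters is available to SQLite as a
    -- user-defined function registered on the connection.
    SELECT n
    FROM songs
    ORDER BY
      CASE WHEN RemoveNonAlphaNumericCharacters(name) GLOB '[0-9]*' THEN 1
           ELSE 0
      END,
      -- Cast the stripped name so that leading punctuation does not hide the digits.
      CASE WHEN RemoveNonAlphaNumericCharacters(name) GLOB '[0-9]*'
           THEN CAST(RemoveNonAlphaNumericCharacters(name) AS INT)
           ELSE RemoveNonAlphaNumericCharacters(name)
      END
    COLLATE NOCASE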
    

    It doesn't look pretty and might not perform well. I'd probably do what Stefan suggested: parse your song names and insert the trimmed ones into a separate column used only for ordering (and, of course, put an index on that column). That should be the best solution.
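As a rough illustration of how the function-based query could run against SQLite, the sketch below registers the stripping logic as a user-defined function through Python's sqlite3 module. The function name `strip_non_alnum` and the sample titles are assumptions; on Android the registration step would go through whatever binding exposes the C API.

```python
import re
import sqlite3

def strip_non_alnum(text):
    """Drop every character that is not a letter, digit, or whitespace."""
    if text is None:
        return None
    return re.sub(r"[^A-Za-z0-9\s]", "", text)

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under a hypothetical name.
conn.create_function("strip_non_alnum", 1, strip_non_alnum)

conn.execute("CREATE TABLE songs (n INTEGER, name TEXT)")
names = ["(911) Is a Joke", "(I'm) a Rainbow Too", "99 Problems", "zebra", "Apple"]
conn.executemany("INSERT INTO songs (n, name) VALUES (?, ?)",
                 list(enumerate(names, start=1)))

query = """
    SELECT name
    FROM songs
    ORDER BY
      -- Names that start with a digit (after stripping) go last.
      CASE WHEN strip_non_alnum(name) GLOB '[0-9]*' THEN 1 ELSE 0 END,
      -- Digit-leading names sort by integer value, the rest by stripped text.
      CASE WHEN strip_non_alnum(name) GLOB '[0-9]*'
           THEN CAST(strip_non_alnum(name) AS INTEGER)
           ELSE strip_non_alnum(name)
      END COLLATE NOCASE
"""
ordered = [row[0] for row in conn.execute(query)]
print(ordered)
```

The function is evaluated for every row on every query and no index can help, which matches the performance concern raised above.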

  • 2021-02-05 15:53

    The first solution (when DB and application can be modified):

    Add a single column to your table, e.g. columnForSorting. Then, in your application, before inserting, clean the song name of undesired symbols and prefix it with 0 or 1 according to your second condition ("With leading-digits at the end, by integer value."). columnForSorting will then hold values such as 0Im a Rainbow Too and 1911 is a Joke.

    The second solution (when only application can be modified):

    If you have to sort data while ignoring some symbols and you are not allowed to change your DB, selection will be slower because the undesired characters have to be filtered out at query time. Most of the overhead will be CPU time and memory.

    Chaining replace() calls is tedious in my view, which is why I suggest a CTE with a list of the values you want to drop, like this: ('.', ',', ';', '(', ')', '''', '-'). The CTE is as bulky as multiple replace() calls, but it is easier to modify and maintain.

    Try this solution:

    WITH RECURSIVE
     -- Split every name into one row per character (number = 1-based position).
     ordering_name_substr(len, name, subsstr, hex_subsstr, number)
     AS (SELECT length(name), name, substr(name, 1, 1), hex(substr(name, 1, 1)), 1
           FROM songs
          UNION ALL
         SELECT len, name, substr(name, number + 1, 1),
                hex(substr(name, number + 1, 1)), number + 1
           FROM ordering_name_substr WHERE number < len),
     -- Reassemble each name without the excluded characters.
     -- Note: SQLite does not guarantee the order in which GROUP_CONCAT
     -- concatenates the rows of a group.
     last_order_cretaria(value, old_name)
     AS (SELECT GROUP_CONCAT(subsstr, ''), name
           FROM ordering_name_substr
          WHERE hex_subsstr NOT IN
                ('28', '29', '2C', '2E', '27') GROUP BY name)

    SELECT S.n, S.name
    FROM songs AS S LEFT JOIN last_order_cretaria AS OC
    ON S.name = OC.old_name
    ORDER BY
      -- Names with leading digits go last, ordered by integer value.
      CASE WHEN name GLOB '[0-9]*' THEN 1
           ELSE 0
      END,
      CASE WHEN name GLOB '[0-9]*' THEN CAST(name AS INT)
           ELSE
             OC.value
      END
    COLLATE NOCASE
    

    I have tested on sqlfiddle.

    The list ('28', '29', '2C', '2E', '27') holds the ASCII codes (in hex) of the characters you want excluded from the ordering.

    You can also try using the characters themselves, like ('.', ',', ';', '(', ')', '''', '-'):

    WITH RECURSIVE 
     ordering_name_substr(len, name, subsstr, number) 
     AS (SELECT length(name), name, substr(name, 1, 1), 1  
           FROM songs
          UNION ALL 
         SELECT len, name, substr(name, number + 1, 1),
                number + 1
           FROM ordering_name_substr WHERE number < len),
     last_order_cretaria(value, old_name)
      AS (select GROUP_CONCAT(subsstr, ''), name 
               from ordering_name_substr 
            where subsstr not in
           ('.', ',', ';', '(', ')', '''', '-') group by name )
    
    SELECT S.n, S.name
    FROM songs AS S LEFT JOIN last_order_cretaria AS OC
    ON S.name = OC.old_name
    ORDER BY
      CASE WHEN name GLOB '[0-9]*' THEN 1
           ELSE 0
      END,
      CASE WHEN name GLOB '[0-9]*' THEN CAST(name AS INT)
           ELSE
             OC.value
      END
    COLLATE NOCASE
    

    To make this sorting fast and simple, you have to be able to change both your DB and your application.

  • 2021-02-05 16:02

    I would add an additional column to the table, called "SortingName" or something similar. Calculate this value when inserting, ideally not in SQL but in a higher-level language where you have all these nice string operations.

    I didn't really understand the requirement about the number. I guess the simplest thing you can do is extract the number before inserting and put it into another column, like "SortingNumber".

    Then simply sort like this:

    Order By
      SortingName,
      SortingNumber
    

    (Or the other way around.)

    Another advantage is performance. You usually read data much more often than you write it. You can even create indexes on these two sorting columns, which is usually not possible if you calculate the values in the query.
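One possible shape of the two-column idea, sketched with Python's sqlite3 module. The `sorting_columns` helper, the index name, the sample titles, and the choice to store NULL in SortingNumber for names without leading digits are assumptions of this example. It sorts by SortingNumber first (the "other way around" variant), relying on SQLite placing NULL before any other value in ascending order.

```python
import re
import sqlite3

def sorting_columns(name):
    """Hypothetical helper returning (SortingName, SortingNumber) for a title."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()
    match = re.match(r"\d+", cleaned)
    # Names without leading digits get NULL (None) as their number.
    number = int(match.group()) if match else None
    return cleaned, number

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE songs (n INTEGER, name TEXT,
                        SortingName TEXT, SortingNumber INTEGER)""")
# A composite index that matches the ORDER BY below.
conn.execute(
    "CREATE INDEX songs_sorting_idx ON songs (SortingNumber, SortingName)")

names = ["911 is a Joke", "(I'm) a Rainbow Too", "99 Problems", "zebra", "Apple"]
rows = [(i, name, *sorting_columns(name))
        for i, name in enumerate(names, start=1)]
conn.executemany("INSERT INTO songs VALUES (?, ?, ?, ?)", rows)

# NULL numbers sort first, so titles without leading digits come before
# the numbered ones, which then follow in integer order.
ordered = [row[0] for row in conn.execute(
    "SELECT name FROM songs ORDER BY SortingNumber, SortingName")]
print(ordered)
```

Keeping the number in an INTEGER column avoids zero-padding tricks and lets the comparison stay numeric.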
