Can you split/explode a field in a MySQL query?

无人及你 2020-11-22 04:03

I have to create a report on some student completions. The students each belong to one client. Here are the tables (simplified for this question).

CREATE TAB         


        
17 answers
  •  长发绾君心
    2020-11-22 04:29

    It is possible to explode a string in a MySQL SELECT statement.

    First, generate a series of numbers at least as long as the largest number of delimited values you expect to explode, either from a table of integers or by unioning numbers together. The following generates 100 rows with the values 1 to 100. It is easily extended to larger ranges: add another subquery giving the values 0 to 9 for the hundreds and the offsets run from 0 to 999, so aNum runs from 1 to 1000, and so on.

    SELECT 1 + units.i + tens.i * 10 AS aNum
    FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
    CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
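
    As an alternative to the unioned digits, the same 1 to 100 series can be sketched with a recursive common table expression. This assumes MySQL 8.0 or later, which the original answer does not require; the name nums is only illustrative.

    ```sql
    -- Assumption: MySQL 8.0+ (recursive CTEs are not available in earlier versions).
    WITH RECURSIVE nums (aNum) AS (
        SELECT 1                                    -- anchor row
        UNION ALL
        SELECT aNum + 1 FROM nums WHERE aNum < 100  -- stop once 100 is reached
    )
    SELECT aNum FROM nums;
    ```

    The unioned-digits version above works on older servers too, so it remains the more portable choice.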
    

    This can be cross joined against your table to give you the values. SUBSTRING_INDEX is used twice: the inner call returns everything before the Nth delimiter (that is, the first N values), and the outer call, with a count of -1, keeps only the last of those, which is the Nth value.

    SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name
    FROM clients
    CROSS JOIN
    (
        SELECT 1 + units.i + tens.i * 10 AS aNum, units.i + tens.i * 10 AS aSubscript
        FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
        CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
    ) sub0
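
    To see what the nested SUBSTRING_INDEX calls do at each step, here is a small sketch on a literal string. The course names are made-up sample data, not taken from the question.

    ```sql
    -- Sample data only: a literal list of three values, probed with aNum = 1..4.
    SELECT n.aNum,
           SUBSTRING_INDEX('Maths,English,Science', ',', n.aNum) AS up_to_n,
           SUBSTRING_INDEX(SUBSTRING_INDEX('Maths,English,Science', ',', n.aNum), ',', -1) AS nth_value
    FROM (SELECT 1 AS aNum UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) n
    ORDER BY n.aNum;
    -- aNum  up_to_n                 nth_value
    -- 1     Maths                   Maths
    -- 2     Maths,English           English
    -- 3     Maths,English,Science   Science
    -- 4     Maths,English,Science   Science   (aNum is past the end, so the last value repeats)
    ```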
    

    There is a slight issue here: the last delimited value is repeated many times. Once aNum exceeds the number of values in the field, the inner SUBSTRING_INDEX returns the whole string, so the outer call keeps returning the final value. To get rid of the repeats, limit the range of numbers based on how many delimiters there are. Compare the length of the delimited field with its length after the delimiters have been replaced with '' (removing them); the difference is the number of delimiters, and the number of values is one more than that:

    SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name
    FROM clients
    INNER JOIN
    (
        SELECT 1 + units.i + tens.i * 10 AS aNum
        FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
        CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
    ) sub0
    ON (1 + LENGTH(clients.courseNames) - LENGTH(REPLACE(clients.courseNames, ',', ''))) >= sub0.aNum
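
    The length arithmetic in the ON clause can be checked on its own. A minimal sketch, again using a made-up literal in place of clients.courseNames:

    ```sql
    -- Sample data only: 21 characters including 2 commas.
    SELECT LENGTH('Maths,English,Science')                   AS with_commas,     -- 21
           LENGTH(REPLACE('Maths,English,Science', ',', '')) AS without_commas,  -- 19
           1 + LENGTH('Maths,English,Science')
             - LENGTH(REPLACE('Maths,English,Science', ',', '')) AS value_count; -- 3
    ```

    Note that an empty string produces a count of 1, so a client with an empty courseNames field would still yield one (empty) value, while a NULL field yields NULL and drops out of the join.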
    

    With the tables from the question you could, for example, count the number of students on each course this way. Note that the subquery generating the range of numbers now returns two columns: aNum, which starts at 1 and is used to extract the course name, and aSubscript, which starts at 0 and is matched against the course ID.

    SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(clients.courseNames, ',', sub0.aNum), ',', -1) AS a_course_name, COUNT(clientenrols.studentId)
    FROM clients
    INNER JOIN
    (
        SELECT 1 + units.i + tens.i * 10 AS aNum, units.i + tens.i * 10 AS aSubscript
        FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units
        CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
    ) sub0
    ON (1 + LENGTH(clients.courseNames) - LENGTH(REPLACE(clients.courseNames, ',', ''))) >= sub0.aNum
    LEFT OUTER JOIN clientenrols
    ON clientenrols.courseId = sub0.aSubscript
    GROUP BY a_course_name
    

    As you can see, it is possible but quite messy, and with little opportunity to use indexes it is not going to be efficient. Further, the range of numbers must cover the greatest number of delimited values in any row, and the query works by generating and then discarding many candidate rows; if the maximum number of delimited values is very large, this will slow things down dramatically. Overall it is generally far better to normalise the database properly.
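
    For comparison, here is a rough sketch of what a normalised layout might look like. The table client_courses and its columns are hypothetical, since the question's schema is truncated above; only clientenrols.studentId and clientenrols.courseId are taken from the queries in this answer, and the join mirrors the one used there.

    ```sql
    -- Hypothetical junction table: one row per client per course,
    -- replacing the comma-separated clients.courseNames field.
    CREATE TABLE client_courses (
        clientId   INT          NOT NULL,
        courseId   INT          NOT NULL,   -- plays the role of aSubscript above
        courseName VARCHAR(100) NOT NULL,
        PRIMARY KEY (clientId, courseId)
    );

    -- The student count per course then needs no string functions at all.
    SELECT cc.courseName, COUNT(ce.studentId) AS student_count
    FROM client_courses cc
    LEFT OUTER JOIN clientenrols ce
        ON ce.courseId = cc.courseId
    GROUP BY cc.courseName;
    ```

    With the values stored one per row, the join columns can be indexed and no numbers table is needed.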
