How to create dummy variable columns for thousands of categories in Google BigQuery?

陌清茗 2020-11-28 14:30

I have a simple table with two columns, UserID and Category. Each UserID can appear in several rows with different categories, like so:

UserID   Category
------   --------
1           


        
1 Answer
2020-11-28 15:20

You can use the following technique:

First, run Query #1. It generates the text of the query you actually need (Query #2); run that second query to get your result. Still, please consider Mosha's comments before going "wild" with thousands of categories :o)

Query #1:

    #legacySQL
    SELECT 'select UserID, ' + 
       GROUP_CONCAT_UNQUOTED(
        'sum(if(category = "' + STRING(category) + '", 1, 0)) as ' + STRING(category)
       ) 
       + ' from YourTable group by UserID'
    FROM (
      SELECT category 
      FROM YourTable  
      GROUP BY category
    )
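The string-building step of Query #1 can also be sketched outside BigQuery. The Python below is a minimal sketch, not part of the original answer: `build_dummy_query` is a hypothetical helper, and its `categories` argument stands in for the distinct values returned by the inner `SELECT category ... GROUP BY category`. It only assembles the text of Query #2 and does not connect to BigQuery. Like the original, it assumes every category value is already a valid column name.

```python
def build_dummy_query(categories, table="YourTable"):
    """Build the text of Query #2: one SUM(IF(...)) column per category.

    `categories` is a hypothetical stand-in for the distinct values that
    the inner SELECT of Query #1 returns; each must be a valid column name.
    """
    # One dummy-column expression per category, joined like GROUP_CONCAT would.
    columns = ",\n  ".join(
        f'SUM(IF(category = "{c}", 1, 0)) AS {c}' for c in categories
    )
    return f"SELECT\n  UserID,\n  {columns}\nFROM\n  {table}\nGROUP BY\n  UserID"


if __name__ == "__main__":
    print(build_dummy_query(["A", "B", "C"]))
```

Called with `["A", "B", "C"]`, it prints the same statement as Query #2 below.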
    

The generated query (Query #2) will look like this:

    SELECT
      UserID,
      SUM(IF(category = "A", 1, 0)) AS A,
      SUM(IF(category = "B", 1, 0)) AS B,
      SUM(IF(category = "C", 1, 0)) AS C
    FROM
      YourTable
    GROUP BY
      UserID
    

Of course, with three categories you could write this by hand, but with thousands it will definitely make your day!!

The result of Query #2 looks as you expect:

    UserID  A   B   C    
    1       1   1   0    
    2       0   0   1    
    3       1   1   1    
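To see what Query #2 computes, here is a minimal plain-Python sketch of the same aggregation. `dummy_columns` is a hypothetical helper, and the input rows are made up only so that the output matches the table above (the question's own sample data is cut off).

```python
from collections import defaultdict


def dummy_columns(rows, categories):
    """Mimic Query #2 in plain Python.

    For each UserID, count the rows carrying each listed category,
    i.e. SUM(IF(category = X, 1, 0)) per category X.
    """
    # Each new UserID starts with an all-zero row, one entry per category.
    counts = defaultdict(lambda: dict.fromkeys(categories, 0))
    for user_id, category in rows:
        row = counts[user_id]  # creates the zero row on first sight (GROUP BY UserID)
        if category in row:    # values outside `categories` add nothing, as in the SQL
            row[category] += 1
    return {user_id: counts[user_id] for user_id in sorted(counts)}


# Hypothetical sample rows, chosen so the output matches the result table.
rows = [(1, "A"), (1, "B"), (2, "C"), (3, "A"), (3, "B"), (3, "C")]

for user_id, cols in dummy_columns(rows, ["A", "B", "C"]).items():
    print(user_id, *cols.values())
```

Because the query uses SUM, a user with the same category on two rows gets 2 rather than 1; using MAX instead of SUM in the generated expressions would give a strict 0/1 flag.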
    