Unnest and totals.timeOnSite (BigQuery and Google Analytics data)

温柔的废话 2021-01-26 12:23

I want to calculate the total timeOnSite for all visitors to a website (and divide it by 3600 because it's stored as seconds in the raw data), and then I want to break it down

2 answers
  •  挽巷
    挽巷 (original poster)
    2021-01-26 12:35

    It might seem odd that I'm answering my own question like this, but a contact of mine from outside of Stack Overflow helped me solve this, so it's actually his answer rather than mine.

    The problem with session_duration can be solved with a window function, which lets each hit look at the timestamp of the next hit in the same session (see the BigQuery documentation on window functions: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#analytic-functions).

    #StandardSQL
    SELECT   
     iso_date,   
     content_group,   
     content_level,  
     COUNT(DISTINCT SessionId) AS sessions, 
     SUM(session_duration) AS session_duration 
    FROM (   
         SELECT   
           date AS iso_date,   
           hits.contentGroup.contentGroup1 AS content_group,   
           (SELECT MAX(IF(index=51, value, NULL)) FROM UNNEST(hits.customDimensions)) AS content_level,  
           CONCAT(CAST(fullVisitorId AS STRING), CAST(visitId AS STRING)) AS SessionId, 
           (LEAD(hits.time, 1) OVER (PARTITION BY fullVisitorId, visitId ORDER BY hits.time ASC) - hits.time) / 3600000 AS session_duration 
         -- _TABLE_SUFFIX is only available on wildcard tables, hence ga_sessions_*
         FROM `projectname.123456789.ga_sessions_*`,   
           UNNEST(hits) AS hits
         WHERE _TABLE_SUFFIX BETWEEN "20170101" AND "20170131" 
           AND (SELECT 
                  MAX(IF(index=51, value, NULL)) 
                FROM 
                  UNNEST(hits.customDimensions) 
                WHERE 
                  value IN ("web", "phone", "tablet")
                ) IS NOT NULL 
         -- No grouping here: each hit stays on its own row for the window function;
         -- aggregation and ordering happen in the outer query.
        )   
    GROUP BY iso_date, content_group, content_level
    ORDER BY iso_date, content_group, content_level 
    

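    Both the LEAD ... OVER (PARTITION BY ...) expression in the subquery and the nested subquery in the WHERE clause are required for the window function to work properly. LEAD fetches the time of the next hit in the same session, so subtracting hits.time gives the time spent on the current hit; dividing by 3600000 converts milliseconds to hours.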
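To see how the LEAD ... OVER (PARTITION BY ...) step assigns time to each hit, here is a minimal sketch that runs the same window expression on a toy table. It uses Python's built-in sqlite3 module (SQLite 3.25 or later) rather than BigQuery, so the dialect differs slightly. The table name `hits`, the column `hit_time`, the helper `total_hours`, and the sample rows are all hypothetical; only the window expression mirrors the query above.

```python
import sqlite3

# Hypothetical toy data: (fullVisitorId, visitId, hit_time in milliseconds).
rows = [
    ("A", 1, 0),
    ("A", 1, 30000),
    ("A", 1, 90000),
    ("B", 1, 0),
    ("B", 1, 45000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (fullVisitorId TEXT, visitId INTEGER, hit_time INTEGER)"
)
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)

# Time spent on a hit is the gap to the next hit in the same session.
# The last hit of a session has no successor, so LEAD returns NULL there.
query = """
SELECT fullVisitorId, visitId, hit_time,
       LEAD(hit_time, 1) OVER (
           PARTITION BY fullVisitorId, visitId ORDER BY hit_time ASC
       ) - hit_time AS ms_on_hit
FROM hits
ORDER BY fullVisitorId, visitId, hit_time
"""
per_hit = conn.execute(query).fetchall()


def total_hours(results):
    """Sum the per-hit gaps, skipping NULLs, and convert ms to hours."""
    return sum(ms for *_, ms in results if ms is not None) / 3600000


for row in per_hit:
    print(row)
print(total_hours(per_hit))
```

In this sketch the final hit of each session gets NULL, so it adds nothing to the total; SUM in the outer BigQuery query ignores NULLs in the same way.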

    The query also counts sessions more accurately: SessionId concatenates fullVisitorId and visitId, and the outer query uses COUNT(DISTINCT SessionId).
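The session count works because visitId on its own is only unique per visitor, so the query combines it with fullVisitorId before counting distinct values. A small Python sketch with made-up hit rows (all values hypothetical) illustrates the difference:

```python
# Hypothetical hit-level rows after UNNEST(hits): (fullVisitorId, visitId).
hits = [
    ("111", 1483228800),
    ("111", 1483228800),  # second hit in the same session
    ("222", 1483228800),  # different visitor, same visitId
    ("222", 1483315200),
]

# Counting visitId alone merges sessions that belong to different visitors.
by_visit_id = len({visit_id for _, visit_id in hits})

# Mirrors CONCAT(CAST(fullVisitorId AS STRING), CAST(visitId AS STRING)).
by_session_id = len({f"{visitor}{visit_id}" for visitor, visit_id in hits})

print(by_visit_id, by_session_id)
```

Here the concatenated key separates the two visitors who share a visitId, which is what COUNT(DISTINCT SessionId) relies on.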
