I want to calculate the total timeOnSite for all visitors to a website (and divide it by 3600, because it's stored as seconds in the raw data), and then break it down by date, content group and content level.
It might seem odd that I'm answering my own question, but a contact of mine outside Stack Overflow helped me solve this, so it's really his answer rather than mine.
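The problem with session_duration can be solved with a window function: LEAD returns the time of the next hit in the same session, and subtracting the current hit's time gives the time spent on that hit. You can read more about window functions in the BigQuery documentation: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#analytic-functions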
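```sql
#StandardSQL
SELECT
  iso_date,
  content_group,
  content_level,
  COUNT(DISTINCT SessionId) AS sessions,
  SUM(session_duration) AS session_duration
FROM (
  -- One row per hit; no GROUP BY here, because the window function
  -- needs the individual hits and their times
  SELECT
    date AS iso_date,
    hits.contentGroup.contentGroup1 AS content_group,
    -- Custom dimension 51 holds the content level
    (SELECT MAX(IF(index = 51, value, NULL)) FROM UNNEST(hits.customDimensions)) AS content_level,
    -- fullVisitorId + visitId identifies a single session
    CONCAT(CAST(fullVisitorId AS STRING), CAST(visitId AS STRING)) AS SessionId,
    -- Time until the next hit in the same session: hits.time is in
    -- milliseconds, so dividing by 3600000 gives hours. The last hit of
    -- each session has no next hit, gets NULL, and is ignored by SUM.
    (LEAD(hits.time, 1) OVER (PARTITION BY fullVisitorId, visitId ORDER BY hits.time ASC) - hits.time) / 3600000 AS session_duration
  -- _TABLE_SUFFIX requires a wildcard table, so query ga_sessions_* rather than one daily table
  FROM `projectname.123456789.ga_sessions_*`,
    UNNEST(hits) AS hits
  WHERE _TABLE_SUFFIX BETWEEN "20170101" AND "20170131"
    -- Keep only hits whose content level (custom dimension 51) is web, phone or tablet
    AND (
      SELECT MAX(IF(index = 51, value, NULL))
      FROM UNNEST(hits.customDimensions)
      WHERE value IN ("web", "phone", "tablet")
    ) IS NOT NULL
)
GROUP BY iso_date, content_group, content_level
ORDER BY iso_date, content_group, content_level
```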
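Both the LEAD ... OVER (PARTITION BY ...) window in the subquery and the nested subquery in the WHERE clause are required for the window function to work properly. Partitioning by fullVisitorId and visitId makes LEAD look only at the next hit in the same session, so a duration is never measured between hits from two different sessions.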
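To make the effect of the PARTITION BY concrete, here is a minimal Python sketch that mimics the LEAD expression on a few made-up hits. The HITS data and the lead_durations and total_hours helpers are hypothetical illustrations, not part of BigQuery or the GA export. With partitioning, each hit's duration is the gap to the next hit in the same session, and the last hit of each session gets NULL (None here). Without it, all hits fall into one window ordered by time only, so gaps between hits of different sessions get mixed together; since hits.time restarts at 0 in every session, those cross-session gaps are meaningless.

```python
from itertools import groupby

# Hypothetical toy hits: (fullVisitorId, visitId, hit_time_ms).
# In the GA export, hits.time is milliseconds since the session started.
HITS = [
    ("A", 1, 0), ("A", 1, 60_000), ("A", 1, 240_000),
    ("B", 7, 0), ("B", 7, 1_800_000),
]


def lead_durations(hits, partitioned=True):
    """Mimic (LEAD(time, 1) OVER (... ORDER BY time) - time) / 3600000.

    Returns (visitor, visit, time_ms, hours) tuples; hours is None (SQL NULL)
    when there is no following row in the same window.
    """
    if partitioned:
        # One window per (fullVisitorId, visitId), ordered by hit time.
        rows = sorted(hits, key=lambda h: (h[0], h[1], h[2]))
        windows = [list(g) for _, g in groupby(rows, key=lambda h: (h[0], h[1]))]
    else:
        # No PARTITION BY: every hit lands in one window ordered by time only.
        windows = [sorted(hits, key=lambda h: h[2])]
    out = []
    for window in windows:
        for cur, nxt in zip(window, window[1:] + [None]):
            hours = None if nxt is None else (nxt[2] - cur[2]) / 3_600_000
            out.append((cur[0], cur[1], cur[2], hours))
    return out


def total_hours(rows):
    """SUM over the duration column, skipping NULLs like BigQuery does."""
    return sum(r[3] for r in rows if r[3] is not None)


if __name__ == "__main__":
    print("partitioned:  ", total_hours(lead_durations(HITS)))
    print("unpartitioned:", total_hours(lead_durations(HITS, partitioned=False)))
```

On this sample the partitioned total is 34 minutes (4 for visitor A plus 30 for visitor B, about 0.567 hours), while the unpartitioned total is 0.5 hours. As in the SQL, the last hit of each session contributes nothing, because there is no next hit to measure it against.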
The SQL query also counts sessions more accurately: COUNT(DISTINCT SessionId) counts each combination of fullVisitorId and visitId exactly once.
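The outer query then collapses the per-hit rows into one row per date, content group and content level. The Python sketch below mirrors that step; the ROWS data and the aggregate helper are hypothetical. It also shows why the session ID combines fullVisitorId and visitId: visitId is only unique per visitor, and in the sample both visitors share the same visitId, so counting visitId alone would report one session where there are two. A tuple stands in for the concatenated SessionId string.

```python
from collections import defaultdict

# Hypothetical rows shaped like the inner subquery's output:
# (iso_date, content_group, content_level, fullVisitorId, visitId, session_duration)
ROWS = [
    ("20170101", "news", "web", "111", 1483261200, 0.05),
    ("20170101", "news", "web", "111", 1483261200, None),
    ("20170101", "news", "web", "222", 1483261200, 0.25),
    ("20170101", "sport", "phone", "111", 1483261200, 0.10),
]


def aggregate(rows):
    """Mimic the outer query: COUNT(DISTINCT SessionId), SUM(session_duration)."""
    sessions = defaultdict(set)
    duration = {}
    for date, group, level, visitor, visit, hours in rows:
        key = (date, group, level)
        # (fullVisitorId, visitId) stands in for the concatenated SessionId;
        # visitId alone is only unique per visitor, so both parts are needed.
        sessions[key].add((visitor, visit))
        # SUM skips NULLs; a group with only NULLs stays None, as in SQL.
        if hours is not None:
            duration[key] = duration.get(key, 0.0) + hours
    return {key: (len(sessions[key]), duration.get(key)) for key in sorted(sessions)}


if __name__ == "__main__":
    for key, (n_sessions, hours) in aggregate(ROWS).items():
        print(key, n_sessions, hours)
```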