Question
This is a follow-up to my first post on looking at distributions within Firebase Analytics data. This time, I want to create a user distribution table in BigQuery based on Firebase session data. The output should look like this:
I managed to create the following script, which counts distinct app_instance_id values per session range:
#standardSQL
SELECT
  COUNT(DISTINCT(CASE WHEN sess_id = 0 THEN app_instance_id END)) AS sess_count_0,
  COUNT(DISTINCT(CASE WHEN sess_id = 1 THEN app_instance_id END)) AS sess_count_1,
  COUNT(DISTINCT(CASE WHEN sess_id > 1 AND sess_id <= 5 THEN app_instance_id END)) AS sess_count_2BETWEEN5,
  COUNT(DISTINCT(CASE WHEN sess_id > 5 AND sess_id <= 10 THEN app_instance_id END)) AS sess_count_6BETWEEN10,
  COUNT(DISTINCT(CASE WHEN sess_id > 10 AND sess_id <= 30 THEN app_instance_id END)) AS sess_count_11BETWEEN30,
  COUNT(DISTINCT(CASE WHEN sess_id > 30 THEN app_instance_id END)) AS sess_count_PLUS31
FROM (
  -- Running total of session starts per app instance, i.e. the session number of each row
  SELECT *, SUM(session_start) OVER(PARTITION BY app_instance_id ORDER BY min_time) sess_id
  FROM (
    -- A row starts a new session if it is the first one or follows a gap of more than 20 minutes (in microseconds)
    SELECT *, IF(previous IS null OR (min_time-previous)>(20*60*1000*1000),1, 0) session_start
    FROM (
      -- Last event timestamp of the preceding row for the same app instance
      SELECT *, LAG(max_time, 1) OVER(PARTITION BY app_instance_id ORDER BY max_time) previous
      FROM (
        -- First and last event timestamp of each exported row
        SELECT user_dim.app_info.app_instance_id,
          user_dim.device_info.mobile_model_name,
          user_dim.device_info.platform_version,
          (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time,
          (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time
        FROM `firebase-public-project.com_firebase_demo_IOS.app_events_*`
        WHERE (_TABLE_SUFFIX BETWEEN '20170701' AND '20170731')
      )
    )
  )
)
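To make the sessionisation step in the script easier to follow, here is a minimal sketch of the same LAG / running SUM logic on a few made-up rows. The `sample_rows` table and its values are hypothetical and only stand in for the per-row min_time / max_time pairs taken from the Firebase export; the 20-minute gap is the one used in the script.

```sql
#standardSQL
-- Hypothetical sample rows (timestamps in microseconds), not real Firebase data.
WITH sample_rows AS (
  SELECT 'A' AS app_instance_id, 0 AS min_time, 60000000 AS max_time UNION ALL
  SELECT 'A', 120000000, 180000000 UNION ALL    -- starts 1 minute after the previous row ended
  SELECT 'A', 3000000000, 3060000000 UNION ALL  -- starts 47 minutes after the previous row ended
  SELECT 'B', 0, 60000000
)
SELECT
  app_instance_id,
  min_time,
  max_time,
  session_start,
  -- Running total of session starts = session number of this row
  SUM(session_start) OVER (PARTITION BY app_instance_id ORDER BY min_time) AS sess_id
FROM (
  SELECT
    *,
    -- 1 when the row is the first for the app instance or follows a gap of more than 20 minutes
    IF(previous IS NULL OR (min_time - previous) > (20 * 60 * 1000 * 1000), 1, 0) AS session_start
  FROM (
    SELECT
      *,
      -- max_time of the preceding row for the same app instance
      LAG(max_time, 1) OVER (PARTITION BY app_instance_id ORDER BY max_time) AS previous
    FROM sample_rows
  )
)
ORDER BY app_instance_id, min_time
```

With these rows, A's second row follows the first by one minute and should stay in session 1, while its third row comes after a gap of more than 20 minutes and should get sess_id 2. Note that sess_id is a per-row running number, so one app instance carries several different sess_id values.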
Questions:
Since I am counting users (not sessions), I want to be 100% sure: should I still count distinct app instances (app_instance_id) rather than session IDs?
Any thoughts on optimising this query? Is there a more efficient way to aggregate all the distribution ranges in one query?
Finally, I wanted to compare the overall total from the query above with the number of distinct users who triggered the session_start event over the same period. I was hoping the two would roughly align, but they do not. Why is there such a big difference: 7688 vs 16310 (488+7343+4967+1956+1165+391)? Where did my logic go wrong?
#standardSQL
SELECT COUNT(DISTINCT user_dim.app_info.app_instance_id) AS users
FROM `firebase-public-project.com_firebase_demo_IOS.app_events_*`, UNNEST(event_dim) AS event
WHERE (_TABLE_SUFFIX BETWEEN '20170701' AND '20170731')
AND event.name = "session_start"
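One thing that may contribute to the gap: the distribution query tests the running sess_id of every row, so an app instance with several sessions can match more than one range. The sketch below contrasts that with counting each app instance once, by its highest sess_id. The `sessionised` rows are made up for illustration, and the `per_user` step is only an assumption about how a one-bucket-per-user count could be written, not a confirmed fix.

```sql
#standardSQL
-- Hypothetical (app_instance_id, sess_id) rows, shaped like the output of the sessionisation subqueries.
WITH sessionised AS (
  SELECT 'A' AS app_instance_id, 1 AS sess_id UNION ALL
  SELECT 'A', 2 UNION ALL
  SELECT 'A', 3 UNION ALL
  SELECT 'B', 1
),
per_user AS (
  -- One row per app instance, keeping only its highest running session number
  SELECT app_instance_id, MAX(sess_id) AS sessions
  FROM sessionised
  GROUP BY app_instance_id
)
SELECT
  -- Per-row counting, as in the distribution query: A matches both ranges
  (SELECT COUNT(DISTINCT IF(sess_id = 1, app_instance_id, NULL)) FROM sessionised) AS per_row_1,
  (SELECT COUNT(DISTINCT IF(sess_id > 1 AND sess_id <= 5, app_instance_id, NULL)) FROM sessionised) AS per_row_2_to_5,
  -- Per-user counting: each app instance falls into exactly one range
  (SELECT COUNTIF(sessions = 1) FROM per_user) AS per_user_1,
  (SELECT COUNTIF(sessions > 1 AND sessions <= 5) FROM per_user) AS per_user_2_to_5
```

On this toy data the per-row columns add up to more than the two app instances that exist, while the per-user columns add up to exactly two.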
Source: https://stackoverflow.com/questions/48687226/sessions-per-user-distribution-table-in-firebase