Total Sessions in BigQuery vs Google Analytics Reports

情歌与酒 2020-12-01 15:16

I'm just learning BigQuery, so this might be a dumb question, but we want to get some statistics from it, and one of them is the total number of sessions on a given day.

To do s

3 answers
  • 2020-12-01 15:30

    After posting the question, we contacted Google support and learned that Google Analytics only counts sessions in which an "event" was fired.

    In BigQuery, you will find all sessions, regardless of whether they had an interaction.

    To get the same result as in GA, filter your BQ query to sessions with totals.visits = 1 (totals.visits is 1 only for sessions in which an event was fired).

    That is:

    -- legacy SQL (table_query and "group each by" are legacy-only)
    select sum(sessions) as total_sessions from (
      select
        fullvisitorid,
        count(distinct visitid) as sessions
      from (table_query([40663402], 'timestamp(right(table_id,8)) between timestamp("20150519") and timestamp("20150519")'))
      where totals.visits = 1
      group each by fullvisitorid
    )
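
To make the effect of the filter concrete, here is a minimal Python sketch on made-up rows. The data and the helper name `count_sessions` are purely illustrative assumptions, and `visits` stands in for totals.visits (1 for sessions with an interaction, None otherwise).

```python
# Toy stand-in for rows of a GA export table (hypothetical data).
# "visits" mirrors totals.visits: 1 for sessions with an interaction, None otherwise.
sessions = [
    {"fullVisitorId": "A", "visitId": 1001, "visits": 1},
    {"fullVisitorId": "A", "visitId": 1002, "visits": None},  # no interaction
    {"fullVisitorId": "B", "visitId": 2001, "visits": 1},
]


def count_sessions(rows, interactive_only):
    """Count distinct (fullVisitorId, visitId) pairs.

    When interactive_only is True, keep only rows with visits == 1,
    which plays the role of "where totals.visits = 1".
    """
    keys = {
        (r["fullVisitorId"], r["visitId"])
        for r in rows
        if not interactive_only or r["visits"] == 1
    }
    return len(keys)


all_sessions = count_sessions(sessions, interactive_only=False)  # unfiltered count
ga_sessions = count_sessions(sessions, interactive_only=True)    # filtered count
print(all_sessions, ga_sessions)
```

In this toy data the unfiltered count is one higher than the filtered one, which is the kind of gap described above between BigQuery and the GA report.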
    
  • 2020-12-01 15:55

    Standard SQL:

    Simply use SUM(totals.visits). If you instead use COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitStartTime AS STRING))), make sure to filter on totals.visits = 1.

    If you use visitId and do not group by day, sessions that were split at midnight will be merged into one.

    Here are all scenarios:

    SELECT
      COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitStartTime AS STRING))) AS allSessionsUniquePerDay,
      COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitId AS STRING))) AS allSessionsUniquePerSelectedTimeframe,
      SUM(totals.visits) AS interactiveSessionsUniquePerDay, -- equals GA UI sessions
      COUNT(DISTINCT IF(totals.visits = 1, CONCAT(fullVisitorId, CAST(visitId AS STRING)), NULL)) AS interactiveSessionsUniquePerSelectedTimeframe,
      SUM(IF(totals.visits = 1, 0, 1)) AS nonInteractiveSessions
    FROM
      `project.dataset.ga_sessions_2017102*`
    

    Wrap up:

    • fullVisitorId + visitId: useful to reconnect midnight-splits
    • fullVisitorId + visitStartTime: useful to take splits into account
    • totals.visits=1 for interaction sessions
    • fullVisitorId + visitStartTime where totals.visits=1: GA UI sessions (in case you need a session id)
    • SUM(totals.visits): simple GA UI sessions
    • fullVisitorId + visitId where totals.visits=1 and GROUP BY date: GA UI sessions, but error-prone and easy to misunderstand
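
The scenarios in this wrap-up can be replayed on a handful of made-up rows. The following Python sketch is only an illustration: the `Row` type and all values are hypothetical, and it assumes, as described above, that a session split at midnight keeps its visitId but gets a new visitStartTime.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    full_visitor_id: str
    visit_id: int
    visit_start_time: int
    visits: Optional[int]  # mirrors totals.visits: 1 or NULL (None)


# Hypothetical rows spanning two daily tables.
rows = [
    Row("A", 100, 100, 1),     # first half of a session that crosses midnight
    Row("A", 100, 160, 1),     # second half: same visitId, new visitStartTime
    Row("B", 300, 300, None),  # session without any interaction
    Row("C", 400, 400, 1),
]

# fullVisitorId + visitStartTime: midnight splits count separately
all_by_start_time = len({(r.full_visitor_id, r.visit_start_time) for r in rows})

# fullVisitorId + visitId: midnight splits are reconnected
all_by_visit_id = len({(r.full_visitor_id, r.visit_id) for r in rows})

# SUM(totals.visits): interaction sessions, splits counted separately
interactive_sum = sum(r.visits or 0 for r in rows)

# fullVisitorId + visitId where totals.visits = 1, without grouping by date
interactive_by_visit_id = len(
    {(r.full_visitor_id, r.visit_id) for r in rows if r.visits == 1}
)

# rows where totals.visits is not 1
non_interactive = sum(1 for r in rows if r.visits != 1)

print(all_by_start_time, all_by_visit_id, interactive_sum,
      interactive_by_visit_id, non_interactive)
```

Here the split session of visitor A is what makes the visitId-based counts come out one lower than the visitStartTime-based count and the SUM.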
  • 2020-12-01 15:55

    The problem could be due to "COUNT DISTINCT".

    According to this post:

    COUNT DISTINCT is a statistical approximation for all results greater than 1000

    You could try setting an additional COUNT parameter to improve accuracy at the expense of performance (see post), but I would first try:

    SELECT COUNT(CONCAT(fullvisitorid, '_', STRING(visitid))) AS sessions
    FROM (table_query([40663402], 'timestamp(right(table_id,8)) between timestamp("20150519") and timestamp("20150519")'))
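
One caveat with dropping DISTINCT: a plain COUNT counts every non-null row, so it only matches the distinct count when each fullvisitorid/visitid pair appears exactly once in the queried tables. A small Python sketch on made-up pairs (all values hypothetical) shows the difference:

```python
# Hypothetical (fullvisitorid, visitid) pairs; the first pair appears twice.
pairs = [("A", 100), ("A", 100), ("B", 300)]

# Same key shape as CONCAT(fullvisitorid, '_', STRING(visitid))
keys = [f"{visitor}_{visit}" for visitor, visit in pairs]

row_count = len(keys)            # behaves like COUNT(...): every row counts
exact_distinct = len(set(keys))  # behaves like an exact distinct count

print(row_count, exact_distinct)
```

If the two numbers differ on real data, the plain COUNT is counting repeated keys rather than sessions.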
    