I\'m just learning BigQuery so this might be a dumb question, but we want to get some statistics there and one of those is the total sessions in a given day.
To do s
After posting the question we got into contact with Google support and found that in Google Analytics only sessions that had an "event" being fired are actually counted.
In Bigquery you will find all sessions regardless whether they had an interaction or not.
In order to find the same result as in GA, you should filter by sessions with totals.visits = 1
in your BQ query (totals.visits
is 1 only for sessions that had an event being fired).
That is:
select sum(sessions) as total_sessions from (
select
fullvisitorid,
count(distinct visitid) as sessions,
from (table_query([40663402], 'timestamp(right(table_id,8)) between timestamp("20150519") and timestamp("20150519")'))
where totals.visits = 1
group each by fullvisitorid
)
standardsql
Simply SUM(totals.visits)
or when using COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitStartTime AS STRING) ))
make sure totals.visits=1
!
If you use visitId
and you are not grouping per day, you will combine midnight-split-sessions!
Here are all scenarios:
SELECT
COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitStartTime AS STRING) )) allSessionsUniquePerDay,
COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitId AS STRING) )) allSessionsUniquePerSelectedTimeframe,
sum(totals.visits) interactiveSessionsUniquePerDay, -- equals GA UI sessions
COUNT(DISTINCT IF(totals.visits=1, CONCAT(fullVisitorId, CAST(visitId AS STRING)), NULL) ) interactiveSessionsUniquePerSelectedTimeframe,
SUM(IF(totals.visits=1,0,1)) nonInteractiveSessions
FROM
`project.dataset.ga_sessions_2017102*`
Wrap up:
fullVisitorId
+ visitId
: useful to reconnect midnight-splitsfullVisitorId
+ visitStartTime
: useful to take splits into accounttotals.visits=1
for interaction sessionsfullVisitorId
+ visitStartTime
where totals.visits=1
: GA UI sessions (in case you need a session id)SUM(totals.visits)
: simple GA UI sessionsfullVisitorId
+ visitId
where totals.visits=1
and GROUP BY date
: GA UI sessions with too many chances for errors and misunderstandingsThe problem could be due to "COUNT DISTINCT".
According to this post:
COUNT DISTINCT is a statistical approximation for all results greater than 1000
You could try setting an additional COUNT parameter to improve accuracy at the expense of performance (see post), but I would first try:
SELECT COUNT( CONCAT( fullvisitorid,'_', STRING(visitid))) as sessions
from (table_query([40663402], 'timestamp(right(table_id,8)) between
timestamp("20150519") and timestamp("20150519")'))