how to write Bigquery in new schema with replacing event_dim in old schema from Firebase analytics?

后端 未结 2 1410
无人及你
无人及你 2021-01-21 06:02

The old BigQuery Export schema wise script is running.It is given below. But I want to replicate this code and write it according to new export schema as we Bigquery schema has

2条回答
  •  天涯浪人
    2021-01-21 06:19

    As I believe my previous answer provides some general ideas for the Community, I will keep it and write a new one in order to be more specific for your use case.

    First of all, I would like to clarify that in order to adapt a query (just like you are asking us to do), one needs to have a clear understanding of the statement, objective of the query, expected results and data to play with. As this is not the case, it is difficult to work with it, even more considering that there are some functionalities that are not clear from the query, for example: in order to obtain the "min_time" and "max_time" for each event, you are taking the min and max value across multiple events, which does not make clear sense to me (it may, depending on your use case, reason why I suggested that it would be better if you could provide more details or work more on the query yourself). Moreover, the new schema "flattens" events, in such a way that each event is written in a different line (you can easily check this by running a SELECT COUNT(*) FROM 'table_with_old_schema' and compare it to SELECT COUNT(*) FROM 'table_with_new_schema'; you will see that the second one has many more rows), so your query does not make sense anymore, because events are not grouped anymore, and then you cannot pick a minimum and maximum between nested fields.

    This being clarified, and having removed some fields that cannot be directly adapted to the new schema (you may be able to adapt this from your side, but this would require some additional effort and understanding of what did those fields mean to you in your previous query), here there are two queries that provide exactly the same results, when run against the same table, with different schema:

    Query against a table with the old schema:

    SELECT
      user_dim.app_info.app_instance_id,
      event.name,
      params.value.int_value engagement_time
    FROM
      `DATASET.app_events_YYYYMMDD`,
      UNNEST(event_dim) AS event,
      UNNEST(event.params) AS params,
      UNNEST(user_dim.user_properties) AS user_params
    WHERE
      (event.name = "user_engagement"
        AND params.key = "engagement_time_msec")
      AND (user_params.key = "plays_quickplay"
        AND user_params.value.value.string_value = "true")
    ORDER BY 1, 2, 3
    

    Query against the same table, with the new schema:

    SELECT
      user_pseudo_id,
      event_name,
      params.value.int_value engagement_time
    FROM
      `DATASET.events_YYYYMMDD`,
      UNNEST(event_params) AS params,
      UNNEST(user_properties) AS user_params
    WHERE
      (event_name = "user_engagement"
        AND params.key = "engagement_time_msec")
      AND (user_params.key = "plays_quickplay"
        AND user_params.value.string_value = "true")
    ORDER BY 1, 2, 3
    

    Again, for this I am using the following table from the public dataset: firebase-public-project.com_firebase_demo_ANDROID.app_events_YYYYMMDD, so I had to change some filters and remove some others in order for it to retrieve sensible results against that table. Therefore, feel free to modify or add the ones you need in order for it to be useful for your use case.

提交回复
热议问题