Export Custom Event Dimensions to SQL from Application Insights using Stream Analytics

别等时光非礼了梦想. 提交于 2020-01-03 13:09:07

问题


I'm following the example walkthrough Export to SQL from Application Insights using Stream Analytics. I am trying to export custom event dimensions (context.custom.dimensions in the JSON example below) which get added as a nested JSON array in the data file. How do I flatten the dimensions array at context.custom.dimensions for export to SQL?

JSON...

{
  "event": [
    {
      "name": "50_DistanceSelect",
      "count": 1
    }
  ],
  "internal": {
    "data": {
      "id": "aad2627b-60c5-48e8-aa35-197cae30a0cf",
      "documentVersion": "1.5"
    }
  },
  "context": {
    "device": {
      "os": "Windows",
      "osVersion": "Windows 8.1",
      "type": "PC",
      "browser": "Chrome",
      "browserVersion": "Chrome 43.0",
      "screenResolution": {
        "value": "1920X1080"
      },
      "locale": "unknown",
      "id": "browser",
      "userAgent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
    },
    "application": {},
    "location": {
      "continent": "North America",
      "country": "United States",
      "point": {
        "lat": 38.0,
        "lon": -97.0
      },
      "clientip": "0.115.6.185",
      "province": "",
      "city": ""
    },
    "data": {
      "isSynthetic": false,
      "eventTime": "2015-07-15T23:43:27.595Z",
      "samplingRate": 0.0
    },
    "operation": {
      "id": "2474EE6F-5F6F-48C3-BA43-51636928075A"
    },
    "user": {
      "anonId": "BA05C4BE-1C42-482F-9836-D79008E78A9D",
      "anonAcquisitionDate": "0001-01-01T00:00:00Z",
      "authAcquisitionDate": "0001-01-01T00:00:00Z",
      "accountAcquisitionDate": "0001-01-01T00:00:00Z"
    },
    "custom": {
      "dimensions": [
        {
          "CategoryAction": "click"
        },
        {
          "SessionId": "73ef454d-fa39-4125-b4d0-44486933533b"
        },
        {
          "WebsiteVersion": "3.0"
        },
        {
          "PageSection": "FilterFind"
        },
        {
          "Category": "EventCategory1"
        },
        {
          "Page": "/page-in-question"
        }
      ],
      "metrics": []
    },
    "session": {
      "id": "062703E5-5E15-491A-AC75-2FE54EF03623",
      "isFirst": false
    }
  }
}

回答1:


A slightly more dynamic solution is to set up a temp table:

WITH ATable AS (
SELECT
     temp.internal.data.id as ID
    ,dimensions.ArrayValue.CategoryAction as CategoryAction
    ,dimensions.ArrayValue.SessionId as SessionId 
    ,dimensions.ArrayValue.WebsiteVersion as WebsiteVersion 
    ,dimensions.ArrayValue.PageSection as PageSection 
    ,dimensions.ArrayValue.Category as Category 
    ,dimensions.ArrayValue.Page as Page  
FROM [analyticseventinputs] temp 
CROSS APPLY GetElements(temp.[context].[custom].[dimensions]) as dimensions)

and then doing joins based on a unique key

FROM [analyticseventinputs] Input 
Left JOIN ATable CategoryAction on 
    Input.internal.data.id = CategoryAction.ID AND
    CategoryAction.CategoryAction <> "" AND
     DATEDIFF(day, Input, CategoryAction) BETWEEN 0 AND 5 

The rather annoying bit is the requirement for the datediff, because the joins are intended to combine 2 streams of data but in this case you are only joining on the unique key. So I set it to a large value of 5 days. This really only protects against the custom params not coming in ordered compared to the other solution.




回答2:


Most tutorials online use CROSS APPLY or OUTER APPLY however this is not what you are looking for because it will put each property on a different row. To over come this use the functions: GetRecordPropertyValue and GetArrayElement as demoed below. This will flatten out the properties into a single row.

SELECT
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 0), 'CategoryAction') AS CategoryAction,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 1), 'SessionId') AS SessionId,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 2), 'WebsiteVersion') AS WebsiteVersion,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 3), 'PageSection') AS PageSection,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 4), 'Category') AS Category,
    GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 5), 'Page') AS Page
INTO
  [outputstream]
FROM
  [inputstream] MySource



回答3:


What schema do you have in SQL? Do you want a single row in SQL with all the dimensions as columns?

This might not be possible today. However there will be more Array/Record functions in Azure Stream Analytics after July 30.

Then you will be able to do something like this:

SELECT 
    CASE 
        WHEN GetArrayLength(A.context.custom.dimensions) > 0
            THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 0), 'CategoryAction')
        ELSE ''
        END AS CategoryAction 
    CASE 
        WHEN GetArrayLength(A.context.custom.dimensions) > 1
            THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 1), 'WebsiteVersion')
        ELSE ''
        END AS WebsiteVersion 
    CASE 
        WHEN GetArrayLength(A.context.custom.dimensions) > 2
            THEN GetRecordPropertyValue(GetArrayElement(A.context.custom.dimensions, 2), 'PageSection')
        ELSE ''
        END AS PageSection
FROM input

If you want to have separate rows per dimension then you can use CROSS APPLY operator.




回答4:


A very convenient way to do this as proposed by Alex Raizman is to do some aggregations to the fields you want to flatten, grouping by the remaining select list, assuming that

  • you know the set of possible objects in dimensions, and
  • you do not have duplicate objects in this array, and
  • there is something you can uniquely identify your initial rows (like the id )

    SELECT
      CategoryAction= MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'CategoryAction') AS
      NVARCHAR(MAX))),
      SessionId= min(CAST(GetRecordPropertyValue(d.arrayvalue, 'SessionId') AS
      NVARCHAR(MAX))),
      WebsiteVersion= MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'WebsiteVersion') AS
      NVARCHAR(MAX))),
      PageSection= MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'PageSection') AS
      NVARCHAR(MAX))),
      Category= MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'Category') AS
      NVARCHAR(MAX))),    
      Page= MIN(CAST(GetRecordPropertyValue(d.arrayvalue, 'Page') AS NVARCHAR(MAX))) 
    INTO  
      [outputstream] 
    FROM [inputstream] MySource 
    CROSS APPLY GetArrayElements(MySource.[context].[custom].[dimensions]) d 
    GROUP BY System.Timestamp, MySource.id
    

We also group by System.Timestamp to create one temporal window as expected from Stream Analytics to perform set-based operations like counts or aggregations.



来源:https://stackoverflow.com/questions/31528147/export-custom-event-dimensions-to-sql-from-application-insights-using-stream-ana

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!