Need help understanding alternatives to scd in SSIS

Posted by 邮差的信 on 2019-12-12 01:54:21

Question


I am working on a data warehouse project that will involve integrating data from multiple source systems. I have set up an SSIS package that populates the customer dimension and uses the slowly changing dimension tool to keep track of updates to the customer.

I'm running into some issues. Take this example:

Source system A might have a record that looks like this:

First Name, Last Name, Zipcode
Jane, Doe, 14222

Source system B might have a record for the same client that looks like this:

First Name, Last Name, Zipcode
Jane, Doe, Unknown

If I first import the record from system A, I'll have the first name, last name, and zipcode. Great. Now, if I import the client record from system B, I can do fuzzy matching to recognize that this is the same person and use the slowly changing dimension tool to update the information. But in this case, I'm going to lose the zipcode because the 'Unknown' will overwrite the valid data.

I am wondering if I am approaching this problem in the wrong way. The SCD tool doesn't seem to offer any way of selectively updating attributes based on whether the new data is valid or not. Would a merge statement work better? Am I making some kind of fundamental design mistake that I'm not seeing?

Thanks for any advice!


Answer 1:


In my experience the built-in SCD tool is not flexible enough to handle this requirement.

Either a couple of MERGE statements or a series of UPDATE and INSERT statements will probably give you the most flexibility over the logic, as well as the best performance.
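For the UPDATE and INSERT route, a minimal sketch of a Type 2 load might look like the following. All table and column names here (#DimCustomerHist, #StageCustomerHist, CustomerKey, Zipcode) are hypothetical, and it assumes one tracked attribute and SQL Server 2008 or later. Rows that have disappeared from the source are not handled.

```sql
-- Hypothetical demo tables; replace with your own dimension and staging tables.
CREATE TABLE #DimCustomerHist (
    CustomerKey  int         NOT NULL,
    Zipcode      varchar(10) NULL,
    SCDStartDate datetime    NOT NULL,
    SCDEndDate   datetime    NULL,
    IsCurrent    bit         NOT NULL
);
CREATE TABLE #StageCustomerHist (
    CustomerKey int         NOT NULL PRIMARY KEY,
    Zipcode     varchar(10) NULL
);

-- One existing customer, then a staged change for it plus a brand-new customer.
INSERT INTO #DimCustomerHist VALUES (1, '14222', '2019-01-01', NULL, 1);
INSERT INTO #StageCustomerHist VALUES (1, '14150'), (2, '10001');

-- Use one timestamp so the expired row and its replacement line up exactly.
DECLARE @Now datetime = GETDATE();

-- Step 1: expire current rows whose tracked attribute changed.
UPDATE d
SET    SCDEndDate = @Now, IsCurrent = 0
FROM   #DimCustomerHist AS d
JOIN   #StageCustomerHist AS s ON s.CustomerKey = d.CustomerKey
WHERE  d.IsCurrent = 1
  AND  ISNULL(d.Zipcode, '') <> ISNULL(s.Zipcode, '');

-- Step 2: insert a current row for every staged key that has none
-- (brand-new keys plus the keys expired in step 1).
INSERT INTO #DimCustomerHist (CustomerKey, Zipcode, SCDStartDate, SCDEndDate, IsCurrent)
SELECT s.CustomerKey, s.Zipcode, @Now, NULL, 1
FROM   #StageCustomerHist AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM   #DimCustomerHist AS d
                   WHERE  d.CustomerKey = s.CustomerKey
                     AND  d.IsCurrent = 1);

SELECT * FROM #DimCustomerHist ORDER BY CustomerKey, SCDStartDate;

DROP TABLE #StageCustomerHist;
DROP TABLE #DimCustomerHist;
```

Because each step is a separate statement, extra validity rules can be added to either WHERE clause without touching the other step.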

There are probably other MERGE patterns out there for SCD Type 2, but here is the one I use:

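-- Statement 1: expire changed or deleted rows and insert brand-new keys.
-- [Key] is bracketed because KEY is a reserved word in T-SQL.
Merge Target
  Using Source
    On Target.[Key] = Source.[Key]

  -- Only the current row is compared; history rows are left untouched.
  When Matched And Target.IsCurrent = 1 And (
    Target.NonKeyAttribute <> Source.NonKeyAttribute
    Or IsNull(Target.NonKeyNullableAttribute, '') <> IsNull(Source.NonKeyNullableAttribute, '')
  )
  Then Update Set SCDEndDate = GetDate(), IsCurrent = 0

  When Not Matched By Target Then
    Insert ([Key], ... , SCDStartDate, IsCurrent)
    Values (Source.[Key], ..., GetDate(), 1)

  -- Keys that disappeared from the source are expired as well.
  When Not Matched By Source And Target.IsCurrent = 1 Then
    Update Set SCDEndDate = GetDate(), IsCurrent = 0;

-- Statement 2: match on current rows only, so keys expired above count as unmatched.
Merge Target
  Using Source
    On Target.[Key] = Source.[Key]
    And Target.IsCurrent = 1

  -- These will be the changing rows that were expired in the first statement.
  When Not Matched By Target Then
    Insert ([Key], ... , SCDStartDate, IsCurrent)
    Values (Source.[Key], ... , GetDate(), 1);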
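
For the 'Unknown' zipcode problem from the question, one possible approach is to turn the placeholder into NULL in the source query and only overwrite an attribute when the incoming value is not NULL. The sketch below is a simple overwrite (Type 1 style) with no history columns, and every name in it (#DimCustomer, #StageCustomer, CustomerKey) is hypothetical. It also assumes the fuzzy matching has already resolved both records to the same CustomerKey.

```sql
-- Hypothetical demo tables standing in for the dimension and the staged feed.
CREATE TABLE #DimCustomer (
    CustomerKey int         NOT NULL PRIMARY KEY,
    FirstName   varchar(50) NULL,
    LastName    varchar(50) NULL,
    Zipcode     varchar(10) NULL
);
CREATE TABLE #StageCustomer (
    CustomerKey int         NOT NULL PRIMARY KEY,
    FirstName   varchar(50) NULL,
    LastName    varchar(50) NULL,
    Zipcode     varchar(10) NULL
);

-- The row already loaded from system A, and the incoming row from system B.
INSERT INTO #DimCustomer   VALUES (1, 'Jane', 'Doe', '14222');
INSERT INTO #StageCustomer VALUES (1, 'Jane', 'Doe', 'Unknown');

MERGE #DimCustomer AS Target
USING (
    -- Turn the placeholder into NULL so it can never win over real data.
    SELECT CustomerKey, FirstName, LastName,
           NULLIF(Zipcode, 'Unknown') AS Zipcode
    FROM   #StageCustomer
) AS Source
    ON Target.CustomerKey = Source.CustomerKey
WHEN MATCHED THEN
    -- Keep the existing value whenever the incoming one is NULL.
    UPDATE SET
        FirstName = COALESCE(Source.FirstName, Target.FirstName),
        LastName  = COALESCE(Source.LastName,  Target.LastName),
        Zipcode   = COALESCE(Source.Zipcode,   Target.Zipcode)
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerKey, FirstName, LastName, Zipcode)
    VALUES (Source.CustomerKey, Source.FirstName, Source.LastName, Source.Zipcode);

-- Zipcode for CustomerKey 1 is still '14222' after the merge.
SELECT CustomerKey, FirstName, LastName, Zipcode FROM #DimCustomer;

DROP TABLE #StageCustomer;
DROP TABLE #DimCustomer;
```

For Type 2 attributes, the same NULLIF and COALESCE expressions could be used in the change-detection comparison, so that a placeholder value never triggers a new version of the row.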


Source: https://stackoverflow.com/questions/40665273/need-help-understanding-alternatives-to-scd-in-ssis
