Question
I am working on a data warehouse project that will involve integrating data from multiple source systems. I have set up an SSIS package that populates the customer dimension and uses the slowly changing dimension tool to keep track of updates to the customer.
I'm running into some issues. Take this example:
Source system A might have a record that looks like this:
First Name, Last Name, Zipcode
Jane, Doe, 14222
Source system B might have a record for the same client that looks like this:
First Name, Last Name, Zipcode
Jane, Doe, Unknown
If I first import the record from system A, I'll have the first name, last name, and zipcode. Great. Now, if I import the client record from system B, I can do fuzzy matching to recognize that this is the same person and use the slowly changing dimension tool to update the information. But in this case, I'm going to lose the zipcode because the 'Unknown' will overwrite the valid data.
I am wondering if I am approaching this problem in the wrong way. The SCD tool doesn't seem to offer any way of selectively updating attributes based on whether the new data is valid or not. Would a merge statement work better? Am I making some kind of fundamental design mistake that I'm not seeing?
Thanks for any advice!
Answer 1:
In my experience the built-in SCD tool is not flexible enough to handle this requirement.
Either a couple of MERGE statements or a series of UPDATE and INSERT statements will probably give you the most flexibility with logic, and the best performance.
There are probably models out there for a MERGE statement for SCD Type 2, but here is the pattern I use:
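-- Statement 1: expire current rows that changed or vanished from the source,
-- and insert rows for keys that have no current version yet.
Merge Target
Using Source
    On Target.[Key] = Source.[Key]
    And Target.IsCurrent = 1  -- compare against the current version only
When Matched And (
        Target.NonKeyAttribute <> Source.NonKeyAttribute
        Or IsNull(Target.NonKeyNullableAttribute, '') <> IsNull(Source.NonKeyNullableAttribute, '')
    )
    Then Update Set SCDEndDate = GetDate(), IsCurrent = 0
When Not Matched By Target Then
    Insert ([Key], ... , SCDStartDate, IsCurrent)
    Values (Source.[Key], ... , GetDate(), 1)
-- Without the IsCurrent check, already expired rows would get a new end date on every run.
When Not Matched By Source And Target.IsCurrent = 1 Then
    Update Set SCDEndDate = GetDate(), IsCurrent = 0;

-- Statement 2: the changed rows expired above no longer have a current version,
-- so they fall into Not Matched By Target and get a fresh current row.
Merge Target
Using Source
    On Target.[Key] = Source.[Key]
    And Target.IsCurrent = 1
When Not Matched By Target Then
    Insert ([Key], ... , SCDStartDate, IsCurrent)
    Values (Source.[Key], ... , GetDate(), 1);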
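
The pattern above creates a new version whenever a tracked attribute differs, so on its own it would still let 'Unknown' from system B replace a valid zipcode. One possible way around that, sketched here under assumptions and not tested against your schema, is to clean the incoming rows before the MERGE: turn placeholder values into NULL and fall back to whatever the current dimension row already holds. The names staging.Customer, dbo.DimCustomer, CustomerKey and #CleanSource are hypothetical, and so is the assumption that 'Unknown' and the empty string are the only placeholders you need to ignore.

```sql
-- Hypothetical names: staging.Customer holds the incoming rows (after fuzzy
-- matching has assigned CustomerKey), dbo.DimCustomer is the dimension.
-- Assumption: 'Unknown' and '' are the only placeholder values.
With CleanSource As (
    Select
        s.CustomerKey,
        s.FirstName,
        s.LastName,
        -- NullIf turns a placeholder into NULL; Coalesce then prefers the
        -- zipcode on the current dimension row, and only falls back to
        -- 'Unknown' when neither side has a real value.
        Coalesce(
            NullIf(NullIf(s.Zipcode, 'Unknown'), ''),
            d.Zipcode,
            'Unknown'
        ) As Zipcode
    From staging.Customer As s
    Left Join dbo.DimCustomer As d
        On d.CustomerKey = s.CustomerKey
        And d.IsCurrent = 1
)
Select CustomerKey, FirstName, LastName, Zipcode
Into #CleanSource
From CleanSource;
```

If #CleanSource is then used as Source in the MERGE statements, Jane Doe's row from system B would arrive with zipcode 14222, compare equal to the current row, and not trigger a new version. The same Coalesce/NullIf treatment would have to be repeated for every attribute where one system can legitimately know less than another.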
Source: https://stackoverflow.com/questions/40665273/need-help-understanding-alternatives-to-scd-in-ssis