Hi There,
Just wondering if any of you implemented a (Kimball type 2) dimension structure, in which a ParentID column exists which points to a record from the same dimension table, using a SCD objects in SSIS. The ParentID column would have to be "Historical".
The challange here is that you would need to go through the table twice somehow, because if I would do a lookup of the parent record in the first run, I wouldn't be sure if I got the right parent record.
Thnx, Jeroen.
Jeroen Alblas wrote: Hi There,
Just wondering if any of you implemented a (Kimball type 2) dimension structure, in which a ParentID column exists which points to a record from the same dimension table, using a SCD objects in SSIS. The ParentID column would have to be "Historical".
The challange here is that you would need to go through the table twice somehow, because if I would do a lookup of the parent record in the first run, I wouldn't be sure if I got the right parent record.
Thnx, Jeroen.
I don't understand the problem or question. I think if you define ParentID as 'historical' the SCD component will issue a new 'version' of the rows every time it detects a change in parentID value and it will 'expire' the previous 'version'. It’s just simple like that. It may be helpful is you provide an example of what you are trying to accomplish
|||Why would your Id column change if you have a type2 change?
Are you using a typ2 column to lookup your parentID?
|||Okay, let me try to give an example:
Lets say we have an employee named Doopie and his boss named Barkie, both in the dim_Employee table. In our case we work with start- and enddates.
ID Name ParentID Salary Stardate EndDate
10 Doopie 20 1000 1/1/1900 12/31/9999
20 Barkie NULL 4000 1/1/1900 12/31/9999
Okay, now, in the source system, Barkie gives himself a raise of 500. We want to track history of Salary. Now say we start the Data flow task with the Slowly changing dimension object. It could very well be, that Doopie's record is processed first. No changes there so Doopie's record remains unchanged. Now Barkie record which introduces a new row:
ID Name ParentID Salary Stardate EndDate
10 Doopie 20 1000 1/1/1900 31/12/9999
20 Barkie NULL 4000 1/1/1900 12/7/2006
21 Barkie NULL 4500 12/8/2006 31/12/9999
If Doopie's record was processed after the insert of Barkie's record, the lookup would return record 21 as the active Boss record, and introduce a new record for Doopie as well...
Thnx again!
Jeroen
|||
I see the problem now. Basically you need to make sure every child record point to the latest version of its parent.
Well, I don't see an easy way to do this at one pass. You may try to do an update after the dataflow with the SCD2 finish that assures every row is pointing to the latest version row of its parent. That could be safer in the case that an unchanged child row is not in your set of rows to be processed (assuming you are pulling only changed/new rows from the source). That is my best shot considering that I have not drunk my first coffee this morning
|||Why don't you normalize that table?|||Phil wrote: Why don't you normalize that table?
Hi Phil, I normalized some columns of it (name of parent), but the parent / child hierarchy goes about six or seven levels. So then I would have to add quite a few columns...
Rafael wrote: You may try to do an update after the dataflow with the SCD2 finish that assures every row is pointing to the latest version row of its parent.
I can't just do an update, because the ParentID is defined as type 2 (historical) and should result in a new row for the child record as well...
Jeroen
|||Jeroen Alblas wrote: I can't just do an update, because the ParentID is defined as type 2 (historical) and should result in a new row for the child record as well...
So let's define 'historical'. Defining a column as 'historical' to me is intended to track changes on a particular attribute. In the example you gave at the beginning you said that no changes were made to the child row:
“No changes there so Doopie's record remains unchanged.”
So, why do you want to issue a new version of the child row? The only change that occurred was to the parent; hence you should generate a new row to the parent, expire the existing (parent)one and update all its children to point to new parent ID. After all, there is not change to track when you look at the child record.
In your example, I would issue a new row for Doopie only if I detected that Barkie is not anymore his manager; that way I would be tracking historical changes of the managers.
Under your approach, have you figured out what would happened if a change occurs to the topmost manager? You would have to generate a new row for the whole organization…kind of messy to me. BTW, the only way I see you could implement this is through recursive queries…if you are using SQL Server 2005 you could use Common table expressions for that.
|||Indeed no changes are made to the attributes of the child record, unless you consider the ParentID column an attribute of the child record. And updating it will solve it for the current point in time, but as you can see in the example, yit would seem Doopie's parent would always have had a salary of 4500...
But I agree; if the topmost record would change I would get the whole tree inserted again ... Indeed that's not what I want.
So, I guess I will add the parent columns which must lead to a new child record (Firstname, Lastname, Login, etc) to the child record and update the ParentID to point to the active record of the parent.
Still, I think I must conclude that the dimensional modelling technique, at least to my knowledge, does not provide a good method of combining "tracking history" in combination with "recursive relationships".
Thnx, Jeroen
|||The way I usually handle this is by using a business Key to uniquely identify each person (see EmployeeID)
If you have a single source for your employees, you can use that id, otherwise you have to generate one. Then, you
use this key for the relationship like so
ID EmployeeID Name ParentID Salary Stardate EndDate
10 100 Doopie 200 1000 1/1/1900 12/31/9999
20 200 Barkie NULL 4000 1/1/1900 12/31/9999
After the change the data would look like this:
ID EmployeeID Name ParentID Salary Stardate EndDate
10 100 Doopie 200 1000 1/1/1900 31/12/9999
20 200 Barkie NULL 4000 1/1/1900 12/7/2006
21 200 Barkie NULL 4500 12/8/2006 31/12/9999
Jeroen Alblas wrote: Indeed no changes are made to the attributes of the child record, unless you consider the ParentID column an attribute of the child record.
Indeed, I would not consider it an attribute of the child row.
Jeroen Alblas wrote: And updating it will solve it for the current point in time, but as you can see in the example, yit would seem Doopie's parent would always have had a salary of 4500...
I disagree. It does not seem like Doopie's parent always have had a salary of 4500; what it actually seems is the current salary of her parent is 4500. You should not try to get historical information of a row through its children...I gues that should be part of educating end users on what a SCD 2 is and what is not.
Jeroen Alblas wrote: But I agree; if the topmost record would change I would get the whole tree inserted again ... Indeed that's not what I want.
That is certanly not practical...
Jeroen Alblas wrote: So, I guess I will add the parent columns which must lead to a new child record (Firstname, Lastname, Login, etc) to the child record and update the ParentID to point to the active record of the parent.
That was pretty much my original sugestion...good luck with that
|||David Frommer wrote: The way I usually handle this is by using a business Key to uniquely identify each person (see EmployeeID)
Thnx David. I do have the business key of the employee in the dimension. However, if you create a parent-child dimension in Analysis Services, you need a key which identifies a unique row in the dimension table. I didn't mention this before, but this is one of the things for which I use the column.
Jeroen
|||
Rafael Salas wrote: You should not try to get historical information of a row through its children...I gues that should be part of educating end users on what a SCD 2 is and what is not.
Although I'm still convinced the ParentID is not quite as useful if it doesn't give the same result as joining using the business key in combination with start- and enddate, I think this a quite a good conclusion to wrap this up :) thnx.
Jeroen
No comments:
Post a Comment