Question
I'm stuck and out of ideas on how to implement a parent-child relationship in Talend.
Problem Statement:
I have a feed file with data in the format below:
MemberCode|LastName|FirstName
A|SHINE|MICHAEL
B|SHINE|MICHELLE
C|SHINE|ERIN
A|RODRIGUEZ|DAMIAN
A|PAVELSKY|STEPHEN
B|PAVELSKY|TERESA
(There are many more columns and rows; these few are just for reference.) LastName and FirstName are self-explanatory. MemberCode denotes the relationship: A is the parent, B or C is a child. For a given employee the data is always sequential, meaning the parent row and all of its child rows appear in contiguous rows, with the parent first.
Expected Result:
The above data needs to be output in the format below:
MemberCode|MemberLastName|MemberFirstName|DependentLastName|DependentFirstName
A |SHINE |MICHAEL | |
B |SHINE |MICHAEL |SHINE |MICHELLE
C |SHINE |MICHAEL |SHINE |ERIN
A |RODRIGUEZ |DAMIAN | |
A |PAVELSKY |STEPHEN | |
B |PAVELSKY |STEPHEN |PAVELSKY |TERESA
What I have tried so far:
The Talend job has these components: tFileInputDelimited -> tMap -> tLogRow
The tMap holds the mapping logic, and it gives me the output below:
MemberCode|MemberLastName|MemberFirstName|DependentLastName|DependentFirstName
A |SHINE |MICHAEL | |
B | | |SHINE |MICHELLE
C | | |SHINE |ERIN
A |RODRIGUEZ |DAMIAN | |
A |PAVELSKY |STEPHEN | |
B | | |PAVELSKY |TERESA
How can I replicate the MemberFirstName and MemberLastName values from the MemberCode A row onto the following rows that have MemberCode B or C? Thanks in advance.
Platform: Talend Open Studio for Data Integration Version: 6.5.1
Answer 1:
Here's the solution I put together:
You need to split your rows into parents and children based on their MemberCode. You write the parents to file with DependentLastName and DependentFirstName being empty, while saving the parent info to global variables (ParentLastName and ParentFirstName) in a tSetGlobalVar.
When you move to the next row, which is a child row, your parent has already been saved as it's always the first in the group. So you can retrieve its first and last name using the global variables in the children output, and write this to the same physical file.
Both tFileOutputDelimited components have identical settings; they are in append mode, and have the option "Custom the flush buffer size" set to 1 (this is important in order to keep the rows sorted in the right order).
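The row-by-row logic described above can be sketched in plain Java, outside of Talend. This is only an illustration under stated assumptions: the class `ParentChildGlobalVar` and the method `transform` are hypothetical names, the `globalMap` field is a plain `HashMap` standing in for the `globalMap` that Talend jobs expose, and the two output files are collapsed into one in-memory list. It also assumes every child row is preceded by its parent, as the question states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ParentChildGlobalVar {
    // Stand-in for Talend's globalMap (assumption: a plain HashMap is enough for the sketch).
    static final Map<String, Object> globalMap = new HashMap<>();

    // Hypothetical helper: turns input lines into output lines, one row at a time.
    static List<String> transform(List<String> inputLines) {
        List<String> out = new ArrayList<>();
        for (String line : inputLines) {
            String[] f = line.split("\\|", -1); // keep empty trailing fields
            String code = f[0], last = f[1], first = f[2];
            if ("A".equals(code)) {
                // Parent row: remember it (the tSetGlobalVar step) and leave dependents empty.
                globalMap.put("ParentLastName", last);
                globalMap.put("ParentFirstName", first);
                out.add(String.join("|", code, last, first, "", ""));
            } else {
                // Child row: read the parent saved by the most recent "A" row.
                String parentLast = (String) globalMap.get("ParentLastName");
                String parentFirst = (String) globalMap.get("ParentFirstName");
                out.add(String.join("|", code, parentLast, parentFirst, last, first));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "A|SHINE|MICHAEL", "B|SHINE|MICHELLE", "C|SHINE|ERIN",
            "A|RODRIGUEZ|DAMIAN", "A|PAVELSKY|STEPHEN", "B|PAVELSKY|TERESA");
        for (String row : transform(input)) {
            System.out.println(row);
        }
    }
}
```

Because the sketch processes one row fully before reading the next, the ordering concern disappears here; in the actual job that is what the flush buffer size of 1 is meant to guarantee across the two output components.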
Answer 2:
The solution provided by @iMezouar works just fine. Here is an alternative.
Job Layout:
The approach is to capture the parent's values (LastName and FirstName) from the earlier row, store them in variables inside the tMap, and then use those variables in the output row.
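As a rough illustration of that idea, the plain-Java sketch below imitates variables that keep their value from one row to the next. It is not the answer's actual tMap configuration, which is not shown in the text: the class `ParentChildTMapVars`, the holder class `Vars`, and the method `mapRow` are hypothetical names, and the sketch assumes each variable is reassigned only when the current row is a parent (MemberCode A) and otherwise keeps its previous value.

```java
import java.util.ArrayList;
import java.util.List;

class ParentChildTMapVars {
    // Hypothetical stand-in for the tMap variable table: values persist between rows.
    static final class Vars {
        String parentLastName = "";
        String parentFirstName = "";
    }

    // Maps one input row to one output row, updating the carried variables first.
    static String mapRow(Vars vars, String memberCode, String lastName, String firstName) {
        boolean isParent = "A".equals(memberCode);
        // Overwrite on a parent row, otherwise carry the value from the earlier row.
        vars.parentLastName = isParent ? lastName : vars.parentLastName;
        vars.parentFirstName = isParent ? firstName : vars.parentFirstName;
        // Dependent columns are filled only for child rows.
        String dependentLast = isParent ? "" : lastName;
        String dependentFirst = isParent ? "" : firstName;
        return String.join("|", memberCode, vars.parentLastName, vars.parentFirstName,
                dependentLast, dependentFirst);
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"A", "SHINE", "MICHAEL"}, {"B", "SHINE", "MICHELLE"}, {"C", "SHINE", "ERIN"},
            {"A", "RODRIGUEZ", "DAMIAN"}, {"A", "PAVELSKY", "STEPHEN"}, {"B", "PAVELSKY", "TERESA"}
        };
        Vars vars = new Vars();
        List<String> out = new ArrayList<>();
        for (String[] r : rows) {
            out.add(mapRow(vars, r[0], r[1], r[2]));
        }
        out.forEach(System.out::println);
    }
}
```

Compared with the global-variable approach, everything stays inside a single mapping step, so there is only one output and no flush-order concern; the trade-off is that it relies on rows arriving strictly in file order.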
Source: https://stackoverflow.com/questions/49927959/parent-child-relationship-in-talend