Question
I have two CSV files. One file has 10 rows and the other contains a list of data. I want to check the values of one field in the first CSV file against the other CSV file. How can I achieve this? Any help would be great.
Answer 1:
The step you are looking for is the Stream Lookup step.
Read your CSV file and the reference file, connect the two flows to a Stream Lookup step,
and set it up as follows:
a) Lookup step = the step that reads the reference file
b) Keys / Field = the name of the field in the CSV file whose value identifies the row in the reference file.
c) Keys / Lookup field = the name of the field in the reference file.
d) Field to retrieve = the name of the field in the reference to return (may be the identifier or any other field you need)
e) Field to retrieve / Type = the data type of the retrieved field. Do not forget to set it!
This adds a column from the reference file to the 10 rows of the CSV file. Rows that the lookup did not find have a null value in the new column, so you can then filter them out by testing whether the new column is not null.
Since all of the above settings in PDI are guided by drop-down lists, it should take you about two minutes.
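For readers who want to see the logic outside PDI, here is a minimal Python sketch of what the lookup and the null filter amount to conceptually. It is not how PDI implements the step. The sample data, the field names (`id`, `ref_id`, `status`) and the `stream_lookup` helper are all hypothetical and only serve the illustration.

```python
import csv
import io

# Hypothetical sample data standing in for the two CSV files.
MAIN_CSV = """id,name
1,alice
2,bob
3,carol
"""

REFERENCE_CSV = """ref_id,status
1,active
3,inactive
"""


def stream_lookup(main_rows, reference_rows, key_field, lookup_field, retrieve_field):
    """Add retrieve_field from the reference to each main row (None when no match)."""
    # Index the reference rows by the lookup field.
    # In this sketch, a later duplicate key overwrites an earlier one.
    index = {row[lookup_field]: row[retrieve_field] for row in reference_rows}
    result = []
    for row in main_rows:
        enriched = dict(row)  # copy so the input rows are left untouched
        enriched[retrieve_field] = index.get(row[key_field])
        result.append(enriched)
    return result


main_rows = list(csv.DictReader(io.StringIO(MAIN_CSV)))
reference_rows = list(csv.DictReader(io.StringIO(REFERENCE_CSV)))

looked_up = stream_lookup(main_rows, reference_rows, "id", "ref_id", "status")

# Equivalent of testing the new column for null after the lookup.
matched = [row for row in looked_up if row["status"] is not None]
unmatched = [row for row in looked_up if row["status"] is None]

print("matched:", [row["id"] for row in matched])
print("unmatched:", [row["id"] for row in unmatched])
```

Here `key_field` plays the role of Keys / Field, `lookup_field` the role of Keys / Lookup field, and `retrieve_field` the role of Field to retrieve.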
Source: https://stackoverflow.com/questions/50017225/how-to-validate-one-csv-data-compare-with-another-csv-file-using-pentaho