How to get a resultset from a dynamic select-based sql query in Pentaho-Kettle?

送分小仙女□ 提交于 2019-12-11 19:52:55

问题


I want to execute a set of select-based sql queries, derived from xml-node elements within a XML file, and write the values of the corresponding resultsets in a CSV file. I'd like to clarify that no field of the sql query is parameterized, but the full sql query itself is.

The part of getting the full sql query is done as expected, but I don't know how to proceed with the part of launching the sql query so that I can get the corresponding resultset to manage it later.

What I've tried until now with no success is the following:

  1. Execute SQL script step: according to what it's said here, this sort of step doesn't get any resultset.

  2. Dynamic SQL Row step: I get the corresponding sql as long as I hardcode the field 'TemplateSQL', but I can't do that because the sql queries derived from the previous step are very different among them and don't provide a fixed meta-data scheme.

  3. Table Input step: this step only allows single values of a sql query to be parameterized, not the full query.

I'd really appreciate if some pentaho-kettle expert could tell me the right way of accomplishing the task I want to do.

Update (following @Cristian Curti instructions): The problem arise in KTR-1. When you say "you have to pass this query as a variable to another KTR", I understand that what I have to do first is setting the xml-node value in a variable and then pass it to the KTR-2. To do so, in my KTR-1 I'm using these steps:

  1. Get Data from XML: I'm getting 'SQLQuery' field.
  2. Set Variables: I'm setting the field 'SQLQuery' in a variable named 'QUERY'. (Moreover, KTR-1 is set as you said, that is, in the Parameters tab I set a parameter named 'QUERY').

When I preview the second step of KTR-1, the Kettle console logs the following error:

Get data from XML.0 - Finished processing (I=2, O=0, R=0, W=2, U=0, E=0)
Set variables.0 - Setting environment variables...
Set variables.0 - Set variable QUERY to value [        
  SELECT "Table1".*, "table2"."field1" FROM "Table1" INNER JOIN "Table2" ON Table1"."PATIENT" = "Table2"."PATIENT" WHERE ("Table1"."field1" > Table1"."field2") AND NOT("field1" BETWEEN 'date1' AND 'date2')     
]
Set variables.0 - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-1-14 10.30.55 by buildguy) : Unexpected error
Set variables.0 - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-1-14 10.30.55 by buildguy) : org.pentaho.di.core.exception.KettleStepException: 
Set variables.0 - Only 1 input row was expected to set the variables and at least 2 were received.

It seems Set Variables step only expects to receive one single row. I don't know if I'm doing something wrong or missing something. In case you need further details, just tell me. Thanks.


回答1:


From the link you shared, follow Rishu's Answer, Part 2.

In the part where YOU are getting the Query from the XML nodes, you have to pass this query as a variable to another KTR, and in the Input Table step of this second KTR, use ${variable} in the SQL Editor of the step.

Should look something like this.

Update Answer

No, you used the wrong approach. You use the Set Variables step when you know only 1 row will reach the step, when you have multiple rows reaching your "Result set", you need to use the Copy rows > Execute for every row approach. This requires 1 Job and 2 KTR's to be executed.

Your Job will simply look like this:

The Last step in your first KTR will be 'Copy Rows to Result'.

In the Job, you need to configure your second KTR entry as such:

The Stream column name must match the same column name in the first KTR that contains the queries to be executed, also, the named parameter in the second KTR (First answer picture) MUST match the parameter in the JOB configuration as the pictures show.

After that all you have to do is generate a dynamic filename for each query on the second KTR, usually all output steps that generate Files have a 'Include date in filename' and a 'Include time in filename', using both should be enough to generate different filenames with zero effort, and you get a timestamp of when the file was created.



来源:https://stackoverflow.com/questions/56363232/how-to-get-a-resultset-from-a-dynamic-select-based-sql-query-in-pentaho-kettle

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!