pdi

Pentaho Data Integration: import a large dataset from a DB

Submitted by 余生颓废 on 2021-02-10 20:30:26
Question: I'm trying to import a large set of data from one DB into another (MSSQL to MySQL). The transformation gets a subset of the data, checks whether each row is an insert or an update by comparing hashes, maps the data, and inserts it into the MySQL DB with an API call. For the moment the subsetting is strictly manual; is there a way to make Pentaho do it for me, as a kind of iteration? The query I'm using to get the subset is:

select t1.* from (select *, ROW_NUMBER() over (order by id) as RowNum from mytable) t1
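
In PDI this kind of iteration is usually built as a job that loops over a page counter and hands it to the Table Input step as a variable. The sketch below shows only the underlying paging logic, in plain JDBC, as a minimal illustration: the connection string, credentials, table name, and page size are assumptions, and the hash check / API call is left as a comment.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PagedExtract {
    private static final int PAGE_SIZE = 10_000; // assumed batch size

    public static void main(String[] args) throws SQLException {
        // Same windowed query as in the question, restricted to one page per pass
        String sql = "SELECT t1.* FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS RowNum"
                   + " FROM mytable) t1 WHERE t1.RowNum BETWEEN ? AND ?";
        String url = "jdbc:sqlserver://host;databaseName=sourcedb"; // assumed connection
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(sql)) {
            long offset = 1;
            while (true) {
                ps.setLong(1, offset);
                ps.setLong(2, offset + PAGE_SIZE - 1);
                int rows = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        rows++;
                        // hash check, mapping, and the MySQL API call would go here
                    }
                }
                if (rows < PAGE_SIZE) break; // a short page means we reached the end
                offset += PAGE_SIZE;
            }
        }
    }
}

Each pass fetches one window of RowNum values; the loop stops when a page comes back short.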

Lookup in Pentaho Data Integration

Submitted by ⅰ亾dé卋堺 on 2021-01-29 06:10:30
Question: I have two files (App.csv and Access.csv). App.csv has one column called Application:

Application
App-A
App-B

Access.csv contains 3 columns (Application, entitlement, userid):

Application,entitlement,userid
App-A,ent-A,user1
App-A,ent-B,user1
App-B,ent-c,user2
App-B,ent-d,user1
App-C,ent-c,user2
App-C,ent-d,user1

I need to extract all the App-A and App-B details where they match the Application file's column, and the output should look like the below:

App-A,ent-A,user1
App-A,ent-B,user1
App-B,ent-c,user2
App-B,ent-d,user1
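
In PDI this is typically two CSV file input steps feeding a Stream Lookup (with a filter on lookup failures). As a minimal sketch of the same lookup logic in Java, with the file names taken from the question and everything else assumed:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class AppLookup {
    public static void main(String[] args) throws IOException {
        // Build the lookup set from App.csv (skipping the "Application" header)
        Set<String> apps;
        try (Stream<String> lines = Files.lines(Paths.get("App.csv"))) {
            apps = lines.skip(1).map(String::trim)
                        .filter(s -> !s.isEmpty())
                        .collect(Collectors.toSet());
        }
        // Keep only Access.csv rows whose first column matches an application
        try (Stream<String> lines = Files.lines(Paths.get("Access.csv"))) {
            lines.skip(1)
                 .filter(line -> apps.contains(line.split(",", 2)[0].trim()))
                 .forEach(System.out::println);
        }
    }
}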

Select MongoDB array elements into a single field in Pentaho PDI

Submitted by 无人久伴 on 2020-12-15 06:35:24
Question: The following is the structure of a document I have in a MongoDB collection:

{
  "_id": { "$oid": "5f48e358d43721376c397f53" },
  "heading": "this is heading",
  "tags": ["tag1", "tag2", "tag3"],
  "categories": ["projA", "projectA2"],
  "content": ["This", "is", "the", "content", "of", "the", "document"],
  "timestamp": 1598612312.506219,
  "lang": "en"
}

When I import the data in PDI using the MongoDB Input step, the system puts each element of the "content" array into a different field. I want to…
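
One way to get the intended result is to join the array into a single string rather than splitting it into fields. The sketch below shows that join with the MongoDB Java driver (Document.getList needs driver 3.10+); the connection string, database, and collection names are assumptions:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import java.util.List;
import org.bson.Document;

public class ContentJoin {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> col =
                client.getDatabase("mydb").getCollection("docs"); // assumed names
            for (Document doc : col.find()) {
                // Collapse the "content" array into one space-separated field
                List<String> content = doc.getList("content", String.class);
                String joined = content == null ? "" : String.join(" ", content);
                System.out.println(doc.getString("heading") + " -> " + joined);
            }
        }
    }
}

Inside PDI itself, an alternative with the same effect is to have MongoDB Input return the whole document as a single JSON field and join the array in a downstream step.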

Extract data from large Excel files

Submitted by 风流意气都作罢 on 2020-01-06 19:31:53
Question: I'm using Pentaho Data Integration to create a transformation from xlsx files to MySQL, but I can't import data from large files with the Excel 2007 xlsx (Apache POI Streaming) engine. It gives me out-of-memory errors.

Answer 1: Did you try this option? Advanced settings -> Generation mode -> "Less memory consumed for large excel (Event mode)". (You need to select the Excel 2007 xlsx file format first.)

Answer 2: I would recommend increasing the JVM memory allocation before running the transformation. By default, Pentaho…
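
Following up on Answer 2: one common way to raise the allocation is the PENTAHO_DI_JAVA_OPTIONS environment variable, which the PDI launch scripts read before falling back to their built-in defaults. A sketch, with the heap sizes as assumptions to tune for your machine:

# assumed heap sizes; adjust to the available RAM
export PENTAHO_DI_JAVA_OPTIONS="-Xms1g -Xmx4g"
./spoon.sh        # or pan.sh / kitchen.sh for command-line runs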

Running PDI Kettle from Java - MongoDB step missing plugins

Submitted by China☆狼群 on 2020-01-05 14:11:30
Question: I am trying to run a transformation that includes a MongoDB Input step from a Java app, but it always fails with this message:

org.pentaho.di.core.exception.KettleMissingPluginsException:
Missing plugins found while loading a transformation

Step : MongoDbInput
    at org.pentaho.di.trans.TransMeta.loadXML(TransMeta.java:2931)
    at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2813)
    at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2774)
    at org.pentaho.di.trans.TransMeta.<init>…
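
The usual cause is that the embedding app never tells Kettle where the step plugins live, so MongoDbInput cannot be resolved when the .ktr is parsed. A minimal sketch of registering the plugin folder before initializing the environment; the folder path and transformation file name are assumptions:

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.plugins.PluginFolder;
import org.pentaho.di.core.plugins.StepPluginType;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunMongoTrans {
    public static void main(String[] args) throws Exception {
        // Register the PDI plugins directory (the one containing
        // pentaho-mongodb-plugin) before Kettle scans for plugins
        StepPluginType.getInstance().getPluginFolders()
                      .add(new PluginFolder("/opt/pdi/plugins", false, true));
        KettleEnvironment.init();

        TransMeta meta = new TransMeta("mongo_transformation.ktr");
        Trans trans = new Trans(meta);
        trans.execute(null);
        trans.waitUntilFinished();
    }
}

Pointing the KETTLE_PLUGIN_BASE_FOLDERS system property at the same directory before init() is another route to the same end.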

Pentaho Kettle - Get the file names dynamically

Submitted by 感情迁移 on 2019-12-25 01:47:23
Question: I hope this message finds everyone well! I'm stuck on a situation in the Pentaho PDI tool and I'm looking for an answer (or at least a light at the end of the cave) to solve it! Every month I have to import a bunch of xls files from different clients. Every file has a different name (which is assigned randomly), and the files sit in a folder named after the client. However, I use the same process for all clients and situations. Is there a way to pass the name of the directory as a…
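
Since the file names are random but the folder is predictable, the common PDI pattern is to define the directory as a named parameter and read the folder with a wildcard (a "Get File Names" step, or a regular expression in the input step's file tab). Below is a minimal sketch of launching such a transformation from the Kettle API with the folder passed in; the parameter name, .ktr file, and path are assumptions:

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunPerClient {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();
        TransMeta meta = new TransMeta("load_client_files.ktr");
        Trans trans = new Trans(meta);
        // CLIENT_DIR must be declared as a named parameter in the .ktr
        // and used as ${CLIENT_DIR} in the input step's directory field
        trans.setParameterValue("CLIENT_DIR", "/data/clients/acme");
        trans.activateParameters();
        trans.execute(null);
        trans.waitUntilFinished();
    }
}

From the command line, pan.sh accepts the same value as -param:CLIENT_DIR=/data/clients/acme.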

Migrating Transformations in Pentaho PDI

Submitted by 久未见 on 2019-12-24 15:51:30
Question: We are using two servers, one as preprod and the other as production. When we migrate jobs or transformations from preprod to prod, the connection properties are copied as well, and this affects our production job execution. Can someone let me know how to migrate transformations without copying their connections to the other server?

Answer 1: From the Tools -> Options menu, there are two checkboxes that affect PDI's import behavior: "Replace existing objects on open/import" and "Ask before replacing…
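
A common way to sidestep the problem entirely is to keep no literal connection values in the transformations at all: fill the connection fields with variables and give each server its own kettle.properties. A sketch, with all names and values illustrative:

# ~/.kettle/kettle.properties on the preprod server
DB_HOST=preprod-db.internal
DB_PORT=3306
DB_NAME=appdb

# ~/.kettle/kettle.properties on the production server
DB_HOST=prod-db.internal
DB_PORT=3306
DB_NAME=appdb

With the shared connection defined as ${DB_HOST}, ${DB_PORT}, ${DB_NAME}, the same transformation file runs unchanged on both servers.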

Applying Pivot in Pentaho Kettle

Submitted by 烂漫一生 on 2019-12-24 13:55:43
Question: I'm using Pentaho Kettle version 5.2.0. I'm trying to pivot my source data; here is its structure:

Billingid  sku_id  qty
1          0       1
1          0       12
1          0       6
1          0       1
1          0       2
1          57      2
1          1430    1
1          2730    1
2          3883    2
…          1456    1
2          571     9
2          9801    5
2          1010    1

And this is what I'm expecting:

billingid  0  57  1430  2730  3883  1456  571  9801  1010
1          ******* sum of qty *******
2          ******* sum of qty *******

Any help would be much appreciated. THANKS in advance.

Answer 1: For the Row Denormaliser to work, you first have to Sort, and then Group the rows, to have the…
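
The shape of the result that the Sort rows + Group by + Row Denormaliser chain produces can be seen in this small Java sketch (sample rows abridged from the question; the column handling is an assumption):

import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class PivotSketch {
    public static void main(String[] args) {
        // (billingid, sku_id, qty): abridged sample from the question
        int[][] rows = { {1, 0, 1}, {1, 0, 12}, {1, 57, 2}, {1, 1430, 1},
                         {2, 3883, 2}, {2, 571, 9}, {2, 9801, 5}, {2, 1010, 1} };
        // billingid -> (sku_id -> summed qty); TreeMap keeps output ordered,
        // much like the Sort rows step does for the denormaliser
        Map<Integer, Map<Integer, Integer>> pivot = new TreeMap<>();
        Set<Integer> skus = new TreeSet<>();
        for (int[] r : rows) {
            pivot.computeIfAbsent(r[0], k -> new TreeMap<>()).merge(r[1], r[2], Integer::sum);
            skus.add(r[1]);
        }
        StringBuilder header = new StringBuilder("billingid");
        for (int sku : skus) header.append(' ').append(sku);
        System.out.println(header);
        pivot.forEach((billing, bySku) -> {
            StringBuilder line = new StringBuilder(billing.toString());
            for (int sku : skus) line.append(' ').append(bySku.getOrDefault(sku, 0));
            System.out.println(line);
        });
    }
}

In PDI terms: Sort rows on billingid, Group by (billingid, sku_id) summing qty, then Row Denormaliser with sku_id as the key field and the summed qty as the value.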