data-integration

How to schedule Pentaho Kettle transformations?

Submitted by 自闭症网瘾萝莉.ら on 2019-12-05 07:34:31
Question: I've set up four transformations in Kettle. Now I would like to schedule them so that they run daily at a certain time, one after another. For example, transformation1 -> transformation2 -> transformation3 -> transformation4 should run daily at 8:00 am. How can I do that?

Answer 1: You can execute a transformation from the command line using the tool Pan:

Pan.bat /file:transform.ktr /param:name=value

The syntax might be different depending on your system - check out the link above for more information. Once you have a batch file executing your transformation, you can simply schedule it to run…
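A minimal sketch of the approach the answer describes: chain the four Pan invocations so each runs only if the previous one succeeded, then hand the script to cron or Windows Task Scheduler for the daily 8:00 am run. The Pan executable name, the .ktr file names, and the `/file:`/`/param:` flag style (as shown in the answer) are assumptions; adjust them for your PDI install (pan.sh on Linux, Pan.bat on Windows).

```python
# Sketch: run four Kettle transformations in sequence via Pan, aborting the
# chain on the first failure. Paths here are hypothetical placeholders.
import subprocess

PAN = "pan.sh"  # assumption: Pan is on PATH; use the full path to your PDI install

def build_pan_command(ktr_file, params=None):
    """Build the Pan command line for one .ktr transformation."""
    cmd = [PAN, "/file:" + ktr_file]
    for name, value in (params or {}).items():
        cmd.append("/param:%s=%s" % (name, value))
    return cmd

def run_chain(ktr_files):
    """Run transformations one after another; stop the chain on first failure."""
    for ktr in ktr_files:
        result = subprocess.run(build_pan_command(ktr))
        if result.returncode != 0:
            raise SystemExit("transformation failed: " + ktr)

# Example (would actually invoke Pan; schedule this script via cron/Task Scheduler):
# run_chain(["transformation1.ktr", "transformation2.ktr",
#            "transformation3.ktr", "transformation4.ktr"])
```

Stopping on the first non-zero exit code gives the "one after another" ordering the question asks for, rather than letting a later transformation run against incomplete data.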

Unable to connect to HDFS using PDI step

Submitted by 强颜欢笑 on 2019-12-04 17:17:34
I have successfully configured Hadoop 2.4 in an Ubuntu 14.04 VM from a Windows 8 system. The Hadoop installation is working absolutely fine, and I am also able to view the Namenode from my Windows browser (screenshot omitted). So my host name is ubuntu and the HDFS port is 9000 (correct me if I am wrong). core-site.xml:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://ubuntu:9000</value>
</property>

The issue arises when connecting to HDFS from my Pentaho Data Integration tool (screenshot omitted). PDI version: 4.4.0. Step used: Hadoop Copy Files. Please help me connect to HDFS using PDI.
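Before digging into PDI itself, it can help to confirm that the NameNode address from core-site.xml is even reachable from the machine running PDI. A small sketch, assuming the fs.defaultFS value shown above; note the probe only checks that the RPC port accepts TCP connections — it cannot detect a Hadoop-shim/cluster version mismatch in PDI, which is another common cause of this connection failure.

```python
# Sketch: sanity-check the HDFS NameNode host/port that PDI's "Hadoop Copy
# Files" step will try to use.
import socket
from urllib.parse import urlparse

def parse_default_fs(default_fs):
    """Split an fs.defaultFS value like hdfs://ubuntu:9000 into (host, port)."""
    parsed = urlparse(default_fs)
    return parsed.hostname, parsed.port

def namenode_reachable(host, port, timeout=3):
    """Return True if a TCP connection to the NameNode RPC port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

host, port = parse_default_fs("hdfs://ubuntu:9000")
# namenode_reachable(host, port)  # run this from the machine hosting PDI
```

If the probe fails from the PDI host but the Namenode web UI works from the browser, the hostname "ubuntu" likely does not resolve outside the VM and needs an entry in the PDI machine's hosts file.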

Data loading is slow while using “Insert/Update” step in pentaho

Submitted by 倖福魔咒の on 2019-12-04 14:38:44
Question: Data loading is slow while using the "Insert/Update" step in Pentaho 4.4.0. While using the "Insert/Update" step in Kettle, data loads far more slowly than with plain MySQL. This step looks up each incoming record in the target table before inserting; if the record exists, it performs an update instead. So what can be done to optimize the performance of "Insert/Update"? The process speed is 4 rows/second, and in total I have more than 1 lakh (100,000) records… The process takes 2…
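One commonly suggested workaround for the row-by-row lookup cost is to push the upsert into MySQL itself with INSERT ... ON DUPLICATE KEY UPDATE (for example from a Kettle "Execute SQL script" step, or after bulk-loading into a staging table). A sketch that builds such a statement; the table and column names are hypothetical, and the target table must have a primary or unique key for ON DUPLICATE KEY UPDATE to match on.

```python
# Sketch: generate a batched MySQL upsert statement as an alternative to the
# per-row Insert/Update step. Table/column names below are illustrative only.
def build_upsert_sql(table, key_cols, value_cols):
    """Build a MySQL upsert statement with %s placeholders for executemany()."""
    cols = key_cols + value_cols
    placeholders = ", ".join(["%s"] * len(cols))
    updates = ", ".join("{0} = VALUES({0})".format(c) for c in value_cols)
    return ("INSERT INTO {t} ({c}) VALUES ({p}) "
            "ON DUPLICATE KEY UPDATE {u}").format(
                t=table, c=", ".join(cols), p=placeholders, u=updates)

sql = build_upsert_sql("customers", ["customer_id"], ["name", "email"])
# Pass `sql` plus a list of row tuples to a MySQL driver's executemany()
# so the server resolves insert-vs-update in one batched round trip.
```

Within Kettle itself, enabling batch updates, increasing the step's commit size, and making sure the lookup key is indexed are the usual first optimizations before restructuring the load.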

Apache Kafka vs Apache Storm

Submitted by ﹥>﹥吖頭↗ on 2019-12-04 07:23:10
Question: Apache Kafka: distributed messaging system. Apache Storm: real-time message processing. How can we use both technologies in a real-time data pipeline for processing event data? In terms of a real-time data pipeline, both seem to me to do an identical job. How can we use both technologies together in a data pipeline?

Answer 1: You use Apache Kafka as a distributed and robust queue that can handle high-volume data and lets you pass messages from one endpoint to another. Storm is not a queue; it is a system with distributed real-time processing abilities, meaning you can execute all kinds of manipulations…
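A conceptual sketch of the division of labour the answer describes: Kafka acts as the durable buffer between event producers and the processing layer, while Storm consumes from a Kafka topic (via a spout) and transforms each event (in bolts). This toy version uses an in-memory queue in place of Kafka and a plain function in place of a Storm bolt — it illustrates the roles, not the real Kafka or Storm APIs.

```python
# Toy model: queue = Kafka topic (durable and partitioned in reality),
# bolt() = a Storm bolt, consume_all() = the spout + topology side.
from queue import Queue

topic = Queue()  # stands in for a Kafka topic

def produce(events):
    """Producers append raw events to the topic; real Kafka would persist them."""
    for event in events:
        topic.put(event)

def bolt(event):
    """A Storm bolt transforms each tuple; here we just enrich the event."""
    return {"event": event, "processed": True}

def consume_all():
    """The consuming side: drain the topic, running each event through the bolt."""
    results = []
    while not topic.empty():
        results.append(bolt(topic.get()))
    return results

produce(["click", "view"])
processed = consume_all()
```

The key point the toy preserves: producers and processors never talk directly — the queue decouples them, so the processing layer can lag, restart, or scale out without losing events.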
