data-integration

Casting date in Talend Data Integration

Submitted by ﹥>﹥吖頭↗ on 2019-12-22 13:56:21
Question: In a data flow from one table to another, I would like to cast a date. The date leaves the source table as a string in this format: "2009-01-05 00:00:00:000 + 01:00". I tried to convert this to a date using a tConvertType, but apparently that is not allowed. My second option is to cast this string to a date using a formula in a tMap component. So far I have tried these formulas: - TalendDate.formatDate("yyyy-MM-dd",row3.rafw_dz_begi); - TalendDate.formatDate("yyyy-MM-dd HH:mm:ss",row3.rafw

Unable to connect to HDFS using PDI step

Submitted by 自古美人都是妖i on 2019-12-21 21:22:39
Question: I have successfully configured Hadoop 2.4 in an Ubuntu 14.04 VM from a Windows 8 system. The Hadoop installation is working absolutely fine, and I am also able to view the NameNode from my Windows browser (screenshot attached in the original post). So my host name is ubuntu and the HDFS port is 9000 (correct me if I am wrong). core-site.xml: <property> <name>fs.defaultFS</name> <value>hdfs://ubuntu:9000</value> </property> The issue occurs while connecting to HDFS from my Pentaho Data Integration tool (screenshot attached in the original post).
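Independent of PDI, a quick way to confirm that the NameNode is reachable from the client side is to list the HDFS root with the Hadoop client API. A minimal sketch, assuming Hadoop client jars matching the 2.4 cluster are on the classpath and that the host name "ubuntu" resolves from the machine running the check:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Must match fs.defaultFS from core-site.xml; "ubuntu" must resolve from the client machine too.
        conf.set("fs.defaultFS", "hdfs://ubuntu:9000");
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}

If this listing fails from the Windows side, the problem is name resolution or the port rather than PDI itself.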

Missing plugins found while loading a transformation on Kettle

Submitted by 回眸只為那壹抹淺笑 on 2019-12-17 21:36:52
Question: I receive this error whenever I run my extraction from the command line, not in the Spoon UI:
Missing plugins found while loading a transformation
Step : MongoDbInput
    at org.pentaho.di.job.entries.trans.JobEntryTrans.getTransMeta(JobEntryTrans.java:1200)
    at org.pentaho.di.job.entries.trans.JobEntryTrans.execute(JobEntryTrans.java:643)
    at org.pentaho.di.job.Job.execute(Job.java:714)
    at org.pentaho.di.job.Job.execute(Job.java:856)
    ... 4 more
Caused by: org.pentaho.di.core.exception
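When a transformation is started outside Spoon, the MongoDB step is only found if Kettle's plugin folders are on the scan path of the runtime doing the loading. The sketch below shows that idea from the Java API; the install path, the .ktr name, and the use of KETTLE_PLUGIN_BASE_FOLDERS are assumptions to adapt to your PDI installation, not a confirmed fix.

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunWithPlugins {
    public static void main(String[] args) throws Exception {
        // Hypothetical plugin location; point this at the data-integration/plugins folder of your install.
        System.setProperty("KETTLE_PLUGIN_BASE_FOLDERS", "/opt/pentaho/data-integration/plugins");
        KettleEnvironment.init();                          // registers core steps and scans plugin folders
        TransMeta meta = new TransMeta("extraction.ktr");  // transformation that uses MongoDbInput
        Trans trans = new Trans(meta);
        trans.execute(null);
        trans.waitUntilFinished();
    }
}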

Pentaho Hadoop File Input

Submitted by  ̄綄美尐妖づ on 2019-12-13 01:24:46
Question: I'm trying to retrieve data from a standalone Hadoop HDFS (version 2.7.2, with default properties) using Pentaho Kettle (version 6.0.1.0-386). Pentaho and Hadoop are not on the same machine, but I have access from one to the other. I created a new "Hadoop File Input" step with the following properties (Environment, File/Folder, Wildcard, Required, Include subfolders): url-to-file, N, N. The url-to-file is built like: ${PROTOCOL}://${USER}:${PASSWORD}@${IP}:${PORT}${PATH_TO_FILE}, e.g. hdfs://hadoop:
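To check the URL components outside Kettle, the same file can be opened with the Hadoop client API; note that a plain hdfs:// URI does not carry a password, so only the user part of the URL matters for simple (non-Kerberos) authentication. Host, port, user, and path below are placeholders:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadHdfsFile {
    public static void main(String[] args) throws Exception {
        // Connect as a specific HDFS user; replace host, port, and user with your own values.
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.1.10:9000"), new Configuration(), "hadoop");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/user/hadoop/input/data.csv"))))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}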

Count the number of rows for each file along with the file name in Talend

Submitted by [亡魂溺海] on 2019-12-12 06:36:59
Question: I have built a job that reads the data from a file and, based on the unique values of a particular column, splits the data set into many files. I am able to achieve the requirement with the job below (shown in the original post). Now, from this job that splits the output into multiple files, I want to add a sub-job that would give me two columns: in the first column, the names of the files created by my main job, and in the second column, the count of the number of rows in each created output file
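In Talend this is typically a tFileList iterating over the generated files with a tFileRowCount per file; the equivalent logic in plain Java is just listing the output folder and counting lines per file. A minimal sketch, with the output folder name as an assumption:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class CountRowsPerFile {
    public static void main(String[] args) throws IOException {
        Path outDir = Paths.get("/tmp/split_output");       // hypothetical folder written by the main job
        try (DirectoryStream<Path> files = Files.newDirectoryStream(outDir, "*.csv")) {
            for (Path file : files) {
                long rows;
                try (Stream<String> lines = Files.lines(file)) {
                    rows = lines.count();                   // subtract 1 here if each file carries a header row
                }
                // Two output columns: file name and its row count.
                System.out.println(file.getFileName() + ";" + rows);
            }
        }
    }
}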

Apache Nifi/Cassandra - how to load CSV into Cassandra table

Submitted by 痞子三分冷 on 2019-12-10 09:33:46
Question: I have various CSV files arriving several times per day, storing time-series data from sensors that are part of sensor stations. Each CSV is named after the sensor station and sensor id it comes from, for instance "station1_sensor2.csv". At the moment, the data is stored like this:
> cat station1_sensor2.csv
2016-05-04 03:02:01.001000+0000;0;
2016-05-04 03:02:01.002000+0000;0.1234;
2016-05-04 03:02:01.003000+0000;0.2345;
I have created a Cassandra table to store them and to be
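Whatever tool performs the load, the per-file work reduces to two parsing steps: pull the station and sensor ids out of the file name, and split each "timestamp;value;" line into fields. A small standalone sketch using the example file and line from above:

public class SensorCsvParser {
    public static void main(String[] args) {
        String fileName = "station1_sensor2.csv";
        String[] ids = fileName.replace(".csv", "").split("_");
        String station = ids[0];                          // "station1"
        String sensor = ids[1];                           // "sensor2"

        String line = "2016-05-04 03:02:01.002000+0000;0.1234;";
        String[] fields = line.split(";");
        String timestamp = fields[0];
        double value = Double.parseDouble(fields[1]);

        // These four values are what a per-row insert into the Cassandra table would need.
        System.out.printf("station=%s sensor=%s ts=%s value=%s%n", station, sensor, timestamp, value);
    }
}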

Designing a component both producer and consumer in Kafka

Submitted by 有些话、适合烂在心里 on 2019-12-07 18:19:45
Question: I am using Kafka and Zookeeper as the main components of my data pipeline, which processes thousands of requests each second. I am using Samza as the real-time data processing tool for the small transformations I need to make on the data. My problem is that one of my consumers (let's say ConsumerA) consumes several topics from Kafka and processes them, basically creating a summary of the topics it digests. I further want to push this data to Kafka as a separate topic, but that
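On the Kafka side there is nothing unusual about one component holding both a consumer and a producer; the loop concern only materialises if the component writes to a topic it also reads. A hedged sketch with the plain Kafka clients API (kafka-clients 2.x; topic names, group id, and the summary logic are placeholders):

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SummaryBridge {
    public static void main(String[] args) {
        Properties props = new Properties();               // shared config for brevity; split it in real code
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "consumer-a");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            consumer.subscribe(Arrays.asList("topicA", "topicB"));       // input topics
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    String summary = summarize(record.value());          // ConsumerA's aggregation
                    // Writing to a topic that is NOT subscribed to above avoids a feedback loop.
                    producer.send(new ProducerRecord<>("summary-topic", record.key(), summary));
                }
            }
        }
    }

    private static String summarize(String value) {
        return value.length() + " bytes";                                // placeholder summary logic
    }
}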

Casting date in Talend Data Integration

Submitted by 回眸只為那壹抹淺笑 on 2019-12-06 07:21:59
In a data flow from one table to another, I would like to cast a date. The date leaves the source table as a string in this format: "2009-01-05 00:00:00:000 + 01:00". I tried to convert this to a date using a tConvertType, but apparently that is not allowed. My second option is to cast this string to a date using a formula in a tMap component. So far I have tried these formulas:
- TalendDate.formatDate("yyyy-MM-dd",row3.rafw_dz_begi);
- TalendDate.formatDate("yyyy-MM-dd HH:mm:ss",row3.rafw_dz_begi);
- return TalendDate.formatDate("yyyy-MM-dd HH:mm:ss",row3.rafw_dz_begi);
None of these worked
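One likely reason the formulas above do not work is direction: TalendDate.formatDate turns a Date into a String, whereas going from a String to a Date needs a parse (TalendDate.parseDate in the tMap expression). The standalone sketch below shows the parse in plain Java; the pattern and the normalisation of the " + 01:00" offset are assumptions based on the sample value.

import java.text.SimpleDateFormat;
import java.util.Date;

public class ParseSourceDate {
    public static void main(String[] args) throws Exception {
        String raw = "2009-01-05 00:00:00:000 + 01:00";
        // The offset carries extra spaces, so normalise it to "+01:00" before parsing (assumption).
        String normalized = raw.replace(" + ", "+").replace(" - ", "-");
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss:SSSXXX"); // XXX = offset such as +01:00
        Date parsed = in.parse(normalized);
        System.out.println(parsed);
    }
}

In the tMap, the same idea would be an expression along the lines of TalendDate.parseDate("yyyy-MM-dd HH:mm:ss:SSSXXX", row3.rafw_dz_begi.replace(" + ", "+").replace(" - ", "-")), again an untested sketch rather than a confirmed answer.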

Designing a component both producer and consumer in Kafka

Submitted by 丶灬走出姿态 on 2019-12-06 01:32:47
I am using Kafka and Zookeeper as the main components of my data pipeline, which processes thousands of requests each second. I am using Samza as the real-time data processing tool for the small transformations I need to make on the data. My problem is that one of my consumers (let's say ConsumerA) consumes several topics from Kafka and processes them, basically creating a summary of the topics it digests. I further want to push this data to Kafka as a separate topic, but that forms a loop between Kafka and my component. This is what bothers me: is this a desirable architecture in Kafka?
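Since the pipeline already uses Samza, the same consume-summarise-produce shape can live in a single StreamTask: the task reads the input topics configured for the job and sends its summary to a different Kafka topic, which keeps the data flow a DAG rather than a loop. A hedged sketch using Samza's low-level task API (system and topic names, and the summary logic, are assumptions):

import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;

public class SummaryTask implements StreamTask {
    // Output goes to a topic that is not listed in task.inputs, so no feedback loop forms.
    private static final SystemStream SUMMARY = new SystemStream("kafka", "consumerA-summary");

    @Override
    public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
                        TaskCoordinator coordinator) {
        String message = (String) envelope.getMessage();
        String summary = "len=" + message.length();        // placeholder for the real aggregation
        collector.send(new OutgoingMessageEnvelope(SUMMARY, summary));
    }
}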

Apache Nifi/Cassandra - how to load CSV into Cassandra table

Submitted by 坚强是说给别人听的谎言 on 2019-12-05 14:44:29
I have various CSV files arriving several times per day, storing time-series data from sensors that are part of sensor stations. Each CSV is named after the sensor station and sensor id it comes from, for instance "station1_sensor2.csv". At the moment, the data is stored like this:
> cat station1_sensor2.csv
2016-05-04 03:02:01.001000+0000;0;
2016-05-04 03:02:01.002000+0000;0.1234;
2016-05-04 03:02:01.003000+0000;0.2345;
I have created a Cassandra table to store them and to be able to query them for various identified tasks. The Cassandra table looks like this:
cqlsh> CREATE
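Because the CREATE statement is cut off in this excerpt, the schema below is only an assumed layout (one partition per station/sensor pair, clustered by timestamp). The sketch uses the DataStax Java driver (3.x API) to create the table and insert one parsed row; adapt the keyspace, table, and column names to the real schema.

import java.text.SimpleDateFormat;
import java.util.Date;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class LoadSensorCsvRow {
    public static void main(String[] args) throws Exception {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            session.execute("CREATE KEYSPACE IF NOT EXISTS sensors WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
            session.execute("CREATE TABLE IF NOT EXISTS sensors.measurements ("
                    + "station text, sensor text, ts timestamp, value double, "
                    + "PRIMARY KEY ((station, sensor), ts))");

            PreparedStatement insert = session.prepare(
                    "INSERT INTO sensors.measurements (station, sensor, ts, value) VALUES (?, ?, ?, ?)");

            // One row from station1_sensor2.csv: "2016-05-04 03:02:01.002000+0000;0.1234;"
            SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSZ");
            Date ts = fmt.parse("2016-05-04 03:02:01.002+0000");       // microseconds truncated to millis
            session.execute(insert.bind("station1", "sensor2", ts, 0.1234));
        }
    }
}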