flume-twitter

Flume - TwitterSource language filter

穿精又带淫゛_ submitted on 2019-12-24 13:08:16
Question: I would like to ask for your help with the following case. I'm currently using Cloudera CDH 5.1.2, and I tried to collect Twitter data using Flume as described in the following Cloudera posts: http://blog.cloudera.com/blog/2012/10/analyzing-twitter-data-with-hadoop-part-2-gathering-data-with-flume/ github.com/cloudera/cdh-twitter-example I downloaded the source and rebuilt flume-sources after updating the versions in pom.xml: <flume.version>1.5.0-cdh5.1.2</flume.version> <hadoop.version
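
As a rough illustration of the language filtering named in the title, here is a minimal sketch that filters already-collected tweets downstream of Flume. It assumes the source writes one JSON tweet per line and that each tweet carries a `lang` field; the function name `filter_by_language` and the sample records are hypothetical and not part of the Cloudera example.

```python
import json


def filter_by_language(lines, languages):
    """Yield parsed tweets whose `lang` field is one of `languages`."""
    wanted = {code.lower() for code in languages}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip records that are not valid JSON
        if not isinstance(tweet, dict):
            continue
        lang = tweet.get("lang")
        if isinstance(lang, str) and lang.lower() in wanted:
            yield tweet


# Hypothetical sample records, one JSON document per line.
sample = [
    '{"id": 1, "lang": "en", "text": "hello"}',
    '{"id": 2, "lang": "hu", "text": "szia"}',
    "not json",
]
english = list(filter_by_language(sample, ["en"]))
```

Filtering after collection still downloads every tweet, so it only helps when the volume is acceptable; filtering inside the source itself would need a change to the Java code.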

Unable to correctly load twitter avro data into hive table

杀马特。学长 韩版系。学妹 submitted on 2019-12-17 20:27:35
Question: I need your help! I am trying a simple exercise: getting data from Twitter and then loading it into Hive for analysis. I am able to get data into HDFS using Flume (with the Twitter 1% firehose source) and to load the data into a Hive table. However, I cannot see all the columns I expected to find in the Twitter data, such as user_location, user_description, user_friends_count, and user_statuses_count. The schema derived from Avro only contains two columns
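
One way to see which columns a file can actually provide is to read the writer schema embedded in the Avro container header. The sketch below does this with the standard library only, following the Avro object container file layout (magic bytes, a metadata map, then a sync marker). The helper names and the two-field sample schema are hypothetical; they are not taken from the question.

```python
import io
import json

MAGIC = b"Obj\x01"


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag encoded)."""
    shift = 0
    result = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("truncated Avro header")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def read_bytes(stream):
    """Decode one length-prefixed Avro bytes/string value."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("truncated Avro header")
    return data


def read_header_metadata(stream):
    """Return the metadata map stored in an Avro container header."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:
            return meta
        if count < 0:
            count = -count
            read_long(stream)  # byte size of the block, not needed here
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)


def schema_field_names(stream):
    """List the top-level field names of the embedded writer schema."""
    schema = json.loads(read_header_metadata(stream)["avro.schema"])
    return [field["name"] for field in schema.get("fields", [])]


def encode_long(n):
    """Encode a long the same way, used only to build the demo header."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


# Hypothetical two-field schema, used only to exercise the parser.
demo_schema = json.dumps({
    "type": "record", "name": "Doc",
    "fields": [{"name": "id", "type": "string"},
               {"name": "text", "type": "string"}],
}).encode("utf-8")
key = b"avro.schema"
demo_header = (MAGIC + encode_long(1)
               + encode_long(len(key)) + key
               + encode_long(len(demo_schema)) + demo_schema
               + encode_long(0) + bytes(16))
demo_fields = schema_field_names(io.BytesIO(demo_header))
```

For a real file, open it in binary mode and pass the file object to `schema_field_names`. If the embedded schema itself lacks the expected columns, the Hive table definition is unlikely to be where they are lost.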

Flume not processing keywords from Twitter source with flume-ng with Hadoop 2.5 cdh5.3

前提是你 submitted on 2019-12-10 13:58:14
Question: I am trying to process some Twitter keywords with MemChannel and HDFS, but flume-ng shows no further progress after the HDFS "started" status appears on the console. Here are the contents of the /etc/flume-ns/conf/flume-env.sh file. # Licensed to the Apache Software Foundation (ASF) under one # or more contributor license agreements. See the NOTICE file # distributed with this work for additional information # regarding copyright ownership. The ASF licenses this file # to you under the Apache License, Version
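
When an agent starts but nothing flows, one thing worth ruling out is the wiring between source, channel, and sink in the agent configuration. The following is a minimal sketch of such a check, assuming the usual Flume property layout (`<agent>.sources`, `<agent>.sources.<name>.channels`, `<agent>.sinks.<name>.channel`). The parser handles only simple `key = value` lines, and the sample configuration is hypothetical rather than the one from the question.

```python
def parse_properties(text):
    """Parse simple `key = value` lines, ignoring blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of wiring problems for the named agent."""
    if not any(key.startswith(agent + ".") for key in props):
        return [f"no properties found for agent {agent}"]
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())
    for source in props.get(f"{agent}.sources", "").split():
        bound = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not bound:
            problems.append(f"source {source} has no 'channels' property")
        for channel in bound:
            if channel not in channels:
                problems.append(
                    f"source {source} uses undeclared channel {channel}")
    for sink in props.get(f"{agent}.sinks", "").split():
        channel = props.get(f"{agent}.sinks.{sink}.channel")
        if channel is None:
            problems.append(f"sink {sink} has no 'channel' property")
        elif channel not in channels:
            problems.append(f"sink {sink} uses undeclared channel {channel}")
    return problems


# Hypothetical configuration: the sink key is misspelled as "channels".
sample_conf = """
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channels = MemChannel
"""
wiring_problems = check_wiring(parse_properties(sample_conf), "TwitterAgent")
```

The agent name passed to `check_wiring` should be the same one given to `flume-ng agent -n`, since properties under any other prefix are ignored by that agent.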

unable to download data from twitter through flume

心已入冬 submitted on 2019-12-06 17:01:15
Question: bin/flume-ng agent -n TwitterAgent --conf ./conf/ -f conf/flume-twitter.conf -Dflume.root.logger=DEBUG,console When I run the above command, it generates the following error: 2016-05-06 13:33:31,357 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] 404:The URI requested is invalid or the resource requested, such as a user, does not exist. Unknown URL. See Twitter Streaming API documentation at http://dev.twitter.com

unable to download data from twitter through flume

混江龙づ霸主 submitted on 2019-12-04 22:43:25
bin/flume-ng agent -n TwitterAgent --conf ./conf/ -f conf/flume-twitter.conf -Dflume.root.logger=DEBUG,console When I run the above command, it generates the following error: 2016-05-06 13:33:31,357 (Twitter Stream consumer-1[Establishing connection]) [INFO - twitter4j.internal.logging.SLF4JLogger.info(SLF4JLogger.java:83)] 404:The URI requested is invalid or the resource requested, such as a user, does not exist. Unknown URL. See Twitter Streaming API documentation at http://dev.twitter.com/pages/streaming_api This is my flume-twitter.conf file, located in the flume/conf folder: TwitterAgent

Cloudera 5.4.2: Avro block size is invalid or too large when using Flume and Twitter streaming

被刻印的时光 ゝ submitted on 2019-11-30 23:46:40
There is a small problem when I try Cloudera 5.4.2. It is based on the article "Apache Flume - Fetching Twitter Data": http://www.tutorialspoint.com/apache_flume/fetching_twitter_data.htm The article fetches tweets using Flume and Twitter streaming for data analysis. Everything goes well: creating the Twitter app, creating the directory on HDFS, configuring Flume, starting to fetch data, and creating a schema on top of the tweets. Then the problem appears. The Twitter streaming source converts tweets to Avro format and sends Avro events to the downstream HDFS sinks; when the Hive table backed by Avro loads the data, I get an error message saying
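
One possibility worth checking for this kind of error, offered here as an assumption and not as a confirmed diagnosis, is that a single HDFS file contains several complete Avro containers written back to back, so a reader hits a second header where it expects a data block. A minimal sketch that looks for repeated Avro magic bytes in a file's contents follows; the function name `avro_header_offsets` is hypothetical.

```python
AVRO_MAGIC = b"Obj\x01"


def avro_header_offsets(data):
    """Return every byte offset at which the Avro magic bytes occur."""
    offsets = []
    start = 0
    while True:
        index = data.find(AVRO_MAGIC, start)
        if index == -1:
            return offsets
        offsets.append(index)
        start = index + len(AVRO_MAGIC)


# Synthetic bytes standing in for a file that holds two containers.
demo_offsets = avro_header_offsets(b"Obj\x01aaaObj\x01bb")
```

A well-formed container file starts with the magic bytes at offset 0. Additional matches are only a hint, because the same four bytes can also appear by chance inside the data.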

Cloudera 5.4.2: Avro block size is invalid or too large when using Flume and Twitter streaming

人走茶凉 submitted on 2019-11-30 18:27:41
Question: There is a small problem when I try Cloudera 5.4.2. It is based on the article "Apache Flume - Fetching Twitter Data": http://www.tutorialspoint.com/apache_flume/fetching_twitter_data.htm The article fetches tweets using Flume and Twitter streaming for data analysis. Everything goes well: creating the Twitter app, creating the directory on HDFS, configuring Flume, starting to fetch data, and creating a schema on top of the tweets. Then the problem appears. The Twitter streaming source converts tweets to Avro format and sends Avro events