Cloudera 5.4.2: Avro block size is invalid or too large when using Flume and Twitter streaming

Asked by 后悔当初 on 2021-01-06 13:45

I ran into a small problem when trying Cloudera 5.4.2 while following this article:

Apache Flume - Fetching Twitter Data http://www.tutorialspoint.com/apache_flume/fetching_twitt

  • Answer, 2021-01-06 14:10

    Use the Cloudera TwitterSource.

    Otherwise you will run into this problem:

    Unable to correctly load twitter avro data into hive table

    The article uses the Apache TwitterSource:

    TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
    Twitter 1% Firehose Source
    This source is highly experimental. It connects to the 1% sample Twitter Firehose using streaming API and continuously downloads tweets, converts them to Avro format, and sends Avro events to a downstream Flume sink.
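If you are unsure which source produced the files already in HDFS, their first bytes tell you. A minimal diagnostic sketch, assuming you first copy one of the sink's files locally (for example with `hdfs dfs -get`); the helper names `sniff_format` and `sniff_file` are hypothetical. It relies only on the Avro specification's object container magic (the bytes `Obj` followed by byte 1); a file that starts with `{` is treated as JSON text, which is what the Hive setup in the Cloudera blog series expects.

```python
import sys
from pathlib import Path

# Avro object container files begin with these four bytes: 'O', 'b', 'j', 1.
AVRO_MAGIC = b"Obj\x01"


def sniff_format(data: bytes) -> str:
    """Classify the first bytes of a Flume output file as 'avro', 'json' or 'unknown'."""
    if data.startswith(AVRO_MAGIC):
        return "avro"
    # Raw tweets are JSON objects, so the file should start with '{'
    # (leading whitespace is ignored).
    if data.lstrip()[:1] == b"{":
        return "json"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the head of the file; that is all the check needs."""
    with Path(path).open("rb") as f:
        return sniff_format(f.read(64))


if __name__ == "__main__":
    # Usage: python sniff.py FlumeData.1420000000000 ...
    for p in sys.argv[1:]:
        print(p, sniff_file(p))
```

Files reported as `avro` came from Avro events such as the ones the Apache source emits; they will not load through a JSON SerDe.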
    

    But it should be the Cloudera TwitterSource, described in this blog series:

    https://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/

    http://blog.cloudera.com/blog/2012/10/analyzing-twitter-data-with-hadoop-part-2-gathering-data-with-flume/

    http://blog.cloudera.com/blog/2012/11/analyzing-twitter-data-with-hadoop-part-3-querying-semi-structured-data-with-hive/

    TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
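For reference, here is a sketch that renders a complete agent definition around that source line: Cloudera TwitterSource, then a memory channel, then an HDFS sink. The property names follow the flume.conf shipped in the cdh-twitter-example repository. The function name `render_twitter_agent_conf`, the HDFS path, the keywords and the credential placeholders are assumptions for illustration only.

```python
from typing import Sequence


def render_twitter_agent_conf(consumer_key: str, consumer_secret: str,
                              access_token: str, access_token_secret: str,
                              keywords: Sequence[str], hdfs_path: str,
                              agent: str = "TwitterAgent") -> str:
    """Render a Flume agent: Cloudera TwitterSource -> memory channel -> HDFS sink."""
    src = f"{agent}.sources.Twitter"
    sink = f"{agent}.sinks.HDFS"
    chan = f"{agent}.channels.MemChannel"
    lines = [
        # Component names declared by the agent.
        f"{agent}.sources = Twitter",
        f"{agent}.channels = MemChannel",
        f"{agent}.sinks = HDFS",
        # Cloudera's source, built from cdh-twitter-example (not the Apache one).
        f"{src}.type = com.cloudera.flume.source.TwitterSource",
        f"{src}.channels = MemChannel",
        f"{src}.consumerKey = {consumer_key}",
        f"{src}.consumerSecret = {consumer_secret}",
        f"{src}.accessToken = {access_token}",
        f"{src}.accessTokenSecret = {access_token_secret}",
        f"{src}.keywords = {', '.join(keywords)}",
        # DataStream writes raw event bodies instead of the default SequenceFile;
        # writeFormat = Text mirrors the Cloudera example.
        f"{sink}.type = hdfs",
        f"{sink}.channel = MemChannel",
        f"{sink}.hdfs.path = {hdfs_path}",
        f"{sink}.hdfs.fileType = DataStream",
        f"{sink}.hdfs.writeFormat = Text",
        f"{chan}.type = memory",
        f"{chan}.capacity = 10000",
        f"{chan}.transactionCapacity = 100",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder credentials and a hypothetical NameNode address.
    print(render_twitter_agent_conf("KEY", "SECRET", "TOKEN", "TOKEN_SECRET",
                                    ["hadoop", "big data"],
                                    "hdfs://namenode:8020/user/flume/tweets"))
```

Save the output as, for example, flume.conf and start it with `flume-ng agent --conf-file flume.conf --name TwitterAgent`. On a Cloudera Manager cluster, the same properties go into the Flume service's agent configuration instead.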
    

    Also, don't just download the prebuilt JAR. Our Cloudera version is 5.4.2, and with the prebuilt JAR you will get this error:

    Cannot run Flume because of JAR conflict

    Compile it yourself with Maven instead:

    https://github.com/cloudera/cdh-twitter-example

    Download the source and compile flume-sources-1.0-SNAPSHOT.jar. This JAR contains the implementation of the Cloudera TwitterSource.

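    Steps:

    wget https://github.com/cloudera/cdh-twitter-example/archive/master.zip

    unzip master.zip && cd cdh-twitter-example-master/flume-sources

    sudo yum install apache-maven

    mvn package

    Then put the resulting JAR (target/flume-sources-1.0-SNAPSHOT.jar) in the Flume plugins directory:

    /var/lib/flume-ng/plugins.d/twitter-streaming/lib/flume-sources-1.0-SNAPSHOT.jar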

    Note: run yum update first to bring the system packages up to date; otherwise mvn package fails with a security-related error.
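Before restarting the agent, it can help to confirm that the JAR landed where Flume looks for plugins (plugins.d/<plugin-name>/lib/) and that it really contains the source class. A minimal sketch: the helper names are hypothetical, the default directory and the plugin name `twitter-streaming` come from the path above, and the class path is derived from the type `com.cloudera.flume.source.TwitterSource`.

```python
import zipfile
from pathlib import Path

# Class file implied by the source type com.cloudera.flume.source.TwitterSource.
SOURCE_CLASS = "com/cloudera/flume/source/TwitterSource.class"


def plugin_jar_path(plugins_dir: str,
                    plugin: str = "twitter-streaming",
                    jar: str = "flume-sources-1.0-SNAPSHOT.jar") -> Path:
    """Flume's plugins.d layout: <plugins.d>/<plugin-name>/lib/<jar>."""
    return Path(plugins_dir) / plugin / "lib" / jar


def check_plugin(plugins_dir: str = "/var/lib/flume-ng/plugins.d") -> bool:
    """True if the JAR exists, is a valid zip archive, and contains the TwitterSource class."""
    path = plugin_jar_path(plugins_dir)
    if not path.is_file() or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as jar:
        return SOURCE_CLASS in jar.namelist()


if __name__ == "__main__":
    print("plugin OK" if check_plugin() else "plugin JAR missing or incomplete")
```

The zip check matters because a JAR is a zip archive: an empty or truncated download fails `is_zipfile` before any class lookup is attempted.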
