Question
I am writing a simple Twitter program where I read tweets and send them to Kafka, and I want to use Avro for serialization. So far I have only set up the Twitter configuration in Scala, and now I want to read tweets using this config.
How do I import the following Avro schema, defined in the file tweets.avsc, into my program?
{
  "namespace": "tweetavro",
  "type": "record",
  "name": "Tweet",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "text", "type": "string"}
  ]
}
I followed some examples on the web which show something like import tweetavro.Tweet
to import the schema in Scala, so that it can be used like this:
def main(args: Array[String]) {
  val twitterStream = TwitterStream.getStream
  twitterStream.addListener(new OnTweetPosted(s => sendToKafka(toTweet(s))))
  twitterStream.filter(filterUsOnly)
}

private def toTweet(s: Status): Tweet = {
  new Tweet(s.getUser.getName, s.getText)
}

private def sendToKafka(t: Tweet) {
  println(toJson(t.getSchema).apply(t))
  val tweetEnc = toBinary[Tweet].apply(t)
  val msg = new KeyedMessage[String, Array[Byte]](KafkaTopic, tweetEnc)
  kafkaProducer.send(msg)
}
I am following the same approach and using the following plugins in my pom.xml:
<!-- AVRO MAVEN PLUGIN -->
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.7.7</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
        <outputDirectory>${project.basedir}/src/main/scala/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>

<!-- MAVEN COMPILER PLUGIN -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>1.7</source>
    <target>1.7</target>
  </configuration>
</plugin>
After doing all this, I still cannot do import tweetavro.Tweet.
Can anyone please help?
Thanks!
Answer 1:
You should first compile that schema into a class. I'm not sure there is a production-ready Avro library for Scala, but you can generate a class for Java and use it from Scala:
java -jar /path/to/avro-tools-1.7.7.jar compile schema tweet.avsc .
Change this command to suit your paths and you should get a tweetavro.Tweet class generated by this tool. Then you can place it in your project and use it in the way you've just described.
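For illustration, here is a minimal sketch of using the generated class from Scala, assuming the default Avro 1.7.7 code generation (string fields are exposed as CharSequence, and a builder is generated alongside the all-args constructor):

import tweetavro.Tweet

// All-args constructor of the generated specific record;
// String satisfies the CharSequence parameter type.
val tweet = new Tweet("someUser", "hello avro")
println(tweet.getSchema)                      // the schema embedded in the generated class
println(tweet.getName + ": " + tweet.getText)

// Equivalent construction via the generated builder:
val tweet2 = Tweet.newBuilder().setName("someUser").setText("hello avro").build()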
More info here
Update: FYI, it seems there is a Scala library, but I've never used it.
Answer 2:
You could also use avro4s. Define your case class (or generate it) based on the schema; let's call that class Tweet. Then you create an AvroOutputStream, which infers the schema from the case class and is used to serialize instances. We can then write to a byte array and send that to Kafka. E.g.:
val tweet: Tweet = ... // the instance you want to serialize
val out = new ByteArrayOutputStream // we collect the serialized output in this
val avro = AvroOutputStream[Tweet](out) // you specify the type here as well
avro.write(tweet)
avro.close()
val bytes = out.toByteArray
val msg = new KeyedMessage[String, Array[Byte]](KafkaTopic, bytes)
kafkaProducer.send(msg)
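For completeness, a minimal case class matching the tweets.avsc schema above could look like the sketch below (the field names and types come straight from the schema; placing it in the tweetavro package is only an assumption to mirror the schema's namespace):

package tweetavro

// avro4s derives the Avro schema from this definition,
// so no generated Java class is needed.
case class Tweet(name: String, text: String)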
Answer 3:
I recommend using Avrohugger. It is the new kid on the block for generating Scala case classes from Avro, but it supports everything I need, and I really like that it isn't macro based, so I can actually see what gets generated.
The maintainer has been great to work with and very open to contributions and feedback. It is not, and probably never will be, as feature-rich as the official Java code generator, but it will suit most people's needs.
Currently it is missing support for unions (other than optional types) and recursive types.
The SBT plugin works very well and there is a new web interface if you want to quickly see what it does with your Avro schemas:
https://avro2caseclass.herokuapp.com/
More details here:
https://github.com/julianpeeters/avrohugger
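As a rough illustration, feeding the tweets.avsc schema above through avrohugger's standard generator yields a plain case class along these lines (a sketch; the exact output depends on the avrohugger version and settings):

package tweetavro

case class Tweet(name: String, text: String)

Because the schema's namespace becomes the Scala package, the import tweetavro.Tweet from the question then resolves as expected.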
Source: https://stackoverflow.com/questions/31763571/importing-avro-schema-in-scala