I can do,
{
"type": "record",
"name": "Foo",
"fields": [
{"name": "bar", "type": {
"type": "record",
I assume your motivation (like mine) is to structure your schema definitions and avoid copy-and-paste errors.
To achieve that, you can also use Avro IDL. It lets you define Avro schemas at a higher level, and types can be reused within the same file and across multiple files.
To generate the .avsc files, run:
$ java -jar avro-tools-1.7.7.jar idl2schemata my-protocol.avdl
The resulting .avsc files will look much like your initial example, but because they are generated from the .avdl, you won't get lost in the verbose JSON format.
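For illustration, a minimal my-protocol.avdl might look like the sketch below. The namespace com.example, the protocol name MyProtocol, and the field named value are made up for this example; Foo and Bar mirror the record names from the question. Bar is declared once and then referenced by name from Foo.

```
// my-protocol.avdl (illustrative sketch; namespace, protocol name and
// the "value" field are assumptions, not taken from the question)
@namespace("com.example")
protocol MyProtocol {

  // Declared once, reusable by any record in this protocol
  record Bar {
    string value;
  }

  // Refers to Bar by name instead of repeating its definition
  record Foo {
    Bar bar;
  }
}
```

Running idl2schemata on a file like this should produce one .avsc file per named type (Foo.avsc and Bar.avsc here), with Bar's definition written out inside Foo.avsc where it is used. Types from another IDL file can be pulled in with an import idl "other.avdl"; statement inside the protocol, where other.avdl is again just a placeholder name.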
Yes, it's possible.
I've done that in my Java project by listing the common schema files as imports in the avro-maven-plugin configuration. Example:
search_result.avro:
{"namespace": "com.myorg.other",
"type": "record",
"name": "SearchResult",
"fields": [
{"name": "type", "type": "SearchResultType"},
{"name": "keyWord", "type": "string"},
{"name": "searchEngine", "type": "string"},
{"name": "position", "type": "int"},
{"name": "userAction", "type": "UserAction"}
]
}
search_suggest.avro:
{"namespace": "com.myorg.other",
"type": "record",
"name": "SearchSuggest",
"fields": [
{"name": "suggest", "type": "string"},
{"name": "request", "type": "string"},
{"name": "searchEngine", "type": "string"},
{"name": "position", "type": "int"},
{"name": "userAction", "type": "UserAction"},
{"name": "timestamp", "type": "long"}
]
}
user_action.avro:
{"namespace": "com.myorg.other",
"type": "enum",
"name": "UserAction",
"symbols": ["S", "V", "C"]
}
search_result_type.avro:
{"namespace": "com.myorg.other",
"type": "enum",
"name": "SearchResultType",
"symbols": ["O", "S", "A"]
}
avro-maven-plugin configuration:
<plugin>
<groupId>org.apache.avro</groupId>
<artifactId>avro-maven-plugin</artifactId>
<version>1.7.4</version>
<executions>
<execution>
<phase>generate-sources</phase>
<goals>
<goal>schema</goal>
</goals>
<configuration>
<sourceDirectory>${project.basedir}/src/main/resources/avro</sourceDirectory>
<outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
<includes>
<include>**/*.avro</include>
</includes>
<imports>
<import>${project.basedir}/src/main/resources/avro/user_action.avro</import>
<import>${project.basedir}/src/main/resources/avro/search_result_type.avro</import>
</imports>
</configuration>
</execution>
</executions>
</plugin>
The order of the imports in the pom.xml matters: the subtypes must be imported first, before the schemas that reference them.
<imports>
<import>${project.basedir}/src/main/resources/avro/Bar.avro</import>
<import>${project.basedir}/src/main/resources/avro/Foo.avro</import>
</imports>
That keeps the code generator from failing with an "undefined name: Bar.avro" error.
In the avro-maven-plugin configuration, you need to import the .avsc file that defines the object schema you want to reuse:
<plugin>
<groupId>org.apache.avro</groupId>
<artifactId>avro-maven-plugin</artifactId>
<version>${avro.maven.plugin.version}</version>
<configuration>
<stringType>String</stringType>
</configuration>
<executions>
<execution>
<phase>generate-sources</phase>
<goals>
<goal>schema</goal>
</goals>
<configuration>
<sourceDirectory>src/main/java/com/xyz/avro</sourceDirectory> <!-- Avro directory -->
<imports>
<import>src/main/java/com/xyz/avro/file.avsc</import> <!-- Import here -->
</imports>
</configuration>
</execution>
</executions>
</plugin>
You can also define multiple schemas inside of one file:
schemas.avsc:
[
{
"type": "record",
"name": "Bar",
"fields": [ ]
},
{
"type": "record",
"name": "Foo",
"fields": [
{"name": "bar", "type": "Bar"}
]
}
]
If you want to reuse the schemas in multiple places, this is not ideal, but in my opinion it improves readability and maintainability a lot.
From what I have been able to figure out so far, no.
There is a good write-up by someone who coded their own method for doing this:
http://www.infoq.com/articles/ApacheAvro