Can I split an Apache Avro schema across multiple files?

青春惊慌失措 2020-12-24 12:43

I can do this:

{
    "type": "record",
    "name": "Foo",
    "fields": [
        {"name": "bar", "type": {
            "type": "record",
             


        
6 Answers
  • 2020-12-24 12:48

    I assume your motivation, like mine, is to structure your schema definitions and avoid copy-and-paste errors.

    To achieve that, you can also use Avro IDL, which lets you define Avro schemas at a higher level. Types can be reused within the same file and across multiple files.

    To generate the .avsc files, run:

    $ java -jar avro-tools-1.7.7.jar idl2schemata my-protocol.avdl
    

    The resulting .avsc files will look much like your initial example, but because they are generated from the .avdl, you won't get lost in the verbose JSON format.

  • 2020-12-24 12:49

    Yes, it's possible.

    I've done that in my Java project by declaring the common schema files as imports in the avro-maven-plugin configuration. Example:

    search_result.avro:

    {"namespace": "com.myorg.other",
     "type": "record",
     "name": "SearchResult",
     "fields": [
         {"name": "type", "type": "SearchResultType"},
         {"name": "keyWord",  "type": "string"},
         {"name": "searchEngine", "type": "string"},
         {"name": "position", "type": "int"},
         {"name": "userAction", "type": "UserAction"}
     ]
    }
    

    search_suggest.avro:

    {"namespace": "com.myorg.other",
     "type": "record",
     "name": "SearchSuggest",
     "fields": [
         {"name": "suggest", "type": "string"},
         {"name": "request",  "type": "string"},
         {"name": "searchEngine", "type": "string"},
         {"name": "position", "type": "int"},
         {"name": "userAction", "type": "UserAction"},
         {"name": "timestamp", "type": "long"}
     ]
    }
    

    user_action.avro:

    {"namespace": "com.myorg.other",
     "type": "enum",
     "name": "UserAction",
     "symbols": ["S", "V", "C"]
    }
    

    search_result_type.avro:

    {"namespace": "com.myorg.other",
     "type": "enum",
     "name": "SearchResultType",
     "symbols": ["O", "S", "A"]
    }
    

    avro-maven-plugin configuration:

    <plugin>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro-maven-plugin</artifactId>
        <version>1.7.4</version>
        <executions>
            <execution>
                <phase>generate-sources</phase>
                <goals>
                    <goal>schema</goal>
                </goals>
                <configuration>
                    <sourceDirectory>${project.basedir}/src/main/resources/avro</sourceDirectory>
                    <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                    <includes>
                        <include>**/*.avro</include>
                    </includes>
                    <imports>
                        <import>${project.basedir}/src/main/resources/avro/user_action.avro</import>
                        <import>${project.basedir}/src/main/resources/avro/search_result_type.avro</import>
                    </imports>
                </configuration>
            </execution>
        </executions>
    </plugin>
    
  • 2020-12-24 13:00

    The order of imports in the pom.xml matters. Subtypes must be imported before the types that reference them.

    <imports>
        <import>${project.basedir}/src/main/resources/avro/Bar.avro</import>
        <import>${project.basedir}/src/main/resources/avro/Foo.avro</import>
    </imports>
    

    With this order, code generation no longer fails with an "undefined name: Bar.avro" error.
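
    Working out a valid import order by hand gets tedious as the number of files grows. The following is a minimal Python sketch, not part of Avro or the avro-maven-plugin, that orders schema definitions so that every named type comes before the types referring to it. The helper names referenced_names and import_order are hypothetical, and the sketch assumes one named type per file and ignores namespaces.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def referenced_names(schema):
    """Collect the type names a schema refers to by name (namespaces ignored)."""
    if isinstance(schema, str):
        return set() if schema in PRIMITIVES else {schema}
    if isinstance(schema, list):  # a union: look at every branch
        return set().union(*(referenced_names(branch) for branch in schema))
    if isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            refs = set()
            for field in schema.get("fields", []):
                refs |= referenced_names(field["type"])
            return refs - {schema["name"]}  # a record may refer to itself
        if kind == "array":
            return referenced_names(schema["items"])
        if kind == "map":
            return referenced_names(schema["values"])
        if kind in ("enum", "fixed"):
            return set()
        return referenced_names(kind)  # e.g. {"type": "Bar"} or a nested definition
    return set()


def import_order(schemas):
    """Return the schema names ordered so that dependencies come first."""
    by_name = {schema["name"]: schema for schema in schemas}
    known = set(by_name)
    # Map each type to the known types it depends on.
    graph = {name: referenced_names(schema) & known for name, schema in by_name.items()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical schemas matching the Bar/Foo example above.
foo = json.loads('{"type": "record", "name": "Foo", "fields": [{"name": "bar", "type": "Bar"}]}')
bar = json.loads('{"type": "record", "name": "Bar", "fields": []}')

print(import_order([foo, bar]))  # ['Bar', 'Foo']
```

    The resulting list gives the order in which the corresponding files would go into the imports section. A circular dependency between files makes TopologicalSorter raise a CycleError, which is a useful signal that the types cannot be split that way.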

  • 2020-12-24 13:05

    In the avro-maven-plugin configuration, import the .avsc file that defines the schema you want to reuse:

    <plugin>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro-maven-plugin</artifactId>
        <version>${avro.maven.plugin.version}</version>
        <configuration>
            <stringType>String</stringType>
        </configuration>
        <executions>
            <execution>
                <phase>generate-sources</phase>
                <goals>
                    <goal>schema</goal>
                </goals>
                <configuration>
                    <!-- Avro directory -->
                    <sourceDirectory>src/main/java/com/xyz/avro</sourceDirectory>
                    <imports>
                        <!-- Import here -->
                        <import>src/main/java/com/xyz/avro/file.avsc</import>
                    </imports>
                </configuration>
            </execution>
        </executions>
    </plugin>
    

  • 2020-12-24 13:07

    You can also define multiple schemas inside of one file:

    schemas.avsc:

    [
    {
        "type": "record",
        "name": "Bar",
        "fields": [ ]
    },
    {
        "type": "record",
        "name": "Foo",
        "fields": [
            {"name": "bar", "type": "Bar"}
        ]
    }
    ]
    

    This is not ideal if you want to reuse the schemas in several places, but in my opinion it improves readability and maintainability considerably.
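
    If the definitions already live in separate files, a combined file like schemas.avsc can be generated instead of maintained by hand. Below is a minimal Python sketch under stated assumptions: combine_schemas is a hypothetical helper, the file names Bar.avsc and Foo.avsc are made up for the demo, and the caller is expected to list the files with dependencies first.

```python
import json
import tempfile
from pathlib import Path


def combine_schemas(paths, out_path):
    """Write the schemas stored in `paths` into one JSON array, keeping the given order."""
    schemas = [json.loads(Path(p).read_text(encoding="utf-8")) for p in paths]
    Path(out_path).write_text(json.dumps(schemas, indent=4) + "\n", encoding="utf-8")
    return schemas


# Demo with throwaway files; Bar.avsc and Foo.avsc are hypothetical names.
with tempfile.TemporaryDirectory() as tmp:
    bar_file = Path(tmp, "Bar.avsc")
    foo_file = Path(tmp, "Foo.avsc")
    bar_file.write_text('{"type": "record", "name": "Bar", "fields": []}', encoding="utf-8")
    foo_file.write_text(
        '{"type": "record", "name": "Foo", "fields": [{"name": "bar", "type": "Bar"}]}',
        encoding="utf-8",
    )
    out_file = Path(tmp, "schemas.avsc")
    # Bar is listed first because Foo refers to it by name.
    combined = combine_schemas([bar_file, foo_file], out_file)
    # Read the generated file back to show what was written.
    written = json.loads(out_file.read_text(encoding="utf-8"))

print([schema["name"] for schema in written])
```

    Keeping the individual files as the source of truth and regenerating the combined file in the build avoids the reuse problem mentioned above, at the cost of one extra build step.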

  • 2020-12-24 13:13

    From what I have been able to figure out so far, no.

    There is a good write-up by someone who coded their own method for doing this here:

    http://www.infoq.com/articles/ApacheAvro
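
    For readers who want a feel for what a hand-rolled approach can involve, here is a minimal Python sketch. It is not the implementation from the linked article; it is one possible technique, inlining, where the first reference to a separately defined type is replaced by its full definition so that the result is a single standalone schema. The function name inline_named_types is hypothetical, and namespaces are ignored.

```python
import json


def inline_named_types(schema, definitions, seen=None):
    """Return a copy of `schema` with the first use of each known type inlined.

    Later uses stay as plain names, since a name may only be defined once
    within a schema.
    """
    seen = set() if seen is None else seen
    if isinstance(schema, str):
        if schema in definitions and schema not in seen:
            seen.add(schema)
            return inline_named_types(definitions[schema], definitions, seen)
        return schema
    if isinstance(schema, list):  # a union: resolve every branch
        return [inline_named_types(branch, definitions, seen) for branch in schema]
    if isinstance(schema, dict):
        result = dict(schema)  # shallow copy so the input is left untouched
        kind = result.get("type")
        if "name" in result and kind in ("record", "enum", "fixed"):
            seen.add(result["name"])  # this type is defined right here
        if kind == "record":
            result["fields"] = [
                {**field, "type": inline_named_types(field["type"], definitions, seen)}
                for field in result.get("fields", [])
            ]
        elif kind == "array":
            result["items"] = inline_named_types(result["items"], definitions, seen)
        elif kind == "map":
            result["values"] = inline_named_types(result["values"], definitions, seen)
        elif isinstance(kind, (dict, list)) or kind in definitions:
            result["type"] = inline_named_types(kind, definitions, seen)
        return result
    return schema


# Hypothetical definitions that would normally be loaded from separate files.
definitions = {
    "Bar": {"type": "record", "name": "Bar", "fields": []},
}
foo = {
    "type": "record",
    "name": "Foo",
    "fields": [
        {"name": "first", "type": "Bar"},
        {"name": "second", "type": "Bar"},
    ],
}

standalone = inline_named_types(foo, definitions)
print(json.dumps(standalone, indent=4))
```

    In the printed schema the field first carries the full Bar definition while second refers to Bar by name. The output can then be handed to any tool that expects one self-contained schema.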
