Question
I have built a Flink streaming job that reads XML files from Kafka, converts them, and writes them to a database. As the attributes in the XML files don't match the database column names, I built a switch case for the mapping.
As this is not really flexible, I want to take this hardwired mapping information out of the code. My first idea was a mapping file that could look like this:
path.in.xml.to.attribute=database.column.name
The current job logic looks like this:
    switch (path.in.xml.to.attribute) {
        case "example.one.name":
            return "name";
        // ... one case per attribute
    }
With the mapping file, I guess I would use a Map to store the mapping data as key-value pairs.
This would make the job more flexible than it is right now. A remaining downside would be that for every change to this configuration I would have to restart the Flink job.
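For the file-based variant, this is a minimal sketch of the loading step, assuming the mapping file uses standard Java properties syntax (the class name MappingFileLoader is made up for illustration):

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;

    public class MappingFileLoader {

        // Reads lines like "example.one.name=name" into a lookup map.
        public static Map<String, String> load(String path) throws IOException {
            Properties props = new Properties();
            try (InputStream in = new FileInputStream(path)) {
                props.load(in);
            }
            Map<String, String> mapping = new HashMap<>();
            for (String key : props.stringPropertyNames()) {
                mapping.put(key, props.getProperty(key));
            }
            return mapping;
        }
    }

The switch then collapses to a single lookup, e.g. mapping.get("example.one.name") returning "name".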
My question is whether it is possible to inject this kind of mapping logic at runtime, for example via a dedicated Kafka topic. And if such an implementation is possible, what could it look like as an example?
Answer 1:
If the only thing you need is to update the mapping between the XML attributes and database column names, then the Broadcast State Pattern can be used. A Practical Guide to Broadcast State in Apache Flink is a useful read as well.
The idea is to have a stream, subscribed to your own Kafka topic with database mappings, which broadcasts the updates to all task managers. The downstream operators maintain this Map<String, String> as state, and you can use this mapping state to resolve the column name, i.e. instead of switch(path.in.xml.to.attribute) use map.get(path.in.xml.to.attribute). The map operator in this case should be replaced with a BroadcastProcessFunction.
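A minimal sketch of how this could look, assuming the XML side has already been reduced to (path, value) pairs and the mapping topic delivers plain key=value lines (the in-memory source stubs, class name, and record formats are placeholders, not the asker's actual job):

    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.BroadcastStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    public class RuntimeMappingJob {

        // Descriptor for the broadcast state holding xml-path -> column-name entries.
        static final MapStateDescriptor<String, String> MAPPINGS =
                new MapStateDescriptor<>("columnMappings", Types.STRING, Types.STRING);

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Stand-ins for the two Kafka sources of the real job.
            DataStream<Tuple2<String, String>> xmlEvents =
                    env.fromElements(Tuple2.of("example.one.name", "Alice"));
            DataStream<String> mappingUpdates =
                    env.fromElements("example.one.name=name");

            // Broadcast the mapping stream to all parallel instances.
            BroadcastStream<String> broadcastMappings = mappingUpdates.broadcast(MAPPINGS);

            DataStream<Tuple2<String, String>> columnValues = xmlEvents
                    .connect(broadcastMappings)
                    .process(new BroadcastProcessFunction<Tuple2<String, String>, String, Tuple2<String, String>>() {

                        @Override
                        public void processElement(Tuple2<String, String> event,
                                                   ReadOnlyContext ctx,
                                                   Collector<Tuple2<String, String>> out) throws Exception {
                            // Data side: read-only access to the broadcast state.
                            String column = ctx.getBroadcastState(MAPPINGS).get(event.f0);
                            if (column != null) {
                                out.collect(Tuple2.of(column, event.f1)); // (columnName, value)
                            }
                            // Events without a known mapping are dropped here; buffering
                            // them until a mapping arrives would be another option.
                        }

                        @Override
                        public void processBroadcastElement(String line,
                                                            Context ctx,
                                                            Collector<Tuple2<String, String>> out) throws Exception {
                            // Broadcast side: every parallel instance applies the same update.
                            String[] kv = line.split("=", 2);
                            if (kv.length == 2) {
                                ctx.getBroadcastState(MAPPINGS).put(kv[0], kv[1]);
                            }
                        }
                    });

            columnValues.print(); // the real job would feed a database sink instead
            env.execute("runtime-mapping-example");
        }
    }

Publishing a new key=value record to the mapping topic then updates the lookup on every task manager without restarting the job.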
Source: https://stackoverflow.com/questions/64894695/apache-flink-mapping-at-runtime