How to populate the cache in CachedSchemaRegistryClient without making a call to register a new schema?

Submitted by 扶醉桌前 on 2020-01-13 11:58:13

Question


We have a Spark Streaming application that integrates with Kafka. I'm trying to optimize it because it makes excessive calls to Schema Registry to download schemas.

The Avro schema for our data rarely changes, yet our application currently calls Schema Registry every time a record comes in, which is far too often.

I ran into CachedSchemaRegistryClient from Confluent, and it looked promising. After looking into its implementation, though, I'm not sure how to use its built-in cache to reduce the REST calls to Schema Registry.

From its source code, here is the only method I could find that has anything to do with adding a schema to the cache of CachedSchemaRegistryClient.

public synchronized int register(String subject, Schema schema) throws IOException, RestClientException
{
    Object schemaIdMap;
    if(this.schemaCache.containsKey(subject)) {
        schemaIdMap = (Map)this.schemaCache.get(subject);
    } else {
        schemaIdMap = new HashMap();
        this.schemaCache.put(subject, (Map)schemaIdMap);
    }
    /*
     * let's call the above as the FIRST part of this method, below as the SECOND part
     */
    if(((Map)schemaIdMap).containsKey(schema)) {
        return ((Integer)((Map)schemaIdMap).get(schema)).intValue();
    } else if(((Map)schemaIdMap).size() >= this.identityMapCapacity) {
        throw new IllegalStateException("Too many schema objects created for " + subject + "!");
    } else {
        int id = this.registerAndGetId(subject, schema);
        ((Map)schemaIdMap).put(schema, Integer.valueOf(id));
        return id;
    }
}

The purpose of this method is to register a schema with Schema Registry, store it in the local cache, and return its schema ID; if the schema already exists in the local cache, it returns the cached schema ID without calling the registry. This works perfectly when we are registering a completely new schema.

But consider a scenario where a schema is already registered in Schema Registry (by another application, in our case) and we only want to put it in the local cache of CachedSchemaRegistryClient for quick access. As far as I can tell this is not supported today, so is there a clean workaround without customization?

We thought about maintaining a local cache ourselves, but would like to keep that as a last resort in case Confluent has something to offer.
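
For reference, the "maintain a local cache ourselves" fallback could look roughly like the sketch below. It is only a minimal illustration: `LocalSchemaCache` and its `loader` parameter are hypothetical names, not part of any Confluent API, and the loader stands in for whatever call actually fetches a schema from Schema Registry by ID.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

/**
 * Hypothetical read-through cache keyed by schema ID (not a Confluent class).
 * S is whatever schema type the application uses, e.g. an Avro Schema.
 */
final class LocalSchemaCache<S> {
    private final Map<Integer, S> schemasById = new ConcurrentHashMap<>();
    // Assumed to wrap the remote lookup against Schema Registry.
    private final IntFunction<S> loader;

    LocalSchemaCache(IntFunction<S> loader) {
        this.loader = loader;
    }

    /** Returns the cached schema; the loader runs only on the first request for an ID. */
    S get(int schemaId) {
        return schemasById.computeIfAbsent(schemaId, id -> loader.apply(id));
    }

    /** Number of schema IDs currently cached. */
    int size() {
        return schemasById.size();
    }
}
```

With something like this, each distinct schema ID would cost one remote lookup per JVM instead of one per record. A real loader would have to wrap checked exceptions such as IOException in an unchecked one, since `IntFunction` cannot throw them.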

Any suggestions/ideas are appreciated, thanks in advance.

Source: https://stackoverflow.com/questions/40621390/how-to-populate-the-cache-in-cachedschemaregistryclient-without-making-a-call-to
