Question
We have a Spark Streaming application that integrates with Kafka. I'm trying to optimize it because it makes excessive calls to Schema Registry to download schemas.
The Avro schema for our data rarely changes, yet our application currently calls Schema Registry every time a record comes in, which is far too often.
I came across CachedSchemaRegistryClient from Confluent, and it looked promising. However, after looking into its implementation, I'm not sure how to use its built-in cache to reduce the REST calls to Schema Registry.
Here is the only method in the CachedSchemaRegistryClient source code that has anything to do with adding a schema to its cache:
public synchronized int register(String subject, Schema schema) throws IOException, RestClientException {
    Map<Schema, Integer> schemaIdMap;
    if (this.schemaCache.containsKey(subject)) {
        schemaIdMap = this.schemaCache.get(subject);
    } else {
        schemaIdMap = new HashMap<>();
        this.schemaCache.put(subject, schemaIdMap);
    }
    /*
     * Let's call the code above the FIRST part of this method and the code below the SECOND part.
     */
    if (schemaIdMap.containsKey(schema)) {
        return schemaIdMap.get(schema);
    } else if (schemaIdMap.size() >= this.identityMapCapacity) {
        throw new IllegalStateException("Too many schema objects created for " + subject + "!");
    } else {
        int id = this.registerAndGetId(subject, schema);
        schemaIdMap.put(schema, id);
        return id;
    }
}
The purpose of this method is to register a schema with Schema Registry and in the local cache and return its schema ID, or to return the schema ID directly if the schema already exists locally. This works perfectly when we are registering a completely new schema.
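A dependency-free model of the same two-part logic makes the cost visible. It is a sketch, not Confluent's code: the schema type is a generic S, and the REST call registerAndGetId is replaced by a hypothetical BiFunction that counts invocations. Only the first register of a given (subject, schema) pair reaches the registry; no path fills the cache without that call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical model of CachedSchemaRegistryClient.register(...) caching logic.
class RegisterCacheModel<S> {
    private final Map<String, Map<S, Integer>> schemaCache = new HashMap<>();
    private final int identityMapCapacity;
    // Stand-in for registerAndGetId(subject, schema), i.e. the remote REST call.
    private final BiFunction<String, S, Integer> registerAndGetId;

    RegisterCacheModel(int identityMapCapacity, BiFunction<String, S, Integer> registerAndGetId) {
        this.identityMapCapacity = identityMapCapacity;
        this.registerAndGetId = registerAndGetId;
    }

    synchronized int register(String subject, S schema) {
        // FIRST part: find or create the per-subject schema -> ID map.
        Map<S, Integer> schemaIdMap = schemaCache.computeIfAbsent(subject, k -> new HashMap<>());
        // SECOND part: a cache hit returns without contacting the registry.
        Integer cached = schemaIdMap.get(schema);
        if (cached != null) {
            return cached;
        }
        if (schemaIdMap.size() >= identityMapCapacity) {
            throw new IllegalStateException("Too many schema objects created for " + subject + "!");
        }
        // Cache miss: the only path that calls the registry.
        int id = registerAndGetId.apply(subject, schema);
        schemaIdMap.put(schema, id);
        return id;
    }

    public static void main(String[] args) {
        int[] remoteCalls = {0};
        RegisterCacheModel<String> client = new RegisterCacheModel<>(100, (subject, schema) -> {
            remoteCalls[0]++;
            return 42; // pretend the registry assigned ID 42
        });
        client.register("orders-value", "schemaV1");
        client.register("orders-value", "schemaV1");
        System.out.println("remote calls: " + remoteCalls[0]); // prints "remote calls: 1"
    }
}
```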
But consider a scenario where a schema is already registered in Schema Registry (by another application, in our case) and we only want to put it in the local cache of CachedSchemaRegistryClient for quick access. Personally, I don't think this is supported as of today, so is there a clean workaround that doesn't require customization?
We have thought about maintaining a local cache ourselves, but would like to keep that as a last resort if Confluent has something to offer.
Any suggestions or ideas are appreciated. Thanks in advance.
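If we do fall back to our own cache, a minimal sketch could key it by schema ID. It assumes records use Confluent's wire format (magic byte 0, then a 4-byte big-endian schema ID, then the Avro payload); SchemaIdCache and the fetchFromRegistry function are hypothetical names standing in for whatever actually queries Schema Registry.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical application-side cache: one registry lookup per distinct schema ID.
class SchemaIdCache<S> {
    private final Map<Integer, S> byId = new ConcurrentHashMap<>();
    // Assumed stand-in for the real registry lookup; not a Confluent API.
    private final Function<Integer, S> fetchFromRegistry;

    SchemaIdCache(Function<Integer, S> fetchFromRegistry) {
        this.fetchFromRegistry = fetchFromRegistry;
    }

    // Reads the schema ID from the 5-byte wire-format header.
    static int schemaIdOf(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload); // big-endian by default
        if (payload.length < 5 || buf.get() != 0) {
            throw new IllegalArgumentException("Not in Confluent wire format");
        }
        return buf.getInt();
    }

    // Returns the schema, calling the registry only on the first miss for this ID.
    S schemaFor(byte[] payload) {
        return byId.computeIfAbsent(schemaIdOf(payload), fetchFromRegistry);
    }

    public static void main(String[] args) {
        int[] fetches = {0};
        SchemaIdCache<String> cache = new SchemaIdCache<>(id -> {
            fetches[0]++;
            return "schema-" + id; // pretend registry response
        });
        // Magic byte 0, schema ID 42, then three zero bytes of "payload".
        byte[] rec = ByteBuffer.allocate(8).put((byte) 0).putInt(42).array();
        cache.schemaFor(rec);
        cache.schemaFor(rec);
        System.out.println(cache.schemaFor(rec) + ", fetches=" + fetches[0]); // prints "schema-42, fetches=1"
    }
}
```

ConcurrentHashMap.computeIfAbsent applies the fetch at most once per ID, which suits concurrent Spark tasks in one executor, though other updates to that entry may block while the remote call is in flight.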
Source: https://stackoverflow.com/questions/40621390/how-to-populate-the-cache-in-cachedschemaregistryclient-without-making-a-call-to