Question
I have an ExecuteScript processor that does an XML flow file validation against schematron. I'd like the content of the schematron file to be cached somewhere rather than read from the disk for every flow file again and again.
What is the best option for doing this? Do I need yet another script that puts the content of the schematron into context.stateManager or PutDistributedMapCache or what?
Answer 1:
I was about to answer NO but it seems that it is possible. You are able to cache variables inside the ExecuteScript processor.
general idea
A simple script run by the ExecuteScript processor with the ECMAScript engine shows that you can in fact keep state inside the processor between invocations.
var flowFile = session.get();
if (flowFile !== null) {
    // x keeps its value from the previous run, so this counts invocations
    var x = (x || 0) + 1;
    log.error('this is round: ' + x);
    session.transfer(flowFile, REL_SUCCESS);
}
Using this script inside the processor will result in something along these lines being logged:
...
ExecuteScript[id=...] this is round: 3
ExecuteScript[id=...] this is round: 2
ExecuteScript[id=...] this is round: 1
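Why this works can be illustrated outside NiFi. The sketch below is an analogy only, not NiFi code: it assumes Node.js and uses its `vm` module as a stand-in for the processor's script engine, evaluating the same script text repeatedly in one long-lived context. Because a top-level `var x` becomes a property of that context, `(x || 0) + 1` picks up the value left behind by the previous run.

```javascript
// Analogy only: Node.js `vm` stands in for the processor's script engine.
var vm = require('vm');

// One long-lived context, comparable to an engine that is kept between runs.
var engineScope = vm.createContext({});

// Same statement as in the script above; re-evaluated on every "trigger".
var scriptText = 'var x = (x || 0) + 1; x;';

function trigger() {
  // Returns the completion value of the script, i.e. the current x.
  return vm.runInContext(scriptText, engineScope);
}

var rounds = [trigger(), trigger(), trigger()];
console.log(rounds); // [ 1, 2, 3 ]

// A fresh context starts counting from 1 again.
var afterReset = vm.runInContext(scriptText, vm.createContext({}));
console.log(afterReset); // 1
```

The last two lines show the flip side: as soon as the surrounding context is replaced, the state is gone.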
updating the file at most every x time units
I borrowed the base code from the existing NiFi ValidateXML processor.
The basic idea is to update the file when
- it is not set yet or
- at least x units of time have passed since last update
The following code achieves this, where SCHEMA_FILE_PATH is the path to the schema file. In this case x is thirty seconds:
// type definitions
var File = Java.type("java.io.File");
var FileNotFoundException = Java.type("java.io.FileNotFoundException");
var System = Java.type("java.lang.System");

// constants
var SCHEMA_FILE_PATH = "/foo/bar"; // exchange with real path
var timeoutInMillis = 30 * 1000; // 30 seconds

// initialize (values from a previous run are kept if present)
var schemaFile = schemaFile || null;
var lastUpdateMillis = lastUpdateMillis || 0;
var flowFile = session.get();

// (re)create the file handle and remember when that happened
function updateSchemaFile() {
    schemaFile = new File(SCHEMA_FILE_PATH);
    if (!schemaFile.exists()) {
        throw new FileNotFoundException("Schema file not found at specified location: " + schemaFile.getAbsolutePath());
    }
    lastUpdateMillis = System.currentTimeMillis();
}

if (flowFile !== null) {
    var now = System.currentTimeMillis();
    // update if not set yet, or if the timeout has passed since the last update
    var schemaFileShouldBeUpdated = (schemaFile == null) || ((lastUpdateMillis || 0) + timeoutInMillis) < now;
    if (schemaFileShouldBeUpdated) {
        updateSchemaFile();
    }
    // TODO Do with the file whatever you want
    log.error('was file updated this round? ' + schemaFileShouldBeUpdated + '; last update millis: ' + lastUpdateMillis);
    session.transfer(flowFile, REL_SUCCESS);
}
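Note that the script above caches a `java.io.File` handle, whereas the question asks about the file content. A minimal sketch of the same refresh rule applied to the content itself could look like this. It is plain JavaScript with the reader and the clock passed in, so that it runs outside NiFi; `createCachedLoader`, `loadContent` and `now` are hypothetical names introduced for illustration, and inside ExecuteScript the loader would have to read the schematron file through Java classes.

```javascript
// Hypothetical helper: caches whatever loadContent() returns for ttlMillis.
function createCachedLoader(loadContent, ttlMillis, now) {
  var cached = null;        // last loaded content
  var lastUpdateMillis = 0; // time of the last successful load

  return function getContent() {
    var current = now();
    // same rule as above: not loaded yet, or older than the timeout
    var stale = cached === null || lastUpdateMillis + ttlMillis < current;
    if (stale) {
      cached = loadContent(); // an exception here leaves the old state untouched
      lastUpdateMillis = current;
    }
    return cached;
  };
}

// Usage with a fake clock and a fake reader, to show when reloads happen.
var fakeTime = 1000;
var loadCount = 0;
var getSchema = createCachedLoader(
  function () { loadCount += 1; return 'schema v' + loadCount; },
  30 * 1000,
  function () { return fakeTime; }
);

var first = getSchema();  // nothing cached yet, loads
fakeTime += 10 * 1000;
var second = getSchema(); // 10 s later, served from the cache
fakeTime += 30 * 1000;
var third = getSchema();  // 40 s after the load, reloads
console.log(first, second, third, loadCount); // schema v1 schema v1 schema v2 2
```

In the processor, the loader itself would presumably need to be kept the same way as `schemaFile` above, for example `var getSchema = getSchema || createCachedLoader(...)`, so that it is not recreated on every run.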
DISCLAIMER
I cannot tell if, let alone when, the variable/s may be purged. Inspecting the source code used in the ExecuteScript processor indicates that the script file is reloaded periodically. I am not sure about the consequences of that.
I also haven't tried any of the other supported scripting languages, as I'm most familiar with JavaScript.
Answer 2:
In a Groovy script you can declare a class with static variables, so they will definitely keep their state once the processor has started.
Additionally, to manage the initialization of those static variables, you can use the ExecuteGroovyScript processor's ability to intercept processor start and stop.
In the following example I compare the flow file content to a file on disk, because I'm not familiar with schematron.
import org.apache.nifi.processor.ProcessContext

// holder for values that should survive between flow files
class Cache {
    static String validatorText = null
}

// this function is called on processor start, so you can't use a flow file in it
static void onStart(ProcessContext context) {
    // init the cached (static) variable from the file
    Cache.validatorText = new File('/path/to/validator.txt').getText('UTF-8')
    println "onStart ${context}"
}

// process the flow file and compare it to `Cache.validatorText`
def ff = session.get()
if (!ff) return
def ffText = ff.read().getText("UTF-8")
// the assertion fails if the content differs from the cached text
assert ffText == Cache.validatorText
REL_SUCCESS << ff
Note: you could set Failure strategy = transfer to failure. In this case, on any error (including an assertion failure) the flow file will be redirected to REL_FAILURE without additional code.
Source: https://stackoverflow.com/questions/58959352/caching-file-content-inside-executescript-processor-of-apache-nifi