Question
I have a pipeline in NiFi that pulls down some invalid JSON that I need to clean up. The best solution I've concocted is to run a Python script via ExecuteStreamCommand and clean and split it in one fell swoop. However, even though I call sys.stdout.write() in my for loop, only the original JSON comes out in the output stream in NiFi.
Am I misusing sys.stdout.write(), or is this possible and I've just done something wrong? My end goal is for each line of the JSON to be a new flow file, i.e. file 1 is {"fruit":"apple",..., file 2 is {"fruit":"cherry",..., and so on.
example JSON
{"fruit":"apple", "vegetable":"celery", "location":{"country":"nor\\way", "city":"oslo", }, "color":"blue"}
{"fruit":"cherry", "vegetable":"kale", "location":{"country":"france", "city":"calais", }, "color":"green"}
{"fruit":"peach", "vegetable":"peas", "location":{"country":"united\\kingdom", "city":"london", }, "color":"yellow"}
script
import json
import re
import sys

flow_file = sys.stdin.read()
try:
    load = json.loads(flow_file)
    sys.stdout.write(flow_file)
except:
    flow_file_esc = re.sub(r"[(\\)]", "", flow_file)
    for f in flow_file_esc.splitlines():
        sys.stdout.write(str(f))
Answer 1:
Can you clean the file first with ReplaceText and then split it with SplitJson, SplitRecord, or ForkRecord?
If you need to combine the two operations and want to script it, you could try ExecuteScript with Jython, since it doesn't look like you're using any native CPython libraries. I have some simple examples in my cookbook and my blog.
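For the clean-up half, here is a minimal sketch in plain Python of what the cleaning step might look like, assuming the only defects are the ones visible in the sample data (stray backslashes and a trailing comma before a closing brace). The names clean_line and write_records are hypothetical, not part of NiFi or of the asker's script. Note that, as far as I know, ExecuteStreamCommand puts everything the command writes to stdout into a single flow file, so calling sys.stdout.write() several times will not by itself produce several flow files; the sketch therefore writes one record per line, and a downstream split processor would still be needed to get one flow file per record.

```python
import io
import json
import re

# Assumption: these two patterns cover only the defects seen in the sample data.
# Removing every backslash would also destroy legitimate escapes such as \" or \n,
# and the trailing-comma pattern could match inside a string value.
BACKSLASHES = re.compile(r"\\+")              # stray backslashes, e.g. nor\\way
TRAILING_COMMA = re.compile(r",\s*([}\]])")   # a comma directly before } or ]


def clean_line(line):
    """Return the line as compact, valid JSON; raises ValueError if it still does not parse."""
    fixed = BACKSLASHES.sub("", line)
    fixed = TRAILING_COMMA.sub(r"\1", fixed)
    return json.dumps(json.loads(fixed), separators=(",", ":"))


def write_records(src, dst):
    """Read raw lines from src and write one cleaned JSON object per line to dst."""
    count = 0
    for line in src:
        if not line.strip():
            continue  # skip blank lines
        dst.write(clean_line(line) + "\n")  # the newline keeps the records separable
        count += 1
    return count


# Small demonstration on one line of the sample data, using in-memory streams.
raw = r'{"fruit":"apple", "location":{"country":"nor\\way", "city":"oslo", }, "color":"blue"}'
demo_out = io.StringIO()
write_records(io.StringIO(raw + "\n"), demo_out)
print(demo_out.getvalue(), end="")
```

When run as an ExecuteStreamCommand script, the call would be write_records(sys.stdin, sys.stdout) after import sys. The output is then newline-delimited JSON in one flow file, which one of the split processors mentioned above, or SplitText with a line count of 1 (my suggestion, not from the answer), could break into individual flow files.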
Source: https://stackoverflow.com/questions/60383529/using-sys-stdout-write-to-create-multiple-files-in-nifi