How to use ExecuteScript (with python as a script engine) for an exercise to add numbers? [Novice user trying to learn NiFi]

故里飘歌 2021-01-20 14:14

I am relatively new to NiFi and am not sure how to do the following correctly. I would like to use the ExecuteScript processor (script engine: python) to do the fol

1 answer
  • 2021-01-20 15:01

    Your script does not do what you want it to do. There are a couple of approaches to this problem:

    1. Operate on the whole flowfile at once with a script that iterates over the rows in the CSV content
    2. Treat the rows in the CSV content as a "record" and operate on each record with a script that handles a single line
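    To make the second approach concrete, here is a minimal sketch in plain Python 3 of logic that handles a single record. It is not NiFi API: the function name `add_total` is hypothetical, the record is modelled as a plain dict, and `csv.DictReader` only stands in for whatever record reader the flow would use. The sample row is assumed for illustration.

```python
import csv
import io


def add_total(record):
    """Return a copy of one record with a 'total' field holding the sum of its values."""
    result = dict(record)
    # Every field is assumed to hold an integer; int() raises ValueError otherwise
    result["total"] = sum(int(value) for value in record.values())
    return result


# csv.DictReader stands in for a record reader: it yields one dict per CSV row
rows = list(csv.DictReader(io.StringIO("first,second,third\n1,4,9\n")))
print(add_total(rows[0]))
```

    The function only ever sees one row, so header handling and iteration stay outside it; in the first approach the script has to do both itself.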

    I will provide changes to your script to handle the entire flowfile content at once; the Record* processors, which cover the second approach, are described in the NiFi documentation.

    Here is a script which performs the action you expect. Note the differences to see where I changed things (this script could certainly be made more efficient and concise; it is verbose to demonstrate what is happening, and I am not a Python expert).

    from java.io import BufferedReader, InputStreamReader
    from org.apache.nifi.processor.io import StreamCallback
    
    # This PyStreamCallback class is what the processor will use to ingest and output the flowfile content
    class PyStreamCallback(StreamCallback):
      def __init__(self):
            pass
      def process(self, inputStream, outputStream):
          try:
            # Get the provided inputStream into a format where you can read lines
            reader = BufferedReader(InputStreamReader(inputStream))
            # Set a marker for the first line to be the header
            isHeader = True        
            try:
              # A holding variable for the lines
              lines = []
              # Loop indefinitely
              while True:
                # Get the next line
                line = reader.readLine()
                # If there is no more content, break out of the loop
                if line is None:
                  break
                # If this is the first line, add the new column
                if isHeader:
                  header = line + ",total"
                  # Write the header line and the new column
                  lines.append(header)
                  # Set the header flag to false now that it has been processed
                  isHeader = False
                else:
                  # Split the line (a string) into individual elements by the ',' delimiter
                  elements = self.extract_elements(line)
                  # Get the sum (this method is unnecessary but shows where your "summation" method would go)
                  total = self.summation(elements)
                  # Write the output of this line
                  newLine = ",".join([line, str(total)])
                  lines.append(newLine)
    
              # Now out of the loop, write the output to the outputStream
              output = "\n".join([str(l) for l in lines])
              outputStream.write(bytearray(output.encode('utf-8')))
    
            finally:
                if reader is not None:
                    reader.close()
    
          except Exception as e:
            log.warn("Exception in Reader")
            log.warn('-' * 60)
            log.warn(str(e))
            log.warn('-' * 60)
            # Re-raise so the failure propagates to the processor instead of being swallowed
            raise
    
      def extract_elements(self, line):
        # This splits the line on the ',' delimiter and converts each element to an integer, and puts them in a list
        return [int(x) for x in line.split(',')]
    
      # This method replaces your "summation" method and can accept any number of inputs, not just 3
      def summation(self, values):
        # This returns the sum of all items in the list
        return sum(values)
    
    
    flowFile = session.get()
    if flowFile is not None:
      flowFile = session.write(flowFile,PyStreamCallback())
      session.transfer(flowFile, REL_SUCCESS)
    

    Result from my flow (using your input in a GenerateFlowFile processor):

    2018-07-20 13:54:06,772 INFO [Timer-Driven Process Thread-5] o.a.n.processors.standard.LogAttribute LogAttribute[id=b87f0c01-0164-1000-920e-799647cb9b48] logging for flow file StandardFlowFileRecord[uuid=de888571-2947-4ae1-b646-09e61c85538b,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1532106928567-1, container=default, section=1], offset=2499, length=51],offset=0,name=470063203212609,size=51]
    --------------------------------------------------
    Standard FlowFile Attributes
    Key: 'entryDate'
        Value: 'Fri Jul 20 13:54:06 EDT 2018'
    Key: 'lineageStartDate'
        Value: 'Fri Jul 20 13:54:06 EDT 2018'
    Key: 'fileSize'
        Value: '51'
    FlowFile Attribute Map Content
    Key: 'filename'
        Value: '470063203212609'
    Key: 'path'
        Value: './'
    Key: 'uuid'
        Value: 'de888571-2947-4ae1-b646-09e61c85538b'
    --------------------------------------------------
    first,second,third,total
    1,4,9,14
    7,5,2,14
    3,8,7,18
    
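    The content at the end of that log can be cross-checked outside NiFi. The sketch below is plain Python 3, not the Jython script: `append_totals` is a hypothetical helper that mirrors the shape of `process(inputStream, outputStream)` with `io.BytesIO` stand-ins, and the sample input is an assumption reconstructed from the output above.

```python
import io


def append_totals(input_stream, output_stream):
    """Read CSV bytes, append a total column, and write the result as UTF-8 bytes."""
    text = input_stream.read().decode("utf-8")
    out_lines = []
    for index, line in enumerate(text.splitlines()):
        if index == 0:
            # The first line is the header: add the new column name
            out_lines.append(line + ",total")
        elif line.strip():
            # Data line: sum the integer fields and append the result
            total = sum(int(x) for x in line.split(","))
            out_lines.append(line + "," + str(total))
    output_stream.write("\n".join(out_lines).encode("utf-8"))


# Assumed input, reconstructed from the result shown in the log
sample = b"first,second,third\n1,4,9\n7,5,2\n3,8,7"
sink = io.BytesIO()
append_totals(io.BytesIO(sample), sink)
result = sink.getvalue()
print(result.decode("utf-8"))
print(len(result))
```

    With this input the output has the same four lines as the log, and its byte length equals the `fileSize` attribute reported there (51).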