
Improving Connector to Connector Performance for Flows in CData Arc using FlowExecute

  • 23 August 2024

In 2022, Arc introduced Flow API functionality, which publishes an entire flow as an API endpoint that users authenticated through the Admin API can call. Because the entire flow is processed before the API response is returned, a flow executed through a Flow API typically completes faster than the same flow run by normal automation.

 

In a standard flow, messages are processed on disk from one connector to the next. The application's worker threads are distributed among connectors in a round-robin fashion. A flow starts when its first connector is assigned worker threads. That connector downloads or processes the file, writes it to the input of the next connector, and then releases its worker threads. Arc’s automation cycles through all the connectors in all workspaces, but in no guaranteed order.

 

While the application uses multiple threads to process many connectors in parallel, connectors are not guaranteed to process in the order that they appear in the flow. The next connector in the flow processes the file only once the application assigns worker threads to it. That connector then moves the file to the connector after it, frees up its resources, and so on.

 

Flow API processing is different in that a single message is processed through multiple connectors as a single consecutive process, rather than each discrete connector running independently with its own automation. Documents processed by a Flow API flow are held in memory and processed across all the connectors in the flow using the same worker thread. In a flow involving many connectors, this can improve overall performance because Arc does not have to write the message to disk multiple times or wait for resources to be distributed to each connector.

 

Typically, these performance benefits are only available through the Flow API. That means interacting with your flows only through the API endpoint and being limited in the types of connectors you can incorporate into these flows (Flow API unsupported connectors). However, in the final release of Arc 2023 and now in Arc 2024, you can have almost any preexisting flow processed in the same way as a Flow API flow by using the FlowExecute operation.

 

NOTE: As with the Flow API, the FlowExecute operation requires a license tier of Professional or higher for access to the Admin API features. The authtoken used to invoke the operation must come from an Arc user account with Admin API access: https://cdn.cdata.com/help/AZK/mft/User-Roles.html#admin-api-access

 

An example of the inputs and outputs available to this operation in a Script connector is shown below. In your script, you need to define the first connector in the flow by specifying the connector's name and, unless the flow is in the Default workspace, the workspace where it can be found. You can optionally define a filename for the file that you will process through the flow, as well as the file contents. The operation can then be called with your Admin API user and authtoken through the connector.rsc/flowexecute endpoint.

 

<!--Required:Define first connector in the flow--> 

<arc:set attr="flow.connectorid" value="ScriptUpdateFilename"/> 

<!--Define workspace where flow is located, if not set this will use the Default workspace--> 

<arc:set attr="flow.workspaceid" value="FlowExecute"/> 

<!--Set filename of file to pass through flow as messagename attribute--> 

<arc:set attr="flow.messagename" value="[Filename]"/>

<!--Get contents of message to pass to flow and set as messagedata attribute--> 

<!--Note this loads the contents of the file as a string--> 

<arc:set attr="in.file" value="[filepath]" />

<arc:call op="fileRead" in="in" out="out" >

  <arc:set attr="flow.messagedata" value="[out.file:data]" />

</arc:call> 

<!--Set a header and its value preserving the original filename--> 

<arc:set attr="flow.headername#1" value="origfilename"/> 

<arc:set attr="flow.headervalue#1" value="[filename]"/>

  

<arc:call op="connector.rsc/flowexecute" authtoken="YourUser:YourAdminAPIToken" in="flow" out="results"> 

  <!-- optional logging step to see the flow result state in the application log (Success/Error/Warning/Pending/Skipped)--> 

  <arc:set attr="_log.info" value="Message Status: [results.result]" />

  <!-- optional logging step to see the output message data if applicable in the application log --> 

  <arc:set attr="_log.info" value="Final Flow Output: [results.messagedata]" />

  <!-- optional logging step to see the ID of the last connector in the flow in the application log --> 

  <arc:set attr="_log.info" value="Last Connector ID: [results.lastconnectorid]" />

  <!-- optional logging step to see the ID of the last workspace in the flow in the application log -->  

  <arc:set attr="_log.info" value="Last Workspace ID: [results.lastworkspaceid]" />

  <!-- optional logging step to see the messageID in the application log -->  

  <arc:set attr="_log.info" value="Message ID: [results.messageid]" />

  <!-- optional logging step to see if an error was returned in the application log --> 

  <arc:set attr="_log.info" value="If an Error was encountered this is the result: [results.ErrorMessage | def]" />

</arc:call> 

 

After the call, you can optionally reference several output attributes for information about the request you made to the FlowExecute endpoint: the result state (results.result), the output message data (results.messagedata), the last connector where your file was processed (results.lastconnectorid), the workspace of that connector (results.lastworkspaceid), the message ID of your file (results.messageid), and the contents of any error message, if applicable (results.ErrorMessage).
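
As one possible way to act on these output attributes, the sketch below swaps the logging steps for a status check: it logs the last connector when the flow reports Success and raises an error for any other state. Treat it as a minimal, untested sketch. It assumes the result attribute contains exactly the string Success on completion and that the output attributes can be read inside the arc:call body, as the logging steps in the example do. The FlowExecuteFailed error code is a hypothetical name chosen for illustration, and the connector and workspace names are the same placeholders used in the example.

```xml
<!-- Placeholder connector and workspace, as in the example above -->
<arc:set attr="flow.connectorid" value="ScriptUpdateFilename"/>
<arc:set attr="flow.workspaceid" value="FlowExecute"/>

<arc:call op="connector.rsc/flowexecute" authtoken="YourUser:YourAdminAPIToken" in="flow" out="results">
  <arc:equals attr="results.result" value="Success">
    <!-- Flow completed: record where the message ended up -->
    <arc:set attr="_log.info" value="Flow succeeded, last connector: [results.lastconnectorid]" />
  <arc:else>
    <!-- Any other state (Error/Warning/Pending/Skipped) fails this script. -->
    <!-- FlowExecuteFailed is a hypothetical error code, not one defined by Arc. -->
    <arc:throw code="FlowExecuteFailed" desc="Flow ended with status [results.result]: [results.ErrorMessage | def]" />
  </arc:else>
  </arc:equals>
</arc:call>
```

Failing on every state other than Success is deliberately strict. If a Warning or Pending result is acceptable in your flow, you could add further arc:equals checks for those states before throwing.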