Skip to main content
Solved

Find and replace CSV file columns based on translation table in CData Arc?


Hi,

I have a flow and would like to take in and forward a csv file in said flow. In the flow, the end product should however have the 2nd column values translated. Say we have a python dict for translations:

translations: Dict[str, str] = {
"08.0": "S",
"08.1": "O",
"08.3": "S",
"08.4": "O",
"08.6": "S3",
"09.0": "T",
"10.5": "LS",
"10.6": "LS"
}

What I would like to do would be to check each row’s 2nd column if a key from this dictionary is present in the said 2nd column (substring, doesn’t have to be 100% match), and then change the row so that the 2nd column would be rewritten to the value of the key found from the translation dictionary.

If the row’s 2nd column’s text doesn’t match to any of the dictionary keys, the row should be omitted completely.

 

I can do this really easily in Python. But how can I achieve this in CData Arc, should I use the ArcScript or create a python-based program that exposes REST API…? The latter sounds a little too much for this simple of a task.

 

Can I write python snippets inside CData Arc and call those snippets as parts of my flow?

There is not a way to run a simple snippet of Python code in a standalone ArcScript to create a simple function (we are exploring Python integration via a dedicated connector in future releases), but you can create a standalone python shell script that you can execute using the sysExecute operation:

 

<!-- Execute python File -->
<arc:set attr="Name" value="C:\\Temp\\scripts\\Test.py" />
<arc:set attr="Arguments" value="Filename or lookup value" />
<arc:call op="sysExecute" out="sys">
<!-- the stdoutput of the script can be read here -->
sys.sys:output]
</arc:call>

 

The most design friendly way of doing this that is consistent with the mapping tools in CData Arc would be to create a simple table like a SQLite or Derby database that contained the contents of the dictionary, and perform that lookup as part of a Lookup action in Arc. The CSV connector will convert your CSV to XML, and the designed with the Lookup action will allow you to lookup each matching value from the database with its entry in the table (using column 1 as the key). Depending on the size of the dictionary, this is probably the fastest approach, but it would take several steps:

  1. CSV connector to convert the original CSV to XML
  2. A Database connector to lookup the value of each row from the table (here, you can also select the default behavior for a failed match)
  3. An XML Map connector to remove the original lookup column, if undesired
  4. A final CSV connector to convert the result to CSV again

 

 


There is not a way to run a simple snippet of Python code in a standalone ArcScript to create a simple function (we are exploring Python integration via a dedicated connector in future releases), but you can create a standalone python shell script that you can execute using the sysExecute operation:

 

<!-- Execute python File -->
<arc:set attr="Name" value="C:\\Temp\\scripts\\Test.py" />
<arc:set attr="Arguments" value="Filename or lookup value" />
<arc:call op="sysExecute" out="sys">
<!-- the stdoutput of the script can be read here -->
sys.sys:output]
</arc:call>

 

The most design friendly way of doing this that is consistent with the mapping tools in CData Arc would be to create a simple table like a SQLite or Derby database that contained the contents of the dictionary, and perform that lookup as part of a Lookup action in Arc. The CSV connector will convert your CSV to XML, and the designed with the Lookup action will allow you to lookup each matching value from the database with its entry in the table (using column 1 as the key). Depending on the size of the dictionary, this is probably the fastest approach, but it would take several steps:

  1. CSV connector to convert the original CSV to XML
  2. A Database connector to lookup the value of each row from the table (here, you can also select the default behavior for a failed match)
  3. An XML Map connector to remove the original lookup column, if undesired
  4. A final CSV connector to convert the result to CSV again

 

 

Thank you! With your pointer I was able to make it work using the sysExecute command. It feels a bit hacky, but gets the job done. A standalone Python Connector would be nice (run a simple python function or something). I actually had to call python as follows;

<arc:set attr="name" value="C:\\Program Files\\Python312\\python.exe" />
<arc:set attr="arguments" value="C:\\CDataArcScriptData\\translate.py C:\\test.csv C:\\out.csv" />
<arc:call op="sysExecute">
</arc:call>

(my python program takes two params, input and output csv filenames).

 

Is there a way to input these params to the script dynamically (script input and output file names)?


Yes, ArcScript supports nested evaluation of expressions in square braces, so you could do something like: 

 

<!-- use other code to get at these -->
<arc:set attr="tmp.infile" value="C:\\test.csv" />
<arc:set attr="tmp.outfile" value="C:\\out.csv" />

<!-- call your script -->
<arc:set attr="name" value="C:\\Program Files\\Python312\\python.exe" />
<arc:set attr="arguments" value="C:\\CDataArcScriptData\\translate.py ytmp.infile] ]tmp.outfile]" />
<arc:call op="sysExecute">
</arc:call>

 


“we are exploring Python integration via a dedicated connector in future releases” - if you can add a JS Connector too, that would help those, like me, who struggle with ArcScript 😀


Reply