Mapping a document with parent-child relationships to CSV (larger files)

  • 24 March 2023
  • 0 replies

This is a continuation of the discussion in "Mapping a document with parent-child relationships to CSV (small files)".

In the previous discussion, relative XPaths were used in the XML Map to map header elements onto line values. That approach becomes impractical when the input XML is very large, as is often the case with inventory reports and health care claim reports. Loading the full XML document model for a large file can require more memory than is available, or at least significantly degrade performance.


Fortunately, the XML Map has a feature for processing large XML files efficiently: XML Streaming, available in the Advanced tab.


When XML Streaming is enabled, only the subtree for the element currently being processed is available, so the connector loads just the relevant chunk of XML at a time.
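
To make the idea of subtree-at-a-time processing concrete, here is a minimal Python sketch. It is only an analogy for what streaming means, not how the connector is implemented. The sample document and the function name stream_customer_names are assumptions made up for illustration; only the Orders and CustomerName element names come from this discussion.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample input; the <Report> root is an assumption for illustration.
SAMPLE = b"""<Report>
  <Orders><CustomerName>Alice</CustomerName></Orders>
  <Orders><CustomerName>Bob</CustomerName></Orders>
</Report>"""


def stream_customer_names(source):
    """Yield the CustomerName of each Orders element, one subtree at a time."""
    # iterparse reads the input incrementally instead of building the whole tree first.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Orders":
            # At this point only the finished Orders subtree is needed.
            yield elem.findtext("CustomerName")
            # Drop the subtree's children and text before moving on.
            elem.clear()


names = list(stream_customer_names(io.BytesIO(SAMPLE)))
```

Because each Orders subtree is cleared once it has been handled, memory use stays tied to the size of one order rather than the size of the whole file.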


With streaming enabled, the relative XPaths from our previous example are not available, so how can we access values that belong to a parent element? To do this, we will use the New Loop feature of the XML Map to create virtual loops that we can traverse without mapping each Foreach loop to a parent node in the destination.


First, let’s reset our mapping:



This time, we’re going to create a new virtual loop around the OrderLines element - this will allow us to map against the Orders in the source file first. To do this, right-click on the OrderLines element and select New Loop:




This will create a Loop that doesn’t correspond to a new element in the destination:




Now, we can create a Foreach relationship to the Orders element at the Loop, and a nested Foreach to the Items level for the OrderLines element like so:



This loops through each Orders element without creating a new child in the destination, and then adds a child for each Items element. Between the Foreach on the Orders element and the inner Foreach on the Items element, the elements of the current order in the source are available.


Now, we will take advantage of another feature of the XML Map connector: the ability to store values in memory through the special _map item in ArcScript. Under the Orders loop, create 2 New Script elements, and use a bit of code like this to store the value at an XPath as an attribute in memory:

<arc:set attr="_map.CustomerName" value="[xpath(CustomerName)]" />

Do this for every element that needs to be read at the Orders loop:



These attributes in the map item can be evaluated in an expression later at the Items loop:



In the final result, the values are stored in the Orders loop, where the elements of the parent are available, and recalled in the child loop. Each time the Orders loop is entered again, the stored attributes are recalculated.
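
The store-and-recall pattern described above can be sketched outside of Arc as well. The following Python example is a conceptual analogy only: the stored dictionary plays the role of the _map item, and the sample document, the Sku element, and the function name orders_to_csv are assumptions invented for illustration. It also assumes that the header elements of an order appear before its Items elements in the source.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample input; <Report> and <Sku> are assumptions for illustration.
SAMPLE = b"""<Report>
  <Orders>
    <CustomerName>Alice</CustomerName>
    <Items><Sku>A1</Sku></Items>
    <Items><Sku>A2</Sku></Items>
  </Orders>
  <Orders>
    <CustomerName>Bob</CustomerName>
    <Items><Sku>B1</Sku></Items>
  </Orders>
</Report>"""


def orders_to_csv(source):
    """Write one CSV row per Items element, repeating the parent's CustomerName."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    stored = {}  # stands in for the _map item: values kept across the inner loop
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "Orders":
            # Entering the outer loop again: discard the previous order's values.
            stored.clear()
        elif event == "end" and elem.tag == "CustomerName":
            # Outer loop: store the header value, but emit no row.
            stored["CustomerName"] = elem.text
        elif event == "end" and elem.tag == "Items":
            # Inner loop: recall the stored value and emit one row per item.
            writer.writerow([stored.get("CustomerName"), elem.findtext("Sku")])
            elem.clear()
        elif event == "end" and elem.tag == "Orders":
            elem.clear()
    return out.getvalue()


csv_text = orders_to_csv(io.BytesIO(SAMPLE))
```

As in the mapping, the outer loop produces no output of its own; it only refreshes the stored values that the inner loop reads.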


Attached to this discussion is the arcflow generated by the steps above. To see this flow in action, click More->Create Test Files in the Input tab of the OrderToCSVStreaming connector.
