Removing duplicate lines from a text file

  • 4 August 2023

This script was co-authored by Russell of Jersey Post:

https://community.cdata.com/members/russell-jerseypost-190

 

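The following code reads a multiline file, such as a .txt or .csv, line by line and outputs a new file containing only the unique lines. Each line is keyed by its MD5 hash, so a line is kept the first time it appears and skipped on every later occurrence: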

 

<arc:set attr="file.file" value="[FilePath]" />

<arc:call op="fileReadLine" item="file">
  <!-- read the file line by line; store a line only if its hash has not been seen yet -->
  <arc:check attr="newrows.[file.file:data|md5hash(false)]">
    <arc:else>
      <arc:set item="newrows" attr="[file.file:data|md5hash(false)]" value="[file.file:data]" />
      <arc:set attr="tmp.keys#" value="[file.file:data|md5hash(false)]" />
    </arc:else>
  </arc:check>
</arc:call>

<!-- rebuild the output from the stored lines, in the order they were first seen -->
<arc:set attr="output.data">
<arc:enum attr="tmp.keys">[newrows.[_value]]\n</arc:enum>
</arc:set>

<arc:push item="output" />
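
For anyone who wants to check the logic outside of Arc, here is a rough Python sketch of the same idea. It is not ArcScript and not part of the original script; the function names `unique_lines` and `dedupe_file` are made up for illustration, and it assumes the file is UTF-8 text. The `seen` dictionary plays the role of `newrows` and the `keys` list plays the role of `tmp.keys`.

```python
import hashlib


def unique_lines(lines):
    """Return the lines with duplicates removed, keeping first occurrences in order."""
    seen = {}  # MD5 hex digest -> original line (the role of "newrows" above)
    keys = []  # digests in first-seen order (the role of "tmp.keys" above)
    for line in lines:
        digest = hashlib.md5(line.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen[digest] = line
            keys.append(digest)
    # Rebuild the rows from the stored digests, as the enum step does.
    return [seen[key] for key in keys]


def dedupe_file(src_path, dst_path):
    """Hypothetical helper: write the unique lines of src_path to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        lines = src.read().splitlines()
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in unique_lines(lines):
            dst.write(line + "\n")
```

Calling `unique_lines(["a", "b", "a"])` keeps `"a"` and `"b"` in that order. Comparison is exact, so two lines that differ only in case or trailing whitespace count as different lines.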

 

 


1 reply


@russell-jerseypost - Thank you for your help with this one. 
