
Removing duplicate lines from a text file

  • August 4, 2023

James B

This script was co-authored with Russell of Jersey Post:

https://community.cdata.com/members/russell-jerseypost-190

 

The following code reads a multiline text file (such as a .txt or .csv) one line at a time and outputs a new file containing only the unique lines, kept in the order in which they first appear:

 

<arc:set attr="file.file" value="[FilePath]" />

<arc:call op="fileReadLine" item="file">
  <!-- read the file one line at a time; each line is keyed by its MD5 hash -->
  <arc:check attr="newrows.[file.file:data|md5hash(false)]">
    <arc:else>
      <!-- first time this line has been seen: store it and record its key so the original order is kept -->
      <arc:set item="newrows" attr="[file.file:data|md5hash(false)]" value="[file.file:data]" />
      <arc:set attr="tmp.keys#" value="[file.file:data|md5hash(false)]" />
    </arc:else>
  </arc:check>
</arc:call>

<!-- rebuild the output data from the stored unique lines, one per line -->
<arc:set attr="output.data">
<arc:enum attr="tmp.keys">[newrows.[_value]]\n</arc:enum>
</arc:set>

<arc:push item="output" />
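
For anyone who wants to sanity-check the logic outside of Arc, here is a rough Python sketch of the same idea. It is not ArcScript and was not part of the original script: each line is keyed by its MD5 hash, only the first line seen for each hash is kept, and the kept lines are written back out in their original order. The function names `unique_lines` and `dedupe_text` are made up for illustration.

```python
import hashlib


def unique_lines(lines):
    """Return the lines with duplicates removed, keeping first-seen order."""
    seen = {}  # MD5 hex digest -> first line seen with that digest
    for line in lines:
        key = hashlib.md5(line.encode("utf-8")).hexdigest()
        if key not in seen:
            # First occurrence: remember it. Dicts keep insertion order,
            # so the output order matches the input order.
            seen[key] = line
    return list(seen.values())


def dedupe_text(text):
    """Deduplicate a multiline string, ending every kept line with a newline."""
    return "".join(line + "\n" for line in unique_lines(text.splitlines()))
```

Calling `dedupe_text` on the contents of a file returns the deduplicated text, which can then be written to a new file. Like the script above, it compares whole lines exactly, so lines that differ only in whitespace or case are treated as different.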

 

 


1 reply

James B
  • Author
  • Employee
  • August 4, 2023

@russell-jerseypost - Thank you for your help with this one.