    Until dataset is 1 line
        Split dataset into two halves
        Take intersection of sets
        Store intersection in duplicate list
        Split each dataset into two datasets, and repeat
    end

    Open original dataset file
    Until EOD
        read line
        compare to list of known duplicates
        if in that list
            if duplicate flag not marked
                emit line to output
                mark duplicate as emitted
            endif
        else
            emit line on output
        endif
    end

####

    Sort a copy of the datafile
    Open sorted copy
    Until EOD
        Read line
        Compare to previous line
        If line == previous line
            if line not in duplicate table
                put line in duplicate table
            endif
        else
            previous line = line
        endif
    end

    Open original data file
    Until EOD
        read line
        if line in duplicate table
            if duplicate not marked
                emit line on output
                mark duplicate line
            end
        else
            emit line on output
        endif
    end
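
For illustration, the sort-based variant could be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions: the whole dataset is taken to fit in memory as a list of lines (the pseudocode works on files), and the names `find_duplicates` and `dedupe_preserving_order` are made up here. A set stands in for both the "duplicate table" and the "marked" flag.

```python
def find_duplicates(lines):
    """Pass 1: sort a copy and collect every line that occurs more than once."""
    duplicates = set()
    previous = None
    for line in sorted(lines):       # sorted() returns a copy; the original order is untouched
        if line == previous:
            duplicates.add(line)     # adding to a set is a no-op if it is already there
        else:
            previous = line
    return duplicates


def dedupe_preserving_order(lines):
    """Pass 2: walk the original order, emitting each duplicated line only once."""
    duplicates = find_duplicates(lines)
    emitted = set()                  # duplicates that have already been written out
    output = []
    for line in lines:
        if line in duplicates:
            if line not in emitted:
                output.append(line)
                emitted.add(line)
        else:
            output.append(line)      # unique lines always pass straight through
    return output
```

Calling `dedupe_preserving_order(["b", "a", "b", "c", "a"])` should keep the first occurrence of each line in its original position. For data too large for memory, the same two passes would presumably read from an externally sorted copy and the original file instead of lists.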