Remove Duplicate Rows From A Text File Using PowerShell
If your text file is already sorted... then removing duplicates is very easy.
PS:\> gc $filename | get-unique > $newfileName
(But remember, the Get-Unique command only works on sorted data!)
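To see why that caveat matters, here's a quick illustration of my own: Get-Unique only collapses *adjacent* duplicates, so on unsorted input any repeats that aren't next to each other survive:

PS:\> 'Dog','Dog','Apple','Dog' | get-unique
Dog
Apple
Dog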
If the file's content is not sorted, and the final order of the lines is unimportant, then it's also easy....
Sort it -- and then use Get-Unique:
gc $filename | sort | get-unique > $newfileName
(You now end up with a file that is sorted, and where every line is unique)
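As an aside... Sort-Object also has a -Unique switch, so I believe this does the sort and the de-duplication in a single step:

PS:\> gc $filename | sort -unique > $newfileName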
However... the case that one bumps into is always the tricky case...
If the file data is not 'sorted', but the order *is* important... then it's a little trickier. I've got an approach... let's turn it into a solution.
Remove Duplicate Rows From A Text File Using PowerShell... unsorted file, where order is important
I'm going to add each line to a hash table.
But before adding it -- I'll check if the line is already in the hash table.
If it's not there yet -- then I'll send that line into the new file. Here's an example:
PS H:\> $hash = @{}   # define a new empty hash table
PS H:\> gc c:\rawlist.txt |
>> %{ if ($hash.$_ -eq $null) { $_ }; $hash.$_ = 1 } >
>> c:\newlist.txt
>>
I test it out... given this input:
Apple
Dog
Dog
Carrot
Banana
Fun
Dog
Apple
Egg
Carrot
Egg
I get this output...
Apple
Dog
Carrot
Banana
Fun
Egg
Okay... I thought that was going to be really hard. Huh.
One more thing to do... see if I can comment this a little better...
PS H:\> $hash = @{}            # Define an empty hashtable
PS H:\> gc c:\rawlist.txt |    # Send the content of the file into the pipeline...
>> % {                         # For each object in the pipeline...
>>                             #   note '%' is an alias of 'foreach-object'
>>   if ($hash.$_ -eq $null) { # if that line is not a key in our hashtable...
>>                             #   note -eq means 'equals'
>>                             #   note $_ means 'the data we got from the pipe'
>>                             #   note $null means NULL
>>     $_                      # ... send that line further along the pipe
>>   };
>>   $hash.$_ = 1              # Add that line to the hash (so we won't send it again)
>>                             #   note that the value isn't important here,
>>                             #   only the key. ;-)
>> } > c:\newlist.txt          # finally... redirect the pipe into a new file.
>>
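If you find yourself doing this a lot, the same hashtable trick can be wrapped up in a little function. This is just a sketch of mine (the name Remove-DuplicateLines is made up, not a built-in cmdlet), and it uses ContainsKey for the lookup, which sidesteps the rare case where a line of text happens to match a property name on the hashtable itself:

function Remove-DuplicateLines {
    param(
        [string]$Path,       # file to read from
        [string]$OutPath     # file to write the de-duplicated lines to
    )
    $seen = @{}              # tracks every line we've already emitted
    Get-Content $Path |
        ForEach-Object {
            if (-not $seen.ContainsKey($_)) {
                $seen[$_] = $true
                $_           # first sighting: pass the line down the pipe
            }
        } |
        Set-Content $OutPath
}

# usage (hypothetical paths):
Remove-DuplicateLines -Path c:\rawlist.txt -OutPath c:\newlist.txt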
By the way, my tools NimbleText and NimbleSET make removing duplicates from a list (or a file) even easier.