Joachim Rees

GNU Tools for checking input files: using awk to check for duplicate keys

Last time I made the case that it’s a good idea to check your input files before a migration. I wrote about how you can use the uniq command to check for duplicate lines in your input files in Using GNU tools to quickly check your input files – duplicates lines.


But maybe you don’t want to check complete lines, only whether certain field combinations (e.g. key fields!) appear more than once.


Let’s say you have a file like this:


-> each line is clearly unique; however, if MATNR and LGORT are key fields, we have a problem.
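The original example file is not preserved here, so the following is a hypothetical stand-in. Its layout (record number; plant; MATNR; quantity; LGORT) is an assumption, chosen only so that MATNR is field 3, LGORT is field 5, and the key values match the output shown below. A whole-line duplicate check on it finds nothing:

```shell
# Hypothetical sample file. Assumed layout: record no.;plant;MATNR;quantity;LGORT
# (only "MATNR = field 3" and "LGORT = field 5" matter for the checks below).
cat > stock.csv <<'EOF'
0001;P100;MAT12;10;1000
0002;P100;MAT12;5;1000
0003;P100;MAT12;7;1200
0004;P100;MAT45;3;1200
EOF

# Whole-line check: every line differs somewhere, so this counts 0 duplicates.
sort stock.csv | uniq -d | wc -l
```

The key pair MAT12/1000 still appears twice, which is exactly what a whole-line comparison cannot see.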


We can find out with the help of awk (I’m using gawk):



gawk -F';' '{ print $3, $5 }' [filename]


-> it reads the file, interprets ";" as the field separator (-F) and then prints the 3rd and 5th fields ($3, $5). The comma between them makes awk insert the "output field separator" (OFS), which is a single space by default.
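To see what the comma in print $3, $5 actually does, here is a small comparison on one hypothetical record (same assumed layout as above). It uses plain awk; the options shown (-F and -v OFS=...) are standard and behave the same in gawk:

```shell
# One hypothetical record; assumed layout puts MATNR in field 3 and LGORT in field 5.
line='0001;P100;MAT12;10;1000'

# With a comma, awk puts OFS (default: one space) between the fields.
printf '%s\n' "$line" | awk -F';' '{ print $3, $5 }'             # MAT12 1000

# Without a comma, the fields are simply concatenated.
printf '%s\n' "$line" | awk -F';' '{ print $3 $5 }'              # MAT121000

# OFS can be set explicitly, e.g. to keep the semicolon in the output.
printf '%s\n' "$line" | awk -F';' -v OFS=';' '{ print $3, $5 }'  # MAT12;1000
```

Keeping a separator matters for duplicate checks: without it, MAT1 + 21000 and MAT12 + 1000 would both become MAT121000 and show up as a false duplicate.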


So the output in the example is:

MAT12 1000
MAT12 1000
MAT12 1200
MAT45 1200



-> since we now have only the key values we want to compare, we can simply pipe them into the uniq -d we already know to see whether there are any duplicates.

(and as this might be a lot of lines, we just count them with wc -l)
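One caveat about the uniq -d step: uniq only compares each line with the line directly before it, so duplicates that are not adjacent slip through unless the input is sorted first. A minimal demonstration with a hypothetical key list:

```shell
# Hypothetical key list: the two "MAT12 1000" lines are NOT next to each other.
keys='MAT12 1000
MAT12 1200
MAT12 1000
MAT45 1200'

# uniq only compares adjacent lines, so it misses the duplicate: counts 0.
printf '%s\n' "$keys" | uniq -d | wc -l

# Sorting first brings identical keys together: counts 1.
printf '%s\n' "$keys" | sort | uniq -d | wc -l
```

Note also that uniq -d prints each duplicated key only once, so wc -l counts distinct duplicated keys, not the number of surplus lines.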


So here is our one-liner for this task:


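gawk -F';' '{ print $3, $5 }' [filename] | sort | uniq -d | wc -l


(The sort is needed because uniq only detects duplicate lines that are directly next to each other.)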


(-> if it’s 0, everything is fine!)
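If you prefer to skip the sort, awk can do the duplicate check on its own with an associative array. This is an alternative technique, not the approach of the one-liner above; the sample records and the field layout are again hypothetical. It prints each duplicated key once, plus a count at the end:

```shell
# Hypothetical sample data (assumed layout: MATNR = field 3, LGORT = field 5).
printf '%s\n' \
  '0001;P100;MAT12;10;1000' \
  '0002;P100;MAT12;5;1000' \
  '0003;P100;MAT12;7;1200' \
  '0004;P100;MAT45;3;1200' |
awk -F';' '
  {
    key = $3 SUBSEP $5        # join the key fields with SUBSEP so MAT1/21000 and MAT12/1000 stay distinct
    if (++seen[key] == 2) {   # report each duplicated key once, on its 2nd occurrence
      print $3, $5
      dups++
    }
  }
  END { print dups + 0, "duplicate key(s)" }
'
```

For the sample above this prints MAT12 1000 followed by 1 duplicate key(s). Because awk keeps every distinct key in memory, the sort | uniq -d pipeline may be the safer choice for very large files.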
