joachimrees1
Active Contributor


Last time I made the case that it's a good idea to check your input file prior to a migration. I wrote about how you can use the uniq command to check for duplicate lines in your input files in Using GNU tools to quickly check your input files - duplicates lines.

But maybe you don't want to compare complete lines, but only check whether certain field combinations (e.g. the key fields!) appear more than once.

 

Let's say you have a file like this:

ABC;XYZ;MATNR;DBBD;LGORT;SOMETHING_ELSE
12121;13213;MAT12;dfhsf;1000;sdfsdjhf
1sad21;13213;MAT12;dfhsf;1000;sdfsdjsadhf
12121;13213;MAT12;;1200;sdfsdjhf
121;13213;MAT45;;1200;sdfsdjhf

-> Each line as a whole is clearly unique; however, if MATNR and LGORT are key fields, then we have a problem: the combination MAT12/1000 appears twice.
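If you want to follow along, you can put the sample into a file yourself; a quick way from the shell (input.csv is just a name I'm assuming here):

cat > input.csv <<'EOF'
ABC;XYZ;MATNR;DBBD;LGORT;SOMETHING_ELSE
12121;13213;MAT12;dfhsf;1000;sdfsdjhf
1sad21;13213;MAT12;dfhsf;1000;sdfsdjsadhf
12121;13213;MAT12;;1200;sdfsdjhf
121;13213;MAT45;;1200;sdfsdjhf
EOF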

 

We can find out with the help of awk (I'm using gawk):

cat [filename] | gawk -F ';' '{ print $3, $5 }'

 

-> it reads the file, interpreting ";" as the field separator (-F ';'), then prints the 3rd and 5th fields ($3, $5), separated by the "output field separator" (OFS), which by default is a space; the comma in the print statement is what inserts it. (The quoting matters: an unquoted ; would be taken by the shell as a command separator, and inside double quotes the shell would try to expand $3 and $5 itself.)

 

So the output for the data lines in the example is (the header line will also come out, as MATNR LGORT; more on that below):

MAT12 1000
MAT12 1000
MAT12 1200
MAT45 1200
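A side note on that header line: a single extra line won't disturb the duplicate check, but if it bothers you, gawk can skip it with an NR > 1 condition; and if you'd rather keep the semicolon in the output, you can set OFS yourself. A small variation (same assumed file as above):

cat [filename] | gawk -F ';' 'BEGIN { OFS=";" } NR > 1 { print $3, $5 }'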

 

 

-> as we now only have the key values we wanted to compare, we can easily pipe that into the uniq -d we already know to see if there are any duplicates. (Remember that uniq only compares adjacent lines, so we sort first.)

(and as this might be a lot of lines, we just count them with wc -l)

 

So here is our one-liner for this task:

 

cat [filename] | gawk -F ';' '{ print $3, $5 }' | sort | uniq -d | wc -l

 

(-> if it’s 0, everything is fine!)
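And if you want to see which key combinations are duplicated, and how often, instead of just counting them: gawk can do the whole job on its own with an associative array, no sort needed. A sketch, assuming the same file layout as above:

cat [filename] | gawk -F ';' 'NR > 1 { seen[$3 ";" $5]++ } END { for (k in seen) if (seen[k] > 1) print k, seen[k] }'

For the sample file this prints MAT12;1000 2 - the duplicated key combination and its count.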
