Former Member

Further to my last blog on a Perl one-liner to filter escaped quote characters from a CSV before import to BI, I have been finding more ways to take advantage of the power and speed of Perl scripting.  However, all is not rosy, even when you've got past the impenetrability of Perl syntax.

The peculiarity I've come across today is changing the case of strings.  Perl provides the functions lc and uc (change to lower case and upper case respectively) for this purpose.  In my script to import one log format and export another, I thought converting all the fields to upper case would be the simplest of tasks.  But during testing it turned out that the incoming UTF-8 file contained some accented characters which uc wasn't converting.

Fortunately Perl provides us with more than one way to skin this particular cat.

The first step was to write a script to confirm my suspicion.

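# Walk the printable single-byte range and list every character
# whose upper and lower case forms come out identical.
for (32..255) {
     my $c = chr($_);
     my $uc = uc $c;
     my $lc = lc $c;
     # Equal results mean Perl changed nothing: the character is either
     # not a letter, or a letter that uc and lc are not converting.
     if ( $uc eq $lc ) {
          print "${_}: ${c}\t${uc}\t${lc}\n";
     }
}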

This not only confirmed my thoughts that uc wasn't converting the accented characters, it gave me a good list of those characters to refer to.

The command that comes to our rescue is tr/// or transliterate.  In English, this means it swaps each character it finds in the first argument to tr with the corresponding character from the second argument.  So given the command tr/abc/123/, any letter a in the string tr operates on will be replaced with a 1, b will be replaced with 2 and c with 3.  Populating a tr call with all the characters we found that weren't being converted to upper case in our last test (and the standard A-Z,a-z for good measure) we can build our very own upper-casing subroutine.
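As a quick illustration of the tr/abc/123/ example, here is a minimal sketch.  The sample string and the variable names are made up for the demonstration.

```perl
use strict;
use warnings;

my $s = "a cab";

# Copy $s into $t, then transliterate the copy in place:
# a becomes 1, b becomes 2, c becomes 3; everything else is left alone.
(my $t = $s) =~ tr/abc/123/;

print "$t\n";   # prints "1 312"
```

Note that tr works character by character and does not understand regular expressions, so the two lists need to line up position for position.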

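sub upper($) {
     # Work on a lexical copy so the caller's global $_ is left untouched.
     my $s = shift;
     # Map each lower case letter to the upper case letter in the same position.
     $s =~ tr/abcdefghijklmnopqrstuvwxyzšœžàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿþ/ABCDEFGHIJKLMNOPQRSTUVWXYZŠŒŽÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝŸÞ/;
     # uc is kept as a safety net for anything the lists above do not cover.
     return uc $s;
}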
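To see the subroutine in action, here is a small usage sketch.  It rests on a few assumptions that are not in the original script: the source file is saved as UTF-8 (hence use utf8), output is encoded back to UTF-8 with binmode, and the sample string is invented.  The subroutine is repeated here, using a lexical copy of its argument, so the snippet runs on its own.

```perl
use strict;
use warnings;
use utf8;                              # assumption: this source file is saved as UTF-8
binmode(STDOUT, ':encoding(UTF-8)');   # encode characters as UTF-8 on output

sub upper {
    my $s = shift;
    # Each lower case letter maps to the upper case letter in the same position.
    $s =~ tr/abcdefghijklmnopqrstuvwxyzšœžàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿþ/ABCDEFGHIJKLMNOPQRSTUVWXYZŠŒŽÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝŸÞ/;
    return uc $s;
}

# Hypothetical sample input containing accented letters.
print upper("crème brûlée"), "\n";     # prints "CRÈME BRÛLÉE"
```

When the text comes from a file rather than a literal, the file handle would likely need a matching layer, for example open(my $fh, '<:encoding(UTF-8)', $path), so that tr sees whole characters rather than the individual bytes of each UTF-8 sequence.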

So, without having to resort to playing with the locale settings or anything else I fail to grasp, we have a multilingual replacement for uc.