Sounds like teen spirit

eddy_declercq · ‎04-22-2008

For some reason, the ability to use the Soundex algorithm in SQL queries in SAP is little known (see also this rather basic help page). It seems that SAP would rather we didn't know it exists, preferring to promote TREX and its other search products. Yes, those are far more flexible and advanced than simple Soundex. But sometimes you don't need more than that to find someone.

For those of you who don't know what I'm talking about: Soundex is an algorithm that lets you do a phonetic search. That comes in handy if you want to look up, say, the phone number of someone whose name you have no idea how to spell. My surname is a perfect example. I have been living with misspelled family names, and even first names, for 40 years now.

Anyway, the best way to avoid missing names when you search is to search phonetically. It isn't as easy as, say, replacing all occurrences of PH with F. No, bad luck if you were thinking that. There is a special algorithm for this.
Guess what? That algorithm was patented by Robert C. Russell (with Margaret K. Odell as co-author) as early as 1918, under the name 'Index'. Between 1917 and 1924 he patented several versions of it. Only later did people start calling it Soundex. Soundex is based on the six phonetic classifications of human speech sounds (bilabial, labiodental, dental, alveolar, velar, and glottal), which describe how you place your lips and tongue to make sounds. The original phonetic rules are rather basic and have been extended over the years. The ABAP code below implements the most commonly used algorithm, with some extensions:

You can select whether to use the original algorithm or the more common one. The latter is usually referred to as consensus. The code is written so that you can add extra algorithms such as Metaphone.
You can choose the language. I've provided English and Dutch for obvious reasons, but I encourage you to add more languages and share them.
You can select the length of the output. Soundex has a standard output length of 4, but that isn't always sufficient. My surname needs 5 to be encoded accurately.

Without further ado, here's the code for a function module. You can also create it as a class method if you like.

FUNCTION Z_SOUNDEX.
*"----------------------------------------------------------------------
*"*"Local interface:
*"  IMPORTING
*"     VALUE(INPUTSTR) TYPE  CHAR255
*"     VALUE(SOUNDEXLEN) TYPE  I DEFAULT 4
*"     VALUE(LANG) TYPE  LANG DEFAULT 'N'
*"     VALUE(CONSENSUS) TYPE  I DEFAULT 1
*"  EXPORTING
*"     REFERENCE(SOUNDEX) TYPE  STRING
*"----------------------------------------------------------------------

  data: tmp_soundex type char255, tmp_str type string, curchar type c, lastchar type c,
        firstletter TYPE c, slen type i, pos TYPE i.

* the desired length needs to be within the limits
  if soundexlen > 10.
    soundexlen = 10.
  endif.
  if soundexlen < 4.
    soundexlen = 4.
  ENDIF.
  tmp_soundex = inputstr.
* convert everything to upper case and keep only the letters A-Z
  TRANSLATE tmp_soundex TO UPPER CASE.
  REPLACE ALL OCCURRENCES OF REGEX '[^A-Z]' in tmp_soundex with space.
  condense tmp_soundex.

* do we use the consensus algorithm (1) or the original one?
  case consensus.
    when 1.
      case lang.
* English
        when 'E'.
          REPLACE ALL OCCURRENCES OF 'DG' in tmp_soundex with 'G'.
          REPLACE ALL OCCURRENCES OF 'GH' in tmp_soundex with 'H'.
          REPLACE ALL OCCURRENCES OF 'GN' in tmp_soundex with 'N'.
          REPLACE ALL OCCURRENCES OF 'KN' in tmp_soundex with 'N'.
          REPLACE ALL OCCURRENCES OF 'PH' in tmp_soundex with 'F'.
          REPLACE ALL OCCURRENCES OF REGEX 'MP([STZ])' in tmp_soundex with 'M$1'.
          REPLACE ALL OCCURRENCES OF REGEX '^PS' in tmp_soundex with 'S'.
          REPLACE ALL OCCURRENCES OF REGEX '^PF' in tmp_soundex with 'F'.
          REPLACE ALL OCCURRENCES OF REGEX 'MB' in tmp_soundex with 'M'.
          REPLACE ALL OCCURRENCES OF REGEX 'TCH' in tmp_soundex with 'CH'.
* Dutch
        when 'N'.
          REPLACE ALL OCCURRENCES OF 'QU' in tmp_soundex with 'KW'.
          REPLACE ALL OCCURRENCES OF 'SCH' in tmp_soundex with 'SEE'.
          REPLACE ALL OCCURRENCES OF 'KS' in tmp_soundex with 'XX'.
          REPLACE ALL OCCURRENCES OF 'KX' in tmp_soundex with 'XX'.
          REPLACE ALL OCCURRENCES OF 'KC' in tmp_soundex with 'KK'.
          REPLACE ALL OCCURRENCES OF 'CK' in tmp_soundex with 'KK'.
          REPLACE ALL OCCURRENCES OF 'DT' in tmp_soundex with 'TT'.
          REPLACE ALL OCCURRENCES OF 'TD' in tmp_soundex with 'TT'.
          REPLACE ALL OCCURRENCES OF 'CH' in tmp_soundex with 'GG'.
          REPLACE ALL OCCURRENCES OF 'SZ' in tmp_soundex with 'SS'.
          REPLACE ALL OCCURRENCES OF 'IJ' in tmp_soundex with 'YY'.
      ENDCASE.
  ENDCASE.

* We need to maintain the first char
  firstletter = tmp_soundex(1).

  case consensus.
    when 1.
      case lang.
        when 'E'.
* we don't want to translate the H or W if it's the first char
          if firstletter eq 'H' or firstletter eq 'W'.
            tmp_soundex(1) = '-'.
          endif.
* all other occurrences will be translated
          REPLACE ALL OCCURRENCES OF REGEX '[HW]' in tmp_soundex with '.'.
        when 'N'.
      ENDCASE.
  ENDCASE.

* everything else is translated into digits according to the basic algorithm
  case lang.
    when 'E'.
      REPLACE ALL OCCURRENCES OF REGEX '[AEIOUYHW]' in tmp_soundex with '0'.
      REPLACE ALL OCCURRENCES OF REGEX '[BPFV]' in tmp_soundex with '1'.
      REPLACE ALL OCCURRENCES OF REGEX '[CSGJKQXZ]' in tmp_soundex with '2'.
      REPLACE ALL OCCURRENCES OF REGEX '[DT]' in tmp_soundex with '3'.
      REPLACE ALL OCCURRENCES OF REGEX '[L]' in tmp_soundex with '4'.
      REPLACE ALL OCCURRENCES OF REGEX '[MN]' in tmp_soundex with '5'.
      REPLACE ALL OCCURRENCES OF REGEX '[R]' in tmp_soundex with '6'.
    when 'N'.
      REPLACE ALL OCCURRENCES OF REGEX '[AEHIOUJY]' in tmp_soundex with '0'.
      REPLACE ALL OCCURRENCES OF REGEX '[BP]' in tmp_soundex with '1'.
      REPLACE ALL OCCURRENCES OF REGEX '[CGSKZQ]' in tmp_soundex with '2'.
      REPLACE ALL OCCURRENCES OF REGEX '[DT]' in tmp_soundex with '3'.
      REPLACE ALL OCCURRENCES OF REGEX '[FVW]' in tmp_soundex with '4'.
      REPLACE ALL OCCURRENCES OF REGEX '[L]' in tmp_soundex with '5'.
      REPLACE ALL OCCURRENCES OF REGEX '[MN]' in tmp_soundex with '6'.
      REPLACE ALL OCCURRENCES OF REGEX '[R]' in tmp_soundex with '7'.
      REPLACE ALL OCCURRENCES OF REGEX '[X]' in tmp_soundex with '8'.
  ENDCASE.

* the dot will be deleted
  case consensus.
    when 1.
      case lang.
        when 'E'.
          REPLACE ALL OCCURRENCES OF '.' in tmp_soundex with ''.
        when 'N'.
      ENDCASE.
  ENDCASE.

* Next step is to delete all the equal adjacent digits
  slen = strlen( tmp_soundex ).
  clear lastchar.
  clear tmp_str.

  do slen times.
    pos = sy-index - 1.
    curchar = tmp_soundex+pos(1).
    if curchar ne lastchar.
      CONCATENATE tmp_str curchar INTO tmp_str.
      lastchar = curchar.
    endif.
  ENDDO.
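* drop the first character (it is re-added below as the leading letter);
* the check avoids an invalid offset for empty or one-letter input
  IF strlen( tmp_str ) > 1.
    soundex = tmp_str+1.
  ELSE.
    CLEAR soundex.
  ENDIF.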
* remove all spaces and zeroes
  REPLACE ALL OCCURRENCES OF REGEX '\s' in soundex with ''.
  REPLACE ALL OCCURRENCES OF '0' in soundex with ''.
* Add zeroes
  concatenate firstletter soundex '0000000000' into soundex.
* And cut to desired length
  soundex = soundex(soundexlen).

ENDFUNCTION.
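
To try it out, something like the following little report should do. It is only a sketch: the report name Z_SOUNDEX_DEMO, the variable names and the second spelling are made up for illustration, and it assumes the function module above has been created and activated as Z_SOUNDEX. It encodes two spellings with the Dutch rules and a length of 5, then compares the results.

```abap
REPORT z_soundex_demo.
* Hypothetical demo report; assumes function module Z_SOUNDEX is active.

DATA: lv_name  TYPE char255,
      lv_code1 TYPE string,
      lv_code2 TYPE string.

* Encode the first spelling (Dutch rules, consensus variant, 5 characters)
lv_name = 'Declercq'.
CALL FUNCTION 'Z_SOUNDEX'
  EXPORTING
    inputstr   = lv_name
    soundexlen = 5
    lang       = 'N'
    consensus  = 1
  IMPORTING
    soundex    = lv_code1.

* Encode a second, made-up spelling with exactly the same settings
lv_name = 'De Clerck'.
CALL FUNCTION 'Z_SOUNDEX'
  EXPORTING
    inputstr   = lv_name
    soundexlen = 5
    lang       = 'N'
    consensus  = 1
  IMPORTING
    soundex    = lv_code2.

* Compare the two codes
IF lv_code1 = lv_code2.
  WRITE: / 'Same code:', lv_code1.
ELSE.
  WRITE: / 'Different codes:', lv_code1, lv_code2.
ENDIF.
```

Equal codes mean the two spellings are treated as sounding alike under the chosen language rules and length.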
