SAP Cloud for Customer Phone Number Parsing and Formatting
In the SAP Cloud for Customer solution each phone number is stored in two different representations.
There is the single field representation or formatted number where – as the name implies – the whole number is stored in a single field. The UI uses exclusively this representation.
There is also the structured representation of the GDT PhoneNumber, consisting of the components Country Code, Area ID, Subscriber ID and Extension ID.
All message based communication, including replication to and from SAP OnPremise systems, uses this representation.
Naturally, the existence of two different representations for the same entity requires program logic to parse the single field representation into the structured representation and, conversely, to format the structured representation into the single field representation. In this post I will describe the logic behind both.
[Side note: All I am writing about phone numbers applies identically to fax numbers. For simplicity’s sake I will be talking about phone numbers only, though.]
Country Determination from the Postal Address
A fully parsed or formatted phone number contains the country information if at all possible. Hence whenever that information is not provided in the phone number itself, i.e. the country code in the structured representation is empty or invalid or the formatted representation does not start with ‘+’ and a valid international dialing code, the program logic tries to retrieve the country code from the related postal address. In case of an Account this is the main address, in case of a Contact this is the main address of the main Account. This country code will be added to both representations of the phone number.
NANPA or not?
The biggest distinction in both parsing and formatting is whether the phone number in question belongs to the North American Numbering Plan (NANPA) or not. The NANPA consists of the United States, Canada and eighteen other countries primarily in North America, including the Caribbean and the U.S. territories. Not all North American countries participate in the NANPA. Those that do all share the international dialing code ‘+1’.
Cloud for Customer considers a number to belong to NANPA if:
- The number is provided in the single field representation and starts with a ‘+1’
- The number is provided in structured representation and the Country Code belongs to a NANPA country
- The number is provided in a way that does not allow the country to be determined, but the country of the related postal address is a NANPA country.
Depending on whether the phone number belongs to a NANPA country or not the parsing and formatting logic differs significantly.
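The three rules above can be sketched in a few lines of Python. This is only an illustration and not the actual C4C implementation: the function name `is_nanpa`, its parameters and the country set (a deliberately incomplete subset) are assumptions of mine.

```python
# Illustrative subset only; the real list of NANPA members is longer.
NANPA_COUNTRIES = {"US", "CA", "BS", "BB", "DO", "JM", "TT"}


def is_nanpa(formatted=None, country_code=None, address_country=None):
    """Decide whether a phone number should be treated as a NANPA number."""
    # Rule 1: single field representation with a leading '+'.
    if formatted and formatted.strip().startswith("+"):
        return formatted.strip().startswith("+1")
    # Rule 2: structured representation with a country code.
    if country_code:
        return country_code.upper() in NANPA_COUNTRIES
    # Rule 3: fall back to the country of the related postal address.
    return (address_country or "").upper() in NANPA_COUNTRIES


print(is_nanpa(formatted="+1 650-849-4000"))
print(is_nanpa(formatted="06227 / 7-47474", address_country="DE"))
```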
Non-NANPA numbers and the Trunk Prefix
Many countries outside NANPA have a so-called trunk prefix, a single digit (often a ‘0’) that has to be dialed before the phone number whenever a domestic call is made. For example, in Germany the trunk prefix is ‘0’, so when calling SAP headquarters from within Germany you dial
06227 7 47474
while when calling from the United States you dial
01149 6227 7 47474
This implies that different structured representations can lead to the same formatted number, in the above example
CountryCode: DE AreaID: 06227 SubscriberID: 7 ExtensionID: 47474
CountryCode: DE AreaID: 6227 SubscriberID: 7 ExtensionID: 47474
both will result in the formatted number +49 (6227) 7-47474. Hence when parsing the formatted number a decision has to be made whether the resulting structured representation should include the trunk prefix in the area id or not.
In SAP Cloud for Customer the parsing routine will include the trunk prefix in the structured representation.
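As a small illustration of the trunk prefix handling, here is a Python sketch for the German example. The helper names and the hard-coded dialing code ‘49’ are assumptions made for this example only; the real logic is country dependent.

```python
def strip_trunk_prefix(area_id, trunk_prefix="0"):
    """Remove the trunk prefix for the formatted (international) number."""
    if trunk_prefix and area_id.startswith(trunk_prefix):
        return area_id[len(trunk_prefix):]
    return area_id


def add_trunk_prefix(area_id, trunk_prefix="0"):
    """Add the trunk prefix when storing the area id in structured form."""
    if area_id and trunk_prefix and not area_id.startswith(trunk_prefix):
        return trunk_prefix + area_id
    return area_id


def format_german_number(area_id, subscriber_id, extension_id):
    # Hypothetical helper: only covers the German example from the text.
    return f"+49 ({strip_trunk_prefix(area_id)}) {subscriber_id}-{extension_id}"


# Both structured representations yield the same formatted number.
print(format_german_number("06227", "7", "47474"))
print(format_german_number("6227", "7", "47474"))
# Parsing goes the other way and stores the area id with the trunk prefix.
print(add_trunk_prefix("6227"))
```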
Phone Number Parsing
If a phone number is provided in single field representation it has to be parsed to the structured representation, potentially altering the single field representation as well.
As the first step all characters except digits, letters, ‘(‘, ‘)’, ‘/’ and ‘-‘ are removed from the number.
The next step is to determine the country code.
The country can only be determined from the number itself if the phone number starts with a ‘+’. If the leading ‘+’ is missing, the program has no way to tell whether the user merely omitted the ‘+’, entered an international dialing prefix instead of the ‘+’, or simply entered a domestic number. For example, a phone number starting with ‘01149’ could be a number dialed to call Germany from the United States, or it could be a domestic German phone number with the AreaID ‘01149’. Hence, when a phone number in single field representation has no leading ‘+’, the program logic will always assume it is a domestic number and try to determine the country code from the related postal address.
If the phone number starts with a leading ‘+’, the following digits are used to determine the country code from the international dialing code. For example, a phone number starting with ‘+33’ will be considered French. In case multiple countries share the same international dialing code, such as all NANPA countries or Finland and the Åland Islands, the next digits, presumably the area ID, are used to narrow down the country code further if possible.
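A rough Python sketch of this country determination could look as follows. The two lookup tables are tiny illustrative subsets that I made up for this example, and defaulting ‘+1’ to ‘US’ when the area code is not listed is my own assumption, not documented behaviour.

```python
# Illustrative subsets only; a real system needs complete tables.
DIALING_CODES = {"1": "US", "33": "FR", "49": "DE", "358": "FI"}
# Hypothetical refinement for shared dialing codes, keyed by the next digits.
SHARED_CODE_AREAS = {
    "1": {"416": "CA", "876": "JM"},
    "358": {"18": "AX"},
}


def country_from_number(number, address_country=None):
    """Return the country code for a single field phone number."""
    if not number.strip().startswith("+"):
        # No leading '+': treated as a domestic number.
        return address_country
    digits = "".join(ch for ch in number if ch.isdigit())
    for length in (3, 2, 1):
        code = digits[:length]
        if code in DIALING_CODES:
            rest = digits[length:]
            for area, country in SHARED_CODE_AREAS.get(code, {}).items():
                if rest.startswith(area):
                    return country
            return DIALING_CODES[code]
    # Unknown dialing code: fall back to the postal address.
    return address_country


print(country_from_number("+33 1 23 45 67 89"))
print(country_from_number("01149 6227 7 47474", address_country="DE"))
```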
Once the country code is determined the logic branches depending on whether the country belongs to NANPA or not.
In case of a NANPA phone number we make use of the fact that NANPA phone numbers follow a very rigid structure of AAA-CCC-NNNN. Hence the first three digits form the Area ID, while the next seven form the Subscriber ID, with a ‘-’ between its third and fourth digits. All other special characters are ignored and removed.
If the number has more than ten digits, everything after the tenth digit is considered an extension. The program logic checks whether an extension prefix like ‘x’ or ‘ext:’ is provided; everything after the extension prefix forms the Extension ID. The currently recognized extension prefixes (case insensitive) are ‘e’, ‘x’, ‘ext’ and ‘extension’, each optionally followed by a ‘:’.
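The NANPA branch is simple enough to sketch in Python. The function below is my own approximation of the described behaviour, not SAP code: `parse_nanpa` is a hypothetical name, letters before the tenth digit are simply ignored, and the upper-casing of the extension is inferred from the examples further down.

```python
import re

# Recognized extension prefixes, optionally followed by ':'.
EXT_PREFIX = re.compile(r"^(extension|ext|e|x):?", re.IGNORECASE)


def parse_nanpa(number):
    """Split a NANPA number into (area_id, subscriber_id, extension_id)."""
    text = number.strip()
    if text.startswith("+1"):
        text = text[2:]
    # Collect the first ten digits and remember where they end.
    digits = []
    pos = 0
    while pos < len(text) and len(digits) < 10:
        if text[pos].isdigit():
            digits.append(text[pos])
        pos += 1
    main = "".join(digits)
    # Everything after the tenth digit is the extension.
    rest = text[pos:].strip(" -/()")
    extension_id = EXT_PREFIX.sub("", rest, count=1).strip().upper()
    area_id = main[:3]
    if len(main) > 6:
        subscriber_id = main[3:6] + "-" + main[6:]
    else:
        subscriber_id = main[3:]
    return area_id, subscriber_id, extension_id


print(parse_nanpa("+1 650-849-4000 x5555"))
```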
In case of phone numbers that do not belong to a NANPA country the phone number structure is not as rigid. For example in Germany the length of the Area ID can be 3, 4 or 5. Hence the parsing logic has to work with whatever information it can get from the user input. So it looks for separators.
A separator is one of the characters ‘(‘, ‘)’, ‘/’ or ‘-‘ that does not appear before the first digit. So a leading ‘(‘ is not considered a separator. Spaces are also not considered separators as in some countries it is a common notation to group phone number digits in groups of two even within the individual phone number elements.
Only the first two separators are relevant. The number is then distributed among Area ID, Subscriber ID and Extension ID according to the following logic:
- If no separator can be found, everything is put into the Subscriber ID
- If one separator is found that is not a ‘-‘ then the part before the separator becomes the Area ID and the part after the separator becomes the Subscriber ID.
- If one separator is found that is a ‘-‘ then the part before the separator becomes the Subscriber ID and the part after the separator becomes the Extension ID.
- If two or more separators are found then the part before the first separator becomes the Area ID, the part between first and second separator becomes the Subscriber ID and everything after the second separator becomes the Extension ID
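The four rules above can be sketched in Python as follows. This is a simplified illustration under my own assumptions: the function name is hypothetical, the input is the domestic part of the number (the international dialing code has already been removed), and adding the trunk prefix to the Area ID is left out.

```python
SEPARATORS = "()/-"


def split_non_nanpa(domestic):
    """Distribute a number among (area_id, subscriber_id, extension_id)."""
    chunks = []      # text between the relevant separators
    seps = []        # the relevant separators, at most two
    current = ""
    seen_digit = False
    for ch in domestic:
        if ch in SEPARATORS and len(seps) < 2:
            if seen_digit:
                chunks.append(current.strip())
                seps.append(ch)
                current = ""
            # A separator character before the first digit is dropped.
            continue
        seen_digit = seen_digit or ch.isdigit()
        current += ch
    chunks.append(current.strip())

    if not seps:
        return "", chunks[0], ""
    if len(seps) == 1:
        if seps[0] == "-":
            return "", chunks[0], chunks[1]
        return chunks[0], chunks[1], ""
    return chunks[0], chunks[1], chunks[2]


print(split_non_nanpa("06227 / 7-47474"))
print(split_non_nanpa("62 - 27 - 74 - 74 - 74"))
```

Note that from the third separator onwards the characters are kept as they are, which is why the second call leaves the remaining hyphens inside the Extension ID.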
Once the phone number has been parsed this way, the parsed result will then be formatted according to the formatting logic to replace the originally provided phone number. This is to ensure uniform formatting in the system.
| Initial Phone Number | Country Code | Area ID | Subscriber ID | Extension ID | Final Phone Number |
|---|---|---|---|---|---|
| +1 650-849-4000 | US | 650 | 849-4000 | | +1 650-849-4000 |
| 65084940005555 | US | 650 | 849-4000 | 5555 | +1 650-849-4000 ext: 5555 |
| +1 650-849-4000 x5555 | US | 650 | 849-4000 | 5555 | +1 650-849-4000 ext: 5555 |
| +1 650-849-4000 myext 5555 | US | 650 | 849-4000 | MYEXT 5555 | +1 650-849-4000 ext: MYEXT 5555 |
| +49 (6227) 7-47474 | DE | 06227 | 7 | 47474 | +49 (6227) 7-47474 |
| 06227 / 7-47474 | DE | 06227 | 7 | 47474 | +49 (6227) 7-47474 |
| +49 0 62 27 / 74 74 74 | DE | 0 62 27 | 74 74 74 | | +49 (62 27) 74 74 74 |
| +49 62 - 27 - 74 - 74 - 74 | DE | 062 | 27 | 74 - 74 - 74 | +49 (62) 27-74 - 74 - 74 |
In the example ‘65084940005555’, which has no leading ‘+’, it is assumed that country code ‘US’ is maintained in the related address.
In the example ‘06227 / 7-47474’, which also has no leading ‘+’, it is assumed that country code ‘DE’ is maintained in the related address.
Phone number formatting
If a phone number is provided in the structured representation it has to be formatted. During that process the structured representation might change as well.
The first step, again, is to remove all characters except digits, letters, ‘(’, ‘)’, ‘/’ and ‘-’ from the number.
As the second step, if the Country Code is empty, it will be provided from the related postal address.
This determination of the Country Code from the postal address can be suppressed in the Business Configuration Scoping under Built-in Services and Support -> Business Environment -> Addresses and Languages -> Phone Number Country Defaulting.
From then on, again, the logic branches depending on whether the country belongs to NANPA or not.
For NANPA numbers the Area ID, Subscriber ID and Extension ID are concatenated into a single string. All characters except digits and letters are removed and the result is condensed. Then, if applicable, a ‘-’ is inserted between the third and fourth as well as between the sixth and seventh characters, and an ‘ ext: ’ is inserted between the tenth and eleventh characters. Finally the result is prefixed with ‘+1’.
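In Python, the NANPA formatting steps could be sketched like this. The function name `format_nanpa` is hypothetical, and the space after ‘+1’ is taken from the example numbers in this post.

```python
import re


def format_nanpa(area_id, subscriber_id, extension_id=""):
    """Format NANPA components into the single field representation."""
    # Concatenate and keep only digits and letters.
    raw = re.sub(r"[^0-9A-Za-z]", "", area_id + subscriber_id + extension_id)
    result = raw[:3]
    if len(raw) > 3:
        result += "-" + raw[3:6]
    if len(raw) > 6:
        result += "-" + raw[6:10]
    if len(raw) > 10:
        result += " ext: " + raw[10:]
    return "+1 " + result


print(format_nanpa("650", "849-4000", "5555"))
# Works the same when everything was put into the Subscriber ID.
print(format_nanpa("", "65084940005555"))
```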
For non-NANPA numbers a first draft is created in the form
+<international dialing code> (<Area ID>) <Subscriber ID>-<Extension ID>
This draft is then parsed into structured form and formatted again for a second and final time, following the same schema. The second pass takes care of cases where, for reasons such as incomplete or incorrect mapping rules, the components were not entered into the correct fields during message inbound processing. The most frequent case is the whole phone number being maintained in the Subscriber ID field.
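To show why the second pass helps, here is a compact Python sketch of the draft, parse and format cycle. It is an approximation under my own assumptions: all function names are hypothetical, empty components are simply left out of the draft, and the trunk prefix is assumed to be ‘0’.

```python
SEPARATORS = "()/-"


def split_parts(domestic):
    """Separator rules from the parsing section, in condensed form."""
    chunks, seps, current, seen_digit = [], [], "", False
    for ch in domestic:
        if ch in SEPARATORS and len(seps) < 2:
            if seen_digit:
                chunks.append(current.strip())
                seps.append(ch)
                current = ""
            continue
        seen_digit = seen_digit or ch.isdigit()
        current += ch
    chunks.append(current.strip())
    if not seps:
        return "", chunks[0], ""
    if len(seps) == 1:
        if seps[0] == "-":
            return "", chunks[0], chunks[1]
        return chunks[0], chunks[1], ""
    return chunks[0], chunks[1], chunks[2]


def render(dial_code, area, subscriber, extension, trunk_prefix="0"):
    """Build '+<code> (<area>) <subscriber>-<extension>', skipping empty parts."""
    if trunk_prefix and area.startswith(trunk_prefix):
        area = area[len(trunk_prefix):]
    text = "+" + dial_code
    if area:
        text += f" ({area})"
    if subscriber:
        text += " " + subscriber
    if extension:
        text += "-" + extension
    return text


def format_non_nanpa(dial_code, area, subscriber, extension):
    # First pass: draft built from the components exactly as provided.
    draft = render(dial_code, area, subscriber, extension, trunk_prefix="")
    # Second pass: parse the domestic part of the draft and format again.
    domestic = draft[len(dial_code) + 1:]
    return render(dial_code, *split_parts(domestic))


print(format_non_nanpa("49", "06227", "7", "47474"))
print(format_non_nanpa("49", "", "06227/7-47474", ""))
```

Both calls produce the same formatted number, although in the second one the whole number was passed in the Subscriber ID.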
| Country Code | Area ID | Subscriber ID | Extension ID | Formatted Number | Final Country Code | Final Area ID | Final Subscriber ID | Final Extension ID |
|---|---|---|---|---|---|---|---|---|
| US | 650 | 849-4000 | 5555 | +1 650-849-4000 ext: 5555 | US | 650 | 849-4000 | 5555 |
| US | | 65084940005555 | | +1 650-849-4000 ext: 5555 | US | 650 | 849-4000 | 5555 |
| DE | 06227 | 7 | 47474 | +49 (6227) 7-47474 | DE | 06227 | 7 | 47474 |
| DE | | 6227/7 | 47474 | +49 (6227) 7-47474 | DE | | 6227/7 | 47474 |
| | | +49 6227/7-47474 | | +49 (6227) 7-47474 | DE | 06227 | 7 | 47474 |
In the last example we assume that the country code ‘DE’ was maintained in the related postal address.
Since the structured representation is not relevant for the UI display of the phone number, the database is only updated with the final structured representation resulting from the formatting routine if that final representation differs from the original input in non-cosmetic ways. If the difference is merely cosmetic, the structured representation is kept exactly as it was passed to the address.
This is to minimize the changes performed in a bidirectional replication scenario.
The delta is considered cosmetic if merely special characters were changed or digits were moved from one component field to another. As of C4C release 1902 the addition of the trunk prefix is also considered a cosmetic change. In the above table of examples the ones highlighted in yellow lead to a non-cosmetic change.
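One possible way to express this cosmetic check in Python is shown below. It is only my reading of the rule and not the actual implementation: the function names are hypothetical, a changed Country Code is assumed to be non-cosmetic, and the trunk prefix is assumed to be ‘0’.

```python
import re


def _condensed(area, subscriber, extension, trunk_prefix="0"):
    """Concatenate the components, drop special characters and trunk prefix."""
    text = re.sub(r"[^0-9A-Za-z]", "", area + subscriber + extension)
    if trunk_prefix and text.startswith(trunk_prefix):
        text = text[len(trunk_prefix):]
    return text


def is_cosmetic(before, after, trunk_prefix="0"):
    """before/after: (country_code, area_id, subscriber_id, extension_id)."""
    # Assumption: a different country code is never a cosmetic change.
    if before[0] != after[0]:
        return False
    return (_condensed(*before[1:], trunk_prefix)
            == _condensed(*after[1:], trunk_prefix))


# Digits moved between fields and trunk prefix added: cosmetic.
print(is_cosmetic(("DE", "", "6227/7", "47474"), ("DE", "06227", "7", "47474")))
# Country code added from the postal address: not cosmetic.
print(is_cosmetic(("", "", "+49 6227/7-47474", ""), ("DE", "06227", "7", "47474")))
```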