ADT: Custom Dictionary with SAP terminology
I can’t abide poor spelling in code comments. I think it shows a lack of care, and then I start wondering whether the same level of care was taken with the code. For this reason, I encourage the use of a spell checker. Often, however, spell checkers are unaware of the specialized terminology used in SAP development, which leads to a sea of reported misspellings.
In this blog post, I’ll show how I set up my Eclipse ADT environment to use a custom dictionary seeded with SAP terminology.
Eclipse Spell Checking with SAP Terminology
The Eclipse framework supports both spell checking and the use of a custom dictionary. See Preferences -> General -> Editors -> Text Editors -> Spelling, where the custom dictionary is set in the User defined dictionary field. The custom dictionary is a text file with one word on each line.
Sourcing and Extraction
So the task ahead of us is to source some SAP terminology and convert it into a custom dictionary. It so happens that the SAP ECC system contains a table called sterm_text. This table contains around 350,000 language-specific entries of SAP terminology.
With a little bit of ABAP, the contents of this table can be downloaded to a text file on the presentation layer for subsequent processing.
Conversion to Dictionary format
The dictionary format requires one word per line. The table sterm_text potentially contains multiple words per terminology entry. This means that any line in the text file containing multiple words needs to be split into multiple lines, each containing one word.
To achieve this, I:
- use sed to replace each space with a newline
- use sed to remove any lines that are empty or contain only whitespace
- use tr to translate uppercase to lowercase
- use sort to sort the file
- use uniq to remove duplicate entries
Chained together, this looks like:
sed 's/ /\n/g' my_text_file | sed '/^[[:space:]]*$/d' | tr '[:upper:]' '[:lower:]' | sort | uniq > sap_dictionary
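Before running this on the full download, it can help to try the same steps on a few sample entries. The following is a minimal sketch rather than the exact command above: the function name build_dictionary and the sample terms are made up for illustration, and it splits words with tr instead of sed, because not every sed implementation treats \n in a replacement as a newline.

```shell
# Hypothetical helper: reads terminology entries on stdin and writes a
# lowercase, de-duplicated, one-word-per-line list to stdout.
build_dictionary() {
    tr -s '[:space:]' '\n' |        # turn every run of whitespace into one newline
    sed '/^$/d' |                   # drop the empty line left by leading whitespace
    tr '[:upper:]' '[:lower:]' |    # fold uppercase to lowercase
    LC_ALL=C sort -u                # sort and remove duplicates in one step
}

# Made-up sample entries, standing in for the downloaded text file.
printf '%s\n' 'Purchase Order' 'Sales Order' 'purchase requisition' | build_dictionary
```

For the sample above this prints order, purchase, requisition and sales, one per line. To process the real download, the call would be build_dictionary < my_text_file > sap_dictionary.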
In my environment, the download from SAP uses an (SAP) codepage that results in a UTF-8 encoded file. Eclipse needs to know the encoding of the dictionary file, so in the Eclipse spelling preferences set the dictionary encoding to UTF-8.
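If you are unsure which encoding your own download produced, one way to check is to ask iconv to read the file as UTF-8. This is a sketch under assumptions: it presumes iconv is installed, the helper name is_utf8 is made up for illustration, and the file name sap_dictionary is the output of the command above.

```shell
# Hypothetical helper: succeeds only if the given file decodes cleanly as UTF-8.
is_utf8() {
    # iconv exits with a non-zero status when it meets an invalid byte sequence
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

if is_utf8 sap_dictionary; then
    echo "sap_dictionary is valid UTF-8"
else
    echo "sap_dictionary is missing or not valid UTF-8"
fi
```

If the check fails, the file was probably written in a different codepage, and that encoding is the one to select in Eclipse instead.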
I hope someone may find this useful.