ADT: Custom Dictionary with SAP terminology
I can’t abide poor spelling in code comments. I think it shows a lack of care, and then I start wondering whether the same level of care was taken with the code. For this reason, I encourage the use of a spell checker. Very often, however, such spell checkers are unaware of the specialized terminology used in SAP development, which leads to a sea of reported misspellings.
In this blog post, I’ll show how I set up my Eclipse ADT environment to use a custom dictionary seeded with SAP terminology.
Eclipse Spell Checking with SAP Terminology
The Eclipse framework supports both spell checking and the use of a custom dictionary. See Preferences -> General -> Editors -> Text Editors -> Spelling. The custom dictionary is a text file with one word on each line.
Sourcing and Extraction
So the task ahead of us is to source some SAP terminology and convert it into a custom dictionary. It so happens that the SAP ECC system contains a table called sterm_text. This table contains around 350,000 language-specific entries of SAP terminology.
With a little bit of ABAP, the contents of this table can be downloaded to a text file on the presentation layer for subsequent processing.
Conversion to Dictionary format
The dictionary format requires one word per line. The table sterm_text potentially contains multiple words per terminology entry. This means that any line in the text file containing multiple words needs to be split into multiple lines, each containing one word.
To achieve this outcome, I
- use sed to replace spaces with a new line
- use sed to remove any lines containing whitespace only
- use tr to translate upper to lowercase
- use sort to sort the file
- use uniq to remove duplicate entries
Chained together, this looks like:
sed 's/ /\n/g' my_text_file | sed '/^[[:space:]]*$/d' | tr '[:upper:]' '[:lower:]' | sort | uniq > sap_dictionary
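The one-liner above relies on sed treating \n in the replacement text as a newline, which GNU sed does but BSD/macOS sed does not. As a hedged alternative, here is a minimal sketch of the same steps using tr for the splitting. The function name build_dictionary and the sample terms are my own illustration, not part of the original setup, and squeezing all whitespace (including carriage returns from a Windows download) is an assumption about what the input might contain.

```shell
#!/bin/sh
# build_dictionary (hypothetical helper name): turn a file of
# multi-word terms into a sorted, de-duplicated, one-word-per-line list.
build_dictionary() {
    # Replace every run of whitespace (spaces, tabs, CR, LF) with one newline.
    tr -s '[:space:]' '\n' < "$1" |
        # Drop the empty line left behind if the file starts with whitespace.
        sed '/^$/d' |
        # Fold to lowercase so "Order" and "order" become one entry.
        tr '[:upper:]' '[:lower:]' |
        # sort -u sorts and removes duplicates in one step (same as sort | uniq).
        LC_ALL=C sort -u
}

# Demo with made-up sample terms (not real sterm_text content).
sample=$(mktemp)
printf 'Sales Order\nPurchase Order\nABAP Dictionary\n' > "$sample"
build_dictionary "$sample"
# prints, one word per line: abap dictionary order purchase sales
rm -f "$sample"
```

In practice this would be called as build_dictionary my_text_file > sap_dictionary. Note that tr folds case byte by byte in common implementations, so accented uppercase letters in non-English entries may be left unchanged.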
In my environment, the download from SAP uses an (SAP) codepage that results in a UTF-8 encoded file. It is therefore important to specify the matching encoding for the dictionary in the Eclipse spelling preferences. Select UTF-8.
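If you are unsure which encoding your own download produced, one way to check before pointing Eclipse at the file is to ask iconv to read it as UTF-8. This is a minimal sketch, assuming iconv is installed; the function name is_valid_utf8 is hypothetical, and a file that passes could still be plain ASCII, which is a subset of UTF-8.

```shell
#!/bin/sh
# is_valid_utf8 (hypothetical helper name): succeeds when the file
# decodes cleanly as UTF-8. iconv exits non-zero on an invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Demo: "café" written as UTF-8 bytes (octal 303 251 encodes é).
demo=$(mktemp)
printf 'caf\303\251\n' > "$demo"
if is_valid_utf8 "$demo"; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8"
fi
rm -f "$demo"
```

For the real file, the call would be is_valid_utf8 sap_dictionary. If it fails, the download probably used a different codepage, and either the file needs converting or a different encoding needs selecting in Eclipse.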
I hope someone may find this useful.
Ha! Depending on the day, you might hate my code. Sometimes I just throw the comments into my code. Those days are the days I really would rather focus on the code than the comments.
I'll add - we have functional/technical specs. Those are spell checked. Personally, the comments in the code help me a lot more, as long as they explain why I did what I did. So the important thing is to get it commented in the code. I'd rather have some things not spelled correctly than no comment at all.
The functional/technical specs are at a higher level and include all the different objects used to finish the requirement. In those it is critical to get the spelling right.
Andrew, we must be soulmates! I absolutely hate spelling errors in the comments and yes, it always makes me wonder if the code is just as sloppy. (Unsurprisingly, it usually is.)
Since I'm not a native English speaker, I rely heavily on tools like a spell checker, and it's something I always wish was built into every IDE. Of course, in the SAP world we have developers from many different cultures and human languages, so no one expects high style in the comments, but not doing a basic spell check is inexcusable. (And I guess it's another reason to use Eclipse, since there isn't any spell check in SE80.)
To me, paying attention to comment language goes together with attention to details in general. And I believe that people who strive for excellence even in detail generally do well not just in development but in life in general.