Application Development Blog Posts
Learn and share on deeper, cross technology development topics such as integration and connectivity, automation, cloud extensibility, developing at scale, and security.
cancel
Showing results for 
Search instead for 
Did you mean: 
JimSpath
Active Contributor
Just because I say something doesn't mean I can always complete the tale. For example, asking about a service outage or pending repair might never be answered, particularly around a brief event like a live webcast or meetup. The detective in me wanted to know why an entry for personal gender identity was a requirement at all, much less having binary choices.

Before diving into a specific case, I want to review the thought process a database developer would go through in building up a record structure related to a person, say, for admission to a concert. They might think recording gender would make sense. Or they were told to do so. If they just go ahead unthinking what effect collecting and storing this information could be,  we'll look at presently.

Here's a snippet from a table definition:
create table x (
--
gender char(1),
--
);

In this example, of course, there is no requirement of an entry, much less the often-used and misused "M/F", which I'll leave to the junior detectives to research for their own particular SQL flavor/flavour. Here's a tip (from a MySQL schema for image storage):
--
color char(1) not null default 'Y' check (
active in ('Y', 'N')
)
,
--

Now suppose we are sent an export (like a PDF from a freedom-of-information-act request obfuscation) that has full words, such as "MALE", "FEMALE", or, possibly "UNKNOWN". We'll assume a possibility no value is entered there, for kicks. So we might need a temporary table to keep this evidence chain before analyzing it further. You know, check the logs, I mean, locks.
create table y (
--
gender char(10),
--
);

Either of these data definitions would allow SQL to count people, whether they wanted to be or not. As an old school DBA, I found a few code snippets online (links below) to help illustrate the thought process further.

 
I am trying to get the total of males, females and unknowns in each agerange.

 

Here is one of the suggested query logic:

 
 10  count( case when gender = 'M' then 1 end ) male,
11 count( case when gender = 'F' then 1 end ) female
12 from t

 

Again, I will omit analysis of this, other than to show subsequent responses that should trigger a clue.

 
    count( case when gender = 'M' then 0
when gender = 'F' then 0
else 1 end) unknown

 

This implies counting entries other than M or F. It also allows a NULL value to be considered. The database could have a rule requiring an entry in the column (NOT NULL), or it could allow a blank (again, assuming a single character width, and a LATIN-1 alphabet, and English words).

Any issues?

Suppose this isn't an English-reading user? Will we need to decode the local words for gender and store them in a table somewhere? Or will we put the rules at the data dictionary level and have the frontend application developers work out the rules per our published specs.

Does the application layer coincide with the underlying database?

To encode "no entry", a colleague used this conversion (Python😞
          @six = isnull(gender, ''),

The intent is to have the database record a null rather than a zero length string, if that computes. What is unintended is the mixing of six and gender, from a declaration level. But that's beside the main point of this investigation. Please note I altered the Python variable name string per SAP community blog post rules, which I am endeavoring to support in the fairest way I can given the sensitivity of personally identifying information and characteristics.

Data Disablement


What happens if the database designer doesn't require an entry, instead removing the column entirely, as opposed to allowing nulls, blanks, unknown, or free form entry? That's fine if the decision is early enough that user-facing components don't include that column. Otherwise, not just the data dictionary needs adjustment, possibly a bunch of code and/or configuration needs to be as well.
work=> 
create table demo (
person_id int,
gender char(1),
primary key(person_id)
);
CREATE TABLE

work=> alter table demo drop column gender;
ALTER TABLE

 

After the column is dropped, you could get hit with this error (postgreSQL specific, but any database will fail in some way):
pq_driver: [PGRES_FATAL_ERROR]ERROR:  
column "gender" of relation "demo" does not exist
LINE 1: INSERT INTO "work"."demo" ( "gender","person_id") VALUES ( '...
^
(caused by statement
'INSERT INTO "work"."demo" ( "gender","person_id") VALUES ( 'M','3')')

 

The Twist


 


Tell Us About Yourself



The Tweet(s)


 


SAP ID MR MS



I registered for @SAPTechEd  2021. The *required* binary gender/salutation prefix choice was a little cringe-worthy. I'd prefer an option to leave this out.


Hey Jim Thank You



Hey, Jim. Thank you for bringing this to our attention. I shared your feedback with the team, and am happy to report the field has been removed completely from the SAP TechEd registration form. Thanks! – Dianna, SAP Social | ‘cc @JelenaAtLarge @se38 @MatthSteffen

 
Thank you, Dianna! This is good to hear. Perhaps this field could also be removed from all other SAP registration forms? More info here: blogs.sap.com/2019/06/12/dea

(time passes)
Hi Jim, hi @JelenaAtLarge, hi all, The team have now updated go.support.sap.com/minisap/ too. The Project Manager needs to keep the field mandatory for tech reasons. But we recognize that this is wrong, and can cause distress, eg to GNC / NB users. So: The radio buttons are now...

 

Here, amidst much else going on, we have another name for the Gender column: Salutation. Which implies that the entity capturing this data wants to later say, "Hi, Mr. Spath" (as the case may be). Fine, I guess, if that's important in your data design universe.

And I cannot credit Jelena enough for being ahead of this issue. We often hit the same topics at different frequencies, which is pretty cool I think. As Jelena said, "... when I see a salutation field on a form my reaction is: none of your business.?"

The short time between my post (October 13, 2021), and the SAP team response (October 14) is phenomenal in my experience with enterprise application software event planning. I expect the conversations may have already occurred, given Jelena posting the same complaint over 2 years earlier, so it could be as easy as pushing a draft fix out.

Issues and Caveats


First, when you're in the design chain pipeline, keep your wits about you, and if there is a decision that doesn't seem right, ask questions. I had a volunteer role copying data from an organizations main repository to a downstream reporting system. The upstream include birth date, but I questioned whether we would need to use that for anything. The only response was "maybe we want to see what young/old people want" which was too slim for me to commit, so we left that out of the clone. I felt better than no one could grab that level of personal information from a repository I was responsible for.

Second, as a user, you should speak up when a design issue like the above presents itself. There are definitely legitimate reasons for personal data capture such as gender, perhaps for health service delivery (I'm being vague here), and equity ("are females still earning much less than males in this organization").

Third, as the customer service link in the system, you might empathize with the application users, as julie.plummer found in the tweets above. "Hi everyone, I agree it's a good point" is what a caller wants to hear.

Deeply hard-coded values will bite you later. I took this "free sample" (SOS: "Son Of Service"; see links below) and used it for a little project. The more elegant design approach would be to put these enumerated/allowed values into another table as a key constraint. Or a trigger at the application level as long as you can ensure that's the only access method.

(MS-SQL code):
ALTER TABLE [dbo].[Strings]  WITH CHECK ADD CHECK (
(
[Type_]='phone_number'
OR
[Type_]='email_address'
OR
[Type_]='service'
OR
[Type_]='work'
OR
[Type_]='role'
OR
[Type_]='skill'
OR
[Type_]='relationship'
OR
[Type_]='extended'
)
)
GO

Or (in MySQL):
--
type enum('extended', 'relationship', 'skill', 'work', 'phone_number'),
--

(Slight aside: I could not name the column defined above as "type" as that's a reserved word, so used "type_".

Commit Statement


If the database design is well-baked and difficult to unroll, you'll hear sentiments such as:

 
The Project Manager needs to keep the field mandatory for tech reasons.

 

No. This is _not_ a technical decision, in my opinion. It's financial (time to repair, down time estimates, testing costs) and a social engineering path one should recognize.
"Yes / No / No choice" No Choice is the *default*.

The workaround, a good compromise between doing nothing, and doing way too much work, is to hard-code a NULL value as the text value "No choice".

LINKS



fin: I have seen 2 productions of Dickens' A Christmas Carol this month, and some of the seasonal hopefulness shines through, possibly in my words, if not in spirit.

 
  • SAP Managed Tags: