Few days back, I was in a strange situation where availability checks of MSSQL databases were required to be set from Solution Manager which was running on AIX. Request was to monitor availability, even when the applications(SAP) are down.
I tried with some basic setup, but all failed from SAP level. Querying MSSQL database directly was not working in my case.I went through some KBAs and notes.
Like 1601608 and 1458291.
So, Issue now was:
Whenever, we access any remote database from any SAP system, perhaps we should say a central system like SOLMAN, we have to provide ‘Database Shared Library’, abbreviated as DBSL and Microsoft SQL Server SNAC Client Library on that SAP system. This DBSL is used to connect SAP processes, it’s database counterparts to query and manipulate data residing in databases. MS-SQL is a product of Microsoft and it’s supported DBSLs are also provided by them. Microsoft has only made DBSLs for Linux X86_64 and it’s own Windows Operating systems. SNAC is Microsoft’s SQL Server Native Client is a stand-alone data access application programming interface (API) that is used for both OLE DB and ODBC so a mandatory requirement to take advantages of Solution Manager DBA Cockpit and it’s counterparts.
I raised this wtih SAP but nothing worked
Because of no supported DBLS, every attempt to connect would fail –
the connection failed *** ERROR => DlLoadLib()==DLENOACCESS – dlopen(“/usr/sap/<SID>/DVEBMGSXX/exe/dbmssslib.o”) FAILED
Finally, one work around to monitor MS-SQL availability was to listen to a port 1433 which MSSQL instantiates whenever it goes live.
So, we prepared a script and tested on test system, and that worked.
We also checked bi-directional ports 1433 and 1434 to be opened with MS-SQL hosts and our Solution Manager system.
But now, the protocol/program used to do this test was telnet.
Whenever, we started a telnet test from script, a connection was established and seen on MSSQL host.
Also, on our AIX, a new process was created for every test and existed even after terminating the script.
The connection remained active on MSSQL host unless telnet process was removed from OS level.
So, we took things further, we prepared two scripts, 1st script for doing telnet and the other, 2nd script, for killing the processes.
Now, question was how would the script know when to kill the process, and importantly which processes are to be killed.
What we did, every connection from telnet was redirected to a log file.
Now, the 2nd script reads the current system processes, filter all telnet processes on for MSSQL host and 1433 ports, and prefix the output with ‘kill -9’ before the OS process which is now written to a 3rd dynamic-temp 3rd script.
We further added one more test system to observer the behavior. Now, when the 1st script was executed, we made it wrote the output to a DB-SID specific log file, but the next telnet for new system was not executed, as the connection was established for the 1st system and connection didn’t exit.
So, we started the 2nd scripts, using FOR loop based on DB-SIDs, and filter the process, as written above, in the 3rd script and executed it, read the telnet DB-SID specific log file. If we found “connected” word in the log file for this system, that means the DB is available and emailed from OS level to the required users and distribution list.
This all was tested and worked well for us, there were no pending telnet connection on remote databases left.
Taking this further, we used around 20 MSSQL systems and modified the script further which read DB information from a flat file and triggered email when DB was not available. We scheduled scripts in cron for every 10 mins, with 2nd scripts using SLEEP function in loop, giving 1st script few seconds to test connection.
I cannot share that script, but the algorithm is almost here.
Hope, this will help others if they face similar situation in near future.