The primary purpose of this blog is to demonstrate how a skilled person can solve problems quickly and efficiently than by several groups. I don’t suggest one person could solve all problems in all situations. The point I’m trying to make is that a Basis Administrator should explore all opportunities to gain experience in non-SAP layers. Opportunities to gain knowledge in other layers used to be easy in the past – 15+ years ago. Now for more than one reason, I see more and more Basis Administrators focusing on just SAP-ABAP and Java layers. They don’t even consider the importance of ABAP/programming knowledge to solve performance issues. With skills just in SAP layer, Basis Administrators’s scalability-to be able to support large and complex landscapes- would become an issue.
Recently we converted Oracle/Solaris based BW system to Unicode. Our testing in non-P systems didn’t reveal any serious issues.So after working more than 48 hours non-stop(well almost), we delivered the system ahead of time to BW team. They started testing. Two hours into testing, we experienced a serious error. See the screenshot below. Oracle started showing ORA-00600 errors. Anyone working in Oracle would know this error is normally a serious error. Since we didn’t run into this error in non-P systems, we suspected a very serious situation. We were on a conference call while this issue was discovered. Here is one of many alerts we received from Remedy system:
Immediately I logged into Solaris server to see if there were any other useful messages in Oracle’s alert.log. I was really panicked at this point for more than one reason:
- Already worked 48+ hours (almost) non-stop.
- ORA-00600 error is normally very serious – anyone who has worked in Oracle would know.
- 3 other system were converted, didn’t run into this issue.
Oracle log showed additional entries indicating there were unusable indexes with name SAPSR3.”/BIC/B0003420000″ etc. Immediately after seeing this error, I felt a little bit better, told myself – no need for panic. Any table in BW with names such as “/BIC/B000*” are either PSA or Change Log Tables. These tables are normally temporary tables(PSA is definitely temporary tables, change logs-if used as designed-are also temporary tables holding records for slightly longer periods depending on business needs. Since the purpose of this blog is not about CL or PSA, I don’t want to get into the details). They’re not accessed directly by the users. I also knew weekly PSA data deletion jobs were running when I was monitoring the system after Unicode conversion. I also knew these tables are partitioned. And there is a bug with ~900 index of F table of an aggregate. Somehow I thought the error I see now is related to ~900 bug. What is that bug? Read Lars’s blog: Question to SAP Support “Why is partition pruning not working in my BW on Oracle?”.
As I explained, PSA/Change log tables are partitioned. Underlying indexes could either be global index or local to the partition. For those who don’t understand partitioning, I’m planning to write blogs on partitioning. The indexes should be created LOCAL for a partitioned table if you’re planning to drop one/more partitions. Otherwise if a partition is dropped, the global index would become unusable unless the index is rebuilt after dropping the partition. I believed this was exactly what was happening. Temporarily I guessed the error could be fixed by rebuilding the index using SE14. However one issue with SE14 is that the primary index can’t be dropped. So I performed two things:
- Checked whether the table was PSA or change log. You can check this in either RSTSODS table or in the description of the table in SE11 for the table shown in alert.log. PSA’s table description would be something like “Transfer Structure for…”. SE11 showed “Transfer Structure for..”. This is a great news because change log tables are huge in this customer’s site. Rebuilding index for CL would take time whereas PSA index would take no time because PSA deletion jobs run regularly every week.
- In SQLPLUS, I dropped the index and rebuilt it in SE14.
What happened? Why we didn’t run into this issue in other non-P systems? Did you guess it? Well, this functionality was not tested. We ran into this error in Development system two weeks after going live.(Nothing new I guess). Since BW system uses partitioning feature more than non-BW system(almost zero in non-BW systems unless performed by the customer), we should use SMIGR_CREATE_DDL to generate Data Definition Language(DDL) statements for the objects in the database. I suspected a bug in SMIGR_CREATE_DDL. I checked the output of *.SQL files generated by that script. I couldn’t figure out. I generated those files with older version SMIGR_CREATE_DDL. I ran diff command between old and new SQL files. I saw LOCAL was missing in newer files used by Unicode conversion job. I checked marketplace for any note corrections for SMIGR_CREATE_DDL. Yes, I found one. That correction should’ve been applied before running the script/Unicode Conversion. Since it was not tested in non-P systems, we didn’t know about that bug. Now I had two options: 1) Open SAP message or 2) Write my own script to generate Drop/Recreate statements for all indexes of PSA/Change log tables using AWK. Since writing script is easier than contacting SAP support I decided to write awk script.Currently we’re testing in non-P systems. One more thing: Since these tables are customer based, we can’t use older version’s SMIGR_CREATE_DDL’s output because the names would be different in different systems/days.
New SMIGR_CREATE_DDL’s output showing missing “LOCAL” keyword:
Old SMIGR_CREATE_DDL’s output showing LOCAL keyword:
AWK script which parses new APPL1.SQL, retrieves “CREATE UNIQUE INDEX” Statement and adds DROP INDEX command before CREATE UNIQUE INDEX statement.
I guess this example demonstrates how multi-skilled person can solve a problem quickly without escalating the situation. What skills were used to identify and come up with a solution quickly and efficiently. This table would provide the details: