How to troubleshoot SAP on SQL performance issues?
I experienced a poor performance issue recently. I want to share my troubleshooting procedure and hope someone could give me some feedback.
Part one: The whole system is slow. This is a problem mostly a Basis guy should be responsible for.
1. Check OS07 or perfmon to see if there is any SQL server memory trimmed out or external programs have very high CPU utilization. Also check page life expectency is not below 300 seconds constantly. It indicates a memory pressure. SQL server performance decreases dramatically when Data cache hit ratio decreases because too much data has to be read from physical disks instead of memory.
2. Check DBACockpit -> Performance -> History -> DB Collector History to see if “PAGEIOLATCH_SH [ms] / request” and “WRITELOG [ms] / request” increase. If it happens, you need to check out expensive SQL statements in DBACockpit -> Performance -> History -> SQL statement history or if a full backup occurs or a checkdb is running(Check SQL server agent -> jobs or exec sp_WhoIsActive, you can search and download this stored procedure using google). You can trigger the job: SAP_<sid>_<SID>_MSSQL_COLLECTOR in the SQL server agent -> Jobs to take a snapshot instead of waiting for 20 minutes.
ex: run this T-SQL script to check non-R3 processes
select * from sysprocesses where program_name not like ‘R3%’ and spid > 50 and status = ‘runnable’
3. Check DBACockpit -> Performance -> Locks to see if there is any blocking locks. Sometimes this job executes for a long time and you might miss a lot of blocking locks information(including the active blocking locks). You can use exec sap_lock2 in SQL server management studio or download HangMan.vbs from a SAP note instead.
4. Run SE38 -> /SDF/RSMSSVSE to compare wait events by day or every 10 minutes. If you find a wait event you are not familiar, search its meaning here (sys.dm_os_wait_stats (Transact-SQL) ). Please refers to this customer message( How to identify which SQL statement contributes the largest percentage of wait time? ).
5. Check SM66 to see if there are too many programs running and are they all necessary? Check SM50 to see if the program is reading the data from SQL server. Normally, you should see sequential reads or other operations read, ex: more than 1600 records per second. If it reads just 200 records per second, something slows down SAP on SQL server. you should create a baseline(measure the read speed using SAT) when the system is normal.
DB: Fetch MSEG 9967 hits with 12 seconds(830 MSEG Fetch per second)
DB: Open MKPF 13554 hits with 8 seconds(1669 MKPF Open per second)
DB: Fetch MSEG 9787 hits with 64 seconds(152 MSEG Fetch per second)
DB: Open MKPF 13309 hits with 29 seconds(458 MKPF Open per second)
6. Eliminate unnecessary queries. Use DBACockpit -> Performance -> Database Processes, then un-check “Include SQL Engine Processes” and check “Active Request Only”. Use Reset and Since Reset to check which program consumes the highest CPU and Reads. In production system, you might find dozens of running processes so it is difficult to identify which one should be terminated. Anyway, you can use step 2 first.
7. Check the bottleneck from ST03. Compare ST03 -> Detail Analysis -> Last minutes’s Load with yesterday’s load. Check if database time consumes more than 40% of total runtime. If the answer is yes, it indicates it is a database performance issue or a network performance issue(SQL server does not transfer the data to application servers fast enough) or the active programs request too much data.
Part two: A specific program is slow. This is a problem mostly a developer should deal with.
1. Check execution plan. Use DBACockpit -> Performance -> Database Processes to see the execution plan SQL server is currently used. Does the variant select too many records? Is it an optimized plan? Is update statistics up to date? Is there a parameter sniffing?
2. If you see SQL server does not read the data in SM50 and there is no blocking locks, your next step is to use SAT to trace SAP ABAP codes. Is there an infinite loop or a long running RFC? Anyway, you have to identify what does SAP actually do in this time frame.