I blogged about a performance issue in sales order management last September. That post focused on problems caused by table buffering changes. This time, the reported issues are with manufacturing response time, both dialog and batch.

 

First I looked at dialog response time using ST03 (expert mode). I picked a few of the most-used transactions starting with “M”. We keep 5 months online, so I viewed average response times back to July 2007. For each month, I copied the values to a spreadsheet after switching the screen to Excel view mode.

It’s easier for me to see trends in a picture, so I graphed the first 3 columns: dialog steps, time, and time per step. I often compute the average (time divided by steps) myself rather than using the downloaded column. Below are the views I captured.
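
A quick note on the arithmetic before the charts, with hypothetical figures just to show the calculation: if a transaction logged 2,000 dialog steps in a month with a total response time of 2,400,000 ms, the average is 2,400,000 / 2,000 = 1,200 ms per step, which is the value I plot.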


[Charts: monthly ST03 averages (dialog steps, response time, time per step) for the selected transactions]


 

I saw different trends: some transactions show degrading performance over time, while others remain constant. That suggests the problem is not system-wide, but perhaps lies in the tables or indexes used by MB51. As that transaction showed the largest change (note the slope of the yellow line on the first graph compared to the others), I looked at it more closely.

 


[Chart: weekly average MB51 response time per application server]


The week view showed higher numbers than the month views had led me to expect, although a few readings were under 2 seconds. I stopped charting after looking at 2 application servers, as I felt a pattern was beginning to emerge.

When I added a trendline, I saw increasing response time on these app servers. The change from December to mid-January was about half a second, which might start to be noticeable to users.

I sent emails asking whether end users were reporting worse performance and didn’t hear of anything significant, although one business analyst mentioned that using date ranges in MB51 seemed less efficient than he thought it should be.

 


[Chart: daily average MB51 response time per application server]


The day view widened the recorded response time ranges.   This time I looked at one more app server, and skipped the weekend data.

 

The last drill-down level I looked at was response time per user for MB51 on each application server on a specific day. This is the only way I’ve found to understand who is doing what without running traces or analyses.


[Scatter plot: MB51 response time per user for one day]


I used a scatter plot to display each user’s response time for that day. There are a few over 2 seconds, including a surprising maximum of over 4 minutes. I later found out from the Basis team that we were getting dumps where other transactions hit the maximum time allowed, but those are not recorded in the ST03 times.


[Scatter plot: per-user MB51 response times with the top user omitted]


The last plot for that day omits the top user; with that outlier removed, the trend shows the vast majority of users seeing 1 or 2 second response times.

 

After looking at ST03, I switched gears the next day and used ST04 to see what was showing up in the database’s shared SQL statement cache. I also looked at a different perspective in ST03: the 10-minute snapshot.
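
ST04 is essentially a window onto Oracle’s shared SQL area, so the same question can be put directly to V$SQL (a rough sketch of the idea, not the exact query ST04 runs; it needs the SAP schema or DBA rights):

-- top 20 cached statements by physical reads
SELECT *
  FROM ( SELECT sql_text, executions, disk_reads, buffer_gets
           FROM v$sql
          ORDER BY disk_reads DESC )
 WHERE ROWNUM <= 20;

In practice I just used ST04’s own cache screens, which present the same figures.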

The SQL cache review highlighted an SAP program (RM07DOCS) as having many disk reads. Several executions of the same ABAP statement appeared in the cache as separate entries, hinting that the code was being run with different selections.

 

Here’s one I found:

SELECT
  T_01 . "ANLN1" , T_01 . "ANLN2" , T_01 . "APLZL" , T_01 . "AUFNR" , T_01 . "AUFPL" ,  
[...]
FROM
  "MKPF" T_00 , "MSEG" T_01                                                                        
WHERE
  ( T_01 . "MANDT" = '040' AND T_00 . "MANDT" = T_01 . "MANDT" AND T_00 . "MBLNR" = T_01 . "MBLNR" 
  AND T_00 . "MJAHR" = T_01 . "MJAHR" ) AND T_00 . "MANDT" = '###' AND T_01 . "MATNR" = '##'
  AND T_00 . "BUDAT" BETWEEN '20080121' AND '20080122' AND T_01 . "WERKS" = '##'

 

Note: I changed literal values to number signs.

This is a join between MSEG and MKPF, with the former being the larger of the two (50GB?).  The explain plan lists a nested loop, with MSEG accessed via index “M” and MKPF via its primary key index.  The material number was entered but not the year; supplying the year might have been more efficient.  Index “M” keys on material number, with plant (WERKS) as the next column, so this should work well.  But ST04 shows almost 30,000 disk reads for this statement, meaning it could be better.
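
For reference, the plan described has roughly this shape (a sketch of typical Oracle plan output for that access path, not a paste of our actual plan):

SELECT STATEMENT
  NESTED LOOPS
    TABLE ACCESS BY INDEX ROWID  MSEG
      INDEX RANGE SCAN           MSEG~M    (material number, then plant)
    TABLE ACCESS BY INDEX ROWID  MKPF
      INDEX UNIQUE SCAN          MKPF~0    (primary key: MANDT, MBLNR, MJAHR)

One likely reason for the disk reads: driving from MSEG by material and plant pulls in that material’s entire history, and the date restriction on MKPF is only applied after each join probe.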

ST03 10-minute detail report

Started   Server      Transaction  Program   T  Scr.  Wp  User  Response time (ms)  Time in WPs (ms)
10:01:39  ap06_PRD_0  MB51         RM07DOCS  D  0120   0  T                 14,808            14,726
10:03:54  ap02_PRD_0  MB51         RM07DOCS  D  0120   8  L                 18,629            18,536
10:05:12  ap03_PRD_0  MB51         RM07DOCS  D  0120  31  Y                 33,836            33,518
10:07:24  ap07_PRD_0  MB51         RM07DOCS  D  0120   2  E                169,999           169,858
10:08:46  ap07_PRD_0  MB51         RM07DOCS  D  0120   8  C                 38,617            38,536
10:11:37  ap07_PRD_0  MB51         RM07DOCS  D  0120   2  E                 37,659            37,659

 

An email sent to a user returned with the feedback:

“… pero lo corrio sin parametros, ya  hable con ellos para que no vuelva a pasar”

(roughly: “… but he ran it without parameters; I already spoke with them so it doesn’t happen again”)

(Maybe Blag or Ignacio can help me out here) 

 

Some of the above will be easier once I get Central Performance History working – see these earlier blogs [7606 and 7607] for where I’m headed with that.

I’ll try to blog on the batch run time analysis when I have time. 

 

[All opinions expressed are mine, no one else’s]

[04-Feb-2008 Update] 

 

Problem user or problem system? 

Ignacio rightly challenged my assertion that performance problems are caused by users, as it should be up to the developers and infrastructure team to deliver quality customer solutions.  However, sometimes the budget isn’t big enough, and sometimes you inherit a system designed for a previous problem while users are solving the next one.

Problem user or user error? 

There is turnover in the user community, so last year’s training class was effective for those who attended, but not new hires.  This dilemma requires ongoing learning, whether virtual, viral or vocal.  Part of my job is determining where those gaps occur.  Then there are refresher courses, or just reminders, for those who have forgotten the basics such as which columns allow the fastest return on query investment.

Then there are the “top” users, who are the exception to the rule, and who sometimes need to submit the dinosaur-killer query to get their job done.  The systems people need to recognize who they are, and perhaps steer them to a quiet time to prevent contention with other users.

Nuts & Bolts 

After posting my original blog, I have continued research with the Basis and DBA teams.  Below are findings as of 04-Feb-2008.  Our Unicode conversion is next week, so some performance aspects may change due to index rebuilds.  One index on MKPF did not get transported to production this weekend as expected, meaning I’ll need to wait another 2 weeks before that impact is seen.

Table rows (TST)

MSEG 145,342,400
MKPF  19,374,700
 

Custom Indexes we have created over the past 10 years on MSEG and MKPF

MSEG~Z1

     

  1. WERKS  
  2. BWART  
  3. KOSTL

MSEG~Z2

     

  1. LIFNR  
  2. WERKS  
  3. BWART

MSEG~Z3

     

  1. LGORT
  2. WERKS
  3. BWART
  4. SOBKZ

MSEG~Z4

     

  1. AUFNR  
  2. MATNR  
  3. WERKS  
  4. BWART

MKPF~Z1

     

  1. MANDT  
  2. CPUDT

MKPF~Z2

     

  1. MANDT   
  2. BLDAT

MKPF~Z3

     

  1. XBLNR

New for February 2008:

MKPF~Z4

     

  1. USNAM   
  2. BUDAT
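
At the database level, the new MKPF~Z4 above corresponds roughly to the statement below (a sketch only; the real index is defined in SE11 and its DDL, including storage parameters, is generated by SAP; I am assuming MANDT leads, as it does in our Z1 and Z2):

CREATE INDEX "MKPF~Z4" ON "MKPF" ("MANDT", "USNAM", "BUDAT");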

PRD – Oracle 9i / non-Unicode

SQL Plan for above generated code

TST – Oracle 10g / Unicode

New index is in place; transport into production 02-Feb-2008. 

PRD – a different SQL statement from the same ABAP statement

The Plan

TST plan

Well, this is not as good as the other: the optimizer decided the date wasn’t selective enough (the range is a month instead of one day now), and went for index Z1 on MSEG.  This will find all the rows matching that plant (and the 2 BWARTs), and weed through MKPF by MBLNR (what I like to call the Mumble Number).  Ugh.

Shrinking the date range in the ST05 explain-plan demo screen shows how the plan changes.

T_00 . "BUDAT" BETWEEN '20070526' AND '20070527'

The other improvement to this plan is a switch from NESTED LOOP to HASH JOIN.
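
A quick way to see what the optimizer is reacting to is to count how many MKPF rows each date range actually covers (a sketch; the client value is masked here as in the statement above, so substitute the real one before running):

-- two-day range, as in the ST05 demo
SELECT COUNT(*)
  FROM "MKPF"
 WHERE "MANDT" = '###'
   AND "BUDAT" BETWEEN '20070526' AND '20070527';

-- full-month range, as in the slower statement
SELECT COUNT(*)
  FROM "MKPF"
 WHERE "MANDT" = '###'
   AND "BUDAT" BETWEEN '20070501' AND '20070531';

The wider the window, the less selective the date predicate looks to the CBO, and the more tempting it becomes to drive from MSEG~Z1 instead.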

Conclusions?

 

Not yet!  There is a transport in flight, as well as the Unicode and Oracle 10g migrations.  We could see improvements from better user education, and maybe from pre-filling columns to steer users away from open-ended wildcard queries.

 

Stay tuned.


9 Comments


      1. Ignacio Hernández
        I think that the user ran the report without any parameters (filters), and that altered your stats.

        Your question left me thinking. “User error”… does something like that exist? Or is a “user error” just a “system error” triggered by a user action?

  1. Paul Hardy
    The design of the tables MKPF & MSEG is fatally flawed. MKPF has the date as the index, MSEG has the material and plant. So the join has two choices:
    (i) It gets the data for EVERY plant using the date selection on MKPF and then joins to MSEG for only the required plant / material.
    (ii) It gets the data for the plant / material for all of time via MSEG and then joins to MKPF to restrict by date.
    The first choice is better, but neither is very good. This is why you sometimes see MB51 running faster if you run it for a week leaving the material selection blank than if you run it for a week specifying a single material. In the former case it uses the index on MKPF, in the latter the index M on MSEG.
    The best table to get such data from is S033, as it has date and plant in its key, but as an LIS table you can’t trust it 100%. Of course the symptoms I describe will be different for each company depending on how many materials and plants you have.

    Of course if anyone has solved the problem of reading material documents quickly I would be interested to learn how.

    Cheersy Cheers

    Paul

    1. Lars Breddemann
      Hi Paul,

      the inherent problems with joins of MKPF and MSEG are widely known.
      Due to design limitations in the SAP database schema (namely the way dates are stored, as strings), the optimizer has a hard time finding the optimal execution plans.
      For Oracle you may want to check notes
      SAP Note 122718 CBO: tables with special processing
      SAP Note 797629 FAQ: Oracle histograms
      to get some relief from that limitation.

      For MSS there is note
      SAP Note 821722 Performance of the join on MSEG and MKPF

      Anyhow: querying such a view without specifying selective information like ‘year’ is not a wise choice, application-wise. This should be prevented by the ABAP code.

      KR Lars

      1. Jim Spath Post author
        All – thanks for the commiseration on MB51.  I could tell how bad this was from the disk reads and transaction times.  Many users have learned to improve response time by entering more filtering columns, so part of the answer is education (or reeducation).  Another part of the answer is continued database tuning.  We have 3 “Z” indexes on MKPF already, I believe, as I saw some SQL statements use those.  Another index is on the way to production.  I think the only trick we have not tried is histograms, so I passed that hint to the DBAs for review.  //jim
    1. Jim Spath Post author
      SM:

        It is possible that combining two different indexes might save some space in the database, though the overlapping columns would need to be large enough to matter.  In this case, KOSTL and LIFNR are not common to both, and are the major portions of the respective Z1 and Z2 indexes.
        But you are overlooking two critical factors here.  One is whether existing statements would use the combined index, which would depend on the specific columns in each statement.  I would much rather have two indexes that are used efficiently than one smaller one that is not.  The other factor is use of the memory buffer as opposed to space on disk.  A combined index may drag more blocks into memory than a smaller, more selective one would; we’d only know that by doing statistical reviews of the selectivity of the columns (or doing a transport and hoping for the best).
        I think a more productive approach would be to look for indexes created in the past (whether SAP or custom) and delete unused ones.  That would save space, plus cut down on insert/delete overhead.
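        If anyone wants to attempt that selectivity review, here is a rough sketch of the kind of check I mean (run it against TST or a sample rather than the 145-million-row production MSEG, since the distinct counts are expensive):

        SELECT COUNT(DISTINCT "KOSTL") AS distinct_kostl,
               COUNT(DISTINCT "LIFNR") AS distinct_lifnr,
               COUNT(*)                AS total_rows
          FROM "MSEG";

        The closer a distinct count is to the total row count, the more selective the column, and the better it earns its place at the front of an index.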

      Jim

