I may be able to help explain why!

As I tend to talk too much, I have provided a summary and a details section below.  Choose your own adventure 😆

SUMMARY
  • Webi monitors VM usage as part of its Memory Analysis feature.  This can cause issues when the configured VM limits are exceeded.
  • glibc version 2.10 or higher, which ships with most newer RHEL versions, can cause excessive VM allocation out of the box.
  • You can tune memory allocation using the steps outlined in the KBA linked below.
  • Another option is to disable the Enable Memory Analysis option for the Web Intelligence Processing Servers (WIPS).

DETAILS

We have seen a few issues come into Product Support where BI System Administrators reported abnormally high VIRT (virtual memory) usage for the BI processes on their Linux installations.  I didn’t really think much of it at first because virtual memory isn’t usually a concern nowadays, with RAM being so cheap and hard drive space being plentiful.  Whelp, it wasn’t long before I realized that this IS an issue for BI 4.x.  Why, you ask?

Well, it all started a few years back when we were learning to operate within the constraints of a 32-bit operating system.  Windows has a well-known limit of about 1.8 to 2.0 gigabytes of memory per 32-bit process.  Once a process hit this limit, it was a crapshoot as to what would happen.  The application would crash, hang or, if you were lucky enough, throw a catchable error message.

To work within this 32-bit limitation, we implemented a memory analysis feature into the Webi Processing Server and allowed administrators to set a low/high/max memory setting to help control this.  Since we do not know how much memory will be used when a query is executed on a report, we just have to do our best to stop new queries from coming in when the “high” memory mark is reached.

What type of memory does the Webi Processing Server take into account with this Memory Analysis?  Virtual Memory of course.  Why?  Because on Windows, if a process exceeded 2GB of Virtual Memory, the process would do horrible things.  So, we had to monitor this as best we could.

I could write a whole document on that subject, so if anyone has questions around how that works, feel free to message me.  I’ll fast forward a little bit to today’s 64-bit world.  Today, with BI 4.x being 64-bit, we are not worried about memory as much.  The limits are very high with 64-bit processes and we will likely never reach those limits.

Even though the limits are much higher, we still have this Memory Analysis feature implemented and enabled by default on the Webi Processing Servers.  It is still a useful feature for administrators that wish to keep an eye on their system’s resource usage and limit users from executing unusually large requests.  Out of the box however, our default settings are low and this can cause some hard to explain behavior if you don’t know what you are looking at.

Enough background, here is a little known issue that I wanted to bring to everyone’s attention.

It seems that some versions of Red Hat EL 5.x and 6.x ship with an updated version of the glibc libraries.  Starting with around version 2.10 of glibc, the default malloc() behavior changed slightly.  For those that aren’t aware, malloc() is the C library’s memory allocation function, and the new default behavior was added to potentially increase performance for some multithreaded applications.  The new allocator sets up separate “arenas” (memory pools) for the threads of a process, and the number of arenas it is allowed to create scales with the # of CPUs that the server has.  So on multicore machines, we are seeing that a HUGE amount of virtual memory is being allocated right on the startup of our processes.
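
To put rough numbers on the arena behavior described above, here is a back-of-the-envelope sketch in Python.  The two constants are my assumptions about the 64-bit glibc defaults (an arena cap of 8 x CPU cores, and up to 64 MB of address space reserved per arena heap); they do not come from the KBA, and the function name is made up for illustration.  Real processes only create arenas as threads contend for them, so treat the result as a rough estimate of what could be reserved, not a prediction.

```python
# Rough estimate of the address space glibc malloc arenas could reserve.
# ASSUMED defaults (not taken from the post or the KBA): 64-bit glibc caps
# the arena count at 8 x CPU cores, and each arena heap reserves up to 64 MB.
ARENAS_PER_CORE_64BIT = 8
ARENA_RESERVE_MB = 64


def arena_reserve_mb(cores, arena_max=None):
    """Estimate reserved virtual memory (MB) from malloc arenas alone.

    arena_max mimics an explicit cap on the arena count; None means the
    assumed default that scales with the number of cores.
    """
    arenas = ARENAS_PER_CORE_64BIT * cores if arena_max is None else arena_max
    return arenas * ARENA_RESERVE_MB


if __name__ == "__main__":
    for cap in (None, 4, 1):
        print(f"16 cores, arena cap {cap}: {arena_reserve_mb(16, cap)} MB")
```

Under these assumptions a 16-core box could reserve on the order of 8 GB per heavily threaded process, which is at least in the same ballpark as the VIRT figures I describe below.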

Right out of the box, my 16 CPU test system was allocating up to 11 GB of virtual memory for some processes.  Webi processes (WIReportServer) were using 1.7-2.0 GB of VIRT memory right out of the gate and were climbing up to 4-5 GB after a single report refresh.  The java processes were way up there too and often started in a range of 8-10 GB.  This becomes a concern for our Webi Memory Analysis feature as virtual memory is what it monitors for the Webi processes.

This results in stopped or failed Web Intelligence Processing Servers, and end users start reporting a variety of errors: Server is Busy, Out of Memory, etc.

So, what’s the solution?  I have started a Knowledge Base article that documents the known work-arounds for this issue.  It is linked below:

https://service.sap.com/sap/support/notes/1968075

In short, there are 2 options.

  1. Add an environment variable to revert back to the old malloc() functionality (where virtual memory was not blown way up)
  2. Disable the Memory Analysis feature in Webi.

I would recommend option 1 personally.  The KBA above gives the details of which environment variables to test with.
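
For illustration only, here is a minimal Python sketch of what option 1 amounts to mechanically: the variable has to be in a process’s environment before that process starts.  MALLOC_ARENA_MAX is the variable discussed in the comments below; the helper name run_with_arena_cap and the stand-in child command are hypothetical.  Where to set the variable for the actual BI servers is covered in the KBA, not here.

```python
import os
import subprocess
import sys


def run_with_arena_cap(cmd, arena_max=1):
    """Run cmd with MALLOC_ARENA_MAX set in the child's environment only."""
    env = dict(os.environ)                    # copy; the parent stays untouched
    env["MALLOC_ARENA_MAX"] = str(arena_max)  # glibc reads this at process start
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in child process that just echoes the variable it inherited.
    child = [sys.executable, "-c",
             "import os; print(os.environ['MALLOC_ARENA_MAX'])"]
    print(run_with_arena_cap(child).stdout.strip())
```

A process that is already running will not pick up the change, so the servers have to be restarted after the variable is set.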

The image below shows the ‘top’ output of my BI server right out of the box after starting my BI processes with ‘startservers’.

[Image: pre-malloc env change.png]

And the image below shows the memory usage AFTER setting the environment variables.

[Image: post-malloc env change.png]

Notice how much lower they are once we revert the malloc settings back to the old way of doing things.  It’s not just Webi that is lowered; you can see the reduction in VIRT is across the board.

So, if you have a RHEL system with glibc version 2.10 or higher, you may want to check your VIRT usage to see if it is abnormally high.  If it is, it might be worth testing these environment variables to see if they help you keep that under control.
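
If you would rather script that check than eyeball ‘top’, something along these lines could work.  It is a sketch that assumes a Linux /proc filesystem, where the VmSize and VmRSS fields of /proc/<pid>/status correspond roughly to top’s VIRT and RES columns; the function names are my own.

```python
import platform


def parse_vm_kb(status_text):
    """Return VmSize and VmRSS (in kB) parsed from /proc/<pid>/status text."""
    wanted = ("VmSize", "VmRSS")
    result = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            result[key] = int(rest.split()[0])  # value looks like "2048000 kB"
    return result


def read_vm_kb(pid="self"):
    """Read the memory counters for a PID (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_vm_kb(fh.read())


if __name__ == "__main__":
    import sys

    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    vm = read_vm_kb(pid)
    print("C library:", platform.libc_ver())
    print(f"VIRT {vm['VmSize'] / 1024:.0f} MB, RES {vm['VmRSS'] / 1024:.0f} MB")
```

Run it with the PID of a WIReportServer process as the only argument.  A VIRT value many times larger than RES right after startup is the pattern described in this post.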

If you read this far down, thanks for tuning in!

Jb

13 Comments

  1. Arvind Pandalai

    You really do talk too much.. 😉 😀 However we really need detailed information from experts such as you and hence would appreciate more blogs from a talkative person. 😉

    Thanks for sharing the blog; it really helps to troubleshoot and narrow down issues.

      1. Arvind Pandalai

        Hello Jonathan;

        I re-read your post to understand more about the malloc function you have mentioned and hence to have a better understanding, I went through your note mentioned in the blog which again provides a detailed analysis and references to some blogs and redhat posts on glibc.

        In a couple of posts there is a reference to setting the malloc variable to 0 instead of 1. So my basic understanding here is not clear. How exactly does the setting work?

        The note you have created mentions link

        https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en

        I got a clear picture of how memory allocation used to work before this feature was available; however, it does not explain how this would work out with values such as 0 or 1.

        With reference to same link

        “Setting MALLOC_ARENA_MAX to a low number will restrict the number of memory arenas and bound the virtual memory, with no noticeable downside in performance – we’ve been recommending MALLOC_ARENA_MAX=4. We should set this in hadoop-env.sh to avoid this issue as RHEL6 becomes more and more common”

        Here I would like to understand how much memory would be allocated. Do the numbers actually mean the # of cores? I found no references to how the values for this variable are used with respect to memory allocation.

        Please assist.

        1. Jonathan Brown Post author

          Hi Arvind,

          I have tried to answer your questions as best I can.  I am by no means an expert on the glibc memory allocation stuff so you may want to try some other Linux related forums for more details on that.

          My understanding is that MALLOC_ARENA_MAX specifies the maximum number of memory pools to allocate, regardless of the # of cores on the machine.  By setting this to 1, you are hard coding the number of “arenas” that will be allocated.  I have found no documentation on what setting this to 0 would do.

          The documents I read mentioned setting:

          MALLOC_ARENA_MAX=1
          MALLOC_ARENA_TEST=0

          I read a lot of documents though and I can’t seem to find the one that referenced that so perhaps we can just drop the MALLOC_ARENA_TEST=0 recommendation for now.  In my testing, the memory allocation was the same with or without that one.  The key is setting that MALLOC_ARENA_MAX to a smaller number if your goal is to use less VM.

          This is really a tuning exercise though.  I can’t really recommend what you should set that option to but it does give you the option of finding the “sweet spot” for your system.  Some people suggest using 4 instead of 1 as well.

          As for how this affects memory allocation, it essentially hard codes a maximum # of arenas (memory pools) that a process can create for its threads, instead of scaling that maximum by the # of cores.  That’s my understanding anyways.   There is very little useful info on this out there!

          Hope that helps.
          Jb

          1. Arvind Pandalai

            So what about the reduction in virtual memory mentioned in the SAP note?  The memory could have changed due to a restart of services and might increase over some duration, maybe in a day or two depending on user logins and usage, i.e., if my understanding of the note is correct.

            1. Jonathan Brown Post author

              I took those screenshots on my very isolated test server.  Those screenshots were taken right after a complete shutdown and restart of the SIA/Tomcat and SQL Anywhere database.  I don’t even have a Webi report in that system so there is no way those values include any usage memory.  Adding that environment variable definitely did reduce my VM allocations on startup.

              That being said, we have at least two customers that have tested these in real world environments as well.  Their results are significantly less VM usage on startup and over time.  There are still scenarios where VM will climb up though, so it doesn’t necessarily mean that VM will always be lower.   Keep in mind that VM allocation vs usage isn’t a 1:1 ratio either.  Just because top is reporting 11 GB of virtual memory being allocated doesn’t mean that the process is using 11 GB of virtual memory.

              I’m probably in over my head now as far as memory talks go.  I’m an amateur at best so if anyone else has input, I’d be happy to hear it!

  2. tilak mishra

    Jonathan,

    In our case we use 4.1 SP2 Patch 4 on a Suse Linux VM.

    We have never experienced an abnormal VIRT value for the processes, especially for the Webi Processing Server, but we are seeing Out of Memory and Server Busy errors more often. So would you say the above analysis is still valid in our case?

    In our current configuration the Memory Analysis option is in the enabled/checked state, as that’s how it comes out of the box. Why is that so? Why is SAP not changing the default value?

    We have set MALLOC_ARENA_MAX=1, but didn’t notice any significant change in system behavior.

    Also, I don’t fully agree with the lines quoted below, because the Webi Processing Server still has a limit of 6 GB.  Why is that so?

    I’ll fast forward a little bit to today’s 64-bit world.  Today, with BI 4.x being 64-bit, we are not worried about memory as much.  The limits are very high with 64-bit processes and we will likely never reach those limits

    1. Jonathan Brown Post author

      Hi Tilak,

      This sounds more like a sizing issue to me.  If you are starting to see more of these errors then this may indicate that you need to go through a sizing exercise again based on the real usage of your system.  Most people size their system initially but forget that sizing should be done regularly to ensure you are not outgrowing your original estimates.

      That being said, it sounds like you are NOT seeing the issue I am referring to in this post but are probably dealing with some large documents that are running.  This is normal and, as I alluded to above, is more a concern around sizing.

      I would recommend that you consider increasing the memory limits or disabling the memory analysis piece altogether if you have lots of unused memory available on the Linux box.  The default settings are really just a safety net so that poorly sized systems do not have runaway memory issues.   You can increase those limits to be much higher if your system can support it.  I just tested setting my Max Memory limit to 12 GB without issue.

      SAP doesn’t change these defaults because it is expected that each BI System Administrator will size their environment according to usage and available resources.  Increasing this by default for all machines could cause more problems than it solves.

      The memory analysis piece was implemented in XI 3.1 because of the 32-bit memory limits of 2 GB (Windows) and 4 GB (UNIX).  It remains in the BI 4.x product, even though it is now 64-bit, because the functionality can still be useful in some scenarios.  If you do not see the value in using the functionality, you can disable it.  Best of both worlds 🙂

      I hope this helps!

      Jb

      1. tilak mishra

        Yes, some of the Webi documents consume 1.5 GB of WPS memory each, and when 4 sessions get loaded onto one single WPS, it maxes out the 6 GB allocation (4 x 1.5 GB) and throws an OOM error. However, we have two nodes in the cluster with 48 GB each and a total of 6 WPS available. Do you think increasing the max memory of the Webi Processing Server beyond 6 GB can be an option to look at? I was under the impression that the WPS has a 6 GB restriction as it uses C++ code, compared to the APS, where we can allocate unlimited memory (subject to the availability of sufficient memory on the box).

        1. Jonathan Brown Post author

          There is no limit aside from the 64-bit memory address limits that would apply to any 64-bit application.  You can definitely set it higher than 6 GB.

          In BI 4.x you can use a lower number of WIPS servers with a higher max memory.  The sizing in XI 3.1 was scaled out due to 32-bit limits, but that changes in BI 4.

          1. tilak mishra

            Thanks, Jonathan, for advising this. I always thought 6 GB was the upper limit for the WPS. I will surely try increasing the memory limit beyond 6 GB. Why are we not experiencing WPS OOM errors as often in 3.1, even though the same large documents are open there? Has anything else changed in the 4.1 WPS architecture and working principle compared to 3.1?

            1. Jonathan Brown Post author

              I’m not sure why the reports take more memory in BI 4.1 in comparison to XI 3.1 in your case.   There were a LOT of changes between XI 3.1 and 4.1 though, and that includes support for new features and stuff.  It’s possible it just needs more memory to support some of the new features of 4.1.  We’d have to look into it deeper if it becomes a major concern for your system sizing.

  3. Claudio Sanft

    Thank you for this post.

    You don’t talk too much. You simply tell things the way they are supposed to be told.

    Beyond this, consider that “a lots of books would be shorter just if they weren’t so short”
