Using the SAP Cloud Platform (SAPCP) I have been collecting data about the SAP Community, and this blog covers some of my analysis of that data. Previously I covered a way to collect data via RSS feeds, but due to limitations of the feeds the data was not consistent enough for further analysis. Using the SAP Search API (the backend engine that drives the OneDX search page and the search on this site) I have collected data from 10th October 2016 to 30th April 2017. I am happy with the quality of the data, but first some background about the data set I will use. The base-level technical details of how I collected the data can be found in my SAP Cloud Platform blog.


Data Quality

I have collected questions since the beginning (10th Oct 2016) of the new SAP Community via the SAPCP and SAP Search API, and extracted items such as the primary tags used in the questions. This does lead to, let’s say, user classification issues, i.e. do you pick the right tag when asking a question? I am sure that you do. Although I am also sure a moderator’s job on this site must include retagging questions on a daily basis! As you are reading this site I will assume you are familiar with concepts such as tags and how this site works (or do we know 🙂 how it works?).

I collected the data on a bank holiday Monday at the start of May here in the UK. It was a one time loop over the dates (Oct2016 – Apr 2017) and therefore it is now out of date. There is also obvious potential for some questions to have been updated/retagged/deleted etc. I do not work for SAP and there are certain aspects of the site that I can’t collect or have access to. For example, I have the questions and the text of any answers but I do not know if answers are being accepted to these questions.

All charts were created with Lumira, and I use out-of-the-box features where appropriate. Lumira accesses my data set in the SAP Cloud Platform. I have googled some ideas on how to present this data, but data analysis is not my day job. My main goal was to use SAP technology, and the challenge was to see if I could present/analyse the data in an interesting way; let me know whether I succeed with that objective! I’ll cover some of my conclusions at the end of the analysis. If you are interested in seeing the raw data I have collected then let me know and I’ll work out a way of sending it on to you (although, as mentioned, I do cover the technical details in my other blog).

So with those details in mind, and noting again 🙂 that the data is an unofficial snapshot in time of the status of this site, I’ll get to my point and … continue.


Total Questions and Authors in the Data Set 10th Oct 2016 – 30th April 2017

In total I collected over 41k questions and 23k authors. The author data is based on the unique AUTHOR_ID, which links to the individual profile page. If you are anything like me then you may have multiple SAP IDs, which may or may not be registered with the SAP Community. I chose to count AUTHOR_IDs only, so one individual may be linked to multiple AUTHOR_IDs, or multiple authors may share one logon ID :). Does that make sense? If not, let me give an example and say hello to…

The Arun Kumars

Arun Kumar is the most popular author name in the dataset and is linked to multiple AUTHOR_IDs, which may or may not belong to the same person. I have used the AUTHOR_ID as the measure in all other charts (where appropriate). The chart titles say “Author”, but the counts are of AUTHOR_IDs; that seemed the better option at the time of analysis.
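To make the difference between counting names and counting AUTHOR_IDs concrete, here is a minimal Python sketch. The rows and ID values are made up for illustration and are not taken from my data set; the analysis itself was done in HANA and Lumira, not with this code.

```python
from collections import defaultdict

# Hypothetical sample rows: (author_id, display_name). Not real data.
rows = [
    ("id-001", "Arun Kumar"),
    ("id-002", "Arun Kumar"),
    ("id-003", "Jane Example"),
    ("id-001", "Arun Kumar"),  # the same ID asking a second question
]

def ids_per_name(rows):
    """Map each display name to the set of distinct author IDs using it."""
    names = defaultdict(set)
    for author_id, name in rows:
        names[name].add(author_id)
    return names

# Counting distinct IDs is the measure used in the charts.
unique_authors = len({author_id for author_id, _ in rows})

# Names that are shared by more than one ID, with the number of IDs.
shared_names = {
    name: len(ids) for name, ids in ids_per_name(rows).items() if len(ids) > 1
}

print(unique_authors)
print(shared_names)
```

Counting by name would report two authors here, while counting by ID reports three, and neither number tells you how many real people are behind them.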

Moving onto primary tags which are part of my data set.


Top Primary Tags For Questions

ABAP Development is the top primary tag for questions

Talking of tags, I had the idea to focus on the “Using” primary tag that would cover questions about this site since its launch. If you have an issue using this site then you can search for similar problems. If you can’t find an answer you can ask a question over here Using Questions

I was curious to find out how many questions there would be for this tag and see if I could highlight any trends. This is what I found….

“Using” Primary Tag Stats

A quick view of the overall totals

Overview of the timeline Oct 2016 – Apr 2017

Now the trend for questions is going down from the wild west days at the start! What had just happened to this site on October 10th 🙂? Maybe some retagging or moving of questions had taken place as well.

I had some thoughts about this:

a) It is a good sign that there are no major peaks after the initial start of the new community.


b) Are questions being answered, and is there an accepted answer to each question? Unfortunately, as mentioned, I do not know that detail from my data set. There is no data that I can find that indicates questions are closed with an acceptable resolution (or even just closed/deleted for any other reason).

However, closing/accepting answers to questions is an issue that generated quite a bit of talk at the coffee corner of this site.

Link to the coffee corner.

Link to the conversation about accepting answers to questions

Steve Rumsby started that particular conversation, and at the time of analysis it had been going for 136 days. The discussion was created on 18th Oct 2016 and the last update was on 3rd March 2017.


and finally

c) Are people still using the site? This was another thought prompted by various statements I have read around here and elsewhere. I had the overall data of AUTHOR_IDs (with the risk, as mentioned, of duplicates pointing to the same user), so here is the author data on a timeline. The chart below covers all tags (not only the Using tag).

The data set I have indicates a stable view of the site use. No major rise or fall in author_ids. A slide to the new year and then maybe a very slight increase via the running average line in the chart. I took the 3 day default for running average in Lumira.
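For anyone unfamiliar with a running average, the sketch below shows the idea in Python. I am assuming a trailing window (each point averages the current day and the two before it); I have not checked exactly how Lumira aligns its window, and the daily counts are invented for the example.

```python
def running_average(values, window=3):
    """Trailing moving average; early points average the values seen so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages

# Hypothetical daily author counts, including a weekend-style dip.
daily_authors = [120, 90, 150, 60, 30, 120]
smoothed = running_average(daily_authors)
print(smoothed)  # [120.0, 105.0, 120.0, 100.0, 80.0, 70.0]
```

The smoothed line moves less than the raw counts, which is why it helps when judging whether overall use is rising, falling or flat.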

A look at the overall questions by date and it indicates a very similar picture.


One option for user migration may be the availability of another forum to ask questions. I was interested in one of the top primary tags SAPUI5 for authors over the time period.

I did notice a steady increase since the start of the year (maybe new year resolutions to learn UI5 ;)) so a good sign that more people are using the SAPUI5 tag but I make no comments about the quality of these questions. If you have direct involvement in UI5 primary tag I would be interested in your thoughts on the activity the above chart indicates and availability of other forums.

Back to Using Tag

Following a similar process to finding the longest-lasting discussion on coffee corner: which question generated the most characters on this tag? (Say, what does that mean 🙂?) Well, I used a simple character count to find the top question on Using.

Link to the top question in the screenshot.

The question was “Has The New SAP Community Killed The Community?”, well I am not sure in every aspect but I am still using the SAP Community and I will continue to use the site.

However that triggered a thought to use HANA’s Sentiment Analysis on the Using forum. I have not used Sentiment Analysis before so first thing was to check out an OpenSAP course

Overall table of sentiment for “Using” tag is below. A link to Open SAP course on the subject of  “Sentiment Analysis”

I have been checking the analysis: my dataset is based on a question/answer forum, whereas the SAP analysis is “Voice of a customer”, so I am not sure the use case is an exact match. As I said at the start, I googled 🙂 and found some statements that “sentiment on forums works differently”; that is only one link, but I found other statements matching it, and another article that compared many different sentiment algorithms for forum sites. I don’t believe everything I read on the internet, but it did make me question the value of sentiment analysis on my dataset. However, I will use the process to highlight the positive and then the negative, as this leads me to make some points later on!

Top Positive Words

*a good sign that manners and thank you are common on the site 🙂

Top Negative Words

I will pick out one negative word, and my word is frustration. The most frustrating thing for me on this site is this….

Frustrating and annoying, especially when browsing the answer forum on a mobile. I click it accidentally, and regularly! And I never find my way back to where I started after I click on the “Show More” icon. I wish it would go away. I would prefer something like the screenshot below for navigation and seeing more content. It is used in the search pages of SAP Search API (OneDX). I know I will return to an expected place using the search pages.

I have upvoted an idea for the SAP Community about navigation, and also this one. I know from reading the comments that this idea as a whole, about using Fiori, will not be accepted. I voted to show I do not like the current navigation. Hopefully “+ show more” will be no more soon.

Back to my own data set and something in the text of questions that triggered my interest.

I decided to use only SQL commands and a basic text analysis approach over the entire 41k questions. It seemed sentiment on so many different subjects/tags would not be of value, but as I say I only have a beginner’s knowledge of text analysis. So what could I find out about 41k worth of questions/answers with SQL and core text analysis? For core text analysis I used the “EXTRACTION_CORE” option, which “.. extracts entities of interest from unstructured text, such as people, organizations, or places mentioned” (source). The people part of that option sounded like a possible way to see who was answering questions. However, I begin with…

The Hour Of The Guru

I thought I would see how many questions open with “Hi Gurus,” in the data set. That seems a popular opening line for any question. I used a straightforward SQL statement to find out how popular it was. It seemed a better fit to look for the phrase “ Gurus,” as there are variations on the guru theme, such as “Hi Gurus,” and “Hi SAP Gurus,”.
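As an illustration of that kind of pattern search, here is a small self-contained sketch using Python’s built-in sqlite3 module. It is not the HANA statement I ran: the table name, column names and sample rows are all hypothetical, and SQLite stands in for HANA only so that the example runs anywhere.

```python
import sqlite3

# In-memory database with a hypothetical table layout and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (primary_tag TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO questions VALUES (?, ?)",
    [
        ("ABAP Development", "Hi Gurus, how do I debug a BADI?"),
        ("ABAP Development", "Hi SAP Gurus, my report dumps."),
        ("MM", "Dear Gurus, a PO release question."),
        ("SAPUI5", "Hello, my table binding is empty."),
    ],
)

def guru_counts(conn):
    """Count questions per primary tag whose text contains ' Gurus,'."""
    sql = """
        SELECT primary_tag, COUNT(*) AS n
        FROM questions
        WHERE body LIKE '% Gurus,%'
        GROUP BY primary_tag
        ORDER BY n DESC, primary_tag
        LIMIT 5
    """
    return conn.execute(sql).fetchall()

counts = guru_counts(conn)
print(counts)  # [('ABAP Development', 2), ('MM', 1)]
```

Note the weakness of a pattern like this: a question that starts with “Gurus,” (no leading space) or says “Guru” without the comma is not counted, so an exact-phrase search will tend to undercount.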



Original SQL Analysis

Top 5 Primary Tags saying Hi to the Gurus,

Not as many as I initially expected but probably an issue in how I am looking for the Gurus in the data.



If you are using Internet Explorer then the Original SQL Analysis above will be visible; in any other browser you can click the arrow to see the original query looking for Gurus.

I had a “light bulb” (or maybe a “Doh! that was an obvious thing to try”) moment prompted by Jürgen’s comment below.
I realised I should use the built-in HANA search engine, with what’s known as a fuzzy index, in my SQL query. It now seems a pretty obvious thing to have tried with HANA from the start: the SAP Search API was the source of my data, yet I ignored search for the analysis! Well, something for me to learn about and try right now, as my intention was always to learn new things about SAP alongside this data analysis.
So the query I came up with for Gurus is below.

From Jürgen’s comment, a query for “Hi Experts” on primary tags
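To show the general idea behind fuzzy matching (and only the idea), here is a rough Python sketch using the standard library’s difflib. This is not HANA’s fuzzy search and not the query I ran; the function name, the 0.8 threshold and the sample sentences are my own assumptions for illustration.

```python
import re
from difflib import SequenceMatcher

def fuzzy_contains(text, term, threshold=0.8):
    """Return True if any word in text is at least `threshold` similar to term."""
    term = term.lower()
    for word in re.findall(r"[a-z']+", text.lower()):
        if SequenceMatcher(None, word, term).ratio() >= threshold:
            return True
    return False

# Hypothetical opening lines, not taken from the data set.
samples = [
    "Hi Gurus, please help",
    "Hello experts",
]
matches = [s for s in samples if fuzzy_contains(s, "guru")]
print(matches)  # ['Hi Gurus, please help']
```

Unlike an exact pattern, a similarity score lets “Gurus” match a search for “guru” without listing every variation by hand, which is the reason a fuzzy search finds more greetings than my first attempt.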


I moved on to identify the day and hour when most questions are asked on site.

This follows on from my Data Geek entry, which tried to find the best date and time to blog on SCN. From that analysis (link below) I found it best to publish a blog on Wednesday at 13:00.


So what day and hour do we need the gurus most?

Top day for questions

Drilling down into the top day Wednesday

Hour with the most questions
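The drill-down in the charts above (top day first, then the top hour within that day) can be sketched in a few lines of Python. The timestamps are invented for the example, and the charts themselves were built in Lumira rather than with this code.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical question timestamps in UTC. Not real data.
timestamps = [
    "2017-03-01T10:15:00",
    "2017-03-01T10:45:00",
    "2017-03-08T10:05:00",
    "2017-03-02T14:30:00",
    "2017-03-04T09:00:00",
]

def busiest_day_and_hour(timestamps):
    """Find the weekday with most questions, then the top hour on that weekday."""
    parsed = [datetime.fromisoformat(t) for t in timestamps]
    by_day = Counter(DAYS[dt.weekday()] for dt in parsed)
    top_day, _ = by_day.most_common(1)[0]
    by_hour = Counter(dt.hour for dt in parsed if DAYS[dt.weekday()] == top_day)
    top_hour, _ = by_hour.most_common(1)[0]
    return top_day, top_hour

print(busiest_day_and_hour(timestamps))  # ('Wednesday', 10)
```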

So calling all gurus 😉 , we need you standing by your keyboards most on Wednesdays at 10am.

Err, when is 10am for you? Probably not the same 10am as for me, as the SAP Community is a worldwide site covering many timezones. We need to co-ordinate the gurus coming together at the right time 🙂

My dataset is in GMT/UTC, so work out what 10am is in your timezone, gurus, then boot up the laptop, log on and be ready to answer some questions :).
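If you would rather let code do the conversion, here is a small Python sketch. The two zones are just examples I picked because they do not use daylight saving time, so a fixed offset is enough; for zones with daylight saving you would need proper timezone data (for example Python’s zoneinfo module) instead.

```python
from datetime import datetime, timedelta, timezone

# 10:00 UTC on a Wednesday inside the collection window.
peak_utc = datetime(2017, 4, 26, 10, 0, tzinfo=timezone.utc)

# Fixed offsets for two example zones without daylight saving time.
OFFSETS = {
    "India (UTC+5:30)": timezone(timedelta(hours=5, minutes=30)),
    "Japan (UTC+9)": timezone(timedelta(hours=9)),
}

def local_peak(moment_utc, offsets):
    """Convert one UTC moment to local clock time for each named offset."""
    return {
        name: moment_utc.astimezone(tz).strftime("%H:%M")
        for name, tz in offsets.items()
    }

print(local_peak(peak_utc, OFFSETS))
# {'India (UTC+5:30)': '15:30', 'Japan (UTC+9)': '19:00'}
```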


Core Text Analysis

The final text analysis, as a reminder, is based on the “EXTRACTION_CORE” option, which “.. extracts entities of interest from unstructured text, such as people, organizations, or places mentioned” (source).

I was hoping to identify by full name the actual people who answer questions. As it turns out I failed to do this, as the analysis mostly picked out only first names and SAP product names, as shown below. However, for some of the names in the list I am sure I know the full name of the individual behind most of the entries. I am impressed by the contributions they make to this site. The SAP Community site wouldn’t be the same without them, and actually I didn’t need any analysis to know how much they contribute ;). I left the SAP product names in place, and I am sure you know some of the real full names below as well. This is what the HANA text analysis identified. Who is this BADI in amongst us though 🙂


Isn’t it Ironic

During the process of putting this blog together it actually triggered my first question. I used Lumira to analyse the data in my trial SAPCP by opening a database tunnel – technical info here. This does not work consistently at the moment, and I also can’t connect via Eclipse to my trial account. It has delayed me completing this blog.

For my question, I actually changed the way I comment on this site, and that is due to this data analysis. That change is thanks to Diego Lother and the way he contributes to the site (not sure if mentioning people works on this site but I will try this: @diego.lother). Diego’s full name (well, at least first name and surname) appeared in my text analysis, unlike most others. I was curious to see why, and to my knowledge (I have not gone through all of his content) Diego uses his full name every time he answers a question or contributes to the site. It seems kind of obvious to me now, as others use a first name and sometimes no name at all. I was initially hoping for AUTHOR_ID to uniquely identify users, but alas that was not the case. So I will try to use Diego’s method myself and use my surname on content/comments, well, apart from any future contributions to coffee corner! Not that I plan to repeat this analysis in the future, but it seems a good method to use when commenting on this site. I am curious though whether Diego manually types his name every time or uses some sort of shortcut-key signature method. If you do read this blog, Diego, can you answer that for me? Or maybe I should ask a very specific question to Diego on the Using answer/question forum 😛



Validation Of Data And Missing/Deleted Questions

As mentioned in the Data Quality section at the top, I was keen to validate the data set and ensure that at least what I had was valid. As I do not work for SAP, I relied on the search API and the actual SAP Community site to cross-check my data. It is a snapshot from 1st May 2017, and during the process of writing the blog (and the delay due to Lumira/SAPCP access issues!) I ran some random checks on the data. What I did notice was that missing/deleted questions are common, so I took some time to ensure a random sample was at least accurate and valid for a snapshot collection.

E.g. (maybe a bit technical, so bear with me) I took 1000 URLs and ran them through a unix command to check the HTTP return codes. Out of the 1000, 20 questions (2%) had gone missing. I still had valid data though, in a snapshot sense. For example, this URL is no longer found on this site.

However, it exists in the Google cache, so I am happy the data set is valid, as much as I can prove it to be valid 🙂

The 2% in my sample seemed high though for deleted/removed questions, although I do not know what the average for deleted questions, if any, should be on a forum site.
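The sample check described above can be sketched in Python as a stand-in for the unix command I used. The function names are my own, I am treating a 404 status as “missing”, and the demo uses a fake status lookup with placeholder URLs so that it runs without touching the network.

```python
from urllib import error, request

def http_status(url, timeout=10):
    """Return the HTTP status code for a HEAD request to url.

    Network failures (urllib.error.URLError) are not handled here.
    """
    req = request.Request(url, method="HEAD")
    try:
        with request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except error.HTTPError as exc:
        return exc.code

def missing_share(urls, fetch=http_status):
    """Fraction of urls whose status is 404 (treated here as 'missing')."""
    statuses = [fetch(url) for url in urls]
    missing = sum(1 for status in statuses if status == 404)
    return missing / len(urls)

# Demo with a fake lookup and placeholder URLs (no real requests are made).
def fake_fetch(url):
    return 404 if url.endswith("-gone") else 200

sample = [f"https://example.invalid/q/{i}" for i in range(980)]
sample += [f"https://example.invalid/q/{i}-gone" for i in range(20)]

share = missing_share(sample, fetch=fake_fetch)
print(f"{share:.0%}")  # 2%
```

To run it against real pages, call missing_share(urls) without the fetch argument and expect it to take a while for 1000 URLs.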



The phrase “Steady as she goes” seemed appropriate initially from the stats. However, I should clarify that I mean that in a free-from-fluctuation sense, not as a comment on the stability of the site. It is linked to the running average of questions/authors and the overall statistics remaining flat. I was also slightly disappointed to be missing a key metric: analysing answers to questions. While my intention is always to use SAP technology, not necessarily in connection with the SAP Community site, I do enjoy the process of collecting/analysing data. I will keep the data set for a while longer to see if I can improve my text analysis skills :0 or any other related SAP tech as well.

I have found, and continue to find, some of the site functionality frustrating as well. But I have a lot of time for the SAP Community and have taken out more than I will put back in, so I will be around for as long as I work in the SAP field.

Thanks for reading and I am left with one thing to do and that is to sign off 😉

Best Regards

Robert Russell




  1. Michael Appleby

    Hi Robert,

    Fascinating study you did from outside of SAP.  I need to go through it again with a fine tooth comb.  Perhaps I will also start answering with my full name as well.   😉

    I am a Moderator for SAPUI5 tag and will try to respond to your request after a review of the tag’s Questions.  I will also go back to see if I can figure out when the tag got changed from UI Development Toolkit for HTML5 to its present label of simply SAPUI5.  Not sure it will make much difference as I was editing those posts to correct the Primary Tag during that time.

    I will also be forced to reread your blog so I understand all the aspects you mentioned including a look at the code and techniques that you used.

    Now onto the other blog you wrote just before this one.


    Cheers, Mike (Moderator)
    SAP Technology RIG


    1. Robert Russell Post author

      Hi Mike,

      Thank you for the comment and I am glad you found the blog of interest. Also thanks for offering to check the SAPUI5 tag information, I would like to hear that feedback.


      Best Regards

      Robert Russell 



      1. Michael Appleby

        To follow up on the saga of SAPUI5, the name was changed in the December update to Semaphore Workbench (not really sure of the date, but I think it is usually around the third week of the month).  It was definitely in place by January 3rd, so it was the December update.

        In general those topics which have really active subject matter experts (SMEs) and Moderators do pretty well even with the deficiencies of the new community.  In some cases, like SAP Fiori, this is in spite of many subordinate tags (56?).  SAP Fiori has a lot of active SMEs and a lot of active Moderators which compensates for the dispersal of Questions across way too many tags.  For some of the other rather active Tag areas, like SAPUI5 and SAP Process Integration, the tags stand alone (i.e. no subordinate tags).  So when SMEs go looking for interesting questions to answer, they really only have to go to one Tag page or one Answers (single tag) page.  Both SAPUI5 and SAP Process Integration have strong SMEs and Moderators as I mentioned about Fiori.  I would bet that almost all the most active tags have similar support.

        Another item which I will mention which affects the new community and specific to communities for which I recruit Moderators.  I have mentored a lot of new moderators and continue to do so in these days.  In the past, Moderator’s duties were monitoring content (yes, all of it) that came through a Space, adjudicating Moderator Alerts, and teaching good practices to newbies.  There was more, of course, but that was what I focused on when helping new Mods.  These days, I ask them to answer questions.  Period, dot, end of story!  This is not all a Moderator should be doing, but with the state of the community, this is the greatest need.  New Moderators since go-live tend to be members of Product Management teams.  They are often interested in marketing their particular applications or services, so it can be a bit of an educational headache to steer them in the right direction, more so with blogs than questions.

        One last observation on a peculiarity of SAPUI5.  Before the name changed to SAPUI5, most folks would label their Questions with SAP Fiori.  Fiori applications are built on SAPUI5 and often Questions are technical in nature on extending, customizing those same applications and/or building “Fiori-like” applications.  Since I and several of the Moderators of SAPUI5 are also active in SAP Fiori, we were spending a lot of time fixing wrong tags.  So when the Tag was renamed appropriately, it was actually a time savings for those of us retagging.

        So getting back to looking over SAPUI5, there are at present 1623 Questions with 1239 Unanswered (not marked with Accepted Answers).  If you look at the Q & A stats for all questions, the definition of Unanswered is different.  There the Answered percentage comes in at slightly over 60%.  But that definition is that a Question has had an Answer posted, not necessarily accepted.  So we are Apples and Oranges, but I think that having 23% with correct answers identified is pretty good under the circumstances.  A quick survey (not particularly significant statistically) of the last 100 questions shows 32 without Answers.  So at 68% and including recent (including today which had two questions with one already Answered), I think it is well above average in responsiveness.  I don’t know if the percentage would be much higher if I went back to prior to the name change or given a longer time to generate responses.  I would guess it to be roughly the same for many of the same reasons.  Active members, active Mods, and with retagging, a single entry point for content on the subject.

        Again surveying the responders (last posts) and again without statistical rigor, I found 6 recognized (I know them all as such) SMEs who were the last responders for a total of 10 last responses.  Only two were Moderators, one was a mentor, and the other three were regular members.  Of the last responders for the 100 questions, an estimated 3/4 were not SAP Employees which I consider extraordinarily encouraging.  I really would not have expected the fraction to be that large.  I am quite pleased to see external members so active, though I expect many are searching for solutions.  Even so, that they are still coming means the efforts of the Moderators and SMEs are paying off in keeping engagement active.

        In closing, I think what you will find in the top activity tags is active SMEs and active Moderators doing everything they can to keep their areas of interest going.  It makes me proud that so many are doing so.  But does this scale for the entire community?  Maybe…  Or maybe we can build on what still active areas exist and once fixes are in place for some of the most egregious deficiencies, others won’t need quite the extra effort.

        Thanks for your efforts and it was educational as well as encouraging to me personally to have done the review your work requested.

        Cheers, Mike (Moderator)

        SAP Technology RIG

        1. Robert Russell Post author


          Hi Mike,

          Thank you for such a detailed investigation and reply. I must state I am so glad that it was a positive outcome from your research into the SAPUI5 questions. I knew by only focusing on the primary tagged questions I would miss out on secondary tags. I would not include all questions that had a secondary tag SAPUI5. My thinking at the time of my analysis of the data was that the primary tag would be the best bet to focus on and be of higher quality. I.e an OP would place the question correctly or a moderator would fix the primary tag first. I admit only a theory and I tried to mostly focus on the overall data for the analysis.  The overall stats to me showed a steady use of the site, no major fluctuations (with the 3 day averages helping me to state that).  The reason I chose SAPUI5 was mainly the other potential forums out there for javascript, i.e. another theory I had (seems I have many !) is that for example the ABAP tag would be only really relevant here on the SAP community. Javascript forums are more prevalent and in my own experience when I have Googled I have had success with UI5 questions being answered on other sites. I was surprised to see the stats for SAPUI5 increasing in the dataset as per the chart in blog. I had not expected that result to be honest, hence the question to see if anyone had knowledge of the SAPUI5 tag. I really appreciate the time you have taken and agree that all involved with the SAPUI5 tag should be proud of the effort and support on the site. It is positive to see many questions being answered and from all parts of the community.

          Best Regards


          And now more formally 😉 Robert Russell

          1. Michael Appleby

            Hi Robert,

            Like I said, it was educational and worth the time it took (for me at least).

            an OP would place the question correctly
            or a moderator would fix the primary tag first.

            Well, I am almost tempted to let Juergen respond to the first since he does an awful lot of retagging.  There are too many bloody tags period, dot, end of story.  With so many and so many poorly labeled tags, a new member often selects the first one that comes moderately close or contains something that contains the right words.  So we see people selecting SAP Fiori for Solution Manager where the question has nothing to do with Fiori, but is specifically for SolMan.  Or look for CRM and only see some of the subordinate tags since the tag Customer Relationship Management does not have an alias (same for SCM, and others).  At least MM, SD, PP and a few others managed to get the common terms along with the full name included in the label, but it was not consistent for all major function areas.

            Yes, it is something I continue to harp on about and why I pushed rather hard to get the SAPUI5 name in place of the former label.  It still took two months to achieve.

            Juergen is a stalwart retagger, but when we only have about 300 active moderators and 3000+ tags with more than half un-moderated, it is beyond our capabilities to keep up with even a significant portion of the average of 200 new questions per day.  I only do it incidentally and only about once a week.  Many of those I catch are Fiori applications that are really poorly labeled with common function names that strangely do not include the term Fiori.  If you were posting an SD question on Create Sales Order, guess what pops up as the tag to use?  A Fiori application tag.  Arrggh!

            Cheers, Mike

    2. Jelena Perfiljeva

       Perhaps I will also start answering with my full name as well.

      Please don’t! No disrespect to Diego but this has been discussed some time ago. Forum post is not an email and does not require greetings and signatures. Besides, this is also confusing as heck when you see one name in the profile and then another in the signature. And it’s not just short name (like Mike for Michael) but a completely different name.

  2. Jürgen L

    I don’t trust any statistic that I have not faked myself, nevertheless, nice work, and hats off for time and effort you spent for this.

    I am sure that there are more than 9 “Gurus” in MM, I mean as salutation….

    I just browsed through the MM questions of the last 19 days and found already 6, which would be a huge increase if all previous months had just 9 together. Google lists 1320 Gurus since the start of the new community. Unexpected low, felt it is more, by the way, “Experts” is used more than 8000 times.

    I also agree to Mike’s statement about SAPUI5, with the name change this tag was found while the questions were posted everywhere before. This should give the people responsible  for tags and its names some feed for thoughts.



    1. Robert Russell Post author

      Hi Jürgen,

      Thank you for the comment and it made me smile at the “statistic that I have not faked myself” comment. I surely am an amateur at analysis, I was surprised at those results as well.

      You also had me googling to spell your name out of respect as I did lose that in the twitter @scnblogs analysis blog I did previously.

      Thank you for taking the time to comment , I’ll check out the experts in the data set as well.

      Best Regards

      Robert Russell


      1. Robert Russell Post author

        Hi Jürgen,

        Thanks again for taking time to comment; you did prompt a lightbulb moment for me and I got to update the blog and learn something new. I ran HANA SQL fuzzy search queries for Guru and “Hi Experts”. Google is the master of search so I won’t argue with the results you had 🙂

        Best Regards

        Robert Russell.

  3. Jelena Perfiljeva

    Wow, what a treat! Excellent blog and the amount of effort put in – simply amazing!

    Thanks so much for posting this. Great use case for the SAP technology. Although I’d be curious whether this data could be used in Tableau or another competing product. 🙂

    I’m guessing sentiment analysis may need to be adjusted for sarcasm, which happens a lot on SCN. (Well, in my posts, anyway. 🙂 )

    As Mike correctly pointed out, some tags could be “spread out”. Even in the good old SD we have tags “SD Billing” and “SD Sales” for no good reason. These spaces should’ve been eliminated and absorbed into SD back in 2012 yet here we go, they are now tags.

    Activity by dates – do those dips correlate with the weekend? Not sure I can tell clearly from the 15 day interval..

    Thank you!

    1. Robert Russell Post author

      Hi Jelena,

      Thank you for taking the time to comment and good to read that you found the blog of interest.

      I did enjoy the process/challenge of trying to collect and present the data in an interesting way. I try to limit it to SAP tech so that way I get to learn new things along the way.

      Also from my experience Lumira was best with SAP Cloud Platform via the specific connection method I use. Using a database tunnel to the SAPCP with Power BI and Tableau (a few years back) was always slow and a disappointing experience. Maybe it’s a lot better with the latest versions of those products. (HANA with Power BI/Tableau may be a better experience with direct HANA connections but I do not know.)

      And as for the dips, they do reflect weekends. I agree that at the interval of the timeline charts it is not so clear. However, the “Top day for questions” chart shows the overall totals and does reflect the Saturday and Sunday dips.

      And finally for me to sign off… erm, no, I’m off to do some more Googling after reading your reply to Mike’s comment, to work out from those other discussions whether to sign off forum posts 😉. I note that you and Jürgen do not use greetings and signatures in these comments… However, I opened with a greeting, so I will close with one for now 🙂

      Best Regards



