Using the SAP Cloud Platform (SAPCP) I have been collecting data about the SAP Community, and this blog covers some of my analysis of that data. Previously I covered a way to collect data via RSS feeds from blogs.sap.com, but due to limitations of the RSS feeds the data was not consistent enough for further analysis. Using the SAP Search API (the same API is the backend engine that drives the OneDX search page and the search on this site) I have collected data from Answers.sap.com from 10th October 2016 to 30th April 2017. I am happy with the quality of the data, but first some background about the data set I will use. The base-level technical details of how I collected the data can be found in my SAP Cloud Platform blog, https://blogs.sap.com/?p=483124
I have collected questions since the beginning (10th Oct 2016) of the new SAP Community via the SAPCP and SAP Search API, and extracted items such as the primary tags used in the questions. This does lead to, let’s say, user classification issues, i.e. do you pick the right tag when asking a question? I am sure that you do. Although I am also sure a moderator’s job on this site must include retagging questions on a daily basis! And as you are reading this site I will assume you are familiar with concepts such as tags and how this site works (or do we know 🙂 how it works?).
I collected the data on a bank holiday Monday at the start of May here in the UK. It was a one-time loop over the dates (Oct 2016 – Apr 2017) and therefore it is now out of date. There is also obvious potential for some questions to have been updated, retagged or deleted since then. I do not work for SAP, and there are certain aspects of the Answers.sap.com site that I can’t collect or access. For example, I have the questions and the text of any answers, but I do not know whether answers to these questions are being accepted.
All charts were created with Lumira, and I use out-of-the-box features where appropriate. Lumira is accessing my data set in the SAP Cloud Platform. I have googled some ideas on how to present this data, but data analysis is not my day job. My main goal was to use SAP technology and to challenge myself to present and analyse the data in an interesting way. Let me know if I succeeded or not with that objective! I’ll cover some of my conclusions at the end of the analysis. If you are interested in seeing the raw data I have collected then let me know and I’ll work out a way of sending it on to you. (Although, as mentioned, I do cover the technical details in my other blog.)
So with those details in mind, and noting again 🙂 that the data is an unofficial snapshot in time of the status of this site, I’ll get to my point and … continue.
Total Questions and Authors in the Data Set 10th Oct 2016 – 30th April 2017
In total I collected over 41k questions and 23k Authors. The Author data is based on the unique AUTHOR_ID which links to the individual profile page. If you are anything like me then you may have multiple SAP IDs which may or may not be registered with the SAP Community. I chose to count AUTHOR_IDs only, so one individual may be linked to multiple AUTHOR_IDs, and one AUTHOR_ID may cover multiple people who share a logon ID :). Does that make sense? If not, let me give an example and say hello to…
The Arun Kumars
Arun Kumar is the most popular Author name in the dataset and is linked to multiple AUTHOR_IDs. Some of those IDs could belong to the same person, or not. I have used the AUTHOR_ID as the measure in all other charts (where appropriate). The chart titles still say Author even though the measure is AUTHOR_ID, as that sounded the better option at the time of analysis.
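To make the name versus ID point concrete, here is a minimal Python sketch of how shared names could be spotted. It is not the query I ran on the SAPCP; the rows, the ID values and the function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows of (author_id, author_name); the IDs are invented.
rows = [
    ("id-001", "Arun Kumar"),
    ("id-002", "Arun Kumar"),
    ("id-002", "Arun Kumar"),      # same ID asking a second question
    ("id-003", "Example Author"),
]

def ids_per_name(rows):
    """Map each display name to the set of distinct author IDs using it."""
    ids = defaultdict(set)
    for author_id, name in rows:
        ids[name].add(author_id)
    return ids

def shared_names(rows):
    """Return (name, id_count) for names linked to more than one ID, most IDs first."""
    counts = {name: len(id_set) for name, id_set in ids_per_name(rows).items()}
    shared = [(name, n) for name, n in counts.items() if n > 1]
    return sorted(shared, key=lambda item: item[1], reverse=True)

print(shared_names(rows))
```

Counting distinct IDs rather than names is what keeps two different people who share a name apart, at the cost of counting one person with two IDs twice.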
Moving onto primary tags which are part of my data set.
Top Primary Tags For Questions
ABAP Development is the top primary tag for questions
Talking of tags, I had the idea to focus on the “Using SAP.com” primary tag, which covers questions about this site since its launch. If you have an issue using this site then you can search for similar problems. If you can’t find an answer you can ask a question over here: Using SAP.com Questions
I was curious to find out how many questions there would be for this tag and see if I could highlight any trends. This is what I found….
“Using SAP.com” Primary Tag Stats
A quick view of the overall totals
Overview of the timeline Oct 2016 – Apr 2017
Now the trend for questions is going down from the wild west days at the start! What had just happened to this site on October 10th? 🙂 Maybe some retagging or moving of questions had taken place as well.
I had some thoughts about this:
a) It is a good sign that there are no major peaks after the initial start of the new community.
b) Are questions being answered, and are there accepted answers to the questions? Unfortunately, as mentioned, I do not know that detail from my data set. There is no data that I can find that indicates questions are closed with an acceptable resolution (or even just closed/deleted for any other reason).
However, closing/accepting answers to questions is an issue that generated quite a bit of talk at the coffee corner of this site.
Link to the coffee corner. https://answers.sap.com/content/kbentry/list.html
Link to the conversation about accepting answers to questions
Steve Rumsby started that particular conversation and at the time of analysis it was going for 136 days. The discussion was created on Oct 18th 2016 and 3rd March 2017 was the last update.
c) Are people still using the site? This was another thought prompted by various statements I have read around here and elsewhere. I knew I had the overall data of AUTHOR_IDs (with the risk, as mentioned for that metric, of duplicates pointing to the same user), so here is the author data on a timeline. The chart below covers all tags (and is not limited to the Using SAP.com tag).
The data set I have indicates a stable view of the site use, with no major rise or fall in AUTHOR_IDs. There is a slide towards the new year and then maybe a very slight increase, going by the running average line in the chart. I took the 3-day default for the running average in Lumira.
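For anyone who wants to reproduce the running average outside Lumira, here is a small Python sketch. I am assuming a trailing window (each point is the mean of that day and the two before it); I have not checked exactly how Lumira positions its window, and the daily counts are invented.

```python
def running_average(values, window=3):
    """Trailing mean over the last `window` values (fewer at the start)."""
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages

# Hypothetical distinct-author counts for five consecutive days.
daily_authors = [120, 90, 150, 60, 30]
print(running_average(daily_authors))
```

A short window like this smooths out the weekday/weekend swings a little without hiding a longer trend.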
A look at the overall questions by date and it indicates a very similar picture.
One option for user migration may be the availability of another forum to ask questions. I was interested in one of the top primary tags SAPUI5 for authors over the time period.
I did notice a steady increase since the start of the year (maybe new year resolutions to learn UI5 ;)) so a good sign that more people are using the SAPUI5 tag but I make no comments about the quality of these questions. If you have direct involvement in UI5 primary tag I would be interested in your thoughts on the activity the above chart indicates and availability of other forums.
Back to Using SAP.com Tag
In a similar way to looking for the longest lasting discussion on coffee corner, I asked which question generated the most characters on this tag. (Say what does that mean 🙂 ?) Well, I used a simple character count to find the top question on Using SAP.com.
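The character count idea is simple enough to sketch in a few lines of Python. The field names and the sample questions are hypothetical, and I am assuming here that only the text stored against each question is counted.

```python
# Hypothetical questions; "text" stands in for whatever text is stored per question.
questions = [
    {"title": "Short question", "text": "Where is my profile?"},
    {"title": "Long question", "text": "x" * 500},
    {"title": "Medium question", "text": "y" * 120},
]

def longest_question(questions):
    """Return the question with the highest character count in its text."""
    return max(questions, key=lambda q: len(q["text"]))

top = longest_question(questions)
print(top["title"], len(top["text"]))
```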
Link to the top question in the screenshot.
The question was “Has The New SAP Community Killed The Community?”, well I am not sure in every aspect but I am still using the SAP Community and I will continue to use the site.
However that triggered a thought to use HANA’s Sentiment Analysis on the Using SAP.com forum. I have not used Sentiment Analysis before, so the first thing was to check out an openSAP course.
Overall table of sentiment for “Using SAP.com” tag is below. A link to the openSAP course on the subject of “Sentiment Analysis”
While checking the analysis I noticed that my dataset comes from a question/answer forum whereas the SAP analysis is “Voice of a customer”, so I am not sure the use case is an exact match. As I said at the start, I googled 🙂 and found some statements that “sentiment on forums works differently”. That is only one link, but I found other statements matching it, and another that compared many different sentiment algorithms for forum sites. I don’t believe everything I read on the internet, but it did make me question the value of the sentiment analysis on my dataset. However I will use the process to highlight the positive and then the negative, as this leads me to make some points later on!
Top Positive Words
*a good sign that manners and thank you are common on the site 🙂
Top Negative Words
I will pick out one negative word, and my word is frustration. The most frustrating thing for me on this site is this….
Frustrating and annoying, especially when browsing the answer forum on a mobile. I click it accidentally and regularly! And I never find my way back to where I started after I click on the “Show More” icon. I wish it would go away. I would prefer something like the screenshot below for navigation and seeing more content. It is used in the search pages of the SAP Search API (OneDX). I know I will return to an expected place using the search pages.
I have up voted an idea on Ideas.sap.com for the SAP Community about navigation, https://ideas.sap.com/D42410, and also this one, https://ideas.sap.com/D40252. I know from reading the comment that this idea as a whole about using Fiori will not be accepted. I voted to show I do not like the current navigation. Hopefully “+ show more” will be no more soon.
Back to my own data set and something in the text of questions that triggered my interest.
I decided to use only SQL commands and a base text analysis approach over the entire 41k questions. It seemed that sentiment on so many different subjects/tags would not be of value, but as I say I only have a beginner’s knowledge of text analysis. So what could I find out about 41k worth of questions/answers with SQL and core text analysis? For core text analysis I used the “EXTRACTION_CORE” option which “.. extracts entities of interest from unstructured text, such as people, organizations, or places mentioned” (source). The people part of that option sounded like a possible way to see who was answering questions. However I begin with…
The Hour Of The Guru
I thought I would see how many questions open with a “Hi Gurus,” in the data set. That seems a popular opening line to any question. I used a straightforward SQL statement to try and find out how popular it was. It seemed a better fit to look for the phrase ” Gurus,” as there are variations to the theme of guru, such as “Hi Gurus,” and “Hi SAP Gurus,”.
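As a rough illustration of the phrase search described above, here is a Python sketch that counts questions containing “ Gurus,” per primary tag. My actual analysis was done with SQL; this version, including the case-insensitive match, the field names and the sample rows, is an assumption for illustration only.

```python
from collections import Counter

# Hypothetical sample questions with a primary tag and the question text.
questions = [
    {"primary_tag": "ABAP Development", "text": "Hi Gurus, how do I debug this BADI?"},
    {"primary_tag": "ABAP Development", "text": "Hi SAP Gurus, need help with ALV."},
    {"primary_tag": "SAPUI5", "text": "Dear Gurus, my table binding fails."},
    {"primary_tag": "SAPUI5", "text": "Hello experts, routing question."},
]

def count_phrase_by_tag(questions, phrase=" gurus,"):
    """Count questions whose text contains `phrase`, grouped by primary tag."""
    needle = phrase.lower()
    counts = Counter()
    for question in questions:
        if needle in question["text"].lower():   # case-insensitive by assumption
            counts[question["primary_tag"]] += 1
    return counts

# Top 5 primary tags saying hi to the Gurus.
print(count_phrase_by_tag(questions).most_common(5))
```

Keeping the leading space and trailing comma in the phrase is what lets one search catch “Hi Gurus,”, “Hi SAP Gurus,” and similar openings, though it will miss greetings that end with a full stop or no punctuation at all.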
Original SQL Analysis
Top 5 Primary Tags saying Hi to the Gurus,
Not as many as I initially expected but probably an issue in how I am looking for the Gurus in the data.
If you are using Internet Explorer then the above Original SQL Analysis will be visible; in any other browser you can click the arrow to see the original query looking for Gurus.
I had a “light bulb” or maybe a “Doh! that was obvious thing to try” moment prompted by Jürgen’s comment below.
I realised I should use the built-in HANA search engine, with what’s known as a Fuzzy index, in my SQL query. It now seems a pretty obvious thing to have tried with HANA from the start, given that the SAP Search API was the source of my data, but I ignored that for the analysis! Well, it is something for me to learn about and try right now, as my intention was always to learn new things about SAP alongside this data analysis.
So the query I came up with for the Gurus is below
From Jürgen’s comment, a query for “Hi Experts” on Primary Tags
I moved on to identify the day and hour when most questions are asked on the Answers.sap.com site.
This follows on from my Data Geek entry, where I tried to find the best date and time to blog on SCN. From that analysis (link below) I found it best to publish a blog on Wednesday at 13:00
So what day and hour do we need the gurus most?
Top day for questions
Drilling down into the top day Wednesday
Hour with the most questions
So calling all gurus 😉 , we need you standing by your keyboards most on Wednesdays at 10am.
Err, when is 10am for you? Probably not the same 10am as me, i.e. the SAP Community is a worldwide site covering many timezones. We need to co-ordinate the gurus coming together at the right time 🙂
My dataset is in GMT/UTC, so work out what 10am UTC is in your timezone, gurus, then boot up the laptop, logon and be ready to answer some questions :).
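If you want to work out your own 10am, here is a small Python sketch. It finds the busiest weekday/hour slot from a list of UTC timestamps and converts a UTC hour to a fixed offset. The timestamps are invented, the function names are mine, and the fixed offsets ignore daylight saving, so treat it as a rough guide only.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def busiest_slot(timestamps):
    """Return the (weekday name, hour) pair with the most questions."""
    slots = Counter((WEEKDAYS[ts.weekday()], ts.hour) for ts in timestamps)
    return slots.most_common(1)[0][0]

def utc_hour_to_local(hour_utc, offset_hours):
    """Convert an hour on the UTC clock to a local HH:MM at a fixed offset."""
    local_zone = timezone(timedelta(hours=offset_hours))
    moment = datetime(2017, 4, 26, hour_utc, tzinfo=timezone.utc)  # a Wednesday
    return moment.astimezone(local_zone).strftime("%H:%M")

# Hypothetical question timestamps, all in UTC.
timestamps = [
    datetime(2017, 4, 26, 10, 5, tzinfo=timezone.utc),
    datetime(2017, 4, 26, 10, 40, tzinfo=timezone.utc),
    datetime(2017, 4, 19, 10, 15, tzinfo=timezone.utc),
    datetime(2017, 4, 25, 9, 30, tzinfo=timezone.utc),
]

print(busiest_slot(timestamps))
print(utc_hour_to_local(10, 5.5))   # e.g. an offset of UTC+05:30
print(utc_hour_to_local(10, -5))    # e.g. an offset of UTC-05:00
```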
Core Text Analysis
The final text analysis as a reminder is based on “EXTRACTION_CORE” option which “.. extracts entities of interest from unstructured text, such as people, organizations, or places mentioned” source
I was hoping to identify, by full name, the actual people who answer questions. As it turns out I failed to do this, as the analysis picked out mostly first names and SAP product names, as shown below. However, for some of the names in the list I am sure I know the full name of the individual behind most of the entries. I am impressed by the contributions they make to this site. The SAP Community site wouldn’t be the same without them, and actually I didn’t need any analysis to know how much they contribute ;). I left the SAP product names in place, and I am sure you know some of the real full names below as well. This is what the HANA text analysis identified. Who is this BADI in amongst us though 🙂
Isn’t it Ironic
During the process of putting this blog together it actually triggered my first question on Answers.sap.com. I used Lumira to analyse the data in my trial SAPCP via opening a database tunnel – technical info here. This does not work consistently at the moment and also I can’t connect via Eclipse to my trial account. It has delayed me completing this blog.
For my question, I have actually changed the way I comment on this site, and that is due to this data analysis. That change is thanks to Diego Lother and the way he contributes to the site (not sure if mentioning people works on this site but I will try this @diego.lother ). Diego’s full name (well, at least first name and surname) appeared in my text analysis process, and that is different to others. I was curious to see why, and to my knowledge (I have not gone through all of his content) Diego uses his full name every time he answers a question or contributes to the site. It seems kind of obvious to me now, as others use a first name and sometimes no name at all. I was initially hoping for AUTHOR_ID to uniquely identify users but alas that was not the case. However I will try to use Diego’s method myself and use my surname on content/comments, well, apart from any future contributions to coffee corner! Not that I propose to try this analysis again in the future, but it seems a good method to use when commenting on this site. I am curious though whether Diego manually types his name every time or uses some sort of shortcut key signature method. If you do read this blog Diego, can you answer that for me? Or maybe I should ask a very specific question to Diego on the Using SAP.com answer/question forum 😛
Validation Of Data And Missing/Deleted Questions
As mentioned in the Data Quality section at the top, I was keen to validate the data set and ensure that at least what I had was valid. As I do not work for SAP I relied on the search API and the actual SAP Community site to cross-check my data. It is a snapshot from the 1st May 2017, and during the process of writing the blog (and the delay due to Lumira/SAPCP access issues!) I ran some random checks on the data. What I did notice was that missing/deleted questions are common, so I took some time to ensure a random sample was at least accurate and valid for a snapshot collection.
E.g. (maybe a bit technical, so bear with me) I took 1000 URLs and ran them through a unix command to check the HTTP return codes. Out of the 1000, 20 questions (2%) had gone missing. I still had valid data though, in a snapshot sense. For example this URL is no longer found on this site.
However it exists in google cache so I am happy the data set is valid as much as I can prove it to be valid 🙂
The 2% from my sample seemed high though for deleted/removed questions. Although I do not know what the average for deleted questions, if there is one, should be on a forum site.
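For anyone wanting to repeat the return code check without a unix command line, here is a Python sketch of the same idea. It is not the command I ran. The function names are mine, I am assuming a missing question shows up as a 404, and the status list at the end is made up to mirror my 20 out of 1000 result rather than being real output.

```python
import urllib.error
import urllib.request

def http_status(url, timeout=10):
    """Return the HTTP status code for `url`, or None if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code      # e.g. 404 for a question that has been removed
    except (urllib.error.URLError, OSError):
        return None          # DNS failure, timeout and similar problems

def missing_rate(statuses):
    """Share of checked URLs that came back as 404 Not Found."""
    if not statuses:
        return 0.0
    return sum(1 for status in statuses if status == 404) / len(statuses)

# Illustrative statuses only: 980 found and 20 missing, as in my sample.
statuses = [200] * 980 + [404] * 20
print(f"{missing_rate(statuses):.1%}")
```

In real use you would build `statuses` with something like `[http_status(u) for u in urls]`, ideally with a pause between requests so as not to hammer the site.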
The phrase “Steady as she goes” seemed appropriate initially from the stats. However I would clarify that I mean it in a free-from-fluctuation sense and not as a comment on the stability of the site. That is linked to the running average of questions/authors and the overall statistics remaining flat. Also I was slightly disappointed to be missing a key metric for analysing answers to questions. While my intention is always to use SAP technology, and not necessarily anything linked to the SAP Community site, I do enjoy the process of collecting/analysing data. I will keep the data set for a while longer to see if I can improve my text analysis skills :0 or try any other related SAP tech as well.
I have found, and continue to find, some of the site functionality frustrating as well. I have a lot of time for the SAP Community and have taken out more than I will put back in, so I will be around for as long as I work in the SAP field.
Thanks for reading and I am left with one thing to do and that is to sign off 😉