I’m not really sure how I ended up at Innovation Weekend 2010 in Vegas.

Background and history 

For me it goes back to Hackers night 2009 in Vienna – when Craig Cmehil (our very own Simon Cowell) and I were talking late at night, not long before Craig, Tom Jung, Duane Chaos and I got kicked out of the MesseCentrum building. We were discussing the RIA hackers night, which I felt had turned into a late night learning session based on our own laptops. And whilst it was still a cool thing to do, I for one felt that it was missing something. Mostly, real hacking and a deliverable. And real purpose. And what I didn’t realise at that time was that Marilyn Pratt had already taken a load of hackers into the BPX slam and those guys were really hacking.

Craig and I kept in touch and whilst I don’t know how much I shaped the end product – he probably already had a plan, he usually does – the end result was the Innovation Weekend. An all night hackers night running from Sunday to Monday. I wasn’t able to attend in Berlin and as luck had it, the cheap tickets to Vegas were on Saturday, which meant that I was just about recovered from jetlag by Sunday at 1pm.

Now I have to take you back again 6 months to the SAP Inside Track in London that Darren Hague kindly organised. I met a PhD student called Sarah Otner, who was doing a PhD on the recognition system in the SAP Community Network. I loved her passion and interest in the system and she was really frustrated, because she needed data in order to do the mining she needed to do to write her thesis. SAP were blocking her desire to get the data out, either for technical or legal reasons. I don’t think that it was an orchestrated attack – but rather that it was the typical problems that you see in a large corporation.

I saw her in Berlin last week and she looked slightly downtrodden – no progress on data in the 6 months since I saw her at SIT London. I felt that for SAP there was no downside – free research and exposure for one of the most exciting community networks in the world.

Fast forward to Vegas

… and I found myself in the amazing Innovation Weekend masterminded by Marilyn Pratt and Craig Cmehil. Without those guys it would be nothing.

They had prepared 8 BPX focussed business cases and one of these was as follows:

8. “Physician: Heal Thyself”: Improving the SCN from within!

Posted by: Sarah Otner (http://www.sdn.sap.com/irj/scn/bc?u=oczhwhdeywc%3d)
GOAL: Improve the recognition systems of SCN by examining the historical data
– Does the SCN recognition system reward the right kinds of behaviors and contributions?
– What’s the real value of being a Top Contributor?
PROBLEM: Initial attempts to pull the source data already available on SCN into Excel failed as they only returned 10 lines and the same 10 lines upon each request (a problem when one Top Contributor table has 17,000 individuals).
CHALLENGE:
– A database of community members and their activity year-on-year for as many years as is available.
– Scrape the Contributor Recognition Program, the Top Contributors’ lists, the Topic Leaders’ lists, and the Mentors’ rosters into a format easily manipulable (by me! :-)) for analysis

What next?

Fellow SAP Mentor Thorsten Franz turned up at the table along with a number of other great individuals. And it became clear that this was a pretty easy technology challenge, provided we could get the data. So I set about getting the data whilst Thorsten, Arun, Laurent and others worked on analytics and presentation.

Mounting a DoS on SCN (aka making friends and influencing people)

So it turns out that the only way to get points data out of SCN is to read the RSS feeds on the contributor pages. Only the contributor page version is broken. The company version does work, however, and it is possible to see points – by Company, by Person, by Year, by Development Area. Can you see where I am headed?

So if you want to find out the contributors for Bluefinsolutions.com – for 2010 and for Mobile, you can go here:

feed://www.sdn.sap.com/irj/sdn/topcontributorsrss?periodid=y10&minimumpointscount=20&areaids=g&organization=bluefinsolutions.com

So all I needed to do was to write a script to get this for all companies, all years and all points areas. Simple, right? Here’s the bash script to do it:

for year in `cat ../year`; do for devel in `cat ../devel`; do for comp in `cat ../companynames`; do wget -O $year,$devel,$comp 'http://www.sdn.sap.com/irj/sdn/topcontributorsrss?periodid='$year'&minimumPointsCount=20&areaIds='$devel'&organization='$comp; done; done; done

Note that I downloaded the years, company names and development areas using the same techniques and put them in files. Note also that each output filename encodes the year, development area and company, so that it can become part of the CSV later. But… in my first version I forgot to escape the & by surrounding it in inverted commas. An unquoted & tells the shell to run the command in the background and carry on with the loop, so in doing so I opened up 2500 threads (I used the top 2500 companies). And SDN died for 3 hours.

After SDN came back up I fixed my script and parallelised it by year – so just 8 threads running. It took 10 hours to download all the data into some 180,000 XML files. Thankfully, we have lots of CPU power these days. So I wrote some scripts around that too.

First, files that are 409 bytes long don’t actually have any data in them. So we strip them out of the list of files to process as follows:

for a in `find -not -size 409c -print | sed '1d' | cut -c3-100`; do echo $a; done > ../filled

And then we strip the XML out, turn it into a flat file and append the filename that relates to it to each line.

for a in `cat ../filled`; do cat $a | sed '1,9d' | more | sed ':a;N;$!ba;s/<\/title>\n/,/g' | sed 's/<title>//' | sed ':a;N;$!ba;s/<\/link>\n/,/g' | sed 's/<link>//' | sed ':a;N;$!ba;s/<\/description>\n/,/g' | sed 's/<description>//' | sed ':a;N;$!ba;s/<\/pubDate>\n/,/g' | sed 's/<pubDate>//' | sed ':a;N;$!ba;s/<\/scn:rank>\n/,COMPANY/g' | sed 's/<scn:rank>//' | sed ':a;N;$!ba;s/<\/item>\n//g' | sed 's/<item>//' | sed ':a;N;$!ba;s/<\/rss>//g' | sed ':a;N;$!ba;s/<\/channel>//g' | sed '1d' | sed '$d' | sed "s/COMPANY/$a/"; done >> ../fillout.csv

This gives us a bunch of data that looks like this:

Jon Reed,https://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/com.sap.sdn.businesscard.sdnbusinesscard?u=glyawsx5bmi%3d,80,Tue, 10 Feb 2009 2:43:19,1,y08,P,jonerp.com

All we do then is convert the data and replace some years and development areas, and we’ve got a nice big CSV file with people by year, development area and company.

The rest is easy

The rest of our demo was easy – we uploaded the big CSV file into SAP’s cloud BI Service – http://bi.ondemand.com and used SAP BusinessObjects Explorer to look at the data. We also used the new beta BUDA dashboarding service which worked pretty well.
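The recoding step mentioned just before the upload, replacing year and development-area codes with readable values, might be sketched in shell roughly as below. This is an illustration rather than the script used on the night: the `recode` function name is made up, the rule that y08 means 2008 is an assumption based on the y10 = 2010 example in the feed URL, and g = Mobile is the only area code the post actually names, so any other codes pass through untouched.

```shell
#!/bin/sh
# Hypothetical helper: recode the trailing year and area fields of each row.
# Rows end in year,area,company (the appended filename), so we count fields
# from the right; the pubDate field itself contains a comma.
recode() {
  awk -F, -v OFS=, '
    BEGIN { area["g"] = "Mobile" }   # g = Mobile is from the post; add others as needed
    NF >= 3 {
      y = $(NF-2)
      a = $(NF-1)
      if (y ~ /^y[0-9][0-9]$/) $(NF-2) = "20" substr(y, 2)   # assumed: y08 means 2008
      if (a in area) $(NF-1) = area[a]
    }
    { print }
  '
}

# Example with a made-up row in the same shape as the flattened output
printf '%s\n' 'Jane Doe,http://example.invalid/card,80,Tue, 10 Feb 2009 2:43:19,1,y10,g,example.com' | recode
```

In practice this would be run over the whole file, for example `recode < ../fillout.csv > recoded.csv` (the output name here is only a placeholder). Counting fields from the right keeps the comma inside the date from shifting the columns.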

Conclusions

Well, we have done what we set out to achieve. We have 7 years of SCN data explorable by most of the metrics that Sarah was looking for. There are some things that were hard to do – especially scraping the master data from SCN business cards and that is a work in progress.

But what we’re hoping, and there’s a number of us that share this vision, is that as the SCN team start to realise the value of students analysing the data, we are able to break down the walls and make more detailed information available to people like Sarah who want to write PhD theses about the community.

Huge thanks to Marilyn and Craig for making it possible. To Kai and Mark and Chip and everyone from the SCN team who I inconvenienced. Sorry about that. 

20 Comments

  1. Gregor Wolf
    Hi John,

    thank you for sharing the story of innovation weekend. Great to see how a bit of wget + sed magic can make a difference. I hope with you that the SCN Team provides better ways for extracting data out of SCN. And yes, the business card is a mess :-(.

    I’ve talked to Sarah at SAP TechEd Berlin too and pushed her to add her case to the Innovation Weekend Business Cases. I’m glad that it worked out so well.

    You, Thorsten and the rest of your team did an amazing job!

    Best regards
    Gregor

    1. John Appleby
      Gregor,

      I’m sorry I didn’t credit you already and I’m sorry for all the other people I have probably not credited either. Thanks because without you, there would have been no business case 8.

      Regards,

      John

  2. Sarah Otner
    Thanks to everyone for giving my eleventh-hour submission a whacking great effort — and to Gregor W. for giving me the push to submit it! I can’t wait to see what the results look like, and to have a first pass at some data analysis.

    I was particularly touched by John’s evaluation of my drive and my frustrations: spot on. Moreover, I would like to underscore that SAP did not purposefully “torpedo” my research; some of the difficulties to date have arisen due to the different priorities and timelines that function in academic research versus *large* industry. However, after TechEd Berlin, I am more excited than ever about making SCN my thesis case. (<– Geek Girl alert!)

    So, “watch this space” for news about my research, and keep your fingers crossed for progress and juicy results! Thanks again,

    ~Sarah~

    1. Zal Parchem
      Hello Sarah – what is the objective of your thesis?  I have started a slice-and-dice on some info in SAP B1, and was wondering if you’re looking into classifications by application, type of blog written, author being SAP or not, etc?  Any quick points on your outlook?  Regards – Zal
  3. Ivan Femia
    I like your project, well organized data on SCN user easy to browse!!!

    Is it possible to see the final result on BI.ondemand.com?

    Congratulations guys

    Regards,
    Ivan

      1. Marilyn Pratt
        Well I don’t know much about the privacy laws (I imagine Anton raises some good points here but I’m not the proper one to address them) but as far as the technology is concerned I just connected with Ilan Frank who is product manager for the BI on demand piece and I’m sure (as long as it is legal) he would love to help you facilitate the technical part of what you were trying to do John during the Innovation Weekend showcasing SAP technologies and hacking away to try to create useful outcomes.
        Should be easy to connect the 2 of you here on the exhibit floor/clubhouse of SAP TechEd.  Let me know.
  4. Anton Wenzelhuemer
    Although I like the technical challenge being taken on here and although I think Sarah should be supported by SAP, I think what you proudly present here is pretty illegal in a lot of countries with respect to privacy regulations. SCN should have measures in place to prevent such things.
    Moreover I wonder why a hack to bring down SCN easily is openly blogged on SCN.
    But maybe you just had too little sleep to consider the consequences because of the 30 hour marathon of the great Innovation Weekend.

    anton

    1. John Appleby
      Hi Anton,

      Would love to hear some specific feedback on this because this is all publicly available information that we aggregated. No login to SCN is required.

      Regards,

      John

      1. John Appleby
        Sorry I forgot to add that this isn’t the code that I wrote that brought down SCN. The code above will run just fine and pose very little traffic.
        1. Jim Spath
          The “ho hum” is more about data mining.  Since SCN exposes nearly everything to spiders such as Google, everything John and team aggregated has already been seen by the search engine Cos.  That’s why I’ve heard people say they use Google to find things in SDN instead of using the internal (TRex?) search engine.
          As to any controversy about showing how easy it is to get these data, if it’s illegal in some countries it’s probably sufficient to say so.  Not that I’m a lawyer either.
          Jim
      2. Anton Wenzelhuemer
        to my knowledge, in my country for instance, you need an official permit (datenverarbeitungsnummer; sounds SAPish doesn’t it?) to store personal data on some storage and to aggregate and process such personal data, no matter what sources the data comes from. afaik, this applies even to transcribing the public phone book to some excel file.

        anyway, I am neither the police nor a lawyer, I just feel slightly uncomfortable with this.

        apart from that, have fun at TECHED10 LV.

      3. Sarah Otner
        For clarity: The tables of data which I asked for help in aggregating are available on SCN.  I asked SAP for help with accessing the data in a more manipulable format, any meta-data available, and any relevant organizational variables not necessarily already published.

        Therefore, what John & team facilitated was the primary level request, and hopefully SAP will be able to help with the second request.

        Thanks to all!

    2. Sarah Otner
      Anton – you are correct that data protection statutes govern the collection and use of such information.  In addition, as a researcher, I subscribe to a code of ethics that similarly guides my conduct.  It is my firm conviction that SAP have only delayed giving my project the green light as they subject it to rigorous scrutiny and compliance with their strict privacy regulations — which is why I have not yet seen a bit of data outside the published tables, and why any results I would seek to publish would follow similar regulations.

      I hope that together we soon will find a solution!  It will be my pleasure to deliver valuable results to this fine community.

      ~Sarah~

  5. Community User
    OK, as much as I really enjoy giving John a hard time about it – truth be told he was not the reason for SCN going down the other night; simply good timing is all 🙂

    As for SAP officially supporting Sarah, I’m sure Sarah will attest to the many wonderful people she has met along the way and with a little bit of luck she will be overloaded with data here thanks to some meetings and discussions that took place in Berlin – she won’t be able to publish all of the data she will hopefully be getting soon but she will be able to hopefully move forward very quickly.

    As for the legality of it, nothing of this data affects data privacy as the data does not fall under the “personal data” in that sense and John took advantage of something we added to SCN back in March – RSS Baby! RSS! – we could go down into all of the legal debates about it but the paragraphs I have from legal experts kind of trump everything for this particular set of data. What Sarah needs on the 2nd level is where things are tricky and taking so long to figure out (because we are not legal experts and having to keep asking questions and getting answers) and part of the reason this first level of data is considered OK is because of the nature of how you register and the “display name” you choose to give when doing so (as I have been informed).

    Sorry to spoil the fun on the “I hacked and brought SCN down” and “we stole SCN data” but didn’t want you all to lose too much more sleep over all this 😉

    1. Marilyn Pratt
      Oh man, trust Craig to do a reality check on #geekbravado LOL.
      What I think we all witnessed was incredible passion and intelligence (kudos John) being implemented real-time to help the community (and our management by extension) and the newest member of our SCN family, Sarah.  Thanks Craig for the intro to another “Jersey Girl” and I hope we get to work together quickly.
    2. John Appleby
      As the story goes, the myth of why SCN went down or whether the data was legal was just adding to the fun of the weekend.

      The real point is that we seem to have moved Sarah on a few steps down her journey of getting all the real data she wants.

      The rest of that data should be scrapable from the SCN business cards. Do I see a part 2 coming on?

    3. Sarah Otner
      As usual, Craig is right. 😮

      I absolutely will agree that SAP is people-powered, and would like to borrow Marilyn’s “family” language. I always enjoy myself AND learn something when interacting with SCN members, which helps me to love my job.

      A great big “thank you” to everyone who has helped me – and who will help me; I am lucky to have had the “Otto Gold experience”!  And now, back to the books…

    4. Otto Gold
      Thank you, people, a ton!
      Sarah is so nice and I really wanted to help her, but didn´t know how to do what you did (and am sure I wouldn´t make it alone). I am sure it was a great challenge and you made it!
      I hope we will see some result from her side and will get something insightful about our community/scn which could help us with bettering this place:))
      All the best,
      Otto
  6. Laurent BRIDE
    Thanks John for sharing your experience on the Innovation Weekend, it was indeed a great moment!

    To display the data, we used a prototype called Exploration Views (project codename BUDA) that leverages the SAP BusinessObjects Explorer engine, CVOM graphic libraries and BIOnDemand platform.

    It should be available on the http://www.sdn.sap.com/irj/boc/innovation-center website the week of November 15th.

    All the best!
    Laurent

