
On March 22, at about 15:00 GMT, Google was finally allowed to enter, crawl and index the New SCN. This marks another step forward in the rollout of the New SCN and, given Google’s importance as a source of traffic for the site, something my colleagues had been waiting for ever since the New SCN launched ten days earlier.

For my part, I have to admit that I was very excited because it’s not every day you get to open the floodgates on a site with such a massive amount of quality content and witness its re-indexing by Google: it meant validating (hopefully) the months of planning that went into ensuring that SCN would be back on its feet quickly, with minimal interruption to the search experience the community enjoyed before migration.

Here’s a rough scope of what this migration included and the dual challenge faced for SEO:

  • Migrating three sites (and three systems) to a single and new URL:
    forums.sdn.sap.com + weblogs.sdn.sap.com + www.sdn.sap.com –> scn.sap.com
  • Migrating three sites worth of content into a single, unified one:
    • 2 million+ discussion (forum) threads and 20 million+ messages (replies)
    • 25,000+ blogs
    • 25,000+ library assets (documents and videos)
    • 1,100+ webpages
    • New URL structure
    • Millions of redirects activated (millions)

I’m not going to say that things went smoothly, but my point is that the numbers are huge and so is the task of getting all those millions of SCN search results updated to their new URLs, which also happen to be on a brand-new sub-domain that Google had never crawled. Updating the search results is organic, and all we can really do is suggest to Google where and how to crawl the site: something that I have likened to waltzing with an elephant over the past few months.

Why not open sooner?

We had good reason to wait. Google crawlers add a considerable amount of load to any website: there were well-founded fears that this additional load from external search engine crawlers could bring down the site. (Crawlers, bots and spiders can represent anywhere from 30% to 50% of site traffic.) This fact was simply undeniable and we had to accept it despite our eagerness to open up to Google and other search engine crawlers as soon as possible. With a little negotiation and creative thinking by our IT team, we finally flipped the switch last Thursday*.

ℹ My IT colleague Elad Rosenheim posted a nice blog on the topic of load if you want more insight: Load Testing as Science and Art

(*I can’t say the timing was great: I got notice just as I was preparing dinner for my screaming 16-month-old and at the start of my weekend, so I had to juggle feeding, cleaning and working on the computer to submit the relevant information to Google and other search engines, all at the same time 😥 )

 

So we opened the doors and then what?

In short, this:
     Pages Crawled by Google - New SCN.jpg

As you can see, Google started crawling immediately after we allowed it and submitted our new sitemap. Within hours there were already thousands of pages indexed, complete with their previews visible from the search results. After three days, most of our content has been crawled and indexed, and even new content such as discussions appears within minutes of being posted:
     Sample Recent Discussion Post - New SCN.jpg
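
For readers who have not seen one, a sitemap is just an XML list of the URLs you would like a search engine to crawl. Here is a minimal sketch in Python of what generating one can look like. It is not the actual SCN sitemap or tooling: the function name `build_sitemap` and the two thread URLs are hypothetical and only there for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string listing the given absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical example URLs (assumption): not real SCN threads.
example_urls = [
    "http://scn.sap.com/thread/1",
    "http://scn.sap.com/thread/2",
]
sitemap_xml = build_sitemap(example_urls)
print(sitemap_xml)
```

A real sitemap for a site this size would be split across many files and tied together with a sitemap index, but the basic shape of each entry is the same.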

Our space overview pages for popular topics like “SAP Mobile” and “ABAP” are ranking on the first page of results too, which is really positive given that we were offline for ten days and migrated these pages to a new sub-domain!

The redirects also worked in our favor: before opening the site, I could see that Google was already recording the redirections and building its own starting points ahead of the great crawl. Every day the number of URLs listed–but not indexed–grew: 135 on day one, 160k two days later, 400k by the end of the week and 1 million a week after launch.
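
To give an idea of what checking a redirect involves, here is a rough Python sketch that classifies one observed response for an old URL. It is only an illustration, not the process we used: the function name `classify_redirect`, the status labels and the sample paths are hypothetical, and it assumes the redirects are permanent ones (HTTP 301 or 308), which is a common choice for migrations but an assumption here.

```python
from urllib.parse import urlsplit

# The three old hosts and the new host named in this post.
OLD_HOSTS = {"forums.sdn.sap.com", "weblogs.sdn.sap.com", "www.sdn.sap.com"}
NEW_HOST = "scn.sap.com"

# Assumption: permanent redirects are expected for a migration.
PERMANENT_STATUSES = (301, 308)


def classify_redirect(old_url, status, location):
    """Classify one observed HTTP response (status + Location header)."""
    if urlsplit(old_url).hostname not in OLD_HOSTS:
        return "not-an-old-url"
    if status not in PERMANENT_STATUSES:
        return "not-permanent"
    if location is None or urlsplit(location).hostname != NEW_HOST:
        return "wrong-target"
    return "ok"


# Hypothetical observations (assumption): paths are made up.
print(classify_redirect("http://forums.sdn.sap.com/old-path", 301,
                        "http://scn.sap.com/thread/1"))
print(classify_redirect("http://weblogs.sdn.sap.com/old-path", 302,
                        "http://scn.sap.com/new-path"))
```

Fetching the status and `Location` header is left out on purpose, so the sketch stays runnable without touching the network.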

See for Yourself (Search Tips)

Have a look at the New SCN search results.  The site operator is a handy way of limiting your search results to a specific domain:

Search Prefix/Format                   | Description                                                          | Example Query String
site:scn.sap.com [query]               | Displays results from SCN only                                       | site:scn.sap.com ABAP
site:scn.sap.com/thread/ [query]       | Displays results from SCN Discussions (forums) only                  | site:scn.sap.com/thread/ time value
site:sap.com [query]                   | Displays results from SAP domains: SAP.com, SCN, Help, EcoHub, etc.  | site:sap.com Mobile
site:scn.sap.com [query] inurl:blog    | Displays results from SCN Blogs only                                 | site:scn.sap.com/ workflow inurl:blog
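
If you find yourself typing these prefixes often, the query strings are easy to build in code. The Python sketch below assembles them and turns them into a Google search link. The helper names `site_query` and `search_url` are hypothetical, invented for this example; the only thing assumed about Google is the usual `q` parameter on its search URL.

```python
from urllib.parse import urlencode


def site_query(query, site="scn.sap.com", inurl=None):
    """Build a query string that uses the site: (and optional inurl:) operator."""
    parts = [f"site:{site}", query]
    if inurl:
        parts.append(f"inurl:{inurl}")
    return " ".join(parts)


def search_url(query_string):
    """Return a Google search URL for the given query string."""
    return "https://www.google.com/search?" + urlencode({"q": query_string})


# The same kinds of searches as in the table above.
scn_only = site_query("ABAP")
discussions_only = site_query("time value", site="scn.sap.com/thread/")
blogs_only = site_query("workflow", inurl="blog")

for q in (scn_only, discussions_only, blogs_only):
    print(q, "->", search_url(q))
```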

Next Steps

There’s still lots of work to be done. We have to update the incoming links we can influence (like Wikipedia: please help us!), tweak our crawl instructions to Google, fix any broken redirects, and slowly shut down the old sites once as many of their redirects as possible are in place…yup, there’s still lots to be done.

You, the community, have done a fine job creating lots of new quality content on the New SCN so you should just keep doing that while also sharing, bookmarking, liking and rating any content you have an opinion about: all these little extras help Google to identify good/bad content and ultimately help users find the content.

     Social - New SCN.jpg

Happy searching!


10 Comments


  1. Elad Rosenheim

    That “Google Webmaster Tools” thingmajig is more impressive than I imagined…but I guess this isn’t Google Wave or anything, we’re talking here the main business case for Google Corp.

  2. Jarret Pazahanick

    Great update Jason, and just curious what timeframe you would expect for, say, the ~50 blogs I have written to be searchable on Google. Should I expect days, weeks, months?

    1. Elad Rosenheim

      I guess Jason would be on it when he sees it. Basically Google went over a large portion of our site already, and is now slowly closing some gaps I think. We could definitely help to expedite this process, to make sure it’s days…

    2. Jason Lax Post author

      There are already about 36 so far. I found them using this query: site:scn.sap.com/ “Posted by Jarret Pazahanick” inurl:blog

      That’s a pretty good start 🙂

      We’re now about to go into a new crawling phase where we will tweak our instructions and expand our sitemap to cover more and recently moved/created content. (Kudos to Elad Rosenheim who helped with this.)

      ❗ At the same time, we can’t expect Google to index 100% of site content we expose because it has its own evaluation system for determining what to index: is the content fresh, linked, good quality, etc.

  3. S A

    All said and done, you should have gone live only after checking that everything was working properly; searching for the required content is a nightmare on the new site! My two cents..

      1. S A

        Thank you. At least the custom search works, appreciate it. Hope the original Google search gets restored soon.

