Opensearch experiments: Trying to increase my SCN relevance:score
- Standard SCN Search
- Using SQL Anywhere with the Opensearch RSS feed
- Comparing standard SCN search with my SQL Anywhere Opensearch service
- 3 Blogs and 3 different attempts to improve relevance:score ranking.
- Action Plan
- The results
- Hold your horses, what just happened.
- My SQL Anywhere search results in an SVG image
- One final thing to do
- Updating tags 5/7/2015
When Vasco Miranda posted his update on Opensearch (http://scn.sap.com/community/about/blog/2012/11/13/opensearch-on-scn), it gave me the idea of getting a Data Geek Challenge entry out of the Opensearch tool. I had tried Opensearch after reading Vasco’s original blog and knew I could use it via SQL Anywhere web services, procedures and functions, so I thought about some search topics around the Data Geek theme. In the end I could not get the dataset I wanted for a Data Geek entry. Instead, I set out to learn more about one of the ratings Opensearch returns: “relevance:score”.
When searching in SCN, the default sort option is “Relevance”. So first I tried to confirm that my use of the Opensearch tool matched this default sort option, and then to work out how to improve this rating for my own blogs and contributions on SCN. An improved rating should, in theory, make my content appear earlier in a given SCN search result.
Standard SCN Search
I would base my attempts on the search term –data geek. A screen shot showing the standard SCN search page with default sort option “Relevance”
Using SQL Anywhere with the Opensearch RSS feed
I do not use SQL Anywhere in my day job and was not directly aware of any Sybase products until I found Eric Farrar’s post on using SQL Anywhere with Openlayers on SCN: http://scn.sap.com/community/sybase-sql-anywhere/blog/2010/11/02/using-sql-anywhere-with-openlayers-part-2
I have a hobby and part-time general interest in spatial topics, and after trying out SQL Anywhere I found that I really enjoy using it. It’s a superb database with so many possibilities in such a small footprint. I have a copy of this database running on a micro instance on AWS. I have also found the Sybase documentation really useful and get most of my information from that source as well as SCN. With that in mind, what follows is a way to set up SQL Anywhere to use Opensearch’s RSS output format. If you spot any improvements or errors then I would be happy to hear from you.
Create an SQL Anywhere function to read the OpenSearch output
CREATE FUNCTION "DBA"."openscnsearch_f"( IN query LONG VARCHAR )
RETURNS LONG VARCHAR
URL 'http://search.sap.com/opensearch?'
TYPE 'HTTP:GET'
Create an SQL Anywhere procedure to use OPENXML with the function just created.
CREATE PROCEDURE "DBA"."openscnsearch_p"( INOUT tag LONG VARCHAR )
RESULT (
  title   LONG VARCHAR,
  lk      LONG VARCHAR,
  creator LONG VARCHAR,
  sapkey  LONG VARCHAR,
  score   LONG VARCHAR,
  rank    INT )
BEGIN
  -- Call the web client function, shred the RSS items with OPENXML,
  -- and rank the rows by relevance:score (highest score first).
  SELECT *, RANK() OVER ( ORDER BY score DESC )
  FROM OPENXML(
         dba.openscnsearch_f( 'data%20geek&format=rss&extended=label&itemsperpage=900' ),
         '/rss/channel/item',
         1,
         -- namespace declarations for the prefixes used in the WITH clause
         '<rss xmlns:atom="http://www.w3.org/2005/Atom/" xmlns:relevance="http://a9.com/-/opensearch/extensions/relevance/1.0/" xmlns:sap_it="https://sap.com/it/opensearch/extensions/1.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0" />' )
       WITH (
         title   LONG VARCHAR 'title',
         lk      LONG VARCHAR 'link',
         creator LONG VARCHAR 'dc:creator',
         sapkey  LONG VARCHAR 'sap_it:metadataAttribute[sap_it:label="Keywords"]',
         score   LONG VARCHAR 'relevance:score' )
  ORDER BY score DESC;
END
The OPENXML call is the key step: it queries the –data geek– search term with the Opensearch parameters.
*If you can improve the [sap_it:label="Keywords"] line then let me know. With that line I get the “Keywords” label itself as well as all the content, although I worked around that issue in the way I use the data.
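For readers who want to see the same idea outside SQL Anywhere, here is a minimal Python sketch of what the OPENXML step does: read each RSS item, pull out the title, link, dc:creator and relevance:score, and rank by score. It is only an illustration, not the code behind my web service. The sample feed, its titles, names and scores are all made up, and the function name `ranked_items` is hypothetical. Ties share a rank, in the same way as the SQL RANK() function.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the namespace declaration in the procedure above.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "relevance": "http://a9.com/-/opensearch/extensions/relevance/1.0/",
}

# Hypothetical sample feed: titles, creators and scores are invented for this sketch.
SAMPLE_RSS = """<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:relevance="http://a9.com/-/opensearch/extensions/relevance/1.0/">
  <channel>
    <item><title>Post A</title><link>http://example.com/a</link>
      <dc:creator>Author One</dc:creator><relevance:score>0.41</relevance:score></item>
    <item><title>Post B</title><link>http://example.com/b</link>
      <dc:creator>Author Two</dc:creator><relevance:score>0.87</relevance:score></item>
    <item><title>Post C</title><link>http://example.com/c</link>
      <dc:creator>Author One</dc:creator><relevance:score>0.41</relevance:score></item>
  </channel>
</rss>"""


def ranked_items(rss_text):
    """Return one dict per RSS item, sorted by score (highest first) with a rank."""
    root = ET.fromstring(rss_text)
    rows = []
    for item in root.findall("./channel/item"):
        rows.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "creator": item.findtext("dc:creator", namespaces=NS),
            # Convert to a number so the ordering is numeric, not alphabetical.
            "score": float(item.findtext("relevance:score", namespaces=NS)),
        })
    rows.sort(key=lambda r: r["score"], reverse=True)
    for row in rows:
        # RANK() semantics: 1 + number of rows with a strictly higher score.
        row["rank"] = 1 + sum(1 for other in rows if other["score"] > row["score"])
    return rows


for row in ranked_items(SAMPLE_RSS):
    print(row["rank"], row["title"], row["creator"], row["score"])
```

One design point: the sketch converts the score to a number before sorting, whereas the procedure above declares score as LONG VARCHAR and so orders it as text.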
Create a web service to call the procedure just created.
I always use the SQL Anywhere wizard to create web services, by turning off any security and
**One thing to note is that I have set up the ‘dd’ parameter in the screen shot, but I do not make use of this option yet. I intend to make selections based on URL parameters.
I used such a big number for the Opensearch results (itemsperpage=900 in the OPENXML code above) to get around paging through the data. If the search term returned more than 900 items then Opensearch would require extended parameters to page through the results. Currently the –data geek- search term returns fewer than 900 results.
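The query string handed to the function can also be sketched in Python. This is a rough illustration only: it assumes the function's `query` parameter ends up as a `query=` URL parameter (my assumption, based on the parameter name in the function definition), and `build_search_url` is a hypothetical helper name. The other parameters are the ones used in the OPENXML call.

```python
from urllib.parse import urlencode, quote

# Base URL taken from the CREATE FUNCTION statement above.
BASE_URL = "http://search.sap.com/opensearch"


def build_search_url(term, items_per_page=900):
    """Build the Opensearch request URL for a search term.

    The parameter name 'query' is an assumption; format, extended and
    itemsperpage are the values used in the procedure above.
    """
    params = {
        "query": term,
        "format": "rss",
        "extended": "label",
        "itemsperpage": items_per_page,
    }
    # quote_via=quote encodes the space as %20 rather than '+'.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


print(build_search_url("data geek"))
```

Keeping itemsperpage as an argument makes it easy to try a smaller number while testing and the large value once the service is working.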
Example output from my SQL Anywhere Opensearch web service
Below is a cut and paste of the relevance:score section in my web service. The arrow is pointing to the relevance:score in the image; the arrow does not come with the web service 🙂 . I would attempt to improve this score for my blogs.
Comparing standard SCN search with my SQL Anywhere Opensearch service
Once I had set up the SQL Anywhere service, I compared the results with the standard SCN search to confirm that they match up. I used SAP Lumira to examine the SQL Anywhere output after converting the HTML into an Excel file (cut and paste the HTML into Excel).
First I compared the first 10 results sorted by relevance:score.
SQL Anywhere results as shown in Lumira
This matched the first page (top 10) on the SCN site search. A screen shot of some of that page is below.
My Data Geek relevance:score Ranking
Next I focused on my relevance:score by ranking the results with SQL Anywhere. I processed the output with Lumira.
The results above of 63/67/70 would mean, in theory, that my blogs should appear on approximately pages 7-8 of the SCN search results.
I paged down and found my blogs between 61-70.
All 3 of my blogs are on page 7 for that Data Geek search. I am not sure about you, but I don’t think I would get to page 7 on most searches. This is just an example, and I am only searching – data geek- as part of this blog, but my objective is to try to move my blogs to a higher position and therefore, in theory, get more views when people search for – data geek-.
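The step from a rank to a results page is simple arithmetic. Here it is as a short Python sketch, assuming 10 results per page (which matches the top 10 on the first page of the SCN search); `page_for_rank` is just an illustrative name.

```python
RESULTS_PER_PAGE = 10  # assumption: the SCN search shows 10 results per page


def page_for_rank(rank, per_page=RESULTS_PER_PAGE):
    """Return the 1-based results page on which a 1-based rank appears."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return (rank - 1) // per_page + 1


for rank in (63, 67, 70):
    print(rank, "-> page", page_for_rank(rank))
```

With these numbers, ranks 63, 67 and 70 all land on page 7, and rank 71 would be the first result on page 8.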
First I tried to work out how the “relevance:score” is calculated. I was hoping that, via the data and some manipulation, I might be able to determine how it is calculated. I failed to do this, as the ranking for my blogs was not clear to me. I then searched SCN for this relevance rating, and it is outlined in the “How to use SCN search” blog post from Martin Stockwald.
Jason Lax also has some great content on Search Engine Optimisation (SEO); just one example is http://scn.sap.com/docs/DOC-41706. Tim Guest has some recommendations for Google: http://scn.sap.com/community/about/blog/2012/04/04/how-to-optimise-your-blogs-on-scn-for-google-searching
(and there is more great content on the topic that you can find by searching 😛 )
However, I decided to ignore this great information. Alright, I admit I did not completely ignore it, but I wanted to see if I could make a difference to the SCN search relevance score by changing my 3 data geek blogs in certain specific ways. First I took a look at the actual relevance score. It is shown below for my three Data Geek blogs; there is not much difference between them, but something is obviously behind the rating.
The blog with a ranking of 70 is actually the one with the most views, comments, bookmarks and likes. All of these were part of my original thoughts as to what could also influence the relevance rating. So after ruling all those out, I went back to checking my own blogs for any subtle differences.
So my plan was to take 3 different recommended ways to improve “prominence” in search results and apply them individually to my 3 data geek blogs. Then stand back and see the results. I have never done any SEO before so could I pick the right areas to focus on?
3 Blogs and 3 different attempts to improve relevance:score ranking.
As I have 3 blogs in the same space on the same data geek subject and all created without really thinking about SEO, I thought I could use them as an example to see what works best for SEO only on SCN.
1) Tags- Keywords
Tags/Keywords were mentioned as a way of increasing prominence, so I took the top 10 search results for the data geek search and analysed their SCN tags. In Vasco’s blog he does mention a team of search administrators that rate relevant content, so some of these results are part of a manual process and not the result of any search optimisation by the user. However, what did stand out was that two of the top 10 do not have any tags at all (obviously there are other ways to add keywords within the content, but I focused on tags). Checking further down the result list, some of the lower ranked content had many tags. So I jumped to the theory that adding tags does not increase your relevance:score for searching on SCN. Obviously there are other advantages to tagging and searching, but read on to find out how I got on when I added tags to my blog.
2) Links – cross linking between pages on SCN
After reading some of the recommendations for SEO, one thing that did stand out was that the two blogs with the higher ranking actually have links to other SCN pages and my lowest ranked blog did not. This method was therefore in my mind the way to go and improve my ranking in the search stakes.
3) Updated Content
Finally I chose to simply update some content as that also is a way to increase your prominence on search engines.
There may be more viable methods to choose, but I only have 3 related blogs, so I chose the above methods for my experiments.
Action Plan
I formed the following action plan from my chosen ways to improve prominence in search results.
1) Add some links to my lowest ranked blog “Analysing the @SCNblogs twitter timeline”. These links would be to the first and second ranked blogs for –data geek, although this would also increase the use of the data geek search keyword in the actual content of the blog. The top two ranked blogs are below. Both are relevant to my blog because I have joined the challenge, so this is not random linking, and they are useful for my experiment.
“How to Join The Data Geek Challenge”
“Get Your Data Geek Badges Today!”
(A side issue is that this blog will also have these links, and they are relevant given the objective of this blog – in my opinion 🙂 )
2) Add tags to my second rated blog “Can your birthday help you play football in Japan or England”, but what tags to use? Here I used the SCN platform and chose appropriate tags from those provided, based on my chosen Data Geek search term. I have always ignored these tags in the past on the blogs I have created.
My chosen tags “sap_lumira data data_geek data_geek_challenge geek”
3) The first two blogs will be updated and appear at the top of the SCN default blog roll as part of the Jive platform functionality. However, one final thing to do is an update of the content in my top rated blog “SAPVISI Data Geek Challenge: SDN Points”. It was a minor edit: I added an image of the same data set, this time using the SAP Lumira Cloud and viewed via the iPad. I was not too hopeful that this method would be a big improvement, but thought I could measure any change in relevance:score.
The results
After making the changes I decided to wait a while for anything to kick in search wise on SCN, as I do not know how often content is indexed/crawled and then given a relevance score.
Well actually, I couldn’t wait, so I ran the query and all my content had gone down one place. Someone had posted a Data Geek Challenge blog while I was working on this blog. What a result! I found it quite funny, and then wondered if I had picked the wrong items for the keywords in Opensearch, as none appeared in the results.
Ranking immediately after altering my blogs showing an improved score but lower ranking (I had altered them late on a Saturday evening)
The tags added to “Can your birthday help you play football in Japan or England” did not show up in the search, just the “Robert Russell and sap_lumira” which were the original default values. So I decided to wait a bit more to see if Opensearch would pick these up.
Hold your horses, what just happened.
As I was about to finish typing my blog into the SCN Jive editor I did a double check of the data geek search. My blog “Can your birthday help you play football in Japan or England” had just shot up to number 13 in the search charts. Not quite a top ten hit, but an improvement to page 2 in the search results. I did doubt that adding tags to the blog would have any impact at all, so obviously there is more reading for me to do on the SEO blogs/information mentioned earlier. The tags now also appear in my SQL Anywhere web service, so something occurred with Opensearch on Monday after my changes late on Saturday.
My SQL Anywhere search results in an SVG image
I thought I would try one more thing. As SQL Anywhere supports SVG, I wanted a live ranking leader board of my data geek content on this blog via an SVG image. When this blog is published it will hopefully appear near the top of any search for – data geek, although from what I have learned so far on the SEO topic there are a few potential reasons for it not to do so (e.g. data geek is not in the title, and the phrase may be related to the SAP Lumira space rather than About SCN, so I will have to wait). So with SQL Anywhere I have linked the above Opensearch search to an SVG image, with the results below. The Jive platform appears to support SVG, although you need a modern browser and external internet images must be allowed where you are. As this image is a link to my micro instance running SQL Anywhere, I could stop it at any time. At the time of writing it shows 3 blogs of mine and their ranking in the search results for -data geek, limited to a top 200 search.
The format of the image
Ranking in search: Title of blog: My name: relevance score.
A static image of the SVG web service as I will eventually remove the generated image above.
SQL Anywhere Opensearch & SVG image procedure
create PROCEDURE "DBA"."openscnsearch_svg_p"( inout tag long varchar )
result ( html_string LONG VARCHAR )
BEGIN
  -- svg1: every search result with its overall rank
  declare local temporary table svg1 (
    title   as long VARCHAR,
    creator as long VARCHAR,
    score   as DECIMAL,
    rank    as int );
  -- svg2: only my rows, keeping the overall rank and adding a row number (r)
  declare local temporary table svg2 (
    title   as long VARCHAR,
    creator as long VARCHAR,
    score   as DECIMAL,
    rank    as int,
    r       as int );
  -- serve the result as an SVG image
  call dbo.sa_set_http_header( 'Content-Type', 'image/svg+xml' );
  insert into svg1
    select title, creator, score, rank() over ( order by score DESC )
    FROM OPENXML(
           dba.openscnsearch_f( 'data%20geek&format=rss&extended=label&itemsperpage=100' ),
           '/rss/channel/item',
           1,
           '<rss xmlns:atom="http://www.w3.org/2005/Atom/" xmlns:relevance="http://a9.com/-/opensearch/extensions/relevance/1.0/" xmlns:sap_it="https://sap.com/it/opensearch/extensions/1.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0" />' )
         WITH (
           title   long VARCHAR 'title',
           creator long varchar 'dc:creator',
           score   long varchar 'relevance:score' );
  insert into svg2
    select title, creator, score, rank, rank() over ( order by score DESC )
    from svg1
    where creator = 'Robert Russell';
  -- one <text> element per blog, 15 units apart
  select string(
           '<svg xmlns="http://www.w3.org/2000/svg" version="1.1">',
           list( string( '<text x="10" y="', ( r * 15 ), '" fill="black">',
                         rank, ' : ', title, ' : ', creator, ' : 0', score, '</text>' ) ),
           '</svg>' )
  FROM svg2;
END
I changed the web service related to the above procedure to a raw output type to allow the HTTP header (Content-Type: image/svg+xml) to be set by the sa_set_http_header call in the procedure.
I did try to reduce the 2 temporary tables to 1 but was unsuccessful, as I needed both to preserve the overall ranking of the search. If you have any improvement suggestions, please let me know. My assumption from the help pages is that the temporary tables are created per connection. In tests it works; however, I may stop my AWS 590Mb micro instance, which serves the image via the SQL Anywhere web server, at any time. So the entire process is done by SQL Anywhere, a fantastic (much more than a) database. A free developer edition can be downloaded from http://scn.sap.com/docs/DOC-31795. I previously downloaded version 12, which was the free web edition of the database, and use this on AWS. The FAQ for that is at http://www.sybase.co.uk/detail?id=1057560. However, trying that link just now takes me to the developer edition, so there is some reading for me to catch up on.
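To show why the overall rank has to be worked out before filtering on the author, here is the leader board logic as a minimal Python sketch. It is not what runs on my instance; `leaderboard_svg` and the sample rows are hypothetical. Unlike the procedure, it spaces the lines with a simple row counter, so two blogs with the same score would not be drawn on top of each other.

```python
from xml.sax.saxutils import escape


def leaderboard_svg(rows, creator):
    """Build an SVG leader board for one creator.

    rows: list of dicts with 'title', 'creator' and 'score' keys, in any order.
    The rank shown is the position among ALL rows, not just the creator's rows.
    """
    ordered = sorted(rows, key=lambda r: r["score"], reverse=True)
    lines = []
    for row in ordered:
        if row["creator"] != creator:
            continue
        # Overall rank: 1 + number of rows (from any creator) with a higher score.
        overall_rank = 1 + sum(1 for other in ordered if other["score"] > row["score"])
        y = (len(lines) + 1) * 15  # 15 units between text lines, as in the procedure
        label = f'{overall_rank} : {row["title"]} : {row["creator"]} : {row["score"]}'
        # escape() protects against &, < and > in blog titles.
        lines.append(f'<text x="10" y="{y}" fill="black">{escape(label)}</text>')
    return ('<svg xmlns="http://www.w3.org/2000/svg" version="1.1">'
            + "".join(lines) + "</svg>")


# Made-up sample data for illustration only.
sample = [
    {"title": "X", "creator": "Other", "score": 0.9},
    {"title": "Mine & yours", "creator": "Me", "score": 0.5},
    {"title": "Y", "creator": "Other", "score": 0.3},
    {"title": "Z", "creator": "Me", "score": 0.1},
]
print(leaderboard_svg(sample, "Me"))
```

In the sample, the two blogs by "Me" keep their overall ranks of 2 and 4 even though only two lines are drawn.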
One final thing to do
And that is to publish my blog at 13:00 on Wednesday, as per my data geek blog “Analysing the @SCNblogs twitter timeline”. I must also remember to add tags before posting this blog in an attempt to improve its search ratings.
Updating tags 5/7/2015
As part of the 1DX program, the blogging platform will be all about tags and not spaces, and there will be an opportunity to update blogs with primary tags. This reminded me of this blog and my experiments with tags on the current platform. All my blogs potentially have a primary tag taken from the spaces I created them in, but I am not sure about all the extra user/product tags added to my blogs. As I realised some of my blogs did not have any extra tags, I thought I would add some and do a final test of the “random” search (as per Jason’s comment below) of SCN.
After I had updated my blog “Data Geek Challenge 2: Can your birthday help you play football in Japan or England” with extra content about the Women’s World Cup semi-final between Japan and England, I noticed the update had a negative impact on the blog in the search results for “data geek”. The blog had moved down even though I had not touched any of the tags as part of my update.
So I took the opportunity yesterday to update all the blogs with some tags related to data geek, and now, after the SCN update of the search indexing, I have 3 blogs on page two of a search for data geek.
So I do think the Edit Tags option is a magic button on SCN and worth updating at random times 🙂 up to the switch over to the new platform.
I’ll have to see how searching tags works out on the new platform.