I’ve just been reading an interesting report in the newspaper. According to research carried out by MSN, Belgians spend 28 hours a year searching the internet in vain. That’s less than the European average of 32 hours. 68% of European internet users quit their search early due to the lack of (satisfying) results. Slightly fewer Belgians (65%) give up after a while. Now what does that say about Belgians? Are Belgians stubbornly persistent people (according to Caesar, we were the bravest of the Gauls)? I don’t think so. My conclusion is that, contrary to what one sometimes hears, not everything can be found on the internet. So I wondered why that was, and the following rose to the surface of my pineal gland:
- not all pages available on the Web have been indexed using traditional search technologies, partly due to a lack of technical resources or limitations of harvesting technologies, partly due to the huge amount of information to harvest
- incorrect, or too general, search terms, certainly when one is looking for a needle in a haystack
- contaminated search results. Webmasters use a lot of dirty tricks in order to get their web site and/or commercial message highly ranked. Search engine providers aren’t charity workers either so they do their part too.
- the referring mystery. Now that’s interesting. When we look at the Google info for webmasters, we read:
If you increase the links pointing to these pages, it’ll improve the chance that we’ll find your site
I’ve been wondering for a while now why it isn’t easy to find information provided in web logs if you use search engines outside SDN itself. Could it be that indexing also depends on whether someone is referring to that site/page? Would it imply that nobody is pointing to the web log either from other web logs or from other more important forums? Doesn’t that information have any use as reference material at all? That can’t be, certainly after reading Mark’s Zero to Third Place on Google within Two Days on tweaking links for better Google indexing. So I investigated.
Talkdigger is a sort of meta search engine developed by Frederick Giasson (a Canadian who started blogging in order to practice his English). He wanted to know how popular he was and didn’t want to have to check all web sites, so he made a meta search engine. It searches on blog sites like Bloglines, Feedster and classic search engines like Google and MSN.
I tried the engine out with one of Thomas’ web logs, WebServices: A real world implementation experience, in order to be sure that I could get something back. The results were very disappointing. Although Feedster reported 173 links, none of the first X pages listed Thomas’ web log. The same goes for the 4 Google results. I was rather surprised by this poor result. Nevertheless, Talkdigger could be useful for retrieving blogs.
Update: Google has just launched a blog search engine. A first test is not very promising where referring links are concerned. Searching on a name is somewhat better, but for myself, for example, it shows only 14 hits, of which only 10 are relevant.
The classic way
If it doesn’t work via specialised tools, why don’t we use the old-fashioned way? So I tried the most popular search engines: Yahoo, Google and MSN. It’s just a matter of submitting link:WebServices: A real world implementation experience as the search term. To put it in a nutshell, the results were: nada, niente, nichts, rien, noegabolle. Whatever I tried, the result list was always empty or incorrect. Then I tried the well-known butler Jeeves and guess what? I got 3 pages back, but my enthusiasm cooled down fast when none of the three pages actually contained a reference to the web log I’d been looking for. But don’t despair, Thomas: two other web logs were covered in this forum/community [original link is broken] according to Ask Jeeves.
As you can see, the classic way of searching the internet is fairly tedious, with poor results. One can make things easier by using meta search engines. Personally, I found Dogpile rather useful. A search plugin for Mozilla/Firefox can be found at Mycroft.
Just before I wanted to submit this web log, I discovered Clusty. This meta search engine is able to cluster the results by, for example, topic. When I searched on +Thomas +Jung, I got 6 results in the cluster SDN blogs, with only a couple from Thomas himself. Again poor results. But when I added ABAP, I got 21 SDN related hits. Wow (sic)! I even learned that Thomas is also active at devx.com. Firefox plugin lovers can find one at Mycroft again. More goodies later.
Desperately seeking results, I wondered whether it’s also a sorry state of affairs within SDN itself. So I used the internal search engine within the forums. The first result was again disappointing: no hits were returned. Was it the fact that the search engine couldn’t understand URLs? Could be. So I tried it in a shorter way and searched only on blog=/pub/wlg/1282 and hurray, finally some serious results. Justice at last! But the search engine isn’t flawless. It doesn’t find the reference in the New BSP Weblogs thread (and maybe not in others either).
The short way
In my search for Thomas’ web log, I’ve been trying several things. After a while I was fed up with typing URLs and stuff, so I wondered if I couldn’t make it shorter. You might remember my Foxy ISO SDN 4 LTR on a Mycroft search plugin for the SDN search. I tried to do the same for the above-mentioned search engine. I soon discovered that this was quite a task, and it ended unsuccessfully. The problem is that the Mycroft standard doesn’t provide any means for concatenating strings, which the URL structure of web logs demands. However I defined the search string, I ended up with blog=/pub/wlg/=1282 as the best result. The SDN search engine dumps on this.
But I still had a rabbit in my hat. As mentioned in my other web log, MIE ISO SDN 4 LTR 2, it’s also possible to query a search engine from bookmarklets. It has some advantages:
- it’s easy to implement
So without further ado, here are the bookmarklets. They all work the same way: you will be prompted for a web log, where you only need to type the web log id. I refer to my earlier MIE ISO SDN 4 LTR 2 for further info on what the code does and how to install it. Please remember to copy and paste the code as one line!
Search SDN web log referrers via Talkdigger
Search an SDN web log via Dogpile
Search an SDN web log via Clusty
Search SDN web log referrers in the SDN forums
Search SDN web log referrer in the SDN web logs
Search SDN web log referrer in the SDN articles (now part of SDN library)
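What each of these bookmarklets has to do is the string concatenation that the Mycroft plugin could not: glue the web log id onto blog=/pub/wlg/ and hand the result to a search engine. Here is a minimal sketch of that step. The function name and the search.example address are my own placeholders, not the code of the bookmarklets themselves; only the blog=/pub/wlg/ search term comes from the search described above.

```javascript
// Hypothetical helper: the name and the example address are assumptions.
// Only the "blog=/pub/wlg/<id>" search term is taken from the text above.
function buildWeblogSearchUrl(searchBase, weblogId) {
  // Keep digits only, so stray spaces typed into the prompt do no harm.
  const id = String(weblogId).replace(/\D/g, "");
  if (id === "") {
    throw new Error("No web log id given");
  }
  // The concatenation step that a Mycroft plugin definition could not express.
  const term = "blog=/pub/wlg/" + id;
  // encodeURIComponent escapes "=" and "/" so the term travels as one query value.
  return searchBase + encodeURIComponent(term);
}

// Placeholder address: substitute the query URL of the engine you want to use.
const EXAMPLE_BASE = "https://search.example/search?q=";

// Wrapped as a bookmarklet (one line!), the same logic would look roughly like:
// javascript:(function(){var id=prompt('Web log id?');if(id){location.href='https://search.example/search?q='+encodeURIComponent('blog=/pub/wlg/'+id);}})();
```

Calling buildWeblogSearchUrl(EXAMPLE_BASE, "1282") gives the address to open. Whether a given engine wants the term escaped like this is something to check per engine.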
As mentioned in my previous web log, When you get what you want, but not what you need, there is another way to include search engines in MIE. Daredevils have to start the registry editor and search for the entry:
Add a new key entry for the search engine you want to include. Let’s try to replicate the above Clusty example, so clusty would be a nice name for this. Then double-click on this entry and enter the value
%s is the search string. Close the registry editor and start MIE. If you now type clusty, followed by the number of the weblog in the address bar, you get the same result as with the bookmarklet.
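To make the %s mechanism a bit more tangible, here is a small sketch of how such a keyword and template could be expanded. It only imitates the behaviour described above and is not how MIE itself is implemented; the function name, the template address and the escaping of the search string are assumptions of mine.

```javascript
// Sketch only: imitates "keyword + search string" expansion with a %s template.
// The template address below is a placeholder, not the real registry value.
const searchTemplates = new Map([
  ["clusty", "https://search.example/search?q=blog%3D%2Fpub%2Fwlg%2F%s"],
]);

function expandKeyword(templates, addressBarInput) {
  const trimmed = addressBarInput.trim();
  const firstSpace = trimmed.indexOf(" ");
  if (firstSpace === -1) {
    return null; // a keyword without a search string: nothing to expand
  }
  const keyword = trimmed.slice(0, firstSpace).toLowerCase();
  const searchString = trimmed.slice(firstSpace + 1).trim();
  const template = templates.get(keyword);
  if (template === undefined) {
    return null; // unknown keyword
  }
  // Replace the %s placeholder with the (escaped) search string.
  return template.replace("%s", encodeURIComponent(searchString));
}
```

With this template, expandKeyword(searchTemplates, "clusty 1282") yields the same kind of address that the bookmarklet would open for web log 1282.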
To be honest, I’m only left with questions. Is the outside world not interested in the SDN community? Why is SDN information (if any) hard to find even on SAP related sites like SAPGenie, SAPdevelopment and SAPtopsites? Why isn’t SDN ranked in the latter? Does one even think that the information isn’t valuable outside that community? Why don’t the most popular search engines, with Ask Jeeves as an exception, find any references to web logs in the SDN forums and web logs? Is there still something wrong with the structure of the site? Is it too dynamic? Is it the way of coding? Luckily we have the SDN search engine itself and of course Craig’s web log on SDN Tips: “Weblogs Related to…”. As Mark suggested, we can go further into this matter at the SDN Contributor Meet BoF session at TechEd Vienna.
TechEDI or TechEddy?
Speaking of TechEd: besides my presentation and BoF session (together with Die Mensch-Maschine) on the Honey Pot Project, I will also be available for the Top SDN Contributor Q&A Sessions. I hope to meet you all there. Meanwhile, my colleague Gerrit De Bremme aka Sh3Ll4C made a cartoon about this.