Automatic Personalized Search at the Internet Archive
Anna Patterson is the Queen of the geeks, at least for me 🙂 On the side, while pregnant, she wrote a search engine for the Internet Archive that clocks in at 150,000 lines of plain POSIX C code for the indexer and 50,000 lines for the server side, with some neat features like automatic personalization. Wow.
We taped her presentation at the Future Salon last Friday, but the sound didn't get recorded, so you'll have to make do with eyewitness reports like this one. Anna's search engine data-mines the corpus before indexing, which yields more knowledge per page and enables new features such as automatic content organization, trend analysis, and personalized search.
You can try the personalized search for yourself. Enter "Alexa" into the Recall search engine. You will see that some of the results are not work-safe, so don't click on those. Next, search for Brewster Kahle, the founder of the Internet Archive, who also created and designed the Alexa navigation service, which was acquired by Amazon.com. If you now search for "Alexa" again, the results are tailored to your previous interests: all of the first-page hits are about the Alexa search engine. It's a bit eerie at first, but the results will convince you, and of course you can always clear your search history.
It is all still in beta and needs some fine-tuning. For example, if you search for "SAP" (make sure you capitalize it; otherwise you won't get any software-related results), Hasso Plattner does not appear under the People category in the right-hand column. If you do it the other way round and search for "Hasso Plattner", the top result under Topics is "SAP's". I don't know why there is a connection from Hasso Plattner to "SAP's" but not to plain "SAP".
Overall, these are very interesting concepts and results that you can check out today. I am convinced this is not the last time we will hear from Anna Patterson and her search engine. Here are her quick overview slides.