Mark Finnern

Automatic Personalized Search at the Internet Archive

Anna Patterson is the Queen of the geeks, at least for me 🙂 On the side, while pregnant, she wrote a search engine for the Internet Archive which clocks in at 150,000 lines of plain POSIX C code for the indexer and 50,000 lines for the server side, with some neat features like automatic personalization. Wow.

We taped her presentation at the Future Salon last Friday, but the sound didn’t get recorded, so you have to make do with eyewitness reports like this one. Anna’s search engine data-mines the corpus before indexing, which leads to a higher level of knowledge per page and enables new features like automatic content organizing, trend analysis, and personalized search.

You can try the personalized search for yourself. Put “Alexa” in the recall engine. You will see that some of the results are not work-safe, so don’t click these. Next, search for Brewster Kahle, the founder of the Internet Archive, who also created and designed the Alexa navigation service, which was later bought. If you now search for “Alexa” again, the results are tailored towards your previous interests: all of the first-page hits are about the Alexa search engine. It’s a bit eerie at first, but the results will convince you, and of course you can always clear your search history.
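To make the idea concrete, here is a toy sketch of history-based re-ranking in Python. It is not the Internet Archive's implementation, which is written in C and whose internals are not described here. The function names, the `boost` weight, and the sample documents are all made up for illustration, and the scoring rule (term count plus a bonus for sharing words with earlier queries) is only one simple way such tailoring could work.

```python
from collections import Counter

PUNCTUATION = ".,!?\"'()"


def tokenize(text):
    """Lowercase the text and split it into words, dropping edge punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def rerank(query, documents, history, boost=0.5):
    """Return documents matching the query, best first.

    Hypothetical scoring: the base score counts query-term occurrences,
    and each distinct word from past queries that also appears in the
    document adds `boost` to that score.
    """
    query_terms = set(tokenize(query))
    history_terms = set()
    for past_query in history:
        history_terms.update(tokenize(past_query))
    history_terms -= query_terms  # only the *other* interests personalize

    scored = []
    for doc in documents:
        words = Counter(tokenize(doc))
        base = sum(words[term] for term in query_terms)
        if base == 0:
            continue  # the document does not match the query at all
        personal = sum(1 for term in history_terms if term in words)
        scored.append((base + boost * personal, doc))

    # Python's sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


docs = [
    "Alexa photo gallery",
    "Alexa navigation service created by Brewster Kahle",
]
print(rerank("Alexa", docs, history=[]))
print(rerank("Alexa", docs, history=["Brewster Kahle"]))
```

With an empty history the two sample documents tie and keep their original order. After a search for “Brewster Kahle”, the navigation-service document moves to the top, which mirrors the behaviour described above. Clearing the search history corresponds to passing an empty `history` list.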

It is all still in beta and needs some fine-tuning. For example, if you search for “SAP” (make sure that you capitalize it, otherwise you don’t get any software-related results), you don’t find Hasso Plattner under the people category in the right-hand column. If you do it the other way round and search for “Hasso Plattner”, the top result under Topics is “SAP’s”. I don’t know why there is an “SAP’s” connection to Hasso Plattner but no plain “SAP” one.

Overall, very interesting concepts and results that you can check out today. I am convinced this is not the last time we hear from Anna Patterson and her search engine. Here are her quick overview slides.
