Mark Finnern

Bay Area Future Salon Friday 16th: Search and Analysis

When I posted "Bay Area Event: Scott Rafer CEO of Feedster at Emerging Technology SIG", no one objected to posting Bay Area events. So here I go again, and sorry for the late notice. Category: shameless self-promotion, although is it really self-promotion when I am only the facilitator?

In short: a Future Salon on the latest in search and analytics, this Friday, 16th of January, 7-9pm, at SAP Palo Alto, Building D, Room Southern Cross.


“Wow, she is great,” said Brewster Kahle, founder of the Internet Archive, when he heard that Anna Patterson will present at the next Future Salon. He should know: after all, the search engine that Anna developed searches the 11,094,924,000 pages of his archive.

Here is her abstract:

This talk presents a search engine and its underlying framework.

This search engine datamines the corpus before any indexing takes place. Modeling the corpus leads to several new features: automatically organizing the content, trend analysis, and personalized search.

I present a unifying mathematical model which captures index compression, ranking, duplicate detection, disambiguation, clustering and personalization from a single underlying order on pages.

This system is running over an index of 11 billion pages at the Internet Archive on a pair of Linux machines. The addition of personalization takes about 1ms over a normal search.

Anna Patterson received her PhD in Computer Science and then became a research scientist at Stanford doing phenomenal datamining. She left to start her own search company Xift and lately has been doing search research out of the Internet Archive in San Francisco.

The second part of the evening is an introduction to a solution that is complementary to search: InfoTame technology, which is used to analyze massive amounts of text-based data (more than 10,000 documents):

InfoTame technology is used to analyze the content of millions of documents and extract relations and hidden facts from them (facts you do not know about and therefore do not know how to search for).

InfoTame uses Statistical Spectral Content Analytics. This mathematical approach is language-independent and analyzes the relationships between words and word pairs in query results in comparison to the entire database.

The calculation of a significance rating for words and word pairs is the basis for trend analysis and comparisons of text-based information.
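How InfoTame actually computes its significance rating is not described in this announcement. Purely to illustrate the general idea of rating words in query results against the entire database, here is a minimal Python sketch that uses a smoothed log-ratio of relative frequencies. The function names, the smoothing parameter, and the toy documents are all assumptions made up for this example; this is not InfoTame's Statistical Spectral Content Analytics.

```python
import math
from collections import Counter


def word_counts(documents):
    """Count lowercase, whitespace-separated tokens across a list of strings."""
    counts = Counter()
    for text in documents:
        counts.update(text.lower().split())
    return counts


def significance(subset_docs, all_docs, smoothing=1.0):
    """Score each word in the subset by a smoothed log-ratio of its relative
    frequency in the subset to its relative frequency in the whole collection.
    A positive score means the word is over-represented in the subset.
    (Illustrative stand-in only, not InfoTame's actual rating.)"""
    subset = word_counts(subset_docs)
    corpus = word_counts(all_docs)
    vocab_size = len(corpus)
    subset_total = sum(subset.values())
    corpus_total = sum(corpus.values())
    scores = {}
    for word, count in subset.items():
        p_subset = (count + smoothing) / (subset_total + smoothing * vocab_size)
        p_corpus = (corpus[word] + smoothing) / (corpus_total + smoothing * vocab_size)
        scores[word] = math.log(p_subset / p_corpus)
    return scores


# Hypothetical toy data, invented for illustration only.
all_docs = [
    "the search engine ranks the pages",
    "the search index uses compression",
    "the weather report for friday",
    "the dinner on friday in palo alto",
]
query_results = all_docs[:2]  # pretend these two documents matched a query

scores = significance(query_results, all_docs)
for word in sorted(scores, key=scores.get, reverse=True):
    print(f"{word:12s} {scores[word]:+.3f}")
```

In this toy data, a word like "search", which occurs only in the query results, gets a positive score, while "the", which occurs everywhere, scores slightly below zero. Word pairs could be rated the same way by counting adjacent token pairs instead of single tokens.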

Dirk Wentzel, a former SAP colleague and friend of mine, who is now VP Customer Services of InfoTame, will present their solution.

It will be very interesting. I am so looking forward to it.

Friday 16th of January 7-9pm. Bring future stories you stumbled upon over the last month to share.

If you have time, join us afterwards for dinner and discussions until late in the night at a restaurant in Palo Alto.
“Oh, how nice, I haven’t done something like that since I was a student,” was Anna’s reaction when I told her about our after-hours plan.

Location: SAP Labs North America
Building D Room: Southern Cross
3410 Hillview Avenue
Palo Alto, CA 94304
(Driving directions to building D via Highway 101)
(Driving directions to building D via Highway 280)

      Benny Schaich-Lebek
      Unfortunately it's about 10,000 kilometers away.
      Will you webex this session?
      Mark Finnern
      Blog Post Author
      Hi Benny,

      We will try to tape it and make it available to a bigger audience. I will keep you posted.

      All the best, Mark.