Last week I was able to attend the Gilbane Conference in San Francisco. For those not familiar with the Gilbane Group (and that included me before I was invited to this event), it is an analyst and consulting firm that tracks developments and trends in the content technologies market. They also put on a couple of conferences each year that attract folks interested in topics such as web content management technologies, enterprise search, text analytics and semantic technologies, and collaboration and social computing. The speakers are generally experts in their fields and bring practical knowledge with examples of case studies and best practices. The SAP Developer Network & Business Process Expert Communities also had a presence at the conference through Mark Yolton’s session on “Case Studies: Collaboration in Action.” For a nice summary and recap of Mark’s session, read the blog by ToolSmith.
While Collaboration and Social Computing was one of the tracks, I decided to check out some of the sessions in the Enterprise Search track. With the ever-growing amount of content inside enterprises, this is an area that has gotten more attention and focus in recent years. One of the main points I took away is that search is not easy, and you have to be willing to invest resources, including people, to make it work well. I’ve captured some of the more interesting notes and food for thought here.
The conference began with a keynote from Google’s VP of Engineering, Udi Manber. He talked about how new innovations come about in his group. An engineer with an idea doesn’t have to ask for permission to pursue it or write up a use case. Basically, they just take their idea, implement it on a test machine, and then bring the results (e.g., meaningful data) to the next team meeting, where it’s discussed to determine its usefulness and how it could be used. This open process allows them to innovate quickly, even though, as he pointed out, user expectations and needs are currently growing much faster than innovations.
One of the interesting things Google is looking at is cross-language information retrieval. They’re trying to break the language barrier by making all of the world’s content searchable and then translating it into the language of the user. One of the main issues is that today there’s not enough non-English content on the internet. So the idea is that if a user in Turkey, for example, searches the Turkish Google site for information about New York, they would get English-language content that’s been automatically translated (via machine translation) into Turkish. This would allow the user to get the most relevant info on the topic, since the best content about visiting the Big Apple is most likely written in English.
Continuing on the topic of search was a session by Stephen Arnold, who has just written a report called “Beyond Search: What To Do If Your Enterprise Search System Doesn’t Work.” He began his talk with a statement that certainly got the audience’s attention: there is no “one size fits all” search solution, since it really depends on your specific search needs. You need different tools to access different content for different users. For example, searching for chemical structures is very different from searching across emails.
The important thing is to understand the content you want to search across and what your users’ needs are. He listed several things you should understand when considering a search system: the content you want to process, the amount of content, the frequency of change and addition of new content, and the user interface. If you make an error in any of these areas, your search budget will go into a “death spiral” (his words, not mine).
Arnold’s recommendation is to buy one search system and then add specialty tools for controls such as taxonomies or entity extraction. He also talked about the importance of controlled vocabularies, metadata, and taxonomies, saying that this work is best done by a human being such as a librarian. While a search engine can assist with it, it can’t be done with the tool alone. The reality is that no product will work out of the box: humans are needed to do search well, and because of the complexity and details involved, you have to be prepared to work really hard at it.
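To make the controlled-vocabulary idea concrete, here is a minimal sketch of the kind of human-curated synonym mapping a librarian might maintain to assist a search engine. The vocabulary, terms, and function names here are invented for illustration, not any vendor’s actual feature:

```python
# Hypothetical sketch: a tiny human-curated controlled vocabulary that
# maps query synonyms to a preferred term before the search runs.
# The terms below are invented examples.
CONTROLLED_VOCABULARY = {
    "car": "automobile",
    "auto": "automobile",
    "automobile": "automobile",
    "laptop": "notebook computer",
    "notebook": "notebook computer",
}

def normalize_query(query):
    """Replace each query word with its preferred vocabulary term, if any."""
    words = query.lower().split()
    return " ".join(CONTROLLED_VOCABULARY.get(w, w) for w in words)

print(normalize_query("auto repair"))  # prints "automobile repair"
```

The point of the example is that the mapping itself is the hard part: a tool can apply it mechanically, but deciding which terms are synonyms is the human, librarian-style work Arnold describes.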
When asked which search vendors to keep an eye on, he chose three companies. The interesting thing here is that these are all smaller companies (not the big players generally associated with this market), and only one of the three is a US company.
ISYS – quick to get up and running.
Exegy – good at handling high volume real-time data for financial industries.
PolySpot – business intelligence meets workflow plus it’s scalable.
The other search-related session I attended was a panel of vendors from ISYS, Coveo, Vivisimo, and Access Innovations, discussing what they think is hot and what’s coming in the search arena. Content is growing at a crazy pace, and people are more likely to keep content than to delete anything. For this reason, refinement and mining are going to become even more important.
We already see this being done with “social search,” which allows users to vote on, rate, tag, and annotate content. This is a great way to attach metadata to community-generated content that otherwise has none. One concern with social search has been the idea of a “tyranny of the majority”: if only a few people use these rating or voting functions, they will bias the content. But this hasn’t really been happening, and user ratings are a good way to clean things up, since they push lower-rated items down in relevance. One of the panelists, in a rather optimistic prediction, said that eventually search will be just like electricity: you turn it on but don’t even notice it. While I think we’re a long way from that, there are a lot of exciting things happening in the world of search, and it is definitely only going to grow in importance.
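As a rough illustration of how user ratings can push lower-rated items down in relevance, here is a minimal sketch that blends an engine’s relevance score with community ratings. The weighting scheme, field names, and sample data are all made up for illustration and are not any panelist’s actual algorithm:

```python
# Hypothetical sketch: blend a base keyword-relevance score with
# community ratings so lower-rated items sink in the results.
def rank_results(results, rating_weight=0.3):
    """Sort results by a weighted mix of engine relevance and user rating.

    Each result is a dict with 'title', 'relevance' (0-1 from the
    search engine), and 'avg_rating' (0-5 from community votes).
    """
    def score(r):
        normalized_rating = r["avg_rating"] / 5.0  # scale rating to 0-1
        return (1 - rating_weight) * r["relevance"] + rating_weight * normalized_rating

    return sorted(results, key=score, reverse=True)

results = [
    {"title": "Official product wiki page", "relevance": 0.80, "avg_rating": 4.5},
    {"title": "Outdated forum thread",      "relevance": 0.85, "avg_rating": 1.0},
    {"title": "Community how-to guide",     "relevance": 0.70, "avg_rating": 5.0},
]

for r in rank_results(results):
    print(r["title"])
```

With these invented numbers, the highly rated community guide outranks the outdated forum thread even though the engine scored the thread as slightly more relevant, which is exactly the “cleaning up” effect the panel described.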