A Real Semantic Search Engine is Coming: Q&A With Swingly CEO Andy Hickl
March 18, 2009
I had the pleasure of meeting Andy way back at Pitch Camp Dallas when he gave a short presentation on Swingly. Andy and his team have been hard at work moving towards the release of their search engine, so I was glad when he was able to respond to a few questions I had shot him a while back via email.
FWSB: Introduce yourself and tell us what Swingly is, on both a layman’s and a more technical level.
Hi, I’m Andy Hickl, CEO of Swingly.com.
Swingly is a new kind of search engine that’s aiming to change how people search for factual information. We’re looking to accomplish this by combining two powerful types of “semantic” search engines: automatic question-answering and adaptive recommender systems.
First, we think that it should be a search engine’s sworn duty to find the particular piece of information that you’re looking for, no matter if you’re asking a complex question like
How much money did Top Gun make outside of the U.S.?
or just putting together a simple keyword query like
Top Gun international gross
If you’re like me, you don’t care if Google’s index contains 295,000 pages which could potentially help you. Who’s got time to read all of that? You only want to know one thing: the amount of money that “Top Gun” made in countries other than the United States. (It’s $168 million, by the way.) That’s where Swingly comes in. Instead of returning hundreds (if not thousands) of pages for you to read, Swingly goes out and finds the exact piece (or pieces) of information sought by a query (or a natural language question) from billions of web pages or large structured databases (such as DBPedia/Wikipedia, IMDB, MusicBrainz, etc.).
Second, unlike “pure” question-answering sites (like MIT’s START (start.csail.mit.edu/) or Answers.com’s BrainBoost (www.answers.com/bb)), Swingly’s “smart” enough to understand when someone’s not asking a question. In this case, Swingly searches its index for information that might be relevant to your query and presents it back to you in a format reminiscent of a FAQ. For example, given a query like
Sixth Floor Dallas
Swingly responds with the following set of question-answer pairs:
#1. Q: Where is The Sixth Floor Museum at Dealey Plaza? A: 411 Elm Street Dallas, TX 75202
#2. Q: What is The Sixth Floor Museum? A: The Sixth Floor Museum at Dealey Plaza chronicles the assassination and legacy of President John F. Kennedy
#3. Q: When is The Sixth Floor Museum open? A: Open every day except Thanksgiving and Christmas
#4. Q: How long should you spend at The Sixth Floor Museum at Dealey Plaza? A: Plan to spend at least 3-4 hours here if you want to really take most of it in.
Swingly generates most of the question-answer pairs it returns directly from the text of WWW documents. (Our index also covers all of the questions in the popular Q&A sites, such as Yahoo! Answers and Mahalo, as well as questions found in FAQs and forums.)
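To make the two modes described above concrete, here is a minimal Python sketch. It is an illustration only, not Swingly's actual method: the `looks_like_question` heuristic, the `faq_for` function, the term-overlap ranking, and the three-entry `TOY_INDEX` are all assumptions, with the sample pairs taken from the examples in this interview.

```python
# Hypothetical sketch: tell a natural-language question from a keyword query,
# and answer a keyword query with FAQ-style question-answer pairs.
# None of these names come from Swingly; they are illustrative assumptions.

QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}

# Toy index of (question, answer) pairs, copied from the interview's examples.
TOY_INDEX = [
    ("Where is The Sixth Floor Museum at Dealey Plaza?",
     "411 Elm Street Dallas, TX 75202"),
    ("When is The Sixth Floor Museum open?",
     "Open every day except Thanksgiving and Christmas"),
    ("How much money did Top Gun make outside of the U.S.?",
     "$168 million"),
]


def looks_like_question(query: str) -> bool:
    """Crude heuristic: ends with '?' or starts with a question word."""
    text = query.strip()
    words = text.lower().split()
    if not words:
        return False
    return text.endswith("?") or words[0] in QUESTION_WORDS


def faq_for(query: str) -> list[tuple[str, str]]:
    """Return indexed pairs whose question shares terms with the query,
    ordered by how many terms overlap (most overlap first)."""
    terms = set(query.lower().split())
    scored = []
    for question, answer in TOY_INDEX:
        question_terms = set(question.lower().replace("?", "").split())
        overlap = len(terms & question_terms)
        if overlap:
            scored.append((overlap, question, answer))
    # sorted() is stable, so ties keep their original index order.
    scored.sort(key=lambda item: -item[0])
    return [(question, answer) for _, question, answer in scored]


if __name__ == "__main__":
    for query in ("How much money did Top Gun make outside of the U.S.?",
                  "Sixth Floor Dallas"):
        kind = "question" if looks_like_question(query) else "keyword query"
        print(f"{query!r} looks like a {kind}")
    for question, answer in faq_for("Sixth Floor Dallas"):
        print(f"Q: {question} A: {answer}")
```

A real system would need far more than term overlap (entity recognition, answer typing, ranking over billions of pages); the sketch only shows the shape of the behavior described above.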
FWSB: How is Swingly funded?
We’re 100% privately funded. We’re not looking for external funding right now.
FWSB: How does Swingly factor into the future of search as you see it?
Let me start off by saying how completely and utterly blown away I’ve been by all of the attention that so-called “semantic” technologies have attracted over the past few months. It’s really unbelievable, especially when I think about where the market was, even as recently as last September when we were starting up, and where it is now.
I think it’s safe to say that the availability of large, high-quality data sources (such as DBPedia or Freebase) has really sparked everyone’s imagination. If you’ve got access to the right data set — and you understand what you’re sitting on — it’s now possible to roll out search apps that have the power to dramatically change how people interact with information. We’re seeing this in a number of different verticals: Jinni (jinni.com) is providing semantic search for movie information, Sortuv (sortuv.com) is doing it for restaurants, and guys like Evri (evri.com) are looking at different ways that biographical information can be provided. And that’s only the tip of the iceberg.
But where does all this data come from? While a source like Wikipedia seems inexhaustible, it’s important to recognize that it’s ultimately limited in its coverage. There’s tons of knowledge that’s out on the Web that’s just not in there.
So what’s a believer in semantic technology to do? Right now, if you’re not getting the crowd to create the data for you (like Yelp (yelp.com) or any of the community-based QnA sites (Mosio, Fluther, Yahoo! Answers, Mahalo, etc.)), you’ve got to rely on data made available by content aggregators — or you’ve got to go out and create it yourself. (Believe it or not, there are people who are actually going out there and creating content by hand — TrueKnowledge (trueknowledge.com) is one example of a company that’s going out and doing it the right way.)
With Swingly, we’re going in a different direction. Instead of involving humans in the task of content creation, we’re using natural language processing (NLP) technology to extract factual knowledge from the text of any document found on the open Web. We’re big believers in the power of NLP: if done right, it can be used to unlock a nearly limitless supply of knowledge from text. There are no limits to the knowledge that we can include in our index, short of what’s available out there on the Internet.
We’re fortunate enough to have licensed some great, wide-coverage extraction systems from our parent company, Language Computer Corporation (LCC), which allow us to extract hundreds upon hundreds of types of entities, attributes, relationships, and events from text. The coverage of these extraction systems is key: in the not-so-distant future, we’re expecting to be able to automatically convert raw, unstructured texts into rich, RDF-style semantic representations that can be searched or mined by any semantically-aware app.
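As a rough illustration of what an “RDF-style” representation of an extracted fact might look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Triple` type, the `http://example.org/` namespace, the predicate names, and the `to_ntriples` and `lookup` helpers are hypothetical and do not describe LCC’s or Swingly’s systems. The two facts are the ones quoted earlier in this interview.

```python
# Hypothetical sketch: store extracted facts as subject-predicate-object
# triples and print them in an N-Triples-like form. Names are illustrative.
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # literal value, kept as plain text here


EX = "http://example.org/"  # placeholder namespace, not a real vocabulary


def to_ntriples(triple: Triple) -> str:
    """Render one triple as a single N-Triples-style line with a literal object."""
    return f'<{EX}{triple.subject}> <{EX}{triple.predicate}> "{triple.obj}" .'


def lookup(facts: list[Triple], subject: str, predicate: str) -> list[str]:
    """Return every object recorded for the given subject and predicate."""
    return [t.obj for t in facts
            if t.subject == subject and t.predicate == predicate]


# Facts taken from the examples earlier in the interview.
FACTS = [
    Triple("Top_Gun", "internationalGross", "$168 million"),
    Triple("Sixth_Floor_Museum", "address", "411 Elm Street Dallas, TX 75202"),
]

if __name__ == "__main__":
    for fact in FACTS:
        print(to_ntriples(fact))
    print(lookup(FACTS, "Top_Gun", "internationalGross"))
```

Once facts are in this shape, any application that understands triples can search or combine them without re-reading the original text, which is the appeal of the approach described above.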
FWSB: What’s your launch timeframe?
The alpha is coming in late March, with a public beta in mid-May. The alpha version will use an index of about 850 million question-answer pairs (more than all the Q&A sites put together) and will only be searchable. The beta release will consist of about 5 billion question-answer pairs and will include full question-and-answer plus semantic search capabilities.
