Archive for the ‘development’ Category

Hungry? Get your Hot CNet Chunks!

Saturday, June 30th, 2007

Yesterday, Pluggd and CNet unveiled our HearHere search capability over a sampling of popular ZDnet shows.

It’s exciting, because this is the first time people are able to search within media using a heatmap experience.

Before this, users had two choices: full meal deal or get up from the table.
Now they can find and snack on the tasty chunks they are looking for.

Try it out for yourself by doing a search for “iPhone” or “Apple” on this episode:
http://blogs.zdnet.com/BTL/?p=5411

Check out what some folks are saying about the HearHere release:

http://blog.seattlepi.nwsource.com/venture/archives/117373.asp
http://venturebeat.com/2007/06/29/pluggd-begins-delivery-of-better-audio-search/
http://mashable.com/2007/06/29/pluggd-launches-audio-search-player-on-cnet/

More to come!

Pluggd in the Economist

Friday, June 8th, 2007

We’re excited to be mentioned in a story in the Economist today about
speech recognition. The article does a good job of surveying the space, but
what really makes Pluggd different (and speech recognition useful for video
search) is the chunking technology we’ve developed (read more about chunks).

It’s great to see the company getting highlighted this way in such an
outstanding publication.

- Alex Castro

More about Chunks: The Parts You Want

Wednesday, May 30th, 2007

We got some questions about what these chunk-things are after my last post. Some folks asked how this is different from just searching for the utterance of a word in video. It’s quite a bit different. Matt Marshall at VentureBeat did a good job describing how Pluggd works in this post (http://venturebeat.com/2006/12/06/pluggd-perfects-audio-and-video-search-raises-165m/) after we last spoke with him.

Let’s dig into this a little more by investigating user intention. When a user searches within video for the word ‘golf’, are they thinking, “The person who created this video has really good enunciation, I wonder how they pronounce the word ‘golf’?” I don’t think so. Yet that is the type of user experience you get from speech recognition by itself: it finds the moments where a word was uttered, and nothing more.

Instead, the user’s intention is more likely to be, “I am really interested in golf, find me the segment within this video where golf is talked about.” This requires identifying a distinct and relevant conversation, what we call a ‘chunk’, within the video. Speech recognition alone isn’t enough to accomplish this. We combine speech recognition with some very interesting semantic analysis and information retrieval techniques to identify chunks. We are able to identify a chunk by recognizing when related words and word phrases (e.g. golf, Tiger Woods, green, Vijay Singh, under par, over par) are used in sequence within an area of video.
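To make the idea a little more concrete, here is a minimal sketch of grouping related words into a chunk. It is not our production code: the hand-written RELATED_TERMS set, the find_chunks function, and the max_gap and min_hits thresholds are all assumptions made up for illustration, and the input is assumed to be a list of (time in seconds, recognized term) pairs sorted by time.

```python
from typing import List, Tuple

# Hypothetical, hand-picked related terms for the topic "golf".
# A real system would derive these from semantic analysis.
RELATED_TERMS = {"golf", "tiger woods", "green", "vijay singh",
                 "under par", "over par"}


def find_chunks(hits: List[Tuple[float, str]],
                max_gap: float = 20.0,
                min_hits: int = 3) -> List[Tuple[float, float]]:
    """Group related-term hits that occur close together in time.

    A chunk is a run of related terms in which consecutive hits are at
    most max_gap seconds apart and which contains at least min_hits hits.
    Returns (start_time, end_time) pairs.
    """
    chunks = []
    run = []  # timestamps of the current run of related terms
    for time, term in hits:
        if term not in RELATED_TERMS:
            continue
        if run and time - run[-1] > max_gap:
            # The gap is too large: close the current run.
            if len(run) >= min_hits:
                chunks.append((run[0], run[-1]))
            run = []
        run.append(time)
    if len(run) >= min_hits:
        chunks.append((run[0], run[-1]))
    return chunks


transcript = [
    (12.0, "green"),        # isolated mention, not a golf conversation
    (300.0, "golf"),
    (305.5, "tiger woods"),
    (311.0, "under par"),
    (318.0, "green"),
    (900.0, "golf"),        # another isolated mention
]
print(find_chunks(transcript))  # [(300.0, 318.0)]
```

The isolated mentions at 12 and 900 seconds are ignored, while the cluster of related terms around the five-minute mark is reported as one chunk.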

There are several interesting implications of chunking:

1) Far better results than speech recognition by itself

Because we are using the presence of related words, as opposed to the presence of a single word, we are able to achieve results that are far superior to even the best speech recognition engines.

The diagram below illustrates how this works for a scenario where a user searches for a chunk by typing in the query term “Vijay Singh.” The phrase “Vijay Singh” might prove difficult for a speech recognition engine, including the one we use, to identify. However, our chunking technology compensates for this: the related words spoken around the name still mark the chunk, even if the name itself is misrecognized.
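The scenario can also be sketched in a few lines of code. Again, this is only an illustration under stated assumptions, not our actual implementation: the RELATED map, the best_window function, and the 30-second window size are hypothetical, and the misrecognized token “vj sing” is an invented example of what a recognizer might produce.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical map from a query to its related terms. In practice this
# would come from semantic analysis rather than a hand-written table.
RELATED = {
    "vijay singh": {"vijay singh", "golf", "tiger woods", "green",
                    "under par", "over par"},
}


def best_window(recognized: List[Tuple[float, str]],
                query: str,
                window: float = 30.0) -> Optional[Tuple[float, int]]:
    """Return (window_start_seconds, score) for the fixed-size time
    window containing the most terms related to the query, or None if
    no related term was recognized anywhere."""
    terms = RELATED.get(query.lower(), {query.lower()})
    scores = Counter()
    for time, word in recognized:
        if word in terms:
            scores[int(time // window)] += 1
    if not scores:
        return None
    # Highest score wins; ties go to the earliest window.
    bucket, score = max(scores.items(), key=lambda kv: (kv[1], -kv[0]))
    return bucket * window, score


# The recognizer misheard the name as "vj sing", so the query phrase
# never appears in the recognized text.
recognized = [
    (40.0, "weather"),
    (605.0, "golf"),
    (610.0, "vj sing"),
    (614.0, "under par"),
    (622.0, "green"),
    (1200.0, "green"),
]
print(best_window(recognized, "Vijay Singh"))  # (600.0, 3)
```

Even though the name was never recognized correctly, the window starting at ten minutes is returned because three related terms were heard there.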



2) Increases a user’s media consumption

Because users can jump to exactly what they are interested in within the video, they don’t ‘bail out’ of the video. Users often start watching a video clip only to become frustrated when they don’t immediately see what they were expecting, and they are too impatient to wait for the video segment they do care about. They just leave. Our experiments show that a very high number of users ‘bail out’ of video within the first 30 seconds.

When users can jump to what they are interested in, they are more satisfied and spend more time watching more of the video. In fact, we’ve found evidence that users display some of the same ‘browsing’ behavior in video that they exhibit with hyperlinks and text web pages. In a future post, I will share empirical data from some of the A/B testing we’ve conducted over the past few months.

- Alex Castro