Google Videonomatopoeia
25 January 2005 @ late evening
Finally! Something interesting enough to make me feel like typing again.
The rate at which we are progressing toward indexing the entire world’s media spew really is quite fascinating to me. Not long ago, archived media was available only in hard-copy formats such as books and microfiche (print), or in the archival libraries of networks and film studios (video/film). By contrast, we now have the Internet Archive, which currently holds over a petabyte of data and grows at a rate of 20TB/mo.
Google recently started indexing television programs with its latest (as always) beta product, Google Video. Currently it only shows a few screenshots and some indexed dialogue, but it’s still enough to make M$oft sweat.
Now, I realize that the copyright implications of Google including actual video clips rather than screenshots would be enough to drag even them into a financial quagmire (clearing the rights for all programs would be true legal insanity), but I’m curious about the copyright status of the closed-captioned text that their search engine indexes. I’m sure it’s copyrighted in some way, but in most cases the content isn’t even accurate to what was actually said on-screen. There seems to be some loose formatting standard, but the frequent misspellings and strange keyboard vomit that result from realtime captioning of non-speech events are often hilarious.
With ever-improving speech recognition technology, this will likely change. But with the current state of Google Video and captioning, we’re now granted access to a world of language that has never before been available in a searchable, consolidated format. We now have a growing library of non-speech events, translated to text. And it proves that most of network television’s content carries about as much value transcribed by keyboard-trained monkeys as it does straight from the boob tube.
Search Google Video for any common phrase used for non-speech events (I’ve chosen “music continues” because it seems to pop up frequently). Check out some of these transcriptions; the best seem to come from Boohbah, whatever the hell that is:
at 7 minutes 30 seconds
Shh. (Techno music begins playing) (tinkling)
at 15 minutes
Booh! Bah! Booh! (Laughing) (peeping as rhythmic music begins) (poofing as music plays)
at 15 minutes 30 seconds
(Squealing) (boinging as music plays) (Squealing) (poofing individually)
at 16 minutes 30 seconds
(Squealing) (boinging as music plays) (Squeaking) (poofing individually)
at 22 minutes
(Languid music begins playing) (giggles)
What the hell is poofing individually?
Anyways, at least they’ve gotten 72 episodes of The Simpsons indexed.