Try out text-to-speech
Mike Rozak is looking for folks to try out text-to-speech stuff; he’s interested in large part because of its usability for virtual worlds.
For many years now, I’ve been posting that (in the long run) text-to-speech will be an important technology for MMORPGs.
If you want to listen to the latest bleeding-edge text-to-speech research, AND at the same time help improve text-to-speech, then please run through the listening tests at http://homepages.inf.ed.ac.uk/mfraser2/blizzard2007/register-R.html .
A quick explanation of what’s going on in the test:
Modern text-to-speech voices are made by taking several thousand recordings of someone speaking, analyzing them, and then producing a voice file. In the case of the blizzard test, 6000 recordings were sent out to 16(?) different companies a few months ago. The voices were then used to synthesize the test sentences that you hear.
In each section you’ll listen to one sample from each company’s voice, as well as one sample directly from the original speaker. You then have to give the sentence a score about how realistic it is, or type in what you heard, depending on the section.
These scores are then tabulated, and text-to-speech engines are ranked. (Mine will be near the bottom this year, but I’ll get it better for next year.) Participants then write papers describing what they did, and use each other’s papers to improve their algorithms for the following year.
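To make the tabulation step concrete, here is a rough sketch of how per-voice scores might be averaged and ranked. The 1-to-5 scale, the voice labels, and the function name `rank_systems` are all assumptions for illustration; the actual test may score and aggregate differently, and the sections where you type in what you heard are not covered here.

```python
from collections import defaultdict
from statistics import mean


def rank_systems(ratings):
    """Average the listener scores per voice and sort best-first.

    `ratings` is an iterable of (voice_id, score) pairs. The 1-5 scale
    used in the sample data below is an assumption, not the test's
    documented scale.
    """
    by_voice = defaultdict(list)
    for voice_id, score in ratings:
        by_voice[voice_id].append(score)
    averages = {voice: mean(scores) for voice, scores in by_voice.items()}
    # Highest average first.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical listener ratings: two synthetic voices plus the original speaker.
ratings = [
    ("A", 4), ("A", 5),
    ("B", 2), ("B", 3),
    ("natural", 5), ("natural", 5),
]
ranking = rank_systems(ratings)
for voice, average in ranking:
    print(voice, average)
```

Including the original speaker as just another entry gives a ceiling to compare the synthetic voices against.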
You might find the test interesting because a few of the 16 companies have produced voices that are really good… although they still sound like a bored telephone operator. 🙁
PS – Forward this around, since the more participants, the more accurate the tests.
6 Responses to “Try out text-to-speech”
I’m not sure what the target is for this but I almost think speech to text would be more useful in the big picture. As I see it, the “push” to integrate VoIP into games will make social interaction harder for people with hearing loss. Unlike chat, voice also makes stepping away from the screen for a minute and catching up impossible, not to mention some kinds of parallel conversations.
Simple: text to speech means NPCs can all speak (audibly) to your character without having gigabytes of pre-recorded voice-acted speech installed. And you can use it for dynamically-stitched-together quest text and stuff like that. (Oh, and the NPCs can say your name, too. Opening up a whole new kind of griefing, but it would still be cool.)
TTS was made available in Unreal Tournament 2004: if people typed “gg” in chat, it would say “good game”, or “laugh out loud” if you typed LOL, as well as simply speaking out what was typed. The first thing griefers started doing was typing in long strings of punctuation in order to spam the TTS channel with “exclamation-mark-ampersand-ampersand-question-mark-ampersand-semi-colon….” droning on and on and on in people’s headsets. 99% of players turned off the feature within days of the game’s release, I think.
I think it’s a great idea, as long as you can contain the misuse somehow.
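One way to contain that particular kind of misuse would be to clean up chat text before it ever reaches the speech engine. The sketch below is only an illustration of the idea, not how Unreal Tournament 2004 actually works; the function name `prepare_chat_for_tts`, the abbreviation table, and the 120-character cap are all made up for the example.

```python
import re

# Hypothetical abbreviation table; a real game would ship a longer one.
ABBREVIATIONS = {"gg": "good game", "lol": "laugh out loud"}

# Punctuation that is kept when it appears alone (an assumption).
KEEP_WHEN_ALONE = ".,?!'"


def _squash(match):
    """Keep a lone ordinary punctuation mark; replace anything else with a space."""
    run = match.group(0)
    if len(run) == 1 and run in KEEP_WHEN_ALONE:
        return run
    return " "


def _expand(match):
    """Swap a known chat abbreviation for its spoken form."""
    word = match.group(0)
    return ABBREVIATIONS.get(word.lower(), word)


def prepare_chat_for_tts(message, max_chars=120):
    # Runs of symbols are dropped so they cannot be read out one by one.
    text = re.sub(r"[^\w\s]+", _squash, message)
    # Expand abbreviations word by word.
    text = re.sub(r"\b\w+\b", _expand, text)
    # Collapse whitespace and cap the length so one message cannot drone on.
    return " ".join(text.split())[:max_chars]


print(prepare_chat_for_tts("gg !&&?&; LOL"))
```

This only handles punctuation spam and overlong messages; rate limiting and per-player muting would still be needed for anything more determined.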
That application makes sense Moo, but most game developers could just leave the cutscenes out of games and save us all a lot of disk space :^)
I am interested in text-to-speech for virtual worlds (and other games) because of a combination of the following factors:
– I want intelligent NPCs that I can talk to (minimum of typing/menus, eventually speech recognition) and which have personalities. (http://terranova.blogs.com/terra_nova/2007/05/artificial_inte.html#more)
– I find it more immersive and easier on the brain for the NPCs to talk back (using speech). I find it easy to watch an image and listen to speech, but difficult to watch an image and read text.
– Not only don’t I have the budget to record hundreds of hours of voice talent, I feel that even hundreds of hours would restrict NPC speech too much. (The “speak the player’s name” example is one small reason.)
Thus, I need to use text-to-speech, despite all its faults. (I see TTS’s quality as akin to 3D graphics in the early 1980s, when Battlezone was the best (and only?) 3D app. 25 years later, sprite technology from the ’80s is a niche and much-improved 3D is ubiquitous.)
When I post these reasons for using TTS on MMORPG boards, I get the following responses: (1) No one wants NPCs, and (2) I’d rather read text than listen to speech. However, my gut tells me that most players want NPCs and most want real speech; they’re just not playing MMORPGs, because any players who did care about NPCs or recorded speech gave up on MMORPGs long ago.