Try out text-to-speech

Posted by Raph Koster in Game talk

May 7, 2007

Mike Rozak is looking for folks to try out text-to-speech stuff; he’s interested in large part because of its usability for virtual worlds.

For many years now, I’ve been posting that (in the long run) text-to-speech will be an important technology for MMORPGs.

If you want to listen to the latest bleeding-edge text-to-speech research, AND at the same time help improve text-to-speech, then please run through the listening tests at http://homepages.inf.ed.ac.uk/mfraser2/blizzard2007/register-R.html .

A quick explanation of what’s going on in the test:

Modern text-to-speech voices are made by taking several thousand recordings of someone speaking, analyzing them, and then producing a voice file. In the case of the Blizzard test, 6000 recordings were sent out to 16(?) different companies a few months ago. Each company built a voice from them, and those voices were then used to synthesize the test sentences that you hear.

In each section you’ll listen to one sample from each company’s voice, as well as one sample directly from the original speaker. Depending on the section, you then either score the sentence on how realistic it sounds or type in what you heard.

These scores are then tabulated, and text-to-speech engines are ranked. (Mine will be near the bottom this year, but I’ll improve it for next year.) Participants then write papers describing what they did, and use each other’s papers to improve their algorithms for the following year.
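As a rough illustration of the tabulation step, here is a minimal sketch that averages listener scores per system and ranks the systems. The 1-to-5 scale, the system labels, and the `rank_systems` helper are assumptions made for the example; the actual test may score and aggregate differently.

```python
from collections import defaultdict
from statistics import mean


def rank_systems(ratings):
    """Rank systems by mean listener score, best first.

    `ratings` is an iterable of (system_id, score) pairs. The 1-5 scale
    used in the sample data below is an assumption, not the real test's.
    """
    scores_by_system = defaultdict(list)
    for system_id, score in ratings:
        scores_by_system[system_id].append(score)
    averages = {sid: mean(vals) for sid, vals in scores_by_system.items()}
    # Sort by mean score, highest first.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ratings: two synthetic voices plus the original speaker.
sample_ratings = [
    ("A", 4), ("A", 5),
    ("B", 2), ("B", 3),
    ("natural", 5), ("natural", 5),
]
ranking = rank_systems(sample_ratings)
for system_id, average in ranking:
    print(system_id, average)
```

Including the original speaker's recordings as one more "system" gives a ceiling to compare the synthetic voices against.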

You might find the test interesting because a few of the 16 companies have produced voices that are really good… although they still sound like a bored telephone operator. 🙁

PS – Forward this around, since the more participants, the more accurate the tests.

6 Responses to “Try out text-to-speech”

Tim says:

May 8, 2007 at 6:17 am

I’m not sure what the target is for this but I almost think speech to text would be more useful in the big picture. As I see it, the “push” to integrate VoIP into games will make social interaction harder for people with hearing loss. Unlike chat, voice also makes stepping away from the screen for a minute and catching up impossible, not to mention some kinds of parallel conversations.

moo says:

May 8, 2007 at 6:50 am

Simple: text to speech means NPCs can all speak (audibly) to your character without having gigabytes of pre-recorded voice-acted speech installed. And you can use it for dynamically-stitched-together quest text and stuff like that. (Oh, and the NPCs can say your name, too. Opening up a whole new kind of griefing, but it would still be cool.)

RickR says:

May 8, 2007 at 7:02 am

TTS was made available in Unreal Tournament 2004: if people typed “gg” in chat, it would say “good game,” or “laugh out loud” if they typed “LOL,” as well as simply speaking out whatever else was typed. The first thing griefers started doing was typing in long strings of punctuation in order to spam the TTS channel with “exclamation-mark-ampersand-ampersand-question-mark-ampersand-semi-colon….” droning on and on and on in people’s headsets. I think 99% of players turned off the feature within days of the game’s release.
I think it’s a great idea, as long as you can contain the misuse somehow.
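
One way to contain that kind of misuse is to clean up chat text before it reaches the synthesizer. The following is a minimal sketch, not how Unreal Tournament 2004 handled it: the `prepare_for_tts` function, the abbreviation table, and the choice of which characters to keep are all assumptions made for illustration.

```python
import re

# Hypothetical abbreviation table; a real game would ship a longer one.
ABBREVIATIONS = {"gg": "good game", "lol": "laugh out loud"}


def prepare_for_tts(message):
    """Return chat text that is safer to hand to a speech synthesizer."""
    # Replace symbols a synthesizer might read aloud by name (e.g. "&", ";")
    # with spaces, keeping letters, digits, basic punctuation, apostrophes.
    cleaned = re.sub(r"[^\w\s.,!?'’]", " ", message)
    # Collapse a run of punctuation marks (even if separated by spaces)
    # down to its first mark.
    cleaned = re.sub(r"([.,!?])[\s.,!?]*[.,!?]", r"\1 ", cleaned)
    # Expand whole-word abbreviations, ignoring case.
    cleaned = re.sub(
        r"\b[A-Za-z]+\b",
        lambda m: ABBREVIATIONS.get(m.group(0).lower(), m.group(0)),
        cleaned,
    )
    # Normalize whitespace left over from the steps above.
    return " ".join(cleaned.split())


print(prepare_for_tts("gg LOL"))
print(prepare_for_tts("nice!&&?&;"))
```

A length cap or per-player rate limit on spoken messages would be a natural addition, since filtering alone does not stop someone from spamming ordinary words.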

Tim says:

May 8, 2007 at 7:21 am

That application makes sense Moo, but most game developers could just leave the cutscenes out of games and save us all a lot of disk space :^)

External News says:

May 8, 2007 at 7:55 am

[…] Try out text-to-speech […]

Mike Rozak says:

May 8, 2007 at 1:53 pm

I am interested in text-to-speech for virtual worlds (and other games) because of a combination of the following factors:

– I want intelligent NPCs that I can talk to (minimum of typing/menus, eventually speech recognition) and which have personalities. (http://terranova.blogs.com/terra_nova/2007/05/artificial_inte.html#more)

– I find it more immersive and easier on the brain for the NPCs to talk back (using speech). I find it easy to watch an image and listen to speech, but difficult to watch an image and read text.

– Not only don’t I have the budget to record hundreds of hours of voice talent, I feel that even hundreds of hours would restrict NPC speech too much. (The “speak the player’s name” example is one small reason.)

Thus, I need to use text-to-speech, despite all its faults. (I see TTS’s quality as akin to 3D graphics in the early 1980s, when Battlezone was the best (and only?) 3D app. 25 years later, sprite technology from the ’80s is a niche and much-improved 3D is ubiquitous.)

When I post these reasons for using TTS on MMORPG boards, I get the following responses: (1) No one wants NPCs, and (2) I’d rather read text than listen to speech. However, my gut tells me that most players want NPCs and most want real speech; they’re just not playing MMORPGs, because any players who did care about NPCs or recorded speech gave up on MMORPGs long ago.
