Avatar body language
Regular blog reader mrseb has a blog post up on emotional avatars in virtual worlds inspired by this NYTimes.com article (it’s behind a reg wall).
In short, the research is about how important blushing is as a social lubricant, as evincing embarrassment or shame serves to reinforce the social rules held in common by groups of people. It’s a sign that the person knows they are transgressing to some degree and is sorry for it, and people judging them tend to treat them less harshly.
Which leads Sebastian to ask (emphasis mine!),
Why are we still running around in virtual worlds with emotionless, gormless avatars?
It’s not that the question hasn’t been asked before. For example, back in 2005 Bob Moore, Nic Ducheneaut, and Eric Nickell of PARC gave a talk at what was then AGC (you can grab the PDF here), which I summarized here with
The presentation by the guys from PARC on key things that would improve social contact in MMOs was very useful and interesting. Eye contact, torso torque, looking where people are pointing, not staring, anims for interface actions so you can tell when someone is checking inventory, display of typed characters in real-time rather than when ENTER is hit, emphatic gestures automatically, pointing gestures and other emotes that you can hold, exaggerated faces anime super-deformed style or zoomed in inset displays of faces, so that the facial anims can be seen at a distance… the list was long, and all of it would make the worlds seem more real.
I was at that talk, and in the Q&A section, which was really more of a roundtable discussion, the key thing that came up was cost.
There are two equally challenging barriers to better emotional displays from avatars: input and output.
In text-based worlds we had very low-cost output. Text is extremely malleable, with very little overhead. You don’t need to bake out phrases in advance inside dedicated software and ship them on a disc. It takes up little bandwidth. It has very low hardware requirements for display. And you aren’t confined to tricky, complex formats: nobody asks you what the poly count is of a sentence, or whether you have room for one more bone in your paragraph.
The flip side is that the creation tools for puppeteering are then very freeform and very simple. Just about all text worlds — even IRC and IM — offer the following two simple tools:
- canned social commands (“smile,” “laugh,” etc.). Execute the command (which these days is often just a button press) and a pre-written line of text is delivered to the interlocutor. The advantage here lies in easy puppeteering, at the expense of flexibility.
- the “emote” command, also known as “me” or “pose.” This is the reverse: you type in your own text, so you have maximum puppeteering control, which also takes the most work. (A minimal sketch of both tools follows below.)
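Boiled down, both tools are a thin layer over text. A minimal sketch of that layer in Python, with names that are purely illustrative rather than drawn from any particular codebase:

```python
# A minimal sketch of the two classic text-world tools: canned social commands
# and the freeform "emote"/"pose" command. Names are illustrative only.

CANNED_SOCIALS = {
    "smile": "{actor} smiles at {target}.",
    "laugh": "{actor} laughs heartily.",
}

def handle_chat_command(actor, line, target="everyone"):
    """Turn a typed command into the text the other players see."""
    verb, _, rest = line.partition(" ")
    if verb in CANNED_SOCIALS:
        # Canned social: one word (or button press), zero flexibility.
        return CANNED_SOCIALS[verb].format(actor=actor, target=target)
    if verb in ("emote", "me", "pose"):
        # Freeform emote: maximum puppeteering control, maximum typing effort.
        return f"{actor} {rest}"
    return f'{actor} says, "{line}"'

print(handle_chat_command("Raph", "smile", target="mrseb"))
print(handle_chat_command("Raph", "emote shuffles his feet and blushes."))
```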
Once we got out of text, however, we were in trouble. Graphics adds a very heavy load on the output side of the equation, reducing the available palette to only canned choices — and few of them at that! Systems such as Microsoft Research’s Comic Chat (remembered today as the place where the much-reviled Comic Sans font was born) offered “emotion wheels” and facial expressions, and pioneered text parsing to add mood to the avatars.
In the mid-90s, the addition of text parsing into the mix gained a lot of popularity in non-VW circles, allowing emoticons to be parsed automatically out of lines of chat to play sounds or insert small graphics like this one: 🙂 Emoticons themselves, of course, were a way of adding emotional tone to text, a way to puppeteer in a very low-intensity way.
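To give a flavor of that kind of parsing (the mapping below is illustrative, not from any particular client), a chat system can scan each line for an emoticon and hand the avatar a mood tag:

```python
import re

# Illustrative mapping (not any particular client's) from emoticons found in
# chat to a mood tag that could trigger a sound, a graphic, or an expression.
EMOTICON_MOODS = {":)": "happy", ":(": "sad", ";)": "playful", ":D": "delighted"}
EMOTICON_RE = re.compile("|".join(re.escape(e) for e in EMOTICON_MOODS))

def mood_from_chat(line):
    """Return the mood tag for the first emoticon in a line of chat, if any."""
    match = EMOTICON_RE.search(line)
    return EMOTICON_MOODS[match.group(0)] if match else None

print(mood_from_chat("great job on the quest :D"))  # -> delighted
```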
Text worlds merrily went on developing more elaborate systems, such as moods and adverbs, and eventually these made the jump to a couple of graphical worlds. We also saw systems like that of There.com, which managed to convey intensity of emotion in smilies by using tricks like including more pieces of punctuation in a social command in order to make it “stronger.”
When you type these commands, they appear in your chat balloon and your face and gestures match the emotion. If you don’t want the Smiley to appear in your chat balloon, type a tick before and after the command, such as 'surprise'.
You can intensify a gesture by beginning it with two or three ticks. If you think something’s pretty funny, type ''laugh. If you’re really excited, type '''yay.
Needless to say, the number of these was limited by the graphical output. There.com had quite a lot of other features along these lines, including automatic head tracking toward recent speakers, and eye gaze on the avatar’s head matching your mouselook.
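My best reconstruction of that tick convention as a parser (a guess based only on the documentation quoted above, not There’s actual code):

```python
def parse_social(token):
    """Parse a There.com-style social: laugh, 'surprise', ''laugh, '''yay."""
    ticks = len(token) - len(token.lstrip("'"))
    return {
        "command": token.strip("'"),
        "intensity": max(1, ticks),     # ''laugh = 2, '''yay = 3
        "show_smiley": ticks == 0,      # any ticks keep the smiley out of the balloon
    }

print(parse_social("'''yay"))    # strongest form: intensity 3, no smiley
print(parse_social("laugh"))     # plain social: intensity 1, smiley shown
```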
The puppeteering challenge remains, of course, even as graphics have gotten better and the range of possible emotional animations has risen. In Star Wars Galaxies we had moods and a subset of them could affect body language, causing the base idle of the avatar to change. We also supported the stuff that There did (which I cheerfully stole). But the cost of all this stuff definitely adds up. Adding modifier animations on top of all the other animations that are required for actually playing the game can easily get prohibitive.
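The arithmetic behind that cost is brutal in its simplicity. A toy illustration (hypothetical clip names, not the SWG asset list):

```python
# Hypothetical clip names, not the SWG asset list. Each mood wants its own
# flavor of every base animation, so the clip count multiplies quickly;
# the fallback keeps unshipped combinations from breaking.
BASE_ANIMS = ["idle", "walk", "run", "wave", "sit"]
MOODS = ["neutral", "happy", "angry", "sad", "bored", "flirty"]
SHIPPED = {"idle_happy", "idle_angry", "idle_sad"}   # what the art budget covered

def clip_for(action, mood):
    """Prefer the mood-flavored clip; fall back to the plain one if it wasn't built."""
    name = f"{action}_{mood}"
    return name if name in SHIPPED else action

print(len(BASE_ANIMS) * len(MOODS), "clips for full coverage")   # 30, for a tiny set
print(clip_for("idle", "sad"), clip_for("run", "sad"))           # idle_sad run
```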
The other direction to go here, of course, is procedural. A glance over the animation work of Ken Perlin shows that it is quite possible to convey emotion in facial and body animation using relatively few procedural datapoints. (You’ll need Java for these.) His Responsive Face demonstrates how you can create 2D faces of remarkable expressivity by adjusting a few sliders; another demo shows how you could use this same system as an animation tool to build more complex sequences of emotion. Both of these are based in part on the research of Paul Ekman (best known these days for inspiring the TV show Lie to Me). Aspects of this are reportedly in Valve’s Half-Life 2. Finally, his Emotive Virtual Actors demo shows how a fully procedural animation system can get across highly subtle information such as “interest in a member of the opposite sex” purely via body language!
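A toy version of the slider idea, just to show how little data it takes (this is my own sketch, not Perlin’s code): an expression is a handful of normalized parameters, and an emotional transition is an interpolation between two of them.

```python
from dataclasses import dataclass

# A toy expression rig: a handful of normalized sliders, nothing more.
@dataclass
class FaceParams:
    brow_raise: float = 0.0    # -1 furrowed .. +1 surprised
    mouth_curve: float = 0.0   # -1 grimace  .. +1 smile
    eye_open: float = 0.5      #  0 closed   ..  1 wide

def blend(a: FaceParams, b: FaceParams, t: float) -> FaceParams:
    """Linearly interpolate between two expressions, t in [0, 1]."""
    lerp = lambda x, y: x + (y - x) * t
    return FaceParams(lerp(a.brow_raise, b.brow_raise),
                      lerp(a.mouth_curve, b.mouth_curve),
                      lerp(a.eye_open, b.eye_open))

NEUTRAL = FaceParams()
DELIGHT = FaceParams(brow_raise=0.6, mouth_curve=0.9, eye_open=0.8)
print(blend(NEUTRAL, DELIGHT, 0.5))   # halfway into a smile
```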
I’ve wanted to make use of this stuff for forever… and have never quite found a way. The barrier is that it requires that the entire system be driven procedurally, which is a larger step than most art departments are willing to take.
These days, of course, all the news is around cameras instead, providing direct puppeteering of avatars by motion-capturing movements on the fly. This has gotten more and more sophisticated since the days of pioneering art installations like those of Zach Simpson, or even the EyeToy’s simple games. Among the demos of Project Natal at E3 was Milo, which mirrored your actions and did facial recognition.
The step beyond this is direct brain interfaces — which are no longer science-fictional crack. We can control movement via a brain interface today, and it is not a stretch to imagine avatars displaying emotions simply because you feel the emotions!
The difference, of course, is that at that point you are no longer puppeteering, and are instead engaging in a form of emotional telepresence. For many applications, it will be as critical to hide real emotions as to display them; woe betide the brain interface that displays what we really feel as opposed to what we are trying to show!
This is, of course, why voice is so often used now as a method of increasing emotional engagement. The puppeteering problems are bypassed entirely. It could be that at some point we use stressor analysis in voice chat in order to puppeteer in the same way that we use emoticons today.
At the moment, then, we are caught in a mode where the displays are almost good enough but the controls are bad — getting us back to where we were with the text displays originally. Alas, for most developers of virtual worlds, particularly game worlds, all of this stuff takes a serious backseat — even though depth of emotional connection is a major predictor of retention. With the highly operational gameplay mode being dominant in MMORPGs, we see very little attention paid to avatar expression beyond customization — and even that is oriented far more around the avatar as trophy case than around self-expression.
Given the cost of doing any of this stuff beyond the minimum, the hope for better avatar body language, then, rests in the casual market, and the worldy worlds, because those are the only markets which privilege it. And even these struggle with the puppeteering, because all of these systems have interfaces that increase in complexity and likely user confusion as the capabilities increase. In the name of balancing resources and keeping the interface clean, developers lean towards less emotional bandwidth.
It might be the case that World of Warcraft could significantly extend the lifetime of a customer by adding better puppeteering, but weighing the benefits against the opportunity costs of more armor sets has thus far come down in favor of less emotion and more ease of use, less communication and more monster killing.
15 Responses to “Avatar body language”
Totally agree Raph, there is still a lot to be done on the avatar/embodiment front. I know some of us have been researching this issue of expressiveness etc from the more sociological side going back to text-based and early graphical worlds and I am regularly amazed how little attention is really given to this in terms of dev. One person whose research I have long admired on this, though, is Hannes Vilhjálmsson (http://www.ru.is/faculty/hannes/ru_main_papers.html). I remember being pretty blown away when I first saw a demo he had where the av made really subtle eye contact (and this was mid-90s!). Folks should definitely check out his work. As for the cameras and brain stuff, I’m still waiting to hear a compelling scenario in which that kind of tech actually works within the everyday context of multi-channel (including off/online) interactions that are quite often the norm (and indeed valued).
Thanks for the credit AND the bold. I always like seeing ‘gormless’ used on a high-traffic website!
Do you think there’s more emphasis on armour sets and trophies because… that’s how technology has developed? Or how technology has been marketed?
Say a game forced you into first-person perspective around towns (or on RP servers only) and had a proper, fully-expressive communication system… would that then become a selling point?
I wonder what really IS more compelling and immersive — proper communication, or the whole gearing-up thing. It would take a big game to find out…
I’ve mentioned this before, but the mood settings are the one thing I still, to this day, miss about Star Wars Galaxies.
Even considering cost, some of the others seem like very low-hanging fruit, like the mentioned “anims for interface actions so you can tell when someone is checking inventory”. City of Heroes needs this, as some inventory screens completely occlude the rest of the interface (which also makes it simplest to implement, in terms of knowing when to do it, but then if any game has an ‘Open All Bags’ button it’s equally easy to say ‘IF all bags are open THEN…’).
But, you know, the main reason I hardly use the canned emotes in CoH (which are pretty cool) is that the good ones are hard to get to. Interface ease is just as big in terms of getting people to make use of what’s there. If I’ve just run up and want to sigh like I’m tired from running, I have to be able to hit it right away; I can’t root around for it or try to remember the correct syntax, and setting up macros in advance assumes foresight regarding which emotes I’ll need.
Emotion is conveyed in movies by faces that often take up 50% of the screen, sometimes more.
The main problem with avatars is you are just not going to see another face at a reasonable size, it is always going to be very difficult to convey emotion when something is viewed from such a long distance and only takes up a relatively small amount of screen real estate.
In fact you are lucky if the other avatar is even looking towards the camera.
Hence the tendency towards gear and clothing, which are more recognizable from that distance, and also why dancing is popular.
I don’t know if facial expressions would work in MMOs for 2 reasons – firstly, people tend to wear helmets that obscure their face and, secondly, it’s too hard to actually see the detail on faces.
EQ2 has a /mood function and it’s pretty cool but few people use it.
Maybe an alternative would be to use mood indicators in the chat channels.
Second Life offers near-complete control of body language through overrides of the default animations and macro emotes. The trick is that you either have to have the time and money to wade through thousands of user-created animations to find the ones that suit, or the animation chops to roll your own. It’s a toss-up as to which is more time-consuming (even if you’re training up your animation skills from scratch).
Facial animations are limited to a small number of presets. I’m in the process of building lollipops for the Hair Fair event, and using the “tongue out” face for the licking animation. Unfortunately, the face is a “nyah” face, so along with the tongue sticking out, the eyes narrow and the brows draw together. I’ve started referring to the result as “anger pops”.
One of the things I would hope to see out of initiatives like Project Natal would be cheap, effortless motion capture. Using a tool like that, we could build a vocabulary of body language, and then trade/sell/swap the lexicons. Thus the cam might capture my exhausted, after-work schlump, but then translate that into a bold, heroic stride or sinuous ninjutsu stalk.
But will we ever have a peripheral ubiquitous enough in the PC world to be worth developing for?
We dug deeply into this topic in the OASIS HumanML working group. Some things stood out:
1. Intensity as a routable value into and out of the emote engine to a library of animations (per avatar so a standard for avatars such as h-anim is critical) is a means.
2. Proximity and individual histories are also key (say persistence of a dynamic data set).
3. If there isn’t a base set of emotions that can compound, the results are very primitive and subtlety is vital to realism.
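Roughly what points 1 and 3 might look like in code; this is an illustration of the idea only, not the HumanML schema or the working group’s actual design:

```python
# Illustration of the idea only, not the HumanML schema: routed intensities
# over a small base set compound into per-clip blend weights for an
# H-Anim-style avatar rig.
BASE = ("joy", "anger", "fear", "sadness", "surprise", "disgust")

def compound(**intensities):
    """Clamp routed intensities to [0, 1] and normalize them into blend weights."""
    raw = {e: max(0.0, min(1.0, intensities.get(e, 0.0))) for e in BASE}
    total = sum(raw.values()) or 1.0
    return {e: v / total for e, v in raw.items() if v > 0}

# Embarrassment treated as a compound rather than a primitive:
print(compound(joy=0.2, fear=0.4, surprise=0.3))
```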
The problem going forward was having the spooks show up too early and too heavy-handedly, and the fact that as soon as they could, a small group started filing patents. My disgust for the end of what was a promising effort could not have been more pronounced.
Advanced stagecraft is where it leads, so one might as well get on with the technical reality of building an assisted puppeteering system.
I still think the most important thing is getting the camera in close enough, or meshing the text and graphical components so that they’re not separate ‘boxes’ (voice communication is a good stop-gap solution, though).
I am sure big developers have thought a lot about facial expressions and then ruled them out as frivolous — throw in another suit of armour, as Raph says. But they’re not frivolous, they just need a few other complementary technologies to make them work.
If you make a game that’s all about levelling, that’s what players will play. Advancement is just too important to players as a sign of “winning”. Everyone wants to be more powerful, more than anything else. So if that power comes from, and only from, levelling/questing, then so much for anything else.
If you want to have players willingly participate in this kind of aspect of game play, then you first have to make the game conducive to social play.
Take away the desire to level and replace it with the desire to do other things, or at least as much so. Otherwise, you end up with the RPers as a separate entity, outcasts in a way. For this emote body language, you need to make it part of the game play through NPC interactions and reactions. NPCs need to recognize and react to players’ moods and actions, and slot players accordingly for present and future interactions. And it needs to be advantageous to the player.
We Fly Spitfires said: “I don’t know if facial expressions would work in MMOs for 2 reasons – firstly, people tend to wear helmets that obscure their face and, secondly, it’s too hard to actually see the detail on faces.”
You didn’t play SWG (Star Wars Galaxies), did you? When I first started playing it, I really found the system magical. You typed “No” and your avatar would wag his/her finger at the same time. You typed “lol” and the avatar would laugh. “Thank you” and the avatar would join an animation to the typed sentiment. It really really enhanced interaction, IMO.
And this was great as the early SWG had this most social and promising of classes: The entertainer. But even my more gear oriented doctor was more alive when dealing with “patients” simply because the avatar reacted to what was said. And all that without having to trigger a canned emote (although those were there too, of course).
I really miss avatars being naturally expressive and would like to see that aspect developed more in games. I understand that the cost / return on investment ratio might not be that enticing, though. Pity. Anything that enhances interaction and avatar feedback without needing an Animation Override HUD (see Second Life), a much less natural process, improves the immersion into the virtual world.
Wendelius
You are right. It’s an input problem. Most people would not want to blush in-world if they blush in RL, and if they don’t, then moods/emotes just turn into looks, rituals, or games. Moods as state always fails, because people forget that they set themselves to “angry”.
Anyway, I think all this applies to text MUDs as well, but I believe text players are inclined to emote more because that is the only way to stand out and give your character “pathos.” So, if you want the same effect in a graphical MUD/MMO/whatever, you have to give everybody the same dull avatar, forcing them to emote their identity… Good luck!
In every MMORPG I’ve played, *almost nobody* chats in spatial – except for Star Wars Galaxies. Personally, I believe the game’s emotes, moods and animations play a big role in getting people to chat freely outside the confines of their raid and LFG channels.
People take for granted the fact that their characters in SWG “laugh” or perform other animations based off of their chat. It comes across as so natural that it makes you want to chat. People don’t sit there and think “I’m going to chat now because I like how the animations play.”
So if you asked people “what should we spend development time on” it’s fairly likely they won’t mention chat/emotes/moods, even though those things play a very important role in their experience. I’m sure those kinds of systems are even harder to sell to the “higher-ups” at companies like Blizzard and SOE. That is too bad.
Getting people to properly enter their emotes is a matter of practice. You could not start a brand new RPG with an intricate facial and body language emote system and expect the players to automatically start using the intricate controls.
They would have to be trained. You could do this by making the emotes part of the game play with the NPCs. Not only do you have to make the correct conversation choices, but you also have to have the correct emotions displaying on your face.
If this was done well and people got used to it, and liked it, they would begin to use it in free-form chat with other players. If it enhanced the chat it would be used. If it is cumbersome and had no real effect, then it won’t be.
I also think it would be possible to use an inset screen which showed a facial close-up.
The key is developing the input system. Facial recognition is good, but has the problems listed above. (do you really want to emote “embarrassed”?) Keyboard shortcuts seem to be the easiest, but the system has to work well and be intuitive. Sliders might work, but I think they would have to be “generalized” in some way.
The key is not the input system if that system is predetermined or determined by the player. This is the core problem of emotes: they aren’t based on the same kinds of inputs that cause humans to emote. This is why a puppeteering/actor system is designated such. A more realistic emotion system requires the emotive expressions to be triggered internally based on a range of low to high intensity signs/signals from the environment and the internal states of different internal processes.
“Moods as state always fails”. Mood as static state fails. The basic emotions represented by these systems are not basic or primitive in a human. Angry and Happy are complex states, or really, the system exhibiting simple signs resulting from complex processes. Typically, these are not worth modeling for most games, can be worth modeling for virtual worlds and are a subject of intense and well-financed studies and projects in circles where diplomacy and guile play a significant role.
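As a sketch of the distinction being drawn here (my own framing, with made-up signal names): the avatar’s expression falls out of accumulated internal state rather than being directly commanded by the player.

```python
# My own framing of the distinction, with made-up signal names: the avatar's
# expression falls out of accumulated internal state rather than being
# commanded directly by the player.
class EmotionModel:
    def __init__(self):
        self.arousal = 0.0   # internal state; a fuller model would decay it over time

    def observe(self, signal, intensity):
        """Accumulate low- to high-intensity signs/signals from the environment."""
        weights = {"praise": 0.3, "threat": 0.8, "faux_pas": 0.5}
        self.arousal = min(1.0, self.arousal + weights.get(signal, 0.1) * intensity)

    def expression(self):
        """What the avatar displays is derived, never directly puppeteered."""
        if self.arousal > 0.7:
            return "blush"
        return "flustered" if self.arousal > 0.4 else "neutral"

m = EmotionModel()
m.observe("faux_pas", 0.9)
print(m.expression())   # flustered
```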