GDC2008: Master Metrics

Feb 20, 2008
 

Master Metrics:
The Science Behind the Art of Game Design

Dan Arey and Chris Swain, USC School of Cinematic Arts

Our purpose in putting this together: we are game designers, and this is directed at you as game designers. We surveyed different folks in the industry about measurement-based design techniques and how they are used to create better play experiences. They mostly involve analytics and measuring player behavior.

Method: talked to a lot of people we know, wanted to cover a lot of design aspects. Not intended to be a Best Of list, and if you have metrics-based techniques to share, let us know!

Important: these are tools to assist the creative process. Period. We do not think you can or should put a formula on game design.

Parallels to early aviation…

If we could find a parallel, this is an example (shows movie). A lot of the early aviation stuff was trial and error (lots of crashes, etc). The reason the Wright Bros managed to move fast is that they approached things from a very scientific point of view. They made bicycles, but they applied a significant amount of methodology to it. Game design is ripe for that as well. There has been a lot of reinventing the wheel, and people are looking for ways to have repeatable success, so we can make smarter and stronger decisions.

The early aviators could build things that were land-based or on water. When they took their trial and error culture to try to make an airplane, it was a more difficult problem. The Wright Bros still had problems, but they applied technology and science to it. They created the first wind tunnel, for example.

1. Culture of trial and error
2. Natural compulsion to fly, just as we want to make emotional connections and tell stories
3. Creative breakthroughs driven by method.

First person who responded was Michael John from Telltale Games. Comes from an academic background. His point was, use the scientific method in game design. (Shows summary of scientific method). This is something we do not think of enough in development. Not to make formula, but form. Wright brothers were calculating things like lift and drag, etc. Once the Wright Brothers brought their plane to France, the French then adopted their techniques and methods…

So Michael John’s point is that we already quantify things, etc. We make value judgements, but what he thinks is that we should use science and algebra as a metaphor. As in algebra, move everything to one side of the equation, isolate elements so you can make smart choices.

Put this to use in a test play level. You could try to have a test level and pin down jump distance, but you have to have already picked a gravity, or else you are changing two variables at once.
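
A minimal sketch of why gravity has to be pinned down first, assuming a simple ballistic jump on flat ground with no air control. The function name and all the numbers are illustrative assumptions, not anything shown in the talk.

```python
def jump_distance(run_speed, jump_velocity, gravity):
    """Horizontal distance covered by a ballistic jump on flat ground.

    Assumes constant gravity and no air control (a simplification).
    """
    air_time = 2.0 * jump_velocity / gravity  # time going up plus coming down
    return run_speed * air_time


# Hold gravity fixed and vary only the jump velocity: one variable at a time.
GRAVITY = 20.0  # hypothetical units per second squared
for jump_velocity in (8.0, 10.0, 12.0):
    print(jump_velocity, jump_distance(5.0, jump_velocity, GRAVITY))
```

Because gravity sits in the denominator, changing it at the same time as the jump velocity makes it impossible to tell which change moved the distance.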

Scaffold your mechanics. In order to build complex structures you need both a horizontal and a vertical approach, applied methodically. Otherwise you end up with a barn raising, which is a mess. It’s more about discipline than genius.

On to Metacritic. Metacritic is a website that looks at reviews across all these different media and builds a weighted average of the scores. More influential critics are weighted higher. Scores range from 0 to 100. A 5 point Metascore increase results in a ~50% increase in revenue. They exclude the movie games, because those tend to have a lower Metascore per sale, thanks to the power of the license.
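
The weighted average itself is simple arithmetic. A minimal sketch, with made-up scores and weights: the talk does not give the weights Metacritic actually uses, so every number here is an assumption.

```python
def weighted_score(reviews):
    """Weighted mean of (score, weight) pairs, on the same 0-100 scale."""
    total_weight = sum(weight for _, weight in reviews)
    if total_weight == 0:
        raise ValueError("at least one review needs a non-zero weight")
    return sum(score * weight for score, weight in reviews) / total_weight


# Hypothetical reviews: the first critic counts double.
reviews = [(90, 2.0), (80, 1.0), (70, 1.0)]
print(weighted_score(reviews))  # 82.5
```

With equal weights the same three reviews would average 80, so the heavier critic pulls the score toward their own.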

So can you analyze Metascores to see what separates success from failure? And as a creative person, do you want to?

Comparing the top game of the year per platform, versus the dud games:

1. large in scope
2. variety of player choice and activity
3. highly replayable
4. top quality visuals and sound
5. responsive and easy controls
6. engaging story and characters
7. quality interactive world and AI
8. responsive camera

To make it to the top, you have to be on par with the other top games in terms of visual quality.

Things that suppress metascore

1. undifferentiated gameplay
2. shoddy production values and controls
3. player unsure what to do or what just happened
4. mechanics disconnected from premise
5. noninteractive environment, too linear
6. does not flow, too hard too soon
7. save points too spread out
8. long repetitive load screens

Should you use this info?

inward focus  >  artist        scientist
outward focus >  designer      engineer
                 ^             ^
                 move minds    move molecules

It would be unheard of for Picasso to have a focus group. But we are more in the bottom row (outward focus), especially if we build for commercial purposes. (This chart is from PARC.)

Good to know about market trends but important not to follow literally, or you end up breaking #1, about differentiation. Know the rules to break them.

Metacritic has become a huge force in the industry; publishers do team bonuses based on Metacritic scores now. We are hitting what film has hit.

Fritz Zwicky, Naughty Dog. The Idea Box. From the book Thinkertoys, by Michael Michalko. Morphological analysis. Great book. We’re scared about creativity sometimes, but this is about repeatable stuff, not waiting for inspiration. A tool to generate ideas via an analytical approach.

1. Define aspects or params of thing to be brainstormed
– flower: petals, stem, color, smell, etc.

2. List as many variations as you can for each param

3. Then select and combine

DaVinci did this a lot, recombining for faces. Exploring combinatorial choices. There is a theory that the Mona Lisa is a composite, for example. In Last Supper, same thing.

Jak character design had columns. FPS level could have lists for location, condition, goals, obstacles, equipment, enemies, and just pick one item from each list, and do a Chinese menu to get ideas.
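
The idea box is easy to sketch in code: one list per parameter, then pick one entry from each. The parameter names below follow the FPS example; every entry in the lists is a made-up placeholder.

```python
import random

# Hypothetical entries; only the parameter names come from the FPS example.
IDEA_BOX = {
    "location": ["space station", "flooded subway", "desert fort"],
    "condition": ["at night", "during a storm", "low gravity"],
    "goal": ["escort", "sabotage", "escape"],
    "obstacle": ["locked doors", "time limit", "collapsing floor"],
    "equipment": ["grappling hook", "flashlight", "jetpack"],
    "enemy": ["drones", "mercenaries", "swarm"],
}


def pick_idea(box, rng=random):
    """Select one entry from each parameter list and combine them."""
    return {param: rng.choice(options) for param, options in box.items()}


def count_combinations(box):
    """Number of distinct ideas the box can produce."""
    total = 1
    for options in box.values():
        total *= len(options)
    return total


print(pick_idea(IDEA_BOX))
print(count_combinations(IDEA_BOX))  # 729
```

Even three entries per column give 729 combinations, which is why scanning the lists produces ideas faster than waiting for inspiration.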

Dave Perry calls it “reverse deconstruction” brainstorming

1. choose area to innovate
2. deconstruct
3. combine

1. list every way to die in every game (huge list scrolls by)
2. then pick one, like say “bursting”, or crushing death — then in each one list everything as well

Then by scanning you pick stuff… a jousting lance which shoots a biological virus which causes people to burst like a human balloon. A 3 minute method to invent.

This is the method used by fast food companies, car companies, etc, btw.

MS User research group… using heatmaps. When a project goes thru MS, 3 people from the user research group assess the gameplay experience. They are a real thought leader in this area.

1. usability testing – can user operate software
2. playability, does user have a good play experience
3. instrumentation, how exactly is the user playing, using tracking software

This is the first year that they are talking about this stuff publicly, the Wired article, etc. Here’s a picture showing black dots on the Halo map. So dense on deaths that there is no info. So let’s tie it to color intensity. Then patterns emerge, you can see a pattern of where people tend to die.
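
A rough sketch of the step from black dots to color intensity: bucket the death positions into grid cells, then scale each cell's count against the busiest cell. This is not the Microsoft pipeline (they used Tableau); the function names, coordinates, and cell size are all assumptions.

```python
from collections import Counter


def death_heatmap(positions, cell_size):
    """Count deaths per grid cell; positions are (x, y) map coordinates."""
    counts = Counter()
    for x, y in positions:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


def intensities(counts):
    """Scale each cell's count to 0..1 so it can drive color intensity."""
    peak = max(counts.values(), default=0)
    if peak == 0:
        return {}
    return {cell: n / peak for cell, n in counts.items()}


# Hypothetical death positions on a map measured in arbitrary units.
deaths = [(1, 1), (2, 3), (12, 1), (3, 2)]
counts = death_heatmap(deaths, cell_size=10)
print(intensities(counts))
```

Overlapping dots all look the same once they saturate; a per-cell count keeps the difference between 3 deaths and 300.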

In single player:

– tracking time on task, red zone indicates usability problem
– comparing if designer intent matches what players do… designer maybe wants intense “speed through gauntlet” feel, but heatmap shows players moving slowly…

In multiplayer:

– tracking deaths by weapons lets designers read exactly how players use items, more useful than written reports or lists of data. Designers collectively tend to be visual thinkers.
– Designers tuned placement of items and terrain to achieve the most satisfying play experience.

User researchers independent from developers. Researchers help quantify into something measurable. Designers say “We want feeling of chaos” — researchers help pin that down.

Researchers are passionate about a good game experience, but dispassionate about design specifics. Developers tend to fall in love with their designs.

20 people in their group, about 3 researchers for a big title. Heatmaps built with Tableau, generic easily available tool.

By changing and adjusting spawn points, etc, can have significant effect. This opens up new doors compared to what we were doing 20 years ago in level design.

Bioware, also level design, similar. In NWN 2… crafting balanced mix of activities using time spent reports. Leader in rich story games, of course.

Time spent report: logs what players do, with time per activity (cinematics, combat, convos, etc).

Shows time on average for players. Helps devs judge if the balance is right for that level.

Planet Vermeer:
5 mins in movies
21 mins in combat
13 in convos
8 in maps
57 in walking
10 in vehicles
114 in total…
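
To judge the balance it helps to turn the minutes into shares of total play time. A small sketch using the Planet Vermeer figures above; the dictionary layout and function name are my own, not BioWare's report format.

```python
# Minutes per activity, copied from the Planet Vermeer figures above.
time_spent = {
    "movies": 5,
    "combat": 21,
    "convos": 13,
    "maps": 8,
    "walking": 57,
    "vehicles": 10,
}


def activity_shares(minutes):
    """Return each activity's share of total play time as a percentage."""
    total = sum(minutes.values())
    return {name: 100.0 * value / total for name, value in minutes.items()}


total = sum(time_spent.values())  # 114, matching the total in the notes
shares = activity_shares(time_spent)
for name, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:10s}{pct:5.1f}%")
```

Seen this way, walking is exactly half of the time on this level, which is the kind of thing a playtester's impression can easily miss.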

Often the amount of time in combat is close to what there is in convos in Bioware games.

If you look at different Mass Effect levels, they have different mixes. Players like variety. The first version of Noveria showed a lot of convos and not a lot of combat, so players said it was slow. So they changed the mix, adding more combat and removing some convos. Citadel is much, much more convo-heavy (99 mins of convos and 22 of combat) because Citadel was important to the story.

So if you are just making judgements, why do you need a report? Because the playtesters’ reports and the measurements do not match. Tester perception is skewed away from the actual numbers by factors like cool battles or bugs.

And there’s a gap between designer perception and the actual numbers, skewed by infatuation with a given dialogue or character or whatever.

Numbers are helpful and precise in a collaborative environment. Ian Stevens-Guille: “One measurement is worth 100 expert opinions.”

Control design: Activision Central Design, Carl Schnurr, and the book 21st Century Game Design.

Controllers are where the rubber hits the road for games. Long history of different sorts of controllers. (Shows controller family tree). Look how much more complex it has gotten; the Wii reverses the trend somewhat. But it gives you a sense of how much more complex things have gotten… look at this bra cup controller, designed to move a Pong paddle back and forth… foot controllers like DDR… Guitar Hero, Wii… we are getting people off the couch. Breaking the trend.

But we are also talking about what the movements are and how we measure them. The idea is “control dimensionality metrics.”

Andy Gavin: “Control is 1/2 the battle in game development, and after that it is the other half as well.” – Naughty Dog.

Have as few controls as possible. Can you measure complexity of controls?

Quantify “input demands” or options, and measure the number of degrees of freedom to a control spec. Compare it across a range of games.

1 dimension: left-right
2 dimensions: up-down, left-right
3 dimensions: adds in-out
strafing adds 2 buttons
throttle adds 1 or 2
jumping adds half of one
attack adds a half button…

Tetris: dimensional rating of 2.0, one degree of movement. Half-Life is 7.0 with everything.
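
A hedged sketch of how such a rating could be tallied. The weights are my reading of the list above, not the talk's actual table, so this is not expected to reproduce the Tetris or Half-Life figures; the example game is hypothetical.

```python
# Assumed weights, read off the list above. "Throttle adds 1 or 2" is taken
# as 1.0 here, and strafing as 2.0; both are guesses at the talk's table.
AXIS_WEIGHT = 1.0
EXTRA_WEIGHTS = {"strafe": 2.0, "throttle": 1.0, "jump": 0.5, "attack": 0.5}


def control_dimensionality(movement_axes, extras):
    """Sum movement axes (1 each) plus the weight of each extra input."""
    return movement_axes * AXIS_WEIGHT + sum(EXTRA_WEIGHTS[e] for e in extras)


# A hypothetical 2D platformer: two movement axes, plus jump and attack.
platformer = control_dimensionality(2, ["jump", "attack"])
print(platformer)  # 3.0
```

Once every game has a single number like this, sorting a catalog by control complexity and lining it up against scores or sales becomes a spreadsheet exercise.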

Shows graph of history of controllers and the dimensions. (Very cool graph!).

Sort ATVI games based on control complexity, and then get standard deviations. Then match these up to Metacritic and sales.

EmSense: measures physiological reactions: heart, motion, breath, eye, temperature… and a brain sensor. You contract with them and they measure the gamers as they play. Then they form a model of human response. They can measure adrenaline, positive emotions, thought, etc, and track it through a level. Are these measurements actually measuring what they say they do? The answer seems to be yes.

Why not just ask? Because players cannot articulate their thinking.
Good or bad events skew the player’s perception of the whole experience.
Biosensor data maps precisely to points in a game level.

Many clients, including THQ. They also run games from the market on their own to build a database. Tested top 50 games from recent years… in Zelda: Can see thought falling at start of battle, then rising as players determine a strategy, then spiking again right at victory. Can see adrenaline rising quickly in the combat. And positive emotion spikes when you determine the strategy and when you get rewarded.

Case study from MOH: European Assault. You can watch engagement: there is a big drop at the spot of the second restart. The first one wasn’t huge, but the second one cost about 15 minutes in terms of engagement; it took that long to get back to where they were feeling good. They found this was a shelf moment for people.

Lesson: long periods of low engagement are bad, meaning more than five minutes. Long periods of high engagement are also bad; they cause burnout. Short cycles of high/low engagement are good.
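
The five-minute rule can be checked mechanically once you have an engagement trace. A minimal sketch, assuming engagement arrives as timestamped samples and that a single threshold separates "high" from "low". The sample format, threshold, and function name are assumptions, not EmSense's actual output.

```python
def long_runs(samples, threshold, max_seconds=300):
    """Find stretches where engagement stays high or low for too long.

    samples: list of (time_in_seconds, engagement) pairs, sorted by time.
    Returns (state, start_time, end_time) tuples, state being "low" or "high".
    A run's length is measured from its first sample to its last sample.
    """
    runs = []
    run_state, run_start, last_time = None, None, None
    for t, value in samples:
        state = "high" if value >= threshold else "low"
        if state != run_state:
            # The previous run just ended; keep it if it lasted too long.
            if run_state is not None and last_time - run_start > max_seconds:
                runs.append((run_state, run_start, last_time))
            run_state, run_start = state, t
        last_time = t
    # Check the run that was still open when the samples ended.
    if run_state is not None and last_time - run_start > max_seconds:
        runs.append((run_state, run_start, last_time))
    return runs


# Hypothetical trace sampled once a minute: a six-minute slump, then recovery.
values = [0.8, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.9, 0.3, 0.9]
trace = [(i * 60, v) for i, v in enumerate(values)]
print(long_runs(trace, threshold=0.5))  # [('low', 60, 420)]
```

A trace that alternates every minute or two returns an empty list, which is the short-cycle pattern the talk recommends.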

Example from Gears of War: Knock Knock level. There is a dip in engagement when you have to defend, but the time you spend there is only 1 minute. And the dip leads to a big spike, which is very satisfying. 80% of the most satisfying moments involve defense of a captured position, which really means a short period of low engagement followed by a spike, i.e. forced relaxation.

In FPS games, and probably other games, players like these high/low shifts. And this tech lets you track it.

XEODesign, Nicole Lazzaro. Studies player experience. Identified 30 emotions, mapped them onto a chart, and equated them with types of fun or mechanics. Fiero, mastery, hard fun.

(Chart: one axis runs from game to life, the other from goal to open.)

hard fun      easy fun
people fun    relaxation

Games that offer 3+ types of emotion do better in the marketplace, because they offer more options for the player to feel. So how do you evoke these? Certain game mechanics map to them. They form a language of choice. Hard fun (mastery) creates fiero: graph player skill vs difficulty, and you see an oscillation between fiero and relief, which sort of mirrors the EmSense stuff. Boredom if you fall too low.

Easy fun: unexpected vs expected; too high can be disbelief, too low is disinterest, etc.

Friends create amusement, etc.

Case studies (Swain’s opinion)

Sims 2: low fiero, high curiosity, high relax/excite, high amusement. Grand Theft Auto, Nintendogs, Guitar Hero 2, etc, same deal, diff patterns but 3 of 4 if not 4 of 4.

The first 300 seconds: Dan Arey. Intros are the window to the game experience. Is it measurable? What happens on screen. You need 6 things:

– 3 oh shits
– 2 omg
– 1 no fucking way

You need to grab: eye candy, action, engage. Can you define this?

You can actually graph these in a movie, see where they are giving you these individual pieces. The Matrix has an incredible density of these in the first 5 minutes. God of War has you in combat in 1 minute, and the first OMG is in 3 mins at the Kraken. Measure it upfront, it makes or breaks a lot of stuff.

Fullerton/Swain: concept phase, pre-production, production, QA. Make the user central, test at every step with users. Prototype, playtest, evaluate, in an ongoing spiral. As the loops get tightest, you launch. Measure player response, to spiral.

  5 Responses to “GDC2008: Master Metrics”

  1. EmSense has a very interesting value proposition not only for gaming, but advertising and marketing in general. I cover a very important issue regarding the uncharted territory EmSense is going into in the 24 February 2008 posting in “The Marketing Consigliere Blog.”
