Wednesday 19 March 2008

203CR: Usability Evaluation of a Digital Game


Usability evaluations have been an integral part of software design for the last 20 years, so there are established frameworks for analysing and evaluating software. These frameworks have been heavily critiqued in recent times, particularly for their hit-and-miss techniques and standards when applied to video games. However, many of the standards set by these frameworks do still apply directly to usability evaluations of digital games. This meant that, when deciding upon a usability evaluation method, I had to consider the practicality of the chosen framework when applied to the specific game in question, Team Fortress 2, a class-based first-person shooter (FPS). Each genre tends to have a slightly different set of principles for what a game must deliver to have "playability": a usability evaluation of a massively multiplayer online role-playing game, for instance, should examine different aspects from an evaluation focused on a flight simulator. It must be noted that a usability-based method is not the best way to evaluate a video game on its own: all games require well-designed shell menus and information output interfaces, but designing for playability must be paramount. Playability can be described as a combination of usability and fun, and can be measured by the varying reasons a user enjoys playing a game. Nielsen's heuristics are one example of a usability testing framework that does not really measure playability. Arguably this is because they are too software-specific: they do not measure the involvement or excitement of a game, and not all of the heuristics are relevant to most games. One core concept of usability is ease of use. Many argue that this concept is irrelevant to games, because a game that is genuinely easy to use will allegedly become boring very quickly.
I disagree with this, because I believe that all games need to maintain easy-to-use interfaces. Without a usable, well-designed interface, a game with good content would feel like trying to eat the finest soup with a fork. This, I think, is a very sound reason for always including some form of "strict" usability evaluation, whether in collaboration with a Heuristic Evaluation of Playability or not.
After further research on the use of usability evaluations to test games, I asked myself whether to produce a new evaluation method constructed solely for games, or to incorporate and modify a software-centred usability evaluation. In order to choose the method that would best suit my game, I had to consider the constraints placed on possible methods by the restrictions I would face without sufficient funding and advertising power. For instance, I was not sure whether a university computer lab could serve as my main laboratory for conducting the usability testing. Without a lab at hand, a usability test can be hard to conduct because of the lack of consistent hardware for each tester to use. To balance the hardware and "lab" conditions, I decided to use the computer room at my apartment as the environment for the usability testing. Hardware capability was another important factor in this change from the plan of a lab-based study. Team Fortress 2 (TF2) takes full advantage of High Dynamic Range lighting and DirectX 10 technology. Even for a complex first-person shooter, TF2 demands a high-performance graphics card (DirectX 9.0, 16 pipelines minimum), 1 GB of RAM or more and, even more importantly, a capable processor (at least a 3 GHz Pentium 4). I did not feel I should conduct usability testing on systems falling below the game's minimum requirements, because a horde of technical issues would probably arise, clogging my evaluation with problems that should not generally happen. Another issue I had to consider was installing software: each computer in the evaluation would need Steam (a digital games distribution network) and the Team Fortress 2 system and configuration files installed locally.
Given the constraints placed on the evaluation, I decided to begin with an Expert Evaluation as a way of initially outlining testing problems and areas important to focus on during the full usability testing, and to give myself a firmer idea of which usability issues I would be recording. My view is that if I first evaluate the game myself as an expert, I will be much better prepared when designing the user tests needed to probe those focus areas and problematic issues.
I evaluated Team Fortress 2, a game built on Valve's Source engine, using an Expert Evaluation initially, followed a week later by a series of separate Usability Evaluation sessions involving four recruited "games testers". These testers should come from the game's target demographic, and because the BBFC rated the game a 15 certificate, all testers must be over 15. I used two male testers and two female testers, because TF2 has notably been popular with many women as well as men. A comparative study could then be conducted between the results drawn from the female and male testers, to show any differences in their perception of the experience. The testers came to the laboratory and played one at a time for 1–2 hours, whilst being observed by myself, the "Test Instructor". Unfortunately a one-way glass observation room was not available, so I had to record the findings myself as well as acting as the test instructor. My role as test instructor meant asking the games testers to think aloud during the evaluation and giving them tasks to fulfil in the game. Whilst doing this I recorded observations made by the user, and my own observations of the user's actions. At some points I asked questions or followed up the users' own comments on usability issues during the test, in an effort to evaluate the user's experience in more depth. The focus of the testing was broad, including the main menu screens and the many sub-menus divided by class and activity. The in-game chat was evaluated, along with all typical gameplay, and the in-game dialogue was examined. I used Norman's Design Principles as a framework on which to base the user testing:
Visibility, Mappings, Affordances, Constraints (Forcing Functions), Feedback.
The visibility of interactive features of the game is important. A game where a user has to inspect every corner of the room (known as pixel hunting) before being able to guess which item is interactive would be very frustrating.
Mappings are the relationships between controls and their actions in the game world. Successful mappings are easy to memorise, and poorly designed mappings can be very confusing. Common mapping mistakes occur when controls are "relative" rather than consistent. For example, a game where the WASD keys move the user in different directions depending on the character's current orientation can be hard to get used to, as opposed to constant and consistent mappings, which are intuitive and become second nature.
Affordance overlaps with visibility in that the use of items in the game must be immediately apparent to a user; for example, a button is for pressing and a key is for opening a door.
Constraints cover the parts of the game where the user requires direction, and the restrictions in place that prevent the user from doing anything detrimental to their experience. For example, if certain non-player characters are vital to the continuation of the game, the player should not be allowed to kill them. As an example constraint, a player might be allowed to attempt to attack a member of their own "squad", but a dialogue would ensue reminding the player of their goals and of the consequences that would follow the execution. This approach is more innocuous: it tries to redesign the game's devices and their affordances rather than erect an invisible wall around the game's arc. Another, more specific type of constraint is a forcing function. A forcing function prevents the user from doing one thing until another is completed. One example could be an adventure game where a user is prevented from leaving a level without first obtaining an object required in the next level. This can also apply to skill: some games where skill is measured and accumulated in-game will allow a player to enter areas far too challenging for a character of their level, leaving them hopelessly defenceless.
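The forcing function described above can be illustrated with a short sketch. This is my own hypothetical example (the class and function names are invented, not real game code), showing the gate between "object obtained" and "level exit unlocked":

```python
# Illustrative sketch of a forcing function: the player may not leave a
# level until a required object has been collected. All names here are
# hypothetical and are not taken from any actual game engine.

class Player:
    def __init__(self):
        self.inventory = set()

    def pick_up(self, item):
        self.inventory.add(item)

def can_exit_level(player, required_item):
    """The exit stays locked (the forcing function) until the
    required item is in the player's inventory."""
    return required_item in player.inventory

player = Player()
print(can_exit_level(player, "rusty key"))   # False: exit is blocked
player.pick_up("rusty key")
print(can_exit_level(player, "rusty key"))   # True: exit now opens
```

The point of the design is that the check is made by the game, not remembered by the player: the user physically cannot proceed in the wrong order.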
Feedback must be rapid enough that users can directly relate it to their actions. Feedback can come in varying forms: audible, visible or tactile. One example of feedback is when a user is attacked in Team Fortress 2: as well as hearing attack noises and seeing rockets, flames or bullets, red areas flash on the user's display in the relative direction the attack came from.
These principles are all subject to change, however, because of the applied nature of this framework on a game. Whilst Norman's principles will not evaluate how fun or interesting Team Fortress 2 is, they will at least examine whether TF2 is consistently designed, logical, and easy to get used to and play. The gaps in Norman's principles will be filled by questions tailored for this purpose, based on the Heuristic Evaluation of Playability. These 43 heuristics cover areas including game play, game story, game mechanics and game usability.
In the expert evaluation process I first examined a tutorial guide for TF2, to ensure that I was up to date on all the controls and the basic instructions on how to play the game. I then played for two and a half hours and noted down the usability issues I found in that time. The findings were based on Nielsen's heuristics and my personal experience of human–computer interaction. Whilst evaluating the game I kept in mind that these heuristics were not game-specific, and I tried to evaluate the game as I would any other piece of software.
I used both quantitative and qualitative methods to analyse the evaluation, based on my findings from the preliminary expert investigation. Severity classifications were created, and possible solutions outlined wherever possible. I decided on a three-tier severity classification, ranging from severe through mild to minor. This meant I could classify problems easily instead of wasting time deciding which of many fine-grained categories a specific problem fitted into. So although I used Norman's design principles to identify problems, I did not categorise them along the same lines. The usability problems were defined and ranked by severity in order to provide a template for structuring the tasks the games testers had to complete during the test. Along with a detailed description of each problem, a possible solution accompanied it in order to record initial ideas. Before the expert evaluation, in an attempt to gauge problems by severity, I had decided to rank the issues by the amount of time they took to resolve; however, it was immediately apparent that the worst usability problems in a game are the most annoying ones. They are the most annoying because (a) they anger the user the most, or (b) they are the most repetitive, or both.
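The three-tier log described above can be sketched as a simple data structure. This is purely my own illustration of the record-keeping (the issue texts are abbreviated examples from the findings below; no tool of this kind was actually used during the sessions):

```python
# A minimal sketch of the three-tier severity log used during the
# evaluation. The structure is my own illustration, not any real tool.

SEVERITIES = ("severe", "mild", "minor")

issues = [
    ("Intelligence falling through the wooden ramp in 2fort", "severe"),
    ("Point scoring system is unclear to new users", "mild"),
    ("Sniper taunt uses the same phrases as the Medic", "minor"),
]

def group_by_severity(issues):
    """Bucket recorded issues by tier so the worst appear first."""
    groups = {tier: [] for tier in SEVERITIES}
    for description, tier in issues:
        groups[tier].append(description)
    return groups

grouped = group_by_severity(issues)
for tier in SEVERITIES:
    print(tier, "->", grouped[tier])
```

Keeping only three buckets is the design choice argued for above: classification stays fast, at the cost of losing finer distinctions within each tier.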
“The depth and scope of an expert evaluation are also easy to change. For example, if there is a desire for constant input from the usability experts to the development process, then conducting several smaller usability evaluations with less experts and faster reporting may be a good idea.” - www.gamasutra.com
In general, the earlier the expert evaluation takes place, the more efficient the evaluation will be.
Expert Evaluation:
Key: Severe = Red, Mild = Orange, Minor = Green

1. Sappers reappearing on enemy engineer's buildings even if the spy who placed the sapper is dead.
2. Problems with spy backstab. (Hits registering issues such as backstabbing from front.)
3. The Red team are able to get behind the fence in the setup of Dustbowl's second phase (a bug allowing users behind the enemy easily).
4. Sticky bombs becoming bouncy like the grenades of the grenade launcher (these bombs are supposed to be launched and stick immediately to whatever surface they land on).
5. You are able to play as a spectator, and while spectating you belong to neither team yet can kill members of both. (Edit: fixed in the time between the expert evaluation and the usability testing.) Additionally, on cp_well, soldiers can rocket jump on top of the trains and get into the other team's area during setup.
6. Pyro flamethrower shoots slower than Pyro footspeed, resulting in a shorter range than intended whilst running.
7. After each round is won, the winning team is still able to fire weapons, with all shots registering as Critical Hits, meaning that the scores can be drastically altered after the final capture point/objective has been fulfilled. For example, Scouts rush into the opposing respawn points and can kill every enemy player still in spawn with a single hit (critical damage).
8. Intelligence falling through the wooden ramp in 2fort base if dropped (intelligence can become unreachable until it is returned to the enemy base after 1 minute).

9. The point scoring system is unclear to new users: kills are registered as 1 point, 1 point is given for picking up the case from the enemy's intel room, another for capturing the intelligence to your intel room, 2 points for a headshot, and 2 points for a spy backstab. However, there are also defence points for stopping the enemy from escaping with your intelligence or capturing a stronghold, and other class-specific points awarded during gameplay.
10. Medic needles do not shoot "straight", due to gravity: the needles drop over distance because they are heavier projectiles than small bullets. This characteristic of the needle gun is unexpected, and at first many new users will steer away from it because of its limited range and aiming specifics.
11. Scoreboard problems. (I.E. Score going to zero, stats not counting.)
12. Critical damage shots (blue/red glow, green indication of a critical hit landing) are unclear: users do not understand what is occurring or why, since the criticals appear random.
13. Blue engineers can build inside spawn.
14. If a screenshot is taken from the killcam, the game saves the file as "[playername] is looking good!.tga", and if the player's name contains forbidden characters the screenshot will not be created.
15. Capturing (scoring) the Intelligence makes you stop burning.
16. Engineers can build on top of moving trains.
17. Black screen when changing resolution.
18. Graphical problems with some materials. (I.E. Glass, water, doors...)
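The screenshot-naming fault (issue 14 above) could be mitigated by sanitising the player name before building the filename. A minimal sketch of the idea, using a conservative allow-list; this is my own illustration, not Valve's actual code:

```python
# Issue 14: killcam screenshots named after the player fail when the
# name contains characters the filesystem forbids. Replacing anything
# outside an allow-list with an underscore avoids the failure.
# (Hypothetical fix for illustration only.)

import re

def safe_screenshot_name(player_name):
    """Replace any character outside a conservative allow-list with
    an underscore before building the .tga filename."""
    cleaned = re.sub(r"[^A-Za-z0-9 _-]", "_", player_name)
    return f"{cleaned} is looking good!.tga"

print(safe_screenshot_name("Spy<3"))   # Spy_3 is looking good!.tga
```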

19. Some commentary subtitles differ heavily from the speech.
20. Dead bodies falling through ground. (And "swimming" on the ground.)
21. A spy's disguise is lost by a "stab", and the stab is still registered; whereas when cloaked, stabbing does not de-cloak the spy, so users have to uncloak and then stab for a stab to register. This means that when cloaked a spy cannot immediately stab anyone; first there is a small delay in which the spy has to de-cloak. I presume this is intentional, to balance the game.
22. Sometimes engineer's buildings show as if they were damaged when they're not. Occurs after they have been repaired.
23. Ammunition or health can't be picked up if an engineer's building is close enough.
24. When shooting teammates with rockets, sometimes the rocket goes through and sometimes not. (I verified that, when the rocket did not go through, the teammate was not a spy.)
25. Dispensers give you ammunition through thin walls, but don't heal you.
26. Sniper taunt uses the same phrases as the German medic.
27. Sometimes when you fire the minigun as heavy and then die, the effects of the minigun will stay when spectating until you respawn.
28. The Spy has 3 hands if you reload the revolver while cloaked.
29. Name not changeable during game even through developer console.
The usability test results, questionnaire and raw data, including consent forms and photos, can be found in Attachment 1.

Usability Test Evaluation:
In the usability evaluation 35 problems were experienced: 2 severe, 18 mild, and 15 minor. These problems were encountered while the testers fulfilled tasks and objectives assigned by myself as test instructor. The tasks were given to the four games testers at different times, and if a problem was experienced whilst fulfilling a task, the problem and its severity were recorded. All users ranked the game play over 4 out of 5. Half the users ranked the game story a 3, and half ranked it a 4. Three out of four users ranked the game mechanics a 4, and one user ranked the mechanics a 5 out of 5. Overall, the game design was ranked by two people as a 5, one as a 3 and one as a 4. On average the game design was ranked a 4.25.
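As a quick check of the arithmetic, the four design ratings quoted above (two 5s, one 3 and one 4, each out of 5) do give the reported average:

```python
# The four testers' overall game-design ratings, out of 5.
design_ratings = [5, 5, 3, 4]

# Mean = (5 + 5 + 3 + 4) / 4 = 17 / 4
average = sum(design_ratings) / len(design_ratings)
print(average)  # 4.25
```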
I will outline some of the most common and severe problems experienced:
When one test subject was transporting the enemy intelligence back to his own base, he was killed on the stairs in the lobby of the opposing team's base. The intelligence fell through the stairs and was trapped between some world objects. This meant that the intelligence could not be retrieved by the Red team, and after 60 seconds it was returned to the enemy base.
This problem was also outlined in the expert evaluation, issue number 8.
Solution: This bug could be mitigated by increasing the size of the intelligence's pickup area, meaning that players do not have to get as close to its centre to pick it up.
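The proposed fix amounts to enlarging the radius within which the briefcase can be grabbed, so that one wedged behind geometry is still reachable. A hypothetical sketch of the check (invented function names and values, not Source engine code):

```python
# Illustration of the proposed fix: a larger pickup radius lets a
# player collect intelligence that is trapped inside world geometry.
# All names and distances here are hypothetical.

import math

def can_pick_up(player_pos, intel_pos, pickup_radius):
    """The intelligence is collected when the player is within
    pickup_radius units of its centre."""
    dx = player_pos[0] - intel_pos[0]
    dy = player_pos[1] - intel_pos[1]
    dz = player_pos[2] - intel_pos[2]
    return math.sqrt(dx * dx + dy * dy + dz * dz) <= pickup_radius

# A player standing 30 units from intel trapped under a ramp:
print(can_pick_up((0, 0, 30), (0, 0, 0), 24))  # False: small radius
print(can_pick_up((0, 0, 30), (0, 0, 0), 48))  # True: enlarged radius
```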

Two of the games testers experienced player models displaying as the opposing team's. This seems to be a skin problem, and not related to the spy's disguise ability as I initially assumed during the expert evaluation; I therefore did not record this problem in my earlier evaluation.
Solution: "Center-ID", a feature used in Team Fortress Classic and in Team Fortress, the original Quake modification, should be enforced. This would mean that when a player points his or her crosshair at another player, coloured text (depending on the team) displaying that player's name appears on the HUD.

All of the testers at some point experienced a problem with the game loading the correct textures. These issues were outlined in the expert evaluation, issue numbers 17, 18, 22, 27 and 28. This indicates that some form of graphics hardware conflict occurred throughout all of the usability tests. I tried reverting the graphics settings to dxlevel 81 to try to iron out these faults, but they were more prevalent than ever. This is not a severe gameplay fault, but admittedly it does have a negative effect on the overall look of the game.
Solution: To fully provide a solution here I would need a DirectX 10 capable graphics card, able to display the graphics as they theoretically should be. This would hopefully iron out any graphics compatibility issues.

One problem which became apparent in the expert evaluation was that it is impossible to change your name whilst in-game. I prompted all users who took part in the usability testing to try to change their name as the last task in the evaluation. This proved problematic for all users. The "name" command in the console is still recognised, but no change is made on the scoreboard. The multiplayer options screen also offers the facility to change your name, which likewise has no effect when a player is halfway through a game. This seems odd, because there is no obvious reason why a player cannot change names mid-game, as is possible in all other Half-Life modifications.
Solution: If there is a valid and easy way to change your name in-game, then a tip suggesting the method should be shown at startup, because many players query this when they first start playing; "How do I change my name?" was a very common question in the usability evaluation.
Below is the performance report and statistics page; on this page, like many others, tips are shown in the bottom left-hand corner to educate novice players.
Other problems that were not identified in the expert evaluation were noted by users, such as team-mates blocking paths and causing problems, lighting conditions causing players to mistake the identity of other player characters, button combinations that are regularly pressed by accident, and players being unable to use the flag direction information interface. This gulf shows the different approaches and their differing abilities to evaluate efficiently.
Usability expert evaluation and usability testing are dynamic and systematic methodologies that provide supporting information for games development. Together they provide a broad interpretation: the expert's view, and experimental data from the testing.
“"Expert evaluation is a fast and effective way to check the usability of a game. In our case, the results arrived in a couple of weeks, and they helped us solve some major design issues. We were also able to fix numerous smaller usability problems, and avoid a couple of potential pitfalls in designing and implementing new features." - Joel Kinnunen - Frozenbyte
"Usability testing provided us with a new perspective on the game. It is difficult to know how the game is played without testing it with the real users - gamers are not predictable, especially as it comes to navigating a given level. In hindsight, I wish more of the development team could have been present at the tests, so the endless amount of choices the player can make would be more clear to everyone on the team. Level designers would do well to study the various player behaviors." – Joel Kinnunen - Frozenbyte
