Rom's Rants

Free-Roaming Hostility From A QA/Developer Perspective.

Thursday, October 02, 2008

How To Blog, Er, Break Software

Dr. James Whittaker, author of the invaluable QA training resource How To Break Software, has been working at Microsoft for over two years.

I just now found out that he has been blogging.

I really need to get back into QA a bit more.

posted by Michael Russell at | 0 Comments Links to this post | View blog reactions

Wednesday, July 18, 2007

The Myth Of Testing Tools In Games

I'm a fan of test automation. I keep up on all of the latest and greatest automated testing tips and techniques from people like The Braidy Tester. I try to tie as much automated testing as possible into the applications that I write on a regular basis. When it comes to application testing, you won't find many supporters as die-hard as I am. However, I do recognize that test automation has its limits. In games testing, automation testing is for the most part only useful for verification purposes.

What does that mean? It means that you can use automated testing tools in games to verify that content is formatted as specified and, to a lesser extent, that the content is "well formed." But unless your regular testers find a repeatable type of content failure and are able to train a tool to identify that particular type of failure, automation won't be able to find what is wrong with your content.
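To make "formatted as specified" and "well formed" concrete, here is a minimal sketch of a content validator. The schema, field names and the negative-cost rule are all hypothetical stand-ins for whatever your own content spec says. Note what it cannot do: it will tell you a record is missing its mesh, but it has no idea whether the hat looks wrong.

```python
# Hypothetical content schema: every item record must carry these fields
# with these types. Field names are illustrative, not from any real game.
REQUIRED_FIELDS = {"id": str, "display_name": str, "mesh": str, "cost": int}


def validate_item(record):
    """Return a list of problems found in one content record (empty if none)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field '{field}'")
        elif not isinstance(record[field], expected_type):
            problems.append(f"field '{field}' should be {expected_type.__name__}")
    # A "well formed" rule beyond the schema (hypothetical): costs can't be negative.
    if isinstance(record.get("cost"), int) and record["cost"] < 0:
        problems.append("cost is negative")
    return problems


def validate_all(records):
    """Map each bad record's id (or index) to its problems; clean records are skipped."""
    report = {}
    for index, record in enumerate(records):
        problems = validate_item(record)
        if problems:
            report[record.get("id", f"#{index}")] = problems
    return report


items = [
    {"id": "hat_01", "display_name": "Red Hat", "mesh": "hat_01.mesh", "cost": 50},
    {"id": "hat_02", "display_name": "Blue Hat", "cost": -5},
]
report = validate_all(items)
```

Once testers find a new repeatable failure type, it becomes one more rule in `validate_item`; that is the "train a tool" step.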

You can use automated tools to automate game UI testing and level load testing, but very little can be done to automate gameplay testing for 99% of the games on the market. You can use automated QA to generate the massive numbers of combinations needed for combination testing, but you still need a human to evaluate the results in most cases.

You can automate harnesses against backend servers to ensure that the proper errors are thrown and that the proper data is passed back and forth, but you still need humans testing the game itself against the server.

While most applications can gain a real benefit from test automation and can even reduce their test headcount needs via automation, video game testing is almost the last place where flesh and blood cannot be replaced effectively at this time.

Unfortunately, many people are under the impression that automation testing for games is a lot further along than it really is. Look at Dave Perry's take on it. (I've already called him a God-damned idiot, what else can I do?) Back in March, I dug in a bit deeper against his assertions.

Long story short, investing in testing tools in games will help release a better product, but it will not replace the need for an effective tester to wield the tool.

Tuesday, July 17, 2007

Seattle Weekly Weakly Tests Testing

Sam Kalman gave a recent article from the Seattle Weekly on testing a once-over and found it lacking.

I'm in the fortunate position of never having been on the contract testing side of things, but I've had contractors working for me both at Microsoft and at Ritual.

While the article may be a partially accurate picture of how contract testing is done off-site (or on-site with Sony), contract testing on-site used to be markedly different. Note, I said "used to be."

Inside Microsoft's Redmond campuses, space is always at a premium. Offices that used to hold only a single person get doubled or tripled up nowadays...and that's for the FTEs. Cramped space doesn't make it any easier to get your work done.

In addition, Microsoft has been shifting their staffing allocations for testing. Back with "Halo 2" for Xbox, there were three test leads, three SDETs, three Bungie FTE testers and five Microsoft FTE testers for a total of fourteen testers. There were also eighteen contract testers on the game...almost but not quite a one-to-one ratio of FTEs to contract testers. Thirty-two credited testers...not bad.

Compare that to "Halo 2" for Vista. Two test leads, seven FTE testers and two credited contractors. They must not have thought it would be very hard to test...after all, it's a port. Of course, then there's the shared Tools & Technology group that is split between every single MGS release, but since they're a shared team, you really can't count them towards test.

So at this point, we have eleven credited testers, or about a third of the number of testers that "Halo 2" Xbox had. What do you get for that? The poor performance is only the start of the issues with "Halo 2" for Windows Vista. Hell, they didn't even spell "Windows" right in the manual. (See page 31.)

There may have been more testers, but most likely they came on at the end and how I feel about that is well documented.

I don't know how I'm supposed to feel right now...I'm just afraid that things are going to get much, much worse before they get better.

(Update: 8/3, fixed typos.)

Saturday, July 07, 2007

The Game Is In The Data

Over at club.live.com, they've got several "ticket games" going on. Spend an hour or two playing, and you can get items ranging from song downloads to headphones all the way up to an Xbox 360.

The game with the highest ticket value is called Flexicon; however, they didn't really test their game data very well.

Here is the correct solution marked incorrect.

Here is what they think is the correct solution.

Remember, you may have a bug-free game engine, but bugs in your data will often frustrate your users more than a crash would. Especially for games, the rule of thumb is that you generally spend over two-thirds of your time testing the data.

If you aren't testing your data, you aren't testing your game.

Wednesday, June 06, 2007

All I Ever Need To Know About Testing...

A former manager of mine sent me a link to an article over at StickyMinds.com about the relationship between lessons learned in kindergarten and testing.

There is one exception to this list that I can see, though.

Mr. Copeland says:
If you find a defect in someone's work, first tell him informally, personally, and discreetly.
It's a great sentiment, but if it's in the code, document it before you do so to make sure that you don't forget.

Update: Fixed link. Thanks, Andy.

Monday, March 26, 2007

Dave Perry Felt Some Heat

Well, if you go to Dave Perry's website, you'll notice that his "Dark Future for Game Testers" post isn't in the index anymore.

If you follow the link above, you'll also notice that he updated the post.

We noticed your "cover-your-ass" comment, Dave. We're just trying to show you that your numbers game is a shell game in disguise.

You tried to keep everyone focused on the "numbers," shuffling your hands around showing a poorly run test department as an example (only testing at the end is idiotic), thousands of volunteers to "test" (i.e. play) your latest and greatest, a mirage in the form of a testing holy grail (full automation testing), and pointing to QA as a massive expense on your bottom line (QA usually amounts to 10% of the budget at most on a project), while the real meat of what you were proposing (when these multitudes would join in, what the expenses of managing this mob would be [higher than you expect, given experience with external betas], etc.) fell under the table.

So here are some numbers for you from real life.

For an external beta, it generally takes three people per 250-300 testers to handle the beta. Two are communications guys. They actually communicate with the beta testers, collect the bug reports, attempt to filter duplicates, etc. The third is a tester who reproduces the bugs, works more closely with the beta testers who actually report bugs to solidify repro steps, and then works with the development team to ensure that the bugs are actually fixed. If you have more than 300 testers per trio of support staff, the feedback overwhelms the team and the model starts to fall apart. In other words, your mythical 3,000 testers are still going to require a support staff of 30 on your end just to keep up.
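The staffing arithmetic above can be written out as a small sketch. The trio size and the 300-tester ceiling come from the post; rounding up (a partial group of testers still needs its own trio) is an assumption of this sketch.

```python
import math

TESTERS_PER_TRIO = 300  # upper end of the 250-300 range from the post
STAFF_PER_TRIO = 3      # two communications people plus one tester


def beta_support_staff(beta_testers, testers_per_trio=TESTERS_PER_TRIO):
    """Support staff needed so that no trio handles more than testers_per_trio testers."""
    trios = math.ceil(beta_testers / testers_per_trio)  # assumption: round partial groups up
    return trios * STAFF_PER_TRIO


best_case = beta_support_staff(3000)        # every trio at the 300-tester ceiling
cautious = beta_support_staff(3000, 250)    # every trio at the low end of the range
```

At the low end of the range, the same 3,000 volunteers need 36 support staff rather than 30.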

Now, your company is going to be working on multiple MMOs simultaneously, so it might be possible for you to prorate the cost of those people across projects, but it's still an additional expense, and as you said, it doesn't replace the need for actual testers.

Regardless, the "meat" is that your proposal doesn't save money at all, and it has done more to damage the image of quality assurance than any recently-rushed-to-market title could have.

Friday, March 23, 2007

Dave Perry Blogs About Testing

Dave Perry elaborated a bit in a recent blog entry about the "dark future for testers." I figured I'd address his rant point by point.
(1) To say that paid testers are better than everyone in the community would be dumb. (Considering we hire most of them from the community.)
Nobody is saying the community is dumb, and yes, most testers do get hired from the community, but then again, we're cherry-picking from the community. Those with the talent rise to the top, get hired, and progress.
(2) There's no degree course in testing that I'm aware of? Meaning a professional tester is no more qualified (on paper) than a community tester? (Other than they get paid.)
There are computer science degree programs with focuses towards quality assurance. The best one in the country that I'm aware of is Dr. James A. Whittaker's program at the Florida Institute of Technology. That said, a tester with a college degree can make over $60,000 a year in the Dallas area working on productivity software, while that same tester will be lucky to make $30,000 a year around here working on games.
(3) Never just assume you are the best at anything.
Testers don't assume. We hypothesize, experiment and report our findings.
(4) 150,000 people testing, kills 20 people testing, I don't care how you slice it. Interestingly testing needs really bad players too, yet I've never seen a test team hire some? That's the guy that jumps at a locked door 100 times because he has no idea what he's doing, and uncovers some great bugs testers don't find.
Here we're going to start getting into the numbers debates.

Say I get 150,000 people on an open beta. Of those, I know that only 1% on average are going to respond, so that leaves me with 1,500 responsive players. Previous usage data for open betas shows that 95% of participants don't spend more than 1 hour testing...period. That's not one hour per build, that's one hour total. That drops us down to 75 people who are going to be looking beyond the first hour of gameplay.

Now, out of these 75, how many are going to actually be decent testers? Most people who join open betas join to play, not to test. Play testing is a subset of testing, not a replacement for testing.
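The open-beta funnel is simple enough to sketch. The 1% response rate and the 95% who stop within an hour are the post's figures; taken at face value, they leave 5% of 1,500 responders, i.e. 75 people. Integer percentages are used here only to keep the arithmetic exact.

```python
def open_beta_funnel(signups, response_pct=1, past_first_hour_pct=5):
    """Estimate how many open-beta players are still testing after the first hour.

    response_pct: share of sign-ups who respond at all (1% in the post).
    past_first_hour_pct: share of responders who play beyond one hour in total
    (100% - 95% = 5% in the post).
    """
    responsive = signups * response_pct // 100
    engaged = responsive * past_first_hour_pct // 100
    return responsive, engaged


responsive, engaged = open_beta_funnel(150_000)
```

And that count is everyone who stuck around, before asking how many of them are actually decent testers.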
(5) Self-testing will also be a major future for games. Where the game plays itself, trying everything everywhere (to the extent feasible by the endless combinations generated) again every point on every surface will get tested for penetrations etc. A job testers can’t do accurately and it would be mind numbing to do this all day every day for months.
Self-testing is the holy grail for testing. Some companies, like Microsoft, hire testing engineers to create testing tools and automation harnesses for their games so that they can try to automate as much testing as possible, but they are approaching it from a testing point of view. Besides, your programmers are already crunching to just get their code done...are you going to have them work without sleep to get the test tools done as well?

Full content automation is still many years away. I should know, I've written content automation frameworks and used them on projects. The frameworks take time to write, the testing tools take time to write, verifying the tools work correctly takes time, etc. That was all time that I spent on my own, and they paid off, but only on products that were using stable, licensed engines. The cost/benefit just isn't there for engines that aren't stable.

But even if you can self-test, you're still missing the one big thing: how can you detect that your test failed? Failure detection is the hardest part of automated testing, especially for content automation testing, because you have to try to capture a thought process. Not all failures can be detected by code.
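As an illustration of why failure detection is the hard part, here is a minimal sketch of an automated oracle for a self-playing bot. The `Sample` record, the kill-plane height and both rules are hypothetical. Code can decide "the player fell through the floor" or "health went negative"; it cannot decide "this jump is frustrating" or "this texture is wrong," which is where the tester's thought process comes in.

```python
from dataclasses import dataclass

WORLD_FLOOR_Z = -100.0  # hypothetical kill-plane height, in world units


@dataclass
class Sample:
    """One snapshot recorded by a hypothetical self-playing bot."""
    step: int
    position: tuple  # (x, y, z) in world units
    health: int


def detectable_failures(samples):
    """Flag only the failures that code can decide on its own."""
    failures = []
    for s in samples:
        if s.position[2] < WORLD_FLOOR_Z:
            failures.append((s.step, "fell through the world"))
        if s.health < 0:
            failures.append((s.step, "negative health"))
    return failures


run = [
    Sample(0, (0.0, 0.0, 0.0), 100),
    Sample(1, (5.0, 2.0, -250.0), 100),
    Sample(2, (5.0, 2.0, 0.0), -3),
]
found = detectable_failures(run)
```

Every rule in an oracle like this was first noticed by a person; the bot only repeats the check.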
(6) Community testers contain hackers (rare on professional teams), these guys try every tool, every exploit on your engine and look for vulnerabilities. Again you don’t just get the one that might be on a professional team, you get dozens of them.
Most game companies that work on single-player games don't hire hackers. Most game companies that work on multiplayer games have software developers in test who do white-box analysis of the code to check for issues, as well as write test tools to abuse the network connection. I'm not aware of a single MMO company out there at the moment (besides yours, it seems) that doesn't employ people who abuse its MMOs.

The catch is that if an internal person hacks the MMO, the hack gets reported and fixed. The company is paying this internal person's wages, so he has a financial imperative to report the issue. If an external person hacks your MMO, given how financially lucrative real-money trade has become, what is his imperative to report the issue?

Dave, I understand that you're trying to cut down the cost of making your games. That said, you're forgetting the one critical role that QA plays, and the real reason your plan won't work. Developers are loyal to the code, producers are loyal to the company, but testers are loyal to the user. Their job is to try to keep the game from getting out broken...to protect the user. Most "showstopping bugs" that ship were most likely found by your internal test teams and reported, but waived in order to make a ship date or because nobody outside of QA agreed about the severity of the bug. Out of all the titles I've shipped, the number of showstoppers I've missed is in the low teens, while the number of showstoppers I caught that were waived is in the mid-hundreds.

If your development team won't listen to testers that you hire, pay and see every day, what makes you think that they'll listen to volunteer testers that they never see?

If you want to cut costs, approach it like the movie industry. Do more pre-production. Don't hesitate to cancel projects early on. Look for ways to reuse code and content. Create a common set of engines and tools so that we can properly harness them and automate them.

Now that said, I am going to agree with Dave on one point...the days when a tester could have little to no technical know-how are nearing their end. Testers nowadays need to know how to write automation scripts, read code, analyze stack dumps, trace through and find a flaw in a debugger, etc. If you can't do that, you won't be of much use to most modern QA departments.

But beyond that point, Dave is merely showing his ignorance about current industry practices with QA, and it's a shame that industry QA doesn't have a more public face to counter this schmuck. Until industry QA does get a "celebrity," the discipline is going to continue to lose support and have misinformation like Dave's missive spread like wildfire, because unfortunately CEOs do read him.

Thursday, November 30, 2006

Separation of Topics

As part of my process of refining this blog, I've decided that effective immediately, personal posts are going to be going to a private, friends-only blog.

Industry anecdotes, testing tips and discussion, and discussion about products will remain on this blog.

However, don't expect the removal of one topic to affect how far I go on these other topics. As this recent dust-up proves, people are not used to straight talk coming from this industry. Everyone assumes that we have an agenda, so we must be lying our asses off. One thing that caused issues for "SiN Episodes" was that we promised 4-5 hours of gameplay, so everyone assumed that it meant 2-3 hours because of "time inflation." Right now, our average playtime is 4h57m, so I'd say we were dead on.

I'm a very straightforward person. I do my best to say things as they are. While it may drive PR departments insane when I open my mouth, I'd be doing a disservice to myself and to quality assurance if I toned down my words, omitted more than was legally necessary, or intentionally misled people.

And as for an agenda, I do have one. My agenda is to bring quality assurance out of the basement and into the light. QA has become an army of disposable temps in this industry, and is seen as the invisible enemy of most development teams and the automatic scapegoat for most customers when something goes wrong. This perception will only fester and grow if nothing is said or done about it.

Wednesday, November 29, 2006

Open Letter To QA

I've been watching the news today, and I've been seeing my comments regarding my brush with Sony's FPQA department taken with various levels of sensationalism and intrigue, so I figured I'd take a brief moment to address several of the comments I've been seeing and then drop this sordid matter before it goes much further.

To all developers: QA should be an integral part of any development process, not an afterthought. This doesn't just mean developer testing (unit tests, integration tests, etc.). This doesn't mean the certification process by the platform holder. This means real testing with real testers. Playtesting should not be how you find bugs. Shipping the product should not be how you find bugs. There are people out there who excel at finding these types of problems before they pound your review scores to dust...get them, keep them happy, and put them to work. Most importantly, listen to them. Testing without action is masturbation.

To Sony QA: I realize that your staffing structure is a direct result of cost-cutting measures. However, several people in your internal development studios joke about the bugs they receive. A big part of the reason they get laughable bugs is that when you're bringing that many people on for such a short period of time, the quality of the training the testers receive suffers, as does the quality of their bugs. Test leads do what they can to keep bad bugs from getting through, but there are a finite number of hours in the day, and the longer your testers work, the more items are going to slip through their fingers.

If you want to adjust the perception, bite the bullet. Hire great testers, bring them on full-time, work them a reasonable number of hours, pay the benefits. It takes time to change a culture, but change has to start somewhere. A defeatist attitude like "the 5% rule" I was told about only proves the culture's point. (And if testing is going on from day 1, 5% should never happen.)

To the Press: A lot of people place blame on any bugs in a shipping product solely at the feet of quality assurance. Some people believe that bugs making it out are the result of QA sloppiness, or QA "not fighting hard enough" for the customer. To be honest, there are times when that is the case. However, knee-jerk accusations towards QA don't help anyone. In fact, it is reactions like that which have led many publishers to believe that since the highly-paid testers "missed this issue," they may as well employ "controller monkeys" instead. After all, they're cheaper, work longer hours, and are disposable.

And when you get an article like this, don't just take my word for it! While I stand by everything that I said, nowhere have I seen any attempts to contact Sony for a statement. Nowhere have I seen a response from Sony. The only responses I've seen have been from former Sony QA members who said, "Yep, sounds right." Please try to present a balanced viewpoint.

To my regular readers: Sorry for the distraction. I didn't think sharing my experiences would lead to such a hubbub.

Monday, November 27, 2006

Quality Assurance at Sony

READ ME FIRST: Given that so many news outlets have been taking portions of this out of context, I need to say this.

1) I have nothing against Sony's QA department, contrary to what some reporters have said. I was commenting on the impression I got of how QA was perceived within Sony, not QA in Sony.

2) I talk about the impressions that I got seven months ago. Things may have changed, I don't know.

3) The department in question is the "last line of defense" inside Sony. From what I have been told, individual internal developers may have their own QA staffs on top of these.

4) These were my impressions, and are not necessarily the opinions of my past, present or future employers.




Sam Kalman made a post on November 22nd about a bug in Genji found by Chris Kohler, and it begs for the following story to be told.

Back in April, I was interviewed for a FPQA Manager position at Sony Computer Entertainment America's San Diego office. Sony was extremely nice. They flew me down and back first-class, took me out to lunch, etc.

Everyone I met there was a consummate professional, but there was a lot of underlying tension. I signed an NDA so I can't go into specifics, but there was talk about issues that only came up on production UMDs for PSP games, major friction between test and development teams with little to no management backing for test, little to no shared technology, extremely lax "user effect" bug metrics for determining whether or not to fix something, and a variety of other fairly hefty issues, not just from a process standpoint, but from an overall culture standpoint. Microsoft is known for giving QA a bit too much say in the products that are developed, but the feeling I got inside Sony was that QA was seen as nothing but a bunch of monkeys with controllers.

The straw that broke the camel's back came in the last hour of my interview. I was told that the way that Sony tests their games is that there are one or two test leads on a project starting at about six months out. At T-8 weeks, between 80 and 100 temporary testers are brought on to test the game for those eight weeks. That's it. This was done for financial reasons, and as a QA Manager, I would be expected to run test the same way. Obviously, I didn't feel that was a valid way of handling QA.

The following morning, I sent an E-mail to Sony removing myself from consideration for the position because I didn't feel that I could run test the way that they wanted me to.

At Microsoft, the stringent QA processes often strangle creativity. At Sony, the lax QA process allows creativity to squash quality. It's hard to walk a middle ground where QA and creativity work hand in hand, but it is a tightrope that this industry is going to have to learn to walk if it is going to succeed in the 21st century and beyond.

(Update: Welcome, visitors from Sony/Psygnosis and readers of the Escapist. Please don't take this as criticism of Sony, just of the practices as they were described to me. No company has QA perfected, and Sony has released some wonderful titles over the years. However, past success is not a guarantee of future success as this incident proves. Trust in Sony's ability to deliver is already shaken, not only from a consumer standpoint, but a developer standpoint as well. [Hell, I still haven't received my taxi fare reimbursement...]

First-party games are supposed to push the envelope with killer gameplay, crystal-clear graphics and first-rate quality. First-party games are supposed to sell not only the abilities of the console, but the promise of the platform.

Consider this a prod towards delivering the true promise of the platform: next-generation gaming for the masses. The masses don't like patching.)

(Edit: 10/24/2007, 9:30am: Added sponsored links.)

No Respect...

Just getting ready to go into work this morning, but I saw this article over at Kotaku and had to say something.

Look closely at the linked picture, and tell me what department is missing. (Hint: Playtesting != QA)

Friday, November 17, 2006

Automating Games QA (part 3)

In this installment, we're going to be talking about combination testing.

Most games nowadays have some sort of customization system, be it your character, your "crib," your vehicle, etc. Testing the entire gamut of combinations by hand can actually get to the point where it is impossible to test within the time available for testing.

For example, let's say that you have a standard human avatar with a customizable shirt and pants. There are 10 different shirts available and 10 different pants available. That is 100 combinations right there. Let's add 10 hairstyles. That bumps it up to 1,000 combinations. Add 10 different fleshtones...10,000 combinations. Add a second gender...20,000 combinations. Add 5 different faces per gender...100,000 combinations. It adds up quickly.
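The growth in the avatar example is just a running product of the option counts. This sketch replays the post's steps in order; "faces per gender" multiplies by 5 because each gender gets its own set of five faces.

```python
from itertools import accumulate
from operator import mul

# Options added at each step of the avatar example, in the order the post adds them.
steps = [("shirts", 10), ("pants", 10), ("hairstyles", 10),
         ("fleshtones", 10), ("genders", 2), ("faces per gender", 5)]

running_totals = list(accumulate((count for _, count in steps), mul))
for (label, _), total in zip(steps, running_totals):
    print(f"+ {label:<16} -> {total:,} combinations")
```

One more 10-option slot would push the total to a million, which is why exhaustive testing stops being an option so quickly.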

Combination testing is designed to hit the two simplest types of bugs: single-value and two-value settings.

Now, if you look at the example above, while there are 100,000 combinations, there are only a few individual settings: 10 shirts, 10 pants, 10 hairstyles, 10 fleshtones and 5 faces per gender. That can mean either 50 settings if you assume everything other than the faces is shared between the genders, or 90 settings if everything is separate between the genders. An automation script that cycles through each of these single-value settings individually can quickly help eliminate bad items, and if the script screencaps each item, manual verification of item appearance will go fairly quickly.
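A single-value pass can be generated from data. This is a minimal sketch: the asset names, the baseline outfit and the gender-locked face handling are hypothetical. With shirts, pants, hairstyles and fleshtones shared and five faces per gender, it yields 50 cases, each setting exactly one slot while leaving the rest at a known-good baseline.

```python
def single_value_cases(slots, baseline):
    """Build one test case per individual option.

    slots maps a slot name to its options; baseline is a known-good full
    configuration. Each case sets exactly one slot and keeps the rest at baseline.
    """
    cases = []
    for slot, options in slots.items():
        for option in options:
            case = dict(baseline)
            case[slot] = option
            cases.append(case)
    return cases


# Hypothetical asset names for the avatar example.
shared = {
    "shirt": [f"shirt_{i}" for i in range(10)],
    "pants": [f"pants_{i}" for i in range(10)],
    "hairstyle": [f"hair_{i}" for i in range(10)],
    "fleshtone": [f"tone_{i}" for i in range(10)],
}
faces = {
    "male": [f"m_face_{i}" for i in range(5)],
    "female": [f"f_face_{i}" for i in range(5)],
}
baseline = {"gender": "male", "shirt": "shirt_0", "pants": "pants_0",
            "hairstyle": "hair_0", "fleshtone": "tone_0", "face": "m_face_0"}

cases = single_value_cases(shared, baseline)
for gender, face_list in faces.items():
    # Faces only make sense on their own gender, so switch the baseline first.
    gender_baseline = {**baseline, "gender": gender, "face": face_list[0]}
    cases += single_value_cases({"face": face_list}, gender_baseline)

# Each case would then be loaded in-game and screencapped (not shown here).
```

Keeping everything else at a known-good baseline means a bad screenshot points straight at the one item that changed.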

One last thing: while handling the single-item tests, check the amount of memory that each item uses. A good additional test is to set all of your settings to their most memory-intensive setting and play the game that way to check for borderline out-of-memory conditions.
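Picking the worst-case outfit falls straight out of the per-item measurements. In this sketch the byte counts are made up, and the naive sum assumes item costs simply add up. Shared textures, streaming and fragmentation can make real usage differ, which is exactly why you still play the game in that configuration.

```python
def heaviest_configuration(memory_cost):
    """Pick the most memory-hungry option in every slot.

    memory_cost maps slot -> {option: bytes measured during single-item runs}.
    """
    return {slot: max(costs, key=costs.get) for slot, costs in memory_cost.items()}


def estimated_bytes(memory_cost, config):
    """Naive total: assumes per-item costs simply add up."""
    return sum(memory_cost[slot][option] for slot, option in config.items())


# Hypothetical measurements, in bytes.
measured = {
    "shirt": {"shirt_0": 1_200_000, "shirt_1": 3_400_000},
    "hairstyle": {"hair_0": 800_000, "hair_1": 650_000},
}
worst = heaviest_configuration(measured)
```

Gender-locked slots such as faces would need one worst-case outfit per gender; that constraint is left out here for brevity.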

The second most common type of bug is when two values are interdependent. For example, you may have a hairstyle that clips through some geometry on a certain shirt in this example. Even with automation, you're still going to have to manually verify the screenshots, so you want to minimize the number of shots you are looking at. This is where all-pairs testing comes in. This type of testing is also called pairwise testing, and there is a very in-depth example here.

This gets very easy if your combination lists are data-driven. Feed your lists into a tool like James Bach's ALLPAIRS, then pass the generated combinations into your framework and have at it.
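If you want to see what a pairwise tool does under the hood, here is a small greedy generator. This is a simple stand-in of my own, not ALLPAIRS's actual algorithm, and it makes no claim to produce the minimal set. It guarantees that every value of every parameter appears alongside every value of every other parameter in at least one case. The parameter names and values are hypothetical.

```python
from itertools import combinations


def all_pairs(parameters):
    """Greedy pairwise (all-pairs) case generator.

    parameters maps each parameter name to its list of values. Every value of
    every parameter is paired with every value of every other parameter in at
    least one returned case. Greedy, so not guaranteed minimal.
    """
    names = list(parameters)
    order = {name: i for i, name in enumerate(names)}

    def pair(x, vx, y, vy):
        # Canonical key so (shirt, hair) and (hair, shirt) count as the same pair.
        return ((x, vx), (y, vy)) if order[x] < order[y] else ((y, vy), (x, vx))

    # Every pair that has to show up at least once, in a stable order.
    required = [pair(a, va, b, vb)
                for a, b in combinations(names, 2)
                for va in parameters[a]
                for vb in parameters[b]]
    uncovered = set(required)
    cases = []
    while uncovered:
        # Seed each case with the first still-uncovered pair, so every case
        # makes progress and the loop is guaranteed to finish.
        (a, va), (b, vb) = next(p for p in required if p in uncovered)
        case = {a: va, b: vb}
        for name in names:
            if name in case:
                continue
            # Choose the value that covers the most new pairs with the values
            # already placed in this case.
            case[name] = max(
                parameters[name],
                key=lambda value: sum(pair(other, ov, name, value) in uncovered
                                      for other, ov in case.items()),
            )
        for x, y in combinations(names, 2):
            uncovered.discard(pair(x, case[x], y, case[y]))
        cases.append(case)
    # Return cases with keys in the original parameter order.
    return [{name: case[name] for name in names} for case in cases]


avatar = {"shirt": ["s0", "s1", "s2"], "pants": ["p0", "p1", "p2"],
          "hair": ["h0", "h1", "h2"], "gender": ["m", "f"]}
cases = all_pairs(avatar)  # far fewer than the 54 exhaustive combinations
```

Each returned case is one screenshot to review, so every case the generator saves is time a tester gets back.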

Automation testing is all about getting the grunt work done by the machine so you can focus on the non-automatable tasks. Always be on the lookout for tasks like this that you can automate.

The next games QA automation column will be on automating content testing. It's probably the hardest type of automation, and it will not be a good fit for most studios, which is why I left it for the end. Stay tuned.

Tuesday, November 14, 2006

Right/Wrong vs. Right/Left

A lot of people go into games QA with one of two mentalities. Either they come in thinking, "cool, I get to play games for a living," in which case they generally leave the industry after five months never to return. Or they come in thinking, "there is so much wrong with games, this is my chance to make things right."

There is a lot of potential if you come into the games industry believing that you can fight the good fight and win the war against poor quality crapware, but I've found that people who keep that attitude burn out fairly quickly. It isn't that it's a wrong attitude, but the way that the games industry works, "right vs. wrong" just isn't...right.

I bring this up because I like making fun of commercials. Recently, they've been showing a commercial for DVD boxsets for the old Superman TV series and the last season of Lois and Clark. In the commercial, Dean Cain as Superman states in a matter-of-fact fashion, "I stand for what is right!" I always reply saying, "I stand for what is left!" For the most part, my reply is a joke, but I started thinking about it, and it does actually apply to how people tend to survive in QA.

Everyone on a team can pretty much agree about the "right" bugs to fix. Everyone on the team can also agree on the "wrong" bugs to fix: the ones that will result in additional instability, the ones that nobody will ever see, the ones that only occur if you noclip out of the world, the stupid shitty bugs that never work. However, between the "right" and the "wrong" bugs are the bugs in the grey area...the bugs that are left.

As testers, we stand for the bugs that are left. We fight for the bugs that aren't slam-dunk "must fix," but will have a serious impact on our customers. We wade into the grey, and escort our issues into the light.

Shifting from a "right/wrong" mentality to a "right/left" mentality isn't easy, but it makes survival in this industry so much easier.

Monday, November 13, 2006

Automating Games QA (Part 2)

Note: Prior to reading this post, please read the previous entry (link here) as well as this article on Gamasutra. Mr. Cooke only briefly touches on the test system architecture on page 4.

This post is meant to discuss game flow testing. The role of game flow testing is to verify that the appropriate content loads and that the proper branches are followed. This is generally more complex than simple UI automation testing and can require additional automation hooks in the game itself.

The basic flow of a game flow automation script is to start the game from a known state (generally "New Game") and pass level completion/failure states to the game so that the game will progress to the next state, and so the game will record which state it is in at each step.

For example, let's say that you are working on a linear first-person shooter with no inter-level transitions. (Think Doom 1.) Your script starts a new game and verifies that you are in E1M1. Your script then passes in a command to tell the game that you have beaten E1M1. It then verifies that E1M2 loads. If you have a failure case or an alternate test, you'd handle that in a separate script.

That's the simple answer in a nutshell. You aren't just loading each level, you're attempting to verify the links between the levels as well. If you have a data-driven level flow, you can often automate the creation of the scripts from that dataset.
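Here is a minimal sketch of a game flow script driven by a level-flow table. `LEVEL_FLOW` and the `FakeGame` hooks (`new_game`, `complete_level`, `current_level`) are hypothetical stand-ins; in a real project they would map to whatever console commands or automation interface your engine exposes, and the flow table would come from the game's own data.

```python
# Hypothetical data-driven level flow for a linear, Doom-style episode.
LEVEL_FLOW = {"E1M1": "E1M2", "E1M2": "E1M3", "E1M3": None}  # None = episode end


class FakeGame:
    """Stand-in for the real game's test hooks; the method names are assumptions."""

    def __init__(self, flow):
        self._flow = flow
        self._level = None

    def new_game(self):
        self._level = "E1M1"

    def complete_level(self):
        # Simulates the "you have beaten this level" command.
        self._level = self._flow[self._level]

    def current_level(self):
        return self._level


def run_flow_test(game, flow, start="E1M1"):
    """Walk the level chain from a new game, checking each link against the data."""
    log = []
    game.new_game()
    expected = start
    while expected is not None:
        actual = game.current_level()
        log.append((expected, actual))
        if actual != expected:
            return False, log  # broken link: the rest of the chain can't be trusted
        game.complete_level()
        expected = flow[expected]
    return True, log


ok, log = run_flow_test(FakeGame(LEVEL_FLOW), LEVEL_FLOW)
# A build whose data accidentally skips E1M2:
broken_ok, broken_log = run_flow_test(
    FakeGame({"E1M1": "E1M3", "E1M2": "E1M3", "E1M3": None}), LEVEL_FLOW)
```

Because the expected chain comes from the same table the designers edit, a change to the level flow updates the test without anyone rewriting the script.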

One other nice part of this is that you're also checking for compounding issues. A lot of the time, issues won't manifest themselves in a level that is loaded anew because the issues are the result of memory fragmentation or memory leaks from a previous level. This is a nice, automated way of helping to bring those problems to the forefront.

Now, this does not remove the requirement that you still play through each path. This verifies that the links exist, not that the links can be triggered through normal gameplay. But if the link is broken in the script, you can save some time during testing by avoiding doing a known broken task.

In the next installment, we'll be going over combination testing, and the final installment will go over content testing theory and practice.

posted by Michael Russell

Wednesday, November 08, 2006

[Testing] Automating Games QA (Part 1)

In most software houses, automation has made major inroads over the last several years. Vista has over 700,000 lines of automation code that is run against every build. Companies like WorkSoft try to make creation of automated test cases easier for average test departments. Even unit tests run by developers are, at least in some way, considered automated tests.

However, automation testing has had a difficult time infiltrating game development houses for a few fairly hefty reasons. First, the number of bugs in the code is generally dwarfed by the number of bugs in the content. Second, most games are difficult or impossible to automate because they have an element of randomness, which makes it hard to determine the success or failure of a test case automatically. Finally, there is rarely, if ever, a standardized way of querying a game about the state that it is in, or even of passing input to a game to trigger a response.

This isn't to say that automation is impossible in a game scenario, but automation does require an additional level of developer interaction and even imagination that other scenarios simply do not have. There are generally four areas where automation testing can be efficiently used in game development: User Interface, Game Flow, Combination and Content.

User Interface automation testing is where you are going to see the biggest initial gain from a QA standpoint, and is an excellent place to push for an automation start in any company. The goal should be that anyone in your QA department can write automation test cases without much training. I'm going to describe a simple framework that you can share with your development team as a starting point.

A game UI automation framework generally consists of four separate components: the game hookup, the communication component, the use case library and the test cases themselves.

The "game hookup" is a piece of code inside the game itself that listens for commands and queries from the communication component. For example, on the Xbox, your game hookup may just be a background thread that sits and listens on the debug channel. On a PC game, it may listen on a named pipe, a network socket, or some similar channel.
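As a minimal sketch of the hookup side, assume a newline-delimited text protocol in which each request is a command name and each reply is a single line starting with "OK" or "ERR". Both the protocol and the handler table are assumptions for illustration, not a console SDK API. The loop reads from any file-like object; in a real game that stream would be the debug channel, pipe, or socket.

```python
import io


def serve(reader, writer, handlers):
    """Read commands until EOF, dispatch each, and write exactly one reply line."""
    for raw in reader:
        name = raw.strip().lower()
        if not name:
            continue  # ignore blank lines
        handler = handlers.get(name)
        if handler is None:
            reply = f"ERR unknown command {name}"
        else:
            try:
                result = handler()
            except Exception as exc:  # report the error instead of crashing the game
                reply = f"ERR {exc}"
            else:
                reply = "OK" if result is None else f"OK {result}"
        writer.write(reply + "\n")
        writer.flush()


if __name__ == "__main__":
    state = {"screen": "press_start"}

    def activate():
        state["screen"] = "main_menu"

    handlers = {"getscreen": lambda: state["screen"], "activate": activate}
    out = io.StringIO()
    serve(io.StringIO("getscreen\nactivate\ngetscreen\nbogus\n"), out, handlers)
    print(out.getvalue(), end="")
```

In a shipping engine this loop would run on the background thread described above, and handlers that touch game state would typically queue their work for the main thread rather than modify state directly.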

The "communication component" is generally going to be a COM component that sits on your PC, and is responsible for brokering communication between the use case library and the game hookup.
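The communication component described here is a COM object; purely as an illustration of the brokering role, here is the same idea as a Python class. It assumes the one-line "OK"/"ERR" reply convention and a hypothetical transport callable that delivers a line to a named console and returns its reply, so use cases only ever see values or exceptions.

```python
from typing import Callable


class CommandError(RuntimeError):
    """Raised when the console replies with anything other than OK."""


class ConsoleLink:
    """PC-side broker between use cases and one console's game hookup."""

    def __init__(self, console_name: str, transport: Callable[[str, str], str]):
        # transport(console_name, line) -> reply line. Hypothetical: in practice
        # it would wrap whatever debug channel or socket the dev kit exposes.
        self.console_name = console_name
        self._transport = transport

    def _send(self, line: str) -> str:
        reply = self._transport(self.console_name, line)
        status, _, payload = reply.partition(" ")
        if status != "OK":
            raise CommandError(f"{self.console_name}: {line!r} failed: {payload}")
        return payload

    def command(self, name: str) -> None:
        """Send an action such as 'activate' and check that it was accepted."""
        self._send(name)

    def query(self, name: str) -> str:
        """Send a query such as 'getscreen' and return the reported value."""
        return self._send(name)
```

Turning every ERR reply into an exception keeps the use case library simple: a use case either gets a value back or fails loudly with the console name attached.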

The "use case library" is a set of user actions written in VBScript as subs with user-oriented names. Each use case sends the appropriate commands to the communication component to execute a certain action, and requests information from the communication component to verify that the action executed correctly. This library is usually maintained jointly by the developers and the more technically oriented testers.

The "test cases" are the actual test cases that call the subs in the use case library to handle each individual test case. Let's hook all of these up and see how these would work and evolve over time.

You are working on an Xbox title. As a trial to show that UI automation is useful, the development team agrees to create the game hookup and communication component. The game hookup will recognize only a limited command set: "getscreen," "getcontrol," "getvalue," "nextcontrol," "prevcontrol," "cancel," and "activate." "Getscreen" returns the name of the currently active screen. "Getcontrol" returns the name of the currently active control. "Getvalue" returns the value of the current control, but can later be extended to let you query more data. "Nextcontrol" and "Prevcontrol" are simple tab-order style commands, and work by sending the "up" or "down" input for the main controller. "Cancel" acts as the B button. "Activate" acts as the A button or START button. The communication component takes one argument when it is created: the network name of the Xbox that you want to control via automation. It has a sub for passing a command, a function for passing a query, a sub for sleeping for one tenth of a second, and a sub for cold-booting the console and launching into your game.
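To make the seven-command set concrete, here is a sketch of the hookup side implemented against a small in-memory menu model. The screen names, control names, values, and transition table are made up for the example, and wrap-around focus movement (with "next" treated as "down") is an assumption; a real hookup would read this information from the game's UI system.

```python
class MenuModel:
    """In-memory stand-in for a game's front-end menus (hypothetical data)."""

    COMMANDS = ("getscreen", "getcontrol", "getvalue",
                "nextcontrol", "prevcontrol", "cancel", "activate")

    def __init__(self, screens, transitions, start):
        # screens: screen name -> list of (control name, value) in tab order
        # transitions: (screen, control) -> screen opened by "activate"
        self.screens = screens
        self.transitions = transitions
        self.screen = start
        self.focus = 0
        self.history = []  # (screen, focus) pairs to return to on "cancel"

    def handle(self, name):
        """Entry point for the hookup: run one of the seven commands."""
        if name not in self.COMMANDS:
            raise ValueError(f"unknown command: {name}")
        return getattr(self, name)()

    def _controls(self):
        return self.screens[self.screen]

    def getscreen(self):
        return self.screen

    def getcontrol(self):
        return self._controls()[self.focus][0]

    def getvalue(self):
        return self._controls()[self.focus][1]

    def nextcontrol(self):  # treated as "down"; wraps around (assumption)
        self.focus = (self.focus + 1) % len(self._controls())

    def prevcontrol(self):  # treated as "up"; wraps around (assumption)
        self.focus = (self.focus - 1) % len(self._controls())

    def activate(self):  # like pressing A or START
        target = self.transitions.get((self.screen, self.getcontrol()))
        if target is not None:
            self.history.append((self.screen, self.focus))
            self.screen, self.focus = target, 0

    def cancel(self):  # like pressing B: back to the previous screen and control
        if self.history:
            self.screen, self.focus = self.history.pop()
```

With press_start, main_menu, and options screens defined, handle("activate") on the press START screen moves to main_menu, and handle("cancel") returns to whichever screen and control you came from.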

You decide that all of your test cases are going to be designed to start from power-up. As a result, your first use case will be getting to the main menu. At this point in development, the flow has no videos: just the "press START" screen and then the main menu. You write your use case similar to this pseudocode...

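Const MaxTimeOut = 300 ' polls of 0.1 s each (about 30 seconds); value is an assumption

Sub GetToMainMenu(Xbox1 As TestControls.XboxControl)
    Xbox1.Reboot
    ' Wait for the "press START" screen after the cold boot
    For X = 1 To MaxTimeOut
        If Xbox1.Getscreen = "press_start" Then
            LogSuccess "At Start Screen"
            Xbox1.Activate
            ' Wait for the main menu, sleeping between polls
            For Y = 1 To MaxTimeOut
                If Xbox1.Getscreen = "main_menu" Then
                    LogSuccess "At Main Menu"
                    Exit Sub
                End If
                Xbox1.Sleep
            Next Y
            LogFailure "Could Not Get To Main Menu"
            Exit Sub
        Else
            Xbox1.Sleep
        End If
    Next X
    LogFailure "Could Not Get To Press START Screen"
End Sub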
A sample test case that called this would be:
Sub TestSystemLinkMatchmaking
    Dim Xbox1 As New TestControls.XboxControl("TestXbox1")
    Dim Xbox2 As New TestControls.XboxControl("TestXbox2")
    GetToMainMenu Xbox1
    GetToMainMenu Xbox2
    ...
As part of your nightly build process, you then hook up a couple of these scripts to run after the build is done and deployed to some Xbox test kits. When you come in the next morning, you check the logs.
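The nightly step can be as simple as a runner that executes each test case, logs PASS or FAIL (with a traceback for failures), and returns the failures for the morning review. This Python sketch assumes each test case is a zero-argument callable that raises on failure; that packaging is an assumption for illustration, not how the VBScript cases above are invoked.

```python
import os
import tempfile
import traceback
from datetime import datetime


def run_nightly(test_cases, log_path):
    """Run every test case, append PASS/FAIL lines to log_path, return failures.

    test_cases maps a test name to a zero-argument callable that raises
    (for example AssertionError) when the test fails.
    """
    failed = []
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"=== run started {datetime.now().isoformat(timespec='seconds')} ===\n")
        for name, case in test_cases.items():
            try:
                case()
            except Exception:  # one failing case must not stop the run
                failed.append(name)
                log.write(f"FAIL {name}\n{traceback.format_exc()}")
            else:
                log.write(f"PASS {name}\n")
        passed = len(test_cases) - len(failed)
        log.write(f"=== {passed} passed, {len(failed)} failed ===\n")
    return failed


if __name__ == "__main__":
    def smoke_test():
        pass

    def broken_test():
        assert False, "could not reach main_menu"

    path = os.path.join(tempfile.mkdtemp(), "nightly.log")
    print("failed:", run_nightly({"smoke": smoke_test, "broken": broken_test}, path))
    with open(path, encoding="utf-8") as f:
        print(f.read())
```

Appending rather than overwriting keeps a history of runs in one file, which makes it easier to see when a test first started failing.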

So, let's say that TestSystemLinkMatchmaking reports a failure. It's easy to open the test case and manually try the steps inside to verify the failure. If it does fail, you can even tell the developer to just run the automated test case to trigger the failure, which saves the developer time reproducing and debugging the problem.

If your use case library is comprehensive enough, you can even write individually coded cases for regression testing and for repros of manually found bugs.

It requires some effort to create a system like this, and changes to the UI require adjustments to the use case library. Major changes may even require changes to the test cases themselves. However, the time savings in comparison can be fairly hefty.
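One way to keep those adjustments contained is to move repeated logic, such as the poll-and-sleep loop in GetToMainMenu, into a single helper in the use case library. Below is a Python sketch of that idea; the console object's getscreen, activate, and reboot methods are hypothetical stand-ins for the communication component, and the 300-poll limit is an arbitrary assumption.

```python
import time


class UseCaseFailure(Exception):
    """Raised when a use case cannot reach the state it expects."""


def wait_for_screen(console, screen, max_polls=300, poll_seconds=0.1, sleep=time.sleep):
    """Poll console.getscreen() until it reports `screen` or the polls run out."""
    for _ in range(max_polls):
        if console.getscreen() == screen:
            return
        sleep(poll_seconds)
    raise UseCaseFailure(f"timed out waiting for screen {screen!r}")


def get_to_main_menu(console, **poll_options):
    """The GetToMainMenu use case, rebuilt on the shared polling helper."""
    console.reboot()
    wait_for_screen(console, "press_start", **poll_options)
    console.activate()
    wait_for_screen(console, "main_menu", **poll_options)
```

If intro videos are added to the flow later, only get_to_main_menu has to learn how to get past them; the test cases that call it stay the same. Passing a no-op `sleep` also lets the use case itself be tested quickly against a fake console.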

Obviously, this is an extremely simplified example, but hopefully it got you thinking.

In the next installment, we'll go over game flow testing.

posted by Michael Russell

Thursday, November 02, 2006

[Testing] Where Is Our XP?

I had an interesting conversation today with a gentleman who actually seemed to feel that my Microsoft testing experience could be a hindrance to me in the future. It was interesting because he was the first person I'd met who saw that experience in a negative light, but as the conversation continued, not only did I fully understand where he was coming from, it raised lots of questions in my own mind.

First, some background. At Microsoft, testing has a level of power that is unheard of in "the real world." While most test teams are kept in check by effective test leads and managers, some testers tend to abuse their power and use the bug database as a bully pulpit for design changes. On top of that, process guides everything. There is a very rigid flow of specs, plans and test cases that has to be followed for most items. The practical upshot of this power and process is that the test team becomes a laser-focused tool dedicated to beating the living shit out of your product and finding as many bugs as humanly possible in the short amount of time that they have.

However, there's a limitation to the Microsoft process, and it's a fairly major one...what happens if the project changes dramatically tomorrow? The Microsoft testing process is ideally suited to a waterfall development system, but does not adapt well to more iterative development methodologies. The prep work done for a Microsoft-level testing scenario can turn out to be wasted after a single day of pair programming on the core of the system during a Scrum sprint.

This led to the big question. We have agile development systems. We even have agile content creation systems. Where is our agile testing system, our Extreme Testing? Most testing systems are based on the work done by Kaner. What about Whittaker? While his work streamlines testing, does streamlining really make us more agile? The difficult part of all of this is that while the definition of the program can change, our duty towards the program remains the same: ensuring that it works. This does require at least some level of preparation and planning.

Personally, I shoot for more of a loose plan based on the milestone deliverables and weekly goals, and that tends to work quite well, but it still relies heavily on work done during previous weeks adding up for the end. This doesn't let me turn on a dime, but I can course correct fairly quickly. It's not like the pure MS methodology, where redirecting test is like turning a luxury liner so that it misses an iceberg...

Since I have a fairly QA-oriented audience, I pose the question to you...How do you keep your test department agile in the face of Extreme Programming methodologies?

posted by Michael Russell

Monday, October 23, 2006

[Testing] Purpose of Third-Party QA

I had a very interesting E-mail last week, and I thought I'd share an edited response on this blog. The paraphrased question was, "What is the purpose of third-party QA? Don't the publishers have QA departments of their own?"

It's actually a very valid question, and the answer is fairly basic, but is going to require a fairly hefty explanation. The primary purpose of a third-party developer's QA department is to save the third-party developer money over time. In-house QA saves money for third-party developers in two primary ways: it reduces publisher chargebacks and can reduce development time.

You may remember when I was going on about how much money it took for a game to break even. A big part of why it takes $40 million in retail sales before a third-party developer sees dime one above their royalty advance is that publishers tend to charge back every possible expense towards the developer. Two big expenses that get shuffled back are QA and support, and in-house QA can help reduce both of those expenses.

With in-house QA, your testers can find significantly lower-level bugs than publisher QA, and find them earlier. Testers can help ensure the stability and completeness of milestone builds prior to their submission to the publisher. In addition, if your QA department has a good relationship with its publisher counterpart, it can even help ease some of those "borderline" milestones through...especially since it is usually the publisher's test lead who decides whether or not your milestone is accepted. Both of these result in less downtime for publisher QA, as well as reduced bug counts from publisher QA...which tends to result in a decrease in QA costs being charged back to the developer. Also, fewer bugs in the final product tend to result in a decrease in support costs and an increase in goodwill from the community.

Of course, the ability of testers to find these lower-level bugs correlates directly with the skill of the testers you have, their knowledge of the game assets and codebase, and the degree of communication between your test department and the rest of the product team. If your team has a "throw it over the wall" mentality, the benefit you will reap from in-house QA will be greatly diminished. There is no reason to keep in-house QA in the dark about any part of your project.

The second way that in-house QA saves you money is by potentially reducing development time. Some major publishers do not put any appreciable amount of QA staff on a project until eight weeks prior to ship (*cough*Sony*cough*), and even then, they're generally low-paid contractors. However, under the infinite monkey theorem, these testers are invariably going to log a large number of bugs during the period when change management is the order of the day. By having your internal QA department find bugs from day 1, and as long as you keep fixing the bugs as they are found rather than leaving them to the end of a milestone or the end of the project, you can reduce the man-hours spent on major refactoring at the end of a project and just focus on the simpler bugs that the infinite number of monkeys hurl your way.

Again, this assumes that you are in fact fixing the bugs as they are found. A good way to enforce this is via bug count metrics being included inside internal milestone requirements. (No more than X total bugs open, no more than Y sev1/2 bugs, etc.)
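A gate like that is easy to automate against a bug database export. The sketch below assumes you can pull the severities of all open bugs as integers, with 1 being the most severe; `max_total` and `max_sev12` are the X and Y you choose for each milestone.

```python
from collections import Counter


def milestone_gate(open_bugs, max_total, max_sev12):
    """Check bug-count limits for an internal milestone.

    open_bugs: iterable of severities (1 = most severe) for currently open bugs.
    Returns (passed, reasons), where reasons lists every limit that was exceeded.
    """
    severities = list(open_bugs)
    counts = Counter(severities)
    sev12 = counts[1] + counts[2]  # Counter returns 0 for missing severities
    reasons = []
    if len(severities) > max_total:
        reasons.append(f"{len(severities)} open bugs (limit {max_total})")
    if sev12 > max_sev12:
        reasons.append(f"{sev12} sev1/2 bugs (limit {max_sev12})")
    return (not reasons, reasons)
```

Reporting every exceeded limit, rather than stopping at the first, tells the team in one pass everything that stands between them and the milestone.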

If you have a good in-house QA department and a good in-house QA process, the expense you incur via QA will be offset by the faster turnaround in milestones (and their related payments), the reduced amount of chargeback towards your royalties, and improved name recognition as a company that releases stable, fun products.

posted by Michael Russell

Tuesday, April 18, 2006

[Testing] The Hidden Costs Of Testing At The End

There's been a disturbing trend in the gaming industry lately. More and more publishers are scaling back their test organizations to only having a small quantity of test leads, and then hiring on tons of contingent testers in the final eight weeks of a project. This is a bad idea.

On paper, it seems to make sense. You can get a contingent tester for about $320/week on average, while a tester with three years of experience is going to run you $600/week, and your test lead is going to run you just over $900 a week. (For the sake of this argument, we'll measure coverage in contingent manweeks, and count a test lead or a regular tester as two contingents each for purposes of measuring productivity. These figures also assume that nobody will ever be working overtime, which as we all know, is completely unrealistic, but the numbers would get a lot scarier if we factored in the OT.)

The way that some companies work is that they have a test lead on the project from the beginning, add three regular testers to the project about six months from ship, and then bring on about six contingents during the last four months. On a one-year project, that works out to about $115,000 for 324 contingent manweeks of testing. Under the "new" approach, a test lead is assigned six months prior to ship, and about 34 contingents are brought on eight weeks before ship. This amounts to about $111,000 for the same number of contingent manweeks, or a savings of about $4,000 on one project.

The "savings" grow with the length of the project. On a two-year project, given the above rules, the contingent-heavy approach wins, $144,000 to $163,000, if you bring in 47 contingents to match the manweeks.

So on paper, these sorts of decisions make sense from an accounting standpoint, until you factor in two items: facilities costs and software development.

First off, it's significantly cheaper to come up with space and equipment for ten people than it is for 35. Space, power, computers, consoles, TVs, monitors, snacks, etc., it all adds up.

Second, let's say that after 324 manweeks on a project, your test team has found a total of 5,000 bugs. Under the initial system, those bugs would have been found and fixed throughout the development cycle. Under the "cheaper" system, all of those bugs would hit the development team at the very end.

Massive numbers of showstopper bugs at the end of a project lead to slips, and that's where the savings erode. For every week of slip under the initial system, testing costs increase by $4,600 just for manpower. Under the contingent-heavy system, you're paying everyone for an extra week, so your 34 contingents and your test lead just added an extra $11,800 to your project's cost. Larger teams, like the 47-contingent team mentioned earlier, add an extra $16,000 a week.
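The figures above can be checked with a few lines of arithmetic. This sketch uses the post's rates, takes the lead at exactly $900/week (the post says "just over"), counts a lead as two contingent manweeks as described earlier, and uses the post's rounded savings of about $4,000 for the one-year project and $19,000 ($163,000 minus $144,000) for the two-year project.

```python
# The post's average weekly rates; the lead is taken as exactly $900.
CONTINGENT, REGULAR, LEAD = 320, 600, 900


def contingent_plan(contingents, weeks=8, lead_weeks=26):
    """Cost and contingent manweeks for one lead plus a late wave of contingents.

    The lead counts as two contingent manweeks per week, as in the post.
    """
    cost = LEAD * lead_weeks + CONTINGENT * contingents * weeks
    manweeks = 2 * lead_weeks + contingents * weeks
    return cost, manweeks


def weekly_slip(leads=0, regulars=0, contingents=0):
    """Manpower cost of one extra week for the given team."""
    return LEAD * leads + REGULAR * regulars + CONTINGENT * contingents


def weeks_to_erase(savings, weekly_cost):
    """How many weeks of slip it takes to spend the up-front savings."""
    return savings / weekly_cost


if __name__ == "__main__":
    print(contingent_plan(34), contingent_plan(47))
    print(weekly_slip(leads=1, regulars=3, contingents=6),
          weekly_slip(leads=1, contingents=34),
          weekly_slip(leads=1, contingents=47))
    print(round(weeks_to_erase(4_000, weekly_slip(leads=1, contingents=34)), 1),
          round(weeks_to_erase(19_000, weekly_slip(leads=1, contingents=47)), 1))
```

This reproduces the contingent-heavy totals ($110,440 for 324 manweeks and $143,720 for 428 manweeks, against the post's rounded $111,000 and $144,000) and the weekly slip costs ($4,620, $11,780, and $15,940). Dividing the savings by the contingent-heavy team's weekly cost gives roughly 0.3 weeks on the one-year project and 1.2 weeks on the two-year project, so the one-month and six-week figures are, if anything, conservative.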

On a one-year project, a one-month delay eats away all of your savings. On a two-year project, six weeks eats away your savings. Of course, some companies decide to ship and patch rather than slip, but that carries reputation and support costs of its own.

So by scaling back your test organizations, what are you really saving?

(Edit: Added tag, fixed typo.)

posted by Michael Russell
