
Games and play for productivity and defect prevention



In this post, Ross Smith, Director of Test for Windows Security at Microsoft and one of the authors of The Practical Guide to Defect Prevention (Microsoft Press, 2007), describes the groundwork for his approach to using games as a tool for productivity and defect prevention, drawing on portfolio selection, game theory, and "crowdsourcing". He makes a case for a diverse portfolio of defect detection techniques and for harnessing crowds, through human computation, to develop those techniques. Smith asks:

“So, if the problem set for defect detection lies in our ability to balance our portfolio of discovery techniques, how can we involve the “crowd” to balance our portfolio on a grander scale?”

He states that:

The answer lies in the use of “Productivity Games.” Productivity Games, as a sub-category of Serious Games, attract players to perform “real work,” tasks that humans are good at but computers currently are not. Although computers offer tremendous opportunities for automation and calculation, some tasks, such as analyzing images, have proven to be difficult and error-prone and, therefore, using computers can often lower the quality and usefulness of the results. For tasks such as this, human computation can be much more effective. Additionally, by framing the work task in the form of a game, we are able to quickly and effectively communicate the objective and achieve higher engagement from a community of employees as players of the game.

Here is an excerpt and link to the original post:

Modern portfolio theory (MPT), based on Markowitz’s work, suggests that the return of an investment portfolio is maximized for any given level of risk by using asset classes with low correlations to one another. In other words, a diverse set of investments reduces risk and maximizes return. In a portfolio with two diverse assets, when the value of asset #1 is falling, asset #2 is rising at the same rate. MPT also assumes an efficient market—that is, all known information is reflected in the price of an investment. These factors contribute to an investor’s ability to create an “optimal portfolio” for his level of risk.
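The diversification claim above can be seen directly in the standard two-asset MPT formulas. The following is a minimal sketch; the returns, risks, and weights are illustrative numbers, not data from the article:

```python
import math

# Hypothetical annual returns and risks (standard deviations) for two assets.
mean = [0.08, 0.06]   # expected return of asset 1 and asset 2
stdev = [0.20, 0.10]  # risk (volatility) of each asset
weight = [0.5, 0.5]   # portfolio allocation

def portfolio_stats(weight, mean, stdev, correlation):
    """Expected return and risk of a two-asset portfolio (standard MPT formulas)."""
    ret = weight[0] * mean[0] + weight[1] * mean[1]
    var = (weight[0] * stdev[0]) ** 2 + (weight[1] * stdev[1]) ** 2 \
          + 2 * weight[0] * weight[1] * stdev[0] * stdev[1] * correlation
    return ret, math.sqrt(var)

# Same expected return either way, but low correlation means lower risk.
ret_hi, risk_hi = portfolio_stats(weight, mean, stdev, correlation=0.9)
ret_lo, risk_lo = portfolio_stats(weight, mean, stdev, correlation=-0.9)
print(ret_hi == ret_lo)   # return is a weighted average regardless of correlation
print(risk_lo < risk_hi)  # diversification across uncorrelated assets cuts risk
```

The expected return is identical in both cases; only the risk changes, which is the whole point of holding assets with low correlations.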

How does this apply to testing software? The effort we put forth in testing (or quality improvement) is our investment. Our return or investment yield is the number of defects discovered. Each of our techniques will yield a return of a certain number or percentage of defects. This is easily seen in the distribution of the “How Found” field of our defect-tracking database. In addition to the return of discovered defects, there is the risk of escaped defects: missed bugs that are found in the field. This is akin to investor loss.

The evaluation of our testing strategy based on the MPT principles exposes a set of deficiencies and enables us to improve the return on our testing investment while minimizing the risk of escapes, the same way investors maximize the return on their portfolios while minimizing the risk of loss of principal. The range of optimal portfolio selection, according to Markowitz, is called the “efficient frontier” and is derived by evaluating each asset’s correlation with every other asset to determine the optimal allocation of all the asset classes. Once the efficient frontier has been determined for the asset classes being evaluated, the decision of which optimal portfolio to choose becomes a question of the level of risk tolerance.

In other words, once the efficient frontier has been determined for our defect discovery techniques (“how found” in the tracking database), we can use our tolerance for risk (how many bugs found in the field are we willing to accept as a reasonable level of risk) to estimate which test strategies to invest in, and how much/frequently we should invest. A diversified approach minimizes our risk and maximizes our return. When the defect yield of “how found = test case development” starts to wane, it’s time for “how found = customer” or “how found = ad hoc testing.” We are governed by the principle that the second bug is harder (and more costly) to find than the first. Yield curves through a project cycle illustrate this effectively. This is common sense to any seasoned tester, but the numbers give us a formula to predict and dictate the timing of behavior change.
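The "second bug is harder than the first" principle, and the rotation across discovery techniques it implies, can be sketched with a toy diminishing-returns model. The technique names echo the "how found" values above, but the base yields and decay rate are invented for illustration:

```python
# Toy model: each discovery technique finds fewer new bugs per unit of effort
# as it is used (the second bug is harder than the first). A greedy allocator
# invests each unit of effort where the marginal yield is currently highest,
# which naturally rotates across techniques as yields wane.

def marginal_yield(base, invested, decay=0.7):
    """Bugs expected from the next unit of effort, decaying geometrically."""
    return base * (decay ** invested)

techniques = {"test case development": 10.0, "ad hoc testing": 6.0, "customer": 4.0}
invested = {name: 0 for name in techniques}

for _ in range(12):  # allocate 12 units of effort, one at a time
    best = max(techniques, key=lambda t: marginal_yield(techniques[t], invested[t]))
    invested[best] += 1

print(invested)  # effort spreads across all techniques rather than piling on one
```

Even with "test case development" starting at more than twice the yield of "customer", the allocator ends up investing in every technique, because the declining yield curve makes the next unit of effort elsewhere worth more.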

The most important aspect of the diversified approach is to stay with the portfolio once it has been established, regardless of return. This takes a level of trust that we’re not used to at Microsoft and a belief that our techniques are good investments. Just as an investor might panic when a given investment fails miserably, we tend to over-react when we miss a certain type of bug. Just as a fund manager massages her investments to provide consistency, there are great defect prevention tools and techniques to improve our test strategies.

Game Theory and Human Computation

The relationship here is interesting. The year before winning the Nobel Prize, Harry Markowitz won the John von Neumann Theory Prize. From the Nobel Prize site:

“In 1989, I was awarded the Von Neumann Prize in Operations Research Theory by the Operations Research Society of America and The Institute of Management Sciences. They cited my works in the areas of portfolio theory, sparse matrix techniques and the SIMSCRIPT programming language.” John von Neumann was one of the leading mathematicians of his day, and instrumental in the development of game theory.

John von Neumann’s 1944 book, Theory of Games and Economic Behavior, helped set the stage for the use of math and game theory for Cold War predictions, stock market behavior, and TV advertising. He was the first to expand early mathematical analysis of probability and chance into game theory in the 1920s. His work was used by the military during World War II, and then later by the RAND Corporation to explore nuclear strategy. In the 1950s, John Nash, popularized in the film A Beautiful Mind, was an early contributor to game theory. His “Nash equilibrium” helps to evaluate player strategies in non-cooperative games. Game theory helps us to understand how and why people play games.

So, other than Markowitz winning the von Neumann award in 1989, how does MPT relate to defect prevention? The answer lies, seductively, in the use of crowd-sourcing and human computation: attracting the effort of “the crowd” to assist.

Wikipedia describes “crowdsourcing” as

“a neologism for the act of taking a task traditionally performed by an employee or contractor, and outsourcing it to an undefined, generally large group of people or community in the form of an open call. For example, the public may be invited to develop a new technology, carry out a design task (also known as community-based design and distributed participatory design), refine or carry out the steps of an algorithm (see Human-based computation), or help capture, systematize or analyze large amounts of data (see also citizen science).”

and “human computation” as

“Human-based computation is a computer science technique in which a computational process performs its function by outsourcing certain steps to humans. This approach leverages differences in abilities and alternative costs between humans and computer agents to achieve symbiotic human-computer interaction.”

So, if the problem set for defect detection lies in our ability to balance our portfolio of discovery techniques, how can we involve the “crowd” to balance our portfolio on a grander scale?

The answer lies in the use of “Productivity Games.” Productivity Games, as a sub-category of Serious Games, attract players to perform “real work,” tasks that humans are good at but computers currently are not. Although computers offer tremendous opportunities for automation and calculation, some tasks, such as analyzing images, have proven to be difficult and error-prone and, therefore, using computers can often lower the quality and usefulness of the results. For tasks such as this, human computation can be much more effective. Additionally, by framing the work task in the form of a game, we are able to quickly and effectively communicate the objective and achieve higher engagement from a community of employees as players of the game.

One of the all-time greatest examples of a Productivity Game is the ESP Game, developed by Luis von Ahn of Carnegie Mellon University (also well known for inventing the CAPTCHA), in which players help label images. In the ESP Game, two players work together to match text descriptions of images to earn points. The artifacts of game play are text-based (searchable) descriptions of images (not searchable). More at http://www.gwap.com.
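The ESP Game's core mechanic is simple to sketch: two players independently type labels for the same image, and the first label they agree on becomes a searchable description of that image. The function and data below are illustrative, not the actual game's implementation:

```python
# Minimal sketch of one ESP Game round: a label both players entered, skipping
# "taboo" words the game already knows for this image, becomes the image's label.

def esp_round(labels_a, labels_b, taboo=()):
    """Return the first label player A enters that player B also entered,
    excluding taboo words; None means the round ended with no agreement."""
    entered_b = set(labels_b) - set(taboo)
    for label in labels_a:
        if label in entered_b:
            return label  # match: this becomes a searchable description
    return None

label = esp_round(["dog", "grass", "frisbee"],
                  ["park", "frisbee", "dog"],
                  taboo=["dog"])
print(label)  # "frisbee": both players agreed, and "dog" was already known
```

The taboo list is what drives the game toward ever more specific labels: once the obvious words are used up, players must agree on less common descriptions, which is exactly the hard-for-computers work the game harvests.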

Following is a series of quotes and examples related to the importance, usefulness, and appeal of games.

As University of Minnesota researcher Brock Dubbels suggests, “Games provide the opportunity to experience something grand—flight simulators do not have the excitement that games do—games exaggerate and elevate action beyond normal experience to make them motivating and exciting. In World War 2, the likelihood of being in a dogfight was slim, but in the game ‘1942,’ you can find one around every corner. Games raise our level of expectation to the fantastic and our biochemical reward system pays out when we build expectation towards reward. Sometimes the reward leading up to the payout is greater than the reward at payout! A game structures interaction in ways that may not be available by default for special circumstances and projects. A game can also create bonds that hold people together through creating opportunities for relationships that one might not experience every day.”


Read the article in its entirety

