Games and classrooms do not have a monopoly on learning.

From an evolutionary perspective, fitting context to enable behavior should lead to extinction, not evolution.

What is meant by this is that when we become too comfortable in our environment, and are not forced to regularly adapt to new challenges, our likelihood of evolving diminishes. We prefer to operate with what we already know. This makes life easier, but it may reduce our fitness and success when we face unforeseen challenges. Still, we tend toward the familiar, safe, comfortable, and predictable. Perhaps this is why we buy the same products over and over; why we love sequels and series; and why we emphasize standardization in education.

When applied to learning and game design, this implies that people should prefer playing with patterns from memory, rather than learning new ones.

 

Perhaps this is why people prefer a sequel. They know the characters, the context, and the story arc, and will play essentially the same game again, preferring small twists, reveals, and challenges over scenarios they must learn from scratch. With prior knowledge, we can make predictions about the patterns in the game and be more successful right away.

This could be why a game like chess or go, or any game where we play another person, is motivating: we know the rules and patterns of play, we look for twists and variation in strategy, and we especially enjoy being surprised by a new tactic that forces us to see the game in another way. This idea of subverting player expectations is much like the experience of irony.

Irony is defined in a number of ways. For our purposes, the situational usage best captures how irony applies to games, and perhaps to events in a novel:

incongruity between the actual result of a sequence of events and the normal or expected result.

The idea is that we might predict a pattern for how events in a story might play out.

The expected, familiar patterns often found in sequels can lead to a conundrum in game design. But when familiarity is used to twist and subvert the player’s expectations, the designer may move into the world of the deep, subtle, and enlightened, the territory we typically describe as art.

How does one take a single player game and create subtle adaptations to game play?

This would certainly be useful in developing learning games. As we experience these subversions of familiar patterns, we may begin to question them. We may begin to see that an action in one event can have subtle effects in the future, such as becoming known as an untrustworthy villain and having your reputation precede you. This could create difficulty in a game where you need to barter with non-player characters or build alliances with other players.

What is being described here as subversion has been called a second-order effect in scientific analysis.

Modeling this kind of effect might be important in training higher order problem solving.

The developer who can twist an expected story arc to reveal something unexpected about the characters and content, leading to surprising game play, may provide a unique learning experience, as well as engaging and memorable game play.

What is meant by this twisting of plot and story arc? It is creating the unexpected. For example, in the game Braid, did you expect that Tim was actually what the princess was running from? This twist of plot is subversive. It makes the game surprising, and perhaps rattles the cage of our expectations. But can we do the same thing in another game and expect people to be just as surprised in the future? This pattern is something we can identify as a subversive game mechanic, or a second- or third-order effect.

In Fable 2, did you realize that your actions might not lead to an immediate reaction, but to a reaction later? It was not always clear which reaction resulted from a particular decision made by the player, but actions in one part of the story could lead to emergent circumstances in another. In some ways, this is like the butterfly effect of chaos theory: in a deterministic system with a number of choices, future conditions depend on initial conditions and respond non-linearly to small differences in what happened previously.


A second- or third-order effect is an unexpected narrative experience, where a cause produces an effect, which in turn becomes a cause leading to another effect:

A → B; B → C; ∴ A → C

So can we predict a behavior in a narrative based upon this? That our behaviors in A cause B, and perhaps keep us from achieving C?

As an example, what if a player spends all of their resources in A, which leads to a compromised situation in B? Perhaps the outcome of B then leads to failure in C?
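As a minimal sketch (all function names and thresholds are hypothetical), this kind of second-order effect can be modeled as state carried forward between encounters, so that a choice in A, not anything done in C, determines the final outcome:

```python
# Sketch of a second-order effect: each encounter's outcome depends on
# state produced by the previous one. Names and numbers are hypothetical.

def encounter_a(resources: int, spend: int) -> int:
    """Encounter A: the player chooses how much to spend; the rest carries forward."""
    return resources - spend

def encounter_b(resources: int) -> bool:
    """Encounter B: succeeds only if enough resources survived A."""
    return resources >= 50

def encounter_c(b_succeeded: bool) -> str:
    """Encounter C: its outcome is a second-order effect of the choice made in A."""
    return "victory" if b_succeeded else "failure"

# Overspending in A compromises B, which causes failure in C: A -> B -> C.
outcome = encounter_c(encounter_b(encounter_a(100, spend=80)))
print(outcome)  # failure
```

A frugal run (`spend=30`) leaves 70 resources, B succeeds, and C resolves to victory, even though nothing about C itself changed.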

Perhaps you are a spy and your goal is to observe someone in a restaurant. You are in a foreign country, but you speak the language fluently. The waiter asks for your order; you answer, but he responds in a way that implies he does not understand you. You try again, and again. He offers to speak in English to help you. Because it is important not to make a scene, you acquiesce and successfully order. But the person you are observing is immediately made aware that you do not belong, and leaves. Your cover is blown.

This pattern can be labeled a subversive mechanic. It can provide unexpected subtlety in a narrative by producing second- and even third-order effects, and it embeds the mechanic in the narrative and setting. Knowing this pattern in game design can be powerful. But will copying it make for innovative game play? Potentially, this pattern can be used in many ways. So a game designer must understand it conceptually, not just concretely. They must be able to employ this pattern in new contexts and narratives, even though, in some ways, the mechanic stays the same.

Just appropriating a mechanic is not enough. The developer needs to understand this pattern at a deeper conceptual level. The pattern can still surprise if it is embedded in a different story, in a different way. But this requires that the designer can see the subversive pattern stripped away from its context, that they have a crystallized understanding of the pattern. This means they can pull this subversive game mechanic out of a story, just as a skilled writer can structure irony in a novel.

To understand this, we might be better directed to study patterns in player expectation and interaction, such as subverting player expectations by presenting familiar patterns and twisting them. So perhaps grabbing a successful mechanic from one game will not lead to the same success in another. Just because I loved the cheese on my macaroni does not mean I will love it on my granola.

The key here is understanding, at a conceptual level, the pattern to be designed into a game. Just copying it will not do. There is a crystallization that allows one to generalize the pattern into a variety of plot structures and game play mechanics. This crystallization of a concept should result in generalization: the ability to apply it to new contexts.

Crystallization and Conceptual Learning

The process of crystallizing a pattern into a generalizable concept is one of the things we hope to achieve in schools. Many content areas have complex concepts that must fit under the umbrella of terminology. When we say “base”, what do we mean? We can explain it, and we can show it, but we may have to explain and show it in a number of contexts to make the concept generalizable. The term means different things in different academic contexts: what is a base in chemistry? Political science? Physical education? Math?

The point is that even though the term base has many connotations, there is a more general crystallized abstraction we might glean from it: a conceptual category that can be used with specificity in a number of content domains but is still generalizable. We develop these conceptual categories through comparison and contrast in usage, just as we look for conceptual properties in perceptual knowledge when classifying an object into a conceptual category. For example, when you look at a cup, a box, a bathtub, and a hat, you may notice the ability of each object to contain another object. The causal factor implies a usage: containment. By looking at many objects, we may abstract to a crystallized generalization.
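One hedged way to picture this crystallization computationally (the objects and their property lists below are purely illustrative): compare many exemplars and keep only the properties they all share, which is exactly the abstract category that survives the comparison.

```python
# Sketch: abstracting a conceptual category ("containment") from the
# shared properties of many perceptual exemplars. Data are hypothetical.

exemplars = {
    "cup":     {"can_contain", "graspable", "rigid"},
    "box":     {"can_contain", "rigid", "stackable"},
    "bathtub": {"can_contain", "rigid", "large"},
    "hat":     {"can_contain", "wearable"},
}

# The crystallized concept is whatever survives intersection across contexts.
concept = set.intersection(*exemplars.values())
print(concept)  # {'can_contain'}
```

With fewer exemplars the intersection stays cluttered (cup and box alone also share "rigid"); only many varied contexts strip the abstraction down to containment, which mirrors the point about needing examples across domains.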

There is even more trouble with this idea in learning games. Many learning games are developed to teach simple processes and facts through drill and practice. This is fine, but we typically scoff that this is hardly a game. It is memorization with no variation: you get rewarded for memorizing patterns. This is not like chess, or even tic-tac-toe, where there might be something surprising in playing another person, where the patterns are not so predictable.

Interestingly, psychologists have studied second- and third-order effects as a form of learning called classical conditioning. An individual might begin to expect a reward when a bell rings, and the bell alone might elicit a response. So the bell rings, and the person gets cake. Second-order (or higher-order) conditioning is a form of learning in which a stimulus is first made meaningful or consequential for an organism through an initial step of learning, like associating a bell with delicious cake; that stimulus is then used as a basis for learning about some new stimulus. For example, a person might first learn to associate a bell with cake (first-order conditioning), and then learn to associate a light with the bell (second-order conditioning).
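This two-step chain can be sketched with a Rescorla-Wagner-style associative update (the learning rate and trial count here are illustrative, not fitted to any data): the bell first acquires value from the cake, and the light then acquires value from the bell.

```python
# Hedged sketch of second-order conditioning using a Rescorla-Wagner-style
# update, V += alpha * (lam - V). Parameter values are illustrative only.

def condition(v: float, lam: float, alpha: float = 0.3, trials: int = 20) -> float:
    """Repeatedly pair a stimulus with an outcome of value `lam`."""
    for _ in range(trials):
        v += alpha * (lam - v)
    return v

# First-order: bell is paired with cake (outcome value 1.0).
v_bell = condition(0.0, lam=1.0)

# Second-order: light is paired with the bell; the bell's acquired value
# now serves as the "outcome" that trains the light.
v_light = condition(0.0, lam=v_bell)

print(round(v_bell, 3), round(v_light, 3))
```

The light ends up with nearly the bell's full associative value even though it was never paired with cake directly, which is the second-order effect in miniature.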

 

If a game was not designed to deliver some crystallized concepts as criteria, how can we attribute any learning outcome to it? In curriculum and instruction, we are asked to identify the criteria we are looking for in our assessments. So when we design a classroom lesson, we can point to the concepts we hope to assess in the content. If we are designing a game to deliver learning on complex concepts such as justice or beauty, we need clear identifiers, or criteria; otherwise we fall prey to relativity. The learning might be in there. The players might have gleaned the key criteria. Is that enough?

Serious games are very much like the tools used in psychological assessments and evaluations. Psychometric methods distinguish three types of assessment:

  • Formative assessments are measurement tools used to gauge growth and progress in a learning activity, and can be used in games to alter subsequent learning experiences. A formative assessment is a tool external to the learning activity, and typically occurs in the lead-up to a summative evaluation.
  • Summative assessments provide an evaluation, or final summarization, of learning. Summative assessment is characterized as assessment of learning, in contrast with formative assessment, which is assessment for learning. Summative assessments are also external to the learning activity, and typically occur at the end of the learning intervention.
  • An informative assessment guides and facilitates learning as part of the assessment: the assessment is the intervention. Successful participation in the learning activity is itself evidence that learning has taken place. No external measures are added on for assessment.
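A minimal sketch of the informative-assessment idea (criterion behaviors and event names are hypothetical): the game logs evidence of the target concept as the player acts, so the activity itself is the measure, with no bolted-on test.

```python
# Sketch: in-game actions double as assessment evidence. The criterion
# behaviors and event names below are hypothetical.

from collections import Counter

criteria = {"sorted_items_into_bins", "chose_larger_container"}

def assess(event_log: list) -> dict:
    """Count criterion behaviors observed during play."""
    evidence = Counter(e for e in event_log if e in criteria)
    return {"mastery": len(evidence) == len(criteria), "evidence": dict(evidence)}

log = ["moved_left", "sorted_items_into_bins", "chose_larger_container", "jumped"]
result = assess(log)
print(result["mastery"])  # True: both criterion behaviors were observed in play
```

A log containing only incidental actions ("jumped") yields no mastery, which is the sense in which successful participation itself verifies that learning took place.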

Games are often used as the very definition of an informative assessment. This makes sense: a game, by its nature, provides an activity along with assessments, measures, and evaluation. What, why, and how a game measures learning is of primary importance, and this is why serious game designers must learn assessment methods from the field of psychometrics if serious games are to grow as diagnostic tools, assessments, and evaluations.

If a game is to act as an informative assessment, it must stress meaningful, timely, and continuous feedback about accurately depicted learning concepts and processes. As in an informative assessment, feedback in a game can be a powerful part of the assessment process. As learners act within the game’s rule environment, they may learn the rules and tools through trial and error, eventually developing tactical approaches and potentially formulating strategies from the possibilities for action deduced from the in-game assessment criteria. This can be powerful.

Evidence supports the power of this approach. Research findings from over 4,000 studies indicate that informative assessment has the most significant impact on achievement (Wiliam, 2007). When serious games are built with the same care as an informative assessment, using methods from psychometrics, they can be just as effective.

Currently, most games are not designed as informative assessments. This means that learning in a serious game might suffer from the Vegas Effect. For a game to act as an informative assessment, it must accurately measure the learning of the concepts, and the concepts from the game must transfer to other performance contexts beyond the game. To achieve this, the issue of construct validity must be addressed.

For a serious game to have construct validity, the training interventions it presents must be designed with an emphasis on internal and external validity (what we model, how we measure it, and how it is presented in the game):

  • External validity: the ability to generalize in-game learning to other contexts. To what extent can a training effect from a game be generalized to other populations (population validity), other settings (ecological validity), other treatment variables, and other measurement variables?
  • Internal validity: the adequacy of the study design, or in this case the game design, in establishing that the intervention was the only possible cause of a change in the player’s learning.

To do this, serious game development requires valid concepts for modeling, implementing, and assessing what is to be learned, as well as for how it will be measured outside the game. This is essential for ROI (return on investment) analysis. Serious game development requires research and construct validity to conduct ROI analysis and to avoid the Vegas Effect. Learning that happens in games should not stay in games.


 

Games and classrooms do not have a monopoly on learning.

We learn from everything. Learning is our natural state. When we quit learning, we begin dying; it is as if we have chosen to limit our ability to change with context. This implies that we seek to limit context, and thus find success with familiar patterns.

From an evolutionary perspective, fitting context to enable behavior should lead to extinction, not evolution. It should also imply that people prefer playing with patterns from memory rather than learning new ones. Perhaps this is why people prefer a sequel.

Does combining awesome mechanics lead to an Übergame?

I have not found this to be so. I have found that really great games integrate mechanics as part of fluid gameplay and story. There is a reason for a mechanic: it moves the player forward in the story, and hopefully leads to an interesting surprise in gameplay. Just appropriating a mechanic is not enough.

I have found the same issue in learning games and assessments. We cannot simply examine lots of serious games to determine what makes a serious game effective. Fun and learning are outcomes of a successful blend of mechanics moving the story forward. This often comes from subverting the player’s expectations about story patterns.


It is the same in learning games: it seems a tall order to expect that doing the same assessments in a new medium will lead to different results. We cannot take multiple-choice paradigms from worksheets, put them in branching narratives, and expect the game play of a AAA title. But neither can we take a AAA title and expect that it can diagnose depression or lead to improved problem solving, comprehension, or working memory. We must design the criteria for the diagnosis of depression into the mechanics and narrative if we wish to diagnose and treat depression with a game.

Until we do this, we may be just spinning our oatmeal.

How do we structure learning in a game that leads to measurable improvement outside of the game?

Currently, many games are not required to prove that the game play leads to learning. All we have to prove is that the information was delivered. The onus is on the learner. The learner is given the information and must act to learn and integrate it. If not, they may lose their job or be reprimanded.

Folks who pay for serious games know that they are getting a return on investment because of the savings in the delivery platform. Because of this, they do not examine the efficacy of the learning structured in the game. Why should they?

If they are delivering training to thousands, or even hundreds of employees, they have already saved money just through the delivery model.

It is not the same for educational games. Educational games are supposed to deliver learning. If we are going to say that games are more effective than traditional classroom learning, shouldn’t we have evidence to support that statement?

Games often have the same problem as teachers.

Learning may not transfer to improved performance outside of a lesson or a game. Maybe learning takes place, but it is incidental learning. Incidental learning can be useful. An educator can leverage it as prior knowledge to build upon; they can debrief and ask the learner to draw on the tacit knowledge from game play to develop and form concepts.

But hypotheticals and incidental learning do not create comprehension. The process of turning content into concept has been called transduction, where perceptual knowledge from sensorimotor stimulation and memory is transduced into conceptual knowledge. We assign meaning. According to embodiment theorists, this is done by assigning causation and other meaningful content to objects and creating conceptual categories. Take, for example, the concept “containment”. We may look at many different objects that contain, and then create a term like “containment”. This term must be learned from looking at many examples.

Luckily, we have a fairly good description of how this happens behaviorally and cognitively, though neuroscience is still trying to parse it out topographically in the brain. But is it a good idea to study other learning games for delivering this? Why not use the theory and apply it in a relevant way, rather than just implementing a copycat mechanic?

By studying games, it seems we are back to behaviorist paradigms like operant conditioning, where the brain does not exist, where we fixate on the stimulus and response rather than on ideas about why we respond the way we do. Without a model of learning (behavior that identifies conceptualization and comprehension), we may struggle mightily to measure the learning we hope to deliver. What does learning look like? What structures in a game deliver it? We cannot really put the game first, can we?

If a game maker hopes to create a game that can diagnose or treat schizophrenia, they must have a robust model of how to measure schizophrenia, and how to measure "not-schizophrenia." There must be convergent and divergent validity in the measures: what is, and what is not, schizophrenia. The same goes for problem solving and comprehension (learning).

How will we get this from studying a game? I am earnest in asking this.

Games can deliver many outcomes depending upon the user’s expectations, goals, and prior experience, and whether these align with those of the developer and instructor.

As an example: a child can be given a banana, turn around and make a long-distance call with it, stop a bank robbery with it, and then have it as a snack. The use and purpose may differ depending upon the goals of the user. If the developer limits goals, they have restricted play, and diminished the degree to which it might be called a game.

So, the study of games for learning outcomes would be a task that would seemingly require specificity for each game and each purpose.

How would we create a general framework for that?

My own position is that games can be repurposed for learning. I did this with middle school students, and it led to documented, statistically significant improvement in standardized test scores in reading. But this was about what I put around the games.

But serious games are different. They are both assessments and instructional interventions. They are the epitome of informative assessments.

Thus, we need an operational description of what learning looks like when it happens, so we can design the game with mechanics that incorporate the assessment criteria across multiple methods and multiple traits. In 1955, Cronbach and Meehl described this issue of construct validity as a nomological network. Campbell and Fiske built on this and created the Multitrait-Multimethod Matrix: the MTMM model.

Often game designers say that this kills games, that it structures the fun out. Well, this is true only if you cannot imaginatively integrate learning metrics into engaging game narratives and mechanics. The same is said about killing learning with assessment. Well, get better at game design, and get better at structuring assessments. This seems more like a crisis of imagination and implementation than oppressive testing and game design. So, if you are going to measure the effects of a game on depression, you need criterion measures and methods for construct validity.

When we diagnose, we need a model of depression; we have the DSM-V for this. The same can be said about learning: what and how? We have lots and lots of models for this. In fact, this approach may extend research in learning. Designing games for learning is comparable to hypothesis testing.

We need models to apply to games in both analysis and development. Bloom’s Taxonomy will not cut it any more; it was never meant for this anyway, if you read Bloom’s book. Bloom gave us general categories to begin thinking about learning processes and how to structure them, but his ideas did not lead to empirical validation of those principles. Learning games need evidence of transfer: that learning which happens in games does not stay in games. So we need actual psychometric measures, as well as some imagination in delivering them as game mechanics and plot twists.

 

POSTED ON FRIDAY, SEPTEMBER 21, 2012 AT 9:51 AM
FILED UNDER FEATURED

 

What we cannot know or do individually, we may be capable of collectively.

My research examines the transformation of perceptual knowledge into conceptual knowledge. Conceptual knowledge can be viewed as crystallized, which means that it has become abstracted and is often symbolized in ways that do not make the associated meaning obvious. Crystallized knowledge is the outcome of fluid intelligence, or the ability to think logically and solve problems in novel situations, independent of acquired knowledge. I investigate how groups and objects may assist in crystallization of knowledge, or the construction of conceptual understanding.

I am currently approaching this problem from the perspective that cognition is externalized and extended through objects and relationships. This view posits that skill, competence, and knowledge are learned through interaction aided by objects imbued with collective knowledge.

Groups make specialized information available through objects and relationships so that individual members can coordinate their actions and do things that would be hard or impossible for them to enact individually.  To examine this, I use a socio-cognitive approach, which views cognition as distributed, where information processing is imbued in objects and communities and aids learners in problem solving.

This socio-cognitive approach is commonly associated with cognitive ethnography and the study of social networks. In particular, I have special interest in how play, games, modeling, and simulations can be used to enhance comprehension and problem solving through providing interactive learning. In my initial observational studies, I have found that games are structured forms of play, which work on a continuum of complexity:

  • Pretense, imagery and visualization of micro worlds
  • Tools, rules, and roles
  • Branching / probability

Games hold communal knowledge, which can be learned through game play. An example comes from the board game Ticket to Ride. In this strategy game, players take on the role of a railroad tycoon in the early 1900s. The goal is to build an empire that spans the United States while making shrewd moves that block your opponents from completing their freight and passenger runs to various cities. Game play scaffolds the learner in the history and implications of early transportation through taking on the role of an entrepreneur and learning the context and process of building up a railroad empire. In the course of the game, concepts are introduced, along with language and value systems, based upon the problem space created by the game mechanics (artifacts, scoring, rules, and language). The game can be analyzed as a cultural artifact containing historical information; as a vehicle for content delivery in a curriculum; and as an intervention for studying player knowledge and decision-making.

I have observed that learners interact with games with a growing grasp of the game as a system. As players gain top sight, a view of the whole system, they play with greater awareness of the economy of resources and, in some cases, an aesthetic of play. For beginning players, I have observed the following progression:

  1. Trial and error – forming a mental representation, or situation model of how the roles, rules, tools, and contexts work for problem solving.
  2. Tactical trials – a successful tactic is generated to solve problems using the tools, rules, roles, and contexts.  This tactic may be modified for use in a variety of ways as goals and context change in the game play.
  3. Strategies—the range of tactics results in strategies that come from a theory of how the game works. This approach to problem solving indicates a growing awareness of systems knowledge and of the purpose or criteria for winning, and is a step towards top sight. Players understand that there are decision branches, and each branch comes with risk and reward they can evaluate in the context of economizing resources.
  4. Layered strategies—the player is now making choices based upon managing resources, economizing and playing for optimal success with a well-developed mental representation of the game’s criteria for winning and how to achieve a high score rather than just finish.
  5. Aesthetic of play—the player understands the system and has learned to use and exploit ambiguities in the rules and environment to play with an aesthetic that sets the player apart from others. The game play is characterized with surprising solutions to the problem space.

For me, games are a structured form of play. As an example, a game may playfully represent an action with associated knowledge, such as becoming a railroad tycoon, driving a high performance racecar, or even raising a family. Games always involve contingent decision-making, forcing the players to learn and interact with cultural knowledge simulated in the game.

Games currently take a significant investment of time and effort to collectively construct. These objects follow in a history of collective construction by groups and communities. Consider cartography and the creation of a map as an example of collective distributed knowledge imbued in an object. According to Hutchins (1996),

“A navigation chart represents the accumulation of more observations than any one person could make in a lifetime. It is an artifact that embodies generations of experience and measurement. No navigator has ever had, nor will one ever have, all the knowledge that is in the chart.”

A single individual can use a map to navigate an area with competence, if not expertise. Observing an individual learning to use a map, or even construct one is instructive for learning about comprehension and decision-making. Interestingly, games provide structure to play, just as maps and media appliances provide structure to data to create information. Objects such as maps and games are examples of collective knowledge, and are what Vygotsky termed a pivot.

The term pivot was initially conceptualized in describing children’s play, particularly play with a toy. A toy is a representation that aids knowledge construction in early childhood development. This is the transition where children move from recognitive play to symbolic and imaginative play: the child may play with a phone the way it is supposed to be used, showing they can use it (recognitive play), or, in symbolic or imaginative play, pretend a banana is the phone.

This is an important step, since representation and abstraction are essential in learning language, especially print and alphabetic systems for reading and other discourse. In this sense, play provides a transitional stage in this direction whenever an object (for example, a stick) becomes a pivot for severing the meaning of horse from a real horse; the child cannot yet detach thought from object (Vygotsky, 1976, p. 97). For Vygotsky, play represented a transition in comprehension and problem solving, where the child moves from external processing, imagination in action, to internal processing, imagination as play without action.

In my own work, I have studied the play of school children and adults as learning activities. This research has informed my work in classroom instruction and game design. Learning activities can be structured as games, extending the opportunity to learn content and extending the context of the game into other aspects of the learner’s life, providing performance data, allowing for self-improvement with feedback, and yielding data that can be assessed, measured, and evaluated for policy.

My research and publications have been informed by my work as a tenured teacher and software developer. A key feature of my work is the importance of designing for learning transfer and construct validity. When I design a learning environment, I do so with research in mind. Action research allows for reflection on and analysis of what I created and what the learners experienced, and an opportunity to build theory. What is unique about my work is the systems approach, and the way I reverse engineer play into a deep and effective tool for transformative learning, where pleasurable activities can count as learning.

Although I have published using a wide variety of methodologies, cognitive ethnography is a methodology typically associated with distributed cognition, and examines how communities contain varying levels of competence and expertise, and how they may imbue that knowledge in objects. I have used it specifically on game and play analysis (Dubbels 2008, 2011). This involves observation and analysis of Space or Context—specifically conceptual space, physical space, and social space. The cognitive ethnographer transforms observational data and interpretation of space into meaningful representations so that cognitive properties of the system become visible (Hutchins, 2010; 1995). Cognitive ethnography seeks to understand cognitive process and context—examining them together, thus, eliminating the false dichotomy between psychology and anthropology. This can be very effective for building theories of learning while being accessible to educators.

My current interest is in combining cognitive ethnographic methodology with traditional forms of inquiry. This provides an opportunity to move between inductive and deductive inquiry and observation in order to build a nomological network (Cronbach & Meehl, 1955), using measures and quantified observations with the multitrait-multimethod matrix (Campbell & Fiske, 1959) to establish construct validity (Cook & Campbell, 1979; Campbell & Stanley, 1966), especially in relation to comprehension and problem solving based upon the Event Indexing Model (Zwaan & Radvansky, 1998).

We distribute knowledge because it is impossible for a single human being, or even a group, to have mastery of all knowledge and all skills (Levy, 1997). For this reason, I study the access to and quality of collective group relations and objects, and the resulting comprehension and problem solving. The use of these objects and relations can scaffold learners and inform our understanding of how perceptual knowledge is internalized and transformed into conceptual knowledge through learning and experience.

Specialties

Educational research in cognitive psychology, social learning, identity, curriculum and instruction, game design, theories of play and learning, assessment, instructional design, and technology innovation.

Additionally:

The convergence of media technologies now allows for the collection, display, creation, and broadcast of information as narrative, image, and data. This convergence of function makes two ideas important in the study of learning:

  • The ability to create media communication through narrative, image, data analysis, and information graphics is becoming more accessible to non-experts through media appliances such as phones, tablets, game consoles, and personal computers.
    • These media appliances have taken very complex behaviors such as film production, which in the past required teams of people with special skill and knowledge, and have imbued those skills and that knowledge in hand-held devices that are easy to use and available to the general population.
    • This accessibility allows novices to learn complex media production, analysis, and broadcast, and allows these devices to be studied as objects imbued with knowledge and skill, as externalized cognition. Through these devices, the general population may learn complex skills and knowledge that in the past required years of specialized training. The interaction of individuals learning to use these appliances can be studied as a progression of internalizing the knowledge and skill imbued in the objects.
    • The convergence of media technologies into small, even handheld, devices emphasizes that the technology for producing media may change, but the narrative has remained relatively consistent.
  • This consistency of media as narrative, imagery, and data analysis emphasizes the importance of the continued study of narrative comprehension and problem solving through the use of these media appliances.

 

I don’t think anyone would disagree — fostering creativity should be a goal of classroom learning.

However, the terms creativity and innovation are often misused. When used, they typically imply that REAL learning cannot be measured. Fortunately, we now know A LOT about learning and how it happens. It is measurable, and we can design learning environments that promote it. It is the same with creativity as with intelligence: we can promote growth in both through creative approaches to pedagogy and assessment. Data-driven instruction does not kill creativity; it should promote it.

One of the ways we might look at creativity and innovation is through the much-maligned tradition of intelligence testing, as described in Wikipedia:

Fluid intelligence or fluid reasoning is the capacity to think logically and solve problems in novel situations, independent of acquired knowledge. It is the ability to analyze novel problems, identify patterns and relationships that underpin these problems and the extrapolation of these using logic. It is necessary for all logical problem solving, especially scientific, mathematical and technical problem solving. Fluid reasoning includes inductive reasoning and deductive reasoning, and is predictive of creativity and innovation.

Crystallized intelligence is indicated by a person’s depth and breadth of general knowledge, vocabulary, and the ability to reason using words and numbers. It is the product of educational and cultural experience in interaction with fluid intelligence and also predicts creativity and innovation.

The Myth of Opposites

Creativity and intelligence are not opposites. It takes both for innovation.

What we often lack are creative ways of measuring learning growth in assessments. When we choose to measure growth with the same summative evaluations and worksheets over and over, we nurture boredom and kill creativity.

To foster creativity, we need to adopt and implement pedagogy and curriculum that promotes creative problem solving, and also provides criteria that can measure creative problem solving.

What is needed are ways to help students learn content in creative ways through the use of creative assessments.

We often confuse the idea of learning creatively with trial and error and play, free of any kind of assessment, as if the Mona Lisa were created through just free play and doodling, and as if assessment kills creativity. Assessments provide learning goals.

Without learning criteria, students are left to make sense of the problem put before them with questions like “what do I do now?” (ad infinitum).

The role of the educator is to design problems so that the path to a solution becomes transparent. This is done by providing information about process, outcome, and quality criteria . . . assessment is how the work is to be judged. For example: “For your next assignment, I want a boat that is beautiful and really fast. Here are some examples of boats that are really fast. Look at the hull, the materials they are made with, and so on, and design me a boat that goes very fast. Tell me why it goes fast. Tell me why it is beautiful.” Now use the terms from the criteria. What is beautiful? Are you going to define it? How about fast? Fast compared to what? Open-ended, interest-driven, free play assignments might be motivating, but without criteria they lead to quick frustration and lots of “what do I do now?”

But play and self-interest are not the problem here. The problem is the way we approach assessment.

Play is often described as a range of voluntary, intrinsically motivated activities normally associated with recreational pleasure and enjoyment. But pleasure and enjoyment still come from judgements about one’s work, just like assessment, whether one is finger painting or solving a differential equation. The key feature is that play seems to involve self-evaluation and the discovery of key concepts and patterns. Assessments can be constructed to scaffold and extend this, and the same process can be structured in classrooms through assessment criteria.

Every kind of creative play activity involves evaluation and self-judgement: the individual makes judgements about pleasure, and often about why it is pleasurable, because they want to replicate that pleasure in the future. And, oddly enough, learning is pleasurable. So when we teach through a pleasurable activity, the learning may be pleasurable. This means chunking the learning into larger meaning units, such as complex terms and concepts that represent ideas, patterns, objects, and qualities. Thus, crystallized intelligence can be constructed through play as long as the play experience is connected in ways that help the learner define and comprehend the terms (the assessment criteria). So when learners talk about their boats, perhaps they should be asked to sketch them first, and then use specific terms to explain their designs:

Bow is the frontmost part of the hull.

Stern is the rearmost part of the hull.

Port is the left side of the boat when facing the Bow.

Starboard is the right side of the boat when facing the Bow.

Waterline is an imaginary line circumscribing the hull that matches the surface of the water when the hull is not moving.

Midships is the midpoint of the LWL. It is halfway from the forwardmost point on the waterline to the rearmost point on the waterline.

Baseline is an imaginary reference line from which vertical distances are measured. It is usually located at the bottom of the hull.

Along with the learning activity and targeted learning criteria and content, the student should be asked a guiding question to help structure their description.

So, how do these parts affect the performance of the whole?

Additionally, the learner should adopt the language (the criteria) of the rubric to build comprehension, using perception, experience, similarities, and contrasts to understand Bow and Stern, or even Beauty.

Experiential Learning for Fluidity and Crystallization

What the tradition of intelligence testing offers is insight into how an educator might support students. What we know is that intelligence is not fixed; it can change through learning opportunities. The goal of the teacher should be to provide experiential learning that extends fluid intelligence through developing problem solving, and to link this process to crystallized concepts: vocabulary terms that encapsulate complex processes, ideas, and descriptions.

The real technology in a 21st Century Classroom is in the presentation and collection of information. It is the art of designing assessment for data-driven decision making. The role of the teacher should be in grounding crystallized academic concepts in experiential learning, with assessments that provide structure for creative problem solving. The teacher creates assessments where the learning is the assessment, and the learner is scaffolded through the activity by the assessment criteria.

A rubric, which provides criteria for quality and excellence, can scaffold creativity, innovation, and content learning simultaneously. A well-conceived assessment guides students to understand descriptions of quality and helps them understand crystallized concepts.

An example of a criteria-driven assessment looks like this:

| | Purpose & Plan | Isometric Sketch | Vocabulary | Explanation |
|---|---|---|---|---|
| Level up | Has identified an event and hull design, with reasoning for appropriateness. | Has drawn a sketch where length, width, and height are represented by lines 120 degrees apart, with all measurements in the same scale. | Understanding is clear from the use of five key terms from the word wall to describe how and why the boat hull design will be successful for the chosen event. | Clear connection between the hull design, event, sketch, and important terms from the word wall, and next steps for building a prototype and testing. |
| Approaching | Has chosen a hull that is appropriate for the event but cannot connect the two. | Has drawn a sketch where length, width, and height are represented. | Uses five key terms but struggles to demonstrate understanding of the terms in usage. | Describes design elements, but cannot make the connection of how they work together. |
| Do it again | Has chosen a hull design, but it may not be appropriate for the event. | Has drawn a sketch, but it does not have length, width, and height represented. | Does not use five terms from the word wall. | Struggles to make a clear connection between design elements at the conceptual design stage. |

What is important about this rubric is that it guides the learner in understanding quality and assessment. It also familiarizes the learner with key crystallized concepts as part of the assessment descriptions. In order to be successful in this playful, experiential activity (boat building), the learner must learn to comprehend and demonstrate knowledge of the vocabulary scattered throughout the rubric, such as isometric, reasoning, and so on. This connection of complex terminology grounded in experience is what builds knowledge and competence. When an educator coaches a student in connecting their experiential learning with the assessment criteria, they construct crystallized intelligence by grounding the concept in experiential learning, and potentially expand fluid intelligence through awareness of new patterns in form and structure.

Play is Learning, Learning is Measurable

Just because someone plays or explores does not mean the learning is immeasurable. In fact, research on creative breakthroughs demonstrates that great innovators learned through years of dedicated practice and were often judged, assessed, and evaluated. This feedback from their teachers led them to new understanding and new heights. Great innovators often developed crystallized concepts that resulted from experience in developing fluid intelligence. This can come from copying the genius of others by replicating their breakthroughs; it comes from repetition and making basic skills automatic, so that they could explore the larger patterns resulting from their actions. It was the result of repetition and exploration, where they could reason, experiment, and experience without thinking about the mechanics of their actions. This meant learning the content and skills of the knowledge domain and developing some level of automaticity. What sets innovators apart, it seems, is tenacity: being playful in their work, and working hard at their play.

According to Thomas Edison:

During all those years of experimentation and research, I never once made a discovery. All my work was deductive, and the results I achieved were those of invention, pure and simple. I would construct a theory and work on its lines until I found it was untenable. Then it would be discarded at once and another theory evolved. This was the only possible way for me to work out the problem. … I speak without exaggeration when I say that I have constructed 3,000 different theories in connection with the electric light, each one of them reasonable and apparently likely to be true. Yet only in two cases did my experiments prove the truth of my theory. My chief difficulty was in constructing the carbon filament. . . . Every quarter of the globe was ransacked by my agents, and all sorts of the queerest materials used, until finally the shred of bamboo, now utilized by us, was settled upon.

On his years of research in developing the electric light bulb, as quoted in “Talks with Edison” by George Parsons Lathrop in Harper’s magazine, Vol. 80 (February 1890), p. 425.

So when we encourage kids to be creative, we must also understand the importance of all the content and practice necessary for a creative breakthrough. Edison was taught how to be methodical, critical, and observant. He understood the known patterns and made variations. It is important to know the known forms in order to understand the importance of breaking them. This may involve copying someone else’s design or ideas. Edison also speaks to this:

Everyone steals in commerce and industry. I have stolen a lot myself. But at least I know how to steal.

Edison stole ideas from others (just as Watson and Crick were accused of doing). The point Edison seems to be making here is that he knew how to steal, meaning he saw how the parts fit together. He may have taken ideas from a variety of places, but he had the knowledge, skill, and vision to put them together. This synthesis of ideas took awareness of the problem, the outcome, and how things might work, plus lots and lots of experience and practice.

To attain this level of knowledge and experience, perhaps stealing ideas, or copying and imitation, are not a bad idea for classroom learning. However, copying someone else in school is viewed as cheating rather than as a starting point. Perhaps instead we can take the criteria of examples and design classroom problems in ways that allow discovery and the replication of prior findings (the basis of scientific laws). It is often said that imitation is the greatest form of flattery. Imitation is also one of the ways we learn. In the tradition of play research, mimesis is imitation; Aristotle held that it was “simulated representation”.

The Role of Play and Games

In closing, my hope is that we not use the terms “creativity” and “innovation” as suitcase words to diminish such things as minimum standards. We need minimum standards.

But when we talk about teaching for creativity and innovation, we need to start with the way we gather data for assessment. Assessments are often unimaginative in themselves, and they are applied in ways that distract from learning because they have become the learning. One of the worst outcomes of this practice is that students believe they are knowledgeable after passing a minimum-standards test. This is the soft bigotry of low expectations. Assessment should be adaptive, criteria-driven, and modeled as a continuous improvement cycle.

This does not mean that we must drill and kill kids with grinding, mindless repetition. Kids will grind toward a larger goal when they are offered feedback on their progress. They do it in games.

Games are structured forms of play. They are criteria driven, and by their very nature, games assess, measure, and evaluate. But they are only as good as their assessment criteria.

These concepts should be embedded in creative, active inquiry that allows students to embody their learning and memory. However, many of the creative, inquiry-based lessons I have observed tend to ignore the focus on academic language, the crystallized concepts: “what is fast?”, “what is beauty?”, “what is balance?”, or “what is conflict?” The focus seems to be on interacting with content rather than building and chunking the concepts with experience. When Plato describes the world of forms and wants us to understand the essence of the chair, i.e., “what is chairness?”, we may have to look at a lot of chairs to understand chairness. But this is how we build conceptual knowledge, and it should be considered when constructing curriculum and assessment. A guiding curricular question should be:

How does the experience inform the concepts in the lesson?

There is a way to use data-driven instruction in very creative lessons, just like the very unimaginative drill and kill approach. Teachers and assessment coordinators need to take the leap and learn to use data collection in creative ways in constructive assignments that promote experiential learning with crystallized academic concepts.

If you have kids make a diorama of a story, have them use the concepts that are part of the standards and testing: plot, character, theme, setting, etc. Make them demonstrate and explain. If you want kids to learn physics, have them make a boat and connect the terms through discovery. Use their inductive learning and guide them to conceptual understanding. This can be done through the use of informative assessments, such as rubrics and scales. Evaluation and creativity are not contradictory or mutually exclusive. These seeming opposites are complementary, and can be brought together by embedding the crystallized, higher-order concepts into meaningful work.

The visual span for reading refers to the range of letters, formatted as in text, that can be recognized reliably without moving the eyes. It is likely that the size of the visual span is determined primarily by characteristics of early visual processing. It has been hypothesized that the size of the visual span imposes a fundamental limit on reading speed (Legge, Mansfield, & Chung, 2001). The goal of the present study was to investigate developmental changes in the size of the visual span in school-age children, and the potential impact of these changes on children’s reading speed. The study design included groups of 10 children in 3rd, 5th, and 7th grade, and 10 adults. Visual span profiles were measured by asking participants to recognize letters in trigrams (random strings of three letters) flashed for 100 ms at varying letter positions left and right of the fixation point. Two print sizes (0.25° and 1.0°) were used. Over a block of trials, a profile was built up showing letter recognition accuracy (% correct) versus letter position. The area under this profile was defined to be the size of the visual span. Reading speed was measured in two ways: with Rapid Serial Visual Presentation (RSVP) and with short blocks of text (termed Flashcard presentation). Consistent with our prediction, we found that the size of the visual span increased linearly with grade level and it was significantly correlated with reading speed for both presentation methods. Regression analysis using the size of the visual span as a predictor indicated that 34% to 52% of variability in reading speeds can be accounted for by the size of the visual span. These findings are consistent with a significant role of early visual processing in the development of reading skills.
Keywords: Letter Recognition, Reading speed, Development

Developmental Changes in the Visual Span for Reading

MiYoung Kwon,a Gordon E. Legge,a and Brock R. Dubbelsb
a Department of Psychology, University of Minnesota, Elliott Hall, 75 East River Rd. Minneapolis, MN 55455 USA
b College of Education & Human Development, University of Minnesota, Burton Hall, 178 Pillsbury Dr., Minneapolis MN 55455 USA
Corresponding Author: MiYoung Kwon, 75 East River Rd, Minneapolis, MN, TEL: 612-296-6131; EMAIL:kwon0064@umn.edu
The publisher’s final edited version of this article is available at Vision Res.
1. INTRODUCTION
Children’s reading speed increases throughout the school years. According to Carver (1990), from grade 2 to college, the average reading rate increases about 14 standard-length words per minute each year. Learning to read involves becoming proficient in the phonological, linguistic and perceptual components of reading (Aghababian & Nazir, 2000). By age 7, normally sighted children reach nearly adult levels of visual acuity (Dowdeswell, Slater, Broomhall, & Tripp, 1995). By first grade, typically 6 years of age, most of them know the alphabet. Nevertheless, reading speed takes a long time to reach adult levels.
Many studies have addressed potential explanations for developmental changes in reading skills. Because it is often assumed that visual development is complete by the beginning of grade school, most studies have focused on the role of phonological or linguistic skills in learning to read (e.g., Adams, 1990; Goswami & Bryant, 1990; Muter, Hulme, Snowling, & Taylor, 1997). Consistent with this focus, one widely accepted view is that linguistic skills are predictive of reading performance and serve as the locus of differences in reading ability. According to this view, skilled and less skilled readers extract the same amount of visual information during the time course of an eye fixation, but skilled readers have more rapid access to letter name codes (e.g., Jackson & McClelland, 1979; Neuhaus, Foorman, Francis, & Carlson, 2001), make better use of linguistic structure to augment the visual information (Smith, 1971), or process the information more efficiently through a memory system (Morrison, Giordani, & Nagy, 1977) (as cited in Mason, 1980, p. 97). It is further argued that inefficient eye movement control observed in less skilled readers is a reflection of linguistic processing difficulty (Rayner, 1986, 1998) rather than a symptom of perceptual difference per se.
Stanovich and colleagues have critiqued the general view that differences in reading skill are primarily due to top-down linguistic influences. See Stanovich (2000, Ch. 3) for a review. Stanovich (2000) has summarized findings showing that recognition time for isolated words is highly correlated with individual differences in reading fluency. This work has focused interest on the speed of perceptual processing, rather than top-down cognitive or linguistic influences, in accounting for individual differences in normal reading performance. The differences in word-recognition time among normally sighted subjects could be due to differences in the transformation from visual to phonological representations of words, or to differences at an earlier, purely visual, level of representation. In short, it remains plausible that individual differences in reading skill, and also the development of reading skill, are at least partially due to differences in visual processing.
Five lines of evidence implicate vision as a factor influencing reading development. 1) The characteristics of children’s reading eye movements differ from those of adults, showing smaller and less precise saccades than adults (Kowler & Martins, 1985). 2) Mason and Katz (1976) found that good and poor readers among 6th-grade children differed in their ability to identify the relative spatial position of letters. Farkas and Smothergill (1979) also found that performance on a position encoding task improved with grade level in children in 1st, 3rd and 5th grade. 3) It was found that children’s reading ability was associated with orientation errors in letter recognition such as confusing d and b, or p and q, stressing the role of visual-orthographic skill in reading (e.g., Davidson, 1934, 1935; Cairns, & Setward, 1970; Terepocki, Kruk, & Willows, 2002). 4) More direct evidence for the involvement of visual processing in children’s reading development was obtained by O’Brien, Mansfield and Legge (2005). They observed that the critical print size for reading decreases with increasing age. (Critical print size refers to the smallest print size at which fast, fluent reading is possible.) A similar character-size dependency of reading performance was also observed by Hughes and Wilkins (2000) and Cornelissen et al. (1991). 5) Letter recognition, a necessary component process in word recognition (e.g., Pelli, Farell, & Moore, 2003), is known to be degraded by interference from neighboring letters (Bouma, 1970). This crowding effect decreases with age in school-age children (Bondarko & Semenov, 2005) and is significantly worse in children with developmental dyslexia compared with normal readers (Spinelli, De Luca, Judica, & Zoccolotti, 2002). It should also be noted that there is a related debate in the literature over the role of visual factors in dyslexia, especially the impact of visual processing in the magnocellular pathway.
For competing views, see the reviews by Stein and Walsh (1997) and Skottun (2000a; 2000b).
Collectively, the empirical findings briefly summarized above suggest a role for early visual processing in the development of reading skills. The question of whether there is an early perceptual locus for reading differences is an important one to resolve both for a better understanding of the reading process and for remediation purposes. In the present paper, we ask whether vision plays a role in explaining the known developmental changes in reading speed.
Legge, Mansfield and Chung (2001) studied the relationship between reading speed and letter recognition. They proposed that the size of the visual span (the range of letters, formatted as in text, that can be recognized reliably without moving the eyes) covaries with reading speed. They also proposed that shrinkage of the visual span may play an important role in explaining reduced reading speed in low vision. Work in our lab has shown that for adults with normal vision, manipulation of text contrast and print size (Legge, Cheung, Yu, Chung, Lee, & Owens, 2007), character spacing (Yu, Cheung, Legge, & Chung, 2007), and retinal eccentricity (Legge, et al., 2001) produce highly correlated changes in reading speed and the size of the visual span. Pelli, Tillman, Freeman, Su, Berger, and Majaj (in press) have recently shown that a similar concept, which they term “uncrowded span,” is directly linked to reading speed. The influential role of the size of the visual span in reading speed was also demonstrated in a computational model called “Mr. Chips”, which uses the size of the visual span as a key parameter (Legge, Klitz, & Tjan, 1997; Legge, Hooven, Klitz, Mansfield, & Tjan, 2002). These empirical and theoretical findings provide growing evidence for a linkage between reading speed and the size of the visual span.
We measured the visual spans of children at three grade levels to examine developmental changes in early visual processing. The size of the visual span was measured using a trigram (random strings of three letters) identification task (Legge, et al., 2001). In this method, participants are asked to recognize letters in trigrams flashed briefly at varying letter positions left and right of the fixation point, as shown in the top panel of Figure 1. Over a block of trials, a visual-span profile is built up – a plot of letter recognition accuracy (% correct) as a function of letter position left and right of fixation – as shown in the bottom panel of Figure 1. These profiles quantify the letter information available for reading. The method of measurement means that the profiles are largely unaffected by oculomotor factors and top-down contextual factors. Trigram identification captures two major properties of visual processing required for reading: letter identification and encoding of the relative positions of letters.
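As a rough sketch of how such a profile might be summarized, the snippet below builds a toy visual-span profile and computes the area under it. The accuracy values are invented for illustration, and the studies themselves convert accuracy into information transmitted in bits, so treat this simple percent-correct version as an assumption-laden simplification:

```python
# Hypothetical letter-recognition accuracies (% correct) by letter position
# relative to fixation (negative = left of fixation, positive = right).
# These numbers are invented for illustration, not taken from the study.
profile = {
    -6: 48, -5: 62, -4: 75, -3: 88, -2: 95, -1: 98,
     0: 99,
     1: 98,  2: 96,  3: 90,  4: 80,  5: 66,  6: 52,
}

# Summarize the profile as the area under it: with unit-spaced letter
# positions this is just the summed accuracy, expressed here in
# "letters' worth" of perfect recognition by dividing by 100.
visual_span_size = sum(profile.values()) / 100.0
print(visual_span_size)  # -> 10.47
```

Note the characteristic shape: accuracy peaks at fixation and falls off toward the periphery, which is why the area is a natural one-number summary of how many letters are available per glance.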
 

[Figure 1. Visual Span Profile. Top: trials consist of the presentation of trigrams, random strings of three letters, at specified letter positions left and right of fixation. Bottom: an example of a visual-span profile, in which letter recognition accuracy (% correct) is plotted as a function of letter position.]
We distinguish between the concept of the visual span and the concept of the perceptual span (McConkie & Rayner, 1975). Operationally, the perceptual span refers to the region of visual field that influences eye movements and fixation times in reading. The size of the perceptual span is typically measured using either the moving window technique (McConkie & Rayner, 1975) or the moving mask technique (Rayner & Bertera, 1979). The perceptual span is estimated to extend about 15 characters to the right of fixation and four characters to the left of fixation. Rayner (1986) argued that the perceptual span reflects readers’ linguistic processing or overall cognitive processing rather than visual processing per se. On the other hand, the visual span is relatively immune to oculomotor and top-down contextual influences, and is likely to be primarily determined by the characteristics of front-end visual processing.
Rayner (1986) measured the size of the perceptual span and characteristics of saccades and fixation times in children in second, fourth and sixth grades, and in adults. He found an increase in the size of the perceptual span and a decrease in fixation times with age. These oculomotor changes could be due to maturation in eye movement control, or to secondary factors influencing eye movement control (either bottom-up visual factors, or top-down cognitive factors). Rayner (1986) attributed the developmental changes in eye movements to top-down cognitive factors because the size of the perceptual span and fixation duration were found to be dependent on the text difficulty. For example, he found that when children in fourth grade were given age-appropriate text material, their fixation times and the size of the perceptual span became close to those of adults.
To confirm that oculomotor maturation is not the major source of developmental changes in reading speed, we tested our participants with two types of reading displays. First, Rapid Serial Visual Presentation (RSVP) reading minimizes the need for intra-word reading saccades, and removes the reader’s control of fixation times. Second, in our Flashcard method, participants read short blocks of text requiring normal reading eye movements. If maturation of eye-movement control is an important contributor to the development of reading speed, we would expect to observe a greater developmental effect in flashcard reading compared with RSVP reading. To the extent that growth in the size of the visual span is a contributor to the development of reading speed, we would expect to find a similar positive correlation with reading speed for both types of displays.
We also asked whether letter size affects the size of the visual span. Print size in children’s books is usually larger than for adult books. The typical print size for children’s books ranges from 5 to 10 mm in x-height, equivalent to 0.72 to 1.43 deg at a viewing distance of 40 cm (Hughes & Wilkins, 2002). Hughes and Wilkins (2000) found that the reading speed of children aged 5 to 7 years decreased as the text size decreased below this range, while older children aged 8 to 11 years were less dependent on letter size. O’Brien et al. (2005) reported that the critical print size (CPS) decreases with increasing age in school-age children, showing that younger children need a larger print size than older children in order to reach their maximum reading speed. The CPS for adults is close to 0.2° (Legge, Pelli, Rubin & Schleske, 1985; Mansfield, Legge, & Bane, 1996). It has also been observed that the size of the visual span shows the same dependence on character size as reading speed (Legge et al., 2007). It is possible that the use of larger print in children’s books reflects the need for larger print size to maximize reading speed. In this study, we used two letter sizes: 0.25°, which is slightly above the CPS of adults, and 1°, which is substantially larger than the CPS. Our goal was to assess the impact of this difference on the size of the visual span and reading speed for children.
We summarize the goals of this study as follows:
First, we hypothesize that developmental changes in the size of the visual span play a role in the developmental increase in reading speed. To test this hypothesis, we measured the size of the visual span and reading speed for children at three grade levels (3rd, 5th and 7th) and for young adults. A testable prediction of the hypothesis is that the visual span increases in size with age and is positively correlated with reading speed.
Secondary goals were 1) to examine the effect of letter size on the development of the visual span; and 2) to assess the influence of oculomotor control with a comparison of RSVP and flashcard reading.
2.1. Participants
Groups of 10 children in 3rd, 5th, and 7th grade and 10 adults (college students) participated in this study. The children were recruited from the Minneapolis public schools. They were all screened to have normal vision and to be native English speakers. Students with reading disabilities, speech problems or cognitive deficits were excluded. Cooperating teachers at the schools were asked to select students in each grade level so as to approximately match students for IQ and academic standing across grade levels. Ten college students were recruited from the University of Minnesota using the same criteria. For each participant, visual acuity and reading acuity were assessed with the Lighthouse Near Acuity Test and the MNREAD chart, respectively. Proper refractive correction for the viewing distance was made. All participants were paid $10.00 per hour. Informed consent was obtained from parents or the legal guardian, in addition to the assent of the children, in accordance with procedures approved by the institutional review board of the University of Minnesota. The mean age, visual acuity, and gender ratio for participants in the different grades are provided in Table 1.
Table 1

Mean Age, Visual Acuity and Gender Ratio for Participants
2.2. Stimuli
Trigrams, random strings of three letters, were used to measure visual-span profiles. Letters were drawn from the 26 lowercase letters of the English alphabet (repeats were possible). By chance, some of the trigrams were three-letter English words (e.g. dog, fog), which might be easier to recognize. However, the chance of getting a word trigram is less than 2%, which is not likely to have much influence on overall letter recognition accuracy (cf. Legge et al., 2001).
All letters were rendered in a lowercase Courier bold font (Apple Mac) – a serif font with fixed width and normal spacing. The letters were dark on a white background (84 cd/m2) with a contrast of about 95%. Letter size is defined as the visual angle subtended by the font’s x-height. The x-heights of the 0.25° and 1° character sizes corresponded to 6 and 24 pixels, respectively. The viewing distance for all testing was 40 cm. The same font was used for measuring reading speeds (see below).
The stimuli were generated and controlled using Matlab (version 5.2.1) and Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997). They were rendered on a SONY Trinitron color graphic display (model: GDM-FW900; refresh rate: 76 Hz; resolution: 1600×1024). The display was controlled by a Power Mac G4 computer (model: M8570).
Oral reading speed was measured with two methods–Rapid Serial Visual Presentation (RSVP) and a static text display (Flashcard). The pool of test material consisted of 187 sentences in the original MNREAD format developed for testing reading speed by Legge, Ross, Luebker and LaMay (1989). All the sentences were 56 characters in length. In the Flashcard presentation, the sentences were formatted into four lines of 14 characters (Fig. 2.b.).
Figure 2

Schematic diagram of RSVP (a) and Flashcard (b) reading speed tasks, and sample sentences (c).
The mean word length was 3.94 letters and 93% of the 1581 unique words occur in the 2000 most frequent words based on The Educator’s Word Frequency Guide (Zeno, Ivens, Millard, & Duvvuri, 1995). Mean difficulty of the sentences in the pool was 4.77 (Gunning’s Fog Index), and 1.34 (Flesch-Kincaid Index). According to Carver’s (1976) formula, the mean difficulty level is below 2nd grade level. Allowing for differences in these metrics, the difficulty of the sentences is roughly 2nd to 4th grade level. Sample sentences are presented in Figure 2.c. We divided the sentence pool into three sub-pools so that there were separate, non-overlapping sets of sentences for RSVP, Flashcard, and practice. Sentences were selected randomly without replacement, so that no subject saw the same sentence more than once during testing.
2.3. Measuring Visual-Span Profiles
Visual-span profiles were measured using a letter recognition task, as described in the Introduction. Trigrams were presented with their middle letter at 11 letter positions, including 0 (the letter position at fixation) and from 1 to 5 letter widths left and right of the 0 position. Trigram position was indexed by the middle letter of the trigram. For instance, a trigram abc at the position +3 had the b located in position 3 to the right of the 0 letter position, and a trigram at position −3 had its middle letter three letter positions to the left.
Each of the 11 trigram positions was tested 10 times, in a random order, within a block of 110 trials. The task of the participant was to report the three letters from left to right. A letter was scored as being identified correctly only if its order within the trigram was also correct. Feedback was not provided to the participants about whether or not their responses were correct.
Participants were instructed to fixate between two vertically separated fixation points (Fig. 1) on the computer screen during trials. Since there was no way of predicting on which side of fixation the trigram would appear, and the exposure time was too brief to permit useful eye movements, the participants understood that there was no advantage to deviate from the intended fixation. All participants had practice trials in the trigram test, RSVP test and Flashcard test prior to data collection. Participants were verbally encouraged to fixate carefully between the dots at the beginning of a trial.
Proportion correct recognition was measured at each of the letter slots and combined across the trigram trials in which the letter slot was occupied by the outer (the furthest letter from fixation), middle, or inner (the one closest to fixation) letter of a trigram. This means that although trigrams were centered at a given position only 10 times in a block, data from that position were based on 30 trials. As described in the Introduction, a visual span profile consists of percent correct letter recognition as a function of position left and right of fixation. These profiles are fit with “split Gaussians”, that is, Gaussian curves that are characterized with amplitude (the peak value at letter position 0), and the left and right standard deviations (the breadth of the curve). These profiles usually peak at the midline and decline in the left and right visual fields. The profiles are often slightly broader on the right of the peak (Legge et al., 2001).
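The split-Gaussian fit described above can be sketched as follows. This is a minimal illustration using SciPy's general-purpose curve fitter; the profile values are hypothetical, not data from the study:

```python
import numpy as np
from scipy.optimize import curve_fit

def split_gaussian(x, amp, sd_left, sd_right):
    """Gaussian with one amplitude but separate left/right spreads."""
    sd = np.where(x < 0, sd_left, sd_right)
    return amp * np.exp(-0.5 * (x / sd) ** 2)

# Hypothetical percent-correct data at letter positions -5..+5,
# slightly broader on the right, as the text describes
positions = np.arange(-5, 6)
pct_correct = np.array([62, 75, 86, 93, 97, 99, 98, 95, 90, 82, 70])

(amp, sd_left, sd_right), _ = curve_fit(split_gaussian, positions,
                                        pct_correct, p0=[100.0, 3.0, 3.0])
```

For a profile that is broader in the right visual field, the fitted right standard deviation exceeds the left one, matching the asymmetry reported by Legge et al. (2001).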
As described in the Introduction and illustrated in Figure 1 (i.e., the right vertical scale), percent correct letter recognition can be linearly transformed to information transmitted in bits. The information values range from 0 bits for chance accuracy of 3.8% correct (the probability of correctly guessing one of 26 letters) to 4.7 bits for 100% accuracy (Legge et al., 2001). The size of the visual span is quantified by summing across the information transmitted in each slot (similar to computing the area under the visual-span profile). Lower and narrower visual span profiles transmit fewer bits of information. In the Results, the size of the visual span will be quantified in units of bits of information transmitted.
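The transformation from percent correct to bits, and the summation that defines the size of the visual span, can be expressed compactly. The endpoint values (0 bits at the 3.8% chance level, 4.7 bits at 100%) come from the text; the example profile is hypothetical:

```python
# Chance accuracy for guessing one of 26 letters
CHANCE = 1.0 / 26.0

def letters_to_bits(p_correct):
    """Linear map: chance (~3.8%) -> 0 bits, 100% -> 4.7 bits."""
    return 4.7 * (p_correct - CHANCE) / (1.0 - CHANCE)

def visual_span_size(profile):
    """Sum information across letter slots (area under the profile)."""
    return sum(letters_to_bits(p) for p in profile)

# Hypothetical 11-slot profile (proportion correct at positions -5..+5)
profile = [0.62, 0.75, 0.86, 0.93, 0.97, 0.99,
           0.98, 0.95, 0.90, 0.82, 0.70]
size_bits = visual_span_size(profile)
```

A perfect 11-slot profile would transmit 11 × 4.7 ≈ 51.7 bits; lower or narrower profiles sum to fewer bits.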
Visual-span profiles were measured for each participant at two letter sizes (0.25° and 1°). In both cases, the stimulus exposure time was 100 ms. The order of the two conditions was interleaved both within participants and across participants (e.g. participant A started with the 1° letter size while participant B started with the 0.25° letter size, and so on).
2.4. Measuring Reading Speed
Oral reading speed was measured with two testing methods: Rapid Serial Visual Presentation (RSVP) and static text (Flashcard method). For both testing conditions, the method of constant stimuli was used to present sentences at five exposure times in logarithmically spaced steps, spanning ~ 0.7 log units. For both reading speed tasks, the two letter size conditions were interleaved. The testing session was preceded by a practice session. During this session, the range of exposure times for each participant was chosen in order to make sure that at least 80% correct response (percent of words correct in a sentence) was obtained at the longest exposure time.
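A minimal sketch of how five logarithmically spaced exposure times spanning ~0.7 log units might be generated; anchoring the range at the longest exposure chosen during practice is an assumption for illustration:

```python
import numpy as np

def exposure_times(longest, span_log_units=0.7, n=5):
    """n exposure times, logarithmically spaced, spanning the given
    number of log units and ending at the longest exposure."""
    log_max = np.log10(longest)
    return 10.0 ** np.linspace(log_max - span_log_units, log_max, n)

times = exposure_times(0.4)  # e.g. longest exposure of 0.4 s per word
```

A 0.7 log-unit span means the longest exposure is about five times the shortest, with equal ratios between adjacent steps.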
For RSVP, the sentences were presented sequentially one word at a time at the same screen location (i.e., the first letter of each word occurred at the same screen location). There was no blank frame (inter-stimulus interval) between words. Each sentence was preceded and followed by strings of x’s as shown in Figure 2.a. In the Flashcard reading test, an entire sentence was presented on the screen as shown in Figure 2.b.
For both tasks, participants initiated each trial by pressing a key. They were instructed to read the sentences aloud as quickly and accurately as possible. Participants were allowed to complete their verbal response at their own speed, not under time pressure. A word was scored as correct even if given out of order (e.g., a correction at the end of a sentence). The number of words read correctly per sentence was recorded. Five sentences were tested for each exposure time and percent correct word recognition was computed at each exposure time.
Psychometric functions, percent correct versus log RSVP or log Flashcard exposure time, were created by fitting these data with cumulative Gaussian functions (Wichmann & Hill, 2001a) as shown in Figure 3. The four panels represent four sets of data from the RSVP and Flashcard tasks at two letter sizes. The five data points in each panel represent percent words correct in a sentence for RSVP and for Flashcard. The threshold exposure time for words of a given length was based on the 80% correct point on the psychometric function. For example, in RSVP, if an exposure time of 200 msec per word yielded 80% correct, the reading rate was 5 words per second, equal to 300 wpm. For Flashcard, if the exposure time was 2 sec and the participant read 8 words correctly out of ten, the corresponding reading speed was 4 words per second, equal to 240 wpm.
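The threshold computation described above can be sketched as follows: a cumulative Gaussian is fit in log exposure time, the 80% point is read off the fitted function, and the threshold is converted to words per minute. The data points are hypothetical, not taken from the study:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(log_t, mu, sigma):
    """Cumulative Gaussian in log exposure time."""
    return norm.cdf(log_t, loc=mu, scale=sigma)

# Hypothetical RSVP data: exposure (sec per word) vs. proportion correct
exposures = np.array([0.10, 0.145, 0.21, 0.30, 0.44])
p_correct = np.array([0.25, 0.50, 0.78, 0.92, 0.98])

(mu, sigma), _ = curve_fit(psychometric, np.log10(exposures), p_correct,
                           p0=[np.log10(0.2), 0.2])

# Threshold exposure yielding 80% correct, converted to words per minute
t80 = 10.0 ** norm.ppf(0.80, loc=mu, scale=sigma)  # sec per word
rsvp_wpm = 60.0 / t80
```

The conversion mirrors the worked example in the text: a threshold of 0.2 s per word corresponds to 60/0.2 = 300 wpm.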
Figure 3

Proportion of words read correctly plotted as a function of exposure time (sec per word for RSVP; sec per sentence for Flashcard) for participant S1, a 7th grader. The top two panels show RSVP and Flashcard data for letter size 0.25°.
3. RESULTS

Three dependent variables were measured: the size of the visual span, RSVP reading speed, and Flashcard reading speed. We conducted one ANOVA test for each measure. Grade level (3rd, 5th, 7th, and Adult) was treated as a categorical rather than a numerical variable for the statistical analysis.
A 4 (grade) × 2 (letter size) repeated measures ANOVA with grade as a between-subject factor and letter size as a within-subject factor was conducted on the size of the visual span. There was a significant main effect of grade level on the size of the visual span (F(3,36) = 9.54, p < 0.001), and a significant interaction between grade level and letter size (F(3,36) = 3.46, p = 0.02). No significant main effect of letter size on the size of the visual span was found.
A 4 (grade) × 2 (letter size) repeated measures ANOVA with grade as a between-subject factor and letter size as a within-subject factor was conducted separately on RSVP and Flashcard reading speeds. There was a main effect of grade level on RSVP reading speed (F(3, 36) = 7.80, p < 0.001) and Flashcard reading speed (F(3, 36) = 9.35, p < 0.001). No significant letter size effects on reading speed were found.
The effect of grade level on the size of the visual span and reading speed
The 4 × 2 repeated measures ANOVA showed that there was a significant main effect of grade on the size of the visual span (η2 = 0.44, p < 0.01). A pairwise contrast test also showed that there were significant differences in the size of the visual span among all pairs of grades except between 3rd and 5th grades. The mean size of the visual span averaged across the two letter sizes for the 10 participants is plotted for each grade in Figure 4. These results show that the visual span grows in size from 3rd grade (mean = 34.28 ± 1.17 bits) to adulthood (mean = 41.66 ± 0.87 bits). The effect size (Cohen’s d) of the difference in the size of the visual span between 3rd grade and adults was 2.28.
Figure 4

The size of the visual span for students in three grades and for adults. Each bar indicates the mean size of the visual span for 10 participants averaged across the two letter sizes. The error bars represent ±1 standard error of the mean.
We also found that there was a significant main effect of grade level on both RSVP (η2 = 0.39, p < 0.01) and Flashcard (η2 = 0.44, p < 0.01) reading speeds. Figure 5 shows RSVP (left panel) and Flashcard (right panel) reading speeds (wpm) as a function of grade level. Open circles in both panels represent reading speeds for 1° letters, and the closed circles for 0.25° letters. Each data point represents the mean reading speed averaged across two letter sizes for a single participant.
Figure 5

Reading speed (wpm) as a function of grade level for two letter sizes. Open circles in both panels represent reading speeds for 1° letters, and closed circles for 0.25° letters. Each error bar represents ±1 standard error of the mean.
As shown in Figure 5, there was a linear increase in both RSVP and flashcard reading speeds with grade level. As expected from prior research, RSVP reading speed was faster than Flashcard reading speed for all groups by an average factor of 1.58, which is fairly consistent with the results (i.e. a factor of 1.44) for a similar comparison by Yu et al. (2007). The growth in RSVP reading speed across grades exceeds the growth in flashcard reading speed, confirming the view that maturation of the oculomotor system is not a major factor associated with the growth in children’s reading speed.
The increment in Flashcard reading speed per grade was consistent with earlier studies of page reading speed (Taylor, 1965; Carver, 1990; Tressoldi, Stella, & Faggella, 2001). Carver (1990) estimated that the growth in reading speed was 14 standard-length words per minute per grade level (where one standard-length word is equivalent to 6 characters). The average increment in Flashcard reading speed in our study was approximately 18 words per minute per year; transformed into Carver’s metric, this corresponds to 14 standard-length words per minute, equal to Carver’s estimate.
Relationship between the size of the visual span and reading speed
Flashcard and RSVP reading speeds are plotted against the size of the visual span for our forty participants in Figures 6 and 7, respectively. The closed circles, open circles, closed squares, and open squares show data for 3rd, 5th, and 7th graders, and adults, respectively. The best-fitting lines for predicting reading speed from the size of the visual span are also shown.
Figure 6

Flashcard reading speed (wpm) as a function of the size of the visual span. The solid line represents a regression line. Each point represents data for one participant. Closed circles, open circles, closed squares, and open squares represent data for 3rd, 5th, and 7th graders, and adults, respectively.
Figure 7

RSVP reading speed (wpm) as a function of the size of the visual span. The solid line represents a regression line. Each point represents data for one participant. Closed circles, open circles, closed squares, and open squares represent data for 3rd, 5th, and 7th graders, and adults, respectively.
There were significant correlations between the size of the visual span and Flashcard reading speed (r = 0.72, p < 0.01), and RSVP reading speed (r = 0.58, p = 0.01).
From the regression model for Flashcard reading (Fig. 6), 52% of the variability in reading speed can be accounted for by the size of the visual span (r2 = 0.52, p < 0.01). The slope of the regression line indicates that an increase in the size of the visual span by 1 bit brings about an increase in reading speed of 22 wpm. The effect size (Cohen’s d) is 2.29 for the difference in Flashcard reading speed between 3rd graders and adults. Similarly, from the regression model for RSVP reading (Fig. 7), 34% of the variability in reading speed can be accounted for by the size of the visual span (r2 = 0.34, p < 0.01). The slope of the regression line indicates that an increase in the size of the visual span by 1 bit brings about an increase in reading speed of 28 wpm. The effect size (Cohen’s d) is once again 2.29 for the difference in RSVP reading speed between 3rd graders and adults.
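As a quick arithmetic check, the reported proportions of variance follow directly from squaring the correlations, and the regression slope can be combined with the measured growth in visual-span size to get an implied speed gain (a back-of-envelope figure, not a statistic reported in the study):

```python
# Reported correlations between visual-span size and reading speed
r_flashcard = 0.72
r_rsvp = 0.58

# Proportion of variance accounted for is the squared correlation
var_flashcard = r_flashcard ** 2   # ~0.52
var_rsvp = r_rsvp ** 2             # ~0.34

def predicted_gain(slope_wpm_per_bit, span_young, span_adult):
    """Speed gain implied by the regression slope applied to the
    measured growth in visual-span size (illustrative arithmetic)."""
    return slope_wpm_per_bit * (span_adult - span_young)

# Flashcard slope (22 wpm/bit) times the growth in visual-span size
# from 3rd grade (34.28 bits) to adults (41.66 bits)
gain_flashcard = predicted_gain(22.0, 34.28, 41.66)
```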
As described in the Methods section, reading speed was derived from the stimulus exposure time yielding 80% correct word recognition. To determine if the results were sensitive to this criterion, we reanalyzed the data with 70% and 90% criteria for defining reading speed. We found that the relationship between reading speed and the size of the visual span was not criterion dependent – correlations between size of the visual span and reading speed remained approximately the same across all three criteria (less than 0.01 differences in correlations).
The effects of letter size on the visual span and reading speed
We did not find a significant main effect of letter size on either the visual span or reading speeds in children. Contrary to the possibility raised in the Introduction, it does not appear that the use of larger print size in children’s books can be explained in terms of optimizing the size of the visual span.
While children at all three grade levels showed no dependence of the size of the visual span on letter size, adults showed slightly larger visual spans for 0.25° letters than for 1° letters (~ 3 bits). Legge et al. (2007) studied the effect of character size on the size of the visual span in a group of five young adults. They did not find a significant difference in the size of the visual span between 0.25° and 1°. We are unsure of the reason for the small discrepancy between the two studies.
4. DISCUSSION
Relationship between reading speed and the size of the visual span
It is obvious that visual processing is critical to print reading. It is not so obvious that individual differences in reading speed are linked to differences in visual processing nor that developmental changes in reading speed are influenced by visual factors. We have taken the theoretical position that front-end visual processing influences letter recognition which in turn influences reading speed. We have measured letter recognition in the form of visual-span profiles. The shape and size of these profiles are largely immune to top-down contextual factors and to oculomotor factors, and represent the bottom-up sensory information available to letter recognition and reading. The size of these profiles has been previously linked empirically and theoretically to reading speed (Legge, Mansfield & Chung, 2001; Legge et al., 2007). More specifically, it is hypothesized that the size of the visual span is an important determinant of reading speed.
As reviewed in the Introduction, it is known that children’s reading speed gradually increases throughout the school years (cf., Carver, 1990). The principal goal of our study was to determine whether visual development has an impact on this improvement in reading speed. We addressed this question by measuring changes in the size of the visual span across grade levels. Our hypothesis was that the size of the visual span would increase with grade level, and exhibit a correlation with reading speed.
These predictions were confirmed by our results. We found that there was a developmental growth in the size of the visual span from 3rd grade to adulthood paralleling growth in reading speed. A statistically significant 34% to 52% of the variance in reading speed could be accounted for by the size of the visual span.
Why does a larger visual span facilitate faster reading? For eye-movement mediated reading of lines of text on a page or screen (such as the flashcards in the present study), a larger visual span means that more letters can be recognized accurately on each fixation. With a larger visual span, longer words might be recognized on one fixation, or more letters of an adjacent word might be recognized if the fixated word is short (parafoveal preview). The effects of changing the size of the visual span were explored using an ideal-observer model, called Mr. Chips, by Legge, Klitz and Tjan (1997). Because a larger visual span means that more letters are recognized, the reader is able to make larger saccades; the greater mean saccade length facilitates faster reading. In the case of RSVP reading, there is no need for intra-word saccades or parafoveal preview of the leading letters of the next word. Only one word is visible at a time. In this case, we might speculate that the visual span need only be large enough to accommodate mean word length of the text (3.94 letters in the present study) or possibly the longest word in the text (8 letters in our text). If so, we might expect a weaker effect of visual-span size on RSVP reading speed, and possibly a ceiling once the visual span exceeded some critical value. These effects are not evident in the present data. Growth of the visual span manifests as both an increase in the breadth of visual-span profiles and also an increase in the height of the profiles, i.e., increasing letter-recognition accuracy in the central portion of the profile. The increased height of the profile could contribute to faster and more accurate recognition, even of relatively short strings. In other words, the graded form of the visual-span profile, and its potential growth in both height and breadth, can contribute to faster reading for both flashcard and RSVP text.
We recognize that our results are correlational in nature. It is possible that independent factors could drive the developmental changes in reading speed and the size of the visual span. Although a causal link between the size of the visual span and reading speed remains to be proven, stronger evidence for a causal link has been provided by Legge, Cheung, Yu, Chung, Lee and Owens (2007). These authors have amassed convergent data from several experiments on adults showing that the size of the visual span and reading speed vary in a highly correlated way in response to changes in stimulus parameters such as contrast and character size. For example, it is known that the dependence of reading speed on character size exhibits a nonmonotonic relationship in which reading speed has a maximum value for a range of intermediate character sizes, and decreases for larger and smaller character sizes. Legge et al. (2007) showed that the size of the visual span has the same nonmonotonic dependence on character size.
Sensory factors affecting the size of the visual span
What sensory factors might contribute to developmental changes in the size of the visual span? In the Introduction, we mentioned three candidate factors—errors in the relative position of letters in strings, orientation errors such as confusing b with d, and effects of crowding. We briefly comment on additional analyses of our visual-span data to address the roles of these factors.
Errors in relative spatial position (e.g., reporting bqx when the stimulus was qbx), sometimes termed mislocation errors, were evaluated by scoring trigram letter recognition in two ways: by demanding correct relative position for a letter to be scored correct, or by the more lenient criterion of scoring a letter correct if reported anywhere in the trigram string. The difference in percent correct between these two scoring methods is a measure of the rate of mislocation errors. A one-way ANOVA with grade (3rd, 5th, 7th, and Adult) as a between-subject factor revealed a significant main effect of grade on the rate of mislocation errors (F(3, 36) = 4.55, p < 0.01). The rate of mislocation errors increased with decreasing grade level (mean error rate for 3rd grade = 8.43 ± 1.1% and mean error rate for adults = 4.25 ± 0.5%). Mislocation errors could be cognitive in origin, resulting from verbal-reporting mistakes, or visual in origin, resulting from imprecise coding of visual position. We think the latter is more likely because we found that the rate of mislocation errors was dependent on visual-field location, increasing at greater distances from fixation. This dependency of mislocation errors on letter position was consistent across all age groups.
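The two scoring criteria can be made concrete with a short sketch (the trigram and response are the example from the text):

```python
def score_strict(stimulus, response):
    """A letter counts only if reported in its correct position."""
    return sum(s == r for s, r in zip(stimulus, response))

def score_lenient(stimulus, response):
    """A letter counts if reported anywhere in the trigram."""
    pool = list(stimulus)
    correct = 0
    for letter in response:
        if letter in pool:
            pool.remove(letter)
            correct += 1
    return correct

# The example from the text: stimulus qbx reported as bqx
strict = score_strict("qbx", "bqx")     # only x is in its correct slot
lenient = score_lenient("qbx", "bqx")   # all three letters are present
mislocated = lenient - strict           # two letters were mislocated
```

Aggregated over trials, the difference between the lenient and strict percent-correct scores gives the mislocation-error rate.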
We assessed orientation errors by measuring b and d confusions, and also p and q confusions. An orientation error is defined as reporting b (or p) instead of d (or q), and vice versa. The number of incorrect responses out of the total number of occurrences of b, p, d, and q is a measure of the rate of orientation errors. A one-way ANOVA with grade as a between-subject factor revealed a significant main effect of grade on the rate of orientation errors (F(3, 36) = 4.98, p < 0.01). Orientation errors decreased with increasing grade level (mean error rate for 3rd grade = 5.85 ± 0.40% vs. mean error rate for adults = 3.79 ± 0.38%). Since these children and adults would typically have no difficulty in distinguishing b from d, or p from q, in an untimed test of isolated letter recognition, we expect that these confusions result from the temporal demands of the trigram task or from the adjacency of flanking letters (crowding), and have an impact on the size of the visual span.
In a separate preliminary report, based on this data set, we have shown that a decrease in crowding accounts for at least a portion of the growth in the size of visual-span profiles across grade levels (Kwon & Legge, 2006). Pelli et al. (in press) have recently presented compelling theoretical and empirical arguments for the important role of crowding in limiting the size of the visual span (they use the term “uncrowded span”), although they did not address developmental changes in the size of the visual span.
In short, relative position errors, orientation errors and crowding may all play a role in developmental changes in the size of the visual span.
Oculomotor factors
It is also possible that fixation errors could play a role in the observed developmental changes in the size of the visual span. Indeed, it has been reported that children’s fixation stability increases with age from 4 to 15 years (Ygge et al., 2005). If children erroneously fixated leftward or rightward of the intended location in our trigram task, performance would, on average, suffer; the mean distance of trigrams from the fixation point would increase as the size of the fixational error increases. We conducted a simulation analysis to evaluate the impact of such fixation errors on the size of the visual span. The key parameter of the model was the variability in fixation positions, represented by the standard deviation of an assumed Gaussian distribution of fixation locations centered on the correct fixation mark. An average adult visual span was used as an input parameter for each Bernoulli trial to obtain proportion correct for each letter position. Over trials, we computed the size of the visual span in bits of information transmitted. Through 100 repetitions, we obtained estimates of the size of the visual span for a given fixation error. For example, if the standard deviation was two letter positions (σ = 2), 68% of the fixation points in the simulated trials would lie within ±2 letter positions of the intended fixation mark. As expected, the greater the fixation errors (i.e., larger standard deviations), the smaller the size of the resulting visual spans. The simulation results indicated that fixation variability would need to increase from a standard deviation of 0 to more than 3 letter positions to simulate our observed reduction in visual span size from adults to 3rd graders. Moreover, fixation errors of 3 letter spaces for 1° letters would correspond to fixation errors of 12 letter spaces for 0.25° letters, producing devastating effects on the size of the visual span for the smaller print size.
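The simulation can be sketched in outline as follows. The adult profile values and the treatment of out-of-range positions as chance performance are assumptions for illustration; the study's actual input profile is not reproduced in the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Letter positions -5..+5 and an assumed adult visual-span profile
# (proportion correct letter recognition; values are illustrative only)
positions = np.arange(-5, 6)
adult_profile = np.array([0.80, 0.88, 0.94, 0.97, 0.99, 0.99,
                          0.99, 0.98, 0.96, 0.91, 0.84])
CHANCE = 1 / 26  # guessing rate for one of 26 letters

def bits(p):
    """Linear transform from proportion correct to bits transmitted."""
    return 4.7 * (p - CHANCE) / (1 - CHANCE)

def p_at(pos):
    """Proportion correct at a (possibly shifted) letter position;
    assume chance performance outside the measured range."""
    idx = pos + 5
    return adult_profile[idx] if 0 <= idx <= 10 else CHANCE

def simulated_span(sd_fixation, n_reps=100, trials_per_pos=30):
    """Mean visual-span size (bits) when fixation is jittered by a
    zero-mean Gaussian with SD given in letter positions."""
    sizes = []
    for _ in range(n_reps):
        total = 0.0
        for pos in positions:
            # Each trial fixates at a random offset, so the trigram
            # letter lands at an effectively shifted position.
            shifts = np.rint(rng.normal(0.0, sd_fixation,
                                        trials_per_pos)).astype(int)
            hits = [rng.random() < p_at(pos + s) for s in shifts]
            total += bits(np.mean(hits))
        sizes.append(total)
    return float(np.mean(sizes))

span_sd0 = simulated_span(0.0)  # no fixation error
span_sd3 = simulated_span(3.0)  # SD of 3 letter positions
```

Under these assumptions, jittering fixation with a 3-letter-position SD noticeably shrinks the estimated span relative to perfect fixation, which is the qualitative effect the simulation relies on.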
Because we did not observe print size effects on the size of the visual span, and because the fixation errors deduced from our simulation seem implausibly large, we doubt that fixation errors account for the developmental differences in the size of the visual span.
We also observed a substantial growth in reading speed across grades even in RSVP reading, where the need for eye movements is minimized. This result also confirms the view that developmental changes in reading speed cannot be solely explained by maturation of oculomotor control.
Non-visual factors
Although we have focused on the size of the visual span as a possible factor influencing reading development, our data indicate that this factor accounts for at most 30 to 50% of the variance in reading speeds across grade levels. Non-visual cognitive and linguistic factors must also contribute to developmental changes in reading speed. It is possible that accidental correlations of one of these factors with grade level could masquerade as an effect of visual span. For example, if reading speed is correlated with IQ, and some unknown selection bias resulted in increasing mean IQ across grade level, then IQ might underlie the correlations we found between reading speed and visual span. In the case of IQ, this seems highly unlikely. Although we did not control for or measure the IQ of our subjects, we have no reason to suspect that there were increases in IQ across grade levels. Even if such a sampling bias exists, O’Brien et al. (2005) found no effect of IQ on maximum oral reading speed and critical print size in a group of children (aged 6 to 8) tested with MNREAD sentences similar to those used in the present study.
As another example, it is possible that children’s ability to recognize and speak the words used in our testing material varied across grade levels, accounting for the correlation between reading speed and grade level. For example, if children in the lower grades were unable to recognize and articulate words in the test material, even for unlimited viewing time, the missed words would count as errors in our scoring and result in reduced reading speed. We did not test the word decoding skills of our subjects on a standardized test such as the subtests of the Woodcock-Johnson III Cognitive and Achievement Batteries (Woodcock, McGrew, & Mather, 2001). We did, however, screen all of our subjects with the MNREAD acuity chart (for a review of its properties, see Mansfield & Legge, 2007). This chart, although designed as a test of the effect of visual factors on maximum reading speed, critical print size and reading acuity, uses simple declarative sentences with vocabulary consisting of the 2,000 most frequent words in 1st, 2nd, and 3rd grade text. The sentence material on the MNREAD chart is very similar to the test material in the present study. None of the words was missed or read incorrectly by our children for sentences above their critical print sizes. These observations lead us to conclude that untimed word-decoding skill was not a limiting factor influencing performance across grade levels in our study.
As yet another example of a potential non-visual influence, the oral reporting method used in the trigram task for measuring visual-span profiles might reflect more than the ability to extract visual information. Performance in this task could be influenced by articulation programming, rapid access to letter naming, memory capacity, and reporting accuracy. Many studies using rapid automatized naming (RAN) have shown that these component skills are highly correlated with reading performance (e.g., Denckla & Rudel, 1976; Wolf, 1991; Wolf, Bally, & Morris, 1986; Manis, Seidenberg, & Doi, 1999). It is possible that the underlying visual spans are actually stable across school age, but that the observed changes in the size of visual-span profiles are due to some later stage of processing. However, we think this is unlikely. In the trigram task, there was no time pressure to report the letters, so there were no requirements for rapid articulation and no time pressure on access to letter-naming codes. It is still possible that younger children might make more phonological errors or transposition errors in reporting due to less efficient memory. Indeed, it is known that overall memory capacity, including perceptual memory, improves with increasing age in children (Dempster, 1978; Shwantes, 1979; Ross-Sheehy, Oakes, & Luck, 2003). However, convergent evidence has shown that children at the age of 9 are able to hold an average of 5 to 6 digits or spatial symbols in their visual memory (e.g., Wilson, Scott, & Power, 1987; Miles, Morgan, Milne, & Morris, 1996). This result suggests that recalling and reporting a triplet of letters is not likely to pose difficulties for the children in our study. Manis et al. (1999) had 1st and 2nd grade students name 50 digits and letters in a random order aloud as rapidly as possible and measured reporting accuracy. They found that the rate of oral reporting errors was less than 2%, suggesting that by the end of first grade, most children know the names of all the letters and are able to report them with high accuracy.
These considerations encourage us to believe that the observed differences in the size of the visual span across ages are likely to represent changes in the availability of bottom-up sensory information rather than effects of later stages of processing. Nevertheless, we cannot rule out the possibility that some other uncontrolled cognitive or other non-visual variable accounted for the apparent association between visual span and reading speed across grade levels in our study.
Effect of letter size
Finally, we addressed the effect of letter size. We expected that young children would have larger visual spans and read faster with 1° characters than with 0.25° characters. Contrary to our expectation, we found no effect of character size on either reading speed or visual span in children. Apparently, legibility, as assessed by these two measures, does not account for the preference of children for larger print in books. It is possible that developmental changes in the effects of print size on reading speed are complete by 3rd grade (age 8–9 years), accounting for the absence of print-size effects in our data. Consistent with this possibility, Hughes and Wilkins (2002) found that younger children (below age 7) showed a significant dependence of reading speed on letter size in the range 0.72 to 1.43 deg at a viewing distance of 40 cm, but older children (above 8 years) did not. Similarly, O’Brien et al. (2005) showed that critical print size (CPS) decreased with age from 6 to 8 years, suggesting that younger children need larger print to optimize reading performance. Taken together, it may be the case that the dependence of reading speed on print size becomes adult-like by about 8 years of age.
Summary
We summarize our conclusions as follows: 1) The visual span grows in size during the school years. 2) Consistent with the visual-span hypothesis, this developmental change in the size of the visual span is significantly correlated with the developmental increase in reading speed. 3) Because both RSVP and flashcard reading speeds increase with age, the growth in reading speed is unlikely to be due to oculomotor maturation. 4) We found no evidence that the use of larger print in children’s books reflects faster reading or larger visual spans for large print.
Acknowledgments
We are grateful to students and teachers of the Minneapolis Public Schools for their participation in this study. We thank Beth O’Brien for her helpful advice on the earlier draft of this manuscript. We are also thankful to Sing-Hang Cheung for his help with the design of experiments. We would like to thank anonymous reviewers for their comments on the manuscript. This work was supported by NIH grant R01 EY02934.
Footnotes
1Carver (1977) defined six characters in text (including spaces and punctuation) as one “standard-length word.” Measuring reading speed in standard-length words per minute is a character-based metric. Carver (1990) argued for the advantage of this metric over the common “words per minute” metric for measuring reading speed.
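Carver’s character-based metric can be made concrete: divide the number of characters read (counting spaces and punctuation) by six to get standard-length words, then divide by elapsed time in minutes. A minimal sketch (the function name is ours):

```python
def reading_speed_slwpm(n_characters: int, seconds: float) -> float:
    """Reading speed in standard-length words per minute (Carver, 1977):
    one standard-length word = 6 characters, including spaces and punctuation."""
    standard_words = n_characters / 6
    return standard_words * 60.0 / seconds

# A 60-character sentence read aloud in 4 seconds:
print(reading_speed_slwpm(60, 4))  # 150.0
```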
2The term ‘visual span’ was introduced by O’Regan (O’Regan, Levy-Schoen & Jacobs, 1983; O’Regan, 1990, 1991). He defined the visual span as the region around the point of fixation within which characters of a given size can be resolved. Empirical studies have shown that normally sighted adults have a visual span of 7–11 letters. For a review, see Legge (2007, Ch. 3).
3Trigrams were used rather than isolated letters because of their closer approximation to English text. Text contains strings of letters. Most letter recognition in text involves characters flanked on the left, right or both sides.
4In this article, school grade levels refer to the American system. The correspondence between grade level and age is as follows: 1st grade (6–7 yrs), 2nd grade (7–8 yrs), 3rd grade (8–9 yrs), 4th grade (9–10 yrs), 5th grade (10–11 yrs), 6th grade (11–12 yrs), 7th grade (12–13 yrs), and 8th grade (13–14 yrs).
5We estimated the grade level from Carver (1976), who expressed the relationship between characters per word (cpw) and difficulty level (DL). According to his formula, the number of characters per word for 1st grade difficulty is approximately 5 cpw, including a trailing space after each word, which is slightly above the number of characters per word (4.7 cpw) we used for our reading tasks.
6Percent correct letter recognition was converted to bits of information using letter-confusion matrices by Beckmann (1998).
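The conversion in footnote 6 can be sketched as the mutual information between presented and reported letters, estimated from a confusion matrix. This is a generic illustration of the technique, not Beckmann’s exact procedure:

```python
import math

def mutual_information_bits(confusion):
    """Mutual information I(stimulus; response) in bits, estimated from a
    joint count matrix: confusion[i][j] = times letter i was reported as j."""
    total = sum(sum(row) for row in confusion)
    n_rows = len(confusion)
    n_cols = len(confusion[0])
    p_stim = [sum(row) / total for row in confusion]
    p_resp = [sum(confusion[i][j] for i in range(n_rows)) / total
              for j in range(n_cols)]
    mi = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count:  # skip zero cells (they contribute nothing)
                p_ij = count / total
                mi += p_ij * math.log2(p_ij / (p_stim[i] * p_resp[j]))
    return mi

# Two letters reported perfectly carry exactly 1 bit of information:
print(mutual_information_bits([[50, 0], [0, 50]]))  # 1.0
```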
  • Adams MJ. Beginning to read: Thinking and learning about print. Cambridge, MA: The MIT Press; 1990.
  • Aghababian V, Nazir TA. Developing normal reading skills: aspects of the visual processes underlying word recognition. Journal of Experimental Child Psychology. 2000;76:123–150. [PubMed]
  • Beckmann PJ. Preneural limitations to identification performance in central and peripheral vision. PhD dissertation. Minneapolis, MN: University of Minnesota; 1998.
  • Bondarko V, Semenov L. Visual acuity and the crowding effect in 8- to 17-year-old schoolchildren. Human Physiology. 2005;31:532–538.
  • Bouma H. Visual interference in the parafoveal recognition of initial and final letters of words. Vision Research. 1973;13:767–782. [PubMed]
  • Brainard DH. The Psychophysics Toolbox. Spatial Vision. 1997;10:433–436. [PubMed]
  • Cairns NU, Setward MS. Young children’s orientation of letters as a function of axis of symmetry and stimulus alignment. Child Development. 1970;41:993–1002. [PubMed]
  • Carver RP. Toward a theory of reading comprehension and rauding. Reading Research Quarterly. 1977;13:8–63.
  • Carver RP. Word length, prose difficulty and reading rate. Journal of Reading Behavior. 1976;8:193–204.
  • Carver RP. Reading rate: A review of research and theory. San Diego, CA: Academic Press; 1990.
  • Chung STL, Mansfield JS, Legge GE. Psychophysics of reading. XVIII. The effect of print size on reading speed in normal peripheral vision. Vision Research. 1998;38:2949–2962. [PubMed]
  • Cornelissen P, Bradley L, Fowler S, Stein J. What children see affects how they read. Developmental Medicine and Child Neurology. 1991;33:755–762. [PubMed]
  • Davidson HPA. A study of reversals in young children. Journal of Genetic Psychology. 1934;45:452–456.
  • Davidson HPA. A study of the confusing letters b, d, p, and q. Journal of Genetic Psychology. 1935;47:458–468.
  • Dempster FN. Memory Span and Short-Term Memory Capacity: A Developmental Study. Journal of Experimental Child Psychology. 1978;26:419–431.
  • Denckla M, Rudel RG. Rapid “automatized” naming (RAN): Dyslexia differentiated from other learning disabilities. Neuropsychologia. 1976;14:471–479.
  • Dowdeswell HJ, Slater AM, Broomhall J, Tripp J. Visual deficits in children born at less than 32 weeks’ gestation with and without major ocular pathology and cerebral damage. British Journal of Ophthalmology. 1995;79:447–452. [PMC free article] [PubMed]
  • Farkas MS, Smothergill DW. Configuration and position encoding in children. Child Development. 1979;50:519–523. [PubMed]
  • Goswami U, Bryant P. Phonological skills and learning to read. Hillsdale: Erlbaum; 1990.
  • Hale S. A global developmental trend in cognitive processing speed. Child Development. 1990;61:653–663. [PubMed]
  • Hughes L, Wilkins A. Typography in children’s reading schemes may be suboptimal: evidence from measures of reading rate. Journal of Research in Reading. 2000;23:314–324.
  • Hughes LE, Wilkins AJ. Reading at a distance: implications for the design of text in children’s big books. British Journal of Educational Psychology. 2002;72:213–226. [PubMed]
  • Jackson MD, McClelland JL. Processing determinants of reading speed. Journal of Experimental Psychology: General. 1979;108:151–181. [PubMed]
  • Kwon MY, Legge GE. Developmental Changes in the Size of the Visual Span for Reading: Effects of Crowding. [Abstract] Journal of Vision. 2006;6:1003a.
  • Legge GE. Psychophysics of Reading. Mahwah, NJ: Erlbaum; 2007.
  • Legge GE, Cheung SH, Yu D, Chung STL, Lee H-W, Owens DP. The case for the visual span as a sensory bottleneck in reading. Journal of Vision. 2007;7:1–15. http://www.journalofvision.org/7/2/9/
  • Legge GE, Hooven TA, Klitz TS, Mansfield JS, Tjan BS. Mr. Chips 2002: New insights from an ideal-observer model of reading. Vision Research. 2002;42:2219–2234. [PubMed]
  • Legge GE, Klitz TS, Tjan BS. Mr. Chips: An ideal-observer model of reading. Psychological Review. 1997;104:524–553. [PubMed]
  • Legge GE, Mansfield JS, Chung STL. Psychophysics of reading. XX. Linking letter recognition to reading speed in central and peripheral vision. Vision Research. 2001;41:725–734.[PubMed]
  • Legge GE, Pelli DG, Rubin GS, Schleske MM. Psychophysics of reading. I. Normal vision. Vision Research. 1985;25:239–252. [PubMed]
  • Legge GE, Ross JA, Maxwell KT, Luebker A. Psychophysics of reading. VII. Comprehension in normal and low vision. Clinical Vision Sciences. 1989;4:51–60.
  • Manis FR, Seidenberg MS, Doi LM. See Dick RAN: Rapid Naming and the Longitudinal Prediction of Reading Subskills in First and Second Graders. Scientific Studies of Reading. 1999;3:129–157.
  • Mansfield JS, Legge GE, Bane MC. Psychophysics of reading. XV. Font effects in normal and low vision. Investigative Ophthalmology & Visual Science. 1996;37:1492–1501. [PubMed]
  • Martins AJ, Kowler E, Palmer C. Smooth pursuit of small-amplitude sinusoidal motion. Journal of the Optical Society of America A. 1985;2:234–242.
  • Mason M. Reading ability and the encoding of item and location information. Journal of Experimental Psychology. 1980;6:89–98. [PubMed]
  • Mason M, Katz L. Visual processing of non-linguistic strings: Redundancy effects in reading ability. Journal of Experimental Psychology: General. 1976;105:338–348. [PubMed]
  • McConkie GW, Rayner K. The span of the effective stimulus during a fixation in reading. Perception and Psychophysics. 1975;17:578–586.
  • Miles C, Morgan MJ, Milne AB, Morris EDM. Developmental and individual differences in visual memory span. Current Psychology. 1996;15:53–67.
  • Morrison FJ, Giordani B, Nagy J. Reading disability: An information-processing analysis. Science. 1977;196:77–79. [PubMed]
  • Muter V, Hulme C, Snowling M, Taylor S. Segmentation, not rhyming, predicts early progress in learning to read. Journal of Experimental Child Psychology. 1997;65:370–396. [PubMed]
  • Neuhaus G, Foorman BR, Francis DJ, Carlson CD. Measures of information processing in rapid automatized naming (RAN) and their relation to reading. Journal of Experimental Child Psychology. 2001;78:359–373. [PubMed]
  • O’Brien BA, Mansfield JS, Legge GE. The effect of print size on reading speed in dyslexia. Journal of Research in Reading. 2005;28:332–349. [PMC free article] [PubMed]
  • O’Regan JK. Eye movements and reading. In: Kowler E, editor. Eye movements and their role in visual and cognitive processes. New York: Elsevier; 1990. pp. 395–453.
  • O’Regan JK. Understanding visual search and reading using the concept of stimulus “grain”. IPO Annual Progress Reports. 1991;26:96–108.
  • O’Regan JK, Levy-Schoen A, Jacobs AM. The effect of visibility on eye-movement parameters in reading. Perception and Psychophysics. 1983;34:457–464. [PubMed]
  • Pelli DG. The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision. 1997;10:437–442. [PubMed]
  • Pelli DG, Farell B, Moore B. The remarkable inefficiency of word recognition. Nature. 2003;423:752–756. [PubMed]
  • Pelli DG, Tillman KA, Freeman J, Su M, Berger TD, Majaj NJ. Reading is crowded. Journal of Vision, in press.
  • Rayner K. Eye movements and the perceptual span in beginning and skilled readers. Journal of Experimental Child Psychology. 1986;41:211–236. [PubMed]
  • Rayner K. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin. 1998;124:372–422. [PubMed]
  • Rayner K, Bertera JH. Reading without a fovea. Science. 1979;206:468–469. [PubMed]
  • Ross-Sheehy S, Oakes LM, Luck SJ. The Development of Visual Short-Term Memory Capacity in Infants. Child Development. 2003;74:1807–1822. [PubMed]
  • Shwantes FM. Cognitive scanning processes in children. Child Development. 1979;50:1136–1143. [PubMed]
  • Skottun BC. On the conflicting support for the magnocellular-deficit theory of dyslexia. Trends in Cognitive Sciences. 2000a;4:211–212. [PubMed]
  • Skottun BC. The magnocellular deficit theory of dyslexia: The evidence from contrast sensitivity. Vision Research. 2000b;40:111–127. [PubMed]
  • Smith F. Understanding reading. New York: Holt, Rinehart & Winston; 1971.
  • Spinelli D, De Luca M, Judica A, Zoccolotti P. Length effect in word naming in reading: Role of reading experience and reading deficit in Italian readers. Developmental Neuropsychology. 2002;27:217–235. [PubMed]
  • Stanovich KE. Toward an interactive-compensatory model of individual differences in the development of reading fluency. Reading Research Quarterly. 1980;16:32–71.
  • Stanovich KE. Progress in understanding reading. New York: The Guilford Press; 2000.
  • Stein J, Walsh V. To see but not to read: The magnocellular theory of dyslexia. Trends in Neuroscience. 1997;20:147–152.
  • Taylor SE. Eye movements in reading: Facts and fallacies. American Educational Research Journal. 1965;2:187–202.
  • Tressoldi PE, Stella G, Faggella M. The development of reading speed in Italians with dyslexia: A longitudinal study. Journal of Learning Disabilities. 2001;34:414–417. [PubMed]
  • Wichmann FA, Hill NJ. The psychometric function: I. Fitting, sampling and goodness-of-fit. Perception and Psychophysics. 2001a;63:1293–1313. [PubMed]
  • Wilson JTL, Scott JH, Power KG. Developmental differences in the span of visual memory for pattern. British Journal of Developmental Psychology. 1987;5:249–255.
  • Woodcock RW, McGrew KS, Mather N. Woodcock-Johnson III. Itasca, IL: Riverside; 2001.
  • Wolf M. Naming speed and reading: The contribution of the cognitive neurosciences. Reading Research Quarterly. 1991;26:123–141.
  • Wolf M, Bally H, Morris R. Automaticity, retrieval processes and reading: A longitudinal study in average and impaired readers. Child Development. 1986;57:988–1000. [PubMed]
  • Ygge J, Aring E, Han Y, Bolzai R, Hellstrom A. Fixation Stability in Normal Children. Ann NY Academy of Sciences. 2005;1039:480–483.
  • Yu D, Cheung SH, Legge GE, Chung STL. Effect of letter spacing on visual span and reading speed. Journal of Vision. 2007;7:1–10. http://www.journalofvision.org/7/2/2/
  • Zeno S, Ivens S, Millard R, Duvvuri R. The educator’s word frequency guide. Touchstone Applied Science Associates; Brewster, NY: 1995.
Posted on Thursday, March 15, 2012

Teachers and teacher evaluation may be directly related to a construct from social psychology called Stereotype Threat, and to the relationship between motivation and teachers’ professional identities.

Stereotype Threat

Stereotype threat is the fear that we may confirm a negative stereotype about a group we belong to. From Wikipedia, we can read some of the history of this socio-cognitive construct:

In the early 1990s, Claude Steele, in collaboration with Joshua Aronson, performed the first experiments demonstrating that stereotype threat can undermine intellectual performance. . . Overall, findings suggest that stereotype threat may occur in any situation where an individual faces the potential of confirming a negative stereotype. For example, stereotype threat can negatively affect the performance of European Americans in athletic situations[11] as well as men who are being tested on their social sensitivity.[12] The experience of stereotype threat can shift depending on which group identity is salient to the situation. For example, Asian-American women are subject to a gender stereotype that expects them to be poor at mathematics, and a racial stereotype that expects them to do particularly well. Subjects from this group performed better on a math test when their racial identity was made salient; and worse when their gender identity was made salient.[13]

In another description of stereotype threat, Open Education reports that a recent study by researchers at the University of Colorado found that stereotype threat in the sciences is very real for young women. The study also showed that educators can take some very simple steps to help reduce the impact of stereotype threat in the classroom.

Certain individuals appear to be more likely to experience stereotype threat than others. Individuals who are highly identified with a particular domain appear to be more vulnerable to experiencing stereotype threat. Therefore, students who are highly identified with doing well in school may, ironically, be more likely to underperform when under stereotype threat. A key feature of this phenomenon was highlighted by Amanda Schaefer at Slate Magazine. In order to counter stereotype threat, individuals need to experience positive development and build confidence over the course of a semester. Schaefer explains that a slightly better performance on the first test leads to greater motivation and thus leads some individuals to work harder. That work then translates into understanding of the material, which in turn leads to greater confidence and even further motivation.

Teachers and Stereotype Threat

A recent study by the MetLife Foundation found that teacher morale is at an all-time low. This was discussed recently by the New York Times and a blog at Education Week. It would seem that teachers may be in a no-win situation. They are scrutinized for performance, often evaluated by administrators and others who may not be qualified to evaluate teacher performance. When faced with an evaluation, teachers may face serious professional and personal consequences if they do not satisfy the criteria in the evaluation rubric, as interpreted by the evaluators. This was very nicely described in an opinion piece in the NY Times called Confessions of a Bad Teacher. In this article, the author writes:

I was confused. Earlier last year, this same assistant principal observed me and instructed me to prioritize improving my “assertive voice” in the classroom. But about a month later, my principal observed me and told me to focus entirely on lesson planning, since she had no concerns about my classroom management. A few weeks earlier, she had written on my behalf for a citywide award for “classroom excellence.” Was I really a bad teacher?

In my three years with the city schools, I’ve seen a teacher with 10 years of experience become convinced, after just a few observations, that he was a terrible teacher. A few months later, he quit teaching altogether. I collaborated with another teacher who sought psychiatric care for insomnia after a particularly intense round of observations. I myself transferred to a new school after being rated “unsatisfactory.”

My belief is that if we are to avoid such things as stereotype threat in evaluating teachers, good administrators must use the evaluation process to support teachers and help them avoid those painful classroom moments — not to weed out the teachers who don’t produce good test scores or adhere to their pedagogical beliefs (Johnson, 2012).

The current culture of teacher quality and evaluation may be leading to issues in how teachers view their professional identities. They may be living two different professional lives: what they believe works, and what they have to do to make the grade. This may be especially true of innovative teachers, who have to keep their heads down and teach in a way that works for them and leads to results. Only after achieving those results may their methods be accepted. This comes from professional pride and ability. The question that must be asked is whether these innovators and creative teachers can document and demonstrate data-driven instruction. This kind of instruction may not fit generalized teacher quality assessments, because what the teacher is doing is not generally what is seen. Is an evaluator who is given a rubric able to make this evident in the 55 minutes they spend checking off cells on a rubric-driven evaluation?

The Jekyll and Hyde Effect

In the Jekyll and Hyde Effect (Dubbels, 2009), teachers reported finding themselves in a situation where they had begun creating two different classrooms, two different sets of grade books, and two different teaching identities – culminating in the classroom they show, and the classroom they grow. These teachers had created a duality in professional identity: different classrooms and identities to fit the expectations of the mandates, district mentors on learning walks, district trainings, and Professional Development Planning, so they could work “under the radar” and “not be hassled.”

This phenomenon seems to accompany most trends of educational reform. In an article by Lasky (2005), it was posited that we may be destroying the professional identities of teachers by attacking their styles and beliefs about teaching and learning, and, perhaps most importantly, their willingness to be vulnerable in order to reach kids and connect. Teachers expressed that they felt tension as professional educators, and that their beliefs about student learning often contrasted with the current beliefs of the culture of accountability.

According to Lasky (2005), this is not uncommon. In the following passage (p. 905), Lasky quotes and describes a veteran teacher considering leaving the profession because of frustration with “ladder climbers”:

Now there are lot of people who think this is a job to go to because the vacations are good, they follow the doctrines, and a lot of good people are leaving. The major message I was receiving was that you could make a difference, and we’re in this together, and it’s up to all of us to make the world a better place, you know, find your niche and dig in. And it was almost your job to do the peace and love thing. But the message now is that there’s no one to take care of you, you’ve got to watch your back, which is sad.

This teacher’s identity and sense of agency were in tension with the changing political landscape of reform. She found that she was not able to trust people who were not willing to take the “real risks” entailed in teaching. One such risk is expressing one’s vulnerability: knowing and standing up for one’s beliefs, connecting with students, and doing all that can be done to keep students from failing.

This can lead to a real hindrance in organizational and institutional trust, especially when it comes time for professional development activities that might require learning. Teachers reported that they had seen “outcomes based education, constructivism, and profiles of learning” come and go. They had already invested a huge amount of time in these curricular approaches earlier in their careers, and were not willing to invest as heavily now that they had curriculum that worked for them. One teacher said, “I used to spend my weekends, afternoons, and evenings calling parents and correcting, all for a .6 (part-time) placement, and I decided that I was working harder than the students and parents and not getting paid for it.” This feeling was echoed throughout interviews with experienced teachers, who shared that they had found an approach that they liked and that allowed them to have lives outside of the classroom.

Identity and motivation

According to Dubbels (2009), formal learning seems to necessitate trust and identity. For Deci and Ryan (2002), the focus comes from work on the motivation of basic psychological needs, with an emphasis on autonomy — possibly built on early work by White (1959), in which organisms have an innate need to experience competence and agency, and experience joy and pleasure in new behaviors when they assert competence over the environment, which is what White called effectance motivation. If individuals get social reinforcement and improved status in a relationship or community, they will be more likely to be motivated to engage, and to sustain that engagement (Dubbels, 2009).

In order for a teacher to remain engaged in the profession, to care about what is happening, and to sustain that engagement, “motivation must be internalized”: the teacher needs to identify the value of the behavior with other values that are part of themselves (Dubbels, 2009). This internalization is recognized by others, and can be rewarded, ignored, or punished by the professional community, the students and parents, and the administration and mentors for professional development. The same factors that we ask teachers to take into account for their classroom students are also at play in teacher professional development. This process of change includes public acknowledgement and awareness of making personal and professional change, and this public behavior can leave the individual vulnerable to the perceptions and judgment of others.

When a serious game is commissioned, it is expected that in-game learning will transfer to the workplace or a clinical setting, not just lead to improvements in game play.

The Vegas Effect

Evidence of transfer should be a priority in serious game development; there should be evidence that learning acquired in a game is applicable outside of the game.

The Vegas Effect is not unique to games; however, serious games will need to provide evidence that learning that happens in games does not stay in games.

The tradition of psychometrics may provide methods for data collection and analysis so that serious games may eventually serve as empirically validated diagnostic tools and measures of learning—applicable inside and outside of the game. With tools for measuring training effectiveness drawn from psychometrics, return on investment (ROI) analysis of training solutions and clinical tools can be conducted, and the risk associated with the costs of game development may be diminished.
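The ROI analysis this enables can be as simple as comparing measured training benefit against development cost. A minimal sketch, with hypothetical figures of our own:

```python
def training_roi(measured_benefit: float, development_cost: float) -> float:
    """Return on investment: net benefit as a fraction of cost."""
    return (measured_benefit - development_cost) / development_cost

# Hypothetical: a serious game costing $200k whose validated learning
# gains are valued at $260k yields a 30% return.
print(f"{training_roi(260_000, 200_000):.0%}")  # 30%
```

The hard part, of course, is the numerator: only a game with construct validity can justify a dollar value for its learning gains.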

Serious games and assessment

Serious games are very much like the tools used in psychological assessments and evaluations. Psychometric methods distinguish three types of assessment:

  • Formative assessments are measurement tools used to gauge growth and progress in learning; in games, they can be used to alter subsequent learning experiences. A formative assessment is a tool external to the learning activity, and typically occurs in the lead-up to a summative evaluation.
  • Summative assessments provide an evaluation or final summarization of learning. Summative assessment is characterized as assessment of learning, in contrast with formative assessment, which is assessment for learning. Summative assessments are also tools external to the learning activity, and typically occur at the end of the learning intervention.
  • An informative assessment guides and facilitates learning as part of the assessment: the assessment is the intervention. Successful participation in the learning activity is itself evidence that learning has taken place; no external measures are added on for assessment.

Games are typically offered as examples when defining informative assessment. This makes sense, as a game, by its very nature, provides an activity along with assessments, measures, and evaluation. What, why, and how a game measures learning is of primary importance—and this is why serious game designers must learn assessment methods from the field of psychometrics if serious games are to grow as diagnostic tools, assessments, and evaluations.

If a game is to act as an informative assessment, it should stress meaningful, timely, and continuous feedback about learning concepts and processes that are accurately depicted. As in an informative assessment, feedback in a game can be a powerful part of the assessment process. As learners act within the game’s rule environment, they may learn the rules and tools through trial and error—eventually developing tactical approaches, and potentially formulating strategies from the possibilities for action deduced from the in-game assessment criteria. This can be powerful.

Evidence supports the power of this kind of learning tool. Research findings from over 4,000 studies indicate that informative assessment has the most significant impact on achievement (Wiliam, 2007). When serious games are built with the same care as an informative assessment, using methods from psychometrics, they can be just as effective.

Currently, most games are not designed as informative assessments. This means that learning in a serious game might suffer from the Vegas Effect. For a game to act as an informative assessment, the game must accurately measure the learning of its concepts, and those concepts must transfer to performance contexts beyond the game. In order to achieve this, the issue of construct validity must be addressed.

For a serious game to have construct validity, the training interventions it presents must have been designed with emphasis on internal and external validity—what we model, how we measure it, and how it is presented in the game:

  • External validity: the ability to generalize in-game learning to other contexts. To what extent can a training effect from a game be generalized to other populations (population validity), other settings (ecological validity), other treatment variables, and other measurement variables?
  • Internal validity: the adequacy of the study design, or in this case the game, in establishing that the intervention was the only possible cause of a change in the player's learning.
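Internal validity implies a comparison condition, and external validity implies measuring the outcome outside the game. A hedged sketch of what that analysis might look like, using entirely hypothetical score gains and a standard effect-size calculation (Cohen's d with a pooled standard deviation):

```python
from statistics import mean, stdev

def cohens_d(treatment_gains, control_gains):
    """Standardized mean difference between two groups' pre/post gains."""
    n1, n2 = len(treatment_gains), len(control_gains)
    pooled_sd = (((n1 - 1) * stdev(treatment_gains) ** 2 +
                  (n2 - 1) * stdev(control_gains) ** 2) /
                 (n1 + n2 - 2)) ** 0.5
    return (mean(treatment_gains) - mean(control_gains)) / pooled_sd

# Hypothetical pre-to-post gains on an out-of-game transfer test
game_group    = [8, 10, 7, 12, 9, 11]
control_group = [3, 5, 4, 6, 2, 4]

print(round(cohens_d(game_group, control_group), 2))  # prints 3.32
```

The numbers are fabricated; the design choice they illustrate is real: without a control condition there is no internal validity claim, and without a transfer test outside the game there is no external validity claim.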

To do this, serious game development requires valid concepts for modeling, implementing, and assessing what is to be learned, as well as for how it will be measured outside the game. This is essential for ROI (return on investment) analysis: serious games need research and construct validity to support an ROI claim and to avoid the Vegas Effect. Learning that happens in games should not stay in games.

 

 

Leaving Las Vegas:

I have come across few if any games that have been designed with the kind of careful attention to research methodology that would be expected when measuring learning, intelligence, personality, or depression. Methods that ensure construct validity are expected in the field of psychometrics and the learning sciences, and may soon emerge as standard practice in serious game design.

Games are often designed to have surface validity: the game APPEARS to measure what it is supposed to measure. Surface validity is a useful beginning, but it should only be considered a step toward a valid assessment. Building a serious game on surface validity alone is a gamble; it increases the likelihood of the Vegas Effect.

Until we do this, we may be just spinning our oatmeal.

How do we structure learning in a game that leads to measurable improvement outside of the game?

Currently, many games are not required to prove that game play leads to learning. All we have to prove is that the information was delivered. The onus is on the learner: the learner is given the information and must act to learn and integrate it. If not, they may lose their job or be reprimanded.

Folks who pay for serious games know they are getting a return on investment because of the savings in the delivery platform. Because of this, they do not examine the efficacy of the learning structured in the game. Why should they?

If they are delivering training to hundreds, or even thousands, of employees, they have already saved money just through the delivery model.

It is not the same for educational games. Educational games are supposed to deliver learning. If we are going to say that games are more effective than traditional classroom learning, shouldn’t we have evidence to support that statement?

Games often have the same problem as teachers.

Learning may not transfer to improved performance outside of a lesson or a game. Maybe learning takes place, but it is incidental learning. Incidental learning can be useful: an educator can leverage it as prior knowledge to build upon; they can debrief and ask the learner to draw on the tacit knowledge from game play to develop and form concepts.

But hypotheticals and incidental learning do not create comprehension. The process of turning content into concept has been called transduction, in which perceptual, sensorimotor memory is transduced into conceptual knowledge. We assign meaning. According to embodiment theorists, this is done by assigning causation and other meaningful content to objects and creating conceptual categories. Take, for example, the concept "containment." We may look at many different objects that contain, and then create a term like "containment." The term must be learned from looking at many examples.

Luckily, we do have a fairly good description of how this happens behaviorally and cognitively, though neuroscience is still trying to parse it out topographically in the brain. But is it a good idea to study other learning games as a model for delivering this? Why not use the theory and apply it in a relevant way, rather than just implementing a copycat mechanic?

By studying games, it seems we are back to behaviorist paradigms like operant conditioning, where the brain does not exist and we fixate on stimulus and response rather than on why we respond the way we do. Without a model of learning (behavior that identifies conceptualization and comprehension), we may struggle mightily to measure the learning we hope to deliver. What does learning look like? What structures in a game deliver it? We cannot really put the game first, can we?

If a game maker hopes to create a game that can diagnose or treat schizophrenia, they must have a robust model of how to measure schizophrenia, and how to measure "not-schizophrenia." There must be convergent and divergent validity in the measures: what is, and what is not, schizophrenia. The same goes for problem solving and comprehension (learning).
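For a diagnostic game, convergent and divergent validity eventually show up as plain numbers. One minimal sketch, with entirely made-up data: compare the game's classifications against clinical diagnoses and compute sensitivity (catching the condition) and specificity (catching "not-the-condition").

```python
def sensitivity_specificity(predictions, labels):
    """predictions/labels use 1 = has condition, 0 = does not."""
    tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(predictions, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical in-game screener results vs. clinical diagnoses
labels      = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(predictions, labels)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
# prints "sensitivity=0.75 specificity=0.83"
```

A screener that only measured the condition (high sensitivity) while misfiring on everyone else (low specificity) would fail exactly the "not-schizophrenia" test the paragraph above describes.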

How will we get this from studying a game? I am earnest in asking this.

Games can deliver many outcomes depending upon the user's expectations, goals, and prior experience, and whether these align with those of the developer and instructor.

As an example: a child can be given a banana, turn around and make a long distance call with it, stop a bank robbery, and then have a snack with it. The use and purpose may be different depending upon the goals of the user. If the developer limits goals, they have restricted play, and diminished the degree to which it might be called a game.

So, the study of games for learning outcomes would be a task that would seemingly require specificity for each game and each purpose.

How would we create a general framework for that?

My own position is that games can be repurposed for learning. I did this with middle school students, and it led to documented, statistically significant improvement in standardized reading test scores. But the gains came from what I put around the games: I was able to show causation because of how I surrounded the games with instructional objectives. Games can teach anything, just like a banana; it was how the games were framed with curriculum that led to the gains, not just playing games. Sure, my 7-year-old plays games in which he must read, and this helps his reading skill, but can I attribute his improvement to these games? He is also reading books, comics, and everything else in the environment. A game developer who wants to claim improvement in some skill or concept should have identified criteria for how and why this should happen. Otherwise, it is just incidental.

In my own classrooms I created a curriculum that emphasized the conceptual criteria from the standards and assessments the students were to become proficient with.

Serious games need this same attention to detail in what and how content is being learned. One cannot give a child a pet and expect them to gain the expertise of a zoologist, nor can we have a child play a game and expect that they will learn the concepts of science. These concepts need to be thoughtfully embedded in a game, just as game designers are thoughtful in how they twist player expectations for fun and exciting game play. Games are both assessments and instructional interventions. They are the epitome of informative assessments, and they need to measure what they say they measure.

When I taught 120 engineering students to reverse engineer games in a six-week curriculum project, many of the students claimed they learned nothing: they had played games and described games, but they were not learning engineering. It was not until we talked over the criteria from the rubrics that they made the connection to the engineering concepts. To learn specific information, there must be specific cues.

Thus, we need an operational description of what learning looks like when it happens. We need this so we can design the game with mechanics that incorporate the criteria from the assessments across multiple methods and multiple traits. In 1955, Cronbach and Meehl described this issue of construct validity in terms of a nomological network. Campbell and Fiske (1959) built on this and created the Multitrait-Multimethod Matrix, the MTMM model.

Game designers often say that this kills games, that it structures the fun out. Well, that is true only if you cannot imaginatively integrate learning metrics into engaging game narratives and mechanics. The same is said about killing learning with assessment. Well, get better at game design, and get better at structuring assessments. This seems more like a crisis of imagination and implementation than one of oppressive testing and game design. So, if you are going to measure the effects of a game on depression, you need criterion measures and methods for construct validity.

When we diagnose, we need a model of depression; we have the DSM-5 for this. The same can be said about learning: what and how? We have lots and lots of models for this. In fact, this approach may extend research in learning. Designing games for learning is comparable to hypothesis testing.

We need models to apply to the games in both analysis and development. Bloom's Taxonomy will not cut it any more; it was never meant for this anyway, if you read Bloom's book. Bloom gave us general categories to begin thinking about learning processes and how to structure them, but his ideas did not lead to empirical validation of those principles. Learning games may need evidence of transfer: that learning which happens in games does not stay in games. So we need actual psychometric measures, as well as some imagination in delivering them as game mechanics and plot twists.