ELearning/Test and review
Testing and evaluation is the final quality check before the course is declared ready for use. Used appropriately, it is also the path toward continuous improvement. For our purposes here, testing refers to ensuring that the technology (navigation, buttons, hyperlinks, embedded media, etc.) works as intended. Evaluation refers to examining the qualitative course design and its ability to achieve stated learning objectives or outcomes. Most checklists include both elements, often emphasizing one or the other.
Usability Testing
From a technology perspective, a course is a piece of software - courseware - with many subcomponents, and so should be approached as such. Software testing is the process of validating and verifying that a computer program/application/product:
- meets the requirements that guided its design and development,
- works as expected,
- can be implemented with the same characteristics,
- and satisfies the needs of stakeholders.
From Wikipedia, "Software testing, depending on the testing method employed, can be implemented at any time in the software development process. Traditionally most of the test effort occurs after the requirements have been defined and the coding process has been completed, but in Agile approaches most of the test effort is on-going. As such, the methodology of the test is governed by the chosen software development methodology." We recommend the Wikipedia article for its succinct and informative overview of the software testing process. Using our classification of assessment as an analogy, we can also distinguish "formative" testing, which is ongoing and aims at improvement, from "summative" testing, which aims at evaluation and a go/no-go decision.
Usability testing refers to evaluating a product or service by testing it with representative users (learners in our case). Learners want easy navigation, clear instructions, consistent fonts, readability, "defined image resolutions, a smooth playout of audio and video, and an upper boundary on response time for interactive applications" (Codone, 2001). Characteristically, during a test participants try to complete typical tasks while observers watch, listen, and take notes. Alternatively, users may be given diaries and evaluation checklists to complete. The goal is to identify any usability problems, collect qualitative and quantitative data, and determine the participant's satisfaction with the product. Note that Figure 1 shows what usability testing is not about. Usability is always about the user experience, and any problems are considered shortcomings of the software.
First priority is given to testing the application's usability, referred to as blackbox testing, rather than the complexity of the application's internal workings (whitebox testing). Regardless of the quality of the underlying code, if the user interface doesn't work for users, the result is an inefficient and frustrating experience. Blackbox testing requires no programming experience, and also points to where whitebox testing and remediation by technical professionals should focus.
Usability testing typically covers:
- Screen elements (text, graphics, animation, video)
- Interface elements (navigation, instructions, other resources like glossaries and maps)
- Graphic sizing and quality
- Animation timing and other effects
- Instructional text (directions on menus, assignments, tests)
- Accessibility (Section 508 compliance)
- Button and other navigational functionality
- Functional communication with browsers (especially Chrome, Internet Explorer, Firefox, and Safari)
- Functional communication with the LMS (SCORM, TinCan)
- Latency (delay) issues
- Data loss of audio and video (dropped frames or words, pixelation)
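Some of the checks above, such as hyperlink functionality, can be partly automated before learners ever see the course. The following is a minimal sketch only, assuming course pages are available as HTML text; the class name, function name, and sample page are hypothetical, and it flags only obviously unfinished links rather than verifying that each target actually loads.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a course page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; a missing href yields None
            self.hrefs.append(dict(attrs).get("href"))


def suspicious_links(html):
    """Return hrefs that are missing, empty, or placeholder-only."""
    collector = LinkCollector()
    collector.feed(html)
    flagged = []
    for href in collector.hrefs:
        if href is None or href.strip() in ("", "#"):
            flagged.append(href)
        elif href.strip().lower().startswith("javascript:"):
            flagged.append(href)
    return flagged


# Hypothetical course page fragment used only for illustration
sample_page = """
<a href="lesson2.html">Next</a>
<a href="#">Glossary</a>
<a>Help</a>
<a href="">Map</a>
"""

print(suspicious_links(sample_page))
```

A script like this does not replace testing with representative learners; it only clears away mechanical defects so that test sessions can concentrate on the user experience.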
How many test users? 
According to Nielsen (2012, 2000), elaborate tests are a waste of resources. The best results come from qualitative testing with no more than five users for any single test. His advice is to spread users among multiple iterations rather than conduct one large test. For larger projects and hundreds of target users, best results come from five users for qualitative methods, 20 for quantitative, 15 for card sorting, and 39 for eyetracking methods. Always use more than one user for any test to avoid spurious results.
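The reasoning behind the five-user recommendation can be illustrated with the problem-discovery formula Nielsen (2000) cites, in which the share of usability problems found by n users is 1 - (1 - L)^n. The sketch below assumes L = 0.31, the average single-user detection rate Nielsen reports across the projects he studied; the function name is hypothetical, and the rate for any particular course may differ.

```python
def share_of_problems_found(users, detection_rate=0.31):
    """Expected share of usability problems found by `users` testers.

    detection_rate is the assumed probability that a single user
    uncovers any given problem (0.31 is Nielsen's reported average).
    """
    return 1 - (1 - detection_rate) ** users


for n in (1, 3, 5, 15):
    print(n, round(share_of_problems_found(n), 2))
```

Under this assumption five users uncover roughly 84 percent of the problems, and each additional user adds less. This is why spreading users over several small iterations tends to be more productive than one large test.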
In some cases, revisions create new errors - cascading errors - in programmed components (Codone, 2001). For this reason, all revisions must be followed by additional testing.
Larger production organizations also need to add regular auditing activities staffed separately and outside of production management structure (Codone, 2001):
- Defect identification
- Revision prioritization
- Defect trend analysis
- Root cause analysis
- Correction and recurrence control
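As an illustration of what defect trend analysis can look like in practice, the following sketch tallies defects by category for each test round and reports the categories that are growing. The defect log, its categories, and the function names are all hypothetical; a real audit would draw on the organization's own defect tracking records.

```python
from collections import Counter

# Hypothetical defect log: (test round, defect category)
defect_log = [
    (1, "navigation"), (1, "navigation"), (1, "media"), (1, "text"),
    (2, "navigation"), (2, "media"),
    (3, "media"), (3, "media"), (3, "media"),
]


def defects_by_round(log):
    """Count defects per category for each test round."""
    trend = {}
    for round_no, category in log:
        trend.setdefault(round_no, Counter())[category] += 1
    return trend


def rising_categories(log):
    """Return categories with more defects in the last round than in the first."""
    trend = defects_by_round(log)
    first, last = trend[min(trend)], trend[max(trend)]
    # Counter returns 0 for categories absent from a round
    return sorted(c for c in last if last[c] > first[c])


print(rising_categories(defect_log))
```

A category that keeps growing after revisions is a natural candidate for root cause analysis, since it may point to the cascading errors described above.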
Editorial review
Editorial review originates in the publishing world, focusing primarily on written text - manuscripts. This review seeks to improve readability, improve clarity, and ensure factual accuracy. While not generally as formalized in education and training as it is in publishing, it remains an important aspect of course quality. The Mayfield Handbook of Technical & Scientific Writing recommends the following steps in the process:
- Read the draft for content: coverage and organization. Read the draft all the way through before you start to make suggestions for adding or rearranging material, reordering paragraphs, or recasting sentences. Get a firm grasp of the author's purpose, problem statement, audience, and organization. 
- Make marginal notes. If you have to slow down in your reading or have to reread a section, mark it for revision. Make marginal notes of sections that are vague, awkward, inconsistent, or poorly supported. Note any grammatical or stylistic problems as you read along. Note whether or not new or unusual terms are defined upon their first appearance. 
- Place potential problems in context. Reread each area you marked in the first reading. Place the problem in the context of the audience, the reader's purpose, and the rules of grammar and style. 
- Write down your recommendations. Make written suggestions in the margins or on a separate sheet of paper. Identify:
  - lack of clarity of purpose and problem
  - material inappropriate for a given audience (e.g., target proficiency level)
  - appropriate use of person (i.e., first, second, third); some organizations require that text be written in third person
  - weak organization
  - overall document organization
  - format inconsistencies
  - paragraph structure
  - grammatical errors
  - stylistic weakness
  - terminology definitions upon first appearance
- Read for punctuation and mechanics. Note patterns of misused punctuation, mechanics, and spelling, as well as any misuse of units, acronyms, citations, or numbering of pages, sections, graphics, or equations. 
We highly recommend the Mayfield Handbook as a general resource. Bookmark the site!
Quality evaluation
Evaluation is the process of quality assessment. Quality requirements were examined in the Foundations module; here we review how they are assessed in completed courses. Quality review examines the makeup of the course against a set of evidence-based standards. In other words, research has established to a high degree of confidence that compliance with the standards results in quality courseware. Keep in mind, however, that course design is only one factor affecting the quality of a course; others include course delivery, content, the learning management system, support structures, environment, and instructor and learner readiness. Examining components alone assumes that if together they meet established standards, the desired outcomes will follow. We cannot make this assumption in the real world, however, so quality evaluation is only the first step in the review process. Results must also be evaluated.
Assessment tools
We examined five quality assessment instruments, below, all of which examine course components. Each more or less includes a standardized process, ranging from informal (Chico) to highly formalized (Quality Matters). Please review each in the Quality requirements article.
- e3 evaluation
- Chico Rubric for Online Instruction
- Blackboard Exemplary Course Rubric
- Quality Matters Rubric - Abbreviated
- UNLV Hybrid Rubric
Reviewer qualifications and number
The qualifications and number of reviewers are specified for the formal Quality Matters and Blackboard Exemplary Course programs, although these and all other assessment tools can be used informally or quasi-formally. Quality Matters requires that reviewers be experienced online faculty members or, in the case of professional and continuing education, instructional designers who have completed two QM training courses. Three reviewers are required. Blackboard hosts the annual Catalyst Awards for exemplary courses built on Blackboard software. Submissions are judged by individual peer reviewers, a reviewer council, and program directors who make the final decision. At the least, then, submissions are reviewed by three judges and likely more.
Educational institutions and private businesses create their own review requirements. Some examples:
- No formal requirements, so it is left to the instructor to conduct his or her own review
- Shared responsibility between the instructor and instructional designer
- Shared responsibility among subject matter expert, instructional designer, lead instructional designer, and customer representative
- Evaluation by the paying customer, generally in the form of contracted deliverables
- Membership and participation in the Quality Matters program
Results evaluation
Even when courses and programs have been extensively vetted prior to delivery, the question remains: Were the desired results achieved? To answer that question, we have to examine the actual results.
The Kirkpatrick model (1994) of training evaluation, introduced in the 1950s, is the most recognized in industry. It includes four levels of measurement: reaction, learning, behavior, and results. Phillips (1997) added a fifth level, return on investment (ROI), and altered some of the terminology, so we now have the Kirkpatrick-Phillips model of training evaluation. As we see in Figure 2, each level, listed here under the revised terms, is measured in specific ways:
- Satisfaction: Evaluation forms (often referred to as "smile sheets"), encouraging written comments (immediately following completion or delayed 1-4 weeks).
- Learning: Pre- and post- knowledge and skills tests, simulations, attitude surveys, and other assessment tools (before and during or immediately after training).
- Application: Behavioral questionnaires and/or observational sampling (delayed 3-6 months).
- Impact: Business output, quality, costs, and time measures (before training and 6-12 months after).
- ROI: Comparing the total program costs against monetary benefits from the program.
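The ROI level is usually reported as a percentage: net program benefits divided by program costs, multiplied by 100. A closely related figure is the benefit-cost ratio, which is simply benefits divided by costs. The sketch below shows the arithmetic; the function names and dollar amounts are hypothetical and chosen only for illustration.

```python
def roi_percent(program_costs, monetary_benefits):
    """ROI (%) = (benefits - costs) / costs * 100."""
    if program_costs <= 0:
        raise ValueError("program_costs must be positive")
    return (monetary_benefits - program_costs) / program_costs * 100


def benefit_cost_ratio(program_costs, monetary_benefits):
    """Benefit-cost ratio = benefits / costs."""
    if program_costs <= 0:
        raise ValueError("program_costs must be positive")
    return monetary_benefits / program_costs


# Hypothetical figures: a program costing 40,000 that yields 100,000 in benefits
print(roi_percent(40_000, 100_000))         # 150.0
print(benefit_cost_ratio(40_000, 100_000))  # 2.5
```

The arithmetic is the easy part. The difficulty lies in converting level 4 impact data into credible monetary benefits and in attributing those benefits to the training.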
Criticisms
Three primary criticisms, or limitations, have been argued against the Kirkpatrick-Phillips model (Bates, 2004). First, the model is incomplete, presenting an oversimplified view of training effectiveness. "A broad stream of research has documented a wide range of organization, work environment, and individual characteristics that crucially impact the implementation of learning." For example, if organizational policies do not allow for changes in processes, training contrary to these policies will be ignored by the organization. Likewise, if a manager tells his or her subordinates, "We don't do it that way around here," all gains will be lost.
Second, the model assumes a causal chain of impact: "positive reactions lead to greater learning, which produces greater transfer and subsequently better organizational results." Research has largely failed to confirm these causal links, or even correlations between the different levels. We can note that other research finds correlation between levels 1 and 2, and levels 3 and 4, but not between levels 2 and 3 (Kirkpatrick & Kirkpatrick, 2009). Third, there is an assumption that higher levels of measurement provide more useful information about training's effectiveness, which is again unsupported by the research. Both of these latter criticisms build on the first: there are so many intervening variables that it is exceedingly difficult to prove causality.
Integrated approaches
The problem with evaluating results alone is that the metrics may or may not measure the actual quality or impact of training; the assumption is unsupported by the evidence. Approaches are needed that integrate training with the entire business process. We move beyond training and education and into consulting and organizational development. We briefly describe two such approaches.
Kirkpatrick business partnership model (BPM)
- The end is the beginning. "For decades, practitioners have attempted to apply the four levels after a program has been developed and delivered. It is difficult, if not impossible, to create significant training value that way."
- Return on expectations, ROE not ROI, is the ultimate indicator of value. Learning professionals need to negotiate key business stakeholder expectations. From there, they need to convert those generic expectations into observable, measurable success outcomes by asking, "What will success look like to you?" These success indicators become level 4 outcomes.
- Business partnership is necessary to bring about positive ROE. "We do not believe that training events in and of themselves deliver positive, bottom line outcomes. Much has to happen before and after training."
- Value must be created before it can be demonstrated. This means that getting the learning implemented in the business cannot be assumed, but must be approached systematically with the business units in the form of policy and procedure changes, follow-up and coaching.
- A compelling chain of evidence demonstrates training's bottom line value. A chain of evidence consists of data and information that sequentially connects the four levels and shows the contribution learning has made to the business. "We do not believe in attempting to isolate the impact of training in order to prove our value. Instead, we advocate presenting a chain of evidence that illustrates the value of the entire business partnership effort."
Figure 3 presents a visual representation of the model consisting of (top to bottom) planning, execution, and demonstration of value stages.
Logic modelling
Widely used in the non-profit sector, logic modelling provides a framework for any type of program by explicitly linking activities and processes with short- and long-term outcomes.
Figure 4. Logic modelling process
Among others, the W.K. Kellogg Foundation (2004) requires funded organizations to use the model in a number of ways, discussed below. "Good evaluation reflects clear thinking and responsible program management. Over the years, our experience with the model has provided ample evidence of the effectiveness of these methods."
How to "read" a logic model
Reading a logic model means following a chain of reasoning or "if...then..." statements that connect a program's parts. Following along with Figure 4, the model reads:
- If you have access to the necessary resources/inputs,
- then you can use them to accomplish your planned activities, and if you accomplish your planned activities,
- then you will (hopefully) deliver the amount of product and/or service you intended, and if you accomplished your planned activities to the extent you intended,
- then your participants will benefit in the ways you planned, and if these benefits to participants are achieved,
- then the desired changes in systems and the organization can be expected to occur.
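The chain of "if...then" statements above can be sketched as an ordered list of stages in which each stage is only supported when every earlier stage has been achieved. This is an illustrative sketch, not part of the Kellogg guidance; the stage labels are shorthand for the steps in the list, and the function name and status data are hypothetical.

```python
# Stages in reading order; each depends on all stages before it
LOGIC_MODEL_STAGES = [
    "resources/inputs",
    "activities",
    "outputs",   # amount of product or service delivered
    "outcomes",  # benefits to participants
    "impact",    # changes in systems and the organization
]


def furthest_supported_stage(achieved):
    """Follow the if...then chain and return the last stage reached.

    `achieved` maps stage names to True/False. The chain stops at the
    first stage that was not achieved, even if later stages are marked
    True, because their "if" condition no longer holds.
    """
    reached = None
    for stage in LOGIC_MODEL_STAGES:
        if not achieved.get(stage, False):
            break
        reached = stage
    return reached


# Hypothetical program status: planned activities ran, but delivery fell short
status = {
    "resources/inputs": True,
    "activities": True,
    "outputs": False,
    "outcomes": True,
}
print(furthest_supported_stage(status))  # activities
```

In this example the reported participant benefits cannot be credited to the program with confidence, because the link through outputs is broken. Checking each link in turn is the kind of reasoning the model is meant to encourage.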
Uses for the model
The purpose in using a logic model is to provide stakeholders with a road map describing the sequence of related events connecting the need for the program and the program's desired results.
During program design and planning, the model can be used to think through assumptions, and evaluate different approaches to solving the problem and meeting the need. This encourages stakeholders to examine best practice research and practitioner experience associated with each approach.
In program implementation, the model forms the core of a focused management plan including identifying and collecting the data needed to monitor and improve the program to accomplish short-term and long-term goals. The model is a work in progress that can be revisited and revised as results become known.
For program evaluation and reporting, the model presents program information and progress toward goals in ways that inform, advocate for a particular approach, and teach stakeholders what works and doesn't work. Large-scale impacts most frequently occur some time after the conclusion of the formal program.
Finally, the logic model argues for an inclusive approach to all stages of the program, involving all levels of stakeholders including management, implementers, and recipient participants.
A larger set of competencies
ASTD's (now ATD, the Association for Talent Development) Certified Professional in Learning and Performance (CPLP) tests for the knowledge and skill required for this extended approach to training and development. The International Society for Performance Improvement (ISPI) offers a similar certification, the Certified Performance Technologist (CPT).
Figures: ASTD/ATD Competencies and ISPI Standards
These are demanding skill sets. We strongly encourage those working in the private sector to look into these programs. In most business settings, the certifications are most applicable to consultant, management, supervision, and lead positions.
Conclusion
Testing and quality evaluation establish fitness for release of the course, courseware, or program to its intended audience. Results measurements establish whether the effort has had its intended impact, both in the short-term and long-term, and set the stage for continuous improvement. Integrated approaches best incorporate the fact that courses in and of themselves do not create real change. These efforts must be part of a larger effort to bring about lasting large-scale change.