
Category: Research
Teaching and Learning Forum 2008 [ Refereed papers ]
Taking baby steps: The impact of test length on first year student engagement with online formative assessments in human biology

Julie Hill, Jan Meyer, Kathy Sanders
School of Anatomy & Human Biology, The University of Western Australia
Georgina Fyfe
School of Biomedical Sciences, Curtin University of Technology
Sue Fyfe
School of Public Health, Curtin University of Technology
Mel Ziman
School of Exercise, Biomedical and Health Science, Edith Cowan University
Nicole Koehler
School of Anatomy & Human Biology, The University of Western Australia

This paper investigates the impact of test length on student engagement with an online feedback-enhanced formative assessment exercise. A 30 item version of the exercise had been shown to be effective in enhancing learning, but engaged relatively few students from the lower end of the class distribution and retained the attention of relatively few male students. Comments in a questionnaire survey and a fall-off in commitment to coherent responses across the survey led us to suspect that one barrier to effective use of the formative exercise arose from the limited capacity of some students for sustained concentration and effort which was not compelled or directly rewarded. We therefore trialled a shorter 10 item test, looking to see not only whether it improved participation rates, but also whether the trade-off between time and content significantly compromised efficacy. The participation rate for the shorter test was 40% greater than for the long test, the difference being greatest amongst low achieving students. Failure to complete tests was practically eliminated. The greater number of times students, especially males, used the shorter test more than compensated for the decrease in content at each exposure. There was no loss in efficacy and weaker students benefited the most, in contrast to the situation with the longer test where the high fliers displayed the greatest learning gains. While female students gained the most individually from practice with the shorter tests, the enhanced participation and repetition rates of male students meant that they benefited most as a sector.


Introduction

It has been well established that participation in formative online tests can enhance performance in related summative assessments (Black & Wiliam, 1998; Sly, 1999; Gemmiti, 2003). We (Fyfe et al., 2007) have previously shown that provision of immediate, explanatory feedback for online formative and summative tests enhances the effects of such practice.

Like many other educators (e.g. Henly, 2003; Fill & Brailsford, 2005), however, we also found that women (Sanders et al., 2007) and more able students (Meyer et al., 2007) are significantly more likely to take advantage of the opportunities offered by such voluntary formative assessment exercises.

Since it has also been claimed (Fuchs et al., 1997) that it is low achievers who gain most by the increased effort associated with the completion of feedback-enriched tasks, increasing the participation of this group in voluntary online formative assessments would seem to be an important step towards enhancing their learning.

Relatively high (20%) rates of failure to complete a 30 item formative test, particularly amongst males; the splitting of tests across 2-3 sessions (10%); patterns of fall-off of meaningful response to questions in a paper-based questionnaire (Figure 1) and the frequency of comments in an online test-evaluation survey such as

it was good, but very very 'wordy,' it took a lot of concentrating for thirty questions.
2 many questions in one quiz - got distracted
Also this was a very long test for a computer test after a while at staring at the computer you start to lose concentration.
The test was too long and the stark white background hurts your eyes after a while.
all suggested that the length of the task was a factor discouraging vulnerable students from engagement with the formative exercise.

Figure 1

Figure 1: Variation with age in rates of fall-off in meaningful response across a 12 question survey
[Responses were identified as indiscriminate or meaningless if all 14 items
in a question plus the header question ("experience") were the same.]

This presentation reports on a trial comparison of the effects of short (10 item) and longer (30 item) versions of online formative feedback-enhanced tests on student engagement and learning.

Methods

The trial was conducted over two semesters on students in the large (500+) first year Human Biology units at The University of Western Australia.

As the first, longer version of the test had been administered in the second semester of a year, both test topic and student cohort also differed. Forty two percent of students in the trial of the longer 30 item version of the test were male and 38% in the trial of the shorter 10 item version. Approximately 85% of our students are between 16 and 18 years of age every year and the majority (60-79%) are in some form of paid employment. In our experience cohort quality in a unit of this size varies little from year to year. Thirty one percent of the first cohort went on to achieve Fail-Pass final unit grades, 29% Credits and 40% Distinction-High Distinctions (mean 65.5%) while in the second cohort the equivalent figures were 31%, 30% and 39% respectively (mean 64%). Both test topics were areas previously found to be of particular conceptual difficulty to first year Human Biology students, neuroscience in the case of the 30 item test and evolution in the case of the 10 item test.

Open, 24 hour access to both the 30 item and 10 item online formative tests with feedback comments was provided for a period of one week in each case. Summative assessment of these topics included a later online test based on random subsets of 15 questions from the formative assessment question base and a smaller set of independently derived paper-based MCQs in the final examination. The online summative assessment contributed 2% to the final grade for the unit, whilst the MCQs on the same topic in the final exam contributed 3%. Thus MCQ assessment for the topic contributed at most 5% to the final grade and final grade can be considered an independent indicator of quality.

Students' involvement with the feedback-enhanced online formative tests was determined from the inbuilt tracking functions of the web platform, WebCT. Retrospective linkage of individual student performance in summative assessments to engagement with the formative tests was possible after the Board of Examiners.

The question bases for each test were of similar size, about 250 questions in all. Each question was composed of a stem and 5 options (correct answer and four distractors), with explanatory feedback comments for each option whether correct or incorrect. Questions were grouped into either 30 or 10 subtopic areas (pools), with one test question randomly drawn from each pool for each test attempt. That is, for the 30 item test each question came from a pool of 5-15 possibilities whilst in the 10 item test each question came from a pool of 15-50 possibilities.
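The pooled draw described above can be illustrated with a minimal sketch. It is not the WebCT implementation, which the paper does not describe. The function name build_test, the question identifiers and the uniform pool size of 25 are assumptions made only for illustration (the actual 10 item pools held between 15 and 50 questions).

```python
import random


def build_test(pools, rng=None):
    """Draw one question at random from each subtopic pool."""
    rng = rng or random.Random()
    return [rng.choice(pool) for pool in pools]


# ASSUMPTION: a hypothetical bank of 10 pools with 25 question IDs each
# (250 questions in all); real pool sizes varied from 15 to 50.
pools = [[f"pool{p:02d}-q{q:02d}" for q in range(25)] for p in range(10)]

# Each call produces one test attempt; a fixed seed keeps the example repeatable.
attempt = build_test(pools, random.Random(2008))
```

Because every attempt is a fresh draw, a student repeating the short test meets a different mix of questions each time while still covering every subtopic once per attempt.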

The formative feedback-enhanced tests were presented on WebCT (password protected). Students could access the tests at any time of the day, any number of times, for as long as they wished. They had the opportunity to take a break in the middle of a test and come back to it. Students were unable to review or change answers within a single test.

The feedback provided was informed by the following guidelines, developed on the basis of surveys of student attributes, experience and attitudes to feedback, analyses of error patterns in previous year's MCQ examinations and the intended course outcome addressed by each question. It was determined that each feedback comment should

Genstat (Ninth Edition) was used for all data analysis with Excel employed for some graphics. The significance of variations in participation rates was assessed using Chi-square Goodness of Fit tests. Efficacy was compared using Analyses of Variance in the number of feedback-related items correctly answered in the final examination relative to the number expected from the students' overall unit mark derived from essay assignments, lab quizzes, practical examinations and examination essays, short answer questions and MCQs (i.e. with final unit mark as a covariate). The chief index of effectiveness in promoting learning in the area was the difference in feedback area MCQ scores relative to final grade between groups using the formative tests to different extents.
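The idea of an MCQ score "relative to" the overall unit mark can be sketched as a residual from a straight-line fit. This is a simplification offered only for illustration, not the Genstat analysis of variance with a covariate used in the study: it fits a single pooled line and then compares mean residuals between users and non-users. All records, and the names fit_line and relative_scores, are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def relative_scores(unit_marks, mcq_scores):
    """Observed MCQ score minus the score predicted from the unit mark."""
    slope, intercept = fit_line(unit_marks, mcq_scores)
    return [m - (intercept + slope * u) for u, m in zip(unit_marks, mcq_scores)]


# HYPOTHETICAL records: (final unit mark %, feedback-area MCQs correct,
# completed the formative test at least once?)
records = [
    (45, 2, False), (55, 3, False), (65, 4, False), (75, 5, False),
    (50, 4, True), (60, 5, True), (70, 6, True), (80, 7, True),
]
marks = [r[0] for r in records]
mcqs = [r[1] for r in records]
rel = relative_scores(marks, mcqs)

# Mean relative score for each group; a positive gap favours test users.
users = mean([s for s, r in zip(rel, records) if r[2]])
non_users = mean([s for s, r in zip(rel, records) if not r[2]])
gap = users - non_users
```

A positive gap means that users scored better on the feedback-related items than their overall standing alone would predict, which is the sense in which the index is used in the Results.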

Results

The overall level of participation increased significantly with the reduction in test length from 30 to 10 items (Table 1). The table displays the proportion of the cohort completing each type of online test at least once. Compared with the 30 item test, the proportion completing the 10 item test at least once was 40 percentage points higher for females and 43 percentage points higher for males. The equivalent increases in participation for students at the lower end of the class distribution (end of unit grades of Fail-Pass and Credit) were significantly greater (53.5 and 58 percentage points respectively) than those for the high achievers (a 25 percentage point increase for Distinction-High Distinction students) (Table 1 and Figure 2). In Figure 2 the bars represent the proportion of students at each overall unit grade level who completed each type of test at least once.

Table 1: Participation in formative feedback-enhanced online MCQ tests

Group        30 item test   10 item test   Significance (Chi-squared)
overall      32%            74%            Chi-squared = 173.84, df 1, p<0.001
males        29%            72%            Chi-squared <1, NSD (between sexes)
females      35%            75%
Fail-Pass    12.5%          66%            Chi-squared = 43.51, df 4, p<0.001 (between grades)
Credit       27%            85%
Dist-HDist   51%            76%
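The arithmetic behind Table 1 can be checked with a short sketch. The participation rates are taken from the table; everything else is an assumption. In particular the cohort size of 500 per trial is hypothetical (the paper gives only "500+"), and the statistic computed is a Pearson chi-squared test of independence on a 2x2 table rather than the Goodness of Fit test run in Genstat, so it should not be expected to reproduce the reported value of 173.84.

```python
# Participation rates (%) from Table 1: (30 item test, 10 item test).
table1 = {
    "overall": (32, 74),
    "males": (29, 72),
    "females": (35, 75),
    "Fail-Pass": (12.5, 66),
    "Credit": (27, 85),
    "Dist-HDist": (51, 76),
}

# Absolute change in participation, in percentage points, for each group.
point_gain = {group: short - long for group, (long, short) in table1.items()}


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# ASSUMPTION: 500 students in each cohort; the paper states only "500+".
cohort = 500
done_long = round(cohort * 32 / 100)   # completed the 30 item test at least once
done_short = round(cohort * 74 / 100)  # completed the 10 item test at least once
chi2 = chi_squared_2x2(
    done_long, cohort - done_long,
    done_short, cohort - done_short,
)
```

Under the assumed cohort size the statistic comes out at about 177, the same order as the reported value; the point gains (for example 53.5 for Fail-Pass and 25 for Dist-HDist) follow directly from the table.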

Figure 2

Figure 2: Participation in formative feedback-enhanced online tests according to overall standing in class

Decreasing test length practically eliminated the tendency for students to quit tests before finishing them. The rate of failure to complete tests once started fell from 28% to 3% overall with decrease in length of the test from 30 to 10 items (32% to 4% for males, 26% to 2% for females; 26% to 6% for Fail-Pass students, 34% to 2% for Credit students and 25% to 1% for Distinction - High Distinction students) (Figure 3).

Figure 3

Figure 3: The relative rates of failure to start, failure to complete and achievement of
at least one test completion for 30 item and 10 item formative feedback-enhanced online MCQ tests

The decrease in the number of questions to which students were exposed as a result of shortening the test from 30 to 10 items was more than compensated for by an increase in the number of times the shorter test was taken (Figure 4).

Figure 4

Figure 4: The average number of completed test attempts by length of test

Males were more responsive to the difference in test length than females in this respect. On average males completed the 10 item test 5 times more than they did the 30 item version, whilst females completed it 3.9 times more than the longer version. As a consequence, males completed the shorter version of the test significantly more times than did females ('a' compared with 'b' on Figure 5). There was no significant difference in the number of times the short and long tests were taken by high and low achieving students at either test length (Figure 6).

Figure 5

Figure 5: Variations in the number of completed test attempts with length of test and gender
Means +/- standard errors

Figure 6

Figure 6: Variations in the number of completed test attempts with length of test and final unit grade
Means +/- standard errors

The proportion of students taking tests more than once rose from 4% to 53% overall with the reduction in test length (males 5% to 53%, females 3% to 54%; Fail-Pass 1% to 37.5%, Credit 1% to 53%, Dist-High Distinction 8% to 66.5%).

Approximately one quarter (23%) of the class completed the 10 item test more than 5 times, while none did so for the 30 item test (Figure 7).

Figure 7

Figure 7: Number of test completions by length of test

Completion of a feedback-enhanced test was associated with better performance on the feedback-related MCQs in the final exam than would be expected from the overall unit mark for both short and long tests (Figure 8), but more students received this benefit with the 10 item test. Completing the feedback-enhanced tests more than once conferred greater benefit (Figure 9).

Figure 8

Figure 8: MCQ scores on the feedback-related areas of the final paper-based examination
relative to overall unit score: completion of at least one test compared with none, by test length
Means +/- standard errors

Figure 9

Figure 9: MCQ scores on the feedback-related areas of the final paper-based
examination relative to overall unit score: multiple compared with
single completions for students who completed at least one test
Means +/- standard errors

For example, students who completed the original 30 item test between two and five times (60-250 questions) obtained significantly higher marks on the feedback-related MCQ questions relative to their overall level of performance in the unit than did those not completing a test (a compared with b, LSD, p<0.05, Figure 10). In the same way those completing the 10 item test more than five times (>50 questions) fared significantly better on the parts of the final paper related to the feedback area than those who did not complete that test at all (a' cf b' LSD, p<0.05, Figure 10).

Figure 10

Figure 10: Performance on feedback-related MCQ questions in final exam in relation to
overall standing in unit according to degree of engagement with feedback-enhanced tests
Means +/- standard errors

The benefit gained from test participation for women is significantly greater than that for men for both types of test in terms of the difference in feedback-related MCQ scores in the final exam relative to final unit mark for those who use compared with those who do not use the formative test system (Figure 11).

Figure 11

Figure 11: The impact of test participation on final exam feedback-related
MCQ scores relative to final unit grade, by sex and test length
Mean +/- standard error. b is significantly greater than a and b' than a' (LSD p<0.05)

Taken together, however, the effects of decreasing test length on participation rates, number of test completions and the associated benefits gained make a significantly greater difference to male performance in areas related to online formative assessment tasks enriched by immediate explanatory feedback (Figure 12).

Figure 12

Figure 12: Overall impact of test length on male and female performance in feedback-related MCQs in the final examinations relative to final unit mark (i.e. the product of the number of formative test participants, the number of times each repeats the test and the degree of benefit each obtains at that level of repetition) by sex
Mean +/- standard errors. a (male 10 item test average benefit) is significantly greater than b (female equivalent).

While relatively high achievers obtained the greatest benefit from practising with the 30 item version of the feedback-enhanced test (b compared with a, Figure 13, LSD p<0.05), the lowest achievers benefited most from practice with the shorter 10 item version (b' compared with a', Figure 13, LSD p<0.05).

Figure 13

Figure 13: Overall impact of test length on performance in feedback-related MCQs in the final examinations by final unit grade (i.e. the product of the number of formative test participants from each grade group, the number of times each repeats the test and the degree of benefit each obtains at that level of repetition). Mean +/- standard errors. b (the performance in feedback-related MCQ questions in the final examination by High Distinction students who have used the formative test) is significantly greater than a (the equivalent for those who have not). Similarly b' (the feedback-related MCQ score for Fail grade students who have used the 10 item formative test) is significantly higher than a' (the score of those who did not use the test).

Discussion and conclusions

Our original 30 item online formative test provided further confirmation of the claim by Black & Wiliam (1998) that feedback which provides a clear understanding of what is wrong and how to put it right can benefit learning. It also provided support for Sly's (1999) contention that practice with formative tests can lead to better performance in summative assessment. It further indicated that Ebbinghaus' observation, made as long ago as 1885, that distributed repetition of practice is more effective for learning than massing practice into a single session applies as well now in the automated online environment as it did then. Unfortunately, it also provided support for Henly's (2003) and Fill and Brailsford's (2005) findings that the very students who might, according to Black and Wiliam, most benefit from the type of feedback we were providing were least likely to take advantage of it. Our Distinction and High Distinction students too were more likely to make use of the feedback-enhanced formative tests, to do them more often and to benefit more from doing them than Fail and Pass students.

The success in this study of the shorter (10 item) test format in increasing the recruitment and retention of low achieving students confirmed indications that some of the difficulty they experienced in engaging with the original 30 item test was likely to have arisen from the daunting (to them) length of the test. Black and Wiliam argue that pupils who lack confidence and encounter difficulties avoid investing effort in learning that can lead only to disappointment. The shorter feedback-enhanced formative tests reduced the time and energy investment required to obtain a learning return. The enhancement in the performance of such students in areas of a summative examination related to the formative test confirmed the value of formative feedback-enhanced test practice to them, if they can be induced to participate.

The success of the trial in extending the benefits of formative test practice with feedback to traditionally avoidant students was accompanied by very few direct disadvantages. It would appear, as Black and Wiliam (1998) claimed, that "It is better to have frequent short tests than infrequent long ones". Indeed, the greater capacity of the 10 item test to gain and hold students' attention is perhaps just what would be expected from Johnstone and Percival's (1976) oft-cited demonstration of the difficulty of holding the attention of adults on one concentrated task for much more than 15 minutes.

Nevertheless, university learning is by its nature an activity which requires prolonged periods of sustained concentration in the face of difficulties and without immediate reward. While short formative exercises may increase student participation and the experience of success, which is itself held by Black and Wiliam to underlie motivation and engagement with learning, they do little to develop readiness for the more onerous aspects of university life.

Having demonstrated the effectiveness of short online formative tests with immediate automated delivery of explanatory feedback it is our intention to present formative exercises to first year students in the short format at the beginning of semester, then to provide the same tests in 20 to 50 item formats and to encourage progression to the more sustained tasks.

The demonstration of the effects of test length on performance in the formative setting raises the question of its effects in summative testing. Few of the online summative tests we surveyed were under 20 items long and several were longer than 50 items. If the demotivating effect of longer tests on the engagement of less confident students seen in the formative situation applies as well to summative assessments, they might in themselves disadvantage such students and act as further deterrents to their engagement with learning. Examination of rates of responding and rates of responding correctly across larger online summative tests should reveal the extent of any such effect for students from different parts of the class distribution.

Acknowledgement

Support for this study has been provided by The Carrick Institute for Learning and Teaching in Higher Education Ltd, an initiative of the Australian Government Department of Education, Science and Training. The views expressed in this presentation do not necessarily reflect the views of The Carrick Institute for Learning and Teaching in Higher Education.

References

Black, P. & Wiliam, D. (1998). Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 80(2), 139-148. http://blog.discoveryeducation.com/assessment/files/2009/02/blackbox_article.pdf

Ebbinghaus, H. (1885). Memory. A contribution to experimental psychology. Cited by Wozniak, R. H. (1999). Classics in Psychology, 1855-1914: Historical Essays. Bristol, UK: Thoemmes Press. Reprinted in Green, C.D., Classics in the history of psychology. http://psychclassics.yorku.ca/Ebbinghaus/wozniak.htm#f3

Fill, K. & Brailsford, S. (2005). Investigating gender bias in formative and summative CAA. In 9th International Computer Assisted Assessment Conference. Loughborough, UK, Loughborough University, 263-272. http://eprints.soton.ac.uk/16256/

Fuchs, et al. (1997). Effects of task-focussed goals on low-achieving students with and without learning disabilities. American Educational Research Journal, 34(3), 513-543.

Fyfe, S., Meyer, J., Fyfe, G., Ziman, M., Plastow, K., Sanders, K. & Hill, J. (2007). Yes! On-line test Feedback Works for Learning in Human Biology. e-learning Symposium, RMIT, December 2007. http://ls7.cgpublisher.com/proposals/40/index_html

Gemmiti, F. (2003). Computer based learning in first year human biology: Students' use, evaluation and the value of practice quizzes and graded tests. In Partners in Learning. Proceedings of the 12th Annual Teaching Learning Forum, 11-12 February 2003. Perth: Edith Cowan University. http://lsn.curtin.edu.au/tlf/tlf2003/abstracts/gemmiti-abs.html

Genstat Ninth Edition (Version 9.1.0.147). VSN International Ltd. http://www.vsni.co.uk/products/genstat/

Henly, D.C. (2003). Use of web-based formative assessment to support student learning in a metabolism/nutrition unit. European Journal of Dental Education, 7, 116-122.

Johnstone, A.H. & Percival, F. (1976). Attention breaks in lectures. Education in Chemistry, 13, 49-50.

Meyer, et al. (2007). Implications of patterns of use of freely-available online formative tests for online summative tasks. Proceedings of the 11th CAA conference, Loughborough, UK, July 2007. http://www.caaconference.com/pastConferences/2007/proceedings/Meyer%20J%20Ziman%20M%20Fyfe%20S%20et%20al%20r2_formatted.pdf

Sanders, K., Hill, J., Meyer, J., Fyfe, G., Fyfe, S., Ziman, M. & Koehler, N. (2007). Gender and engagement in automated online test feedback in first year human biology. In ICT: Providing choices for learners and learning. Proceedings ascilite Singapore 2007. http://www.ascilite.org.au/conferences/singapore07/procs/sanders-poster.pdf

Sly, L. (1999). Practice tests as formative assessment improve student performance on computer-managed learning assessments. Assessment & Evaluation in Higher Education, 24(3), 339-343.

Authors: Julie Hill, Jan Meyer, Kathy Sanders
School of Anatomy & Human Biology, The University of Western Australia
Georgina Fyfe, School of Biomedical Sciences, Curtin University of Technology
Sue Fyfe, School of Public Health, Curtin University of Technology
Mel Ziman, School of Exercise, Biomedical and Health Science, Edith Cowan University
Nicole Koehler, School of Anatomy & Human Biology, The University of Western Australia

Author for correspondence: Julie Hill has taught first year Human Biology at a number of Western Australian Universities over the last two decades. For the last five years she has been principal coordinator of the large (500+) first year Human Biology course at the University of Western Australia which was twice finalist in the AUTC Excellence in Teaching Awards and part of the Carrick Project team investigating the impact of immediate automated feedback for online MCQ tests. Email jhill@anhb.uwa.edu.au

Please cite as: Hill, J., Meyer, J., Sanders, K., Fyfe, G., Fyfe, S., Ziman, M. & Koehler, N. (2008). Taking baby steps: The impact of test length on first year student engagement with online formative assessments in human biology. In Preparing for the graduate of 2015. Proceedings of the 17th Annual Teaching Learning Forum, 30-31 January 2008. Perth: Curtin University of Technology. http://otl.curtin.edu.au/tlf/tlf2008/refereed/hill.html

Copyright 2008 Julie Hill, Jan Meyer, Kathy Sanders, Georgina Fyfe, Sue Fyfe, Mel Ziman and Nicole Koehler. The authors assign to the TL Forum and not for profit educational institutions a non-exclusive licence to reproduce this article for personal use or for institutional teaching and learning purposes, in any format (including website mirrors), provided that the article is used and cited in accordance with the usual academic conventions.


This URL: http://otl.curtin.edu.au/tlf/tlf2008/refereed/hill.html
Created 20 Jan 2008. Last revision: 20 Jan 2008.