Teaching and Learning Forum 2013 [Refereed papers]
Phil Hancock and Gillian Dale-Jones
The University of Western Australia
Keith Willey
University of Technology, Sydney
Email: firstname.lastname@example.org, email@example.com, firstname.lastname@example.org
Judgment and (written) communication are two of the five learning standards identified by the Australian Learning and Teaching Council for accounting bachelor and master degrees. This paper reports the use of a custom-designed collaborative peer and self-assessment teaching initiative, embedded into the curriculum for postgraduate Introductory Financial Accounting. The study, which is modelled on the process used in the Achievement Matters: External peer review of accounting learning standards project, uses as its base carefully selected exemplars written by peers in a previous cohort. Students are required (twice) to assess the quality of the written communication in the exemplars (using their judgment), which in turn generates informed debate on the expected standards of written communication and the application of an assessment grid. Assuming that students learn from this process, they then have an opportunity to apply their learning to their own written work.
The study is designed to capture any change in the quality of firstly, the students' judgment using the online tool Self and Peer Assessment Resource Kit (SPARKPLUS) and secondly, their written communication (using the outcome of an independent grading process). The results indicate a statistically significant improvement in the quality of student judgment in relation to the grammatical, structural and presentation components of written communication. In addition, the quality of their written communication shows a statistically significant improvement in all six components specified by the assessment grid. While there is insufficient evidence to draw any specific conclusions, our observations and student comments suggest that the teaching initiative contributed to this improvement.
Accounting was chosen as the first discipline in business, and in December 2010 the accounting learning standards were published by the ALTC (Hancock & Freeman, 2011). The statement identifies five threshold learning standards: judgment, knowledge, application skills, communication and teamwork, and self-management.
Following the release of the accounting learning standards, the Australian Business Deans Council (ABDC) provided seed funding for a project to assess whether accounting graduates meet the learning standards. Achievement Matters: External peer review of accounting learning standards, jointly led by Hancock and Freeman, seeks to collaboratively develop and implement a national model of expert peer review for benchmarking learning outcomes against the accounting learning standards (Freeman & Hancock, 2011). The project uses the online tool Self and Peer Assessment Resource Kit (SPARKPLUS) for peer reviewers to record their decisions.
In this paper we report the use of a custom-designed process of collaborative peer and self-assessment, with the aid of an assessment-criteria grid, in a postgraduate accounting unit at an Australian university. The process is modelled on the one used in the Achievement Matters project, and SPARKPLUS is also used by the students to record their assessment decisions. The objective of this teaching initiative is to encourage relevant, cognitive gains for postgraduate accounting students within existing levels of resources.
Therefore, the learning standards selected for this study are written communication combined with judgment - it is important that students can judge what is acceptable written communication for professional accountants.
Why use peer learning? To quote Boud et al. (1999),
There are both pragmatic and principled reasons for the current focus on peer learning in university courses. The pragmatic reason is the most obvious. Financial pressure on university funding has led (sic) to staff teaching more students. This has prompted a search for teaching and learning strategies to enable staff to cope without increasing their overall load. Peer learning has considerable promise as it involves maintaining the level of student learning without more input from staff. (Boud et al., 1999, p. 415)

Hence the study examines the use of peer learning to develop and assess students' ability to exercise judgment and to improve written communication skills.
The benefits of the process of self and peer assessment have been well documented. Quite aside from its recognised ability to improve the operation and fairness of teamwork (Willey & Freeman, 2006b), self and peer assessment is widely acknowledged to benefit student learning more broadly.
There are, however, some circumstances in which the effects of peer assessment are less favourable, or even disadvantageous.
O'Donovan, Price and Rust (2001) found that from a student's perspective, the criteria assessment grid is a constructive means of providing feedback and improving the quality of work, only if it is part of a broader initiative that includes 'explanation of the standards, exemplars and opportunities for discussion' (p. 74). Sadler (2009), on the other hand, is critical of assessment grids and contends that 'To simply reach for a rubric or construct a scoring key each time a complex work has to be appraised is both impractical and artificial in life outside academe. Equipping students with holistic evaluative insights and skills therefore would contribute an important graduate skill' (p. 178). However, even with holistic assessment the use of criteria to guide the assessor would not be inappropriate, and Sadler admits that 'clearly, there is more work to be done on this important issue of grading complex student responses to assessment' (p. 178).
In relation to the exemplars
Four exemplars (two pairs) of mediocre written work from the previous year's cohort were selected, using the pairwise assessment process described by Heldsinger and Humphry (2010) to rank ten randomly selected examples.
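The pairwise idea can be illustrated with a deliberately simplified sketch: a judge compares examples two at a time, and win counts across all pairings yield a ranking. The exemplar names and underlying 'quality' scores below are entirely hypothetical (in practice the judge holds no explicit scores and simply picks the better of each pair), and a simple win-count tally is only a stand-in for the fuller analysis described by Heldsinger and Humphry (2010).

```python
# Simplified sketch of ranking by round-robin pairwise comparison.
# Names and 'quality' scores are hypothetical; in practice a judge
# simply chooses the better work in each pair.
from collections import Counter
from itertools import combinations

quality = {"E1": 3, "E2": 7, "E3": 5, "E4": 1, "E5": 6}

# Tally a win for the preferred work in every pairing.
wins = Counter({name: 0 for name in quality})
for a, b in combinations(quality, 2):
    winner = a if quality[a] > quality[b] else b
    wins[winner] += 1

# Rank from most to fewest wins.
ranking = [name for name, _ in wins.most_common()]
print(ranking)  # → ['E2', 'E5', 'E3', 'E1', 'E4']
```

A win-count ranking is transparent but coarse; the published method fits a measurement model to the comparisons to obtain reliable interval-scale estimates.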
Those four exemplars were then subjected to a more conventional assessment process using an assessment grid that has been developed over several years for the purpose of assessing written reports in this unit. This process was performed independently by two authors of this paper and the results were then compared (one of these authors is a peer reviewer in the Achievement Matters project and has participated in the four national calibration workshops on written communication). Out of six criteria over the four exemplars (i.e. twenty-four points of assessment), the assessors gave exactly the same rating for 20 of the points of assessment. The ratings for the remaining points differed by only one degree on the scale, and after discussion between the markers, consensus was reached. This process formed the 'benchmark' assessment for the two pairs of (four) exemplars.
In relation to the SPARKPLUS tool
The SPARKPLUS tool was to be used as an efficient means of recording the students' individual assessments of the two pairs of exemplars, and then evaluating these assessments against the 'benchmark' assessment to arrive at a 'judgment mark' to be awarded to each student. At the preparatory stage, this involved summarising the assessment grid criteria for entry into SPARKPLUS, and then capturing the 'benchmark' assessment for each criterion and each exemplar (this 'benchmark' was not made available to the students at this stage).
Figure 1: Timeline of the process
Did the quality of the judgment change?
In this instance, 'quality of judgment' is measured by comparing the student ratings of the exemplars against the benchmark ratings established in advance by two of the authors. For each exemplar, almost 40 student ratings were collected across each of the six criteria. Exemplars A and B were assessed concurrently, before the intervention, while Exemplars C and D were assessed concurrently, after the intervention. The six criteria (taken from the grid) are shown here:
|K1||Knowledge 1||Relevant issues are raised and the discussion is convincing. Are the conclusions valid?|
|K2||Knowledge 2||Does the work contain reference to, and reflect an understanding and application of appropriate, adequate literature?|
|W1||Writing style 1||Is the written work grammatically correct and well structured? (This includes sentence and paragraph structure and logical sequencing.)|
|W2||Writing style 2||Is the report well presented and in accordance with the requirements that were stipulated for the assignment?|
|W3||Writing style 3||Is the in-text referencing correctly applied?|
|W4||Writing style 4||Is the end-text referencing correctly applied?|
SPARKPLUS requires the students to enter their ratings on a continuous spectrum, which is evenly sub-divided into categories as below. The students are aware (from a trial run) that the spectrum is a continuous measure.
|WB||Well below acceptable|
|WA||Well above acceptable|
Table 1 reports a descriptive overview of the students' ratings compared against the benchmark rating. On average, students initially allocated lower marks for most of the writing style criteria (W2 to W4), although they were more lenient with grammar and structural considerations (W1). After the intervention their ratings were more balanced. The results for the knowledge criteria (K1 and K2) indicate that after the intervention the students became more demanding with these criteria.
Since the aim of the exercise is to develop judgment, the students' percentage ratings are themselves of limited interest. The size (not direction) of the gap between the ratings and the benchmark (hereafter termed the 'difference') is more representative of the quality of the judgment. In order to eliminate directional effects, the absolute value of the 'difference' is used.
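As a purely hypothetical illustration of this measure, the sketch below computes the absolute 'difference' between a set of invented student ratings and an invented benchmark for a single criterion on a single exemplar, along with the mean gap (smaller indicates sounder judgment) and the share of ratings falling within ten points of the benchmark.

```python
# Illustrative sketch (all ratings hypothetical): the 'difference' is the
# absolute gap between a student's percentage rating and the benchmark.
from statistics import mean

benchmark = 55.0
student_ratings = [48.0, 62.0, 55.0, 70.0, 51.0, 44.0, 58.0]

# Absolute value removes directional effects: over- and under-rating
# are penalised equally.
differences = [abs(r - benchmark) for r in student_ratings]

# Mean absolute difference: smaller values indicate sounder judgment.
mean_gap = mean(differences)

# Share of ratings within ten points of the benchmark.
within_10 = sum(d <= 10 for d in differences) / len(differences)

print(round(mean_gap, 2), round(within_10, 2))  # → 6.71 0.71
```

The same calculation, applied per criterion and per exemplar across the whole cohort, underlies the comparisons reported below.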
If a difference of less than ten percent either side of the benchmark is defined as a demonstration of sound judgment, then Figure 2 gives an indication of the quality of judgment, by exemplar and by criterion. Visually, improvement in judgment appears to occur in relation to criteria W1, W2, W3 and W4.
Figure 2: Percentage of student ratings within ten percent of the benchmark rating
Table 2 shows the means of the grouped results for Exemplars A and B (C and D), which represent the before (after) intervention situations. An improvement in judgment (shown by a decrease in the differences) occurs in relation to criteria K1, W1, W2 and W3; however, a two-tailed t-test indicates that these improvements are statistically significant only in the cases of criteria W1, W2 and W3. The unexpected result for W4 arises because of the effect of outliers.
A control test is performed to establish whether the differences for Exemplar A (C) are significantly different from those for Exemplar B (D) - to ensure that there is a degree of internal stability in the ratings differences both before and after the intervention. For the criteria W1 and W2, in which it is thought that an improvement has occurred, it is shown that there is no significant internal inconsistency in the ratings before and after the intervention. However, for criterion W3, the ratings differences for A and B (i.e. before the intervention) are significantly different from each other (as they are also between C and D). This lowers the credibility of the finding reported in the previous paragraph in relation to criterion W3.
|Criterion||K1||K2||W1||W2||W3||W4|
|Exemplars A and B||18.53||16.10||15.48||26.46||25.42||14.73|
|Exemplars C and D||15.23||19.27||10.99||12.44||15.56||15.08|
|Improvement in judgment||Yes||No||Yes*||Yes*||Yes*||No|
|* Significant at 1%|
In summary, an improvement in judgment appears to exist in relation to criteria W1 and W2.
Did the quality of the written communication change?
After the teaching initiative, both draft and final reports were assessed by an independent examiner who specialises in written communication. The examiner used the same assessment grid; however, within both knowledge criteria, her focus was on the students' ability to communicate the information, as opposed to the choice of content selected by the students for presentation. These draft/final reports (although this was not apparent to the examiner) effectively formed a matched-pair sample, because each pair of reports was written by the same person.
|Criterion||K1||K2||W1||W2||W3||W4||Total|
|Score (After - Before)||0.49||0.65||0.08||0.14||0.20||0.09||1.65|
|Number of report pairs||35||35||35||35||35||35||35|
|Instances of no change||9||5||16||15||10||17||1|
|Instances of improvement||23||24||14||16||23||13||32|
As shown in Table 3, within the 35 pairs of reports the total assessment for the final report exceeds that for the draft report in 32 cases. In one case there is no change, and in the other two cases there is a slight decrease. However, on average, the results show a significant improvement between the draft and final reports (with 95% confidence for W1 and W4, and with 99% confidence for all the other criteria). While these results reflect a clear improvement in written communication, it is less clear what caused the improvement, since the university does not permit running a true control group that is excluded from the teaching innovation. The factors which may have caused the improvement include: (1) the students are assessed on their final report but not on their draft report; (2) the students may have had more time to assimilate the material; (3) students undertaking a concurrent communication unit may have been positively influenced by that process; and (4) the teaching innovation may have enabled the students to more effectively self-evaluate and improve their reports. It is likely that some combination of these factors is at work.
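A minimal sketch of this kind of matched-pair comparison, using invented (draft, final) score pairs rather than the study's data, counts improvements and computes the paired t-statistic (mean gain divided by its standard error) that such an analysis compares against a critical value at the chosen confidence level.

```python
# Illustrative sketch (hypothetical scores): paired comparison of draft
# and final report totals, where each pair comes from the same student.
from math import sqrt
from statistics import mean, stdev

pairs = [(14.0, 16.0), (12.5, 13.0), (15.0, 15.0),
         (11.0, 13.5), (13.0, 12.5), (10.5, 12.0)]

gains = [final - draft for draft, final in pairs]

improved = sum(g > 0 for g in gains)
unchanged = sum(g == 0 for g in gains)
declined = sum(g < 0 for g in gains)

# Paired t-statistic: mean gain over its standard error, to be compared
# against a t critical value with n - 1 degrees of freedom.
n = len(gains)
t_stat = mean(gains) / (stdev(gains) / sqrt(n))

print(improved, unchanged, declined, round(t_stat, 2))  # → 4 1 1 2.07
```

Because each student serves as their own control, the paired design removes between-student variation, though (as noted above) it cannot by itself isolate which factor produced the gain.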
Did the students take the opportunity to self-evaluate their reports?
The report requirements state that "the final version of the report must include a statement, of not more than 250 words, explaining how the self-evaluation process changed your work from Draft one to the Final Version." The aim of this requirement is to encourage the students to take time to reflect on the process of self-evaluation. If the students find value in the process, then there is potential for this intervention to seed or nurture the habit of regular self-evaluation - an important component of independent learning.
The free-form reflective statements indicate that the students did take the opportunity to self-evaluate their work. Reference to the teaching intervention was unsolicited, however 62% of the respondents specifically credit it with providing insight into some component of their written work.
How did the students perceive the process?
The responses to the survey (in the Appendix) are scored as follows, in order to perform statistical analysis: Strongly disagree = 1, Disagree = 2, Neutral = 3, Agree = 4 and Strongly agree = 5.
The average rating for each question is summarised in Table 4, where it is seen that students indicate somewhere between 'agreement' and 'strong agreement' for questions one to six (which concentrate on the exemplar process itself), whereas for questions seven to nine (which concentrate on applying what they may have learned through the exemplar process to improve their work) their average response lies somewhere between 'neutral' and 'agreement'. Interestingly, they credit an improvement in their understanding of the assessment criteria to all three parts of the exemplar process - their individual review (Q4), their collaborative peer review (Q5) and the instructor's debrief (Q6), although there is a statistically insignificant preference shown for the effect of the instructor's debrief.
|Number of responses||38||39||39||39||38||39||39||39||39|
The demographic information collected via the survey enables initial analysis of whether the perceptions of students about the teaching initiative are to some extent dependent on the first language of the country in which they undertook their previous studies. For all of the survey questions, except Q9a, students whose previous study was in a country where English was the first language, rated the effects of the intervention more highly than students whose previous education was in an Asian country.
An equivalent analysis of the survey responses by gender revealed nothing remarkable.
In summary, it would appear that the whole process was perceived by the students to have improved their confidence in assessing written material, their understanding of assessment criteria, and their ability to apply these skills to their written work. In answer to question ten, many of the comments emphasised the challenge and learning that had resulted from the process of having to assess, using specific criteria, another's piece of written work.
Collaborative classrooms stimulate both students and teachers. In the most authentic of ways, the collaborative learning process models what it means - to question, learn and understand in concert with others. Learning collaboratively demands responsibility, persistence and sensitivity, but the result can be a community of learners in which everyone is welcome to join, participate and grow (Smith & MacGregor, 1992, p. 29).

This teaching and learning initiative was challenging. The preparation was considerable, although during semester the initiative created only a slightly higher workload than the previous year. More particularly, our academic judgment was subjected to close scrutiny, which was daunting - not because the benchmarks were necessarily right or wrong, but simply because justifying and explaining what has almost always been an intuitive process is confronting. The debrief sessions required serious preparation and planning.
With the benefit of hindsight, one improvement to the process would be to include an example of a 'good' piece of writing as one of the exemplars. It would give the students a more secure launching pad for analysis and comparison.
The results suggest that there were benefits for the students in terms of both their judgment and their written communication skills. In a study of this kind, the analysis of quantitative results lends a degree of rigour; however, the value of the qualitative and anecdotal findings should not be overlooked. The majority of students moved from hesitation and bewilderment through to being more confident about assessing the quality of the exemplars, via a process of enthusiastic engagement. Two remarks, both from strong, mature-aged students, are worth documenting - the first said that he wished he had undergone this process before starting employment, and the second commented that in all her years of study she had never previously learned the power of learning through participation. It should also be noted that this study has attempted to measure only the short-term benefits to the students. It would be most interesting to know if any of the perceived benefits are more permanent.
Birrell, B. (2006). The changing face of the accounting profession in Australia. CPA Australia, November.
Boud, D., Cohen, R. & Sampson, J. (1999). Peer learning and assessment. Assessment & Evaluation in Higher Education, 24(4), 413-426. http://dx.doi.org/10.1080/0260293990240405
Boud, D. & Falchikov, N. (1989). Quantitative studies of student self-assessment in higher education: A critical analysis of findings. Higher Education, 18(5), 529-549. http://dx.doi.org/10.1007/BF00138746
Boud, D. & Falchikov, N. (2006). Aligning assessment with long-term learning. Assessment & Evaluation in Higher Education, 31(4), 399-413. http://dx.doi.org/10.1080/02602930600679050
Brindley, C. & Scoffield, S. (1998). Peer assessment in undergraduate programmes. Teaching in Higher Education, 3(1), 79-79. http://dx.doi.org/10.1080/1356215980030106
Falchikov, N. (1995). Peer feedback marking: Developing peer assessment. Innovations in Education & Training International, 32(2), 175-187. http://dx.doi.org/10.1080/1355800950320212
Freeman, M. & Hancock, P. (2011). A brave new world: Australian learning outcomes in accounting education. Accounting Education, 20(3), 265-273. http://dx.doi.org/10.1080/09639284.2011.580915
Hancock, P. & Freeman, M. (2011). Learning and teaching academic standards statement for accounting. Learning and Teaching Academic Standards Project: Australian Learning and Teaching Council. http://www.abdc.edu.au/download.php?id=325154,282,1
Heldsinger, S. & Humphry, S. (2010). Using the method of pairwise comparison to obtain reliable teacher assessments. Australian Educational Researcher, 37(2), 1-19. http://dx.doi.org/10.1007/BF03216919
Lane, B. (2009). Degrees still lure low skill migrants. The Australian Higher Education Supplement, 14 January. http://www.theaustralian.com.au/news/degrees-still-lure-low-skill-migrants/story-e6frg6n6-1111118554567
Mello, J. A. (1993). Improving individual member accountability in small work group settings. Journal of Management Education, 17(2), 253-259. http://dx.doi.org/10.1177/105256299301700210
Mowl, G. & Pain, R. (1995). Using self and peer assessment to improve students' essay writing: A case study from geography. Innovations in Education & Training International, 32(4), 324-335. http://dx.doi.org/10.1080/1355800950320404
O'Donovan, B., Price, M. & Rust, C. (2001). The student experience of criterion-referenced assessment (through the introduction of a common criteria assessment grid). Innovations in Education and Teaching International, 38(1), 74-85. http://dx.doi.org/10.1080/147032901300002873
Sadler, R. (2009). Indeterminacy in the use of preset criteria for assessment and grading. Assessment & Evaluation in Higher Education, 34(2), 159-179. http://dx.doi.org/10.1080/02602930801956059
Searby, M. & Ewers, T. (1997). An evaluation of the use of peer assessment in higher education: A case study in the School of Music, Kingston University. Assessment and Evaluation in Higher Education, 22(4), 371-383. http://dx.doi.org/10.1080/0260293970220402
Smith, B. L. & MacGregor, J. T. (1992). What is collaborative learning? In A. Goodsell, M. Maher, V. Tinto, B. Smith & J. MacGregor (Eds.), Collaborative learning: A sourcebook for higher education. Pennsylvania: National Center on Postsecondary Teaching, Learning, and Assessment.
Somervell, H. (1993). Issues in assessment, enterprise and higher education: The case for self-peer and collaborative assessment. Assessment & Evaluation in Higher Education, 18(3), 221-233. http://dx.doi.org/10.1080/0260293930180306
Topping, K. (1998). Peer assessment between students in colleges and universities. Review of Educational Research, 68(3), 249-276. http://dx.doi.org/10.3102/00346543068003249
Torrance, H. (2007). Assessment as learning? How the use of explicit learning objectives, assessment criteria and feedback in post-secondary education and training can come to dominate learning. Assessment in Education: Principles, Policy & Practice, 14(3), 281-294. http://dx.doi.org/10.1080/09695940701591867
Willey, K. & Freeman, M. (2006a). Completing the learning cycle: The role of formative feedback when using self and peer assessment to improve teamwork and engagement. In Proceedings Australasian Association for Engineering Education Conference - Creativity, challenge, change: Partnerships in engineering education. http://hdl.handle.net/10453/7495
Willey, K. & Freeman, M. (2006b). Improving teamwork and engagement: The case for self and peer assessment. Australasian Journal of Engineering Education. http://www.aaee.com.au/journal/2006/willey0106.pdf
Willey, K. & Gardner, A. (2010). Investigating the capacity of self and peer assessment activities to engage students and promote learning. European Journal of Engineering Education, 35(4), 429-443. http://dx.doi.org/10.1080/03043797.2010.490577
Willey, K. & Gardner, A. (2012). Collaborative learning frameworks to promote a positive learning culture. Paper presented at the 2012 Frontiers in Education Conference - Soaring to New Heights in Engineering Education, Seattle, Washington.
|Please cite as: Hancock, P., Dale-Jones, G. & Willey, K. (2013). Impact of collaborative peer and self assessment on students' judgment and written communication. In Design, develop, evaluate: The core of the learning environment. Proceedings of the 22nd Annual Teaching Learning Forum, 7-8 February 2013. Perth: Murdoch University. http://ctl.curtin.edu.au/professional_development/conferences/tlf/tlf2013/refereed/hancock.html|
Copyright 2013 Phil Hancock, Gillian Dale-Jones and Keith Willey. The authors assign to the TL Forum and not for profit educational institutions a non-exclusive licence to reproduce this article for personal use or for institutional teaching and learning purposes, in any format, provided that the article is used and cited in accordance with the usual academic conventions.