A number of research projects addressing the challenges posed by existing and new conceptions of assessment are underway at CARPE. Details of each project are given below. Through its research programme, CARPE contributes both to critiques of policy and to policy making in all aspects of assessment.
Assessment of Critical Thinking in Dublin City University (ACT@DCU) Pilot Project
Project Director: Michael O'Leary (CARPE)
ACT@DCU is a pilot study investigating whether an online test, developed by the Educational Testing Service (ETS) in the United States to assess critical thinking (CT) in higher education, is suitable for use in DCU. The study has two major parts:
(1) Data from the administration of the test to 225 First Year students in 2017 and 225 Final Year students in 2018 are being used to validate the test in a non-US context, i.e. the DCU item statistics and factor structure will be compared to the international item statistics and factor structure. This will also provide comparative data between DCU and other institutions internationally.
(2) The study will also seek to determine whether the test can measure growth in CT from First Year to Final Year in DCU. In the pilot, this will be done by comparing current First Years with current Fourth Years. Because two different groups are being measured, Leaving Certificate points will be used as a control variable. Provided that the psychometric properties of the test hold up in the DCU context, the intention is to test the First Year cohort again in 2021, when they are in their final year.
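The logic of using Leaving Certificate points as a control variable can be illustrated as a regression in which both year group and prior attainment predict CT scores. The sketch below assumes a simple linear (ANCOVA-style) model fitted by ordinary least squares; the function name `adjusted_group_difference` and the simulated data are hypothetical, and the project's actual analysis may differ.

```python
import numpy as np

def adjusted_group_difference(scores, is_final_year, lc_points):
    """Covariate-adjusted difference in CT scores between Final Year (1)
    and First Year (0) students, controlling for Leaving Certificate points.

    Assumed linear model (ANCOVA-style), for illustration only:
        score = b0 + b1 * is_final_year + b2 * lc_points + error
    Returns the ordinary least squares estimate of b1.
    """
    y = np.asarray(scores, dtype=float)
    group = np.asarray(is_final_year, dtype=float)
    points = np.asarray(lc_points, dtype=float)
    # Design matrix: intercept, group indicator, covariate
    X = np.column_stack([np.ones_like(group), group, points])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

# Hypothetical simulated data: 225 students per year group
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 225)
points = rng.uniform(300, 600, size=group.size)
scores = 100 + 5 * group + 0.1 * points + rng.normal(0, 3, size=group.size)
print(round(adjusted_group_difference(scores, group, points), 2))
```

Because the simulated data are generated with a true adjusted difference of 5 points, the printed estimate should lie close to 5. A cross-sectional comparison of this kind only approximates growth; the planned 2021 retest of the same cohort would provide a longitudinal check.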
Over time, we hope that data from the test will help to facilitate conversations among staff regarding pedagogy, curricula and educational interventions to improve the teaching and learning of CT; be integrated with other non-cognitive and co-curricular indicators of student success at DCU (e.g. Loop Reflect); and provide evidence of institutional and programme-level learning outcomes in CT.
Standardised Assessment in Reading and Mathematics Project
Project Directors: Michael O’Leary (CARPE) & Deirbhile Nic Craith (INTO)
Since the publication of the Assessment Guidelines for Primary Schools in 2007, there has been a stronger focus on assessment in schools. There are many forms of assessment, of which standardised testing is one. Standardised tests have gained in importance since 2012, when schools became obliged to forward the results of standardised tests to the Department of Education and Science.
The purpose of this research is to explore the use of standardised tests in literacy and numeracy in primary schools in the Republic of Ireland. Issues addressed include teachers’ understanding of standardised tests, how standardised tests are used formatively and diagnostically, and the experiences of schools in reporting on the results of standardised tests. Data on teachers' professional development needs with respect to standardised testing have also been gathered. The findings of the project will inform both policy and practice regarding standardised testing in Irish primary schools.
Following a year-long development and piloting process, a questionnaire was distributed in hard copy and online to a random sample of 5,000 teachers in May 2017. A report on the first set of findings from the survey is due to be published in mid-2018.
The Leaving Certificate as Preparation for Third Level Education Project
Project Directors: Michael O'Leary & Darina Scully (CARPE)
The Leaving Certificate Examination (LCE) plays a crucial role in selecting students for third-level education. However, the extent to which the Leaving Certificate Programme (LCP) as a whole (i.e. 5th and 6th year plus the examination) prepares students well for third-level education is unclear. This project seeks to shed some light on this issue.
For those who sat the LCE in 2017, their experiences of 5th and 6th year, and of preparing for and taking the LCE, are still fresh in their minds. These students also have a good understanding of what is required of them in college. With this in mind, the project will gather data from first year students at DCU, who are in a position to offer important insights that can be used to evaluate the LCP and its relevance to first year in college.
A questionnaire is currently being distributed in both hard copy and online formats to a large sample of first year undergraduate students. Preliminary findings from the study are due to be published later in 2018.
Assessment for Learning and Teaching (ALT) Project
Project Directors: Zita Lysaght and Michael O'Leary (CARPE)
The Assessment for Learning and Teaching (ALT) project has its roots in assessment challenges identified in research conducted in the Irish context. This research highlighted: (a) the dearth of assessment instruments, nationally and internationally, capable of capturing changes in children’s learning arising from exposure to, and engagement with, Assessment for Learning (AfL) pedagogy; (b) the nature and extent of the professional challenges that teachers face when trying to implement AfL with fidelity; and (c) the urgent need for a programme of continuous professional development designed to support teachers, at scale, to learn about AfL and integrate it into their day-to-day practice.
Since the initiation of the ALT project, significant progress has been made in all three areas. The Assessment for Learning Audit instrument (AfLAi) has been used across a range of Irish primary schools and in educational systems in Australia, Norway, Malaysia, Chile and South Africa. Work is currently underway to adapt the AfLAi for use in secondary schools and by students in both primary and secondary settings. The research-focused Assessment for Learning Measurement instrument (AfLMi), first developed in 2013, is being updated with data from almost 600 Irish primary teachers. Programmes of professional development continue to be implemented in pre-service undergraduate teacher education, in postgraduate teacher education and as part of site-based in-service teacher education.
Assessment of Teachers' Tacit Knowledge Project
Project Directors: Steven Stemler (Wesleyan University), Darina Scully, Anastasios Karakolidis, Vasiliki Pitsia, Michael O’Leary (CARPE) & Julian Elliott (Durham University)
Effective teachers are characterised not only by pedagogical abilities and subject-area mastery, but also by interpersonal skills. This 'social dimension' of teaching, however, is typically under-represented in teacher education programmes, and the elements of skilled interpersonal behaviour in the teaching profession are difficult to define and communicate. This project involved the collection of data on this issue from experienced teachers in Ireland, England and Russia. Specifically, teachers' responses to the Tacit Knowledge Inventory (TKI-HS), a situational judgement test consisting of 11 challenging interpersonal scenarios, were investigated. Preliminary findings outlining cross-cultural differences were presented at the AERA annual meeting in April 2018. Further data analysis is now underway, and more in-depth findings pertaining specifically to the Irish sample will be published in late 2018.
Animations for Large Scale Testing Programmes Project
Project Director: Anastasios Karakolidis (PhD Candidate); Project Supervisors: Michael O’Leary and Darina Scully
Although technology provides a wide range of opportunities for facilitating assessment, text is usually the main, if not the only, means used to explain the context, present the information and communicate the question in a test. Written language is often a good fit for measuring simple knowledge-based constructs that can be clearly communicated via text (such as historical events); however, when assessments need to present test takers with a large amount of complex information in order to measure complex constructs, text may not be the most suitable medium (Popp, Tuzinski, & Fetzer, 2016). Animations could offer an innovative way of presenting complex information that cannot easily be communicated through written language. However, research on the use of animations in assessment is currently scarce.
This PhD project is focused on (a) the development and validation of an animation-based assessment instrument, (b) the investigation of test-takers’ views about this instrument and (c) the examination of the extent to which this animated test provides a more valid assessment of test-takers’ knowledge, skills and abilities, compared to a parallel text-based test.
Multimedia Items in Technology-Based Assessments Project
Project Director: Paula Lehane (PhD Candidate); Project Supervisors: Michael O'Leary, Mark Brown, Darina Scully
Using digital devices and technology to conduct assessments in educational settings has become increasingly prevalent in recent years. Indeed, it now seems inevitable that future assessments in education will be administered using these media (OECD, 2013). It is therefore essential that educational researchers know how to design reliable and appropriate technology-based assessments (TBAs). However, no established guidelines for the design of TBAs exist. Although TBAs can include many item types unique to the medium, including multimedia objects such as animations and videos, the impact of these items on test-taker performance and behaviour, particularly in relation to attentional allocation and information processing, has yet to be fully clarified.
This PhD project aims to contribute to this growing field of research by addressing the following research questions:
- How do test-takers allocate attention in TBAs that include multimedia items?
- What is the impact of multimedia items on test-taker performance in TBAs?
- Is there a difference in test-taker performance and attentional allocation behaviours in TBAs involving different types of multimedia items?
- What are the meaningful relationships, patterns and clusters in performance data that can be used to assess and score problem-solving skills in TBAs?
Competency-Based Assessment (CBA) Project
Project Directors: Darina Scully (CARPE), Kenneth Ridgley (Prometric), Mark Raymond (U.S. National Board of Medical Examiners)
In recent years, the terms ‘competency’ and ‘competencies’ have gradually infiltrated the discourse surrounding assessment of all types. One of the biggest challenges facing the competency paradigm, however, is the plethora of different interpretations of these terms within the literature (Shippmann et al., 2000). As part of this project, CARPE conducted a review of the literature, unpacking ‘competencies’ and how they differ from traditional KSA (knowledge, skills and abilities) statements. Among the issues considered in this review are:
- whether or not competencies can be considered to be directly observable
- whether competencies encompass personal attributes in addition to skills and abilities
- whether competency statements describe superior or effective performance/achievement
- whether competencies are best conceived as ‘atomistic’ or ‘holistic’ in nature
The findings of the review were presented at the Association of Test Publishers (ATP) conference in February 2018. They were received with interest, and a follow-up paper in collaboration with Mark Raymond, Research Director of the National Board of Medical Examiners, is now underway. This paper will explore additional topics, such as the use of subject-matter expert (SME) judgements in conjunction with multidimensional scaling (MDS) and clustering algorithms to help define high-level competencies.
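To illustrate how SME judgements might feed into MDS and clustering, here is a minimal sketch assuming that SMEs have rated the dissimilarity of pairs of task or KSA statements and that these ratings have been averaged into a symmetric matrix. It uses classical (Torgerson) MDS followed by k-means; the choice of these particular algorithms, the function names and the toy matrix are all assumptions for illustration, not the method used in the follow-up paper.

```python
import numpy as np

def classical_mds(dissim, n_dims=2):
    """Classical (Torgerson) MDS: find coordinates whose Euclidean distances
    approximate a symmetric dissimilarity matrix."""
    d_sq = np.asarray(dissim, dtype=float) ** 2
    n = d_sq.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ d_sq @ centring           # double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(gram)             # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_dims]            # keep the largest n_dims
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))   # ignore negative eigenvalues
    return eigvecs[:, top] * scale

def kmeans(points, k, n_iter=100):
    """Plain k-means with deterministic farthest-first initialisation."""
    centres = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centres], axis=0)
        centres.append(points[dist.argmax()])
    centres = np.array(centres)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels

# Hypothetical mean SME dissimilarity ratings for six task statements:
# tasks 0-2 are judged similar to each other, as are tasks 3-5.
dissim = np.full((6, 6), 10.0)
dissim[:3, :3] = 1.0
dissim[3:, 3:] = 1.0
np.fill_diagonal(dissim, 0.0)

coords = classical_mds(dissim)
print(kmeans(coords, k=2))  # cluster label for each task statement
```

In this toy example the two groups of statements fall into separate clusters, each of which could be read as a candidate higher-level competency. With real SME ratings, the number of dimensions and clusters would need to be chosen and justified empirically.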
Assessment of Bullying in the Workplace Project
Project Directors: James O’Higgins Norman (Anti Bullying Centre), Michael O’Leary (CARPE), Larry Ludlow (Boston College) & Sebastian Montcaleano (Boston College)
Bullying research has attracted substantial interest in recent years because of the impact of bullying on the emotional and social development of children, adolescents and adults. Assessment measures have generally focused on school bullying and interactions between peers. The most widely used assessment is the Olweus Bullying Questionnaire (OBQ), which characterises peer bullying behaviour as involving at least one of the following: physical harassment, verbal abuse, relational or exclusion bullying, and cyberbullying. This tool and others like it are valuable for research but still have limitations when it comes to providing absolute measures of bullying behaviour.
A recent meta-analysis conducted by the Anti-Bullying Research Centre, DCU showed that a range of methodological issues influenced the rates of bullying reported in studies across Ireland, even when the same assessment scale was used. These included whether or not a definition of bullying was provided, the timeframe participants were asked to refer to (e.g. ranging from ‘ever’ to ‘one month ago’) and even how answers were categorised (e.g. ‘frequent’ to ‘occasional’). While the OBQ has been validated in several large-scale and international studies among school children, there is no equivalent for adult or workplace bullying. This research will draw on a review of the literature on current approaches to assessing workplace bullying in order to develop a Rasch measurement scale, using scenarios/vignettes, that can be trialled with a small sample of Irish adults.
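A Rasch scale places respondents and items on a single logit scale. As a minimal sketch, assuming dichotomous (yes/no) responses to each vignette, the dichotomous Rasch model gives the probability of a 'yes' as a function of the difference between a person's location θ and a vignette's location b. The vignette locations and function names below are invented for illustration only.

```python
import math

def rasch_probability(theta, b):
    """Dichotomous Rasch model: P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b)).

    theta: respondent location (e.g. level of exposure to workplace bullying)
    b:     item location (e.g. severity of the behaviour described in a vignette)
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_raw_score(theta, item_locations):
    """Expected number of 'yes' responses across all vignettes."""
    return sum(rasch_probability(theta, b) for b in item_locations)

# Hypothetical vignette locations (logits), ordered from mild to severe
vignettes = [-1.5, 0.0, 1.2, 2.5]
for theta in (-1.0, 0.0, 1.0):
    probs = [round(rasch_probability(theta, b), 2) for b in vignettes]
    print(theta, probs, round(expected_raw_score(theta, vignettes), 2))
```

Because a respondent's raw score is a sufficient statistic for θ under the Rasch model, a well-fitting scale allows both vignettes and respondents to be ordered on one continuum.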
Assessment of Learning about Well-Being Project
Project Directors: Catherine Maunsell (Institute of Education), Michael O’Leary (CARPE), Larry Ludlow (Boston College) & Gulsah Gurkan (Boston College)
The significance of the wellbeing of the child or young person for developmental and educational outcomes is unequivocal. The objective measurement of wellbeing is a relatively recent and growing academic pursuit, and this study seeks to examine a heretofore understudied area, namely the potential use of scenarios/vignettes to objectively measure young people’s experience of wellbeing as a consequence of their engagement with efforts to enhance it within second-level schooling.
The impetus for this study can be found in recent curricular reforms within the Irish second-level (high-school) education context. Of particular relevance to this research is the new subject area of Wellbeing within the reformed Junior Cycle programme for students aged 12-15 years. Following stakeholder consultation, the National Council for Curriculum and Assessment (NCCA) published the Guidelines for Wellbeing in Junior Cycle 2017. In highlighting a wide variety of assessment approaches, such as projects, presentations, and self- and peer assessment, the guidelines point towards more class-based assessment of students’ learning. As a consequence, the development of objective assessment tools to support students’ and teachers’ judgements warrants serious academic attention.
State-of-the-art in Digital Technology-Based Assessment Project
Project Directors: Michael O'Leary, Darina Scully, Anastasios Karakolidis & Vasiliki Pitsia
Following an invitation to contribute to a special issue of the European Journal of Education, a peer-reviewed journal covering a broad spectrum of topics in education, CARPE completed an article on the state of the art in digital technology-based assessment. The article spans advances in the automated scoring of constructed responses, the assessment of complex 21st century skills in large-scale assessments, and innovations involving high-fidelity virtual reality simulations. An "early view" of the article was published online in April 2018, with the special issue (focused on the extent to which assessments are fit for their intended purposes) due to be published in June 2018.
Learning Portfolios in Higher Education Project
Project Directors: Darina Scully, Michael O'Leary (CARPE) & Mark Brown (NIDL)
The ePortfolio is often lauded as a powerful pedagogical tool and, consequently, is rapidly becoming a central feature of contemporary education. Learning portfolios are a specific type of ePortfolio that may also include drafts and 'unpolished' work, with the focus on both the process of compiling the portfolio and the finished product. It has been hypothesised that learning portfolios may be especially suited to the development and assessment of integrated, cross-curricular knowledge and generic skills/attributes (e.g. critical thinking, creativity, communication, emotional intelligence), as opposed to disciplinary knowledge in individual subject areas. This is of particular interest in higher education, as universities and other third-level institutions face growing demands to bridge a perceived gap between what students learn and what is valued by employers.
In conjunction with the NIDL, CARPE have completed a comprehensive review examining the state of the field regarding learning portfolio use in third level education. Specifically, this review (i) evaluates the extent to which there is sufficient empirical support for the effectiveness of these tools, (ii) highlights potential challenges associated with their implementation on a university-wide basis and (iii) offers a series of recommendations with respect to ‘future-proofing’ the practice.
The review was formally launched in February 2018 and has garnered a great deal of attention in the intervening months. A roundtable to discuss possible research opportunities within DCU arising from the findings is due to be held in May 2018. In addition, selected findings will be disseminated at various international conferences, including EdMedia in June 2018 (Amsterdam, Netherlands) and the World Education Research Association (WERA) in August 2018 (Cape Town, South Africa). The review is also being adapted and translated into Chinese by Prof. Junhong Xiao of Shantou Radio and Television University, with the translated article to feature in an upcoming edition of the peer-reviewed journal Distance Education in China. CARPE have also recently acquired funding to support an additional translation into Spanish.
Situational Judgement Tests (SJTs) Project
Project Directors: Michael O'Leary, Darina Scully, Anastasios Karakolidis (CARPE) & Steve Williams (Prometric)
Originating in, and most commonly associated with, personnel selection, Situational Judgement Tests (SJTs) can be loosely defined as assessment instruments composed of items that (i) present a job-related situation and (ii) require respondents to select an appropriate behavioural response to that situation. Traditionally, SJTs are assumed to measure tacit, as opposed to declarative, knowledge; or, as Wagner and Sternberg (1985) put it: “intelligent performance in real-world pursuits… a kind of ‘street smarts’ that helps people cope successfully with problems, constraints and realities of day-to-day life.” Debate about the precise nature of the construct(s) underlying SJTs persists.
In recent years, the use of SJTs for selection, training and development purposes has increased rapidly; however, these instruments are still not well understood. Experts continue to debate issues such as how SJTs should be developed and how they should be scored. For example, although it is common to score SJTs based on test-takers' ability to identify the best response to each situation, it has been argued (e.g. Stemler, Aggarwal & Nithyanand, 2016) that it may be more appropriate to distinguish between test-takers based on their ability to avoid the worst option.
In collaboration with our funders, Prometric, this project investigated the use of an SJT designed using the 'critical incident approach' for the training and development of Prometric employees. Specifically, the project sought to explore validity evidence for the SJT as a measure of successful job performance across two different keying approaches (consensus vs. expert judgement) and five different scoring approaches (match best, match worst, match total, mismatch penalty and avoid total). The findings suggest that scoring approaches focused on the ability to identify the worst response are associated with moderate criterion-related validity. Furthermore, they underline the psychometric difficulties associated with critical incident SJTs. These findings were presented at the European Association of Test Publishers (E-ATP) conference in September 2017 (Noordwijk, Netherlands).
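The difference between keying and scoring approaches can be made concrete with a small sketch. It assumes that each SJT item asks for both a best and a worst option, treats consensus keying as taking the options most often chosen by a reference group, and computes only match best, match worst and a match total assumed here to be their sum. The mismatch penalty and avoid total approaches are not shown, because their exact definitions are not given above; all function names are hypothetical.

```python
from collections import Counter

def consensus_key(best_choices, worst_choices):
    """Assumed, simplified consensus keying: the keyed best and worst options
    are those most often chosen by a reference group (ties broken by the
    order in which options first appear)."""
    best = Counter(best_choices).most_common(1)[0][0]
    worst = Counter(worst_choices).most_common(1)[0][0]
    return best, worst

def score_sjt(responses, keys):
    """responses: list of (chosen_best, chosen_worst) tuples, one per item.
    keys:      list of (keyed_best, keyed_worst) tuples, one per item.
    Returns match-best, match-worst and their sum (assumed 'match total')."""
    match_best = sum(r[0] == k[0] for r, k in zip(responses, keys))
    match_worst = sum(r[1] == k[1] for r, k in zip(responses, keys))
    return {"match_best": match_best,
            "match_worst": match_worst,
            "match_total": match_best + match_worst}

# Hypothetical reference-group choices for one item
print(consensus_key(["A", "A", "B"], ["D", "C", "D"]))  # → ('A', 'D')

# Hypothetical two-item test
keys = [("A", "D"), ("B", "C")]
responses = [("A", "C"), ("B", "C")]
print(score_sjt(responses, keys))
```

Swapping an expert-judgement key for the consensus key only changes the `keys` list, which is what makes it straightforward to compare validity evidence across keying and scoring combinations.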
Three vs. Four Option Multiple-Choice Items Project
Project Directors: Darina Scully, Michael O’Leary (CARPE) & Linda Waters (Prometric)
A strong body of research spanning more than 30 years suggests that the optimal number of response options for a multiple-choice item is three (one key and two distractors). Three-option multiple-choice items require considerably less time to construct and to administer than their four- or five-option counterparts. Furthermore, the time saved allows additional items to be included, which in turn supports broader content coverage and greater reliability. Curiously, however, the overwhelming majority of test developers have paid little heed to these factors. Indeed, it is estimated that fewer than 1% of contemporary high-stakes assessments contain three-option items (Edwards, Arthur & Bruce, 2012).
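The reliability gain from adding items can be quantified with the Spearman-Brown prophecy formula, ρ' = kρ / (1 + (k − 1)ρ), where k is the ratio of new to old test length. This is a minimal sketch assuming the added items are parallel in quality to the existing ones; the 40-item test, its reliability and the 10 extra items are hypothetical figures, not results from this project.

```python
def spearman_brown(reliability, length_factor):
    """Spearman-Brown prophecy formula.

    reliability:   reliability of the current test (e.g. coefficient alpha)
    length_factor: ratio of new to old test length (1.25 = 25% more items),
                   assuming the added items are parallel to the existing ones
    """
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)

# Hypothetical: 40 four-option items with reliability 0.80; the time saved by
# moving to three options is assumed to allow 10 extra items (50 in total).
print(round(spearman_brown(0.80, 50 / 40), 3))  # → 0.833
```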
This phenomenon has often been commented on, but never satisfactorily explained. It is likely that fears of guessing have played a role, given that the theoretical probability of selecting the correct response by chance rises from 20% (five options) or 25% (four options) to 33% when the number of response options is reduced to three. However, distractor analyses across various contemporary high-stakes assessments reveal that more than 90% of four- and five-option items have at least one non-functioning distractor. That is, when test-takers need to guess, they usually do not do so blindly; rather, they eliminate at least one implausible distractor and guess from the remaining options. As such, the majority of four- and five-option items effectively operate as three-option items.
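The arithmetic of blind guessing, and a basic distractor analysis, can be sketched as follows. A distractor is flagged here as non-functioning if fewer than 5% of test-takers choose it; that cut-off is a commonly used convention assumed for illustration, and the response data and function names are hypothetical.

```python
from collections import Counter

def chance_of_correct_guess(n_options):
    """Probability of answering correctly by blind guessing."""
    return 1 / n_options

def non_functioning_distractors(responses, options, key, threshold=0.05):
    """Return distractors chosen by fewer than `threshold` of test-takers.
    The 5% default is an assumed convention, not a fixed standard."""
    counts = Counter(responses)
    n = len(responses)
    return [opt for opt in options if opt != key and counts[opt] / n < threshold]

for k in (5, 4, 3):
    print(k, "options:", round(chance_of_correct_guess(k), 2))

# Hypothetical responses from 100 test-takers to a four-option item keyed "A"
responses = ["A"] * 60 + ["B"] * 30 + ["C"] * 8 + ["D"] * 2
print(non_functioning_distractors(responses, ["A", "B", "C", "D"], key="A"))  # → ['D']
```

In this example, a test-taker who rules out the implausible option D and guesses among the remaining three has a 1/3 chance of success, the same as on a three-option item.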
In collaboration with our funders, Prometric, CARPE conducted a study comparing item performance indices and distractor functioning (based on responses from more than 1,000 test candidates) across 20 stem-equivalent three- and four-option items from a high-stakes certification assessment. Findings from the project were disseminated at the Association of Test Publishers (ATP) Conference in March 2017 (Scottsdale, Arizona) and are being used to inform the development of future items for a number of Prometric's examinations.
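Two classical item performance indices that a comparison of this kind might draw on are item difficulty (proportion correct) and discrimination (the corrected point-biserial correlation between an item and the total score on the remaining items). The sketch below assumes dichotomously scored data; the function name and the tiny response matrix are hypothetical, and the indices actually used in the study may differ.

```python
import numpy as np

def item_statistics(scored):
    """Classical item analysis for dichotomously scored items.

    scored: 2-D array-like (test-takers x items) of 0/1 scores.
    Returns (difficulty, discrimination):
      difficulty     = proportion of test-takers answering each item correctly
      discrimination = corrected point-biserial correlation between each item
                       and the total score on the remaining items
    Items answered correctly by everyone (or no one) yield NaN discrimination.
    """
    scored = np.asarray(scored, dtype=float)
    difficulty = scored.mean(axis=0)
    total = scored.sum(axis=1)
    discrimination = []
    for j in range(scored.shape[1]):
        rest = total - scored[:, j]  # exclude the item from its own criterion
        discrimination.append(np.corrcoef(scored[:, j], rest)[0, 1])
    return difficulty, np.array(discrimination)

# Hypothetical scored responses: 4 test-takers x 3 items
p, r = item_statistics([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])
print(p, np.round(r, 2))
```

Running the same function separately on the three-option and four-option versions of each stem-equivalent item gives a simple side-by-side comparison.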
Higher-Order Thinking in Multiple-Choice Items (HOT MC Items) Project
Project Directors: Darina Scully & Michael O'Leary (CARPE)
The nature of assessment can exert a powerful influence on students’ learning behaviours. Indeed, students who experience assessments that require them to engage in higher-order thinking processes (i.e. those represented by higher levels of Bloom’s (1956) Taxonomy, such as application, analysis and synthesis) are more likely to adopt meaningful, holistic approaches to future study, as opposed to engaging in mere surface-level or ‘rote-learning’ techniques (Leung, Mok & Wong, 2008). It is often assumed that multiple-choice (MC) items are incapable of assessing higher-order thinking, or indeed anything beyond recall/recognition, given that the correct answer is provided amongst the response options. However, a more accurate assertion may be that MC items measuring higher-order processes are simply rarely constructed. MC items, like all assessment formats, have some limitations, but it may be possible to construct them at higher levels, provided certain strategies are followed. Because MC items remain attractive to, and frequently used by, educators and test developers owing to their objective and cost-efficient nature, it is worthwhile investing time and effort in identifying and disseminating these strategies within the assessment community.
This project involved a comprehensive review of the extant literature that (a) has investigated the capacity of multiple-choice items to measure higher-order thinking or (b) has offered strategies or guidance on how to do so. An article based on this review was published in the peer-reviewed journal Practical Assessment, Research and Evaluation in May 2017, and the work has also contributed to the development of training and development materials for Prometric's test developers.
Practice Tests in Large Scale Testing Programmes Project
Project Directors: Anastasios Karakolidis, Darina Scully & Michael O’Leary (CARPE)
This project was focused on developing a research brief reviewing the key findings arising from the literature regarding the efficacy of practice tests. This brief was published in the summer 2017 edition of Clear Exam Review, and the findings are also being used to inform Prometric's practices surrounding the development and provision of practice test materials.
Feedback in Large Scale Testing Programmes Project
Project Directors: Michael O'Leary & Darina Scully (CARPE)
In recent years there has been increasing pressure on test developers to provide diagnostic information that can help unsuccessful test takers improve their future performance, and help academic and training institutions to evaluate the success of their programmes and identify areas that may need to be modified (Haberman & Sinharay, 2010; Haladyna & Kramer, 2004). This growing demand for diagnostic feedback is also evident in the Standards for Educational and Psychological Testing, which states that “candidates who fail may profit from information about the areas in which their performance was especially weak” (AERA, APA & NCME, 2014, p. 176). Test developers face a substantial challenge in attempting to meet this demand while also upholding their ethical responsibility, also outlined in the Standards, to ensure that any test data reported to stakeholders, or used to make educational, certification or licensure decisions, are accurate, reliable and valid.
CARPE have conducted a review of the literature on the issues involved in reporting test sub-scores, including the identification of a number of approaches (e.g. scale anchoring, level descriptors and graphical methods) that can be taken when reporting in large-scale testing contexts. These findings are being used to inform Prometric's practices surrounding the provision of feedback to unsuccessful test candidates.
Partial Credit for Multiple Choice Items Project
Project Directors: Darina Scully & Michael O’Leary (CARPE)
Multiple-choice test developers have typically shown a strong preference for the use of the single-best answer response format and number-correct scoring. Despite this, some measurement experts have expressed dissatisfaction with these methods, on the basis that they assume a sharp dichotomy between knowledge and lack of knowledge. That is, the entire model fails to take into account the varying degrees of partial knowledge a test-taker may possess on an item-by-item basis. This is regrettable, as information regarding test-takers’ partial knowledge levels may contribute significantly to the estimation of true proficiency levels (DeAyala, 1992).
In response to this criticism, a number of alternative testing models that facilitate the allocation of partial credit have been proposed (e.g. Ben-Simon, Budescu & Nevo, 1997; Frary, 1989; Lau, Lau, Hong & Usop, 2011). Their exact nature varies considerably, but all share the aim of maximising the information efficiency of individual items and increasing the precision of measurement. CARPE have conducted a literature review focusing on three approaches that facilitate the allocation of partial credit, namely option-weighted scoring, confidence-weighted responding and the liberal multiple-choice item format. To date, findings regarding the application of these approaches have been complex and equivocal, with no one method emerging as uniformly superior. Ultimately, whether or not it is worth pursuing these strategies depends on a combination of factors, such as the overall purpose of the assessment, the overall difficulty (pass rate) of the test, the cognitive complexity of the items, and the particular psychometric properties that are most valued by the test developer.
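Two of these approaches can be contrasted with number-correct scoring in a short sketch. Option-weighted scoring assigns each option its own credit, while the liberal format lets test-takers select more than one option. The weights and the particular liberal-format rule below (full credit for the key alone, reduced linearly for each extra option selected, zero if the key is omitted) are hypothetical choices for illustration; published liberal-format rules differ, and confidence-weighted responding is not shown.

```python
def option_weighted_score(choices, weights):
    """Option-weighted scoring: each option carries its own credit.

    choices: the option chosen on each item, e.g. ["B", "B"]
    weights: one dict per item mapping option -> credit (hypothetical values)
    """
    return sum(item_weights[choice] for choice, item_weights in zip(choices, weights))

def liberal_mc_score(selected, key, n_options):
    """One possible liberal multiple-choice rule (assumed for illustration):
    full credit for selecting only the key, credit reduced linearly for each
    additional option selected, and zero if the key is not selected."""
    if key not in selected:
        return 0.0
    extra = len(selected) - 1
    return 1.0 - extra / (n_options - 1)

# Hypothetical weights: option B on item 1 earns half credit as a 'nearly right' answer
weights = [{"A": 1.0, "B": 0.5, "C": 0.0, "D": 0.0},
           {"A": 0.0, "B": 1.0, "C": 0.25, "D": 0.0}]
print(option_weighted_score(["B", "B"], weights))  # → 1.5
print(liberal_mc_score({"A", "C"}, key="A", n_options=4))
```

Under number-correct scoring the same responses would earn 1 point on the first test; the weighted score of 1.5 credits the partial knowledge shown on item 1. Under the assumed liberal rule, selecting every option earns zero, so indiscriminate hedging is not rewarded.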