
Centre for Assessment Research, Policy and Practice in Education (CARPE)

Our Research

A number of research projects addressing many of the challenges posed by existing and new conceptions of assessment are underway at CARPE. Details of each project can be accessed below. Through its research programme, CARPE contributes to critiques of policy and to policy making pertaining to all aspects of assessment.

Current Research Projects

Assessment of Critical Thinking in Dublin City University (ACT@DCU) Pilot Project
Project Director: Michael O'Leary (CARPE)

ACT@DCU is a pilot study investigating the extent to which an online test developed by the Educational Testing Service (ETS) in the United States to assess critical thinking in higher education is suitable for use in DCU. The study has two major parts:

(1) Data from the administration of the test to 225 First Year students in 2017  and 225 Final Year students in 2018 are being used to validate the test in a non-US context, i.e. the DCU item statistics and factor structure will be compared to the international item statistics and factor structure. This will also provide comparative data between DCU and other institutions internationally.

(2) The study will also seek to determine whether the test can measure growth in CT from First Year to Final Year in DCU. In the pilot, this will be achieved with current First Years and Fourth Years. Leaving Certificate points will be used as a control variable to account for the fact that two different groups are being measured. The intention is that, if the psychometric properties of the test hold up in the DCU context, the First Year cohort will be tested again in 2021, when they are in their final year.
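This kind of cross-sectional comparison with a prior-attainment covariate can be illustrated with a minimal sketch. Note that the data file and the column names (ct_score, cohort, lc_points) are hypothetical assumptions for illustration, not the project's actual analysis.

    # Minimal sketch: comparing CT scores across cohorts while controlling for
    # prior attainment (Leaving Certificate points). Illustrative only; the data
    # file and column names are hypothetical, not the project's actual analysis.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("act_dcu_pilot.csv")  # hypothetical data file

    # Cohort (First Year vs Final Year) is a categorical predictor; Leaving
    # Certificate points act as the covariate for pre-existing group differences.
    model = smf.ols("ct_score ~ C(cohort) + lc_points", data=df).fit()
    print(model.summary())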

Over time we hope that data from the test will help to facilitate conversations among staff regarding pedagogy, curricula and educational interventions to improve teaching and learning of CT; be integrated with other non-cognitive and co-curricular indicators of student success at DCU (e.g. Loop Reflect); and provide evidence of institutional and program-level learning outcomes in CT.
_________________________________________________________________________________________________________

Computer Based Examinations for Leaving Certificate Computer Science
Project Director: Paula Lehane (CARPE) with the National Council for Curriculum and Assessment (NCCA)

In line with the recommendations of the Digital Strategy for Schools (Department of Education and Skills [DES], 2015), a more formal approach to the study of technology and computing in second-level schools has been established through the newly developed Computer Science (CS) curriculum for Leaving Certificate students. In September 2018, forty schools were selected to trial the implementation of this subject, which will culminate in an ‘end-of-course computer-based examination’ in 2020 (NCCA). This examination will represent 70% of a student’s overall CS grade.

The use of a computer-based exam (CBE) for the assessment of CS students is a significant departure from tradition for the Leaving Certificate programme. All other subjects in the Leaving Certificate involving an end-of-course examination employ paper-based tests, so the planned CBE for CS will be the first of its kind in the Irish education system when it is introduced in 2020. The challenge of developing and delivering a high-stakes CBE is magnified by the inherent difficulties associated with evaluating students’ knowledge and learning in computing courses (Kallia, 2018). To ensure that the pending CS examination is delivered in a responsible manner that preserves the fairness, validity, utility and credibility of the Leaving Certificate examination system, CARPE was therefore commissioned by the NCCA to write a report outlining the factors pertaining to the design, development and deployment of this CBE that will need to be considered. The aim of this report is to guide the decisions of policy-makers and other relevant stakeholders. The report will be available online in late 2019.

________________________________________________________________________________________________________

Assessment in the re-developed Primary School Curriculum
Project Directors: Darina Scully, Zita Lysaght and Michael O'Leary (CARPE) with the National Council for Curriculum and Assessment

The National Council for Curriculum and Assessment (NCCA) is working with teachers and early childhood practitioners, school leaders, parents and children, management bodies, researchers and other stakeholders to develop a high-quality curriculum for the next 10-15 years. CARPE has submitted a paper presenting for discussion a number of issues that will be pertinent when considering the role of assessment in the redeveloped primary curriculum. This discussion paper, which highlights the importance of aligning assessment, learning and teaching in curricular reform and implementation, will be available online in late 2019.

_________________________________________________________________________________________________________

Interviews as a Selection Tool for Initial Teacher Education
Project Directors: Paula Lehane, Michael O’Leary, & Zita Lysaght (CARPE) 

Even when other factors such as student background and prior attainment are controlled for, having a ‘good’ teacher is one of the most important predictors of student success (Slater et al., 2009). Therefore, the goal of Initial Teacher Education (ITE) in Ireland should be to produce these ‘good’ teachers for employment in primary and post-primary schools. To achieve this, the admissions procedures for ITE programmes have a responsibility to select those applicants who are most suited to the profession and most likely to succeed in the required preparatory courses.

Many countries, including Ireland, now consider a range of admission criteria and selection tools when screening applicants for entry to ITE. Most Irish institutions use applicant performance in an interview as a selection tool for postgraduate ITE (Darmody & Smyth, 2016). However, research on the efficacy of interviews as a selection measure for ITE programmes is mixed. CARPE has recently conducted an in-depth literature review that aims to synthesise what research has found about the efficacy, or otherwise, of interviews as a selection mechanism for university-based postgraduate programmes of teacher education. Based on this review, recommendations for future practice and policy were formulated. This research will be available online in late 2019.

_________________________________________________________________________________________________________

Validity Evidence in Maintenance of Certification (MOC) Assessments
Project Directors: Michael O’Leary (CARPE), & Katherine Reynolds (Boston College)

In the United States, Maintenance of Certification (MOC) was created in response to public health research in the 1990s revealing “significant variations in healthcare practices” among physicians, many of which lead to preventable negative patient outcomes (Chung, Clapham, & Lalonde, 2011, p. 3). A critical component of MOC is the cognitive exam, which until recently was typically administered by its respective medical specialty board in a secure environment near the end of a 10-year cycle.

Criticism of medical specialty boards’ 10-year MOC exams has spurred the development of shorter, more frequent assessments. These assessment programmes, such as MOCA Minute or Knowledge Check-In, aim to reduce examinee burden and provide better alignment with physician practice. But how can we tell if these forms of assessment are “better” than the traditional 10-year exam? The answer is not straightforward; however, in this research a validity-based framework for addressing the question is proposed, emphasizing validity evidence with respect to content, criteria, and consequences. The paper aims to identify these evidence types and discuss strategies for accumulating this evidence. It will be available online for viewing in mid-2020.

_________________________________________________________________________________________________________

Test Specifications in Certification and Licensure Assessments
Project Directors: Michael O’Leary (CARPE), & Katherine Reynolds (Boston College) 

Specifying test content, often in the form of professional knowledge, skills and judgements (KSJs), prior to item development is fundamental to test quality in the field of certification and licensure. Alignment between test items and KSJs can serve as a critical piece of content-related validity evidence for a testing program. Alignment studies, common in high-stakes achievement testing, are less frequent in credentialing and licensure. The current research explores the application of the Webb (2006) model, a popular alignment approach in educational settings, to professional testing. The Webb model provides four indices of alignment: categorical congruence, depth of knowledge consistency, range of knowledge correspondence and balance of representation. Together, these four indices can be taken as evidence of alignment between assessment items and KSJs, providing content validity evidence for a testing program. This form of validity evidence is particularly important, given that US test developers have a legal mandate to ensure test content reflects the knowledge, skills and judgements required in a given profession. This paper, which will be available online after April 2020, discusses how a Webb alignment study might be designed and carried out in a professional testing context.
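To give a flavour of the kind of computation such a study involves, the sketch below derives two simple quantities from a hypothetical item-to-KSJ mapping: item counts per KSJ (in the spirit of categorical congruence) and a simple balance index. The data, threshold and balance formula shown are illustrative assumptions, not the published Webb criteria.

    # Illustrative sketch only: item counts per KSJ and a simple balance index.
    # The mapping, threshold and balance formula are assumptions for
    # illustration, not the published Webb (2006) criteria.
    from collections import Counter

    # Hypothetical panel judgements: each test item mapped to the KSJ it measures.
    item_to_ksj = {
        "item01": "KSJ-1", "item02": "KSJ-1", "item03": "KSJ-2",
        "item04": "KSJ-2", "item05": "KSJ-2", "item06": "KSJ-3",
    }

    counts = Counter(item_to_ksj.values())
    total_items = sum(counts.values())
    n_ksjs = len(counts)

    # Categorical congruence (simplified): does each KSJ attract a minimum
    # number of items?
    MIN_ITEMS_PER_KSJ = 2  # illustrative threshold
    for ksj, n in counts.items():
        print(f"{ksj}: {n} items, congruent = {n >= MIN_ITEMS_PER_KSJ}")

    # Balance of representation (simplified): 1 minus half the total deviation
    # from an even spread of items across the KSJs (1.0 = perfectly balanced).
    balance = 1 - sum(abs(1 / n_ksjs - n / total_items) for n in counts.values()) / 2
    print(f"Balance index: {balance:.2f}")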

_________________________________________________________________________________________________________

Assessment for Learning and Teaching (ALT) Project

Project Directors: Zita Lysaght and Michael O'Leary (CARPE)

The Assessment for Learning and Teaching (ALT) project has its roots in assessment challenges identified from research conducted in the Irish context. This research highlighted: (a) the dearth of assessment instruments nationally and internationally to capture changes in children’s learning arising from exposure to, and engagement with, AfL pedagogy; (b) the nature and extent of the professional challenges that teachers face when trying to implement AfL with fidelity; and (c) the urgent need for a programme of continuous professional development designed to support teachers, at scale, to learn about AfL and integrate it into their day-to-day practice.

Since the initiation of the ALT project, significant progress has been made in all three areas. The Assessment for Learning Audit instrument (AfLAi) has been used across a range of Irish primary schools and in educational systems in Australia, Norway, Malaysia, Chile and South Africa. Work is currently underway to adapt the AfLAi for use in secondary schools and by students in both primary and secondary settings. The research-focused Assessment for Learning Measurement instrument (AfLMi), first developed in 2013, is being updated with data from almost 600 Irish primary teachers. Programmes of professional development continue to be implemented in pre-service undergraduate teacher education, in postgraduate teacher education and as part of site-based in-service teacher education.

_________________________________________________________________________________________________________

Assessment of Transversal Skills in STEM
Project Partners: CARPE, NIDL (National Institute for Digital Learning), CASTeL (Centre for the Advancement of STEM Teaching and Learning), and representatives from education ministries in the following countries: Ireland, Austria, Cyprus, Belgium, Slovenia, Spain, Finland and Sweden

This is an ambitious DCU-led project that has recently secured €2.34 million in Erasmus+ funding. Involving eight EU countries (Ireland, Austria, Cyprus, Belgium, Slovenia, Spain, Finland and Sweden) and working with 120 schools across Europe, the partners will devise, test and scale new digital assessments for STEM education that engage and enhance students’ transversal skills, such as teamwork, communication and discipline-specific critical thinking. CARPE is currently working on providing the theoretical and operational frameworks of the research, with particular reference to digital assessment approaches. CARPE is also responsible for a review and synthesis of the research literature on formative digital assessment in STEM, with particular respect to schools. This review will highlight how students can best be scaffolded towards the development of key STEM skills, and how digital tools can capture the evidence for this and augment teaching practices to help provide constructive feedback on student progress. Updates will be posted on the CARPE website throughout the project.

_________________________________________________________________________________________________________

Minecraft in Irish Primary and Post-Primary Schools
Project Directors: Paula Lehane (CARPE), & Deirdre Butler (Institute of Education) 

Minecraft is a ‘sandbox’ video game first released to the public in 2009, where players control a virtual avatar in a Lego-like world made up of blocks that can be moved to construct buildings and used to create items and structures. It is currently the second most popular video game of all time, with more than 100,000,000 copies sold worldwide. Schools in many countries, including the United States of America and Sweden, have decided to integrate the education version of the game (MinecraftEdu) into their curricula. MinecraftEdu is a platform that allows students in schools to freely explore, imagine and create in virtual environments and collaborative worlds that have special features specifically designed for classroom use. In DCU, the Institute of Education (IoE) has a dedicated Minecraft Studio (opened in December 2018) that student teachers can use to explore how innovative virtual and physical learning spaces can transform the curriculum and engage young people with new educational environments. CARPE is currently working with the IoE to develop research projects that will investigate the possible value of Minecraft in Irish primary and post-primary settings.

_________________________________________________________________________________________________________

Assessment of Teachers' Tacit Knowledge Project

Project Directors:  Steven Stemler (Wesleyan University), Darina Scully, Anastasios Karakolidis, Vasiliki Pitsia, Michael O’Leary (CARPE) & Julian Elliott (Durham University) 

Effective teachers are characterized not only by pedagogical abilities and subject area mastery, but also by interpersonal skills.  This 'social dimension' of teaching, however, is typically under-represented in teacher education programmes, and it is difficult to define and communicate the elements of skilled interpersonal behaviour in the teaching profession.  This project involved the collection of data from experienced teachers across Ireland, England and Russia with respect to this issue.  Specifically, teachers' responses to the Tacit Knowledge Inventory (TKI-HS) -  a situational judgement test consisting of 11 challenging interpersonal scenarios - were investigated.  Preliminary findings outlining cross-cultural differences were presented at the AERA annual meeting in April 2018.  Further data analysis is now underway and more in-depth findings pertaining specifically to the Irish sample will be published in late 2018.  
_________________________________________________________________________________________________________

Multimedia Items in Technology-Based Assessments Project
Project Director: Paula Lehane (PhD Candidate); Project Supervisors: Michael O'Leary, Mark Brown, Darina Scully

Using digital devices and technology to conduct assessments in educational settings has become increasingly prevalent in recent times. Indeed, it now seems inevitable that future assessments in education will be administered using these media (OECD, 2013). It is therefore essential that educational researchers know how to design reliable and appropriate technology-based assessments (TBAs). However, no guidelines for the design of TBAs currently exist. Although TBAs can include many medium-unique item types, such as multimedia objects like animations and videos, the impact of these items on test-taker performance and behaviour, particularly in relation to attentional allocation and information processing, has yet to be fully clarified.

This PhD project aims to contribute to this growing field of research by addressing the following research questions:
- How do test-takers allocate attention in TBAs that include multimedia items? 
- What is the impact of multimedia items on test-taker performance in TBAs?
- Is there a difference in test-taker performance and attentional allocation behaviours in TBAs involving different types of multimedia items?
- What are the meaningful relationships, patterns and clusters in performance data that can be used to assess and score problem-solving skills in TBAs?   

_________________________________________________________________________________________________________

Measuring Non-Cognitive Factors
Project Directors: Mark Morgan (DCU) & Lisa Abrams (Virginia Commonwealth University)

Cognitive skills involve conscious intellectual effort, such as thinking, reasoning, or remembering. In contrast, non-cognitive skills relate to other important interpersonal or ‘soft skills’ such as motivation, integrity, persistence, resilience and interpersonal interaction. These non-cognitive factors are associated with an individual’s personality, temperament, and attitudes. Research at the international, national and school level is increasingly looking at the value of non-cognitive skills and at how education systems influence their development. Because demand for these skills will continue to change as economies and labour market needs evolve, with trends such as automation causing fundamental shifts, this is an issue that should be addressed both by researchers and by those in industry. This paper, which should be published by the end of 2019, will explore the definitions of, and current measurement difficulties associated with, non-cognitive factors. Possibilities for future research will also be identified.

_________________________________________________________________________________________________________

Competency-Based Assessment (CBA) Project
Project Directors: Darina Scully (CARPE), Katherine Reynolds (Boston College), Kenneth Ridgley (Prometric), Mark Raymond (U.S. National Board of Medical Examiners) 

In recent years, the terms ‘competency’ and ‘competencies’ have gradually infiltrated the discourse surrounding assessment of all types. One of the biggest challenges facing the competency paradigm, however, is the plethora of different interpretations of the term within the literature (Shippman et al., 2000). As part of this project, CARPE conducted a review of the literature, unpacking ‘competencies’ and how they differ from traditional KSA (knowledge, skills and abilities) statements. Among the issues considered in this review are:

  • whether or not competencies can be considered to be directly observable
  • whether competencies encompass personal attributes in addition to skills and abilities
  • whether competency statements describe superior or effective performance/achievement
  • whether competencies are best ‘atomistic’ or ‘holistic’ in nature

The findings of the review were presented at the Association of Test Publishers (ATP) conference in February 2018. They were received with interest, and a follow-up paper in collaboration with Mark Raymond, Research Director of the National Board of Medical Examiners, is now underway. This paper will explore additional topics, such as the use of subject-matter expert (SME) judgements in conjunction with multi-dimensional scaling (MDS) and clustering algorithms to help define high-level competencies.
_________________________________________________________________________________________________________
Assessment of Bullying in the Workplace Project
Project Directors:  James O’Higgins Norman (Anti Bullying Centre), Michael  O’Leary (CARPE), Angela Vitale (Anti-Bullying Centre), Larry Ludlow (Boston College)  & Sebastian Montcaleano (Boston College)

Bullying research has gained a substantial amount of interest in recent years because of its impact on the emotional and social development of children, adolescents and adults. Assessment measures have generally focused on school bullying and interactions between peers. The most widely used assessment is the Olweus Bullying Questionnaire (OBQ), which characterises peer bullying behaviour as involving at least one of the following: physical harassment, verbal abuse, relational or exclusion bullying, and cyberbullying. This tool and others of its kind are valuable to research but still pose certain issues in terms of providing absolute measures of bullying behaviour.

One recent meta-analysis conducted by the Anti-Bullying Centre, DCU, showed that a range of methodological issues influenced the rates of bullying reported in studies across Ireland, even when the same assessment scale was used. These included the use (or absence) of a definition of bullying, the timeframe participants were asked to consider (from ‘ever’ to ‘one month ago’) and even how answers were categorised (from ‘frequent’ to ‘occasional’). While the OBQ has been validated in several large-scale and international studies among school children, there is no equivalent for adult or workplace bullying. This research will draw on a literature review of current approaches to the assessment of workplace bullying to develop a Rasch measurement scale, using scenarios/vignettes, that can be trialled with a small sample of Irish adults.
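For context, a Rasch measurement scale models the probability of a given response as a function of the respondent's position on the underlying trait and the difficulty (or severity) of the scenario. In its simplest, dichotomous form the model can be written as:

    P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}

where \theta_n is the trait level of person n and b_i is the severity of scenario i. Scenario/vignette responses with more than two categories would call for a polytomous extension of this basic model.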
_________________________________________________________________________________________________________

Teacher Assessment Literacy - Scale Development Project

Project Directors:  Darina Scully, Anastasios Karakolidis, Vasiliki Pitsia, Paula Lehane, Zita Lysaght & Michael O’Leary (CARPE) 

Assessment literacy (Stiggins, 1991) has long been viewed as an important characteristic of effective teachers. It can be defined as “an individual's understandings of the fundamental assessment concepts and procedures deemed likely to influence educational decisions” (Popham, 2011, p. 267). Correct use of different assessment types and forms, accurate administration and scoring of tests, and appropriate interpretation of student performance all form part of a teacher’s assessment literacy. At present, very few objective measures of teacher assessment literacy exist. CARPE is attempting to address this gap by developing a scale to measure primary teachers’ assessment literacy in Ireland.
_________________________________________________________________________________________________________

Assessment of Learning about Well-Being Project
Project Directors: Catherine Maunsell (Institute of Education), Michael O’Leary (CARPE), Larry Ludlow (Boston College) & Gulsah Gurkan (Boston College)

The significance of the wellbeing of children and young people for developmental and educational outcomes is unequivocal. The objective measurement of wellbeing is a relatively recent and growing academic pursuit, and this particular study examines a heretofore understudied area, namely the potential use of scenarios/vignettes to objectively measure young people’s experience of well-being as a consequence of their engagement with efforts to enhance it within second-level schooling.

The impetus for this study can be found in recent curricular reforms within the Irish second-level (high-school) education context. Of particular relevance to this research is the new subject area of Wellbeing within the reformed Junior Cycle programme for students aged 12-15 years. Following stakeholder consultation, the National Council for Curriculum and Assessment (NCCA) published the Guidelines for Wellbeing in Junior Cycle in 2017. In highlighting the use of a wide variety of assessment approaches, such as projects, presentations, and self and peer assessment, the guidelines point towards more class-based assessment of students’ learning. As a consequence, the development of objective assessment tools that will aid student and teacher judgement making warrants serious academic attention.

Completed Research Projects

Standardised Assessment in Reading and Mathematics Project
Project Directors: Michael O’Leary (CARPE) & Deirbhile Nic Craith (INTO)

Since the publication of the Assessment Guidelines for Primary Schools in 2007, there has been a stronger focus on assessment in primary schools. There are many forms of assessment, of which standardised testing is one. Standardised tests have gained in importance since 2012, when schools became obliged to forward the results of standardised tests to the Department of Education and Skills.

The purpose of this research was to explore the use of standardised tests in literacy and numeracy in primary schools in Ireland (ROI). Issues addressed include teachers’ understanding of standardised tests, how standardised tests are used formatively and diagnostically, and the experiences of schools in reporting on the results of standardised tests. Data on teachers' professional development needs with respect to standardised testing were also gathered. Following a year-long development and piloting process, a questionnaire was distributed in hard copy and online to a random sample of 5,000 teachers in May 2017. Over 1,500 teachers returned completed questionnaires and the findings were released in June 2019, along with a number of policy recommendations to help address the needs and concerns of teachers regarding the use of standardised tests in primary schools. The report is available online from CARPE.

 _________________________________________________________________________________________________________

Animations for Large Scale Testing Programmes Project 
Project Director: Anastasios Karakolidis (PhD Candidate); Project Supervisors: Michael O’Leary and Darina Scully 

Although technology provides a great range of opportunities for facilitating assessment, text is usually the main, if not the only, means used to explain the context, present the information, and communicate the question in a testing process. Written language is often a good fit for measuring simple knowledge-based constructs that can be clearly communicated via text (such as historical events). Nevertheless, when assessments must present test takers with a great deal of sophisticated information in order to measure complex constructs, text may not be suitable for facilitating this process (Popp, Tuzinski, & Fetzer, 2016). Animations could be a pioneering way of presenting complex information that cannot be easily communicated by text/written language. However, the research literature on the use of animations in assessment is currently scarce.

Anastasios' recently completed PhD project focused on (a) the development and validation of an animation-based assessment instrument, (b) the investigation of test-takers’ views about this instrument and (c) the examination of the extent to which this animated test provides a more valid assessment of test-takers’ knowledge, skills and abilities, compared to a parallel text-based test. His preliminary findings will be published after September 2019.
___________________________________________________________________________________________________________

The Leaving Certificate as Preparation for Third Level Education Project 
Project Directors: Michael O'Leary & Darina Scully (CARPE)

The Leaving Certificate Examination (LCE) plays a crucial role in the process of how people are selected for third level education. However, the extent to which the Leaving Certificate Programme (LCP) as a whole (i.e. 5th and 6th year + the examination) provides students with a good preparation for their Third Level education is unclear.  This project aimed to shed some light on this issue.

For those who sat the LCE in 2017, their experiences of 5th and 6th year and of preparing for and taking the LCE were still fresh in their minds as they started college in September 2017. By March 2018, they also had a good understanding of what was being required of them in college. With this in mind, this project gathered data in April 2018 from first year students at DCU, who were in a position to offer important insights that can be used to evaluate the LCP and its relevance to first year in college.

Findings from the study are available online from CARPE.
_________________________________________________________________________________________________________

State-of-the-art in Digital Technology-Based Assessment Project
Project Directors: Michael O'Leary, Darina Scully, Anastasios Karakolidis & Vasiliki Pitsia

Following an invitation to contribute to a special issue of the European Journal of Education, a peer-reviewed journal covering a broad spectrum of topics in education, CARPE completed an article on the state-of-the-art in digital technology-based assessment. The article spans advances in the automated scoring of constructed responses, the assessment of complex 21st century skills in large-scale assessments, and innovations involving high-fidelity virtual reality simulations. An "early view" of the article was published online in April 2018, with the special issue (focused on the extent to which assessments are fit for their intended purposes) due to be published in June 2018.
_________________________________________________________________________________________________________

Learning Portfolios in Higher Education Project 
Project Directors: Darina Scully, Michael O'Leary (CARPE) & Mark Brown (NIDL) 

The ePortfolio is often lauded as a powerful pedagogical tool and, consequently, is rapidly becoming a central feature of contemporary education. Learning portfolios are a specific type of ePortfolio that may also include drafts and 'unpolished work', with a focus on the process of compiling the portfolio as well as on the finished product. It has been hypothesized that learning portfolios may be especially suited to the development and assessment of integrated, cross-curricular knowledge and generic skills/attributes (e.g. critical thinking, creativity, communication, emotional intelligence), as opposed to disciplinary knowledge in individual subject areas. This is of particular interest in higher education contexts, as universities and other third-level institutions face growing demands to bridge a perceived gap between what students learn and what is valued by employers.

In conjunction with the NIDL, CARPE have completed a comprehensive review examining the state of the field regarding learning portfolio use in third level education.  Specifically, this review (i) evaluates the extent to which there is sufficient empirical support for the effectiveness of these tools, (ii) highlights potential challenges associated with their implementation on a university-wide basis and (iii) offers a series of recommendations with respect to ‘future-proofing’ the practice.

The review was formally launched in February 2018 and has garnered a great deal of attention in the intervening months. A roundtable discussion of possible research opportunities within DCU arising from the findings is due to be held in May 2018. In addition, selected findings will be disseminated at various international conferences, including EdMedia in June 2018 (Amsterdam, Netherlands) and the World Education Research Association (WERA) meeting in August 2018 (Cape Town, South Africa). The review is also being adapted and translated into Chinese by Prof. Junhong Xiao of Shantou Radio and Television University, with the translated article to feature in an upcoming edition of the peer-reviewed journal Distance Education in China, and CARPE have recently acquired funding to support an additional translation into Spanish.
_________________________________________________________________________________________________________
Situational Judgement Tests (SJTs) Project
Project Directors:  Michael O'Leary, Darina Scully, Anastasios Karakolidis (CARPE) & Steve Williams (Prometric)

Originating in and most commonly associated with personnel selection, Situational Judgement Tests (SJTs) can be loosely defined as assessment instruments comprised of items that (i) present a job-related situation, and (ii) require respondents to select an appropriate behavioural response to that situation.  Traditionally, SJTs are assumed to measure tacit, as opposed to declarative knowledge; or as Wagner and Sternberg (1985) put it: “intelligent performance in real-world pursuits… a kind of ‘street smarts’ that helps people cope successfully with problems, constraints and realities of day-to-day life.”  Debate about the precise nature of the construct(s) underlying SJTs persists. 

In recent years, the use of SJTs for selection, training and development purposes has been increasing rapidly; however, these instruments are still not well understood. Experts continually debate issues such as how SJTs should be developed and how they should be scored. For example, although it is common to score SJTs based on test-takers' ability to identify the best response to each given situation, it has been argued (e.g. Stemler, Aggarwal & Nithyanand, 2016) that it may be more appropriate to distinguish between test-takers based on their ability to avoid the worst option.

In collaboration with our funders, Prometric, this project investigated the use of an SJT designed using the 'critical incident approach' for the training and development of Prometric employees.  Specifically, the project sought to explore validity evidence for the SJT as a measure of successful job performance across two different keying approaches (consensus vs. expert judgement) and five different scoring approaches (match best, match worst, match total, mismatch penalty and avoid total).  The findings suggest that scoring approaches focused on the ability to identify the worst response are associated with moderate criterion-related validity.  Furthermore, they underline the psychometric difficulties associated with critical incident SJTs.  These findings were presented at the European Association of Test Publishers (E-ATP) conference in September 2017 (Noordwijk, Netherlands).         
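As a concrete illustration of how keying and scoring approaches of this kind differ, the sketch below scores the same set of responses under a 'match best' key and a simple 'avoid the worst option' key, in the spirit of the approaches named above. The items, keys and responses are invented for illustration and do not come from the Prometric SJT; the exact scoring rules used in the project may differ.

    # Illustrative sketch: two ways of scoring SJT responses against an expert key.
    # Items, keys and responses are invented, not the actual Prometric SJT.

    # For each item, the option judged best and the option judged worst.
    key = {
        "item1": {"best": "A", "worst": "D"},
        "item2": {"best": "C", "worst": "B"},
        "item3": {"best": "B", "worst": "A"},
    }

    responses = {"item1": "A", "item2": "B", "item3": "C"}  # one test-taker

    # "Match best": one point for choosing the keyed best option.
    match_best = sum(responses[i] == key[i]["best"] for i in key)

    # "Avoid worst": one point for NOT choosing the keyed worst option.
    avoid_worst = sum(responses[i] != key[i]["worst"] for i in key)

    print(f"Match-best score: {match_best}/{len(key)}")
    print(f"Avoid-worst score: {avoid_worst}/{len(key)}")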
_________________________________________________________________________________________________________
Three vs. Four Option Multiple-Choice Items Project
Project Directors:  Darina Scully, Michael O’Leary (CARPE) & Linda Waters (Prometric) 

A strong body of research spanning 30+ years suggests that the optimal number of response options for a multiple-choice item is three (one key and two distractors). Three-option multiple-choice items require considerably less time to construct and to administer than their four- or five-option counterparts. Furthermore, they facilitate broader content coverage and greater reliability through the inclusion of additional items. Curiously, however, the overwhelming majority of test developers have paid little heed to these factors. Indeed, it is estimated that fewer than 1% of contemporary high-stakes assessments contain three-option items (Edwards, Arthur & Bruce, 2012).

This phenomenon has often been commented on, but never satisfactorily explained. It is likely that fears of guessing have played a role, given that the probability of selecting the correct response by blind chance theoretically rises from 20% or 25% to 33% when the number of response options is reduced from five or four to three. However, distractor analyses across various contemporary high-stakes assessments reveal that more than 90% of four- and five-option items have at least one non-functioning distractor. That is, most of the time, when test-takers need to guess, they do not do so blindly; rather, they eliminate at least one implausible distractor and guess from the remaining options. As such, the majority of four- and five-option items effectively operate as three-option items.
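The arithmetic behind this argument is simple, as the short sketch below illustrates: the probability of a correct blind guess depends on the number of options a test-taker actually treats as plausible, so a four-option item with one non-functioning distractor already behaves like a three-option item. The figures are illustrative only.

    # Illustrative arithmetic: chance of a correct guess as a function of the
    # number of options a test-taker actually considers plausible.
    def blind_guess_probability(n_options: int) -> float:
        """Probability of a correct guess when choosing at random among n options."""
        return 1 / n_options

    for n in (5, 4, 3):
        print(f"{n} options: {blind_guess_probability(n):.0%} chance of a correct guess")

    # A four-option item with one non-functioning (implausible) distractor is
    # effectively a three-option item for anyone who eliminates that distractor.
    print(f"4 options, 1 eliminated: {blind_guess_probability(4 - 1):.0%}")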

In collaboration with our funders, Prometric, a study comparing item performance indices and distractor functioning (based on responses from more than 1,000 test candidates) across 20 stem-equivalent three- and four-option items from a high-stakes certification assessment was conducted. Findings from the project were disseminated at the Association of Test Publishers (ATP) Conference in March 2017 (Scottsdale, Arizona) and are being used to inform the development of future items for a number of Prometric's examinations.

_________________________________________________________________________________________________________
Higher-Order Thinking in Multiple-Choice Items (HOT MC Items) Project
Project Directors: Darina Scully & Michael O'Leary (CARPE)

The nature of assessment can exert a powerful influence on students’ learning behaviours. Indeed, students who experience assessments that require them to engage in higher-order thinking processes (i.e. those represented by higher levels of Bloom’s (1956) Taxonomy, such as application, analysis and synthesis) are more likely to adopt meaningful, holistic approaches to future study, as opposed to engaging in mere surface-level or ‘rote-learning’ techniques (Leung, Mok & Wong, 2008). It is often assumed that multiple-choice items are incapable of assessing higher-order thinking, or indeed anything beyond recall/recognition, given that the correct answer is provided amongst the response options. However, a more accurate assertion may be that multiple-choice items measuring higher-order processes are simply rarely constructed. It is true that MC items, like all assessment formats, are associated with some limitations, but it may be possible to construct these items at higher cognitive levels, provided certain strategies are followed. MC items remain attractive to, and frequently used by, educators and test developers due to their objective and cost-efficient nature; as such, it is worthwhile putting time and effort into identifying and disseminating these strategies within the assessment community.

This project involved a comprehensive review of the extant literature that (a) has investigated the capacity of multiple-choice items to measure higher-order thinking or (b) has offered strategies or guidance on how to do so. An article based on this review was published in the peer-reviewed journal Practical Assessment, Research and Evaluation in May 2017, and the work has also contributed to the development of training and development materials for Prometric's test developers.
_________________________________________________________________________________________________________
Practice Tests in Large Scale Testing Programmes Project
Project Directors: Anastasios Karakolidis, Darina Scully & Michael O’Leary (CARPE)

This project was focused on developing a research brief reviewing the key findings arising from the literature regarding the efficacy of practice tests.  This brief was published in the summer 2017 edition of Clear Exam Review, and the findings are also being used to inform Prometric's practices surrounding the development and provision of practice test materials. 
_________________________________________________________________________________________________________
Feedback in Large Scale Testing Programmes Project
Project Directors: Michael O'Leary & Darina Scully (CARPE) 

In recent years, there has been increasing pressure on test developers to provide diagnostic information that can assist unsuccessful test takers in improving future performance, and assist academic and training institutions in evaluating the success of their programmes and identifying areas that may need to be modified (Haberman & Sinharay, 2010; Haladyna & Kramer, 2004). This growing demand for diagnostic feedback is also evident in the Standards for Educational and Psychological Testing, which states that “candidates who fail may profit from information about the areas in which their performance was especially weak” (AERA, APA & NCME, 2014, p. 176). Test developers face a substantial challenge in attempting to meet this demand whilst simultaneously upholding their ethical responsibility – also outlined in the Standards – to ensure that any test data that are reported, shared with stakeholders, or used to make educational, certification or licensure decisions are accurate, reliable and valid.

CARPE have conducted a review of the literature on the issues involved in reporting test sub-scores, including the identification of a number of approaches (e.g. scale anchoring, level descriptors and graphical methods) that can be taken when reporting in large scale testing contexts. These findings are being used to inform Prometric's practices surrounding the provision of feedback to unsuccessful test candidates.
_________________________________________________________________________________________________________
Partial Credit for Multiple Choice Items Project
Project Directors: Darina Scully & Michael O’Leary (CARPE)

Multiple-choice test developers have typically shown a strong preference for the use of the single-best answer response format and number-correct scoring.  Despite this, some measurement experts have expressed dissatisfaction with these methods, on the basis that they assume a sharp dichotomy between knowledge and lack of knowledge.  That is, the entire model fails to take into account the varying degrees of partial knowledge a test-taker may possess on an item-by-item basis.  This is regrettable, as information regarding test-takers’ partial knowledge levels may contribute significantly to the estimation of true proficiency levels (DeAyala, 1992). 

In response to this criticism, a number of alternative testing models that facilitate the allocation of partial credit have been proposed (e.g. Ben-Simon, Budescu & Nevo, 1997; Frary, 1989; Lau, Lau, Hong & Usop, 2011). Their exact nature varies considerably, but all share the aim of maximizing the information efficiency of individual items and increasing the precision of measurement. CARPE have conducted a literature review focusing on three approaches that facilitate the allocation of partial credit, namely: option-weighted scoring, confidence-weighted responding, and the liberal multiple-choice item format. To date, findings regarding the application of these approaches have been complex and equivocal, with no one method emerging as uniformly superior. Ultimately, whether or not it is worth pursuing these strategies depends on a combination of factors, such as the overall purpose of the assessment, the overall difficulty (pass rate) of the test, the cognitive complexity of the items, and the particular psychometric properties that are most valued by the test developer.
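As a simple illustration of the first of these approaches, option-weighted scoring, the sketch below awards partial credit according to pre-assigned option weights rather than all-or-nothing number-correct scoring. The weights and responses are invented for illustration and are not drawn from any of the reviewed studies.

    # Illustrative sketch of option-weighted scoring: each option carries a weight
    # reflecting its degree of correctness, so a test-taker with partial knowledge
    # can earn partial credit. Weights and responses are invented for illustration.

    # Option weights per item (1.0 = fully correct, 0.0 = clearly wrong).
    weights = {
        "item1": {"A": 1.0, "B": 0.5, "C": 0.0, "D": 0.0},
        "item2": {"A": 0.0, "B": 0.25, "C": 1.0, "D": 0.5},
    }

    responses = {"item1": "B", "item2": "C"}

    # Number-correct scoring counts only exact matches with the keyed (weight 1.0) option.
    number_correct = sum(weights[i][responses[i]] == 1.0 for i in responses)

    # Option-weighted scoring sums the weights of the chosen options.
    option_weighted = sum(weights[i][responses[i]] for i in responses)

    print(f"Number-correct score: {number_correct}/{len(responses)}")
    print(f"Option-weighted score: {option_weighted}/{len(responses)}")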