Issues in Language Teaching (Dec 2015)
The Effect of Item Modality and Note-taking on EFL Learners’ Performance on a Listening Test
Abstract
The pivotal role of listening comprehension in second/foreign language learning requires that researchers conduct studies which investigate factors that affect test-takers' performances. The present study set out to examine whether item modality (i.e., written vs. oral items) affects listening comprehension test performance. In addition, it investigated whether allowing test-takers to take notes while listening would also affect their performances. To this end, two different tests, each containing 20 multiple choice items, were administered to 66 (35 female and 31 male) upper-intermediate EFL learners. The first test was administered to look into the role of item modality, and the second test was employed to investigate the effect of note-taking. The application of independent samples t-tests to analyze the data revealed that test-takers performed better when the items were provided in written rather than oral form, and that test-takers' performances did not differ significantly when they were allowed to take notes. More detailed findings and implications are discussed in the paper.

Keywords: item modality, note-taking, listening test, EFL

Authors' emails: [email protected]; [email protected]

INTRODUCTION

Testing is an integral part of any teaching and learning process, and like other educational fields, English as a Foreign Language (EFL) education has long recognized testing as a major part of teaching. New perspectives on the use of English as an international language (EIL) have presented significant challenges to the field of language testing, with calls for change in assessment practices arising over the past decade (Jenkins, 2006). One of the skills for which constructing test items is demanding is listening comprehension, as in real-life contexts, listeners cannot usually move backwards and forwards over what is being said in the way that they can do with a written text. In a listening test, the key concern is to evaluate the students' comprehension, that is, to determine whether the students have grasped the intended message. So, it is essential to decide on the conditions and operations that merit inclusion in a test of listening comprehension (Weir, 1990). In actual fact, the assessment of listening abilities is one of the least understood, least developed and yet one of the most important areas of language testing (Buck, 2007). The issue is even more complex nowadays given the unprecedented diversity of testing methods and academic pathways available for international students (Taylor & Geranpayeh, 2011). In other words, among the many existing variables that are considered to affect test-takers' performance, one central issue is the effect of test methods and formats (Alderson, 2000; Bachman, 1990; Buck, 2007). Besides the awkward nature of testing listening comprehension, there exist some factors that might affect test-takers' performance. When test developers set out to design a listening comprehension test, they usually encounter, and have to account for, numerous factors that may influence test-takers' performances, such as item format, speech rate, speaker accents, topic familiarity, etc. Considering this, the present study is, for one thing, concerned with the mode of presentation of multiple choice items in a listening comprehension test, that is, whether it makes a difference to present the items orally or in written form.
Focusing on different item formats, some studies conclude that allowing candidates to preview question stems enables them to make good use of planning, a meta-cognitive strategy, by directing their attention to relevant areas of the text (Wu, 1998; Yanagawa & Green, 2008). However, listening items in which the stem of the question is not seen on the paper or the screen have their own advocates, who believe that auditory memory does not need to be supported by visual aids. When it comes to listening instruction, there are numerous studies that look at enhancing listening comprehension through various means of support, such as visual aids, advance organizers, captions, etc., with the overall conclusion that most of these forms of support facilitate listener comprehension and also have some positive psychological effects on listeners' learning (Chang, 2009). Elekaei, Faramarzi and Biria (2015), for instance, investigated test-takers' attitudes towards items with audio-only, pictorial and visual modality and found that students favoured picture-based (rather than visual) items over audio items. In support of Elekaei et al.'s (2015) findings, Basal, Gulozer and Demir (2015) compared the performance of Turkish EFL learners on items with audio and visual modality and found the performance on audio modality to be significantly higher than that on visual modality. In addition to the modality of the item, another factor that may affect the listening performance of EFL learners is whether they are allowed to take notes during the listening test, which is the second concern of this study. The note-taking variable was considered in conjunction with modality in this research on the assumption that both variables involve similar cognitive processes in listening. In other words, while written item modality helps listeners to overcome the memory problem (which is evident in oral items), note-taking functions similarly by allowing the listener to have a partial written record of the fading message, helping them to remember better and to retrieve what may otherwise be unretrievable. Some studies (e.g., Hale & Courtney, 1991) have found that note-taking almost always improves retention of aurally presented material when performance is measured with a recall test. In their study, Hale and Courtney concluded that allowing students to take notes would lead them to better performance in listening tests. However, research suggests that note-taking may work differently for listeners of different proficiency levels. In his study with 257 participants who took an English as a second language placement exam, Song (2011) found that those with higher levels of proficiency benefited more from note-taking than listeners with lower proficiency levels, while some other studies have failed to find an effect (Carter & Van Matre, 1975; Dunkel, 1986); and still other researchers, like Aiken, Thomas, and Shennum (as cited in Song, 2011), have observed an interfering effect. Given the widespread use of language proficiency tests administered throughout the world, and considering test-takers' desire to gain satisfactory results in such tests, as the scores are used to make life-changing decisions about them, there seems to be a need to better understand what affects candidates' performances in such tests (as well as in less high-stakes assessments) in order to assist test-takers in obtaining desired results.
Therefore, in designing such tests, besides the needs of the candidates, test-dependent factors, including item modality and allowing test-takers the opportunity to take notes, are areas which require further research attention, with the aim of providing listeners with the chance to reveal their true listening competence and guarding them against memory problems, which can be compounded in exam settings.

LITERATURE REVIEW

L2 listening tests should demonstrate that the test-taker has the ability to process language automatically, in real time (Buck, 2007). Thus, there is a need for the listener to automatize the listening process, and consequently there is a need to assess whether the listener can indeed comprehend spoken language automatically in real time. This presents a dilemma for testers in determining the mode of presenting the item stems and whether to allow test-takers to take notes, since the first of these resources does not seem to exist in real-life situations, and the second has few realizations outside academic or formal encounters. Ideally, the item stems should be presented orally to the test-takers, because this is generally how spoken language is encountered in real life. Note-taking is considered a good strategy for keeping points in mind in real life and in listening to lectures. However, the burden on listeners in an exam context is quite different from that in a real-life context, and it needs to be investigated whether providing support to listeners in the form of written items (as opposed to oral items) and allowing them the chance to take notes helps them to better reveal their listening ability in a test context. Below we provide a brief account of some studies conducted in this area before we introduce our project.

Item Modality

It has been argued that EFL learners need abundant support when processing auditory input (Chang, 2009). Numerous studies (e.g., Markham, Peter, & McCarthy, 2001; Stewart & Pertusa, 2004; Vandergrift, 2007) have looked at enhancing listening comprehension through various means of support such as visual aids, captions, etc. Most of these supports have been recognized as facilitative and have been shown to have positive psychological effects on listeners' comprehension. However, in the realm of assessing listening, providing cognitive processing support to listeners in the form of written item modality has not received due attention. A few studies have looked at the issue of modality, but diverse results have been reported. Yanagawa and Green (2008), for example, examined whether the choice of multiple choice item format led to differences in task difficulty and test performance. They examined three formats, two of which were Full Question Preview (used in tests such as TOEIC, which displays both the question stem and answer options on the question paper/screen) and Answer Option Preview (used in TOEFL, where answer options are displayed on the question paper/screen, but the questions are heard after the text). In their study, 279 test-takers participated, and listening tests were administered using the different formats. The results indicated that listening comprehension test performance did vary significantly according to whether test-takers had been able to preview the question stem. It was found that allowing test-takers to preview only the answer options produced fewer correct answers than allowing test-takers to preview both the question stem and answer options prior to listening.
However, they suggested that although the cues provided in answer options did not facilitate comprehension, previewing them may encourage test-takers to fall back on a lexical matching strategy. Chang (2009) compared two modes of aural input: reading while listening versus listening only. The results of the study revealed that although students showed a strong preference for the reading/listening mode, they gained only 10% more with that mode. More than half of the students believed that the reading-while-listening mode made listening tasks easier and more comprehensible. In a study similar to ours, Wagner (2010) examined the effect of using visual components of spoken texts on listeners' performance and their comprehension of aural information in a listening test. In his study, two groups' performance on an ESL listening test was compared. The control group took a listening test with audio-only texts. The experimental group took the same listening test, with the exception that test-takers received the input through the use of video texts. Analyses of the results indicated that the video (experimental) group performed better than the audio-only (control) group on the test, and the difference between their performances was statistically significant. More recently, Rogowsky, Calhoun, and Tallal (2016) compared immediate and delayed comprehension (retention) of three groups of learners who either listened to an audio text (the preface and a chapter from a non-fiction book), read the original text on screen, or did both at the same time (dual modality). The findings revealed that learners in no condition outperformed the others at either Time 1 or Time 2, leading the authors to conclude that input modality does not matter in comprehension. The comprehension test was, however, in written mode, and whether similar results could be obtained in listening comprehension remains to be established by future research.

Note-taking

Note-taking is generally considered to promote the process of learning and retaining, especially in the context of reading comprehension (Rahmani & Sadeghi, 2011). Over the years, research on note-taking has generated debates, and researchers have tried to carry out studies to verify whether taking notes is effective in improving students' listening comprehension. One such study was conducted by Hale and Courtney (1991), who investigated the effect of note-taking on test-takers' listening comprehension in TOEFL mini-talks. In their study, Hale and Courtney had two groups of international test-takers (a total of 563 students) who were preparing to take the TOEFL. One of the groups was free to take notes while the test-takers were listening to the text, whereas the test-takers in the other group were not allowed to take notes at all. The results revealed that allowing test-takers to take notes had little effect on their performance and, more interestingly, that taking notes impaired their performance in the listening test. In a similar vein, Kobayashi (2005) was concerned with the question of whether the process of taking notes promotes the encoding of lecture or text information, and if so, how much and why.
The results of his meta-analysis demonstrated that the overall effect of note-taking compared with no note-taking was positive but modest, which was somewhat inconsistent with the tenets of the encoding hypothesis, namely that note-taking enhances learning by stimulating note-takers to actively process the material and to relate it to their existing knowledge. Carrell (2007) investigated the relationships between note-taking strategies and performance on three language assessment tasks. Her study employed 216 international test-takers (88 males and 128 females) ranging in listening comprehension proficiency from low-intermediate to high. The participants were tested and were asked to take notes while listening to the talks. The researcher analyzed the content of the notes as well as the candidates' performances. The overall results revealed that the relationship is complex, depending upon the note-taking strategy and the task. She found positive correlations between the number of total notations and task performance. Likewise, Ching Ko (2007), in a study with fifteen university EFL students, explored test-takers' perceptions of note-taking and analyzed the effect of note-taking on students' foreign language listening comprehension. The findings indicated that taking notes did not distract students from their listening process; rather, it helped them pay more attention to the text. He concluded that, with the help of note-taking, students can improve their listening performance through both enhanced recall and greater attention to the listening text. The brief literature above on the two variables of interest in this study (item modality and note-taking) reveals that although these variables are among those important test method facets that have the potential to affect listening performance in exam contexts, little research exists to indicate the role item modality and note-taking play in test-taking, and the small body of published research does not point in a uniform direction. In order to contribute to the existing literature in this important area of language testing, this study was planned to further our understanding of the links between item modality, note-taking, and performance in listening tests.

PURPOSE OF THE STUDY

The main purpose of the current research was to assess students' ability to comprehend spoken language as it would typically occur in an academic setting. In other words, the study sought to find the effects of the modality of multiple choice items (oral versus written modality) and note-taking (whether it is allowed or not) on the performance of upper-intermediate EFL learners taking listening tests. More specifically, the following research questions were posed for further scrutiny:

1. Does item modality (written vs. oral) have any significant effect on the listening performance of Iranian upper-intermediate EFL test-takers?
2. Does note-taking have any significant effect on the listening performance of Iranian upper-intermediate EFL test-takers?

METHOD

Participants

A total of 66 upper-intermediate EFL learners (31 males and 35 females) within the age range of 18 to 25 took an institutional version of the PBT TOEFL, from among whom no one was excluded as an outlier (since they all enjoyed a similar proficiency level, and their scores ranged from 62 to 85 out of 100).
They were all upper-intermediate language learners who were taking English language courses at Shukuh-e-Iran language school; having attended English classes for the last three years, they had relatively high levels of English proficiency, including in listening. The participants attended the same course (in different classes for males and females) and the institute placed them at the same level, confirming their homogeneity as revealed by the TOEFL scores.

Instrumentation

The following data elicitation tools were employed to measure participants' listening performance under the four measurement conditions discussed above (oral versus written item modality, and note-taking versus no-note-taking condition).

Listening Test 1

The first listening test was the listening section of an institutional PBT TOEFL. The test consisted of 20 mini-talks, each followed by a multiple choice question. The mini-talks were randomly selected from among 150 items provided in the Complete TOEFL Test section of Longman Preparation Course for the TOEFL Test by Deborah Phillips (2003), published by Pearson ESL. The items in this pack are claimed to be similar to real TOEFL items in terms of content and difficulty, which provides some evidence for the test's construct validity. In order to provide data for the first research question, two versions of this test were produced: the first version with written item modality (for both the stem and the options) and the second version with item stems in the oral mode (but with the options in the written mode). K-R 21 was utilized to estimate the reliability of the test, which was found to be 0.75.

Listening Test 2

A second test of listening (based on the same sample tests as above) was employed to provide data for the second research question. The test consisted of two long conversations and three talks. For each conversation or talk, there were four multiple choice items that the students had to answer after listening to it. The texts used in this test ranged in length from 100 to 150 words. These texts and questions were selected randomly from among 20 talks and 20 long conversations in the Complete TOEFL Test section of Longman Preparation Course for the TOEFL Test and were assumed to be valid in content and difficulty as they represented real TOEFL items. The test was administered to the same participants as above in a different session. In administering the test, one group was not allowed to take notes, while the other group was instructed to take notes (using the note-taking sheets provided) while listening to the talks/conversations. K-R 21 was also used to estimate the test's reliability, and the results revealed a high reliability index of 0.79.

Listening Proficiency Test

In order to have a controlled level of listening proficiency and work with homogeneous participants, the Listening Section of an institutional version of the TOEFL was administered at the beginning of the study. The test had 20 multiple choice items and enjoyed a reliability index of 0.86.

Data Collection Procedure

The following steps were taken to conduct this study. First, a listening proficiency test was administered to all upper-intermediate EFL learners at a language school (as mentioned above) to ensure that all the candidates enjoyed a homogeneous listening ability. These learners were all studying the "Passages 1" coursebook and were regarded as upper-intermediate by institute standards.
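Since the reliability of all three instruments above was estimated with K-R 21, a minimal computational sketch of that index may be useful: KR-21 = (k / (k - 1)) * (1 - M(k - M) / (k * s^2)), where k is the number of items, M the mean total score, and s^2 the variance of the total scores. The Python snippet below simply implements this formula; the score array is a hypothetical placeholder, not the study's data.

```python
import numpy as np

def kr21(total_scores, n_items):
    """Kuder-Richardson Formula 21 estimated from total test scores.

    total_scores : sequence of total scores (0 .. n_items), one per test-taker
    n_items      : number of dichotomously scored items on the test
    """
    scores = np.asarray(total_scores, dtype=float)
    k = n_items
    m = scores.mean()            # mean total score
    var = scores.var(ddof=1)     # sample variance of total scores
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * var))

# Hypothetical total scores on a 20-item listening test (placeholder data only)
rng = np.random.default_rng(0)
demo_scores = rng.integers(8, 21, size=66)
print(f"KR-21 = {kr21(demo_scores, n_items=20):.2f}")
```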
The results of the proficiency test revealed (see above) that the students were indeed homogeneous and of similar language proficiency (in listening). Then, to provide data for the first research question, thirty-three learners (16 males and 17 females) were randomly selected and took the version of the test with oral item modality, while the other 33 test-takers took the version with items in written modality. Subsequent to this, and in another session of the treatment, the second listening test was administered to the same groups in a similar procedure, where one group was allowed to take notes and the other was not.

Data Analysis

To analyze the elicited data, the data were entered into SPSS (Statistical Package for the Social Sciences, PASW Statistics 18), and two separate independent samples t-tests were run.

RESULTS

Results of the Normality Test

To ensure the homogeneity of the participants, the Listening Section of an institutional version of the TOEFL test was utilized, as explained above. Table 1 shows the results of the tests of normality for the participants.

Table 1. Tests of normality for the proficiency test

               Kolmogorov-Smirnov            Shapiro-Wilk
               Statistic   df    Sig.        Statistic   df    Sig.
Exam Scores    .14         66    .06         .93         66    .07

As can be seen in the table above, the non-significant result (i.e., .06, which is greater than .05) indicates normality, which means that the participants were homogeneous. Furthermore, Figure 1 presents the related box plot, which shows that there were no outliers among the participants.

Figure 1. Box plot for homogeneity of participants.

Item Modality and Listening Comprehension

After ensuring the homogeneity of the participants, an independent samples t-test was run to answer the first research question by comparing the mean scores of the groups which had different item modalities in the tests. Table 2 provides the independent samples t-test statistics.

Table 2. Independent samples t-test for test 1 (item modality variable)

                              Levene's Test         t-test for Equality of Means
                              F      Sig.    t      df    Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI Lower   95% CI Upper
Equal variances assumed       1.19   .27     8.18   64    .00               4.69         .57                3.55           5.84
Equal variances not assumed                  8.18   64    .00               4.69         .57                3.54           5.84

As shown in Table 2, the significance level for Levene's test is .27, which is larger than the cut-off of .05, meaning that the assumption of equal variances has not been violated. The significance level of the t-test (Sig. 2-tailed, p = .00) is less than .05, which indicates that there is a significant difference between the two groups in terms of item modality. Comparing the mean scores of the test-takers, it is evident that test-takers exposed to the written item modality (M = 16.64) did much better than those who experienced the oral presentation of the items (M = 11.94). In addition, using the Eta squared formula, the effect size for this independent samples test was calculated, and the result (Eta squared = .51) reveals a large effect. Expressed as a percentage, it can be inferred that 51 percent of the variance in listening test performance is explained by item modality. All this can be interpreted to mean that the modality of test items does have a significant effect on the listening performance of Iranian upper-intermediate EFL learners.
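For readers who wish to trace the analysis in Table 2 outside SPSS, the sketch below reproduces the same sequence of steps: Levene's test for equality of variances, an independent samples t-test, and an eta squared effect size computed from t and df (Eta squared = t^2 / (t^2 + df)). The two score arrays are hypothetical placeholders standing in for the written-modality and oral-modality groups (n = 33 each); they are not the study's raw data.

```python
import numpy as np
from scipy import stats

# Hypothetical 20-point listening scores for the two modality groups
# (placeholder values only; the study's raw data are not published)
rng = np.random.default_rng(1)
written_group = np.clip(np.round(rng.normal(16.6, 2.0, 33)), 0, 20)
oral_group = np.clip(np.round(rng.normal(11.9, 2.2, 33)), 0, 20)

# Step 1: Levene's test for equality of variances
lev_stat, lev_p = stats.levene(written_group, oral_group)

# Step 2: independent samples t-test
# (pooled-variance test if Levene's p > .05, otherwise Welch's correction)
t_stat, t_p = stats.ttest_ind(written_group, oral_group,
                              equal_var=(lev_p > .05))

# Step 3: eta squared effect size computed from t and the pooled df
df = len(written_group) + len(oral_group) - 2
eta_squared = t_stat ** 2 / (t_stat ** 2 + df)

print(f"Levene p = {lev_p:.2f}, t({df}) = {t_stat:.2f}, "
      f"p = {t_p:.3f}, eta squared = {eta_squared:.2f}")
```

The note-taking comparison reported in Table 3 below follows the same pattern, except that Levene's test is significant there, so the Welch-corrected row (equal_var=False) is the one interpreted.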
Note-taking and Listening Comprehension

In order to provide an answer to research question 2, another independent samples t-test was used to compare the mean scores of the two groups of test-takers (with and without the note-taking condition). Table 3 reports the results of the test of homogeneity of variances as well as the t-test results. Since the significance level for Levene's test is less than .05 (p = .04), the assumption of homogeneity of variances is violated, so the second row is consulted for the analysis of the results.

Table 3. Independent samples t-test for test 2 (note-taking variable)

                              Levene's Test         t-test for Equality of Means
                              F      Sig.    t      df      Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI Lower   95% CI Upper
Equal variances assumed       4.16   .04     -.34   64      .73               -.30         .88                -2.06          1.46
Equal variances not assumed                  -.34   58.69   .73               -.30         .88                -2.07          1.46

As can be seen in Table 3, the p value for the independent samples test is .73, which is greater than the cut-off of .05, and this reveals that there is not a significant difference between the mean scores of the two groups. Eta squared was also calculated and turned out to be very small (Eta squared = .001). The mean score of the test-takers who were allowed to take notes while listening (M = 13.94) did not prove to be statistically different from the mean score of the test-takers who did not have the chance to take notes (M = 14.24). In other words, the mean difference between the two groups was -.30, which is too small a difference to reach statistical significance. Surprisingly, note-taking seems to have negatively affected listening performance, though not to a statistically significant degree.

DISCUSSION

This study set out with the aim of assessing the effect of written versus oral item modality, and of note-taking, on listening performance under test-taking conditions. The results revealed a positive effect of written item modality on listening performance and no significant effect for note-taking. These findings are elaborated further below.

Listening Test Performance and Item Modality

To answer the first research question, two groups of test-takers took a listening test with a different item modality (written versus oral). The results of an independent samples t-test (p = .00) revealed that there is a significant difference between the mean scores of the two groups. This means that listening test performance varied significantly according to whether test-takers had the chance to view the item stem in writing or not, with the result that the test-takers who had the chance to view item stems outperformed those who received the item stems in oral format. The findings of the current study corroborate those of Wu (1998), who concluded that viewing the item stems as well as the options appeared to benefit advanced EFL test-takers. Of course, he links this benefit to an advanced level of language proficiency; the present study confirms that written item modality also benefits upper-intermediate test-takers (who enjoy a more or less advanced proficiency level). Moreover, an inspection of Yanagawa and Green's (2008) study indicates that there was an apparent difference regarding item preview format. In their study, the results indicated differences between the full question preview (written item stem) condition and the answer option preview (oral item stem) condition.
Their research found that test-takers were able to benefit from previewing the full questions rather than just previewing the options. In other words, it seems that the cues provided in the answer options did not facilitate comprehension to the same extent as the item stems did, a finding which our study adds support to. However, the findings of the current study do not seem to support those of Sherman (1997), who found no significant effect of item stem preview on test-takers' performance. The reason why test-takers do better when the stem of the item is revealed rather than hidden from them can be justified by referring to psychological aspects of the listening test. When test-takers have access to the item stems as well as the options while listening, they are psychologically more relaxed and feel more secure compared to the situation in which they do not have a visual record of the item stem and the item stem is gone as soon as it is produced (in the oral modality). Although this psychological stance is not supported by some studies (e.g., Buck, 1991; Sherman, 1997), the context in which the present study was carried out strongly supports this position, since most Iranian learners are stressed when they take a test, and this stress would increase if test-takers do not see the item stems on their sheets. Furthermore, the cues which are present in the stem of an item help test-takers gain a better understanding of the item, and when these cues are presented in written modality, they are processed and retrieved more easily.

Listening Test Performance and Note-taking

This study was also an attempt to examine the effect of taking notes on listening test performance while test-takers listen to short talks or long conversations. Contrary to most findings of previous studies, our analysis did not detect any evidence of an effect of note-taking on test-takers' performance in a listening test. A quick glance at Table 3 reveals that the p value of .73 suggests that allowing students to take notes had little effect on their performance compared to test-takers who were not allowed to take notes. Test-takers who did not take notes even gained slightly higher scores than those who took notes, an o