COMPREHENSION OF A NOVEL ACCENT BY YOUNG AND ELDERLY LISTENERS
Patti Adank1,2 and Esther Janse3,4
1 School of Psychological Sciences, University of Manchester, Manchester, United Kingdom
2 Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University Nijmegen, Nijmegen, the Netherlands
3 Utrecht Institute of Linguistics, OTS, Utrecht University, Utrecht, the Netherlands
4 Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
Running head: Perceiving a novel accent by elderly listeners
Date: July 24, 2016
Address for correspondence:
Neuroscience and Aphasia Research Unit
School of Psychological Sciences
University of Manchester
M13 9PL, Manchester, UK
Phone: 0044-161-275 2693
We investigated perceptual learning of a novel accent in young and elderly listeners by measuring speech reception thresholds (SRTs) over consecutive blocks of speech materials. Participants (20 young and 30 elderly) were first presented with four blocks of Standard Dutch sentences to establish their baseline SRT. Subsequently, they heard four sentence blocks spoken by the same speaker, who now spoke in an artificial novel accent of Dutch in which the pronunciation of the vowels was systematically altered. We studied whether both groups showed comparable effects of accent on their SRTs and comparable learning. Both groups were found to adapt to the novel accent, but the impact on the SRTs was considerably greater for the elderly group, indicating that they showed poorer comprehension of the novel accent. Importantly, the results indicated that the pattern of perceptual learning of the accent differed between the age groups: whereas the elderly showed minimal learning beyond the second block, the young adults showed further improvement with longer exposure. Among the elderly participants, hearing acuity predicted the SRT, as well as the effect of the novel accent on SRT. Furthermore, a measure of executive function predicted the impact of the accent on SRT. In sum, these results indicate that accentedness is more detrimental to speech understanding in elderly than in young adults. The individual difference analysis of the elderly participants’ data suggests that this may be due both to poorer hearing and decreased mental flexibility in elderly listeners.
Human speech perception is extraordinary in the sense that we are able to learn to comprehend distorted or unfamiliar speech streams. For instance, listeners can quickly learn to understand foreign-accented speech (Clarke & Garrett, 2004), noise-vocoded speech (Shannon, Zeng, Kamath, Wygonski, & Ekelid, 1995), spectrally shifted speech (Rosen, Faulkner, & Wilkinson, 1999), synthetic speech (Golomb, Peelle, & Wingfield, 2007; Greenspan, Nusbaum, & Pisoni, 1988; Pallier, Sebastián-Gallés, Dupoux, Christophe, & Mehler, 1998; Sebastián-Gallés, Dupoux, Costa, & Mehler, 2000; Wingfield, Peelle, & Grossman, 2003), and time-compressed speech (Dupoux & Green, 1997), to name a few. What is most remarkable about this process is the speed at which it occurs. Listeners generally need exposure to only a handful of sentences to improve their perception of the novel speech stream (Clarke & Garrett, 2004). This ability to adapt appears to remain stable throughout the lifetime (Golomb et al., 2007; Peelle & Wingfield, 2005). For instance, Peelle and Wingfield (2005) tested young and older adults’ ability to learn to understand artificially time-compressed sentences, as well as noise-vocoded and spectrally shifted speech. Time-compression is a method for artificially shortening the duration of an audio signal without affecting its fundamental frequency (Moulines & Charpentier, 1990). When both groups were equated for starting accuracy on a sentence-recall task, Peelle and Wingfield (2005) found that both groups learned at a similar rate and magnitude; these similarities held for adaptation both to time-compression and to the noise-vocoding manipulation. Relative to their 30% accuracy starting level, both groups showed an improvement of 10-14 percentage points after exposure to 20 time-compressed sentences.
Note that the speech rate differed across listener groups: the young listeners adapted to 669 words per minute, while the elderly listeners adapted to 569 words per minute. Elderly listeners have previously been found to perform less well at understanding time-compressed speech than younger listeners (Janse, 2009; Peelle & Wingfield, 2005; Wingfield, Tun, Koh, & Rosen, 1999). This overall poorer performance when processing fast speech has been linked to age-related hearing loss in elderly listeners (Gordon-Salant & Fitzgibbons, 1993, 2001) and may also be due to aging of cognitive abilities (Salthouse, 2000b). Janse (2009) compared young and elderly listeners’ processing of fast (time-compressed) speech, using online detection of target words as the task. Elderly listeners’ performance on this task could be predicted from their hearing acuity, from a cognitive measure reflecting their relative information-processing speed (the Digit Symbol Substitution task, or DSS), and from two measures of their reading speed.
In the present paper, we intend to further investigate the relationship between speech comprehension processes and hearing acuity and cognitive factors. Spoken language comprehension is important throughout the life span. Therefore, investigating perceptual adaptation to novel listening conditions in older adults offers the opportunity to study how perceptual learning is shaped by ‘ear’ and ‘brain’. Crucially, such an investigation may also yield fundamental insights into the mechanisms underlying the efficient adaptation in young normal-hearing adults.
To date, few studies have addressed age-related differences in perceptual learning for speech comprehension. Apart from earlier studies on whether aging affects adaptation to temporal or spectral manipulations (Golomb et al., 2007; Peelle & Wingfield, 2005), we only know of studies addressing adaptation to speaker characteristics and amplitude fluctuations in young and older listeners and a study on ERP correlates of vowel identification (Alain & Snyder, 2008). Studies on adaptation or learning in other modalities have shown age differences in perceptual learning (Fernandez-Ruiz, Hall, Vergara, & Diaz, 2000; Gilbert & Rogers, 1996; Kennedy, Rodrigue, Head, Gunning-Dixon, & Raz, 2009; Raz, Williamson, Gunning-Dixon, Head, & Acker, 2000). One of these studies, on the identification of fragmented pictures (Kennedy et al., 2009), investigated whether age-related decreases in perceptual priming and learning were mediated by differences in cognitive performance and regional cerebral volume. Variance in the learning of perceptual skill was related to an indirect influence of regional brain volume via mediating cognitive processes. In other words, decreased brain volume in the older group was associated with cognitive variables (fluid reasoning and verbal working memory), which in turn were associated with perceptual skill learning. These results confirmed earlier findings that age differences in learning are associated with differences in cognitive resources, working memory in particular (Head, Raz, Gunning-Dixon, Williamson, & Acker, 2002; Kennedy, Partridge, & Raz, 2008; Rodrigue, Kennedy, & Raz, 2005).
As noted above, in speech comprehension, cognitive factors in aging often go hand in hand with age-related hearing loss. Hearing loss also affects perceptual adaptation, as shown by Sommers (1997). In the present study, we are primarily interested in how listeners adapt to a naturalistic distortion of the speech signal: variations in the production of speech sounds resulting from speaking with a foreign or regional accent. Accented speech represents a variation that (elderly) listeners encounter in everyday life and that has not been studied before. In our modern-day society, due to increased mobility and increased multi-cultural influences over the last 50 years, elderly listeners are likely to encounter others (possibly including care-givers) speaking with a foreign or regional accent. This type of variation goes beyond variation in speaker or speech rate (note that modern time-compression algorithms (Moulines & Charpentier, 1990) do not significantly affect the long-term spectral characteristics of the original speech signal).
Accented speech (regional or foreign) differs from the standard language in a number of ways. The variation in foreign-accented speech is generally assumed to arise from the interaction between the segmental and suprasegmental characteristics of a speaker’s first (L1) and second (L2) language (Best, McRoberts, & Goodell, 2001; Flege, 1991). For instance, at the segmental level, variation can occur when L2-learners produce phonetic contrasts absent in their native language, such as the /l/-/r/ distinction and the /l/-/w/ distinction for Japanese learners of American English. At the suprasegmental level, it has been demonstrated that L2-learners have difficulties producing L2-appropriate word stress (Guion, Harada, & Clark, 2004) and intonation patterns (Trofimovich & Baker, 2006). Regional accents also exhibit phonological/phonetic variation at segmental (Adank, van Hout, & Van de Velde, 2007; Clopper, Pisoni, & de Jong, 2005) and suprasegmental levels (Nolan & Grabe, 1996). In sum, accented speech may be assumed to represent phonetic and phonological variation. Furthermore, variation in foreign and regional accented speech influences speech comprehension efficiency in native listeners (Adank & Devlin, in press; Floccia, Goslin, Girard, & Konopczynski, 2006; Munro & Derwing, 1995; Rogers, Dalby, & Nishi, 2004; Van Wijngaarden, 2001). For instance, listeners show longer response times and make more errors when comprehending speech in a regional accent they are not familiar with (Floccia et al., 2006), an effect that is aggravated in noisy listening conditions (Adank, Evans, Stuart-Smith, & Scott, 2009). However, listeners have also been shown to comprehend speech more efficiently after short-term exposure (5-15 sentences) to both foreign-accented (Clarke & Garrett, 2004) and regionally accented (Maye, Aslin, & Tanenhaus, 2008) speech.
In conclusion, we argue that accented speech represents a naturalistic type of distortion, and, in analogy with time-compressed speech, listeners may have initial difficulty understanding it, but can quickly adapt.
In the present study, we first investigated whether elderly adults’ listening performance is affected by accented speech to the same extent as younger adults’. Second, we determined whether elderly and younger listeners show a comparable rate and magnitude of perceptual learning of the accented speech. Third, we aimed to obtain more insight into the mechanisms underlying the perceptual learning process by relating elderly listeners’ comprehension of the accented speech, and the rate and magnitude with which they learned to comprehend it, to their individual hearing acuity and to measures of cognitive function.
Relative comprehension performance of young and elderly listeners was established through an adaptive staircase procedure involving sentence comprehension in noise. In this task, participants were to repeat key words from a sentence presented in noise. Listeners were presented with blocks of sentences in the novel accent, and after each block the signal-to-noise ratio (SNR) was established at which listeners could still correctly repeat 50% of the key words. A decrease in SNR was used to signify perceptual learning of the accent. (See the Methods section for a detailed description of the staircase procedure.)
Listeners heard sentences in Standard Dutch and in an accent of Dutch they were unfamiliar with. This novel accent was obtained by replacing the vowels in stressed lexical positions, thus creating a non-existent, novel accent of Dutch. It was decided to create a novel accent for two reasons. First, it avoids a confound between speaker and accent, ensuring that the listeners adapt to the accent and not (only) to the voice of the speaker. Second, using a novel accent ensures that listeners are all equally unfamiliar with the accented speech (Adank et al., 2009; Floccia et al., 2006).
Performance on the staircase procedure was related to a measure of hearing acuity (pure-tone audiometry) and to two cognitive measures for the group of elderly listeners. The cognitive measures were a measure of information processing speed (Digit Symbol substitution test, which is part of the Wechsler Adult Intelligence Test, 2004) and the Trail Making Test (TMT), a test of visual attention and task switching. The latter test is thought to represent a measure of cognitive flexibility. We investigated whether perceptual learning of a novel accent was associated with processing speed or cognitive flexibility, or both. Kennedy et al. (2009) found that fluid reasoning tasks, in which participants had to derive a rule to solve a problem, were correlated with perceptual learning. Importantly, in Kennedy et al. (2009) the fluid reasoning tasks and the skill to be learned (fragmented picture identification) were both in the visual domain. We tried to establish whether cognitive skills tested in a non-auditory modality could be predictive of listening performance such that general, rather than modality-specific, cognitive performance can be said to underlie performance on our listening task.
If comprehension of, or perceptual learning of, accented speech is related to reduced hearing acuity and reduced cognitive flexibility, then performance on the staircase procedure is expected to correlate with hearing loss and with Trail Making Test performance in the group of elderly listeners.
Two groups of participants took part in the experiment: a group of younger participants (20, 5 male, mean 23.3 years, standard deviation 5 years, median 22 years, range 18-41) and a group of older participants (30, 11 male, mean 74.1 years, standard deviation 6 years, median 74.0 years, range 65-87). All were native monolingual speakers of Dutch from the Netherlands, with no history of oral or written language impairment, or neurological or psychiatric disease. The younger group was not audiometrically screened, but all stated that they had no hearing problems. All participants in the younger group gave written informed consent and were paid 10 euros for their participation or received course credit. The elderly participants had contacted the researchers in response to an article in a local newspaper and received 10 euros for their participation. Their level of education was expressed on a scale from 1-5. The lowest level means that the participant had only finished primary school; the highest level means that the participant had an academic education. Mean education of the elderly was 3.6 (range 2-5, SD=1.3). The elderly participants included in this study showed varying degrees of sensorineural age-related hearing loss (see Procedure).
The stimuli used in the experiment were identical across both groups. The test stimuli set consisted of 240 sentences, 120 spoken in Standard Dutch and the same 120 sentences spoken in the novel accent. The sentences were taken from the speech reception threshold (SRT) corpus (Plomp & Mimpen, 1979a, 1979b), which has been widely used for assessing speech intelligibility (van Wijngaarden, Steeneken, & Houtgast, 2002). These sentences were recorded in both accents for a female speaker of (Standard) Dutch. She was instructed to read Dutch sentences with an adapted orthography to obtain the sentences in the novel accent. The orthography was systematically altered to elicit vowel pronunciations as listed in Table I. The novel accent was designed to merely sound different from Standard Dutch, and was not intended to mimic or replicate any existing accent of Dutch. Only vowels bearing primary or secondary lexical stress were included in the conversion of the orthography. The intended (broad) phonetic transcription using the International Phonetic Alphabet (IPA, 1999) is depicted below the Dutch examples. For example:
Standard Dutch: “De bal vloog over de schutting”
/də bɑʟ fʟoχ ofə də sχʏtɪŋ/
After conversion: “De baal flog offer de schuuttieng”
/də baʟ fʟɔχ ɔfə də sχytiŋ/
The recordings were made in a sound-attenuated booth while the sentences were presented in orthographic form on the screen of a desktop computer. The speaker was instructed to read the sentences as a declarative statement and with primary sentence stress on the first noun, so as to keep the intonation pattern relatively constant across all sentences. First, all sentences in Standard Dutch were recorded, followed by those in the novel accent. Every sentence in the novel accent was repeated until it was pronounced without errors and judged by the experimenter to sound roughly as fluent as the Standard Dutch sentences. The average duration per sentence was 2.62 sec for Standard Dutch and 2.82 sec for the novel accent. The recordings were saved to hard disk directly via an Imix DSP chip plugged into the USB port of an Apple Macbook. Praat (Boersma & Weenink, 2003) was used to save all sentences into separate sound files, with beginning and end trimmed at zero crossings, and re-sampled at 22050 Hz. Finally, every sentence was peak-normalized at 99% of its maximum amplitude and saved at 70 dB (SPL).
Insert Table I about here
Pure Tone Audiometry
Hearing acuity (air conduction thresholds for pure tones) was assessed with a portable Maico ST 20 audiometer in a silent booth. Figure 1 presents the mean pure-tone thresholds (in dB HL) for the better ear at octave frequencies from 250 Hz to 8000 Hz. The sloping audiogram pattern is typical for age-related hearing loss, which particularly affects the high frequency range. Individual hearing losses were determined as the elderly participants’ pure-tone average (PTA) hearing loss over the frequencies of 1, 2, and 4 kHz in their better ear. Only one participant had hearing aids, which he was asked not to wear during the experiment. The average PTA was 25.5 dB HL (standard deviation 9.8, median 25.0, range 10-43).
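The PTA computation described above can be illustrated with a minimal sketch. The audiogram values below are hypothetical and are not data from the study. The text does not state how the better ear was chosen, so the sketch assumes it is the ear with the lower average threshold over the three PTA frequencies; the function names are likewise illustrative.

```python
from statistics import mean

# Frequencies (Hz) that enter the pure-tone average, as stated in the text.
PTA_FREQUENCIES_HZ = (1000, 2000, 4000)


def pure_tone_average(thresholds_db_hl):
    """Mean threshold (dB HL) over 1, 2, and 4 kHz for one ear."""
    return mean(thresholds_db_hl[f] for f in PTA_FREQUENCIES_HZ)


def better_ear_pta(left, right):
    """PTA of the better ear.

    Assumption: the better ear is the one with the lower PTA.
    """
    return min(pure_tone_average(left), pure_tone_average(right))


# Hypothetical audiogram (dB HL per frequency in Hz) for one listener.
left_ear = {250: 15, 500: 15, 1000: 20, 2000: 25, 4000: 45, 8000: 60}
right_ear = {250: 10, 500: 15, 1000: 15, 2000: 25, 4000: 35, 8000: 55}

# Left ear: (20 + 25 + 45) / 3 = 30; right ear: (15 + 25 + 35) / 3 = 25.
pta = better_ear_pta(left_ear, right_ear)
print(pta)
```

Thresholds at 250, 500, and 8000 Hz are measured for the audiogram but do not enter the PTA.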
Insert Figure 1 about here
Digit Symbol Substitution Test
Scores on the Digit-Symbol Substitution test (which is part of the Wechsler Adult Intelligence Scale Test, 2004) exhibit strong correlations with measures involving processing speed (Hoyer, Stawski, Wasylyshyn, & Verhaeghen, 2004; Salthouse, 2000a). Elderly participants’ mean substitution time per symbol was 2.1 sec/symbol (SD=0.4, range 1.5-2.8). This should be corrected for motor speed (the time needed to copy a symbol), which was 1.0 sec/symbol (SD=0.2, range 0.7-1.4). The corrected coding time (substitution time minus copying time) was then 1.1 sec/symbol (SD=0.3, range 0.6-1.9). This latter score was entered as individual information processing speed.
Trail Making Test
The group of elderly participants also received the Trail Making Test (Reitan, 1958) as an index of executive control processes. The test is thought to represent a measure of cognitive flexibility (Corrigan & Hinkeldey, 1987; Gaudino, Geisler, & Squires, 1995; Reitan, 1958). In the written test, the participant is required to connect the dots of 25 consecutive targets on a sheet of paper. In version A of the test, the targets are all numbers (1-25). Processing speed may be a heavy contributor (Salthouse, 2000a) to performance on this task. In version B, the targets are 13 numbers and 12 letters, and the task therefore involves shifting attention between numbers and letters (1, A, 2, B, etc.), while at the same time keeping track of where one was in the other dimension. The test has to be finished as quickly as possible and thus provides information on visual search, scanning, speed of processing, mental flexibility, and executive functions. Performance on this task was shown to be a significant predictor of performance on a target recall task where the target speaker’s speech was mixed with meaningful speech of a distracter speaker (Tun, O'Kane, & Wingfield, 2002). Mean time to complete Trails A was 48.7 sec (SD=14.4). Mean time to complete Trails B was 95.3 sec (SD=27.3). Switching cost (the difference score between Trails A and Trails B) was therefore 46.6 sec (SD=24.1). However, a difference score derived by subtracting the Trails A time from the Trails B time is always greater when the participant is relatively slow to start with, such that general slowing alone will produce a greater difference between the conditions (see also Verhaeghen & De Meersman, 1998). In order to take general slowing into account, we took ratio scores of the two subparts (Trails B time/Trails A time), rather than the difference score, as a measure of individual executive function.
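The argument for ratio scores over difference scores can be made concrete with a small sketch. The two participants below are hypothetical (their times are not taken from the study), and the function name is illustrative. Both have the same relative switching cost, but the second is 1.5 times slower overall.

```python
def switching_scores(trails_a_sec, trails_b_sec):
    """Return (difference, ratio) switching-cost scores for one participant."""
    difference = trails_b_sec - trails_a_sec   # grows with overall slowness
    ratio = trails_b_sec / trails_a_sec        # unaffected by uniform slowing
    return difference, ratio


# Hypothetical completion times in seconds (Trails A, Trails B).
fast = switching_scores(40.0, 80.0)
slow = switching_scores(60.0, 120.0)   # same pattern, 1.5 times slower

print(fast)  # (40.0, 2.0)
print(slow)  # (60.0, 2.0)
```

General slowing alone raises the difference score from 40 to 60 sec, whereas the ratio score stays at 2.0 for both participants.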
Adaptive staircase procedure
Participants were to repeat key words from a sentence presented in noise. Listeners were presented with blocks of sentences in the novel accent, and after each block the signal-to-noise ratio was established at which listeners could still correctly repeat 50% of the key words. A decrease in SNR was used to signify perceptual learning of the accent. The staircase procedure (Baker & Rosen, 2001) was used to establish the speech reception threshold, or SRT (Kalikow, Stevens, & Elliott, 1977; Plomp & Mimpen, 1979a), across blocks of 15 sentences. The SRT is expressed as the signal-to-noise ratio (SNR) in decibels (dB) at which listeners can repeat 50% of the key words in a sentence. The SRT has been used as a clinical measure of speech intelligibility for normal-hearing listeners and (elderly) listeners with moderate hearing loss (Chien, Tu, Shiao, Chien, Wang et al., 2008; Dubno, Dirks, & Morgan, 1984; Gelfund, Ross, & Miller, 1988; van Wijngaarden et al., 2002) and represents a naturalistic measure of listeners’ comprehension. Another advantage is that the procedure is well-suited for dealing with individual differences in listeners’ baseline performance, which may be especially pronounced when groups are heterogeneous. Earlier studies comparing comprehension in younger and elderly listeners used calibration tasks prior to their main experiments to match performance levels of the different groups (Peelle & Wingfield, 2005). Pre-calibration is not necessary when using the SRT, as individual performance is kept constant at 50% correct by continually changing the noise level depending on the participant’s previous response, and individual differences are expressed through the resulting SNR. When one individual performs the task at a lower SNR than another individual, this means that they could repeat 50% of key words at a lower SNR (i.e., with more noise added to the speech signal).
A further advantage is that the task is easy to understand and does not require extensive training. Finally, a recent study evaluated task-related learning in a speech-in-noise discrimination task using the SRT. Its results showed that improvements due to task adaptation alone are small (< 1 dB for speech-shaped noise) when listeners are familiar with the accent and speaker (Cainer, James, & Rajan, 2008), and performance is thus stable throughout the procedure. Given this stability, Cainer et al. suggest that the SRT can be used to monitor perceptual learning over time. In the present experiment, the adaptive noise task was repeated four times, presenting listeners with four blocks of 15 accented sentences. After each block of 15 sentences, the SRT was calculated. A decrease in SRT reflects perceptual learning. Gilbert & Rogers (1996) showed that pre-practice in a perceptual learning task was beneficial, especially for older adults. The four blocks of accented speech were therefore preceded by four blocks of speech in Standard Dutch, to exclude any task learning effects during perceptual learning.
Insert Figure 2 about here
The procedure for both Test phases (cf. Figure 2) was identical across both groups. The SRT (representing the SNR at which 50% of key words are correctly repeated) was determined using a staircase procedure consisting of a modified Levitt procedure (Baker & Rosen, 2001). The procedure started with a relatively easy stimulus at an SNR of +10 dB. If this sentence was repeated correctly (i.e., >3 keywords were correctly repeated), the SNR was decreased by 8 dB to +2 dB SNR. If the participant repeated 2 keywords, the SNR stayed the same. This process was repeated until the first incorrect response (i.e., <2 keywords were correctly repeated). After the first incorrect response, the SNR was increased in steps of 5 dB until the next correct response. After the next correct response, the SNR was decreased in steps of 2 dB until the next incorrect response. From this point onwards, each reversal (a correct response after an incorrect response, or an incorrect response after a correct response, or an incorrect or correct response following a response with two correct keywords) resulted in an upward change of 2 dB following an incorrect response, or a downward change of 2 dB following a correct response. Each block ended after presentation of 15 sentences. The SRT per block was expressed as the mean signal-to-noise ratio across all trials for which a reversal occurred.
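The staircase rules above can be sketched in code. This is an illustrative reading of the description, not the software used in the experiment, and it rests on several assumptions. Responses are passed in already classified as "correct", "incorrect", or "same" (two key words repeated), because the text does not say how a response with exactly three key words was scored. A two-key-word response is never itself counted as a reversal. Reversals from the early large-step phases are included in the SRT. The function name `run_block` and the simulated listener are hypothetical.

```python
def run_block(respond, start_snr=10.0, n_trials=15):
    """Run one staircase block and return (srt, snr_track).

    respond(snr) must return "correct", "incorrect", or "same"
    ("same" = two key words repeated, so the SNR is left unchanged).
    """
    snr = start_snr
    phase = 0          # 0: 8 dB steps down, 1: 5 dB steps up, 2: 2 dB steps
    previous = None    # classified outcome of the previous trial
    track, reversal_snrs = [], []

    for _ in range(n_trials):
        outcome = respond(snr)
        track.append(snr)

        # Reversal: a correct/incorrect response that differs from the
        # previous outcome (including a preceding two-key-word response).
        if outcome != "same" and previous is not None and outcome != previous:
            reversal_snrs.append(snr)

        if phase == 0:                      # until the first incorrect response
            if outcome == "correct":
                snr -= 8.0
            elif outcome == "incorrect":
                phase = 1
                snr += 5.0
        elif phase == 1:                    # until the next correct response
            if outcome == "incorrect":
                snr += 5.0
            elif outcome == "correct":
                phase = 2
                snr -= 2.0
        else:                               # remaining trials: 2 dB steps
            if outcome == "correct":
                snr -= 2.0
            elif outcome == "incorrect":
                snr += 2.0

        previous = outcome

    # SRT: mean SNR over the trials on which a reversal occurred.
    srt = sum(reversal_snrs) / len(reversal_snrs) if reversal_snrs else None
    return srt, track


def ideal_listener(snr):
    """Hypothetical deterministic listener: correct at -3 dB SNR or better."""
    return "correct" if snr >= -3.0 else "incorrect"


srt, track = run_block(ideal_listener)
print(track)
print(srt)
```

With this idealised listener the track descends from +10 dB in 8 dB steps, overshoots, and then oscillates between -3 and -5 dB, so the mean over reversal trials settles close to the listener's threshold. Real responses are probabilistic, which is why the estimate is averaged over reversals rather than read from a single trial.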
The auditory staircase procedure was repeated eight times: four blocks of Standard Dutch and four blocks in the novel accent. For all eight blocks, participants were instructed to repeat the entire sentence in Standard Dutch, or as many words as they had heard. An experimenter immediately scored their responses for the number of correctly repeated key words. For the blocks in the novel accent, participants were instructed not to imitate the accent. They received no explicit feedback. The stimulus presentation rate was controlled by the experimenter and each sentence was presented only once. Sentences were presented in a semi-randomised order with each sentence presented only once (either in standard Dutch or in the novel accent) per participant. This sentence order was counterbalanced across the first four and the last four blocks and across participants so that every sentence occurred equally often in Standard Dutch and in the novel accent. Participants were tested individually in a sound-treated booth. The sentences were presented over headphones (Sennheiser HD477) at a comfortable sound level. The sound level was set once at a comfortable level for the younger group and once for the elderly group and this initial setting was not changed within groups. The duration of the experiment was approximately 30 minutes.
Figure 3 shows the average SRTs in dB for the two accents and the two listener groups. First, the data of the young and elderly participants were compared to investigate whether elderly adults’ listening performance is affected differently by accented speech than younger adults’, and to compare the course of their perceptual learning.
Insert Figure 3 about here
For the comparison between the young and elderly listeners, we did not enter individual background information. We only investigated the effects of the following factors on SRT in a repeated measures ANOVA: the between-subjects factor Age Group (young vs. elderly), and the within-subjects factors Accent, having two levels (standard Dutch and novel accent), and Block (with four levels). Age Group had a significant effect on SRT (F(1,48)=31.3, p<0.001): as can be seen from Figure 3, the older listeners generally needed more favourable signal-to-noise ratios for 50% accuracy sentence recognition than the young listeners. The factor Accent also significantly affected SRTs (F(1,48)=973.7, p<0.001): listeners could tolerate less noise when they had to identify sentences spoken in the novel accent than when they were listening to standard Dutch. There was an overall effect of Block (F(3,46)=16.8, p<0.001), and there was a significant interaction between Block and Accent, indicating that performance improved more over blocks in the novel accent condition (F(3,46)=7.2, p<0.001). The overall Block effect suggests that there was some improvement in the standard Dutch condition (which may have been due to adaptation to the task or speaker), and that there was additional learning of the novel accent. The interaction between Age Group and Accent was significant as well, suggesting that the novel accent was more detrimental to speech understanding for the elderly than for the young listeners (F(1,48)=29.8, p<0.001). The Age Group by Block interaction was significant (F(3,46)=3.1, p<0.05), suggesting that improvement over the blocks differed for the two age groups. More importantly, there was also a three-way interaction between Age Group, Accent and Block (F(3,46)=3.7, p<0.05). The latter interaction indicates that the pattern of improvement over blocks in the novel accent condition was different for the two age groups.
This is also clear from Figure 3: whereas the elderly hardly show further learning beyond the second block, the young adults do show further improvement with longer exposure. The curve of the elderly seems to be U-shaped: there is considerable learning from the first to the second block, and then performance seems to deteriorate again, possibly due to fatigue. We will come back to this in the Discussion.
In a second analysis, we only analysed the data of the elderly participants to investigate which background measures predicted performance and perceptual learning of the novel accent. Regression analyses were performed to determine the predictive value of the background measures on the SRTs in both the standard Dutch and the novel-accent condition.
Apart from the design factor Block, we entered individual hearing acuity, the digit-symbol substitution time measure of processing speed, the Trail Making Test performance measure of executive function, education level, gender, and age, as background predictors of performance in each of the accent conditions. Our main question was whether any of these background measures would specifically predict how well one could understand the novel accent, or how much one would improve over Blocks. One should note that some of these background measures were correlated: age was significantly correlated with hearing loss (Pearson’s r=0.42, p<0.05), and age was also correlated with processing time (Pearson’s r=0.46, p<0.05). However, the two cognitive measures were not correlated with hearing loss. Nor were the two cognitive measures (digit-symbol substitution time and Trail performance, expressed as the ratio between Trails A and Trails B) correlated with each other (r<0.1). The following background measures did not predict performance, nor did they interact with Block: Age, Gender, Educational level, and the measure of information processing speed. Table II gives an overview of three models for SRT performance in both accent conditions: the upper half of the table concerns SRT performance in the standard-Dutch condition; the lower half concerns SRT performance in the novel-accent condition. In both accent conditions, model 0 only has the factor Block as a predictor of performance; model 1 has Block and Hearing loss; and model 2 has Block, Hearing loss, and Trail performance as predictors of performance.
Insert Table II about here
In both accent conditions, Block predicted performance, suggesting a general improvement in performance over blocks (β=-0.41 in the standard Dutch condition and β=-0.69 in the novel-accent condition, p<0.01 in both conditions). With the addition of hearing loss as a predictor (model 1), an additional 15% (in the standard Dutch condition) of the variance in SRT performance or 20% (novel-accent condition) was accounted for: the more hearing loss one had, the higher the SRT. With the addition of Trail test performance as a predictor (model 2), no additional variance was accounted for in the standard-Dutch condition. However, Trail test performance did explain a significant additional 4% of the SRT variance in the novel-accent condition. The latter relation indicated that increased relative difficulty in the executive function task predicted increased difficulty understanding the novel accent. In the novel-accent condition, a fourth model (model 3) also evaluates individual SRT (averaged over the four blocks) in the standard-Dutch condition as a predictor for novel-accent performance. When one's SRT in the standard-Dutch condition is taken into account, Hearing Loss still accounts for some additional variance (β=0.06, p=0.05), even though its predictive power is obviously reduced. Trail test performance (β=0.86, p<0.05) is not much affected, in terms of predictive power for novel-accent performance, relative to the previous model (model 2).
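The logic of comparing nested models (how much additional variance each added predictor accounts for) can be illustrated with a self-contained sketch. All data below are synthetic and generated inside the code; they are not the study's data and will not reproduce the values in Table II. The sketch uses ordinary least squares and ignores the repeated-measures structure, which is a simplifying assumption: the text does not specify the exact regression procedure. The function names `solve` and `r_squared` are illustrative.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of an ordinary least-squares fit of y on the columns plus intercept."""
    rows = [[1.0] + list(values) for values in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data: 4 hypothetical listeners x 4 blocks (made up for illustration).
hearing_by_listener = [15.0, 25.0, 40.0, 30.0]   # PTA in dB HL
trail_by_listener = [1.6, 2.4, 1.9, 2.2]         # Trails B time / Trails A time
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.4, -0.3, -0.1,
         -0.2, 0.1, 0.3, -0.3, 0.4, -0.1, -0.2, 0.2]

block, hearing, trail, srt = [], [], [], []
for i, (h, t) in enumerate(zip(hearing_by_listener, trail_by_listener)):
    for b in range(1, 5):
        block.append(float(b))
        hearing.append(h)
        trail.append(t)
        srt.append(6.0 - 0.7 * b + 0.1 * h + 1.0 * t + noise[4 * i + b - 1])

r2_model0 = r_squared([block], srt)                  # Block only
r2_model1 = r_squared([block, hearing], srt)         # + hearing loss
r2_model2 = r_squared([block, hearing, trail], srt)  # + Trail ratio

print(f"model 0: R^2 = {r2_model0:.3f}")
print(f"model 1: R^2 = {r2_model1:.3f} (added {r2_model1 - r2_model0:.3f})")
print(f"model 2: R^2 = {r2_model2:.3f} (added {r2_model2 - r2_model1:.3f})")
```

The increments in R^2 between successive models correspond to the "additional variance accounted for" reported in the text. Whether an increment is statistically significant requires a separate test, which this sketch does not perform.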
Note that in these regression analyses, there was also considerable improvement over blocks in the standard-Dutch condition. Figure 3 shows that this was due mainly to the elderly listeners' relatively poor performance in the very first block: their performance did not improve beyond block 1 (a separate regression model on the standard-Dutch condition data from which the first block had been excluded showed no effect of Block on SRT performance).
Importantly, we did not find that any of the background measures interacted with Block (or, more specifically, with improvement over blocks in the novel accent condition). We hypothesised that individual background information would not only predict performance, but might also predict improvement over blocks. Note that learning for the elderly participants is concentrated in the first two novel-accent blocks. We therefore constructed another subset model on the data of the first two novel-accent blocks. By zooming in on the blocks where learning occurs, we might find out which (if any) of the background measures are most important for perceptual learning.
The results of this subset model were as follows. As before, Block significantly affected performance (β=-2.63, t=-3.61, p<0.001). The same background measures as in the previous analysis showed up in this subset model. Hearing acuity had an overall effect on performance (β=0.16, t=4.22, p<0.001). Thus, the more hearing loss one had, the greater the impact of the novel accent on SRT. And, as before, Trail performance was associated with performance (β=1.72, t=2.73, p<0.01), such that the more difficulty one had in switching between task demands on the Trail making test, the greater the impact of the novel accent on SRT. When zooming in on these two initial novel-accent blocks, the predictive power of Trail test performance gained somewhat in importance. The model including Trail test performance explained 40% of the variance, whereas the same model without Trail test performance as a predictor explained 32% of the variance (i.e., Trail test performance now explained an additional 8% of the variance, compared to the 4% in the analysis over the four novel-accent blocks). Alternatively, if SRT in the standard-Dutch condition (SRT_SD) was entered into the model as well (as in model 3 in Table II), SRT_SD significantly predicted SRT in the novel-accent condition (β=1.69, t=4.82, p<0.001). Hearing loss was then no longer significantly associated with performance, but Trail test performance was (β=1.49, t=2.79, p<0.01). However, once again, none of the background measures interacted with Block, which implies that even when we zoom in on the blocks where most of the learning occurs, we do not find direct correlates of perceptual learning.
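The interaction tests referred to above can be made explicit by writing the model out. The regression equation is not given in the text, so the following is only one plausible form, assuming a linear model with one product term per background measure (shown here for hearing loss, HL, for listener i in block b); the symbols are illustrative.

```latex
\mathrm{SRT}_{ib} = \beta_0 + \beta_1\,\mathrm{Block}_b + \beta_2\,\mathrm{HL}_i + \beta_3\,\mathrm{Trail}_i
                  + \beta_4\,(\mathrm{Block}_b \times \mathrm{HL}_i) + \varepsilon_{ib}
```

Under this reading, β2 and β3 describe how a listener's overall SRT level relates to the background measures, whereas β4 would describe whether the change in SRT from one block to the next depends on hearing loss. The reported pattern corresponds to reliable main-effect terms without a reliable product term, for hearing loss as well as for the Trail measure.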
The present study aimed first to compare comprehension of accented speech by younger and elderly listeners, second to compare adaptation to accented speech in younger and elderly listeners, and third to relate elderly listeners’ comprehension of the accented speech and the course of their adaptation to their individual hearing acuity and cognitive ability.
The results showed three important points. First, the elderly listeners had considerably more difficulty understanding the sentences spoken in the novel accent than the younger listeners. Elderly listeners’ speech-in-noise performance was generally worse than that of the young adults, but over and above this general difference, the elderly listeners were more affected by the novel accent. This can be seen in Figure 3 from the distance between the two clustered bars (representing performance of the two age groups at each of the consecutive blocks): whereas the age group difference was 1-2 dB in the standard-Dutch blocks, it was 2-6 dB in the novel-accent blocks.
Second, the elderly listeners showed a different pattern of learning than the younger listeners. The elderly started off with considerable improvement: from the first to the second novel-accent block they even improved more in absolute terms (5.1 dB) than the young adults did over the four consecutive accent blocks (3.8 dB from the first to the fourth block). Thus, the pattern of learning over blocks may differ between the two age groups, but the adaptation rate was not slower and the magnitude of adaptation was not smaller for the elderly participants. These results are therefore in line with Peelle and Wingfield (2005), who found that the rate and magnitude of initial learning of time-compressed speech and of vocoded and spectrally shifted speech were similar for young and elderly listeners. This quick learning supports the idea that the ability to adapt to various new aspects of speech remains stable throughout the life span.
A further similarity to Peelle and Wingfield’s (2005) results is that older adults’ performance reached asymptote relatively early (in their study, between 10 and 20 sentences), whereas the young adults still showed improvement on later trials. In our material, older adults did not improve beyond the second block (that is, beyond 30 sentences, as each block contained 15 sentences), whereas performance of the young adults showed a steady improvement until the last block. It is not clear whether the relatively early asymptote for the elderly in our study could have been due to fatigue. The adaptive-noise procedure makes listening effortful, and elderly listeners may have become tired after six blocks of effortful listening. However, the early asymptote pattern was also found by Peelle and Wingfield (2005), who used a more limited number of sentences. Evidence on perceptual adaptation in speech comprehension therefore seems to converge on the conclusion that it is not so much the initial adaptation process that differs between age groups, but the general impact that the speech manipulation has.
The results showed that neither PTA nor either of the two measures of cognitive function predicted the adaptation pattern in the elderly. Our initial aim was to obtain insights into the mechanisms underlying perceptual adaptation in speech comprehension by relating adaptation to the auditory and cognitive background measures. It was expected that the rate and magnitude of adaptation would differ for the two age groups, as was found for skill learning in a number of visual modality studies (Head et al., 2002; Kennedy et al., 2009; Rodrigue et al., 2005). As young and elderly listeners showed similar rates and magnitudes of adaptation, it may not be surprising that we could not find associations between adaptation per se and the background measures. Further research may be required to elucidate how perceptual learning of novel speech conditions can be relatively unaffected by age, despite the challenges from age-related declines that older adults obviously face.
However, our results showed associations between auditory and cognitive background measures and the performance of elderly listeners on the novel-accent sentence blocks. Both hearing acuity and the measure of executive function predicted an individual’s relative difficulty in understanding the sentences in the novel accent. These findings are important with respect to ‘lifelong learning’ as they elucidate how auditory and cognitive age-related factors interfere with processing speech in novel conditions. Hearing impairment evidently interferes with identifying the speech sounds and thus with processing the peculiarities of the novel accent and with forming novel representations. Executive function is a relatively new associate of novel task performance, as earlier ‘individual difference’ studies on perceptual learning in the visual domain mainly found correlations between perceptual learning and measures of memory or fluid reasoning (Kennedy et al., 2009). Moreover, the latter correlations were found within the same modality, as both the predictor measures and the to-be-learned skill were tested in the visual domain. Note that the cognitive measures in the present study were obtained through paper-and-pencil tasks. If age-related sensory decline affected performance on these tasks, it must have been in the visual, and not the auditory, domain. The present results therefore show that both auditory and non-auditory factors can predict listening performance.
A recent aging study showed that decline in executive function (as measured by Trail making test performance) preceded decline in memory (as measured by immediate and delayed verbal recall) by about 3 years (Carlson, Xue, Zhou, & Fried, 2009). These results make Trail performance an early measure of the cognitive flexibility associated with the task of understanding a novel accent.
It has been argued (Birdsong, 2006) that there is a relationship between age-related morphological neurological changes and the decline in the efficacy of second language (L2) learning in older adults. For instance, a correlation has been found between age-related decreases in dopaminergic (DA) functioning and cognitive processes that mediate L2 learning and L2 proficiency, such as working memory, attention and processing speed (Volkow, Wang, Fowler, Ding, Gur et al., 1998). Furthermore, a relationship was found between age-related declines in cognitive functioning and anatomical neural changes: measures of working memory, attention, and speed of processing correlate with volumetric declines in the frontal lobe and prefrontal cortex (Raz et al., 2000). Following Birdsong's argument, it is possible that these age-related functional (DA functioning) and morphological changes, and the associated declines in cognitive performance, underlie the poorer comprehension of the novel accent in our elderly listener group.
In sum, investigating perceptual learning of novel listening conditions in young and older populations offers the opportunity to study how perceptual learning is shaped by ‘ear’ and ‘brain’. Our results add to a growing body of studies addressing how aging affects speech perception (Golomb et al., 2007; Peelle & Wingfield, 2005). Our study further confirms results from earlier studies showing that elderly listeners can adapt effectively to new speech types. The novelty elderly listeners had to adapt to in the present study, accented speech, is a naturalistic type of variation they may (frequently) encounter in everyday life. Finally, the present results provide further evidence that both declining hearing acuity and poorer performance on cognitive function tasks result in poorer language comprehension in challenging or novel listening conditions.
We wish to thank Erik van den Boogert for technical assistance and Esther Aarts for lending her voice. Inge van de Sande is acknowledged for her assistance in testing the elderly participants. We thank Peter Hagoort for his useful suggestions on earlier versions of this paper. This research was supported by the Netherlands Organization for Research (NWO) under project numbers 275-75-003 (Patti Adank) and 275-75-004 (Esther Janse).
Adank, P., & Devlin, J. T. (in press). On-line plasticity in spoken sentence comprehension: Adapting to time-compressed speech. NeuroImage.
Adank, P., Evans, B. G., Stuart-Smith, J., & Scott, S. K. (2009). Familiarity with a regional accent facilitates comprehension of that accent in noise. Journal of Experimental Psychology: Human Perception and Performance, 35(2), 520-529.
Adank, P., van Hout, R., & Van de Velde, H. (2007). An acoustic description of the vowels of Northern and Southern Standard Dutch II: Regional varieties. Journal of the Acoustical Society of America, 121, 1130-1141.
Alain, C., & Snyder, J. S. (2008). Age-related differences in auditory evoked responses during rapid perceptual learning. Clinical Neurophysiology, 119(2), 356-366.
Baker, R. J., & Rosen, S. (2001). Evaluation of maximum-likelihood threshold estimation with tone-in-noise masking. British Journal of Audiology, 35, 43-52.
Best, C. T., McRoberts, G. W., & Goodell, E. (2001). Discrimination of non-native consonant contrasts varying in perceptual assimilation to the listener’s native phonological system. Journal of the Acoustical Society of America, 109(2), 775-794.
Birdsong, D. (2006). Age and second language acquisition and processing: A selective overview. Language Learning, 56(S1), 9-49.
Boersma, P., & Weenink, D. (2003). Praat: doing phonetics by computer. Downloaded August 11, 2008, from http://www.fon.hum.uva.nl/praat.
Cainer, K. E., James, C., & Rajan, R. (2008). Learning speech-in-noise discrimination in adult humans. Hearing Research, 238, 155-164.
Carlson, M. C., Xue, Q. L., Zhou, J., & Fried, L. P. (2009). Executive decline and dysfunction precedes declines in memory: The women’s health and aging study II. Journal of Gerontology A: Biological sciences and medical sciences, 64(1), 110-117.
Chien, C. H., Tu, T. Y., Shiao, A. S., Chien, S. F., Wang, Y. F., et al. (2008). Prediction of the Pure-Tone Average from the Speech Reception and Auditory Brainstem Response Thresholds in a geriatric population. Journal for oto-rhino-laryngology and its related specialities, 70(6), 366-372.
Clarke, C. M., & Garrett, M. F. (2004). Rapid adaptation to foreign-accented English. The Journal of the Acoustical Society of America, 116(6), 3647-3658.
Clopper, C. G., Pisoni, D. B., & de Jong, K. (2005). Acoustic characteristics of the vowel systems of six regional varieties of American English. Journal of the Acoustical Society of America, 118(3), 1661-1676.
Corrigan, J. D., & Hinkeldey, M. S. (1987). Relationships between parts A and B of the Trail Making Test. Journal of Clinical Psychology, 43(4), 402-409.
Dubno, J. R., Dirks, D. D., & Morgan, D. E. (1984). Effects of age and mild hearing loss on speech recognition in noise. Journal of the Acoustical Society of America, 76(1), 87-96.
Dupoux, E., & Green, K. (1997). Perceptual adjustment to highly compressed speech: Effects of talker and rate changes. Journal of Experimental Psychology: Human Perception and Performance, 23(3), 914-927.
Fernandez-Ruiz, J., Hall, C., Vergara, P., & Diaz, P. (2000). Prism adaptation in normal aging: slower adaptation rate and larger aftereffect. Cognitive Brain Research, 9(3), 223-226.
Flege, J. E. (1991). Perception and production: The relevance of phonetic input to L2 phonological learning. In C. Ferguson & T. Huebner (Eds.), Crosscurrents in second language acquisition and linguistic theories. Philadelphia, PA: John Benjamins.
Floccia, C., Goslin, J., Girard, F., & Konopczynski, G. (2006). Does a regional accent perturb speech processing? Journal of Experimental Psychology: Human Perception and Performance, 32, 1276-1293.
Gaudino, E. A., Geisler, M. W., & Squires, N. K. (1995). Construct validity in the Trail Making Test: What makes Part B harder? Journal of Clinical and Experimental Neuropsychology, 17(4), 529-535.
Gelfand, S. A., Ross, L., & Miller, S. (1988). Sentence reception in noise from one versus two sources: Effects of aging and hearing loss. Journal of the Acoustical Society of America, 83, 248-256.
Gilbert, D. K., & Rogers, W. A. (1996). Age-related differences in perceptual learning. Human Factors, 38(3), 417-424.
Golomb, J., Peelle, J. E., & Wingfield, A. (2007). Effects of stimulus variability and adult aging on adaptation to time-compressed speech. The Journal of the Acoustical Society of America, 121(3), 1701-1708.
Gordon-Salant, S., & Fitzgibbons, P. J. (1993). Temporal factors and speech recognition performance in young and elderly listeners. Journal of Speech and Hearing Research, 36(6), 1276-1285.
Gordon-Salant, S., & Fitzgibbons, P. J. (2001). Sources of age-related recognition difficulty for time-compressed speech. Journal of Speech, Hearing and Language Research, 44(4), 709-719.
Greenspan, S. L., Nusbaum, H. C., & Pisoni, D. B. (1988). Perceptual learning of synthetic speech produced by rule. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14(3), 421-433.
Guion, S. G., Harada, T., & Clark, J. J. (2004). Early and late Spanish-English bilinguals' acquisition of English word stress patterns. Bilingualism: Language and Cognition, 7, 207-226.
Head, D., Raz, N., Gunning-Dixon, F., Williamson, A., & Acker, J. D. (2002). Age-related differences in the course of cognitive skill acquisition: The role of regional cortical shrinkage and cognitive resources. Psychology and Aging, 17, 72-84.
Hoyer, W. J., Stawski, R. S., Wasylyshyn, C., & Verhaeghen, P. (2004). Adult age and digit symbol substitution performance: A meta-analysis. Psychology and Aging, 19, 211-214.
IPA. (1999). Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet. Cambridge: Cambridge University Press.
Janse, E. (2009). Processing of fast speech by elderly listeners. Journal of the Acoustical Society of America, 125(4), 2361-2373.
Kalikow, D. N., Stevens, K. N., & Elliott, L. L. (1977). Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability. Journal of the Acoustical Society of America, 61(5), 1337-1351.
Kennedy, K. M., Partridge, T., & Raz, N. (2008). Age-related differences in acquisition of perceptual-motor skills: Working memory as a mediator. Aging, Neuropsychology and Cognition, 15, 165-183.
Kennedy, K. M., Rodrigue, K. M., Head, D., Gunning-Dixon, F., & Raz, N. (2009). Neuroanatomical and Cognitive Mediators of Age-Related Differences in Perceptual Priming and Learning. Neuropsychology, 23(4), 476-491.
Maye, J., Aslin, R. N., & Tanenhaus, M. (2008). The weckud wetch of the wast: lexical adaptation to a novel accent. Cognitive Science, 32, 543-562.
Moulines, E., & Charpentier, F. (1990). Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication, 9(5-6), 453-467.
Munro, M. J., & Derwing, T. M. (1995). Foreign accent, comprehensibility, and intelligibility in the speech of second language learners. Language Learning, 45, 73-97.
Nolan, F., & Grabe, E. (1996). Preparing a voice line-up. Forensic Linguistics, 3(1), 74-94.
Pallier, C., Sebastián-Gallés, N., Dupoux, E., Christophe, A., & Mehler, J. (1998). Perceptual adjustment to time-compressed speech: A cross-linguistic study. Memory and Cognition, 26, 844-851.
Peelle, J. E., & Wingfield, A. (2005). Dissociations in perceptual learning revealed by adult age differences in adaptation to time-compressed speech. Journal of Experimental Psychology: Human Perception and Performance, 31(6), 1315-1330.
Plomp, R., & Mimpen, A. M. (1979a). Improving the reliability of testing the speech reception threshold for sentences. Audiology, 18, 42-53.
Plomp, R., & Mimpen, A. M. (1979b). Speech reception threshold for sentences as a function of age and noise level. Journal of the Acoustical Society of America, 66(5), 1333-1342.
Raz, N., Williamson, A., Gunning-Dixon, F., Head, D., & Acker, J. D. (2000). Neuroanatomical and cognitive correlates of adult age differences in acquisition of a perceptual-motor skill. Microscopy Research and Technique, 51, 85-93.
Reitan, R. M. (1958). Validity of the Trail Making test as an indicator of organic brain damage. Perceptual and Motor Skills, 8, 271-276.
Rodrigue, K. M., Kennedy, K. M., & Raz, N. (2005). Aging and longitudinal change in perceptual-motor skill acquisition in healthy adults. Journals of Gerontology: Psychological Sciences, 60, 174-181.
Rogers, C. L., Dalby, J., & Nishi, K. (2004). Effects of noise and proficiency level on intelligibility of Chinese-accented English. Language and Speech, 47, 139-154.
Rosen, S., Faulkner, A., & Wilkinson, L. (1999). Adaptation by normal listeners to upward spectral shifts of speech: Implications for cochlear implants. Journal of the Acoustical Society of America, 106(6), 3629-3636.
Salthouse, T. A. (2000a). Aging and measures of processing speed. Biological Psychology, 54(1-3), 35-54.
Salthouse, T. A. (2000b). Steps toward the explanation of adult age differences in cognition. In T. Perfect & E. Maylor (Eds.), Theoretical Debate in Cognitive Aging. London: Oxford University Press.
Sebastián-Gallés, N., Dupoux, E., Costa, A., & Mehler, J. (2000). Adaptation to time-compressed speech: Phonological determinants. Perception & Psychophysics, 62, 834-842.
Shannon, R. V., Zeng, F. G., Kamath, V., Wygonski, J., & Ekelid, M. (1995). Speech recognition with primarily temporal cues. Science, 270, 303-304.
Sommers, M. S. (1997). Stimulus variability and spoken word recognition. II. The effects of age and hearing impairment. Journal of the Acoustical Society of America, 101(4), 2278-2288.
Trofimovich, P., & Baker, W. (2006). Learning second language suprasegmentals: Effect of L2 experience on prosody and fluency characteristics of L2 speech. Studies in Second Language Acquisition, 28, 1-30.
Tun, P. A., O'Kane, G., & Wingfield, A. (2002). Distraction by competing speech in young and older adult listeners. Psychology and Aging, 17(3), 453-476.
Van Wijngaarden, S. J. (2001). Intelligibility of native and non-native Dutch speech. Speech Communication, 35, 103-113.
van Wijngaarden, S. J., Steeneken, H. J., & Houtgast, T. (2002). Quantifying the intelligibility of speech in noise for non-native talkers. Journal of the Acoustical Society of America, 112(6), 3004-3013.
Verhaeghen, P., & De Meersman, L. (1998). Aging and the Stroop effect: A meta-analysis. Psychology and Aging, 13, 120-126.
Volkow, N. D., Wang, G.-J., Fowler, J. S., Ding, Y.-S., Gur, R., et al. (1998). Parallel loss of pre and postsynaptic dopamine markers in normal aging. Annals of Neurology, 44, 143-147.
Wingfield, A., Peelle, J. E., & Grossman, M. (2003). Speech rate and syntactic complexity as multiplicative factors in speech comprehension by young and older adults. Aging, Neuropsychology and Cognition, 10(4), 310-322.
Wingfield, A., Tun, P. A., Koh, C. K., & Rosen, M. J. (1999). Regaining lost time: adult aging and the effect of time restoration on recall of time-compressed speech. Psychology and Aging, 14(3), 380-389.
Table I. Intended vowel conversions for obtaining the novel accent. The left column shows the altered orthography in the Standard Dutch sentences, and the right column shows the intended change in pronunciation of the vowel in broad phonetic transcription, using the International Phonetic Alphabet (IPA, 1999).
Table II. Results of the three regression analyses on SRT performance in the Standard Dutch (SD) and novel accent (NA) conditions. For the models with individual predictors (models 1 and 2), the additional variance explained is indicated, relative to the previous, simpler, model. Significance: ***p<0.001; **p<0.01; *p<0.05