ORIGINAL RESEARCH article

The role of literal features during processing of novel verbal metaphors.

Camilo R. Ronderos

  • 1 Institut für Deutsche Sprache und Linguistik, Humboldt-Universität zu Berlin, Berlin, Germany
  • 2 Center for Advanced Research in Education, Institute of Education, Universidad de Chile, Santiago, Chile
  • 3 Einstein Center for Neurosciences Berlin, Berlin, Germany
  • 4 Berlin School of Mind and Brain, Berlin, Germany

When a word is used metaphorically (for example “walrus” in the sentence “The president is a walrus”), some features of that word's meaning (“very fat,” “slow-moving”) are carried across to the metaphoric interpretation while other features (“has large tusks,” “lives near the north pole”) are not. What happens to these features that relate only to the literal meaning during processing of novel metaphors? In four experiments, the present study examined the role of the feature of physical containment during processing of verbs of physical containment. That feature is used metaphorically to signify difficulty, such as “fenced in” in the sentence “the journalist's opinion was fenced in after the change in regime.” Results of a lexical decision task showed that video clips displaying a ball being trapped by a box facilitated comprehension of verbs of physical containment when the words were presented in isolation. However, when the verbs were embedded in sentences that rendered their interpretation metaphorical in a novel way, no such facilitation was found, as evidenced by two eye-tracking reading studies. We interpret this as suggesting that features that are critical for understanding the encoded meaning of verbs but are not part of the novel metaphoric interpretation are ignored during the construction of metaphorical meaning. Results and limitations of the paradigm are discussed in relation to previous findings in the literature both on metaphor comprehension and on the interaction between language comprehension and the visual world.

1. Introduction

In conversation, speakers usually use words in a way that is close to the word's conventional meaning. When this is the case, listeners are assumed to retrieve this word from their mental lexicon in order to grasp the meaning intended by the speaker. But what happens in a listener's mind when words are used in a previously unheard sense that requires a rapid integration of context in order to be understood? Such is the case of novel metaphors:

1. It was difficult for the journalist to see his opinion fenced-in after the change in regime.

In (1), a verb of physical confinement ( fenced-in ) is used to predicate over an abstract noun which does not have a physical dimension ( the journalist's opinion ), yet the intended meaning can be readily derived: The journalist is no longer allowed to speak freely. In this example, the feature of “physical confinement” is not part of the metaphor and is even incompatible with the speaker's intended meaning.

The role ascribed to features that relate only to the literal meaning of a word and are incompatible with that word's novel metaphorical meaning (henceforth “literal features”) during processing varies depending on the theoretical perspective (see Holyoak and Stamenković, 2018 for a systematic review of competing views). Some accounts see metaphor comprehension as a type of category inclusion: They claim that understanding a metaphor such as The president is a walrus involves a contextual adjustment of the meaning of the metaphoric vehicle ( walrus ) on the basis of the dimensions provided by the metaphoric topic ( The president ). Language comprehenders thus create a new, occasion-specific category ( McGlone and Manfredi, 2001 ; Glucksberg, 2003 ; Rubio Fernandez, 2007 ; Sperber and Wilson, 2008 ), an idea inspired by Barsalou's work on ad hoc categories ( Barsalou, 1983 ). This type of meaning modulation unfolds via rapid suppression of incompatible literal features (e.g., “has large tusks,” “lives near the north pole”) and enhanced activation of only those features that are compatible with the dimensions provided by the metaphoric topic and are relevant for interpretation (e.g., “very fat,” “slow-moving”) ( Gernsbacher et al., 2001 ).

A competing set of views sees metaphor understanding as a process of indirect comparison. When encountering a metaphor, we reason analogically about the conceptual structure of both topic and vehicle in order to reach a final utterance interpretation ( Gentner and Holyoak, 1997 ; Wolff and Gentner, 2000 , 2011 ; Coulson and Oakley, 2005 ; Gentner and Bowdle, 2008 ). A necessary first step in this process is that of structural alignment: Topic and vehicle are scanned for commonalities in their structures, and only after these commonalities have been established are inferences projected from vehicle to topic. Here, metaphor-incompatible features of the vehicle are not immediately suppressed and can only be discarded after structural alignment has been achieved ( McGlone and Manfredi, 2001 ). The “career of metaphor” hypothesis ( Bowdle and Gentner, 2005 ), an extension of the indirect comparison view, claims that there is a difference in processing between novel and conventional metaphors. For conventional metaphors, they claim that meaning is not constructed via analogical reasoning but is instead retrieved via category selection. Researchers working within the framework of category inclusion, however, have argued against this, providing evidence suggesting that not conventionality but aptness (i.e., how “good” a metaphor is) determines a metaphor's processing mode, meaning that there should not be an a priori difference in processing route between novel and conventional metaphors ( Jones and Estes, 2006 ) 1 .

Several studies have dealt with whether these literal features are activated or suppressed during processing (and if so, when). As a whole, the results do not unequivocally support one or the other set of accounts (e.g., Gernsbacher et al., 2001 ; McGlone and Manfredi, 2001 ; Rubio Fernandez, 2007 ; Weiland et al., 2014 ). We argue that three common features of these studies could be improved upon when striving for consensus. Firstly, these studies restricted their investigations to sentences such as “Some lawyers are sharks” (known in the literature as nominal metaphors), in which both metaphoric topic and vehicle are nouns and which have the surface form of a category statement. Considering that metaphors in the wild can take a wide range of morphosyntactic forms (see for example Bambini et al., 2019 ), it is problematic for theory development to consider only a small subset of metaphors.

Secondly, these studies usually make use of materials in which the relation between the metaphors and the tested literal features varies for every item. For example, two of the metaphoric items from McGlone and Manfredi (2001) (one of the most prominent studies on the role of literal features during metaphor comprehension) were some stomachs are barrels and some cats are princesses . The study examined the relationship between these sentences and the literal features captured in the sentences barrels can be wooden and cats can be siamese , respectively. Wooden and siamese are very different types of properties that require different kinds of world knowledge from a listener, and it is unclear to what extent we can meaningfully compare the relationship of each of these literal sentences to its metaphoric counterpart. It could be the case that variation in the relationship between literal features and target metaphors across experimental items is (at least partially) responsible for some of the contradictory results in the literature. A similar argument was made by Thibodeau and Durgin (2008) with regards to the difference in results of their study (facilitation effect of conventional metaphors on processing subsequent related novel metaphors) when compared to the results of Keysar et al. (2000) (no facilitation effect of conventional metaphors on processing subsequent related novel metaphors).

Finally, the majority of experiments investigating the role of literal features of a metaphor have been conducted using sentence reading times or reaction times as the dependent measures (but see Weiland et al., 2014 , for a notable exception). As a result, the timing of the activation of literal feature representations remains unclear and should be addressed with a finer-grained method. With the present set of studies we intend to make a contribution to the debate on the role of literal features during metaphor processing by improving on these three issues.

Concretely, we set out to study the role of conceptual features that are part of the encoded meaning of a verb but are incompatible with its novel metaphoric interpretation: We conducted a series of experiments investigating the role of the specific feature of physical containment during processing of novel verbal metaphors, such as (1). In these metaphors, the vehicle is always a verb of physical containment used to signify difficulty. This allowed us to use the same animated videos displaying physical containment as a visual representation of the same literal feature across items. We based our paradigm and hypotheses on insights coming from psycholinguistic accounts of metaphor comprehension ( Glucksberg, 2003 ; Gentner and Bowdle, 2008 ), as well as from research on metaphor production ( Sato et al., 2015 ). Crucially, we relied on the insights and on the methodology of research conducted on the interaction of (written) language processing and the visual context ( Guerra and Knoeferle, 2014 , 2017 ) to create our experimental paradigm.

The paper is structured as follows: The next two subsections provide an overview of the different views on metaphor processing and their predictions and briefly introduce the literature on the interaction between language processing and the visual world. We then present two eye-tracking during reading studies, one self-paced reading experiment and one lexical decision task, all investigating to what extent a depiction of physical containment influences the processing of novel verbal metaphors. Results are discussed in light of the background presented in section 1.

1.1. Understanding Metaphors

An issue of importance for metaphor theories is the role of the literal meaning of a metaphoric vehicle during processing: In (1), the verb fenced-in entails the concept of physical containment; its direct object is something that is not allowed to physically move. However, when we hear that the journalist's opinion has been fenced-in , the feature of a physical barrier is not part of the final interpretation. What happens to this literal feature during comprehension?

From a category inclusion perspective, the noun opinion in (1) provides the dimension of [+ abstract]. This dimension, together with the relevant utterance context, determines the interpretation of the verb: relevant features are selected while irrelevant ones are actively discarded. Evidence for this view comes from priming experiments. Gernsbacher et al. (2001) showed participants either a metaphoric or a literal sentence as a prime ( That defense lawyer is a shark or That large hammerhead is a shark ) and then asked them to perform a verification task on a sentence describing a feature of the vehicle that was irrelevant or relevant for the construction of the metaphoric meaning ( sharks are good swimmers or sharks are tenacious ). They found that, after reading metaphorical primes, participants were faster at verifying sentences describing a relevant feature for the metaphoric interpretation compared to when they read a literal prime. They also found that verifying sentences about a metaphor-irrelevant property took longer after reading a metaphor than after reading a literal statement. They interpreted these results in terms of activation of relevant features and suppression of irrelevant ones: When the word shark is used metaphorically, features such as “tenacious” are enhanced and features such as “good swimmer” are inhibited.

Rubio Fernandez (2007) conducted a similar study with the key difference that the target was a single word and it was shown at varying intervals. She found that at early intervals (0 and 400 ms) irrelevant literal features were primed by the metaphor and only actively suppressed when presented 1,000 ms after the prime. McGlone and Manfredi (2001) deployed a reversed version of this paradigm and showed participants irrelevant or relevant features as primes and then metaphorical sentences as targets. They found that relevant features facilitated whereas irrelevant features hindered comprehension compared to a baseline condition without a prime, suggesting that irrelevant properties are suppressed early on during processing. Weiland et al. (2014) created an ERP version of this paradigm: they showed participants a masked prime consisting of a word representing an irrelevant feature ( furry ) of a metaphor ( my lawyer is a hyena ) followed by the metaphor itself. They found that the N400 effect (computed as the difference in stimulus-related average electrical responses between the metaphor and a literal equivalent) was reduced when participants saw the irrelevant prime compared to when they did not see any prime at all, suggesting that irrelevant features can indeed ease comprehension of a metaphor, a result which is in conflict with that of McGlone and Manfredi (2001) .

From the perspective of indirect comparison, on the other hand, the activation of relevant and irrelevant features of the vehicle is not contingent upon dimensions provided by the topic. Gentner and Holyoak (1997) , Gentner et al. (2001) , Bowdle and Gentner (2005) , and Gentner and Bowdle (2008) have argued that, during initial stages of comprehension, the elements of a novel metaphor are scanned for structural similarities: listeners reason analogically about the relationship between vehicle and topic. This requires irrelevant features of the vehicle to be initially activated and only suppressed or ignored during later stages, once structural alignment has already taken place ( Gentner and Bowdle, 2008 ). This view is compatible with the findings of Weiland et al. (2014) but incompatible with those of McGlone and Manfredi (2001) . According to the indirect comparison view, it is also likely that literal features remain active after a metaphor has been understood, because the pattern of structural mappings between topic and vehicle can be used for subsequent processing, as has been shown to be the case for extended metaphors. For these, words belonging to the same semantic domain are used to “extend” a metaphoric expression beyond a single topic-vehicle pairing, as in the famous lines from Shakespeare's As You Like It : “All the world's a stage and all the men and women merely players; they have their exits and their entrances, and one man in his time plays many parts.” Support for this view comes from priming paradigms, where it has been shown that novel metaphors facilitate processing of subsequent novel metaphors that share the same conceptual mappings between domains ( Keysar et al., 2000 ) and even that conventional metaphors can prime subsequent related novel metaphors ( Thibodeau and Durgin, 2008 ).

Findings on extended metaphors are somewhat challenging to account for from the perspective of category inclusion, which seems to posit that metaphor comprehension occurs only locally: If the meaning of the metaphoric vehicle is altered so that irrelevant literal features are suppressed, how can these features be re-activated to prime subsequent related metaphors? One answer, coming from within Relevance Theory, is given by Carston (2010) . She claims that, in an extended metaphor, the multiple related words that are semantically associated are mutually reinforcing, resulting in an enhanced activation of the literal meaning (which she calls the “lingering” of the literal meaning). This can lead to the entire literal meaning of the extended metaphor being meta-represented and considered as a sort of “imaginary world,” where the individual metaphors are understood literally. This activates a second processing route for extended metaphors where metaphoric meaning is only derived in later stages of processing ( Rubio-Fernández et al., 2016 ).

Regarding the activation of literal features, the difference between indirect comparison views and Carston (2010) seems to be that Carston (2010) might predict a facilitation effect of metaphors on subsequent related metaphors based on semantic reinforcement of related words, whereas Gentner et al. (2001) predict a general activation of structural mapping patterns after any metaphor has been activated. In other words, the indirect comparison view predicts that literal features of a metaphor remain active after a metaphor is understood because these are part of a complex network of mappings between the encoded meanings of the metaphoric topic and the metaphoric vehicle. The category inclusion view of Carston (2010) , on the other hand, predicts activation of the encoded semantic features of a metaphoric vehicle (i.e., “lingering” of the literal meaning), not of a network of systematic mappings.

In short, there is a lack of consensus in the literature on the timing of suppression and activation of literal features during and after metaphor comprehension: whereas it has been suggested that irrelevant literal features hinder processing ( McGlone and Manfredi, 2001 ) and are immediately suppressed after comprehension ( Gernsbacher et al., 2001 ), others claim that literal features can ease subsequent processing of a metaphor ( Weiland et al., 2014 ), remain active for at least 400 ms after processing ( Rubio Fernandez, 2007 ), and even facilitate processing of subsequent related metaphors ( Thibodeau and Durgin, 2008 ). It is therefore important to seek out more evidence in this debate since it has repercussions for theory development, as highlighted above.

It is crucial to note that the activation of literal features during metaphor comprehension could be affected by item-specific factors, such as a metaphor's conventionality (i.e., the subjective frequency of exposure to a specific vehicle in its metaphoric meaning) (e.g., Blasko and Connine, 1993 ; Wolff and Gentner, 2000 ; Bowdle and Gentner, 2005 ). It could, for instance, be the case that literal features facilitate access for novel metaphors and hinder comprehension for more conventional metaphors (in line with the “career of metaphor” hypothesis, Bowdle and Gentner, 2005 ). McGlone and Manfredi (2001) , for example, found that the metaphors in their study that were rated as less conventional displayed less interference from irrelevant literal features during processing compared to the metaphors that were rated as more conventional. The effect was nevertheless one of interference, for both novel and conventional metaphors separately (−8 and −143 ms, respectively) and when taken as a whole. Weiland et al. (2014) , on the other hand, controlled for conventionality by selecting only metaphors that were rated as being halfway between highly novel and highly conventional for their experiments, and did not report any mediating effect of conventionality. Gernsbacher et al. (2001) operationalized conventionality as the percentage of comprehension errors for each of the metaphors in their study. They found that it did not correlate with the effect size of each of the items in their experiment, suggesting that conventionality did not modulate the way that literal features were suppressed after metaphor comprehension. Finally, Rubio Fernandez (2007) did not report having controlled for conventionality. This specific literature therefore does not strongly suggest that conventionality mediates the role of literal features during metaphor comprehension. 
Furthermore, it is still an open question whether conventionality actually modulates processing, or whether it only appears to do so because it tends to be correlated with aptness (i.e., the degree to which the figurative meaning of the vehicle captures relevant properties of the topic, or how “good” the metaphor is), which has been claimed by some to be the true underlying factor that mediates metaphor processing ( Jones and Estes, 2006 ; Glucksberg, 2008 ). An investigation on the effect of conventionality on the activation of literal features is beyond the scope of our current investigation, which focuses on the processing of novel metaphors exclusively.

Specifically, our contribution to the debate on the activation of literal features is to examine the effect of pre-activating said features on the processing of subsequent verbal metaphors, which, unlike nominal metaphors, have been largely overlooked in the literature. We do this by showing participants short animated clips of the literal feature of containment prior to participants reading verbal metaphors that entail this feature as part of their literal meaning. Our study makes use of eye-tracking during reading and draws its inspiration from research on the relation between visual attention and language production and processing. We will now turn to a brief overview of this specific research field.

1.2. The Interaction of Visual and Linguistic Information During Sentence Processing

Given the lack of converging evidence coming from the studies described above, we turned to neighboring disciplines for inspiration. One possibility is to draw from research on language-vision interactions (see Knoeferle and Guerra, 2016 for an introduction to this field). The seminal work of Cooper (1974) showed that there is a close temporal adjacency between language understanding and the processing of visual stimuli. In the study, participants heard stories while simultaneously being presented with images of potential referents while their eye movements were monitored, something that years later came to be known as the Visual World Paradigm (for a review, see Huettig et al., 2011 ). The results of this study showed that participants looked at the visual representations of objects immediately after they were mentioned in a story, highlighting the rapid and automatic way in which language and visual processes interact.

Through eye-tracking technology it has also been shown that the processing of visual stimuli interacts with the processing of written abstract language. Guerra and Knoeferle (2014) showed participants a video of two playing cards that either moved closer together or further apart. Participants then read German sentences that dealt with semantic dissimilarity, such as Frieden und Krieg sind bestimmt verschieden (“Peace and war are certainly different”) or similarity, such as Kampf und Krieg sind freilich entsprechend (“Battle and war are certainly similar”). Their results showed that when the motion of the cards was conceptually aligned with the direction of the semantic relation (close~similar; far~different), participants were faster at reading the second of the presented nouns (Experiment 3) as well as the adjective (Experiments 1 and 2) than when there was no such conceptual alignment. The result was interpreted as evidence for an abstract co-indexing link between spatial distance and semantic similarity. One characteristic of the eye-tracking during reading method is that it allows for a rough mapping of the results onto different stages of language processing ( Clifton et al., 2007 ; see Vasishth et al., 2013 , for a counterpoint). The fact that Guerra and Knoeferle (2014) found effects in first-pass reading times (considered a measure of early stages of processing) can be interpreted as a sign of the early and rapid integration of language processing and the visual context.

It's important to note that Guerra and Knoeferle (2014) investigated the effects of the visual context on the processing of concepts that have been retrieved from memory, such as the meaning of the words “war” and “peace.” But how does the visual world interact with processing concepts that are not retrieved from one's mental lexicon, but are instead constructed on the fly, such as novel metaphors? We might find an answer to this question if we look at how the visual world interacts with the production of metaphoric expressions. Sato et al. (2015) investigated whether showing participants images depicting spatial containment would encourage them to produce expressions in which spatial containment is used metaphorically to speak of abstract difficulty. They found that even when the sentences they produced were thematically unrelated to the images viewed, participants still produced more metaphors drawing from the domain of spatial containment than when they saw a neutral picture as prime. The authors, who work within the framework of Conceptual Metaphor Theory ( Lakoff and Johnson, 2008 ), interpreted the result as evidence for an activation of the Conceptual Metaphor DIFFICULTY IS CONTAINMENT after having seen the pictures, leading to the production of individual linguistic metaphors derived from this specific Conceptual Metaphor.

It's possible that these results could translate to language comprehension: activating the feature of spatial containment could facilitate comprehension of novel metaphors of difficulty that have spatial containment as part of their encoded meanings. This would suggest that literal features of a metaphor are important for the construction of metaphoric meaning and would be broadly in line with the indirect comparison view. With this in mind, we now turn to the description of our investigation, in which we explore the role of literal features during comprehension of novel verbal metaphors.

2. The Present Study

The current set of studies seeks answers to the following questions raised in section 1: Does activating literal features hinder or ease processing of novel metaphors? And do said features remain active after a metaphor has been understood? We conducted four experiments to answer these questions. In Experiments 1 and 2 (eye-tracking during reading), participants saw short animated clips depicting physical containment. They then read sentences in which verbs of physical containment were metaphorically used to signify difficulty [such as in sentence (1)], and then answered questions about either the sentences or the videos. The animated clips showed a moving ball: In one video, the ball bounces freely while in the other the ball is trapped by a box.

The goal of these two experiments was to study how seeing a video depicting physical containment—which we assume to be a prominent feature of the encoded meaning of the verbs used in all our sentences, yet incompatible with the meaning of the individual metaphors—interacts with the processing of verbs of spatial containment used metaphorically. We compared this to how the same sentences are processed after seeing a video clip that does not share the conceptual feature of containment with the verbs. In these two experiments participants also answered questions about what they saw in the video after reading the sentence. This should provide insight on the role that literal features might play after a metaphor has been understood.

In Experiment 3 (self-paced reading), we examined how participants would naturally answer the same questions asked in Experiments 1 and 2 (after sentence comprehension) when the video clips are followed by literal sentences instead of metaphors. Doing this gives us a baseline measure for interpreting the question-answering times of Experiments 1 and 2.

Finally, Experiment 4 (lexical-decision task) investigated how the same video clips of Experiments 1–3 interact with the processing of spatial containment verbs from Experiments 1 and 2 when these verbs are read in the absence of a context (i.e., when participants are expected to retrieve the literal meaning only).

3. Experiment 1

We began our investigation by asking the following question: Will watching video clips of spatial containment facilitate or hinder comprehension of metaphors made up by verbs of spatial containment? Additionally, how will the activation of spatial containment interact with processing the metaphorically used verbs after the metaphors have been understood? Experiment 1, an eye-tracking during reading study, was designed to answer these questions.

3.1. Participants

Forty-eight monolingual university students who were native speakers of German (ages 18–31, 30 female) were recruited and tested at the Humboldt-Universität zu Berlin. All participants were right-handed and had normal or corrected-to-normal vision. They all gave their written informed consent and were paid 8 euros upon completing the experiment. This study was covered by the ethics vote granted to the psycholinguistics lab of the Humboldt-Universität zu Berlin by the German Linguistic Society (Deutsche Gesellschaft für Sprachwissenschaft, DGfS).

3.2. Materials and Design

We created 40 critical items consisting of German metaphorical sentences. All sentences had an identical syntactic structure, namely a main clause with an infinitive subject clause, as exemplified in (3). In the infinitive clause, a verb of physical containment, which always appeared in the same position, was used metaphorically to denote abstract difficulty. In the main clause, it was asserted that the situation described in the infinitive clause was “difficult.” All critical and filler sentences can be found in the Supplementary Material .

(3) Es war für den Redakteur /schwierig ADJ /, seine / Meinung TARGET NOUN / nach dem Regimewechsel / umgittert VERB / zu sehen .

It was for the journalist/ difficult ADJ /, his / opinion TARGET NOUN / after the change in regime/ fenced-in VERB / to see

“It was difficult for the journalist to see his opinion be fenced-in after the change in regime.”

3.2.1. Sentence Norming

Our goal when creating the materials was to use metaphors that were novel yet readily understandable. To make sure the metaphors could be understood, we conducted a norming study of the target sentences. A sample of 15 participants, who did not participate in the main study, were asked to rate 80 sentences on a scale of 1–7, with 1 being totally incomprehensible and 7 being totally comprehensible. The 80 sentences were made up of the critical 40 metaphoric sentences and 40 semantically incoherent filler sentences (e.g., It was sad that Thomas drank the car so fast ). Order of presentation of the sentences was randomized. The goal of the norming task was to establish whether any of the critical metaphorical sentences would be rated as incomprehensible (meaning a rating of 3.5 or lower) and whether the metaphorical sentences were rated significantly higher than the semantically incoherent sentences.

3.2.2. Results of the Norming Task

Four of the forty critical sentences were rated lower than 3.5 on average and were dropped from the investigation. The remaining 36 sentences formed the base for all subsequent experiments.
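
The exclusion step can be sketched in a few lines of Python. The ratings below are invented placeholders, not the study's data, and the helper name `split_by_mean_rating` is hypothetical. The sketch assumes the cutoff is applied to each item's mean rating and that a mean of 3.5 or lower counts as incomprehensible.

```python
from statistics import mean

# Cutoff taken from the norming description: items whose mean rating on the
# 1-7 scale is 3.5 or lower are treated as incomprehensible (assumed reading).
CUTOFF = 3.5


def split_by_mean_rating(ratings_by_item, cutoff=CUTOFF):
    """Return (kept, dropped) lists of item ids based on each item's mean rating."""
    kept, dropped = [], []
    for item_id, ratings in ratings_by_item.items():
        if mean(ratings) <= cutoff:
            dropped.append(item_id)
        else:
            kept.append(item_id)
    return kept, dropped


# Invented example ratings (NOT the study's data): item id -> ratings per rater.
example_ratings = {
    "item_01": [6, 5, 7, 6],
    "item_02": [3, 2, 4, 3],
    "item_03": [4, 5, 4, 5],
}

kept, dropped = split_by_mean_rating(example_ratings)
print("kept:", kept)
print("dropped:", dropped)
```

Applied to the actual norming ratings, a rule of this kind would yield the 36 retained and 4 dropped sentences reported above.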

To determine whether these 36 sentences were in fact understood, an ordered logistic regression model was fitted to the data ( Gelman and Hill, 2006 ). The model was constructed to see whether sentence type (critical item vs. semantically incoherent filler) predicted the 1–7 ratings. The results show that a change from level 0 (semantically incoherent) to level 1 (critical item) was associated with an odds ratio of 7.96 (t = 17.5, p < 0.001). This means that for metaphorical sentences, the odds of being rated higher were 7.96 times those of incoherent sentences, holding constant all other variables. The data therefore strongly suggest that participants were able to determine a difference in meaning between the semantically incoherent sentences and the novel metaphoric sentences.
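
To make the reported odds ratio concrete, the following sketch converts it to the corresponding log-odds coefficient and shows how it would shift a cumulative probability under a proportional-odds reading of the model. Only the odds ratio of 7.96 comes from the text. The baseline probability of 0.20 and the function names are illustrative assumptions, and the code does not refit the authors' model.

```python
import math

REPORTED_ODDS_RATIO = 7.96  # reported in the norming results


def log_odds_coefficient(odds_ratio):
    """Coefficient on the log-odds scale that corresponds to an odds ratio."""
    return math.log(odds_ratio)


def shifted_probability(baseline_probability, odds_ratio):
    """Probability of exceeding a rating threshold after multiplying the
    baseline odds by the odds ratio (proportional-odds assumption)."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical baseline: suppose an incoherent sentence has a 20% chance of
# being rated above some threshold k (this number is NOT from the study).
baseline = 0.20
print(round(log_odds_coefficient(REPORTED_ODDS_RATIO), 3))
print(round(shifted_probability(baseline, REPORTED_ODDS_RATIO), 3))
```

Under these assumptions the same multiplicative shift in odds applies at every rating threshold, so the size of the change in probability depends on the baseline chosen.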

Finally, to confirm that the resulting 36 sentences were in fact perceived as novel, we asked a further 50 participants (who did not take part in the main experiment) to rate how familiar they thought the metaphoric sentences were on a scale from 1 (very novel) to 100 (very familiar). The mean familiarity score was 27.98 with a standard deviation of 10.2. We take this as confirmation that the metaphors created were indeed perceived to be fairly novel.

3.2.3. Filler Sentences

Seventy-two filler sentences were constructed to reduce the likelihood of strategic behavior and to mask the purpose of our investigation. We thus had 24 German idioms as fillers with similar syntactic structure to our critical items, as well as 24 novel metaphors that differed from the critical items. The remaining 24 filler sentences were literal statements.

3.2.4. Visual Primes

Two critical videos were created by animating individually created images with proprietary video editing software. Each video showed a ball bouncing with identical motion: In one of them (used in the “match” conditions), the ball was captured by a moving box, which brought it to a standstill. In the other (used in the “mismatch” conditions), the ball bounced freely and stopped on its own. Figure 1 shows a series of stills for each of the videos. The videos themselves can be seen in full length in the Supplementary Material .

www.frontiersin.org

Figure 1 . Stills from the video used in the “match” and “mismatch” conditions of experiments 1–4. (A) Visual prime—Containment. (B) Visual prime—Non-containment.

Furthermore, inspired by Experiment 1 of Guerra and Knoeferle (2014) , two versions of each video were created: One with a printed word from each critical sentence on the ball and one without any printed word. Participants thus saw, for example, a video of a box trapping a ball (or a ball bouncing freely) that had the word opinion written on it, and subsequently read sentence (3), in which an “opinion” is said to be fenced in . This was done to maximize the possibility that participants would establish a relation between the visual context and the written sentence.

For the filler trials, four other animated videos were created that were randomly paired with the 72 filler sentences. To prevent participants from identifying the critical videos, the filler videos presented the same objects as the critical ones, i.e., a combination of bouncing balls and boxes. In the filler videos a box lands next to a bouncing ball without trapping it (filler video 1); two balls cross each other diagonally and bounce toward each other (filler video 2) or away from each other (filler video 3); and two balls fall on top of a box but only one of the balls goes in the box (filler video 4).

3.2.5. Comprehension Questions

To investigate the role of literal features after a metaphor has been comprehended, we included a comprehension question after every trial. For critical trials, the question was always about the video, either (a) referring to the ball ( Was the ball in the box? ) or (b) to the metaphoric topic that may or may not have appeared written on the ball in the video ( Was the opinion in the box? ). Trials with incorrect answers were discarded from the analysis.

The idea of having these two different questions was that they might allow us to investigate different ways in which literal features could be activated after metaphor comprehension: It could be the case that literal features are simply activated because they are seen in the video and mentioned in the sentence, in which case question (a) should be easier to respond to when the video-prime seen prior to the metaphor activates the literal feature of containment. This would be compatible with indirect comparison views and with Carston's (2010) “lingering” of the literal meaning view. Alternatively, literal features could remain activated because they are part of a network of systematic mappings between topic and vehicle established during structural alignment, as suggested by Gentner and Boronat (1992) , Gentner et al. (2001) , and Thibodeau and Durgin (2008) . This would result in a facilitation effect when answering question (b), considering that it suggests a parallel in structure between video and sentence by effectively “blending” together both representations. Finally, it could be the case that literal features are always suppressed after metaphor comprehension, in which case neither type of question should be easier to answer when the video activates the literal feature of containment compared to when the video does not activate it. This would be compatible with the category inclusion view ( Glucksberg, 2008 ). We return to these positions and how they relate to the experimental design when discussing the results of the question-response times.

3.3. Design

Experiment 1 had a 2 × 2 × 2 Latin square design with three factors: “containment” (match vs. mismatch), “question type” (video-question vs. noun-question), and “prime type” (animation-prime vs. mixed-prime). “Containment” refers to whether the video showed the ball bouncing freely (mismatch conditions) or being trapped by a box (match conditions) (see Figure 1 ). “Question type” refers to whether the comprehension question inquired about the video (video-question conditions) or about the metaphoric topic [ the opinion in (3)] (noun-question conditions). Finally, “prime type” refers to whether the metaphoric topic was written on the ball (mixed-prime conditions) in the video prime or whether the video prime had no written language in it (animation-prime conditions).

We calculated three eye-tracking measures commonly associated with different temporal processing stages (see Rayner, 1998 , 2009 ) for our three regions of interest (i.e., the adjective, the noun, and the verb region): First-pass reading times, defined as the duration of all fixations made in a region until the first time the region is abandoned either to a subsequent or to a prior word; regression path duration, defined as the duration of all fixations from the first fixation in a region up to (but excluding) the first fixation to the right of this region (but including the duration of all fixations made to the left of the critical region after the first fixation in the critical region); and total reading times, defined as the sum of the duration of all fixations in a critical region. These three measures were chosen since they can provide insight about the point in time in which effects might arise: If effects are found in first-pass reading times, it would suggest that they occur during the earliest stages of processing. If they are visible in regression path duration, it would likely point to it being related to the way in which a region is integrated into the sentential context, whereas if they are found only in total reading times, it would suggest that such an effect might appear incrementally but only during later processing.
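The three measures defined above can be sketched in Python as follows. This is a simplified illustration, not the analysis code used in the study: it assumes that fixations arrive as a temporally ordered list of (region index, duration in ms) pairs, that regions are numbered from left to right, and it does not treat regions skipped during first-pass reading separately. The function name `reading_measures` and the example fixation sequence are hypothetical.

```python
def reading_measures(fixations, region):
    """Compute first-pass, regression path, and total reading time for one region.

    fixations: list of (region_index, duration_ms) in temporal order.
    region: index of the region of interest (regions numbered left to right).
    """
    total = sum(d for r, d in fixations if r == region)

    # Index of the first fixation that lands in the region, if any.
    start = next((i for i, (r, _) in enumerate(fixations) if r == region), None)
    if start is None:
        return {"first_pass": None, "regression_path": None, "total": 0}

    # First-pass: fixations in the region until it is left in either direction.
    first_pass = 0
    for r, d in fixations[start:]:
        if r != region:
            break
        first_pass += d

    # Regression path: everything from first entry up to (but excluding)
    # the first fixation to the right of the region, including regressions.
    regression_path = 0
    for r, d in fixations[start:]:
        if r > region:
            break
        regression_path += d

    return {"first_pass": first_pass,
            "regression_path": regression_path,
            "total": total}

# Hypothetical fixation sequence; region 2 is the region of interest.
example = [(0, 200), (1, 250), (2, 180), (1, 150), (2, 220), (3, 300), (2, 100)]
measures = reading_measures(example, region=2)
print(measures)
```

In this example the reader regresses from region 2 to region 1 before moving on, so the regression path duration exceeds the first-pass time, and a later re-fixation adds only to the total reading time.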

3.4. Predictions by Region

Our first set of predictions concerns the effect of the video on reading comprehension. We focused on three specific regions which we believed to be likely to interact with the visual prime: the adjective, noun, and verb regions.

3.4.1. Adjective Region

Guerra and Knoeferle (2014) found that visually depicted spatial distance facilitated reading comprehension of adjectives denoting abstract similarity. They reasoned that this facilitation effect might be due to an existing co-indexing link between spatial distance (close, far) and semantic distance (similar, dissimilar). They borrowed this idea from Conceptual Metaphor Theory, which hypothesizes the existence of such a link ( Lakoff and Johnson, 2008 ). This theory also posits the existence of a link between the concepts of difficulty and containment. Thus, watching videos of spatial containment might ease processing of an adjective denoting difficulty. We therefore reasoned that if there is a link between difficulty and containment similar to that found for similarity and distance, we should find a main effect of containment in the adjective region, with shorter reading times in the match vs. mismatch conditions.

3.4.2. Noun Region

By adding the word in the noun region to the video (mixed prime type conditions), we expected a clear repetition priming effect to appear when participants encountered this word in the sentence. Concretely, if participants were able to integrate the written word from the video with the subsequently read sentence, we should observe a main effect of prime type in all dependent measures, with the mixed-prime conditions being overall faster to read than the animation-prime conditions.

3.4.3. Verb Region

Our predictions for this region are derived from the debate on metaphor processing presented in section 2. We expected a facilitation effect on an early measure, such as first-pass reading times, provided that the video relates to the literal meaning of the verb. This finding would suggest that features related to the literal meaning of a verb (in this case, physical containment) are initially active even though they might be absent from the intended metaphoric meaning. This would be in line with the results of Weiland et al. (2014) , who observed that masked primes made up of irrelevant features of the metaphoric vehicle reduced the N400 effect found upon encountering the metaphoric vehicle, and would also generally support the indirect comparison view of metaphor understanding.

Alternatively, if activating the spatial representation of containment interferes with processing the metaphorically used verb, we should find longer reading times in the match vs. mismatch conditions. This would be more in line with the findings of McGlone and Manfredi (2001) and generally with category inclusion accounts that claim that literal features irrelevant for understanding the metaphor are actively suppressed during processing. Activating them should therefore interfere with the construction of metaphoric meaning.

3.5. Post-sentence Comprehension Question

A second set of predictions relates to how understanding each metaphor affects participants' response time patterns for questions related to the content of the video.

The main prediction for the response patterns to the post-comprehension questions was that if the feature of physical containment is active after participants have understood the sentence, it should be possible to find a main effect of containment on question-answering times, with overall shorter answering times in the match vs. mismatch conditions. This would suggest that the feature of containment activated in the match conditions (the ball is trapped by the box) was not suppressed after the metaphor was understood and facilitates answering both question (a) Was the ball in the box? and (b) Was the opinion in the box? If, on the other hand, the features activated by the video are suppressed after the metaphor has been understood, there should be either an interference or a null-effect of containment on response times.

However, given that there were two types of post-sentence comprehension questions, (a) and (b) above, it would be possible to observe different result patterns beyond the prediction of a main effect of Containment. Such patterns would bring about a more nuanced view on the activation of literal features after a metaphor has been comprehended, which could further inform theories of metaphor comprehension. Table 1 presents a description of all conditions for the response times.

Table 1 . Description of all conditions for the question-response times in Experiments 1, 2, and 3.

Of particular importance for a nuanced view on the role of literal features are the response times in the noun-question/animation-prime conditions. This is because, in these conditions, participants were asked a question that effectively “blended” the representations of video and sentence by asking whether the “opinion” was in the box when there was nothing written on the ball in the video but they had read about an opinion in the sentence.

If the feature of physical containment is activated after sentence comprehension, we would expect this feature to interfere with correctly answering the question in the noun-question/animation-prime conditions (because the correct response here would be NO and participants might want to answer YES if the feature of Containment is active), particularly in the match condition, where physical containment was seen in the video. This should in turn result in an interaction of question type and prime type, with the noun-question/animation-prime conditions showing longer reaction times than all other conditions. If the match level (of the noun-question/animation-prime conditions) is harder to respond to than the mismatch level, there should additionally be a three-way interaction between question type, prime type, and containment. If, on the other hand, the feature of physical containment is not active after participants have understood the sentence, we should expect the noun-question/animation-prime conditions to take just as long as the others, thus not resulting in a significant interaction of question type and prime type.

3.6. Procedure

Participants' eye movements were recorded using an EyeLink 1000 Plus desktop head-stabilized tracker, produced by SR Research. At the beginning of each experimental session, the eye-tracker was calibrated with a 9-point calibration procedure to ensure accurate monitoring of the eyes. The procedure was repeated until the maximum error was below 0.5°. If it was not possible to meet this criterion, the experiment was aborted and the participant was replaced. Re-calibration was performed after every block, i.e., twice more. After calibration, participants saw three practice trials before the experiment began. Each trial in the experiment consisted of three phases (see Figure 2 ): First, participants saw an animated video presented on the screen for 8 s. The video then disappeared and a sentence appeared on the screen. Participants read the sentence and, when they had finished reading, pressed a button on a Cedrus response pad in front of them. The sentence then disappeared and a question appeared on the screen. Participants had to answer this question by pressing either the YES or NO button on the pad (position of YES and NO buttons was counterbalanced across participants). An entire experimental session lasted around 50 min on average.

Figure 2 . Example of the progression of a trial in experiments 1–3.

3.7. Analysis and Results

3.7.1. Analysis of Eye-Tracking Data

Prior to analysis, an intercepts-only regression model was fitted to the data in order to observe the distribution of the residuals. These were not normally distributed (which violates the assumptions of the linear model), and thus a Box-Cox test ( Box and Cox, 1964 ) was performed. The test showed that the reading time measures needed to be transformed using a lambda value of −0.7, which was used for transforming all eye-tracking measures and regions. Cases in which participants gave an incorrect answer to the comprehension question were also excluded from all analyses. This procedure was followed for all subsequent experiments. Accuracy for comprehension questions in Experiment 1 was above 85% in all conditions.
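As an illustration of the transformation step, the following Python sketch applies the standard Box-Cox power transform with a fixed lambda. The reading times in the example are hypothetical, and the function name `box_cox` is ours; the sketch shows only the transform itself, not the estimation of lambda.

```python
import math

def box_cox(y, lam):
    """Box-Cox power transform of a single positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)          # limiting case: log transform
    return (y ** lam - 1.0) / lam

LAMBDA = -0.7                       # value reported for the reading time measures

# Hypothetical reading times in milliseconds.
reading_times = [180.0, 250.0, 600.0]
transformed = [box_cox(rt, LAMBDA) for rt in reading_times]
print(transformed)
```

With a negative lambda the transform remains monotonically increasing but compresses long reading times strongly, which reduces the right skew typical of reading time distributions. The case lambda = 0 corresponds to the log transformation applied to the question-response times.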

We analyzed all data in our experiments using the R statistical programming environment and the lme4 package for regression analysis. To test our predictions, we fitted mixed-effects linear regression models to every measure and every region. For constructing the statistical models, we followed the recommendations of Barr et al. (2013) . First, we tried fitting the largest possible random effects structure granted by our experimental design (in our case, random intercepts and slopes by items and subjects for both independent variables). If the model failed to converge, we reduced the random effects structure step-wise until a converging model was found by first removing the random correlations, then the random intercepts, followed by the interaction effects and the main effects. We used the same maximally converging random effects structure for all dependent measures in every region.

All models included trial order as a fixed effect, since it significantly improved the model fit. The models were fitted using a sum-contrast coding scheme (unless stated otherwise). Alpha thresholds for assessing statistical significance for eye-tracking data were Bonferroni-corrected, following the recommendations of von der Malsburg and Angele (2017) .
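The Bonferroni correction itself amounts to dividing the nominal alpha level by the number of tests in the family. The sketch below assumes, purely for illustration, that the family consists of the three reading time measures analyzed per region and that the nominal alpha is 0.05; neither value is stated explicitly in the text, and the function names are hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Return the per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

# Assumption: one test per eye-tracking measure within a region.
MEASURES = ["first_pass", "regression_path", "total"]
NOMINAL_ALPHA = 0.05                # assumed nominal level

threshold = bonferroni_alpha(NOMINAL_ALPHA, len(MEASURES))

def is_significant(p_value):
    """Compare a p-value against the corrected threshold."""
    return p_value < threshold

print(f"corrected threshold: {threshold:.4f}")
```

Under these assumptions, an effect with p = 0.03 in a single measure would not count as significant, whereas one with p = 0.01 would.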

The final random effects structure used for every model is shown in Table 2 . Figures 3 – 5 show bar-plots of the results in the adjective, noun, and verb region respectively. The output of the respective statistical models can be seen in Tables 3 – 5 . Figure 6 shows the results of the post-sentence comprehension question response times.

Table 2 . Random effects structure for models in every experiment.

Figure 3 . Summary of results for the ADJ region, Experiment 1.

Figure 4 . Summary of results for the NOUN region, Experiment 1.

Figure 5 . Summary of results for the VERB region, Experiment 1.

Table 3 . Regression analysis of reading times in the ADJECTIVE region of Experiment 1.

Table 4 . Regression analysis of reading times in the NOUN region of Experiment 1.

Table 5 . Regression analysis of reading times in the VERB region of Experiment 1.

Figure 6 . Summary of results for the question response time, Experiment 1.

3.7.2. Results of Eye-Tracking, Adjective Region

No significant main effects or interactions were found in any measure for this region.

3.7.3. Results of Eye-Tracking, Noun Region

As predicted, we observed a significant main effect of prime-type in all three measures, with shorter reading times in the mixed-prime vs. animation-prime conditions. This confirms that our experimental paradigm was sensitive enough to detect identity priming effects, and that participants were actively integrating the information processed during the video with the information from the sentence.

3.7.4. Results of Eye-Tracking, Verb Region

No significant main effects or interactions of our manipulated variables were found in any measure for this region.

3.7.5. Analysis and Results of Question Response Times

A Box-Cox test determined that the response times needed to be log-transformed. We thus fitted a linear mixed-effects regression model to the log-transformed reaction times. This model was fitted only to correct responses, which made up over 92% of all trials. The pattern of results can be seen in Figure 6 and the output of the model is summarized in Table 6 .

Table 6 . Regression analysis of response-times in Experiment 1.

There was a main effect of question type, showing that participants were significantly slower at answering questions in the noun-question vs. video-question conditions. There was also a main effect of prime type, indicating that participants were faster to answer questions in the mixed-prime compared to the animation-prime conditions, and a main effect of containment, showing that there was an overall facilitation in the match vs. mismatch conditions. There were also significant interactions between question type and prime type, and between containment and prime type, reflecting in particular that the noun-question/animation-prime conditions displayed a different pattern than all others (see Figure 6 ). The three-way interaction was not significant.

A potential response bias was discovered after running the experiment: The correct answer to the question asked was always NO in the mismatch conditions and YES in the match conditions (see Table 1 ). It is therefore not possible to tell whether the effect of containment was caused by the difference in the conditions (match vs. mismatch) or by the differences in correct answer (YES vs. NO).

The noun-question/animation-prime condition was the only exception to this: Here, the correct response was NO in both match and mismatch levels. Because of this, we re-fitted the statistical model for the question-response times using a treatment contrast coding scheme in order to look at the noun-question/animation-prime condition exclusively. This was important because the match and mismatch levels of this condition were the only ones for which both the question (“Was the NOUN in the box?”) and the correct answer (NO) were the same. This type of contrast coding allows for direct comparisons between the condition set as the intercept of the model and the other individual conditions. This model, shown in Table 12 , revealed no significant difference between match and mismatch levels of the noun-question/animation-prime condition.
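The difference between the two coding schemes can be illustrated with a small Python sketch. It simplifies the design to a 2 × 2 layout (containment by cell type, with all conditions other than noun-question/animation-prime collapsed into one cell), and the cell means are invented for illustration only; they are not results from the experiment. In a saturated model, the containment coefficient under treatment coding equals the match vs. mismatch difference at the reference cell, whereas under sum coding it reflects the difference averaged over cells (up to the scaling implied by the chosen codes).

```python
# Hypothetical cell means in ms; NOT data from the experiment.
cell_means = {
    ("mismatch", "noun_animation"): 2000.0,
    ("match", "noun_animation"): 2010.0,
    ("mismatch", "other"): 1800.0,
    ("match", "other"): 1700.0,
}

def containment_simple_effect(means, cell):
    """Match minus mismatch within one cell (what treatment coding estimates
    for containment when that cell is the reference level)."""
    return means[("match", cell)] - means[("mismatch", cell)]

def containment_main_effect(means):
    """Match minus mismatch averaged over all cells (what sum coding estimates)."""
    cells = sorted({cell for _, cell in means})
    effects = [containment_simple_effect(means, c) for c in cells]
    return sum(effects) / len(effects)

simple = containment_simple_effect(cell_means, "noun_animation")
main = containment_main_effect(cell_means)
print(f"simple effect at reference cell: {simple} ms")
print(f"main effect averaged over cells: {main} ms")
```

With these invented numbers, the averaged effect suggests facilitation in the match conditions, while the simple effect at the reference cell is close to zero, which is the kind of dissociation that treatment coding makes directly testable.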

3.8. Discussion

In Experiment 1, we failed to find a difference in reading times between conditions in the adjective region. More importantly, we found no differences in the verb region, the main interest region of the experiment. However, the presence of the effect of priming type in the NOUN region suggests that the absence of an effect of containment might be interpreted meaningfully: It could be the case that we did not find an effect of containment on reading times of the verb because the feature of containment is not relevant for the construction of the metaphoric meaning and it is thus ignored during processing, exerting neither facilitation nor interference. This interpretation would be broadly compatible with views that ascribe an insignificant role to features related exclusively to the encoded meaning of the metaphoric vehicle during processing.

However, it might also be possible that no effect was found given the temporal distance between presentation of the visual prime and reading of the metaphorically used verb. Perhaps this distance masked a true facilitation or interference effect that the video would have otherwise exerted on processing the verbs. This lays the groundwork for Experiment 2, in which we changed the sentence structure so that the verb could be temporally closer to the video prime.

Results from the post-sentence comprehension questions present an intricate pattern. The results showed a main effect of question type, with longer response times in the noun-question conditions than in the video-question conditions. There was a main effect of containment, with shorter response times in the match compared to the mismatch conditions in all but the noun-question/animation-prime conditions (as evidenced by the interaction effect between containment and question type).

To better understand this pattern, it is useful to think about how the results might possibly be linked to the theoretical debate on the activation of literal features following metaphor comprehension. Indirect comparison views suggest that after a metaphor is understood, literal features remain active because they are part of the network of established mappings between topic [in this case, the target noun opinion in sentence (3)] and vehicle [the verb fenced in in (3)], which can be used to reason analogically about subsequent linguistic input (see for example Gentner et al., 2001 ). If this holds, it would accommodate a facilitation effect of match vs. mismatch levels in the video-question conditions, signifying a sustained activation of the feature “containment.” It would also account for an interference effect of match vs. mismatch levels in the noun-question/animation-prime conditions, which could be explained as a sustained activation of established mappings between different conceptual domains which interferes with answering a question about an “opinion” being in the video. This is because question (a) is a reference to the video alone, requiring only information about the feature of containment in order to answer it. If the feature is active, this should result in a facilitation effect compared to when containment was not presented (i.e., the mismatch condition). Question (b), on the other hand, is a complex combination of information about the sentence (given the presence of the target noun) and the video (given the reference to the box, which could have only been seen in the video). In this case, an interference effect for answering question (b) in the match vs. mismatch conditions would suggest that not only the feature of containment has been activated (as would be the case in the video-question conditions), but also its relationship with the metaphoric topic (the target noun). 
This should cause difficulty when negatively answering a question about an “opinion” being in the box.

Carston (2010) suggests that literal features might “linger” after a metaphor has been understood. However, her theory seems to suggest that they “linger” only as semantic features, not as part of a network of systematic associations between topic and vehicle. That being the case, it would explain a facilitation effect of match vs. mismatch video on the question-response times in the video-question/animation-prime condition, but there should not be an effect on the response times in the noun-question/animation-prime conditions.

At first glance then, the pattern of results found in Experiment 1 seems to be in line with the idea that when the conceptual feature of containment was activated by the verb, it generally facilitated responses, resulting in shorter response times in the contained vs. not-contained conditions in all but the noun-question/animation-prime conditions.

This could suggest that the feature of containment was activated after the metaphor was understood, but not as part of a complex mapping between containment and the metaphoric topic (which would have caused a difference in the noun-question/animation-prime conditions), compatible with Carston's (2010) view on the “lingering” of the literal meaning, but incompatible with the stronger view of Gentner et al. (2001) , according to which the pattern of mappings should remain available for further processing and potentially cause interference with the answering of the question.

There is, however, a simpler explanation for the current pattern of results. As mentioned in the results section, the correct responses were confounded with the match and mismatch conditions, with match conditions always requiring a YES response and mismatch conditions a NO response in all but the noun-question/animation-prime conditions, where the correct response was NO in both levels of containment. It is therefore likely that it was simply easier for participants to answer YES than to answer NO, explaining the main effect of containment. Additionally, the effect of question type could be due to the fact that questions in the “noun” conditions (which varied according to the target noun in every trial, 33 characters on average) were on average longer than the questions in the “video” conditions (which were always the same, i.e., Was the ball in the box? , 30 characters in German). It is possible that participants just took longer to read the questions in the noun compared to the video conditions and thus took longer to answer the question.

The only comparison not affected by these two issues was that between match and mismatch levels of the noun-question/animation-prime condition. For these two levels, the question and correct response remained the same. We found no significant difference between these two levels. It is important to note, however, that the YES/NO confound affected only the question-response times and not the eye-tracking data. We address the issue of the interpretation of question-response times in Experiment 3, where we examine the response patterns to the same questions in the absence of metaphorical verbs. For now, we turn to Experiment 2, where we attempted to replicate the pattern of reading times displayed in Figure 5 using sentences with a different syntactic structure.

4. Experiment 2

The goal of Experiment 2 was to determine the robustness of the results of Experiment 1. First, we altered the sentence structure in order to minimize the temporal distance between prime and verb. We did this because we thought it was likely that participants were not able to use the information extracted from the visual prime to facilitate processing of the metaphoric verb due to working memory constraints. This possibility finds some support in the literature on working memory, where it has been noted that people can remember only a relatively small number of sequentially presented meaningful units (somewhere between 3 and 7, Miller, 1956 ; Chen and Cowan, 2005 ). We also increased the number of participants, from 48 to 64, to obtain higher statistical power. We did this following a power analysis via simulation using the R package simr ( Green and MacLeod, 2016 ). For the power analysis, we took the model of the total reading times for the verb region as a starting point. The simulations suggested that with 64 participants we would have over 80% power to detect a main effect of containment, assuming a true effect size of containment of Cohen's d = 0.15, i.e., somewhat smaller than the rule of thumb for a “small” effect size ( Sawilowsky, 2009 ). By doing this we aimed either to detect a small effect that we were not able to find in the previous experiment, or to replicate the pattern of results of Experiment 1 with greater confidence.

4.1. Participants

Sixty-four native speakers of German (ages 18–31, 39 female) with normal or corrected-to-normal vision were recruited and tested at the Humboldt-Universität zu Berlin. None of them had participated in Experiment 1. They gave their informed consent and received 8 euros as compensation upon finishing the experiment. Experiment 2 was covered by the ethics vote granted to the psycholinguistics lab of the Humboldt-Universität zu Berlin by the German Linguistic Society (Deutsche Gesellschaft für Sprachwissenschaft, DGfS).

4.2. Materials, Design, and Procedure

The materials, design, and procedure were identical to those in Experiment 1, except for the syntactic structure of the critical sentence, which now displayed a leftward movement of the subject clause. This allowed for the verb to appear as the fourth word in the sentence, making it temporally closer to the video prime. The structure of the sentences was as follows:

(4) Dass seine / Meinung TARGET NOUN / umgittert VERB / wurde nach dem Regimewechsel, war / schwierig ADJ / für den Redakteur .

“ That his / opinion TARGET NOUN / fenced-in VERB / was after the change in regime, was / difficult ADJ / for the journalist .”

“The fact that his opinion was fenced-in after the change in regime was difficult for the journalist.”

4.3. Predictions

Our predictions were motivated by the results of Experiment 1: If the absence of an effect of containment on the verb region was due to the temporal distance between verb and video, moving the verb closer to the video should correct this. Specifically, if priming physical containment facilitates processing of verbs of spatial containment used metaphorically, we should find shorter reading times in the match vs. mismatch conditions in the VERB region.

With regard to the question-answering times: The overall facilitation effect of match vs. mismatch in Experiment 1 was confounded with the type of response (“YES” for matches and “NO” for mismatches) in all but one relevant comparison: The noun-question/animation-prime conditions. We did not find a significant difference between the match and mismatch levels of these conditions. In Experiment 2 we hoped to replicate the question-answering pattern in general, and the results of the noun-question/animation-prime conditions in particular.

4.4. Results

4.4.1. Eye-Tracking

Results for all regions and measures are shown in Figures 7–9. The output of the statistical models can be seen in Tables 7–9.

Figure 7. Summary of results for the ADJ region, Experiment 2.

Figure 8. Summary of results for the NOUN region, Experiment 2.

Figure 9. Summary of results for the VERB region, Experiment 2.

Table 7. Regression analysis of reading times in the ADJECTIVE region of Experiment 2.

Table 8. Regression analysis of reading times in the NOUN region of Experiment 2.

Table 9. Regression analysis of reading times in the VERB region of Experiment 2.

4.4.1.1. Adjective

No significant effects of containment or of prime type were found in this region.

4.4.1.2. Noun

We replicated the main effect of prime type on all measures, with the mixed-prime conditions showing overall shorter reading times than the animation-prime conditions. This shows that our participants were in fact relating video to sentence, leading to a reliable priming effect.

4.4.1.3. Verb

We failed to find an effect of containment on any measure, as was the case in Experiment 1. There was also no effect of prime type and no significant interaction of containment and prime type.

4.4.2. Question-Response Times

Question-response times were analyzed in the same way as in Experiment 1. As can be seen in Figure 10, the results are very similar to those of Experiment 1. We replicated all previous findings with the exception of the main effect of containment: There was a main effect of question type and of prime type. There was an interaction between containment and question type and an interaction between question type and prime type. This model can be seen in Table 10.

Figure 10. Summary of results for the question response time, Experiment 2.

Table 10. Regression analysis of response-times in Experiment 2.

As in the previous experiment, we re-fitted the model using a treatment-contrast scheme in order to directly compare match and mismatch levels of the noun-question/animation-prime condition. This model showed no significant difference between these conditions, replicating the result found in Experiment 1 (see Table 12).
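
To make the difference between the two coding schemes concrete, the following minimal Python sketch computes the coefficients of a saturated 2 × 2 model (containment × prime type, noun questions only) directly from cell means. The cell means are invented for illustration, the ±0.5 sum coding of the main model is an assumption (the coding scheme is not restated in this section), and all function names are hypothetical. Under sum coding the containment coefficient averages over both prime types; under treatment coding with the animation prime as reference level it is the match vs. mismatch difference within the animation-prime condition only.

```python
# Invented cell means (log response times); NOT data from the study.
CELL_MEANS = {
    ("match", "animation"): 7.40,
    ("mismatch", "animation"): 7.40,
    ("match", "mixed"): 7.10,
    ("mismatch", "mixed"): 7.30,
}


def _cells(means):
    """Unpack the four cell means in a fixed order."""
    return (
        means[("match", "animation")],
        means[("mismatch", "animation")],
        means[("match", "mixed")],
        means[("mismatch", "mixed")],
    )


def sum_coded(means):
    """Saturated-model coefficients under +/-0.5 sum coding (assumed scheme)."""
    ma, xa, mm, xm = _cells(means)
    return {
        "intercept": (ma + xa + mm + xm) / 4,          # grand mean
        "containment": (xa + xm) / 2 - (ma + mm) / 2,  # main effect
        "prime": (mm + xm) / 2 - (ma + xa) / 2,        # main effect
        "interaction": (xm - mm) - (xa - ma),
    }


def treatment_coded(means):
    """Coefficients with 0/1 coding and match/animation as the reference cell."""
    ma, xa, mm, xm = _cells(means)
    return {
        "intercept": ma,         # mean of the reference cell
        "containment": xa - ma,  # simple effect within animation primes
        "prime": mm - ma,        # simple effect within match trials
        "interaction": (xm - mm) - (xa - ma),
    }


def predict(coefs, containment_code, prime_code):
    """Cell mean implied by a set of coefficients and predictor codes."""
    return (
        coefs["intercept"]
        + coefs["containment"] * containment_code
        + coefs["prime"] * prime_code
        + coefs["interaction"] * containment_code * prime_code
    )


if __name__ == "__main__":
    print("sum coding:", sum_coded(CELL_MEANS))
    print("treatment coding:", treatment_coded(CELL_MEANS))
```

With these invented values the sum-coded containment coefficient is non-zero while the treatment-coded one is zero, which is the kind of situation in which a simple-effect comparison is more informative than the main effect.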

4.5. Discussion

In Experiment 2 we tried to facilitate the interaction between video prime and metaphoric verb by increasing statistical power and decreasing the temporal distance between verb and video. We again failed to find an effect of containment in the verb region. Besides this, we replicated the effect of prime type on all measures in the noun region: Seeing the word opinion written on the ball in the video facilitated reading of that same word once it appeared in the sentence. This confirms that participants were able to use the information presented in the video to ease processing of the noun, yet were unable to use the feature of “containment” presented in the video to speed up (or slow down) reading times in the verb region. This suggests that during processing of the metaphoric verb, participants largely ignored the feature of physical containment, seeing as it neither interfered with nor facilitated processing. This is consistent with a category inclusion view of metaphor comprehension that states that literal features are not initially activated if they are not necessary for the construction of the appropriate ad hoc category during metaphor processing.

However, it could also be the case that the lack of effects in the verb region is caused by inadequate materials: Activating the feature of spatial containment could indeed facilitate or hinder processing, but our video primes were simply not able to activate this feature. It is thus necessary to assess whether these videos could modulate processing in an environment in which they would be expected to do so reliably, namely when the verbs are processed in their encoded, literal meaning only. If the videos facilitate access to the literal meaning of the verbs, the current interpretation of the results of Experiments 1 and 2 becomes more plausible. We addressed this issue in Experiment 4.

The results of the question response task broadly replicated the findings of Experiment 1. It was easier for participants to answer the question in the match vs. mismatch levels of the video-question conditions. In the noun-question conditions, there was an effect of prime type, with the animation-prime conditions showing slower response times than the mixed-prime conditions.

The noun-question/animation-prime conditions did not show a significant difference between match and mismatch levels, just as in Experiment 1. This finding is important because the noun-question/animation-prime conditions were the only ones without a confound between condition and correct answer.

As mentioned in the discussion of Experiment 1, these results could be interpreted as meaning that when reading the sentence, the conceptual feature of containment is activated, facilitating responses in the match vs. mismatch conditions and interfering with the responses in the noun-question/animation-prime conditions.

This interpretation, however, is contingent upon the assumption that the response patterns were caused by the interaction of processing video and metaphor and not by the YES/NO response confound or by other external factors. We sought to test this assumption in Experiment 3.

5. Experiment 3

Question-response times in Experiments 1 and 2 show an overall facilitation effect for match vs. mismatch conditions, except for the noun-question/animation-prime conditions, which showed no difference between match and mismatch levels. In Experiment 3, we set out to test whether these results were caused by the interaction of video, metaphor and question, or whether they could be explained by the interaction of video and question only. To do this, we ran a version of Experiment 2 in which the sentences read by participants did not contain any metaphors whatsoever: If the same pattern of results as in the previous two experiments is visible, it would suggest that the results are not related to the processing of verbal metaphors. Since we were not interested in the reading patterns of these sentences, but only in the question-response times, Experiment 3 was not run as an eye-tracking study. Instead, it was implemented as a self-paced reading reaction time task: Participants first watched the video-prime and then read the (non-metaphoric) sentence. When they were done reading, they pushed a button in front of them and were presented with the comprehension questions, which they answered by pushing either a YES or NO button. We measured only the response times to the comprehension questions.

5.1. Participants

Sixty-four native speakers of German (ages 18–31, 34 female) with normal or corrected-to-normal vision were recruited and tested at the Humboldt-Universität zu Berlin. None of them had participated in Experiments 1 or 2. They gave their informed consent and received 8 euros as compensation after completing the experiment. Experiment 3 was covered by the ethics vote granted to the psycholinguistics lab of the Humboldt-Universität zu Berlin by the German Linguistic Society (Deutsche Gesellschaft für Sprachwissenschaft, DGfS).

5.2. Materials and Design

To construct the materials in Experiment 3, we modified the sentences from Experiment 2 by replacing the verb with a non-metaphorical one that did not have the feature of spatial containment as part of its literal meaning, as presented in (5):

(5) “ Dass seine Meinung ignoriert wurde nach dem Regimewechsel, war für den Redakteur schwierig”

“The fact that his opinion was ignored after the change in regime was difficult for the journalist”

The design was identical to that of the previous experiments, with the factors containment, question type and prime type. The experiment was programmed using the open source software OpenSesame and was run on a PC. The only dependent measure in this experiment was question response time.

5.3. Procedure

Participants were instructed to wear noise-reducing headphones throughout the experiment to avoid being distracted by the other participants. Each trial consisted of three phases: First, participants saw the same animated video presented in Experiments 1 and 2. They then read a sentence and pressed the space bar on the keyboard in front of them. The sentence then disappeared and a question appeared on the screen. They had to answer this question by pressing either the letter F or J, which were counterbalanced across participants to stand for either YES or NO.

5.4. Predictions

Our predictions are derived from the results of Experiments 1 and 2: If we find the same pattern of results in Experiment 3 as in the previous two iterations, it would suggest that the results were not driven by the interaction of video, metaphor and question, but just by the interaction of video and question, given that there are no metaphors in Experiment 3. If we find a different pattern than this, it would suggest that the results found in Experiments 1 and 2 were (at least partially) caused by the way participants processed the verbal metaphors. In this sense, Experiment 3 serves as a baseline against which we can interpret the results of the question-response times of Experiments 1 and 2. Of particular interest are again the noun-question/animation-prime conditions: These are the only match/mismatch pair where both the question asked and the correct response remained constant.

5.5. Results

We fitted a linear mixed effects regression model to the log-transformed reaction times. We found main effects of containment, prime type, and question type. We also found significant two-way interactions of containment and question type, containment and prime type, and question type and prime type, as well as a significant three-way interaction of question type, prime type, and containment. The results are shown in Figure 11 and the model details are given in Table 11.

Figure 11. Summary of results for the question response time, Experiment 3.

Table 11. Regression analysis of response-times in Experiment 3.

Re-fitting the model with treatment contrasts, as we did for the previous experiments, showed a significant difference between match and mismatch levels of the noun-question/animation-prime conditions, with the mismatch condition showing significantly faster responses than the match condition. The details of this model are shown in Table 12.

Table 12. Model fitted with treatment-contrast coding for response times of Experiments 1–3.

5.6. Discussion

The pattern of results is very similar to that found in Experiments 1 and 2. This suggests that the response times found in those experiments were mostly modulated by factors independent of the metaphorical verb, since there was no metaphorical verb in Experiment 3. This supports the simple explanation that the response time results follow from a general response bias (it is easier to answer YES than NO, and easier to answer shorter than longer questions), and are not a product of metaphoric interpretation.

However, the results of the noun-question/animation-prime conditions require further explanation. In Experiment 3, the match vs. mismatch conditions were significantly different from one another, whereas in Experiments 1 and 2, no significant difference was found. It is thus likely that this difference between experiments is the only one that is contingent on the presence of the metaphorical sentences in Experiments 1 and 2: If in the absence of a metaphor there are shorter response times in the mismatch compared to the match level of the noun-question/animation-prime condition (our baseline result), then the lack of a difference between conditions in the presence of a metaphor (Experiments 1 and 2) could actually be interpreted as a facilitation effect of the match compared to the mismatch condition relative to the baseline result of Experiment 3.

This interpretation, as well as the interpretation of the results of the gaze record of Experiments 1 and 2, relies on the assumption that participants can indeed derive the conceptual feature of containment from our prime videos and that this feature interacts with the way the verbs are processed. Experiment 4 directly addresses this issue.
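
The baseline-relative reading of the noun-question/animation-prime result can be written as a difference of differences. The sketch below is only an arithmetic illustration: the response times are invented to mimic the qualitative pattern described above, and the function names are hypothetical. A negative value means that, relative to the metaphor-free baseline, the match condition was answered faster when a metaphor was present.

```python
def match_minus_mismatch(match_rt, mismatch_rt):
    """Raw containment effect: positive means match was answered more slowly."""
    return match_rt - mismatch_rt


def relative_match_effect(metaphor, baseline):
    """Containment effect with a metaphor minus the effect without one.

    Both arguments are (match_rt, mismatch_rt) pairs in milliseconds.
    """
    return match_minus_mismatch(*metaphor) - match_minus_mismatch(*baseline)


# Invented response times in ms; NOT data from the study.
baseline_no_metaphor = (1550.0, 1500.0)  # mismatch faster than match
with_metaphor = (1520.0, 1520.0)         # no match/mismatch difference

effect = relative_match_effect(with_metaphor, baseline_no_metaphor)
print(effect)  # -50.0
```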

6. Experiment 4

In this experiment we dealt with the question of whether or not the videos used in Experiments 1–3 can activate a mental representation of containment that leads participants to process verbs of physical containment more readily than when they first see a video that does not depict containment.

6.1. Participants

A sample of 259 German native speakers (ages 18–31, 120 female) were recruited online via the platform “clickworker.” They gave their informed consent and received 50 cents as compensation upon finishing the experiment. Experiment 4 was covered by the data protection policy of the Humboldt-Universität zu Berlin.

6.2. Materials and Design

Experiment 4 was a web-based lexical decision task in which participants saw the same video clips from Experiments 1–3 as primes and then read the same verbs from Experiments 1 and 2, which were presented here without context. The experiment thus only had the factor containment with the levels match and mismatch.

6.3. Procedure

The experiment was designed and run using an instance of the Ibex Farm (created by Alex Drummond) coupled with the PennController extension (Zehr and Schwarz, 2018), which allows for a simple integration of video and linguistic stimuli. On each trial, participants first saw a video prime and then a target word in the middle of the screen, and had a total of 5 s to decide whether the word was a real word by either pressing F (“not a real word”) or J (“real word”). After one practice item, participants were presented with six experimental trials (two critical, four fillers). There was a 1 s pause in-between trials. One experimental session lasted around 4–5 min.

6.4. Predictions

If the video in the “match” condition is not capable of eliciting a mental representation of “containment” that can aid lexical recognition of verbs of physical containment, there should be no difference in reaction times between conditions. If, on the other hand, the video in the “match” condition is indeed capable of eliciting a mental representation of “containment” that can ease lexical recognition of verbs of physical containment, we expect shorter reaction times in the match condition compared to the mismatch condition.

6.5. Analysis and Results

Prior to the analysis, participants who got <4/6 correct responses were excluded (n = 9), leaving the total number of participants at 250. Reaction times were log-transformed following the results of a Box-Cox test (Box and Cox, 1964).
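
As a rough illustration of how a Box-Cox check can motivate a log transformation, the sketch below evaluates the Box-Cox profile log-likelihood over a grid of λ values for simulated reaction times and picks the maximizing λ; a value near 0 corresponds to the log transform. This is a simplified, intercept-only version written with the Python standard library, not the analysis script used for the experiment: the simulated data, the grid, and the function names are all assumptions.

```python
import math
import random


def boxcox_loglik(y, lam):
    """Box-Cox profile log-likelihood for an intercept-only model."""
    n = len(y)
    log_y = [math.log(v) for v in y]
    if abs(lam) < 1e-12:
        z = log_y  # lambda = 0 is the log transform
    else:
        z = [(v ** lam - 1.0) / lam for v in y]
    mean_z = sum(z) / n
    var_z = sum((v - mean_z) ** 2 for v in z) / n  # ML variance estimate
    return -0.5 * n * math.log(var_z) + (lam - 1.0) * sum(log_y)


def best_lambda(y, grid):
    """Grid value of lambda with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: boxcox_loglik(y, lam))


# Simulated, right-skewed reaction times in ms (log-normal by construction).
rng = random.Random(2020)
rts = [rng.lognormvariate(6.5, 0.5) for _ in range(2000)]

grid = [round(-2.0 + 0.1 * i, 1) for i in range(41)]  # -2.0, -1.9, ..., 2.0
lam_hat = best_lambda(rts, grid)
print("lambda maximizing the profile log-likelihood:", lam_hat)
```

Because the simulated data are log-normal, the selected λ should land close to 0; with real reaction times the same check would be run on the fitted model rather than on raw values.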

A linear mixed effects model was then fitted to the data. The results showed a significant difference between the two conditions, with the match condition displaying shorter reaction times compared to the mismatch condition. The effect size had a value of Cohen's d = 0.21 (i.e., a “small” effect size according to Cohen, 1992). The results are presented in Figure 12 and the model summary in Table 13.
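
For orientation, the classic two-group definition of Cohen's d divides the difference between condition means by the pooled standard deviation. The article does not state how d was derived from the mixed model, so the sketch below shows only this textbook formula; the function name and the toy values are illustrative assumptions.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Toy values, NOT study data: the second group is shifted down by one unit.
mismatch = [2.0, 3.0, 4.0, 5.0, 6.0]
match = [1.0, 2.0, 3.0, 4.0, 5.0]
print(round(cohens_d(mismatch, match), 3))  # 0.632
```

By the conventions in Cohen (1992), values around 0.2 count as small, 0.5 as medium, and 0.8 as large, which is why d = 0.21 is labeled a small effect.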

Figure 12. Summary of results for the lexical decision task, Experiment 4.

Table 13. Regression analysis of response-times in Experiment 4.

6.6. Discussion

Experiment 4 showed that the video-clip primes used in Experiments 1–3 facilitated the retrieval of the encoded, literal meaning of different verbs of physical containment. This finding suggests that participants were able to derive the conceptual feature of physical containment from the videos in the match conditions, since this is the key feature we believe the videos share with the verbs.

7. General Discussion and Conclusion

Theories of metaphor processing make different predictions regarding the role of conceptual features related only to the literal meaning during and immediately after processing of (novel) metaphors. Category inclusion views hold that these literal features should not play a role during processing and might even hinder comprehension (McGlone and Manfredi, 2001). Furthermore, they should be rapidly suppressed after the metaphor has been understood (Gernsbacher et al., 2001; Rubio Fernandez, 2007). Indirect comparison views, instead, claim that features related to the literal meaning of a metaphor are initially active. This is caused by an alignment stage in which encoded meanings are fully retrieved prior to the projection of inferences (Gentner et al., 2001; Bowdle and Gentner, 2005, i.a.). This means that literal features should facilitate early stages of processing, as shown by Weiland et al. (2014), and can remain active after comprehension, easing understanding of subsequent, related novel, or conventional metaphors (Thibodeau and Durgin, 2008).

In our investigation, we looked at how priming the conceptual feature of spatial containment would interact with the processing of verbal metaphors in which physical containment is a crucial part of the literal meaning but (arguably) not of the metaphoric interpretation. The results of two eye-tracking experiments showed that the videos neither facilitated nor hindered processing of the verbs used (e.g., fenced-in), regardless of whether the verb appeared late or early in the sentence (Experiments 1 and 2, respectively). This absence of an effect was accompanied by a reliable priming effect of the noun that appeared in both video and sentence, suggesting that participants were actively integrating the input of the video with the input of the sentence. Furthermore, we showed that the videos did elicit a priming effect on those same verbs in a de-contextualized lexical decision task (Experiment 4).

Data from the question-response times showed that participants were overall faster answering questions in the match vs. mismatch conditions. They were also overall slower to answer questions about the interaction between video and sentence (Was the opinion in the box?) than about just the video. Since these effects were present in both the experiments with a metaphoric verb (Experiments 1 and 2) and our baseline experiment without a metaphoric verb (Experiment 3), they do not tell us much about how the metaphors interacted with video and question type during processing. However, in the absence of a metaphor (Experiment 3), participants were significantly faster at correctly answering the question in the noun-question/animation-prime mismatch condition (Was the opinion in the box? when there was no word written on the ball and the ball bounced freely) compared to the noun-question/animation-prime match condition (Was the opinion in the box? when there was no word written on the ball and the ball was trapped by the box). In Experiments 1 and 2, there was no difference between these conditions. This suggests that in the presence of a metaphor there could be a facilitation effect of the match compared to the mismatch noun-question/animation-prime conditions, which might mean that the metaphor itself activated the feature of spatial containment which later facilitated response times to the post-sentence questions. However, the evidence for this is very tenuous since the overall question-response pattern in all three experiments was similar.

We interpret the data as showing that the feature of physical containment is ignored during comprehension of novel verbal metaphors of containment and neither facilitates nor hinders processing. Failing to find a significant difference between conditions is not equivalent to finding that there is no difference between them. However, given the results of Experiment 4 and the fact that in Experiments 1 and 2 there was a significant effect of prime type (showing that some aspects of the prime were indeed integrated with the sentence), we believe that the absence of an effect of containment in Experiments 1 and 2 can be interpreted as meaningful.

We see this as being in line with a metaphor processing view that does not ascribe an important role to literal features of the metaphoric vehicle during initial stages of processing. Such is the case of category membership views (Glucksberg, 2001; Sperber and Wilson, 2008), which claim that the meaning of the vehicle is quickly modulated given the dimensions provided by the topic. In this process, features of the literal meaning that are not compatible with the dimensions provided by the topic do not need to be activated. However, pre-activating these features does not interfere with the lexical modulation of the metaphoric vehicle either.

It is important to note that the goal of the current set of studies was to investigate novel verbal metaphors only. Given that other factors, such as conventionality (Bowdle and Gentner, 2005), aptness (Jones and Estes, 2006), and familiarity (Thibodeau and Durgin, 2011) can modulate metaphor processing, it would be interesting to observe whether the current results would hold when examining metaphors that varied along those three dimensions. We leave this specific point for future research to examine.

Furthermore, it could be that metaphor processing varies according to syntactic class such that nominal metaphors are processed differently than verbal metaphors. This would mean that nominal metaphors could be understood via indirect comparison (following Gentner and Bowdle, 2008) and verbal metaphors via lexical modulation (as posited by category inclusion views). However, neuroimaging evidence suggests that the mechanisms for different types of metaphors might be the same. Cardillo et al. (2012) investigated processing of both nominal and verbal metaphors using functional magnetic resonance imaging. Their results show that the neural processes associated with both of these types of metaphors do not differ significantly, suggesting that the underlying cognitive mechanisms are likely the same. We therefore believe that our results may generalize beyond the case of verbal metaphors.

In terms of how our results relate to the literature on the interaction between language and the visual world we can draw the following conclusions: Guerra and Knoeferle (2014) found a facilitation effect of visual primes of distance on processing of semantic similarity. They argued that this was indicative of an abstract co-indexing link between distance and similarity. In Experiments 1 and 2 of the current investigation we failed to find such a link between videos of containment and adjectives of difficulty. It could be the case that these co-indexing links are constructed and stored in memory via repeated, conventional use: Perhaps speaking of semantic similarity in terms of distance is a more common occurrence than speaking of difficulty in terms of containment, leading to facilitation effects in the former but not in the latter case.

In a production study, Sato et al. (2015) found a priming effect of metaphors of difficulty after participants saw images of physical containment, an effect which we failed to find in the present language comprehension study. This difference in results could be explained by a difference in conventionality of the types of metaphors used: Sato et al. (2015) counted the production of spatial prepositions, such as in and out (e.g., Bobbie fell in love working in the potato factory ) and of idiomatic expressions ( Nick said time is full of shit ) as instances of a containment-as-difficulty metaphor. These types of conventional, “fossilized” metaphoric expressions are likely to be processed differently than novel metaphors ( Keysar et al., 2000 ; Bowdle and Gentner, 2005 ) making the results difficult to compare, given that the materials in our study were all novel verbal metaphors (It is not clear whether participants in the study by Sato and collaborators even produced any novel metaphors at all).

There are some caveats with our interpretation of the results: First, in Experiment 4 each participant saw only two critical items, whereas in Experiments 1 and 2 participants saw the full set of 36 items. It could therefore be the case that repeated exposure to the video primes interfered with an underlying true priming effect that our experimental set-up in Experiments 1 and 2 could not detect. To assess this possibility we conducted post-hoc analyses examining the pattern of results of Experiments 1 and 2 in the first third of the experiment (i.e., the first 36 trials). These showed the exact same pattern found for the entire experiment (i.e., no effect of video prime on reading measures). It is thus not likely that a repetition effect is solely responsible for the differences in effect found between Experiments 1, 2, and 4.

It is also possible that the lack of an effect was due to the verbs being embedded in a sentence, regardless of whether the context encourages a literal or metaphoric interpretation of the verb. This is unlikely, considering that in Experiment 2 the video prime and the verb were almost as temporally adjacent as in Experiment 4, but it cannot be ruled out completely. Further research is necessary in order to determine the exact nature of the prime-verb relation and the different contexts under which a priming effect could arise. We nevertheless see our set of experiments as a step forward in understanding how metaphors are processed outside of the narrow realm of nominal metaphors.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics Statement

The studies involving human participants were reviewed and approved by the Ethics committee of the Humboldt-Universität zu Berlin. The participants provided their written informed consent to participate in this study.

Author Contributions

CR, EG, and PK planned and designed the experiments, edited the paper, and produced the final version. CR conducted the experiments and wrote the first draft of the paper. CR and EG analyzed the data. All authors contributed to the article and approved the submitted version.

This work was supported by ANID/PIA/Basal Funds for Centers of Excellence FB0003 (EG) and FONDECYT individual grant 11171074 (EG) from ANID (National Agency for Research and Development, Government of Chile).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2020.556624/full#supplementary-material

Video 1. Example prime video for a critical item in the containment-mismatch conditions used in Experiments 1–4 (no text was written on the ball in Experiments 1 and 4).

Video 2. Example prime video for a critical item in the containment-match conditions used in Experiments 1–4 (no text was written on the ball in Experiments 1 and 4).

1. Whether aptness or conventionality modulates the processing route is still a matter of debate and outside of the scope of the current investigation, which focuses on novel verbal metaphors only. For in-depth discussions on the role these factors might play during processing see Gentner and Bowdle (2008), Glucksberg (2008), Holyoak and Stamenković (2018), and Pouscoulous and Dulcinati (2019).

Bambini, V., Canal, P., Resta, D., and Grimaldi, M. (2019). Time course and neurophysiological underpinnings of metaphor in literary context. Discour. Process. 56, 77–97. doi: 10.1080/0163853X.2017.1401876

Barr, D. J., Levy, R., Scheepers, C., and Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: keep it maximal. J. Mem. Lang. 68, 255–278. doi: 10.1016/j.jml.2012.11.001

Barsalou, L. W. (1983). Ad hoc categories. Mem. Cogn. 11, 211–227. doi: 10.3758/BF03196968

Blasko, D. G., and Connine, C. M. (1993). Effects of familiarity and aptness on metaphor processing. J. Exp. Psychol. Learn. Mem. Cogn. 19, 295–308. doi: 10.1037/0278-7393.19.2.295

Bowdle, B., and Gentner, D. (2005). The career of metaphor. Psychol. Rev. 112, 193–216. doi: 10.1037/0033-295X.112.1.193

Box, G. E. P., and Cox, D. R. (1964). An analysis of transformations. J. R. Stat. Soc. B Methodol. 26, 211–252.

Cardillo, E. R., Watson, C. E., Schmidt, G. L., Kranjec, A., and Chatterjee, A. (2012). From novel to familiar: tuning the brain for metaphors. Neuroimage 59, 3212–3221. doi: 10.1016/j.neuroimage.2011.11.079

Carston, R. (2010). XIII-metaphor: ad hoc concepts, literal meaning and mental images. Proc. Aristot. Soc. 110, 295–321. doi: 10.1111/j.1467-9264.2010.00288.x

Chen, Z., and Cowan, N. (2005). Chunk limits and length limits in immediate recall: a reconciliation. J. Exp. Psychol. Learn. Mem. Cogn. 31, 1235–1249. doi: 10.1037/0278-7393.31.6.1235

Clifton, C., Staub, A., and Rayner, K. (2007). “Eye movements in reading words and sentences,” in Eye Movements: A Window on Mind and Brain, eds R. van Gompel, M. H. Fischer, W. S. Murray, and R. L. Hill (Oxford: Elsevier), 341–372. doi: 10.1016/B978-008044980-7/50017-3

Cohen, J. (1992). A power primer. Psychol. Bull. 112:155.

Cooper, R. M. (1974). The control of eye fixation by the meaning of spoken language: a new methodology for the real-time investigation of speech perception, memory, and language processing. Cogn. Psychol. 6, 84–107. doi: 10.1016/0010-0285(74)90005-X

Coulson, S., and Oakley, T. (2005). Blending and coded meaning: literal and figurative meaning in cognitive semantics. J. Pragmat. 37, 1510–1536. doi: 10.1016/j.pragma.2004.09.010

Gelman, A., and Hill, J. (2006). Data Analysis Using Regression And Multilevel/Hierarchical Models, Vol. 3. Cambridge: Cambridge University Press.

Gentner, D., and Boronat, C. (1992). “Metaphor as mapping,” in Workshop on Metaphor (Tel Aviv).

Gentner, D., and Bowdle, B. (2008). “Metaphor as structure-mapping,” in The Cambridge Handbook of Metaphor and Thought, ed R. Gibbs (New York, NY: Cambridge University Press), 109–128. doi: 10.1017/CBO9780511816802.008

Gentner, D., Bowdle, B., and Wolff, P. (2001). “Metaphor is like analogy,” in The Analogical Mind: Theory and Phenomena, eds D. Gentner, K. J. Holyoak, and B. Kokinov (Cambridge, MA: MIT Press), 199–253.

Gentner, D., and Holyoak, K. J. (1997). Reasoning and learning by analogy: introduction. Am. Psychol. 52, 32–34. doi: 10.1037/0003-066X.52.1.32

Gernsbacher, M. A., Keysar, B., Robertson, R. R. W., and Werner, N. K. (2001). The role of suppression and enhancement in understanding metaphors. J. Mem. Lang. 45, 433–450. doi: 10.1006/jmla.2000.2782

Glucksberg, S. (2001). Understanding Figurative Language. Oxford: Oxford University Press.

Glucksberg, S. (2003). The psycholinguistics of metaphor. Trends Cogn. Sci. 7, 92–96. doi: 10.1016/S1364-6613(02)00040-2

Glucksberg, S. (2008). “How metaphors create categories–quickly,” in The Cambridge Handbook of Metaphor and Thought, ed R. Gibbs (New York, NY: Cambridge University Press), 67–83. doi: 10.1017/CBO9780511816802.006

Green, P., and MacLeod, C. J. (2016). SIMR: an R package for power analysis of generalized linear mixed models by simulation. Methods Ecol. Evol. 7, 493–498. doi: 10.1111/2041-210X.12504

Guerra, E., and Knoeferle, P. (2014). Spatial distance effects on incremental semantic interpretation of abstract sentences: evidence from eye tracking. Cognition 133, 535–552. doi: 10.1016/j.cognition.2014.07.007

Guerra, E., and Knoeferle, P. (2017). Visually perceived spatial distance affects the interpretation of linguistically mediated social meaning during online language comprehension: an eye tracking reading study. J. Mem. Lang. 92, 43–56. doi: 10.1016/j.jml.2016.05.004

Holyoak, K. J., and Stamenković, D. (2018). Metaphor comprehension: a critical review of theories and evidence. Psychol. Bull. 144, 641–671. doi: 10.1037/bul0000145

Huettig, F., Rommers, J., and Meyer, A. S. (2011). Using the visual world paradigm to study language processing: a review and critical evaluation. Acta Psychol. 137, 151–171. doi: 10.1016/j.actpsy.2010.11.003

Jones, L., and Estes, Z. (2006). Roosters, robins, and alarm clocks: aptness and conventionality in metaphor comprehension. J. Mem. Lang. 55, 18–32. doi: 10.1016/j.jml.2006.02.004

Keysar, B., Shen, Y., Glucksberg, S., and Horton, W. S. (2000). Conventional language: how metaphorical is it? J. Mem. Lang. 43, 576–593. doi: 10.1006/jmla.2000.2711

Knoeferle, P., and Guerra, E. (2016). Visually situated language comprehension. Lang. Linguist. Compass 10, 66–82. doi: 10.1111/lnc3.12177

Lakoff, G., and Johnson, M. (2008). Metaphors We Live By. Chicago, IL: University of Chicago Press.

McGlone, M. S., and Manfredi, D. A. (2001). Topic-vehicle interaction in metaphor comprehension. Mem. Cogn. 29, 1209–1219. doi: 10.3758/BF03206390

Miller, G. A. (1956). The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol. Rev. 63:81.

Pouscoulous, N., and Dulcinati, G. (2019). “Metaphor,” in Oxford Handbook of Experimental Semantics and Pragmatics, eds C. Cummins and N. Katsos (Oxford: Oxford University Press), 298–315. doi: 10.1093/oxfordhb/9780198791768.013.19

Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychol. Bull. 124:372.

Rayner, K. (2009). The 35th Sir Frederick Bartlett lecture: eye movements and attention in reading, scene perception, and visual search. Q. J. Exp. Psychol. 62, 1457–1506. doi: 10.1080/17470210902816461

Rubio-Fernández, P. (2007). Suppression in metaphor interpretation: differences between meaning selection and meaning construction. J. Semant. 24, 345–371. doi: 10.1093/jos/ffm006

Rubio-Fernández, P., Cummins, C., and Tian, Y. (2016). Are single and extended metaphors processed differently? A test of two relevance-theoretic accounts. J. Pragmat. 94, 15–28. doi: 10.1016/j.pragma.2016.01.005

Sato, M., Schafer, A. J., and Bergen, B. K. (2015). Metaphor priming in sentence production: concrete pictures affect abstract language production. Acta Psychol. 156, 136–142. doi: 10.1016/j.actpsy.2014.09.010

Sawilowsky, S. S. (2009). New effect size rules of thumb. J. Mod. Appl. Stat. Methods 8:26. doi: 10.22237/jmasm/1257035100

Sperber, D., and Wilson, D. (2008). “A deflationary account of metaphors,” in The Cambridge Handbook of Metaphor and Thought, ed R. Gibbs (New York, NY: Cambridge University Press), 84–105. doi: 10.1017/CBO9780511816802.007

Thibodeau, P., and Durgin, F. (2008). Productive figurative communication: conventional metaphors facilitate the comprehension of related novel metaphors. J. Mem. Lang. 58, 521–540. doi: 10.1016/j.jml.2007.05.001

Thibodeau, P., and Durgin, F. (2011). Metaphor aptness and conventionality: a processing fluency account. Metaphor Symbol 26, 206–226. doi: 10.1080/10926488.2011.583196

Vasishth, S., von der Malsburg, T., and Engelmann, F. (2013). What eye movements can tell us about sentence comprehension: eye movements and sentence comprehension. Wiley Interdiscipl. Rev. Cogn. Sci. 4, 125–134. doi: 10.1002/wcs.1209

von der Malsburg, T., and Angele, B. (2017). False positives and other statistical errors in standard analyses of eye movements in reading. J. Mem. Lang. 94, 119–133. doi: 10.1016/j.jml.2016.10.003

Weiland, H., Bambini, V., and Schumacher, P. B. (2014). The role of literal meaning in figurative language comprehension: evidence from masked priming ERP. Front. Hum. Neurosci. 8:583. doi: 10.3389/fnhum.2014.00583

Wolff, P., and Gentner, D. (2000). Evidence for role-neutral initial processing of metaphors. J. Exp. Psychol. Learn. Mem. Cogn. 26, 529–541. doi: 10.1037//0278-7393.26.2.529

Wolff, P., and Gentner, D. (2011). Structure-mapping in metaphor comprehension. Cogn. Sci. 35, 1456–1488. doi: 10.1111/j.1551-6709.2011.01194.x

Zehr, J., and Schwarz, F. (2018). PennController for Internet Based Experiments (IBEX). OSF.

Keywords: verbal metaphors, eye-tracking, experimental pragmatics, figurative language comprehension, metaphor processing

Citation: Ronderos CR, Guerra E and Knoeferle P (2021) The Role of Literal Features During Processing of Novel Verbal Metaphors. Front. Psychol. 11:556624. doi: 10.3389/fpsyg.2020.556624

Received: 07 May 2020; Accepted: 15 December 2020; Published: 26 January 2021.


Copyright © 2021 Ronderos, Guerra and Knoeferle. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Camilo R. Ronderos, Y2FtaWxvLnJvZHJpZ3Vlei5yb25kZXJvcyYjeDAwMDQwO2dtYWlsLmNvbQ==

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.


Activation of Literal Word Meanings in Idioms: Evidence from Eye-tracking and ERP Experiments

Ruth Kessler, Andrea Weber, Claudia K. Friedrich


Ruth Kessler, Developmental Psychology, University of Tübingen, Schleichstraße 4, Tübingen, D-72076, Germany. Email: [email protected]

Issue date 2021 Sep.

This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License ( https://creativecommons.org/licenses/by-nc/4.0/ ) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page ( https://us.sagepub.com/en-us/nam/open-access-at-sage ).

How the language processing system handles formulaic language such as idioms is a matter of debate. We investigated the activation of constituent meanings by means of predictive processing in an eye-tracking experiment and in two ERP experiments (auditory and visual). In the eye-tracking experiment, German-speaking participants listened to idioms in which the final word was excised (Hannes let the cat out of the . . .). Well before the offset of these idiom fragments, participants fixated on the correct idiom completion (bag) more often than on unrelated distractors (stomach). Moreover, there was an early fixation bias towards semantic associates (basket) of the correct completion, which ended shortly after the offset of the fragment. In the ERP experiments, sentences (spoken or written) either contained complete idioms, or the final word of the idiom was replaced with a semantic associate or with an unrelated word. Across both modalities, ERPs reflected facilitated processing of correct completions across several regions of interest (ROIs) and time windows. Facilitation of semantic associates was only reliably evident in early components for auditory idiom processing. The ERP findings for spoken idioms complement the eye-tracking data by pointing to early decompositional processing of idioms. It seems that in spoken idiom processing, holistic representations do not solely determine lexical processing.

Keywords: Idioms, eye-tracking, ERP, online processing

1 Introduction

There is an open debate in psycholinguistic research on whether and how formulaic sequences or multi-word expressions, as for example in collocations ( black coffee ), phrasal verbs ( dig into something ), or idioms ( kick the bucket ), are stored in the mental lexicon (for a review, see Conklin & Schmitt, 2012 ). In some accounts, the linguistic system is assumed to store formulaic sequences as larger units and to process them holistically (e.g., Jackendoff, 2002 ; Swinney & Cutler, 1979 ; Wray, 2005 ). According to this account, formulaic sequences have their own lexical entry comparable to “long words.” More recently, other accounts emphasize the internal syntactic and semantic structure of these multi-word expressions (e.g., Kyriacou, Conklin, & Thompson, 2020 ; Mancuso et al., 2020 ; Marantz, 2005 ; Snider & Arnon, 2012 ; Sprenger, Levelt, & Kempen, 2006 ; Tremblay & Baayen, 2010 ). While parsers are indeed sensitive to phrase frequencies, they access representations of all individual constituents in a phrase simultaneously ( Arnon & Christiansen, 2017 ). According to these accounts, single constituents within multi-word units can be accessed separately.

In order to capture the hybrid nature of multi-word sequences, accounts of idiom processing have been proposed in which the structural properties of an idiom are preserved, while its meaning and form are also stored holistically. For example, the Configuration Hypothesis by Cacciari and Tabossi (1988) assumes that idioms are processed like novel, literal language, but only until the parser recognizes a phrase as an idiom. After this “idiom key,” the parser directly retrieves the idiom configuration and associated meaning from the mental lexicon. According to a multidetermined view of idiom processing, factors such as familiarity or literal plausibility in addition to predictability determine the time point of recognition ( Libben & Titone, 2008 ; Titone et al., 2019 ). Thus it is not surprising that in highly predictable idioms the recognition of the phrase can occur prior to the final word ( Cacciari & Corradini, 2015 ). The Superlemma Hypothesis for speech production states that single word meanings within idiomatic expressions are necessarily activated ( Sprenger et al., 2006 ). According to this view, idioms are accessible as both individual words ( simple lemmas ) and lexical units ( superlemmas ). In the present study, we investigated the neurocognitive reality of the representation of formulaic language in the mental lexicon by tracing the temporal dynamics of online activation of idiom constituent meaning.

The assumption of holistic processing is typically backed up by empirical evidence of greater processing ease for formulaic than for comparable non-formulaic language (e.g., Conklin & Schmitt, 2008 ; Gibbs, 1980 ; Siyanova-Chanturia, Conklin, & Schmitt, 2011 ; Strandburg et al., 1993 ; Swinney & Cutler, 1979 ; Tabossi, Fanari, & Wolf, 2009 ; Tremblay et al., 2011 ; Underwood, Schmitt, & Galpin, 2004 ). Several studies have found, for example, that participants read fixed multi-word expressions faster than novel phrases (e.g., Conklin & Schmitt, 2008 ; Tremblay et al., 2011 ), and that they fixate on words in idioms less extensively than on words in control sentences ( Siyanova-Chanturia et al., 2011 ; Underwood et al., 2004 ). Based on holistic processing accounts, it has been argued that formulaic sequences are retrieved faster from the semantic memory than novel controls because there is no need for the parser to access single word meanings.

However, processing advantages might not originate exclusively from a purely holistic representation of formulaic phrases. They might also emerge from phrase frequency, predictive mechanisms for frequently co-occurring constituents, or phrase familiarity (e.g., Carrol & Conklin, 2020 ). For example, Canal et al. (2010) propose that predictive mechanisms within idioms are based on the knowledge of their specific lexical form in the mental lexicon and these predictive mechanisms might differ qualitatively from predictions within literal, non-formulaic expressions. Arguably, more direct evidence for holistic representations would show that the linguistic system does not process single words and their meanings separately, but receives multi-word expressions unanalyzed from the mental lexicon (see the discussion in, e.g., Siyanova-Chanturia, 2015 ).

Idioms are well suited for investigating holistic processing versus decomposition into single constituents of formulaic expressions. In many idioms, the figurative meaning cannot be inferred from the compositional meaning of the constituent words. For example, the figurative meaning of to let the cat out of the bag (to reveal a secret unintentionally) is not derived from the meaning of the single noun constituents ( bag and cat ) or from their combination with the verb ( to let ). Therefore, evidence that such multi-word idioms are obligatorily decomposed into their single constituents would strongly speak against a model assuming solely holistic processing of idioms not allowing access to single words. One way to test whether single constituents within idioms are processed individually is to measure the activation of semantic associates ( basket ) of these constituents ( bag ). Because, in general, activation of a word in the mental lexicon will spread to semantically related words ( Collins & Loftus, 1975 ), activation of semantic associates within idioms would indicate that the parser processes individual constituents.

Following the approach of spreading semantic activation, priming and word production studies have indeed shown that parsers have single word meanings available quickly during idiom processing (e.g., Beck & Weber, 2016 ; Smolka, Rabanus, & Rösler, 2007 ; Sprenger et al., 2006 ; van Ginkel & Dijkstra, 2019 ). In these studies, participants typically first read idioms ( Rabanus et al., 2008 ; Smolka et al., 2007 ; van Ginkel & Dijkstra, 2019 ) or listened to idioms ( Beck & Weber, 2016 ), such as to pull someone’s leg (meaning “to spoof someone”), and subsequently performed a lexical decision task on immediately following written target words. Across these studies, participants responded faster to targets that were semantically related to the literal meaning of a constituent word (e.g., walk ) compared to unrelated targets. In two experiments conducted by Sprenger and colleagues (2006 , Experiment 2 and Experiment 3), participants read idiom fragments (e.g., Jan liep tegen de [lamp] , literally translated: Jan walked against the [lamp] , meaning “to get caught” in Dutch) and were asked to complete the idiom by speaking aloud the final, missing noun (e.g., lamp ). Both experiments tested whether participants have semantic associates of idiom-final words (e.g., candle ) available while they prepare their responses. In Experiment 2, participants received a spoken prime while they prepared their response. Semantic associates facilitated participants’ responses compared to unrelated primes. In Experiment 3, participants were prompted to produce the idiom-final word when a question mark appeared on the screen. However, when another word appeared on the screen instead of the question mark, they had to switch the task and produce that word. In this production task, participants responded faster to semantic associates of the idiom-final constituent compared to unrelated probes.

Evidence for spreading semantic activation originating from single idiom constituents was also found in an eye-tracking study by Holsinger (2013) . Participants listened to idiomatic phrases ( hit the hay ) while they saw four printed words on the screen, including an associate of a constituent word ( barn ). Fixations showed that participants considered the semantic associate more often than they considered unrelated distractors. Together, priming and eye-tracking results are in line with accounts assuming that the parser has idiom internal structures available (e.g., Marantz, 2005 ; Snider & Arnon, 2012 ; Sprenger et al., 2006 ; Tremblay & Baayen, 2010 ).

In contrast to priming and eye-tracking work, data from an event-related potentials (ERP) study found no apparent involvement of single word meanings during idiom processing ( Rommers, Dijkstra, & Bastiaansen, 2013 ). In this experiment participants read highly predictable Dutch idioms (e.g., literally translated to walk against the lamp ). In a related condition, a semantic associate replaced the idiom’s final noun ( candle ), and in an unrelated condition, an unrelated word replaced the final noun ( fish ). Semantic associates of idiom-final nouns did not elicit different ERPs than completely unrelated words did (see Experiment 2, for further discussion of the specific ERP effects elicited in this study). Rommers and colleagues argued that participants did not form semantic predictions of idiom-final constituents. Results rather indicated holistic processing of idioms, as would be suggested by representational accounts viewing idioms as “large words” ( Jackendoff, 2002 ) or “lexical items” ( Swinney & Cutler, 1979 ), which are processed as a whole.

Design-related differences (such as modality, paradigm, and idiom characteristics) in previous studies might account for the mixed results regarding the processing of idiom constituents. For example, the modality in which idioms were presented differed between experiments and this comes with different amounts of linguistic information available to participants at any given point in time. While connected spoken language makes single words only sequentially available (as they are unfolding over time), written language makes complete words or phrases available at once. Using spoken idioms combined with written probes, Beck and Weber (2016) and Holsinger (2013) found semantic activation of single idiom constituents. Other studies presented idioms and probes visually, either phrase-wise ( Sprenger et al., 2006 , Experiment 3) or word-by-word ( Rabanus et al., 2008 ; Rommers et al., 2013 ; Smolka et al., 2007 ). The experiments by Sprenger et al. (2006 , Experiments 2 and 3) using phrase-wise presentation, where the whole idiom fragment was available at once, revealed semantic activation of the idiom constituent. Results were mixed for experiments using word-by-word presentation in a rapid serial sequence ( Rabanus et al., 2008 ; Rommers et al., 2013 ; Smolka et al., 2007 ). Clearly, the time course of word recognition and semantic activation might differ depending on the amount of linguistic information available at a certain point in time (e.g., Anderson & Holcomb, 1995 ; Van Petten et al., 1999 ) and this might play a role in processing differences found across different studies.

Different experimental paradigms could also relate to different results. In most studies that support decomposition of idioms ( Beck & Weber, 2016 ; Holsinger, 2013 ; Rabanus et al., 2008 ; Smolka et al., 2007 ), activation of semantically related words might have resulted from bottom-up spreading activation, due to the critical idiom constituent being actually presented. For example, the eye-tracking study by Holsinger (2013) reported biased eye movements towards semantic associates ( barn ) shortly after the participants heard the critical idiom constituent ( hay ) as part of the idiom. Similarly, the critical constituent was part of the primes in priming studies showing semantic activation ( Rabanus et al., 2008 ; Smolka et al., 2007 ; van Ginkel & Dijkstra, 2019 ). In these studies, the critical idiom constituent might have briefly activated semantic associates in a bottom-up fashion without the idiom representation being involved. In contrast, participants were not presented with the critical idiom constituent ( lamp ) in the ERP study by Rommers et al. (2013) , which did not find evidence for activation of semantic associates ( candle ). According to the authors of the latter study, the prediction of the correct idiom-final word might not be sufficient to activate single word meanings within idioms and, thus, no processing benefit for semantically related words was found. However, while critical idiom constituents were also not presented in the production study by Sprenger et al. (2006 , Experiments 2 and 3), these authors did find that facilitation of semantically related words was induced merely by planning to produce the idiom-final constituent.

Finally, experiments differed in terms of idiom characteristics such as predictability. Depending on the amount of given linguistic constraints, individual idioms can be recognized prior to their last constituent ( Libben & Titone, 2008 ). Earlier versus later activation of the idiomatic form might result in higher versus lower predictability of the idiom-final word ( Canal et al., 2010 ). According to the Configuration Hypothesis ( Cacciari & Tabossi, 1988 ), predictability might affect the activation of literal constituent meanings. Since in highly predictable idioms the idiom key should be well before the final constituent, literal activation of the latter would be less likely. Nevertheless, Rabanus et al. (2008) , Rommers et al. (2013) , Smolka et al. (2007) , and Sprenger et al. (2006) measured lexical activation of highly predicted, idiom-final constituents and came to different conclusions. Taken together, different idioms used across different studies render comparisons of results obtained with different paradigms and presentation modalities difficult.

In the present study, we targeted the previously obtained inconsistencies regarding literal meaning activation of single idiom constituents. To this end, (a) we varied presentation modality by presenting idioms and probes cross-modally (Experiment 1), auditorily (Experiment 2), and visually (Experiment 3), (b) we focused on top-down prediction mechanisms, for example by not presenting the critical constituent in the input in order to discourage pure bottom-up spreading of semantic information (Experiments 2 and 3), and (c) we kept the idiom characteristics constant by using the same highly predictable idioms across experiments. Furthermore, we employed different implicit methods by relying on eye-tracking (Experiment 1) and ERP (Experiments 2 and 3) measures. Implicit online measures might be more sensitive in detecting spreading semantic activation (Heil, Rolke, & Pecchinenda, 2004).

2 Experiment 1

In Experiment 1, we addressed the question of semantic activation of idiom constituent meanings through predictive processing by conducting an eye-tracking study. We exploited the tendency of gaze behavior (e.g., time course and amount of fixations) to be biased towards implicit linguistic aspects of displayed words (for a review, see Huettig, Rommers, & Meyer, 2011 ). Fixation biases include semantic associates of target words as reflected, for example, in more fixations towards the printed word shark while the word turtle is mentioned ( Huettig & McQueen, 2011 ). These results imply that eye movements are a powerful tool to investigate bottom-up spreading semantic activation exerted by spoken input.

In the eye-tracking study on idiom processing by Holsinger (2013), participants’ eye movements were attracted by semantic associates of idiom constituents while they listened to the idiom containing the respective constituent. For example, while listening to hay in hit the hay, participants fixated the printed word BARN more often than unrelated control words. The design of that study therefore does not allow disentangling rapid bottom-up semantic spread exerted by the presentation of the single word from decomposition of the idiom during processing. In order to study the latter, we have to rely on a paradigm that does not present the critical idiom constituent in the input.

In order to avoid presentation of the critical idiom constituent, we exploited predictive processing in online comprehension. Numerous eye-tracking studies have shown that participants use sentence contexts to predict upcoming words and their semantic properties ( Altmann & Kamide, 1999 , 2007 ; Kamide, Altmann, & Haywood, 2003 ). For instance, when participants listened to a sentence such as the boy will eat the cake , they fixated on the picture of a cake in a visual scene at the offset of the verb eat ( Altmann & Kamide, 1999 ). That is, eye fixations reflect predictions built during online processing before the critical word can exert bottom-up semantic spread. Therefore, prediction of semantic features for idiom constituents that are not part of the input can indicate decomposable memory traces for idioms.

In order to investigate prediction within idioms, we measured predictive fixations to displayed words before the full idiom has been heard and processed. In Experiment 1, we used highly predictive German idiomatic phrases. Participants listened to incomplete idioms, missing the final critical word, without any biasing context (e.g., Hannes ließ die Katze aus dem . . . , “Hannes let the cat out of the . . .”). Visual displays included four printed words: the correct idiomatic completion ( SACK , “BAG”), a semantic associate of the correct completion ( KORB , “BASKET”), and two unrelated distractors, with a semantic relation to each other ( ARM , “ARM” and BAUCH , “STOMACH”). Participants had to choose which of the displayed words was the correct completion of the idiomatic phrase. In order to fixate the correct item, participants had to anticipate the complete idiom. This should result in fixations of the correct completions. Fixation biases to correct completion will be informative about idiom recognition. If semantic associates of single idiom constituents are available for predictive processing, this would be indicated in fixations to semantic associates of the correct completion as soon as the idiom is recognized. This would support word-by-word predictions based on decomposable memory traces for idioms. If semantic associates of single idiom constituents do not attract more fixations than unrelated distractors, this would speak for holistic idiom representations, not allowing a word-by-word analysis.

2.1 Methods

2.1.1 Participants

Thirty-one adults (mean age = 20.97, range = 18–30, 22 female, 9 male) participated in the experiment. Participants were recruited at the University of Tübingen and received subject credits as compensation. Prior to the experiment, participants gave written informed consent. All participants were native, monolingual speakers of German. Participants had no hearing impairments and normal or corrected-to-normal vision. Experiment 1 was approved by the Ethical Committee for Psychological Research at the University of Tübingen (reference number: 2016/1027/22).

2.1.2 Stimuli and design

We selected 20 well-known German idioms (see Appendix ). 1 The idioms were embedded in sentences with a comparable structure (see Table 1 ): (a) a person carrying out the action of the sentence, (b) a sentence body that originated from a German idiom, and (c) the final target word of the idiom (which was not presented auditorily in Experiment 1). All idiomatic sentences were spoken in their complete form by a native speaker of German and digitally recorded. For Experiment 1, we removed the final target word from the recording. Participants heard each idiomatic sentence fragment once, while seeing four visual words on a computer screen. The four words represented one of these four types: (1) Correct Completion: correct completion of the idiomatic phrase, (2) Related Distractor: semantic associate of the correct completion, and (3&4) Distractors Unrelated 1 and Unrelated 2: semantically unrelated to the correct completion. Unrelated 1 and Unrelated 2 words were matched word pairs from Correct Completions and Related Distractors used with other sentence fragments in the experiment (avoiding phonological overlap). All words on the screen had the same grammatical gender fitting the preceding sentence context.

Table 1. German Example Sentence for Types 1–4 with English Equivalent.

We ensured semantic relatedness between correct and related words by comparing pairwise semantic spaces using the R package LSAfun ( Günther, Dudschig, & Kaup, 2015 ) and testing these similarity values with a Wilcoxon signed rank test. On average, semantic similarity between correct-related word pairs was significantly higher than between correct-unrelated1 ( Z = 189, p < .001) and correct-unrelated2 ( Z = 185, p = .002) word pairs. Semantic similarity between correct-unrelated1 and correct-unrelated2 word pairs did not differ ( Z = 75, p = .28). Furthermore, semantic similarity between correct-related word pairs was significantly higher than between related-unrelated1 ( Z = 180, p = .004) and related-unrelated2 ( Z = 182, p = .003) word pairs.
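The relatedness check described above can be illustrated with a minimal Python sketch. It is not the authors' analysis, which used the R package LSAfun. The toy vectors and the function names `cosine` and `signed_rank_sum` are hypothetical, and the sketch only computes the sum of positive ranks that underlies a Wilcoxon signed rank test, without a p-value.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def signed_rank_sum(x, y):
    """Sum of the ranks of positive paired differences (x - y).

    Zero differences are dropped and tied absolute differences
    receive their average rank.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    return sum(r for d, r in zip(ordered, ranks) if d > 0)

# Hypothetical toy vectors, not taken from the study's semantic space.
vectors = {
    "SACK": [0.9, 0.1, 0.3],
    "KORB": [0.8, 0.2, 0.4],
    "ARM": [0.1, 0.9, 0.2],
}
sim_related = cosine(vectors["SACK"], vectors["KORB"])
sim_unrelated = cosine(vectors["SACK"], vectors["ARM"])
```

In a full analysis, one similarity value per item and word-pair type would be collected, and the paired lists (for example correct-related versus correct-unrelated1) would be passed to the signed rank computation.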

Displayed words were presented in white font (Arial, font size 28) on a gray background. The position of the displayed words was counterbalanced across items and participants. The order of the trials was randomized.
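One common way to counterbalance screen positions is a Latin-square style rotation, sketched below in Python. This is an illustration only: the text does not specify the scheme that was used, and the position labels, the `list_index` argument, and the function name `assign_positions` are assumptions.

```python
WORD_TYPES = ["correct", "related", "unrelated1", "unrelated2"]
# Hypothetical screen layout; the actual arrangement is not described.
POSITIONS = ["top_left", "top_right", "bottom_left", "bottom_right"]

def assign_positions(item_index, list_index):
    """Rotate word types over screen positions (Latin-square style).

    item_index identifies the idiom; list_index identifies the
    counterbalancing list a participant is assigned to.
    """
    shift = (item_index + list_index) % len(POSITIONS)
    rotated = POSITIONS[shift:] + POSITIONS[:shift]
    return dict(zip(WORD_TYPES, rotated))
```

With this rotation, each word type appears in each position equally often across every four consecutive items and across every four lists.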

2.1.3 Procedure

Participants completed the experiment in a single session. For the experimental task, participants received both written and oral instructions. Prior to the experimental task, each participant received a 5-point grid for calibration and a practice block consisting of five trials.

An exemplary trial scheme is displayed in Figure 1 . Each trial began with a 1500 ms inter-trial interval followed by a 500 ms presentation of a fixation cross. Then the visual display of the set of four words appeared on the screen and remained until the end of the trial. The presentation of the audio stimuli started after a total of 2150 ms and was presented via headphones. After they heard the auditory stimuli, the task of the participants was to decide for each item which of the visually presented words was the best completion for the idiom by saying their choice out loud. 2 The experimenter noted the participants’ responses. Participants were instructed to press a button after their oral response in order to continue on to the next trial.

Figure 1.

Example with times indicating the duration of the respective displays.

We recorded fixations using a portable Tobii eye-tracker with a sampling rate of 60 Hz. In total, the session took around 20 minutes, including instructions, calibration, and the experimental task; the task itself took around 10 minutes.

2.2 Results

For the analysis, we divided the screen into four areas of interest. The analysis time window was aligned to the offset of each audio stimulus (offset = 0 ms). For the statistical analysis, we only included items responded to correctly, that is, items for which the participants completed the sentence aloud with the correct final word of the idiom (98.87% of all trials). Figure 2 (Panel A) shows fixation proportions towards correct, related, and aggregated unrelated words from 800 ms before to 1000 ms after the offset of the spoken stimuli. Running t-tests comparing fixations towards correct completions and unrelated distractors at successive measurement points (every 16.67 ms) showed that participants’ fixations were biased towards the correct idiomatic completion starting 464 ms prior to the offset of the audio stimuli (p < .01). This can be interpreted as the recognition point of the idiom. To compare the amount and time course of fixations towards related and unrelated distractors, we conducted a growth curve analysis (GCA) with orthogonal polynomials (Mirman, Dixon, & Magnuson, 2008). The GCA time window started at the onset of observable anticipation (464 ms prior to the offset) and lasted 1200 ms.
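The running t-test procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' script: the data layout (one list of per-participant fixation proportions per 60 Hz sample), the function names, and the use of a fixed two-tailed critical value are all hypothetical, and no correction for multiple comparisons is modeled.

```python
import math

SAMPLE_MS = 1000 / 60  # one eye-tracker sample at 60 Hz, about 16.67 ms

def paired_t(a, b):
    """Paired-samples t statistic for two equal-length lists."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)

def first_reliable_sample(correct, unrelated, t_crit):
    """Index of the first sample where t exceeds t_crit, else None.

    correct[s][p] and unrelated[s][p] hold the fixation proportion of
    participant p at sample s. A positive t means more looks to the
    correct completion than to the unrelated distractors.
    """
    for s, (c, u) in enumerate(zip(correct, unrelated)):
        if paired_t(c, u) > t_crit:
            return s
    return None

def sample_to_ms(sample_index, window_start_ms):
    """Convert a sample index into time relative to audio offset (0 ms)."""
    return window_start_ms + sample_index * SAMPLE_MS

# Hypothetical data: 3 samples x 4 participants.
correct = [[0.25, 0.25, 0.25, 0.25],
           [0.30, 0.28, 0.26, 0.31],
           [0.60, 0.70, 0.65, 0.72]]
unrelated = [[0.25, 0.24, 0.26, 0.25],
             [0.27, 0.29, 0.25, 0.30],
             [0.20, 0.15, 0.18, 0.10]]
# 5.841 is the two-tailed .01 critical value for df = 3 (toy data);
# with 31 participants (df = 30) the corresponding value is 2.750.
onset = first_reliable_sample(correct, unrelated, t_crit=5.841)
```

At this sampling rate, the 1200 ms window used for the growth curve analysis corresponds to 72 samples.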

Figure 2.

Panel (A) Fixation percentage for correct completions (black), related distractors (green) and mean of unrelated distractors (red); black vertical line = offset of spoken stimuli (0 ms); blue vertical, dashed line = start of the anticipation (-464 ms); gray background = time window for GCA. Panel (B) Fixation percentage for semantically related and unrelated distractors (points = mean; error bars = standard error) with fit of the growth curve model (line).

Fixation proportions were modeled with third-order orthogonal polynomials, because visual inspection showed that the time course bent at two points. To test the effect of Distractor Type (related vs. unrelated), we compared models using the –2LL deviance statistic. Including the effect of Distractor Type significantly improved the model fit ( χ 2 = 42.98, p < .0001). Estimated parameter terms of Distractor Type are summarized in Table 2 . The intercept term reflects the average magnitude of the curve. Thus, the significant effect on the intercept term indicates that participants fixated more on related than on unrelated distractors across the complete time window. The linear term is comparable to the overall slope of the curve. In this case, the significant effect on the linear term implies variation across time with larger differences between the distractor types at the beginning of the time window. The quadratic term reflects the symmetric inflection of the curve around its center. Thus, the curve of the related distractor is shallower than the curve of the unrelated distractor, and towards the end of the time window the proportions of looks to related and unrelated distractors converge. The cubic term reflects inflections of the curve at the ends of the analysis time window. We found no significant effect on this term.
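To make the role of the polynomial terms concrete, the following sketch builds linear, quadratic, and cubic time terms that are mutually orthogonal, so that each term can be interpreted independently of the others. It is an illustration under stated assumptions: the function name is hypothetical, the terms are built by Gram-Schmidt orthogonalisation of centered time, and 72 time points are assumed for the 1200 ms window at a 16.67 ms step. Fitting the mixed-effects growth curve model itself is not shown.

```python
import math


def orthogonal_time_terms(n_points, degree=3):
    """Return `degree` orthonormal polynomial time vectors (linear, quadratic, ...)."""
    centre = (n_points - 1) / 2
    time = [i - centre for i in range(n_points)]
    basis = [[1.0] * n_points]  # constant term, used only for orthogonalisation
    for power in range(1, degree + 1):
        v = [t ** power for t in time]
        for b in basis:  # Gram-Schmidt: remove the part explained by lower orders
            coef = sum(x * y for x, y in zip(v, b)) / sum(y * y for y in b)
            v = [x - coef * y for x, y in zip(v, b)]
        basis.append(v)
    terms = []
    for v in basis[1:]:
        norm = math.sqrt(sum(x * x for x in v))
        terms.append([x / norm for x in v])  # scale each term to unit length
    return terms


# Assumed: 1200 ms window sampled every 16.67 ms, i.e., 72 time points
linear, quadratic, cubic = orthogonal_time_terms(72)
```

Because the terms are uncorrelated, an effect of Distractor Type on, say, the linear term can be read as a slope difference without being confounded with the intercept or the curvature terms.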

Parameter Estimates for the Model including Distractor Type (Related vs. Unrelated).

2.3 Discussion

The predictive eye movements recorded in an eye-tracking paradigm in Experiment 1 were in line with previous priming and eye-tracking research ( Beck & Weber, 2016 ; Holsinger, 2013 ; Smolka et al., 2007 ; Sprenger et al., 2006 ) in showing that single word meanings are available in online processing of idioms. In Experiment 1, participants looked more often at distractor words that were related to the idiom’s final word than at unrelated distractor words. Moreover, this fixation bias emerged anticipatorily, meaning well before the point in time at which the idiom’s final word would have become evident in the speech signal. In fact, participants started to anticipate (that is, look at) correct idiomatic completions around 460 ms prior to the offset of the phrase fragment. Because programming of saccades after onset of the critical word typically takes around 200 ms ( Saslow, 1967 ), we can assume that recognition of the idiom occurred even earlier than 460 ms before the offset. Simultaneously with the increase in fixations on correct idiomatic completions, the fixation bias for semantic associations emerged. The fixation bias towards semantic associates diminished over time and ended 400 ms after the offset of the phrase fragment. In sum, our eye-tracking data suggest rapid prediction of upcoming idiom completions, revealing that listeners represented these ordered strings in their mental lexicon. In addition, predictive eye movements to semantic associates of idiom completions demonstrate that listeners not only pre-activate and predict words within idioms in a holistic fashion, but they also appear to pre-activate single constituents together with their respective meanings.

For some of the used idiom fragments, the related distractor provided a literally plausible interpretation which might have compromised the fixation towards this distractor. In a post-hoc visual inspection, we plotted fixation data for items that allow a literal interpretation of the related completion ( Sarah band sich einen Klotz ans Knie ., literally translated: Sarah tied herself a chunk to her knee .) and implausible, related completions ( Hannah schlug sich die Zeit um die Augen ., literally translated: Hannah hit herself the time around the eyes . ) separately. We did not observe decreased semantic activation for literally implausible, related completions supporting the interpretation of pre-activation of idiom constituents together with their semantic features. This complements the results from a visual world experiment using literal, novel phrases that show anticipatory fixations towards predicted words and semantic competitors, although the latter were implausible completions of the phrase ( Ito & Husband, 2017 ).

The relatively long preview window that we implemented in the present experiment might have biased participants towards predictive processing. For example, Ferreira, Foucart, and Engelhardt (2013) suggested that preview time is associated with the strength of expectations participants form. Accordingly, longer preview of words or objects on the display is associated with stronger expectations with regard to which word on the display is likely to be referred to. In this case, we would expect participants of Experiment 1 to build up stronger expectations for the correct idiom completion as part of the conventionalized phrase, and therefore to show a weaker tendency to look at related words. As a result of these stronger expectations, we might have overestimated the timing of the anticipation onset.

Another aspect of the eye-tracking design in Experiment 1 potentially limits its straightforward interpretation: the visually presented probes might have induced spreading semantic activation in a bottom-up fashion. Although we did not present a spoken version of the idiom-final constituent, a written version of it was available on the visual display, simultaneously with a written version of its semantic associate. Thus, fast fixations towards the correct idiomatic completion might have induced fast visual word processing and spreading semantic activation, which might have rapidly biased fixations towards the semantic associate. However, similar onsets of the fixation biases towards correct completions, on the one hand, and semantic associates, on the other hand, somewhat restrict an interpretation in terms of spreading activation exerted by the visual versions of the correct completions, because this mechanism might need some extra processing time (i.e., visual word recognition of the correct completion, spreading activation, and elicitation of eye movements towards its semantic associate). Nevertheless, similar to results of other studies ( Beck & Weber, 2016 ; Holsinger, 2013 ; Smolka et al., 2007 ), the present eye-tracking data might overestimate decomposition because an instance of the critical constituent was visually included in each trial. In Experiment 2 and Experiment 3, we attempted to further rule out this alternative interpretation by avoiding any presentation of the critical idiom constituent for which we attempt to measure prediction effects in ERP experiments.

3 Experiment 2

In the following two experiments, we exploited semantic expectancy in spoken (Experiment 2) and written (Experiment 3) idioms in an ERP paradigm comparable to that of Rommers et al. (2013) . As in the former study, we focused on N400 effects. Typically, reduction of the N400 ERP component is related to facilitated semantic processing, including semantic expectancy mechanisms (e.g., Federmeier & Kutas, 1999 ; Kutas & Federmeier, 2011 ; Laszlo & Federmeier, 2009 ). The N400 is a centro-posterior negative-going ERP component peaking around 400 ms after word onset. In N400 experiments, semantic expectations are usually determined via the cloze probability of a critical word within a given context. This measure reflects how often participants complete a phrase or sentence with a specific word. The N400 amplitude inversely correlates with this index: the higher the cloze probability of a word, the smaller the N400 amplitude it elicits ( Kutas & Hillyard, 1984 ). Respective predictive mechanisms are so strong that even the processing of an unexpected word (with a low cloze probability) might reduce N400 amplitude if it shares semantic features with the expected stimulus (e.g., Federmeier & Kutas, 1999 ; Federmeier et al., 2002 ).

Evidence for the sensitivity of the N400 to the prediction of semantic features originally came from Federmeier and Kutas (1999) , who presented participants with written versions of highly predictive sentences, such as “ They wanted to make the hotel look more like a tropical resort. So along the driveway, they planted rows of . . . ” Sentences ended with either a highly expected word ( palms ), an unexpected word from the same semantic category ( pines ) or an unexpected word from a different semantic category ( tulips ). In this experiment, the N400 amplitude for unexpected words from both categories clearly differed from the N400 amplitude for expected words. Moreover, N400 amplitudes were graded: words from the same semantic category as the expected word elicited a significantly smaller N400 amplitude than words from a different semantic category. Therefore, the N400 effect shows that semantic features of expected words are co-activated during online comprehension and words sharing these features benefit from predictive processing.

In the context of written idioms, Rommers et al. (2013) did not replicate the N400 prediction effect for semantic associates of final words. Participants read Dutch idioms embedded in figuratively biasing contexts ( After many transactions the careless scammer eventually walked against the lamp yesterday .) in which the final word of the embedded idiom was either correct ( lamp ), not expected but from the same semantic category as the correct completion ( candle ), or not expected and from a different semantic category ( fish ). An N400 reduction for correct idiom-final words was found. This effect emerged with the typical topography (posterior) and within the typical time window of the N400 (300–400 ms). Yet semantic associates of correct idiom completions did not elicit an N400 reduction. That is, ERPs did not indicate facilitated processing of semantic associates of single idiom constituents. In addition to the N400 effect, Rommers et al. (2013) found a reduced late positivity ranging between 500 and 800 ms for correct idiom completions compared to related and unrelated completions. Again, the related and the unrelated condition did not differ. Rommers and colleagues interpreted this positivity as an instance of the P600 component reflecting a violation of the idiom representation as a linguistic unit.

In Experiment 2 and Experiment 3, we adopted the semantic expectancy ERP paradigm by Rommers et al. (2013) to preclude possible bottom-up spreading semantic activation (as in the eye-tracking paradigm in Experiment 1). In Experiment 2, we examined spoken versions of idioms in a unimodal design in order to be able to relate the results to previous cross-modal designs with spoken idioms that found activation of semantic associates of idiom constituents (eye‑tracking paradigm in Experiment 1, Beck & Weber, 2016 ; Holsinger, 2013 ). In the literature, results for the semantic N400 effect in sentences are fairly comparable for visual and auditory processing ( Connolly et al., 1992 ; Federmeier et al., 2002 ; Hagoort & Brown, 2000 ). This includes semantic expectancy effects ( Federmeier et al., 2002 ). Only the onset of the N400 might differ, in that it starts earlier for auditory than for visual processing. It is still a matter of debate whether this early onset is functionally different from the N400 or not ( Connolly & Phillips, 1994 ; Diaz & Swaab, 2007 ; Nieuwland, 2019 ; Van Den Brink, Brown, & Hagoort, 2001 ).

We again presented German idioms in short sentences without further context, including the ones we used in our eye-tracking study (Experiment 1) as well as additional items. Participants listened to highly predictive idiomatic phrase onsets (e.g., Hannes ließ die Katze aus dem . . . , “Hannes let the cat out of the . . .”). Phrase onsets were completed either (1) with the expected and correct final idiom word ( Sack , “bag”), (2) with an unexpected but semantically related completion ( Korb , “basket”), or (3&4) with an unexpected and semantically unrelated completion ( Arm , “arm”; Bauch , “stomach”). If processing is solely holistic, the words in related and unrelated conditions should show comparable ERP amplitudes, as was shown by Rommers et al. (2013) . Such a finding would suggest that fixations towards semantic associates of correct completions in Experiment 1 were merely an epiphenomenon of bottom-up spreading activation exerted by the visual probe being presented together with the correct completion within the same display. If literal meanings of expected words are accessed, the processing of semantically related words should benefit more from this expectation when compared to unrelated words. This would yield graded ERP amplitudes for related and unrelated completions.

3.1 Methods

3.1.1 Participants

Forty-two healthy participants volunteered for Experiment 2. None of the participants had taken part in Experiment 1. We excluded the data of one bilingual participant and of one participant for whom we had technical problems with the ERP recording. Participants whose data were included in the analysis ( N = 40, mean age = 22.9 years, range = 18–32, 20 female and 20 male) were right-handed as assessed by the Edinburgh Handedness Questionnaire ( Oldfield, 1971 ), monolingual native speakers of German, and had no history of a neurological, psychiatric, or hearing disorder. As compensation, subjects were paid for the experiment or provided with subject credits. Experiment 2 was approved by the Ethical Committee of the German Psychological Society (reference number: RK 112015).

3.1.2 Stimuli

In order to arrive at a sufficient number of trials for an ERP study, we extended the experimental materials from Experiment 1 from 20 to 40 phrases using the same criteria of familiarity and predictability (see Appendix ). Linguistic stimuli resulted from the combination of the sentence body with the four sentence-final target words in four conditions with a combination logic following that of Experiment 1 (see Table 1 ). The conditions were the following: (1) Correct Condition: the target word was the correct completion of the idiomatic phrase, (2) Related Condition: the target word was semantically related to the correct completion, and (3&4) Conditions Unrelated 1 and Unrelated 2: the target word was semantically unrelated to the correct completion. Unrelated 1 and Unrelated 2 words were matched word pairs from Correct and Related Conditions used with other sentence bodies in the experiment (no phonological or semantic overlap). Each sentence body was repeated four times, once in each of the four conditions. This resulted in 160 different combinations of sentence bodies and target words. The same native speaker of German as in Experiment 1 spoke all linguistic stimuli. The linguistic stimuli that were repeated across conditions (sentence body and final words) were realized as the same recordings.

We conducted rating studies to determine some characteristics of the materials essential for ERP research. In a cloze probability task, 17 participants read the 40 sentence bodies and filled in the word that they considered to be the most likely completion. The mean cloze probability of the correct idiom-final word was 93.82% ( SD = 9.69).
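As a brief illustration of how a cloze probability is obtained, the sketch below computes the percentage of participants who completed a sentence body with the expected word. The function name and the toy responses are hypothetical and do not reproduce the rating data; only the number of raters (17) is taken from the text.

```python
def cloze_probability(responses, expected):
    """Percentage of responses that match the expected completion.

    Matching ignores case and surrounding whitespace (an assumption about
    how responses would be scored).
    """
    normalised = [r.strip().lower() for r in responses]
    return 100.0 * normalised.count(expected.strip().lower()) / len(normalised)


# Hypothetical responses of 17 raters to "Hannes ließ die Katze aus dem ..."
toy_responses = ["Sack"] * 16 + ["Korb"]
toy_cloze = cloze_probability(toy_responses, "Sack")
```

The reported mean cloze probability would then correspond to the average of such item-wise percentages across the 40 sentence bodies.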

Furthermore, we controlled for the semantic relatedness between critical words by means of a second rating study. Fifteen participants received lists of word pairings of the target words and judged their relatedness on a scale from 1 to 7. The association strength between words presented as critical words in the Correct Condition (i.e., the correct idiom completions) and words presented in the Related Condition (see Table 1 ) was significantly higher than the association strength between critical words presented in the Correct Condition and those presented in both Unrelated Conditions (Wilcoxon signed rank test: Unrelated 1 Z = 120, p < .001; Unrelated 2 Z = 120, p < .001). The association strength between critical words presented in the Correct Condition and those presented in the two Unrelated Conditions did not differ ( Z = 78, p = .32).
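For readers unfamiliar with the Wilcoxon signed rank test, the following sketch computes its core quantity, the sum of ranks of the positive paired differences. It is a simplified illustration with hypothetical function and variable names: zero differences are dropped, tied absolute differences receive average ranks, and no p -value is derived. For n paired observations the rank sum can be at most n(n + 1)/2, which is 120 for n = 15.

```python
def positive_rank_sum(x, y):
    """Sum of ranks of positive differences x - y (Wilcoxon signed rank)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied absolute differences
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return sum(r for r, d in zip(ranks, diffs) if d > 0)


# Hypothetical ratings from 15 raters: related pairs always rated higher
related = [6, 7, 5, 6, 7, 6, 5, 7, 6, 6, 7, 5, 6, 7, 6]
unrelated = [1, 2, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1]
toy_statistic = positive_rank_sum(related, unrelated)
```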

3.1.3 Procedure

Participants completed the experimental task in a single session. After signing an informed consent form, participants sat in a comfortable chair facing a computer screen in a dimly lit room. During the experimental task, they were instructed to sit still and avoid eye movements including blinking. In addition, participants took part in a calibration task at the beginning and the end of the session. In this task, eye movements were systematically evoked for offline ocular correction. Before the experimental task, participants received both written and oral instructions. The participants received a practice block consisting of eight trials to ensure that they were familiar with the procedure and the task.

For each experimental trial, a sentence was presented auditorily via loudspeakers on both sides of the computer screen. During the presentation of the sentences, participants viewed a fixation cross at the center of the screen. After the auditory presentation, the task of the participants was to decide for each sentence whether it was a correct idiomatic phrase or not by pressing buttons with the index fingers of the right or the left hand. 3 The side for yes- and no-buttons was counterbalanced across participants. The response type was a delayed response; 1200 ms after onset of the target stimulus a question mark appeared at the center of the screen to signal the start of the response window for the participants. If they responded before the start of the response window, participants were given feedback ( too fast ). The interval between succeeding trials was 1500 ms.

The experiment consisted of eight blocks of 20 trials, 160 trials in total, with five trials in each condition in each block. The order of trials was pseudorandomized in such a way that the same sentence body or target word never occurred twice within the same block. After each block, participants had the opportunity to take a self-timed break. The order of blocks was randomized using the Latin Square method. In total, the EEG experiment took around 1.5 hours including electrode application, instruction, calibration and the experimental task; the experimental task itself took around 15–20 minutes.
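One way to derive block orders with a Latin square is sketched below. The text does not specify which Latin square was used, so the cyclic construction, the function names, and the rule of assigning rows by participant number are all assumptions made for illustration.

```python
N_BLOCKS = 8


def cyclic_latin_square(n):
    """n x n square in which every block index occurs once per row and column."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]


def block_order(participant_index, n=N_BLOCKS):
    """Block order for a participant: rows are reused cyclically (assumed rule)."""
    return cyclic_latin_square(n)[participant_index % n]
```

With this construction, each block appears in each serial position equally often across every set of eight consecutive participants.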

3.1.4 Electrophysiological recordings

Electrophysiological brain potentials were recorded with 46 active electrodes (Ag/AgCl) mounted in an elastic cap (Easycap GmbH, Herrsching, Germany) according to the 10–20 system (see Figure 3 ), online referenced to the nose. The ground electrode was positioned at the location of the AF3. In order to record eye movements, we attached two ocular electrodes below both eyes. The raw data were sampled at 500 Hz (bandpass filter 0.01–100 Hz, BrainAmpStandard, Brain Products, Gilching, Germany).

Figure 3.

Electrode configuration used in the experiment. Anterior-Left, Anterior-Right, and Posterior-Central ROIs are highlighted in light gray. Anterior-Central, Posterior-Left, and Posterior-Right ROIs are highlighted in dark gray.

3.1.5 EEG analysis

For the ERP analysis, the raw data were re-referenced offline to the average reference and filtered with a 0.3 Hz low cut-off filter. Using surrogate MultipleSource EyeCorrection (MSEC) by Berg and Scherg (1994) , we removed horizontal and vertical eye movements as well as blinks from the continuous EEG signal. The EEG data were segmented into epochs from 100 ms before to 1000 ms after the stimulus onset with a 100 ms pre-stimulus baseline subtraction. We excluded trials contaminated with artifacts and trials in which participants responded before the onset of the response time window (1200 ms after stimulus onset). Further, we only included individual items that participants responded to correctly in the Correct Condition in the analysis in all conditions, because we assumed that when participants recognized the idiom correctly in the Correct Condition (94.31%), they had established memory traces of the correct idiom form. These inclusion criteria resulted in the following percentage of trials per condition: Correct: 79.4%; Related: 81.3%; Unrelated 1: 78%; Unrelated 2: 78.7%. For further analyses, we aggregated the conditions Unrelated 1 and Unrelated 2 into one condition Unrelated by averaging the mean voltages of the two conditions for each participant. Following this process, the final three conditions discussed in the analyses were: Correct, Related, and Unrelated.
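The baseline subtraction and the aggregation of the two Unrelated Conditions can be illustrated with a short sketch. It assumes a single-channel epoch stored as a plain list of voltages sampled at 500 Hz from 100 ms before to 1000 ms after stimulus onset; the function names and this data layout are hypothetical, and filtering, re-referencing, and ocular correction are not shown.

```python
SFREQ = 500            # Hz, sampling rate of the recording
EPOCH_START_MS = -100  # epoch begins 100 ms before stimulus onset


def ms_to_sample(ms):
    """Index of a latency (ms relative to stimulus onset) within the epoch."""
    return int(round((ms - EPOCH_START_MS) * SFREQ / 1000))


def baseline_correct(epoch):
    """Subtract the mean of the 100 ms pre-stimulus interval from every sample."""
    n_base = ms_to_sample(0)
    baseline = sum(epoch[:n_base]) / n_base
    return [v - baseline for v in epoch]


def window_mean(epoch, start_ms, end_ms):
    """Mean voltage between two latencies (start inclusive, end exclusive)."""
    segment = epoch[ms_to_sample(start_ms):ms_to_sample(end_ms)]
    return sum(segment) / len(segment)


def aggregate_unrelated(unrelated_1, unrelated_2):
    """Average the mean voltages of Unrelated 1 and Unrelated 2 for one participant."""
    return (unrelated_1 + unrelated_2) / 2
```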

Based on visual inspection of ERP results, we chose six regions of interest (ROIs), covering lateral and midline anterior and posterior sites (see Figure 3 ). Both lateral anterior ROIs included six electrode positions over both temporal cortices (left: F9, F7, FT9, FT7, FC5, T7; right: F10, F8, FT10, FT8, FC6, T8). The anterior midline ROI covered six fronto-central electrodes (F3, Fz, F4, FC1, FCz, FC2). Both lateral posterior ROIs included six temporo-parietal electrode positions (left: TP9, TP7, CP5, P7, PO9, O1; right: CP6, TP8, TP10, P8, PO10, O2). The posterior midline ROI covered six centro-parietal electrode positions (CP1, CP2, P3, Pz, P4, POz).
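The ROI definitions above translate directly into a lookup table. In the sketch below, the electrode labels are those listed in the text, whereas the dictionary layout, the ROI keys, and the helper function are hypothetical and serve only to show how a mean ROI amplitude could be derived from per-electrode values.

```python
ROIS = {
    "Anterior-Left": ["F9", "F7", "FT9", "FT7", "FC5", "T7"],
    "Anterior-Central": ["F3", "Fz", "F4", "FC1", "FCz", "FC2"],
    "Anterior-Right": ["F10", "F8", "FT10", "FT8", "FC6", "T8"],
    "Posterior-Left": ["TP9", "TP7", "CP5", "P7", "PO9", "O1"],
    "Posterior-Central": ["CP1", "CP2", "P3", "Pz", "P4", "POz"],
    "Posterior-Right": ["CP6", "TP8", "TP10", "P8", "PO10", "O2"],
}


def roi_mean(amplitudes, roi):
    """Mean amplitude over the electrodes of one ROI.

    amplitudes: dict mapping electrode label to mean voltage in a time window
    (hypothetical data layout).
    """
    electrodes = ROIS[roi]
    return sum(amplitudes[e] for e in electrodes) / len(electrodes)
```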

For statistical analysis, we conducted a 3 × 3 × 2 repeated-measures ANOVA (RM-ANOVA) with the within-participant factors Condition (Correct, Related, Unrelated), Hemisphere (Left, Central, Right), and Region (Anterior, Posterior). First, we conducted RM-ANOVAs for each 100 ms time window. We identified three relevant time windows, which showed three-way interactions for Condition, Region, and Hemisphere (see Table 3 ): 100–200 ms, 300–500 ms, and 700–1000 ms. Both later time windows approximately align with the effects obtained in Rommers et al. (2013) , with the 300–500 ms time window reflecting an N400 effect, and the 700–1000 ms time window reflecting a late positivity. The early time window does not find a parallel in previous ERP work on idiom processing. We label it as “pre-N400” throughout the results section. For further analysis, we aggregated amplitudes across these time windows.

RM-ANOVAs. C–Condition, R–Region, H–Hemisphere. * for significant main effects and interactions.

3.2 Results

Figure 4 (Panel A) depicts Grand-Average ERPs aggregated over ROIs. Visual inspection of grand-averaged ERPs justified the selected time windows. As shown in the difference topographies ( Figure 4 , Panel B), the effect is most prominent over posterior sites. Moreover, a late positivity was observable over posterior sites.

Figure 4.

Grand-Averaged ERPs (A) ERP-waveforms for the ROIs Anterior-Left, Anterior-Central, Anterior-Right, Posterior-Left, Posterior-Central, and Posterior-Right. (B) Difference topographies for the time windows 100–200 ms, 300–500 ms, and 700–1000 ms.

RM-ANOVAs revealed significant three-way interactions for 100–200 ms, F (4, 156) = 3.04, p = .03, 300–500 ms, F (4, 156) = 12.62, p < .0001, and 700–1000 ms, F (4, 156) = 4.27, p = .004. All reported p -values are Greenhouse-Geisser or Bonferroni (for post-hoc t -tests) corrected.
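As a reminder of what the Bonferroni correction does to the post-hoc comparisons, the sketch below multiplies each uncorrected p -value by the number of comparisons and caps the result at 1. The function name and the example values are hypothetical; with three pairwise comparisons per ROI, the multiplier would be 3.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical uncorrected p-values for three pairwise comparisons
adjusted = bonferroni([0.002, 0.1, 0.5])
```

The cap explains why a corrected value of exactly 1 can appear for comparisons with large uncorrected p -values.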

3.2.1 100–200 ms time window (pre-N400)

Post-hoc analyses of the three-way interaction revealed a significant Condition effect for the Anterior-Left, Anterior-Central, Anterior-Right, and Posterior-Central ROIs, all F (2, 78) ⩾ 7.18, p ⩽ .002. Over the Anterior-Left sites only, all three conditions differed from each other: Correct vs. Related, t1 (39) = -2.93, p < .018, Correct vs. Unrelated, t2 (39) = -6.21, p < .001, and Related vs. Unrelated, t3 (39) = -3.09, p < .018. Over the remaining three sites, the Correct Condition differed from the Unrelated Condition, all | t2 (39)| ⩾ 3.71, all p ⩽ .002, whereas the Related Condition and the Unrelated Condition did not differ, all | t3 (39)| ⩽ 1.04, all p ⩾ .91. In sum, we found parallel effects of semantic activation (over Anterior-Left sites) and no semantic activation (over the remaining sites).

3.2.2 300–500 ms time window (N400)

For the 300–500 ms time window, a Condition effect was only evident over Posterior-Central sites, F (2, 78) = 38.17, p < .0001. Bonferroni-corrected post-hoc tests revealed significant differences between all three conditions: Correct vs. Related, t1 (39) = 5.91, p < .001, Correct vs. Unrelated, t2 (39) = 7.43, p < .001, and Related vs. Unrelated, t3 (39) = 2.71, p < .03. Across Posterior-Central electrodes, amplitudes for the Unrelated Condition were more negative than those for the Related Condition, and amplitudes for the Correct Condition were most positive. Together, we found graded condition effects for a Posterior-Central electrode cluster typically associated with the N400.

3.2.3 700–1000 ms time window (late positivity)

For the 700–1000 ms time window, we report those ROIs where a Condition effect was significant, F (2, 78) > 8.12, p ⩽ .002. Post-hoc tests for these regions revealed differences between the Correct Condition and both the Related and the Unrelated Condition, but not between the Related and the Unrelated Condition. Over Anterior-Left sites, amplitudes for the Correct Condition were more positive than for the Related Condition, t1 (39) = 3.94, p < .001, and the Unrelated Condition, t2 (39) = 3.35, p < .006, but amplitudes for the Related Condition and the Unrelated Condition did not differ significantly, t3 (39) = -1.7, p = .294. Similarly, over Anterior-Right sites, amplitudes for the Correct Condition were more positive than for the Related Condition, t1 (39) = 2.87, p < .020, and the Unrelated Condition, t2 (39) = 3.58, p < .003, while amplitudes for the Related Condition and the Unrelated Condition did not differ significantly, t3 (39) = 0.05, p = 1. Over Posterior-Central sites, amplitudes for the Correct Condition were more negative than for the Related Condition, t1 (39) = -5.55, p < .001, and the Unrelated Condition, t2 (39) = -5.89, p < .001, but amplitudes for the Related Condition and the Unrelated Condition did not differ significantly, t3 (39) = -0.28, p = 1. In sum, late ERPs show that related and unrelated violations of the idiom yield comparable amplitudes of a late positivity with posterior distribution (and reversed amplitudes over anterior regions).

3.3 Discussion

Using a semantic expectancy ERP paradigm in Experiment 2, we investigated processing mechanisms in highly predictive spoken idiomatic phrases. In contrast to Experiment 1, the critical idiom constituent itself did not appear in trials in which we probed the activation of semantic associates of this idiom constituent. This way, we aimed to rule out potential bottom-up spread from sensory input, which could have biased results in the visual world eye-tracking design exploited in Experiment 1.

Across ERP amplitudes, there was a clear effect of expectancy of the correct idiom: both related and unrelated violations showed significantly higher ERP amplitudes than correct completions. This indicates that correct completions of an idiom were highly expected and easier to access than both related and unrelated substitutes. Because idioms were presented without biasing context, this broadly supports the notion that predictability within idioms mainly stems from the knowledge of the idiom form ( Vespignani et al., 2010 ).

Using spoken idioms, N400 amplitudes reflected semantic expectancy within violation trials. That is, we not only obtained N400 reductions for correct completions, but also for semantic associates of correct completions. Since N400 reductions are interpreted in terms of facilitated semantic processing, including semantic expectancy mechanisms (for a review, see Kutas & Federmeier, 2011 ), it seems that the anticipation of the correct completion activated semantic associates, for which semantic processing was facilitated. In this sense, the N400 effect observed here is compatible with the eye-tracking data from Experiment 1. It appears that single constituents and their individual meanings are available when these are predicted. These results in the auditory modality do not replicate those obtained for visually presented idioms by Rommers et al. (2013) , and challenge the conclusion drawn by these authors that the top-down prediction of idiom completions does not lead to beneficial processing of substitutes that are semantically related to idiom constituents.

The ERPs obtained in Experiment 2 mainly reflect an N400 effect followed by a late positivity. Recently, it has been discussed whether during the processing of idioms or other formulaic sequences the N400 is preceded by a P300 effect ( Molinaro & Carreiras, 2010 ; Siyanova-Chanturia et al., 2017 ; Vespignani et al., 2010 ). The authors of those studies found an enhanced P300 amplitude for correct and expected idiomatic forms compared to violations of those forms. They concluded that the P300 reflects a template matching process. Although we cannot rule out that the present N400 effect might also include an instance of the P300, we hypothesize that an activation of semantic information as found in Experiment 2 would only be detectable in the N400 component. We therefore conclude that the graded ERP effect between 300 and 500 ms in Experiment 2 is indeed an instance of the semantic N400 effect.

A late positivity between 700 and 1000 ms across posterior sites was independent of semantic relatedness, that is, it did not show amplitude differences between related and unrelated violations. This effect converges with findings by Rommers et al. (2013) , who interpreted this late effect as a violation of the idiom as a lexical item. More recently, the late positivity following the semantic N400 (post-N400 positivity, PNP) in prediction paradigms has been interpreted as revision of a predicted sentence representation ( Brothers, Swaab, & Traxler, 2015 ; Kuperberg & Wlotko, 2020 ) irrespective of the semantic relations between presented sentence-final words ( Thornhill & Van Petten, 2012 ). Thus, in idiom processing the late positivity might also reflect that listeners revise the activated representation of the idiom string when hearing related or unrelated violations. Together with the N400 effect suggesting decomposition, the late positivity effect could be interpreted as evidence for a dual representation of idioms in the mental lexicon as both individual words and chunked items ( Sprenger et al., 2006 ).

Similarly, early ERP effects obtained in the present study suggest that decomposition is not the only strategy followed by the parser. In contrast to the results from the written presentation of idioms by Rommers et al. (2013) , we obtained ERP effects preceding the N400 in our study with spoken materials. This is comparable with previous findings (for a review, see Nieuwland, 2019 ). Already early on (between 100 and 200 ms), we see evidence for parallel processing. Across anterior-left electrodes, amplitudes for related conditions significantly differed from amplitudes for unrelated conditions. Across central and anterior-right sites, amplitudes for related and unrelated conditions did not differ. These early ERP effects might relate to parallel pre-activation of lexical representations (e.g., Friedrich & Kotz, 2007 ). If so, the present ERP results dissociate two types of lexical idiom representations: a form and a meaning representation of the single constituents. The former is indicated by the mid to right-anterior ERP deflections, while the latter is indicated by the left-lateralized ERP deflection. Thus, within familiar and highly predictable idioms, final constituents including their semantic properties can be pre-activated before they are fully processed ( Smolka & Eulitz, 2020 ).

In general, the results of Experiment 2 corroborate other studies presenting idioms auditorily ( Beck & Weber, 2016 ; Holsinger, 2013 ) by showing that listeners activate idiom constituents and have semantic associates of these constituents available. Possibly, the pre-N400 ERP effects and the graded N400 effect that we found might be due to modality-related differences compared to the study by Rommers et al. (2013) . In contrast to the written and serial presentation (word-by-word) in that study, we presented idioms and violated idioms auditorily in Experiment 2. Semantic information might be accessible earlier in spoken language processing compared to written language processing. For example, preceding information speeds up spoken word identification even before enough acoustic information has accumulated ( Van Petten et al., 1999 ). Therefore, we conducted a third experiment in which we used the same task and material as in Experiment 2, but presented them in the written modality.

4 Experiment 3

In Experiment 3, we conducted a semantic expectancy ERP experiment using the same material as in Experiment 2, but with written instead of spoken idioms. While experiments on spoken idiom processing clearly point to decomposition within idioms (Experiments 1 and 2; Beck & Weber, 2016 ; Holsinger, 2013 ), the evidence from word-by-word presentations of idioms is mixed ( Rabanus et al., 2008 ; Rommers et al., 2013 ; Smolka et al., 2007 ). Therefore, we aimed to address the question of processing differences across modalities in Experiment 3 by using the same idiomatic expressions and violations of these forms as in Experiment 2. If there are any prediction effects inherent to the idioms we used, we should not replicate results by Rommers et al. (2013) .

4.1 Methods

4.1.1 Participants

Thirty adults participated in Experiment 3. We had to exclude the data of five participants because of incorrect instructions (3), a psychiatric disorder (1), or insufficient eye movement correction (1). This resulted in a sample of 25 participants for statistical analysis ( N = 25, mean age = 21.4 years, range = 18–27, 18 female and 7 male). Participants were recruited at the University of Tübingen and received subject credits or payment as compensation. All participants included in the analysis were native, monolingual speakers of German, were right-handed as assessed by the Edinburgh Handedness Questionnaire ( Oldfield, 1971 ), had no history of a neurological, psychiatric, or hearing disorder, and had normal or corrected-to-normal vision. None of the participants took part in Experiments 1 or 2. Prior to the experiment, participants gave written informed consent.

4.1.2 Stimuli

In Experiment 3, we used the same stimuli as in Experiment 2, but these were presented visually at the center of a computer screen.

4.1.3 Procedure

The procedure was the same as in Experiment 2 except for the presentation modality of the stimuli. We used the same timing of presentation as in the EEG study by Rommers et al. (2013) . Each trial started with a fixation cross (+) for 1500 ms. Sentences were presented word-by-word with 300 ms presentation duration of a word and 300 ms blank screen. At 900 ms after the presentation of the sentence-final word, a question mark (?) appeared on the screen, resulting in a 1200 ms delayed response after the onset of the target word. When the question mark appeared, participants were asked to decide whether the presented sentence was a correct idiom or not. They gave their answers via button press. The response hand was counterbalanced across participants.
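The trial timing described above can be sketched in a few lines of Python. This is only an illustration and not the presentation script used in the study: the function names and the placeholder words are hypothetical, and we assume that the 900 ms interval is counted from the offset of the sentence-final word, which is the reading that yields the stated 1200 ms delay from target onset.

```python
# Timing constants taken from the procedure description (milliseconds).
FIXATION_MS = 1500
WORD_MS = 300
BLANK_MS = 300
PROMPT_DELAY_MS = 900  # assumed: from offset of the final word to the "?"


def trial_timeline(words):
    """Return (event, onset_ms) tuples for one word-by-word trial."""
    events = [("fixation", 0)]
    t = FIXATION_MS
    for i, word in enumerate(words):
        events.append((word, t))
        t += WORD_MS
        if i < len(words) - 1:
            t += BLANK_MS  # blank screen between consecutive words
    # t now marks the offset of the sentence-final word
    events.append(("?", t + PROMPT_DELAY_MS))
    return events


def prompt_delay_from_target_onset(events):
    """Interval between onset of the final word and onset of the '?'."""
    target_onset = events[-2][1]
    prompt_onset = events[-1][1]
    return prompt_onset - target_onset


# Hypothetical four-word sentence (placeholder tokens, not study material).
timeline = trial_timeline(["w1", "w2", "w3", "w4"])
delay = prompt_delay_from_target_onset(timeline)
```

Under these assumptions the delay is the 300 ms word duration plus the 900 ms interval, regardless of sentence length.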

4.1.4 Electrophysiological recordings

Same as in Experiment 2.

4.1.5 EEG analysis

As in Experiment 2, we included items that did not contain artifacts and to which participants responded correctly in the Correct Condition (92.8%) and after the onset of the response time window (1200 ms after stimulus onset). This resulted in the following percentage of included trials per condition for the analysis: Correct: 69.5%, Related: 72.3%, Unrelated 1: 72.4%, Unrelated 2: 71.5%. Compared to Experiment 2, the number of included trials was lower because the EEG recordings contained more artifacts.

Conducting RM-ANOVAs for 100 ms time window steps, we identified two relevant time windows, which showed three-way interactions for Condition, Region, and Hemisphere (see Table 4 ): 300–400 ms, and 500–700 ms. The first time window aligns with the early N400 time window found in Rommers et al. (2013) . The later time window partly aligns with the time window for the late positivity in Rommers et al. (2013 , 500–800 ms). For further analysis, we aggregated amplitudes across these time windows.
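As a minimal sketch of what aggregating amplitudes across a time window amounts to, the following Python snippet averages the samples of a single waveform that fall inside a window. It is not the analysis pipeline of the study: the function name, the half-open window convention, and the toy values are assumptions made for illustration only.

```python
def window_mean(amplitudes_uv, times_ms, start_ms, end_ms):
    """Mean amplitude over samples with start_ms <= t < end_ms."""
    selected = [a for a, t in zip(amplitudes_uv, times_ms)
                if start_ms <= t < end_ms]
    if not selected:
        raise ValueError("no samples fall inside the requested window")
    return sum(selected) / len(selected)


# Toy single-channel waveform sampled every 100 ms (made-up values, in µV).
times_ms = [0, 100, 200, 300, 400, 500, 600, 700]
erp_uv = [0.0, 0.5, -1.0, -4.0, -2.0, 1.0, 3.0, 2.0]

n400_mean = window_mean(erp_uv, times_ms, 300, 400)  # samples at 300 ms
late_mean = window_mean(erp_uv, times_ms, 500, 700)  # samples at 500, 600 ms
```

In practice the same averaging would be applied per participant, condition, and region of interest before the values enter the RM-ANOVA.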

4.2 Results

Figure 5 (Panel A) depicts Grand-Average ERPs aggregated over ROIs. As shown in the difference topographies ( Figure 5 , Panel B), the N400 effect is most prominent over posterior sites. RM-ANOVAs revealed significant three-way interactions for 300–400 ms, F (4, 96) = 5.31, p = .002, and 500–700 ms, F (4, 96) = 5.82, p < .001. All reported p -values are Greenhouse-Geisser or Bonferroni (for post-hoc t -tests) corrected.
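The Bonferroni correction for the post-hoc tests can be illustrated as follows. This is a sketch under the assumption that each family consists of the three pairwise comparisons between conditions; the uncorrected p-values are hypothetical and are not taken from the data. It also shows why a corrected value can be reported as p = 1.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical uncorrected p-values for three pairwise comparisons.
raw_p = [0.001, 0.02, 0.40]
adjusted_p = bonferroni(raw_p)
```

With three comparisons, any uncorrected p-value above one third is capped at 1 after correction.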

Figure 5.

Grand-averaged ERPs. (A) ERP waveforms for the ROIs Anterior-Left, Anterior-Central, Anterior-Right, Posterior-Left, Posterior-Central, and Posterior-Right. (B) Difference topographies for the time windows 300–400 ms and 500–700 ms.

4.2.1 300–400 ms time window (N400)

Across Left sites, we found a main effect for Condition, F (2, 48) = 21.21, p < .001, which was due to significant amplitude differences between Correct vs. Related, t1 (24) = -3.60, p = .004, Correct vs. Unrelated, t2 (24) = -6.11, p < .001, and Related vs. Unrelated, t3 (24) = -3.01, p = .018. Across Central sites, we also found a main effect for Condition, F (2, 48) = 9.93, p < .001. Post-hoc t -tests revealed amplitude differences between Correct vs. Related, t1 (24) = 4.91, p < .001, and Correct vs. Unrelated, t2 (24) = 6.02, p < .001, to be significant. Amplitudes for the Related and Unrelated Conditions did not differ, t3 (24) = 1.08, p = .869. Across Right sites, we found an interaction of Condition with Region, F (2, 48) = 3.64, p = .04. For Right-Anterior electrodes there was no effect of Condition, F (2, 48) = 2.51, p = .11, but for Right-Posterior electrodes the main effect for Condition was significant, F (2, 48) = 13.24, p < .001. Across the latter region, amplitudes between Correct vs. Related, t1 (24) = -3.66, p = .004, and Correct vs. Unrelated, t2 (24) = -4.75, p < .001, differed significantly. There was no amplitude difference between Related vs. Unrelated, t3 (24) = -0.29, p = 1. Altogether, in the typical N400 time window and region we did not find evidence for a graded pattern of semantic expectancy.

4.2.2 500–700 ms time window (late positivity)

For the 500–700 ms time window, we found a main effect of Condition across Left sites, F (2, 48) = 5.41, p = .008, and an interaction of Region and Condition across Central sites, F (2, 48) = 6.00, p = .006. Over Left electrodes, we found no amplitude differences between Correct vs. Related, t1 (24) = -0.70, p = 1, significant amplitude differences between Correct vs. Unrelated, t2 (24) = -3.02, p = .018, and marginally significant amplitude differences between Related vs. Unrelated, t3 (24) = -2.53, p = .055. Across Central electrodes, there was only a Condition effect for Central-Anterior electrodes, F (2, 48) = 4.92, p = .02, with a significant amplitude difference between Correct vs. Unrelated, t2 (24) = 2.65, p = .042, and no amplitude differences between Correct vs. Related, t1 (24) = 1.92, p = .199, and Related vs. Unrelated, t3 (24) = 1.44, p = .489. In sum, for the late effect the amplitude differences show a mixed pattern.

4.3 Discussion

In Experiment 3, we again conducted a semantic expectancy ERP experiment to investigate top-down spreading semantic activation within idioms. In contrast to Experiment 2, where we used the same material in a unimodal auditory paradigm, we presented idioms as written stimuli on the screen to further explore potential modality-related differences in processing. Therefore, the design was directly comparable to that of Rommers et al. (2013) , who did not find activation of semantic associates of final constituents within written versions of idioms.

For written idioms, we found a clear expectancy effect on the N400: amplitudes for related and unrelated completions differed significantly from amplitudes for correct completions. This parallels findings for spoken idioms in Experiment 2. However, in contrast to Experiment 2, we did not find amplitude differences between related and unrelated targets in the typical semantic N400 region. Thus, we did not find an effect of semantic expectancy here. Based on the results by Rommers et al. (2013) and the present Experiment 2, these results might indicate that for online prediction of semantic features within idioms, the modality in which the idioms are presented might indeed play a role.

Nevertheless, we found differences between related and unrelated targets over left-hemispheric electrode leads in the N400 time window. Since there was no evidence of an N400 effect localized in this region in Experiment 2, in the experiment by Rommers et al. (2013) , or in the literature on the semantic N400 effect in non-idiomatic language, this effect is difficult to interpret.

In the later time window, we did not replicate the late positivity found in Experiment 2 and by Rommers et al. (2013) . This effect was previously interpreted as indexing a violation of the holistic idiom representation. In Experiment 3, we only found consistent differences between correct and unrelated words. Amplitude differences between related and unrelated completions were mixed. Rommers et al. (2013) interpreted the late positive ERP effect as reflecting the difficulty of revising a predicted idiomatic multi-word representation. However, as we did not replicate such a late positive ERP effect with written idioms (Experiment 3), we are not confident about an interpretation at this point.

Like Rommers et al. (2013) , we did not find a pre-N400 component for written idioms in an early time window. This suggests that the early component found in Experiment 2 was indeed specific to processing in the auditory modality ( Connolly & Phillips, 1994 ; Connolly, Phillips, & Forbes, 1995 ).

The different ERP effects obtained for spoken idioms in Experiment 2 and written idioms in Experiment 3 challenge an alternative interpretation of activation effects for semantic associates in our study. Even though some semantically related completions of the idioms we presented might have a literally plausible interpretation, the results of Experiment 3 suggest that the N400 is unlikely to be modulated by literal plausibility. If this were the case, we would also see an effect of semantic activation for written idioms in Experiment 3, because we used the same material for both modalities. Instead, semantic activation was only observable for the processing of spoken idioms in Experiment 2. Moreover, Rommers et al. (2013) did not find N400 amplitude differences between related and unrelated completions although related completions were rated as more plausible than unrelated completions. For these reasons, we argue that literal plausibility does not account for the effects of semantic co-activation on the N400 component found in Experiment 2. Furthermore, research on plausibility and predictability in literal language suggests that rather than the N400 component, a post-N400 positivity is affected by the plausibility of the interpretation ( DeLong, Quante, & Kutas, 2014 ; Quante, Bölte, & Zwitserlood, 2018 ). In the present study, we found no amplitude reduction of the late positivity for related completions that would indicate effects of plausibility. Furthermore, in the idiom literature, amplitude differences in the N400 were not associated with semantic integration processes ( Canal et al., 2017 ). Altogether, we hypothesize that the reduction of the N400 amplitude for the spoken idioms that we obtained in Experiment 2 results from a short-lived semantic activation of the final constituent.

5 General discussion

In the present study, we aimed to shed light on previous contradictory evidence on the extent to which idioms are processed holistically or decomposed into single items. Indirect evidence for holistic processing stems from studies showing faster processing for idioms compared to novel phrases (e.g., Conklin & Schmitt, 2008 ; Swinney & Cutler, 1979 ; Tabossi et al., 2009 ). Here, we tested idiom processing more directly by measuring their possible decomposition by means of semantic activation of individual idiom components (see e.g., Siyanova-Chanturia, 2015 ). Previous research demonstrated that semantic features of idiom constituents are available at least for priming processes in reading and listening ( Beck & Weber, 2016 ; Holsinger, 2013 ; Smolka et al., 2007 ). However, when focusing on prediction mechanisms in reading, evidence for decomposition in idiom processing was lacking ( Rommers et al., 2013 ). To rule out design and modality-related differences, we measured the level of semantic expectancy during online processing of highly predictive, spoken idioms in an eye-tracking paradigm with written words (Experiment 1) as well as in a semantic expectancy ERP paradigm with spoken (Experiment 2) and written (Experiment 3) idioms.

Across all three experiments, we found evidence that participants built up an expectation of the idiom-final word. They fixated the correct idiom completion well before the idiom fragments presented in Experiment 1 ended, and they showed reduced N400 amplitudes for correct idiom completions compared to unrelated words in Experiments 2 and 3. Based on this, we conclude that idioms and their conventionalized forms can be recognized and activated before their offset ( Libben & Titone, 2008 ; Smolka & Eulitz, 2020 ; Vespignani et al., 2010 ). Together, these findings are evidence for multi-word representation of idioms. It appears that the mental lexicon stores information about the co-occurrence of specific words making up an individual idiom. Activation of respective multi-word representations triggers expectation of individual words that are part of these multi-word expressions. Here, we do not preclude a certain flexibility of these multi-word representations, but propose rather a strong coherence between the words of which they are composed ( Cacciari & Tabossi, 1988 ; Geeraert, Baayen, & Newman, 2017 ; Kyriacou et al., 2020 ; Mancuso et al., 2020 ).

Across eye-tracking and ERP methods with spoken idioms, we found evidence for early, short-lived semantic activation of individual idiom constituents. As soon as participants fixated correct idiom completions, they also fixated respective semantic associates (Experiment 1). In the ERPs, semantic associates of correct idiom completions elicited effects in the same early time window in which correct completions elicited effects (Experiment 2). Since we found anticipation of correct idiom completions in the fixation data, we conclude that early effects for semantic associates of idiom completions in the ERPs indeed relate to pre-activation of idiom constituents. Based on knowledge of conventionalized idiom forms, parsers seem to pre-activate a multi-word representation before the respective idiom is completely available in the auditory signal and this pre-activation includes single word representations that spread semantic activation within the network. It appears that even though literal constituent meanings typically do not contribute to the understanding of the idiomatic meaning, their processing is still automatically carried out. This conclusion is comparable to the notion that semantic processing cannot be switched off, as for example Connolly, Stewart, and Phillips (1990) showed for spoken language processing. We speculate that this is similar to a Stroop-like effect ( Stroop, 1935 ) where the literal meaning of the word is not informative, but is nevertheless activated (cf. Glucksberg, 1993 ; McGlone, Glucksberg, & Cacciari, 1994 ).

It appears that semantic activation of constituent words within spoken idioms rapidly declines over time, as proposed for automatic spreading activation within the semantic network (e.g., Neely, O’Connor, & Calabrese, 2010 ). Neither fixation data nor ERPs gave evidence for long-lasting semantic activation of idiom constituents. Across Experiments 1 and 2, there was no longer a processing benefit for semantic associates compared to unrelated words after respective initial effects. Within spoken idioms, the present effect is comparable to that obtained by Sprenger et al. (2006) , where semantic activation appeared to be strongest during early processing stages.

Here, we tentatively speculate that a rapid decay of semantic activation of constituent associates accounts for the presently and previously found mixed results for spoken and written idioms. Across paradigms using spoken idioms (Experiments 1 and 2), we consistently found evidence for activation of semantic associates of idiom-final words. Using written versions of the same idioms as in Experiments 1 and 2, we did not find effects of semantic activation in Experiment 3 and this replicates results that Rommers et al. (2013) obtained for written idioms (word-by-word presentation). If automatic semantic activation of the idiom constituent decays rapidly, the time between idiom recognition and measurement of the semantic activation is crucial for observing respective effects. In general, it takes more time to present an idiom visually word-by-word (e.g., Experiment 3 of the present study or Rommers et al., 2013 ) than it takes to present a spoken version of the same idiom (e.g., Experiments 1 and 2). According to this timing difference, short-lived semantic activation might be still measurable at final constituents of spoken idioms, while it might have decayed already before the measurement in word-by-word reading ( Rommers et al., 2013 ; Experiment 3 of the present study).

For priming experiments, where semantic spread presumably occurs in a bottom-up fashion, activation of semantic associates of idiom constituents was found for both modalities ( Beck & Weber, 2016 ; Rabanus et al., 2008 ; Smolka et al., 2007 ). Since in those experiments the idiom constituent itself was always presented in the input, the recognition of the idiom and resulting pre-activation of its constituents is not the only source of spreading semantic activation. This led us to conclude that there is an interplay of the processing mechanism (top-down vs. bottom-up) and the modality-related rate of presentation. In addition, the results imply that even the top-down prediction of idiom-final words is sufficient to activate single word meanings, but this is only measurable in the auditory modality in the present experiment. More research is needed to dissociate differences in these processes directly and to validate this explanation.

To account for individual idiom knowledge, we conducted an overt idiom recognition task in all experiments. In Experiment 1, participants had to choose the correct idiom completion among four alternatives. In Experiments 2 and 3, participants had to indicate whether the spoken or written strings were idioms. By performing these tasks, the participants might have been biased to activate canonical idiom forms only. However, if the participants had only compared the incoming input with the activated idiom form, we should not have obtained semantic activation of single word meanings in Experiments 1 and 2. In any case, general effects of the task cannot explain why activation of associates of idiom constituents differed between Experiments 2 and 3. Using the same task in both experiments, we show modality-related differences in online processing of idiomatic expressions.

The present results challenge the assumption that idioms are solely unanalyzed “long words” ( Jackendoff, 2002 ). In general, our results support hybrid models such as the Superlemma Hypothesis ( Sprenger et al., 2006 ), in which idioms are represented as both multi-word representations ( superlemmas ) and simple lemmas of single constituents on a lexical level. The hybrid nature of idioms may allow the linguistic system to rely on single constituent and multi-word representations in parallel ( Arnon & Christiansen, 2017 ; Tremblay & Baayen, 2010 ). We hypothesize that the meanings of simple lemmas within idioms are available for only a short time after their activation.

Acknowledgments

We would like to thank Anne Bauch, Sara Beck, Stacie Boswell, Birte Herter, Babette Jakobi, Sören Koch, Tobias Kopp, Matteo Marks, Anne Rau, Ulrike Schild, and Charlotte Veil. We also warmly thank all participants. Furthermore, we would like to thank two anonymous reviewers for their helpful comments on a previous version of the manuscript.

Materials Experiment 1, Experiment 2 and Experiment 3.

Since we planned to test children with the same material and paradigm in the future, we only selected highly familiar short idioms that German children would already be expected to know.

Since we planned to test children with the same material and paradigm in the future, we had to adapt the paradigm. Therefore, we chose this long preview window of the four printed words so that there was enough time to read all four words before the onset of the auditory stimuli. Since idiom knowledge varies considerably between children, we only wanted to include idioms that are known to the individual children. Therefore, we chose an overt task where participants had to find the correct idiomatic completion.

As in Experiment 1, we chose an overt idiom recognition task because we wanted to conduct the same experiment with children. In order to account for differing idiom knowledge between children, we wanted to include only idioms that participants recognized correctly. A similar idiom recognition task was used in Qualls et al. (2003) .

Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 75650358 – SFB 833. The funding source had no involvement in the study.


  • Altmann G. T. M., Kamide Y. (1999). Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition, 73(3), 247–264.
  • Altmann G. T. M., Kamide Y. (2007). The real-time mediation of visual attention by language and world knowledge: Linking anticipatory (and other) eye movements to linguistic processing. Journal of Memory and Language, 57(4), 502–518.
  • Anderson J. E., Holcomb P. J. (1995). Auditory and visual semantic priming using different stimulus onset asynchronies: An event-related brain potential study. Psychophysiology, 32(2), 177–190.
  • Arnon I., Christiansen M. H. (2017). The role of multiword building blocks in explaining L1–L2 differences. Topics in Cognitive Science, 9(3), 621–636.
  • Beck S. D., Weber A. (2016). Bilingual and monolingual idiom processing is cut from the same cloth: The role of the L1 in literal and figurative meaning activation. Frontiers in Psychology, 7, 1305.
  • Berg P., Scherg M. (1994). A multiple source approach to the correction of eye artifacts. Electroencephalography and Clinical Neurophysiology, 90(3), 229–241.
  • Brothers T., Swaab T. Y., Traxler M. J. (2015). Effects of prediction and contextual support on lexical processing: Prediction takes precedence. Cognition, 136, 135–149.
  • Cacciari C., Corradini P. (2015). Literal analysis and idiom retrieval in ambiguous idioms processing: A reading-time study. Journal of Cognitive Psychology, 27(7), 797–811.
  • Cacciari C., Tabossi P. (1988). The comprehension of idioms. Journal of Memory and Language, 27(6), 668–683.
  • Canal P., Pesciarelli F., Vespignani F., Molinaro N., Cacciari C. (2017). Basic composition and enriched integration in idiom processing: An EEG study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43(6), 928–943.
  • Canal P., Vespignani F., Molinaro N., Cacciari C. (2010). Anticipatory mechanisms in idiom comprehension: Psycholinguistic and electrophysiological evidence. In Balconi M. (Ed.), Neuropsychology of Communication (pp. 131–144). Springer.
  • Carrol G., Conklin K. (2020). Is all formulaic language created equal? Unpacking the processing advantage for different types of formulaic sequences. Language and Speech, 63(1), 95–122.
  • Collins A. M., Loftus E. F. (1975). A spreading-activation theory of semantic processing. Psychological Review, 82(6), 407–428.
  • Conklin K., Schmitt N. (2008). Formulaic sequences: Are they processed more quickly than nonformulaic language by native and nonnative speakers? Applied Linguistics, 29(1), 72–89.
  • Conklin K., Schmitt N. (2012). The processing of formulaic language. Annual Review of Applied Linguistics, 32, 45–61.
  • Connolly J. F., Phillips N. A. (1994). Event-related potential components reflect phonological and semantic processing of the terminal word of spoken sentences. Journal of Cognitive Neuroscience, 6(3), 256–266.
  • Connolly J. F., Phillips N. A., Forbes K. A. (1995). The effects of phonological and semantic features of sentence-ending words on visual event-related brain potentials. Electroencephalography and Clinical Neurophysiology, 94(4), 276–287.
  • Connolly J. F., Phillips N. A., Stewart S. H., Brake W. G. (1992). Event-related potential sensitivity to acoustic and semantic properties of terminal words in sentences. Brain and Language, 43(1), 1–18.
  • Connolly J. F., Stewart S. H., Phillips N. A. (1990). The effects of processing requirements on neurophysiological responses to spoken sentences. Brain and Language, 39(2), 302–318.
  • DeLong K. A., Quante L., Kutas M. (2014). Predictability, plausibility, and two late ERP positivities during written sentence comprehension. Neuropsychologia, 61, 150–162.
  • Diaz M. T., Swaab T. Y. (2007). Electrophysiological differentiation of phonological and semantic integration in word and sentence contexts. Brain Research, 1146, 85–100.
  • Federmeier K. D., Kutas M. (1999). A rose by any other name: Long-term memory structure and sentence processing. Journal of Memory and Language, 41(4), 469–495.
  • Federmeier K. D., McLennan D. B., De Ochoa E., Kutas M. (2002). The impact of semantic memory organization and sentence context information on spoken language processing by younger and older adults: An ERP study. Psychophysiology, 39(2), 133–146.
  • Ferreira F., Foucart A., Engelhardt P. E. (2013). Language processing in the visual world: Effects of preview, visual complexity, and prediction. Journal of Memory and Language, 69(3), 165–182.
  • Friedrich C. K., Kotz S. A. (2007). ERP evidence of form and meaning coding during online speech recognition. Journal of Cognitive Neuroscience, 19(4), 594–604.
  • Geeraert K., Baayen R. H., Newman J. (2017). Idiom variation: Experimental data and a blueprint of a computational model. Topics in Cognitive Science, 9(3), 653–669.
  • Gibbs R. W. (1980). Spilling the beans on understanding and memory for idioms in conversation. Memory & Cognition, 8(2), 149–156.
  • Glucksberg S. (1993). Idiom meanings and allusional content. In Cacciari C. T., Tabossi P. (Eds.), Idioms: Processing, Structure, and Interpretation (pp. 3–26). Erlbaum.
  • Günther F., Dudschig C., Kaup B. (2015). LSAfun—An R package for computations based on Latent Semantic Analysis. Behavior Research Methods, 47(4), 930–944.
  • Hagoort P., Brown C. M. (2000). ERP effects of listening to speech compared to reading: The P600/SPS to syntactic violations in spoken sentences and rapid serial visual presentation. Neuropsychologia, 38(11), 1531–1549.
  • Heil M., Rolke B., Pecchinenda A. (2004). Automatic semantic activation is no myth: Semantic context effects on the N400 in the letter-search task in the absence of response time effects. Psychological Science, 15(12), 852–857.
  • Holsinger E. (2013). Representing idioms: Syntactic and contextual effects on idiom processing. Language and Speech, 56(3), 373–394.
  • Huettig F., McQueen J. M. (2011). The nature of the visual environment induces implicit biases during language-mediated visual search. Memory & Cognition, 39(6), 1068–1084.
  • Huettig F., Rommers J., Meyer A. S. (2011). Using the visual world paradigm to study language processing: A review and critical evaluation. Acta Psychologica, 137(2), 151–171.
  • Ito A., Husband E. M. (2017). How robust are effects of semantic and phonological prediction during language comprehension? A visual world eye-tracking study. 10.13140/RG.2.2.33577.49765
  • Jackendoff R. (2002). Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford University Press.
  • Kamide Y., Altmann G. T. M., Haywood S. L. (2003). The time-course of prediction in incremental sentence processing: Evidence from anticipatory eye movements. Journal of Memory and Language, 49(1), 133–156.
  • Kuperberg G., Wlotko E. (2020). A tale of two positivities and the N400: Distinct neural signatures are evoked by confirmed and violated predictions at different levels of representation. Journal of Cognitive Neuroscience, 32(1), 12–35.
  • Kutas M., Federmeier K. D. (2011). Thirty years and counting: Finding meaning in the N400 component of the event-related brain potential (ERP). Annual Review of Psychology, 62, 621–647.
  • Kutas M., Hillyard S. A. (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature, 307(5947), 161–163.
  • Kyriacou M., Conklin K., Thompson D. (2020). Passivizability of idioms: Has the wrong tree been barked up? Language and Speech, 63(29), 404–435.
  • Laszlo S., Federmeier K. D. (2009). A beautiful day in the neighborhood: An event-related potential study of lexical relationships and prediction in context. Journal of Memory and Language, 61(3), 326–338.
  • Libben M. R., Titone D. A. (2008). The multidetermined nature of idiom processing. Memory and Cognition, 36(6), 1103–1121.
  • Mancuso A., Elia A., Laudanna A., Vietri S. (2020). The role of syntactic variability and literal interpretation plausibility in idiom comprehension. Journal of Psycholinguistic Research, 49(1), 99–124.
  • Marantz A. (2005). Generative linguistics within the cognitive neuroscience of language. The Linguistic Review, 22(2–4), 429–445.
  • McGlone M. S., Glucksberg S., Cacciari C. (1994). Semantic productivity and idiom comprehension. Discourse Processes, 17(2), 167–190.
  • Mirman D., Dixon J. A., Magnuson J. S. (2008). Statistical and computational models of the visual world paradigm: Growth curves and individual differences. Journal of Memory and Language, 59(4), 475–494.
  • Molinaro N., Carreiras M. (2010). Electrophysiological evidence of interaction between contextual expectation and semantic integration during the processing of collocations. Biological Psychology, 83(3), 176–190.
  • Neely J. H., O’Connor P. A., Calabrese G. (2010). Fast trial pacing in a lexical decision task reveals a decay of automatic semantic activation. Acta Psychologica, 133(2), 127–136.
  • Nieuwland M. S. (2019). Do “early” brain responses reveal word form prediction during language comprehension? A critical review. Neuroscience and Biobehavioral Reviews, 96, 367–400.
  • Oldfield R. C. (1971). The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia, 9(1), 97–113.
  • Qualls C. D., Treaster B., Blood G. W., Hammer C. S. (2003). Lexicalization of idioms in urban fifth graders: A reaction time study. Journal of Communication Disorders, 36(4), 245–261.
  • Quante L., Bölte J., Zwitserlood P. (2018). Dissociating predictability, plausibility and possibility of sentence continuations in reading: Evidence from late-positivity ERPs. PeerJ, 6, e5717.
  • Rabanus S., Smolka E., Streb J., Rösler F. (2008). Die mentale Verarbeitung von Verben in idiomatischen Konstruktionen. Zeitschrift für Germanistische Linguistik, 36(1), 27–47.
  • Rommers J., Dijkstra T., Bastiaansen M. (2013). Context-dependent semantic processing in the human brain: Evidence from idiom comprehension. Journal of Cognitive Neuroscience, 25(5), 762–776.
  • Saslow M. G. (1967). Effects of components of displacement-step stimuli upon latency for saccadic eye movement. Journal of the Optical Society of America, 57(8), 1024–1029.
  • Siyanova-Chanturia A. (2015). On the “holistic” nature of formulaic language. Corpus Linguistics and Linguistic Theory, 11(2), 285–301.
  • Siyanova-Chanturia A., Conklin K., Caffarra S., Kaan E., van Heuven W. J. B. (2017). Representation and processing of multi-word expressions in the brain. Brain and Language, 175, 111–122.
  • Siyanova-Chanturia A., Conklin K., Schmitt N. (2011). Adding more fuel to the fire: An eye-tracking study of idiom processing by native and non-native speakers. Second Language Research, 27(2), 251–272. [ Google Scholar ]
  • Smolka E., Eulitz C. (2020). Can you reach for the planets or grasp at the stars? Modified noun, verb, or preposition constituents in idiom processing. In Schulte im Walde S., Smolka E. (Eds.), The Role of Constituents in Multiword Expressions: An Interdisciplinary, Cross-lingual Perspective (pp. 179–204). Language Science Press. [ Google Scholar ]
  • Smolka E., Rabanus S., Rösler F. (2007). Processing verbs in German idioms: Evidence against the Configuration Hypothesis. Metaphor and Symbol, 22(3), 213–231. [ Google Scholar ]
  • Snider N., Arnon I. (2012). A unified lexicon and grammar? Compositional and non-compositional phrases in the lexicon. In Gries S., Divjak D. (Eds.), Frequency Effects in Language (pp. 127–163). Mouton de Gruyter. [ Google Scholar ]
  • Sprenger S., Levelt W., Kempen G. (2006). Lexical access during the production of idiomatic phrases. Journal of Memory and Language, 54(2), 161–184. [ Google Scholar ]
  • Strandburg R., Marsh J., Brown W., Asarnow R., Guthrie D., Higa J. (1993). Event-related potentials in high-functioning adult autistics: Linguistic and nonlinguistic visual information processing tasks. Neuropsychologia, 31(5), 412–434. [ DOI ] [ PubMed ] [ Google Scholar ]
  • Stroop J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology: Human Perception and Performance, 18(6), 643–662. [ Google Scholar ]
  • Swinney D. A., Cutler A. (1979). The access and processing of idiomatic expressions. Journal of Verbal Learning and Verbal Behavior, 18(5), 523–534. [ Google Scholar ]
  • Tabossi P., Fanari R., Wolf K. (2009). Why are idioms recognized fast? Memory & Cognition, 37(4), 529–540. [ DOI ] [ PubMed ] [ Google Scholar ]
  • Thornhill D. E., Van Petten C. (2012). Lexical versus conceptual anticipation during sentence processing: Frontal positivity and N400 ERP components. International Journal of Psychophysiology, 83(3), 382–392. [ DOI ] [ PubMed ] [ Google Scholar ]
  • Titone D. A., Lovseth K., Kasparian K., Tiv M. (2019). Are figurative interpretations of idioms directly retrieved, compositionally built, or both? Canadian Journal of Experimental Psychology, 73(4), 216–230. [ DOI ] [ PubMed ] [ Google Scholar ]
  • Tremblay A., Baayen R. H. (2010). Holistic processing of regular four-word sequences: A behavioral and ERP study of the effects of structure, frequency, and probability on immediate free recall. In Wood D. (Ed.), Perspectives on Formulaic Language: Acquisition and Communication (pp. 151–173). Continuum. [ Google Scholar ]
  • Tremblay A., Derwing B., Libben G., Westbury C. (2011). Processing advantages of lexical bundles: Evidence from self-paced reading and sentence recall tasks. Language Learning, 61(2), 569–613. [ Google Scholar ]
  • Underwood G., Schmitt N., Galpin A. (2004). The eyes have it: An eye-movement study into the processing of formulaic sequences. In Schmitt N. (Ed.), Formulaic Sequences (pp. 155–172). John Benjamins. [ Google Scholar ]
  • Van Den Brink D., Brown C. M., Hagoort P. (2001). Electrophysiological evidence for early contextual influences during spoken-word recognition: N200 versus N400 effects. Journal of Cognitive Neuroscience, 13(7), 967–985. [ DOI ] [ PubMed ] [ Google Scholar ]
  • van Ginkel W., Dijkstra T. (2019). The tug of war between an idiom’s figurative and literal meanings: Evidence from native and bilingual speakers. Bilingualism: Language and Cognition, 23(1), 131–147. [ Google Scholar ]
  • Van Petten C., Coulson S., Rubin S., Plante E., Parks M. (1999). Time course of word identification and semantic integration in spoken language. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25(2), 394–417. [ DOI ] [ PubMed ] [ Google Scholar ]
  • Vespignani F., Canal P., Molinaro N., Fonda S., Cacciari C. (2010). Predictive mechanisms in idiom comprehension. Journal of Cognitive Neuroscience, 22(8), 1682–1700. [ DOI ] [ PubMed ] [ Google Scholar ]
  • Wray A. (2005). Formulaic Language and the Lexicon. Cambridge University Press. [ Google Scholar ]
How children understand idioms in discourse

  • PMID: 2474558
  • DOI: 10.1017/s0305000900010473

Some studies have shown that children tend to interpret figurative language literally. Our hypothesis is that they can reach an idiomatic competence if idioms are presented within a rich informational environment allowing children to grasp their figurative sense. First and third graders were presented with narratives biased both to the figurative meaning of idioms (experiment 1) and to the literal meaning (experiment 2) and then given a comprehension task. Experiment 3 was designed to investigate children's production of idioms as compared to the comprehension abilities explored in experiments 1 and 2. Results show that informative contexts can improve children's ability to perceive idiomatic meanings even at the age of seven; and that children are less able to produce idioms than to comprehend them. Generally results emphasize that children seem able to perceive that language can be both figurative and literal.

Publication types

  • Research Support, Non-U.S. Gov't
  • Language Development*
  • Linguistics

The Role of Syntactic Variability and Literal Interpretation Plausibility in Idiom Comprehension

  • Published: 20 September 2019
  • Volume 49 , pages 99–124, ( 2020 )

  • Azzurra Mancuso (ORCID: orcid.org/0000-0002-8268-7680) 1,
  • Annibale Elia 1,
  • Alessandro Laudanna 1 &
  • Simonetta Vietri 1

Idioms have traditionally been described as fixed expressions, highly restricted in their realization. Corpus and experimental studies, however, have shown that they are more variable than previously thought. The issue of idiom syntax has received renewed interest, since it also addresses the problem of how idioms are mentally stored. Another relevant topic is the role played by the literal plausibility of idioms, that is, how likely an idiomatic expression is to have a plausible literal interpretation. In this research, we addressed both topics by means of three cross-modal priming experiments, in which canonical idioms and variants (i.e., passive form and left dislocation) were followed by words related to the idiomatic meaning of sentences (break the ice - embarrassment) or to the literal meaning of single words (break the ice - cold). The results seem to indicate that idioms do not have a special status in terms of syntactic variability: they behave like literal sentences and do not lose their idiomatic interpretation if manipulated. Moreover, the data reveal processing differences between literally plausible and implausible idioms. The results are discussed within current theories of idiom representation.

Arnon, I., & Snider, N. (2010). More than words: Frequency effects for multi-word phrases. Journal of Memory and Language, 62 (1), 67–82.

Google Scholar  

Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics using R . Cambridge: Cambridge University Press.

Barlow, M., & Kemmer, S. (2000). Usage-based models of language . Stanford, CA: CSLI Publications.

Baroni, M., Bernardini, S., Ferraresi, A., & Zanchetta, E. (2009). The WaCky wide web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43 (3), 209–226.

Bates, D., Maechler, M., & Bolker, B. (2013 ). lme4: Linear mixed-effects models using S4 classes . R package version 0.999999-2.

Bobrow, S. A., & Bell, S. M. (1973). On catching on to idiomatic expressions. Memory & Cognition, 1 (3), 343–346.

Boulenger, V., Shtyrov, Y., & Pulvermüller, F. (2012). When do you grasp the idea? MEG evidence for instantaneous idiom understanding. Neuroimage, 59 (4), 3502–3513.

PubMed   Google Scholar  

Brannon, L. L. (1975). On the understanding of idiomatic expressions (Doctoral dissertation, ProQuest Information & Learning).

Cacciari, C. (2014). Processing multiword idiomatic strings: Many words in one? The Mental Lexicon, 9 (2), 267–293.

Cacciari, C., & Corradini, P. (2015). Literal analysis and idiom retrieval in ambiguous idioms processing: A reading-time study. Journal of Cognitive Psychology, 27 (7), 797–811.

Cacciari, C., Padovani, R., & Corradini, P. (2007). Exploring the relationship between individuals’ speed of processing and their comprehension of spoken idioms. European Journal of Cognitive Psychology, 19 (3), 417–445.

Cacciari, C., & Tabossi, P. (1988). The comprehension of idioms. Journal of Memory and Language, 27 (6), 668–683.

Canal, P., Pesciarelli, F., Vespignani, F., Molinaro, N., & Cacciari, C. (2015). Electrophysiological correlates idioms comprehension: Semantic composition does not follow lexical retrieval. In NetWordS (pp. 98-101).

Chafe, W. L. (1970). Meaning and the structure of language . Chicago: University of Chicago Press.

Colombo, L. (1993). The comprehension of ambiguous idioms in context. In C. Cacciari & P. Tabossi (Eds.), Idioms: Processing, structure, and interpretation (pp. 163–200). Hillsdale, NJ: Erlbaum.

Colombo, L. (1998). Role of context in the comprehension of ambiguous Italian idioms. In D. Hillert (Ed.), Sentence processing: A cross-linguistic perspective . Syntax and Semantics (Vol. 31, pp. 405–425). New York: Academic Press.

Cronk, B. C., Lima, S. D., & Schweigert, W. A. (1993). Idioms in sentences: Effects of frequency, literalness, and familiarity. Journal of Psycholinguistic Research, 22 (1), 59–82.

Cronk, B. C., & Schweigert, W. A. (1992). The comprehension of idioms: The effects of familiarity, literalness, and usage. Applied Psycholinguistics, 13 (2), 131–146.

Cutting, J. C., & Bock, K. (1997). That’s the way the cookie bounces: Syntactic and semantic components of experimentally elicited idiom blends. Memory & cognition, 25 (1), 57–71.

Duffley, P. J. (2013). How creativity strains conventionality in the use of idiomatic expressions. In M. Borkent, B. Dancygier, & J. Hinnell (Eds.), Language and the creative mind (pp. 49–61). Stanford, CA: CSLI Publications.

Estill, R. B., & Kemper, S. (1982). Interpreting idioms. Journal of Psycholinguistic Research, 11 (6), 559–568.

Foss, D. J. (1970). Some effects of ambiguity upon sentence comprehension. Journal of Verbal Learning and Verbal Behavior, 9 (6), 699–706.

Fraser, B. (1970). Idioms within a transformational grammar. Foundations of language , 6 (1), 22–42.

Geeraert, K., Newman, J., & Baayen, R. H. (2017a). Idiom variation: Experimental data and a blueprint of a computational model.  Topics in Cognitive Science ,  9 (3), 653–669.

Geeraert, K., Newman, J., & Baayen, R. H. (2017b). Understanding idiomatic variation. In  Proceedings of the 13th Workshop on Multiword Expressions  (pp. 80–90). Valencia: Association for Computational Linguistics.

Gibbs, R. W. (1980). Spilling the beans on understanding and memory for idioms in conversation. Memory & Cognition, 8 (2), 149–156.

Gibbs, R. W., & Nayak, N. P. (1989). Psycholinguistic studies on the syntactic behavior of idioms. Cognitive Psychology, 21 (1), 100–138.

Gibbs, R. W., Nayak, N. P., Bolton, J. L., & Keppel, M. E. (1989a). Speakers’ assumptions about the lexical flexibility of idioms. Memory & Cognition, 17 (1), 58–68.

Gibbs Jr, R. W., Nayak, N. P., & Cutting, C. (1989b). How to kick the bucket and not decompose: Analyzability and idiom processing. Journal of Memory and Language, 28 (5), 576–593.

Glucksberg, S., McGlone, M. S., Grodzinsky, Y., & Amunts, K. (2001).  Understanding figurative language: From metaphor to idioms  (No. 36). Oxford University Press on Demand.

Holsinger, E., & Kaiser, E. (2013). Processing (non) compositional expressions: Mistakes and recovery. Journal of Experimental Psychology. Learning, Memory, and Cognition, 39 (3), 866.

Jackendoff, R. (1997). Twistin’the night away. Language, 73, 534–559.

Konopka, A. E., & Bock, K. (2009). Lexical or syntactic control of sentence formulation? Structural generalizations from idiom production. Cognitive Psychology, 58 (1), 68–101.

Langlotz, A. (2006). Idiomatic creativity: A cognitive-linguistic model of idiom-representation and idiom-variation in English (Vol. 17). Amsterdam: John Benjamins Publishing.

Libben, M. R., & Titone, D. A. (2008). The multidetermined nature of idiom processing. Memory & Cognition, 36 (6), 1103–1121.

Marelli, M. (2017). Word-embeddings Italian semantic spaces: A semantic model for psycholinguistic research. Psihologija, 50 (4), 503–520.

McGlone, M. S., Glucksberg, S., & Cacciari, C. (1994). Semantic productivity and idiom comprehension. Discourse Processes, 17 (2), 167–190.

Moon, R. (1998). Fixed expressions and idioms in English: A corpus-based approach . Oxford: Oxford University Press.

Mueller, R. A., & Gibbs, R. W. (1987). Processing idioms with multiple meanings. Journal of Psycholinguistic Research, 16 (1), 63–81.

Ortony, A., Schallert, D. L., Reynolds, R. E., & Antos, S. J. (1978). Interpreting metaphors and idioms: Some effects of context on comprehension. Journal of Verbal Learning and Verbal Behavior, 17 (4), 465–477.

Popiel, S. J., & McRae, K. (1988). The figurative and literal senses of idioms, or all idioms are not used equally. Journal of Psycholinguistic Research, 17 (6), 475–487.

Schröder, D. (2013). The syntactic flexibility of idioms: A corpus-based approach . München: Akademische Verlagsgemeinschaft München.

Schweigert, W. A. (1986). The comprehension of familiar and less familiar idioms. Journal of Psycholinguistic Research, 15 (1), 33–45.

Sprenger, S. A., Levelt, W. J., & Kempen, G. (2006). Lexical access during the production of idiomatic phrases. Journal of Memory and Language, 54 (2), 161–184.

Swinney, D. A., & Cutler, A. (1979). The access and processing of idiomatic expressions. Journal of Verbal Learning and Verbal Behavior, 18 (5), 523–534.

Tabossi, P., Arduino, L., & Fanari, R. (2011). Descriptive norms for 245 Italian idiomatic expressions. Behavior Research Methods, 43 (1), 110–123.

Tabossi, P., Fanari, R., & Wolf, K. (2008). Processing idiomatic expressions: Effects of semantic compositionality. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34 (2), 313.

Tabossi, P., Wolf, K., & Koterle, S. (2009). Idiom syntax: Idiosyncratic or principled? Journal of Memory and Language, 61 (1), 77–96.

Titone, D. A., & Connine, C. M. (1994a). Comprehension of idiomatic expressions: Effects of predictability and literality. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20 (5), 1126.

Titone, D. A., & Connine, C. M. (1994b). Descriptive norms for 171 idiomatic expressions: Familiarity, compositionality, predictability, and literality. Metaphor and Symbol, 9 (4), 247–270.

Titone, D. A., & Connine, C. M. (1999). On the compositional and non-compositional nature of idiomatic expressions. Journal of Pragmatics, 31 (12), 1655–1674.

Titone, D., & Libben, M. (2014). Time-dependent effects of decomposability, familiarity and literal plausibility on idiom priming: A cross-modal priming investigation. The Mental Lexicon, 9 (3), 473–496.

Tremblay, A., & Baayen, R. H. (2010). Holistic processing of regular four-word sequences: A behavioral and ERP study of the effects of structure, frequency, and probability on immediate free recall. In D. Wood (Ed.),  Perspectives on formulaic language: Acquisition and communication (pp. 151–173). Bloomsbury Publishing.

van Ginkel, W., & Dijkstra, T. (2019). The tug of war between an idiom’s figurative and literal meanings: Evidence from native and bilingual speakers.  Bilingualism: Language and Cognition . https://doi.org/10.1017/s1366728918001219 .

Article   Google Scholar  

Vespignani, F., Canal, P., Molinaro, N., Fonda, S., & Cacciari, C. (2010). Predictive mechanisms in idiom comprehension. Journal of Cognitive Neuroscience, 22 (8), 1682–1700.

Vietri, S. (2014). Idiomatic constructions in Italian: A Lexicon-grammar approach (Vol. 31). Amsterdam: John Benjamins Publishing Company.

Zhang, M., Lu, A., & Song, P. (2017). ERP evidence for the activation of syntactic structure during comprehension of lexical idiom. Journal of Psycholinguistic Research, 46 (5), 1137–1148.

Download references

Acknowledgements

We thank the anonymous reviewers for their helpful comments on a previous version of the manuscript. University of Salerno.

Author information

Authors and affiliations.

Department of Political and Communication Sciences, University of Salerno, Fisciano, SA, Italy

Azzurra Mancuso, Annibale Elia, Alessandro Laudanna & Simonetta Vietri

Corresponding author

Correspondence to Azzurra Mancuso .

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We report all idioms in their citation form, together with their targets. For literally plausible idioms, we report the corresponding English idiom, when it exists, or a paraphrase of the idiomatic meaning (in brackets). We also report the literal meaning of the expression (in italics). In many cases, the English idiom corresponds exactly to the Italian one (e.g., break the ice). For literally implausible idioms, we report the corresponding English idiom, when it exists, or a paraphrase of the idiomatic meaning (in brackets). We also report a word-by-word English translation (in italics), which does not correspond to a literal meaning of the expression (literally implausible idioms do not have a literal meaning by definition). The last constituent of each literally implausible idiom (and its translation) is underlined, since the target adopted in Experiment 3 is related to its meaning.

  • *In Experiment 3, the idiom “Battere la fiacca” (to loaf about) was replaced with “Confondere le acque” (to cloud the issue / to confuse the waters)
  • **In Experiment 3, the idiom “Tirare le cuoia” (to kick the bucket) was replaced with “Farsi le ossa” (to make one’s bones)

Mancuso, A., Elia, A., Laudanna, A. et al. The Role of Syntactic Variability and Literal Interpretation Plausibility in Idiom Comprehension. J Psycholinguist Res 49 , 99–124 (2020). https://doi.org/10.1007/s10936-019-09673-8

Published : 20 September 2019

Issue Date : February 2020

DOI : https://doi.org/10.1007/s10936-019-09673-8

  • Idiom processing
  • Cross-modal priming
  • Syntactic flexibility
  • Literal plausibility
  • Idiom variants
COMMENTS

  1. The role of literal meaning in figurative language comprehension

    Experiment 1—literal meaning in metaphor comprehension. In this experiment, we compared the processing of nominal metaphors with that of literal expressions in German to investigate the time-course of metaphor comprehension and whether the literal meaning of a word is activated in the processing of a metaphor. First, the methods applied in ...

  2. The tug of war between an idiom's figurative and literal meanings

    In two lexical-decision experiments, we investigated the processing of figurative and literal meaning in idioms. Dutch native and German-Dutch bilingual speakers responded to target words presented after a minimal context idiom prime (e.g., 'He kicked the bucket').

  3. Frontiers

    Experiment 4 showed that the video-clip primes used in Experiments 1-3 facilitated the retrieval of the encoded, literal meaning of different verbs of physical containment. This finding suggests that participants were able to derive the conceptual feature of physical containment from the videos in the match conditions, since this is the key ...

  4. Activation of Literal Word Meanings in Idioms: Evidence from Eye

    In two experiments conducted by Sprenger and colleagues (2006, Experiment 2 and Experiment 3), participants read idiom fragments (e.g., Jan liep tegen de [lamp], literally translated: Jan walked against the [lamp], meaning "to get caught" in Dutch) and were asked to complete the idiom by speaking aloud the final, missing noun (e.g., lamp).

  5. Processing of literal and metaphorical meanings in polysemous verbs: An

    This experiment investigates whether the meaning of indirect metaphorical expressions is as easily accessed as the literal meaning. The experiment was a cross-modal semantic priming study combined with a lexical decision task in which reaction times of the answers were recorded. 3.1.

  6. Full article: Losing the thread: how three- and five-year-olds predict

    For Experiment 1, it was expected that five-year-olds would use the compositional meaning of the idioms to predict the outcome of the stories to a greater extent than three-year-olds, due to a developing metalinguistic reflexivity, evidenced through their higher rate of selection of the literal option (i.e. holding the speaker at their word).

  7. How children understand idioms in discourse

    First and third graders were presented with narratives biased both to the figurative meaning of idioms (experiment 1) and to the literal meaning (experiment 2) and then given a comprehension task. Experiment 3 was designed to investigate children's production of idioms as compared to the comprehension abilities explored in experiments 1 and 2.

  8. Activation of Literal Word Meanings in Idioms: Evidence from Eye

    where the literal meaning of the word is not informative, but is nevertheless activated (cf. Glucksberg, 1993; McGlone, Glucksberg, & Cacciari, 1994). ... word meanings in Experiments 1 and 2. In ...

  9. Obligatory processing of literal and nonliteral meanings in verbal irony

    In two experiments, the authors tested the hypothesis that some portion of the literal meaning of ironic remarks is processed automatically, along with the intended meaning. In experiment 1, 49 undergraduates took longer to judge the evaluative tone (positive or negative) of utterances used ironically than used literally, demonstrating that the literal meaning of the ironic utterances was ...

  10. The Role of Syntactic Variability and Literal Interpretation

    This seems to confirm the hypothesis that literal plausibility affects idiom processing: when idioms are literally plausible, the meaning of single words is compatible with one possible interpretation of sentences: it resulted a significant semantic priming both on targets related to the idiomatic meaning of the sentence (Experiment 2) and to ...