Phonetic Cues and Dramatic Function: Artistic Recitation of Metered Speech

by Reuven Tsur

September 25, 2002


This article attempts a brief synthesis of two of my research areas: expressive sound patterns and the performance of poetic rhythm, focussed on Simon Russell Beale's performance of Gloucester's first soliloquy in Richard III. It explores three structural relationships between phonetic cues and their effects: redundancy (when several phonetic cues combine to the same effect); conflicting cues (which serve to convey conflicting prosodic effects by the same stretch of speech); and overdetermination (when one phonetic cue serves to convey a variety of unrelated -- e.g., phonological, rhythmical and expressive -- effects). Iván Fónagy speaks of dual coding of phonetic cues; the same cues convey phonological and emotive information. This article proposes "triple coding": the same cues convey phonological, emotive and rhythmic information.


        This paper explores the artistic recitation of metered dramatic speech.1 By the same token, it attempts a brief synthesis of two of my research areas as expounded in two of my earlier books, What Makes Sound Patterns Expressive: The Poetic Mode of Speech-Perception (1992), and Poetic Rhythm: Structure and Performance -- An Empirical Study in Cognitive Poetics (1998). It is a micro-scale study, focusing on certain aspects of phonetic cues. Consequently, the complexity of the issues involved must be demonstrated through a very small number of examples. So, I will confine myself to Gloucester's first speech in Richard III. The phrase "metered dramatic speech" suggests that phonetic cues may serve in it three different functions: phonological, expressive and rhythmic. In other words, they may deviate from "ordinary" speech under the pressure of the rhythmic and the expressive needs. I will explore three structural relationships between phonetic cues and their effects: redundancy (when several phonetic cues combine to the same effect); conflicting cues (which serve to convey conflicting prosodic effects by the same stretch of speech); and overdetermination (when one phonetic cue serves to convey a variety of unrelated -- e.g., phonological, rhythmical and expressive -- effects). 
The ensuing discussion will be divided into six sections: first, I will reproduce some of my assumptions and findings concerning the rhythmical performance of poetry; second, I will draw upon Iván Fónagy's explorations of the expressive functions of vocal style; third, I will offer a brief interpretation of Gloucester's speech; fourth, I will briefly address two preliminary issues required for an understanding of the main issues of the paper; the fifth and longest section will explore in great detail how these principles work in a small sample of lines in Gloucester's speech, as performed on a commercial Naxos CD (NA201512): William Shakespeare -- Great Speeches and Soliloquies.2 Finally, I will consider the solutions offered to two (rare) instances of violation of metre by what Halle and Keyser call "stress maxima in weak positions".

The Rhythmical Performance of Poetry

     This paper offers further empirical evidence in favour of my conception of poetic rhythm and performance as presented in my book Poetic Rhythm: Structure and Performance -- An Empirical Study in Cognitive Poetics.3 It claims that in an enjambment, for instance, the performer may convey both the verse line boundary and the run-on sentence as perceptual units, however strained, by having recourse to conflicting phonetic cues: cues of continuity and discontinuity simultaneously. In my book I provided some empirical evidence for this assumption.

I have adopted the position of Wellek and Warren, who argue in their Theory of Literature (1956, Chapter 13) that in order to account for poetic rhythm, one must assume the existence of not one, but three metrical dimensions: prose rhythm, metric pattern, and performance (generative metrists have reinvented the first two of them). My recent work has been devoted to the hitherto neglected performance dimension. In my 1998 book I stated my position with reference to two issues in a recent "state-of-the-art" summary of performance, the "Performance" entry of The New Princeton Encyclopedia of Poetry and Poetics (1993). The first issue concerns delivery style: "C. S. Lewis once identified two types of performers of metrical verse: 'Minstrels' (who recite in a wooden singsong voice, letting scansion override verse) and 'Actors' (who give a flamboyantly expressive recitation, ignoring meter altogether)" (893). I have claimed that in between these two delivery styles there is a third one, which I call "rhythmical performance", and that this "type" is at the very core of poetic rhythm. The second issue concerns ambiguity. "Chatman isolates a central difference between the reading and scansion of poems on the one hand and their performance on the other: in the former two activities, ambiguities of interpretation can be preserved and do not have to be settled one way or the other ('disambiguated'). But in performance, all ambiguities have to be resolved before or during delivery. Since the nature of performance is linear and temporal, sentences can only be read aloud once and must be given a specific intonational pattern. Hence in performance, the performer is forced to choose between alternative intonational patterns and their associated meanings" (ibid.; cf. e.g. Chatman, 1965, 1966). I argued that this is not so. I also argued that the two issues are intimately related.
In Wellek and Warren's terms, the Minstrel subdues prose rhythm, and foregrounds the metric pattern; the Actor subdues the metric pattern in favour of the prose rhythm. For Chatman this may be a slight exaggeration, but in principle this is how things are and should be: when prose rhythm and metre conflict, "the performer is forced to choose between alternative intonational patterns". My position is that there is a third, "rhythmical performance", in which both metric pattern and linguistic stress pattern can be accommodated, such that both are established in the listener's perception. The same holds true for the conflicting intonation patterns articulating the linguistic unit (the phrase or sentence), and the metric unit (the line). This is precisely what the perceived rhythm of poetry is about, and by no means a side issue.

Some reciters of poetry adopt one or another type of solution quite randomly; but some make a deliberate choice in adopting a consistent delivery style. I personally believe that rhythmic complexities arising from conflicting patterns are there to be realized in vocal performance too. But in our cultural situation both the "actor's approach" and the "rhythmical performance" are considered legitimate. At any rate, my treatment of the issue will be descriptive and not evaluative. In this paper I will argue that "flamboyantly expressive recitation" and "rhythmical performance" are not mutually exclusive. What is more, I will also argue that "rhythmical performance" frequently utilizes vocal resources originally developed for expressive purposes.

In my 1977 book, A Perception-Oriented Theory of Metre, I suggested that when the endings of the syntactic unit and the metric unit do not coincide (that is, when syntax is run-on from one line to the other), the reciter may indicate continuity and discontinuity at one and the same time by having recourse to conflicting cues. I came to this conclusion in a speculative manner. Twenty years later, in his master's thesis, an empirical study of enjambment, Tom Barney (1990) found ample empirical support for this assumption. This he did without having heard of my work before. I have adapted his techniques to a wide range of problems discussed in my 1977 book. My own way in this empirical research is to collect judgments from students, colleagues or my research associates as to whether the performer was successful in conveying, e.g., the conflicting aspects of an enjambment. And if possible, I try to compare alternative possibilities. Then I look for cues in the phonetic structures of the recordings, trying to find support for the intuitive judgments.

Barney relied in his research on a paper by Gerry Knowles (1991), in which he investigated the nature of tone-groups. Knowles distinguished internally defined prosodic patterns and external discontinuities at the tone-group boundaries. The former consist in some consistent F0 pattern ("intonation pattern" -- in plain English) used in ordinary speech; the latter are temporal discontinuation (pause), pitch discontinuation (a sudden change in F0) and segmental discontinuation (that is, in normal speech the articulation of adjacent words is overlapping; when there is no overlap, it may count as discontinuity, even if there is no pause). Glottal stops in words beginning with a vowel, or word-final stop releases too may indicate segmental discontinuation (see below). This would be the most evasive type of discontinuity. "The important distinction that seems to be emerging is between boundaries with or without pauses". In what follows, I shall explore how these correlates of tone-group boundaries can be exploited as conflicting cues for the perceptual accommodation of the conflicting patterns of speech and versification.

One of the most conspicuous kinds of segmental discontinuity is the prolongation of a phoneme or of a syllable at the end of an utterance, announcing (very much like a fermata in music) that the preceding unit has come to an end. Prolongation is, in fact, a double-edged phenomenon, that is, in different contexts it has different, sometimes even opposite, effects. From a perceptual point of view, prolongation indicates lack of forward movement. Therefore, when we have reason to suppose that it occurs at the end of some perceptual unit, it will be perceived as reinforcing the sense of rest; when it occurs in the middle of some forward movement, it is perceived as an arrest, arousing strong desire for change. While this is most useful in the kind of research I am engaged in, there is a big problem with this notion. There is no standard by which we can determine whether a phoneme or sequence of phonemes is longer or shorter than it ought to be. Consequently, one must rely in this respect on one's intuitive judgment, or some roundabout reasoning about measurements and comparisons. In this expanded version, I will try out a new method, with reference to my last two examples: comparing the word in the poetic context to readings in the audio version of Merriam-Webster's Collegiate Dictionary, which presents the word in its pronunciation as a single word out of context.

Expressive Functions of Vocal Style

     For certain purposes, speakers may deviate from the "ordinary" articulation of phonetic cues: they may, for instance, overarticulate, underarticulate, or distort certain phonemes or phonetic cues. The Hungarian linguist Iván Fónagy is the greatest authority regarding the expressive functions of vocal style. Instead of getting entangled in elaborate expositions, I will briefly present the issue via one of Fónagy's illuminating examples.

     According to the evidence of facial cinematography, Hungarian or French actresses pronounce /i/ with rounded lips when they mimic a young mother who says tenderly így ("like that") or mais si ("yes, indeed") to her child.

However, subjects who heard the films believed they heard an "i," despite the labialization, which ordinarily transforms [i] into [y] (as in French sure -- RT), apparently on the basis of context and situation. Though the speakers deformed the habitual pronunciation of these vowels, their auditors, in decoding the phonological component of the message, re-established the intended phonemes, interpreting the distortion as an expressive manner of pronouncing the phoneme. In the decoding, the sound is broken up into two elements: [y] → [i] + expression of tenderness (Fónagy, 1971: 159).

     The rounding of the lips can be considered as preparation for a kiss. Fónagy calls this "phonetic gesture" (1971: 160). This explains in part that the first component is perceived as a substance, the second, which is no less substantial than the first, as a "manner of pronouncing" (1971: 160). In this context, Fónagy speaks of "dual encodedness" (161). My claim is that in the recitation of metered verse there is a "triple encodedness". Sometimes, an overarticulated final stop consonant may be decoded as [p] (or [t], or [k] etc.) + an assertive, determined, firm attitude + the clear-cut articulation of the end of some prosodic or syntactic unit. Even a person reluctant to accept Fónagy's psychoanalytical explanations based on "the transfer of anal libido" (160) or "anal-sadistic cathexis" lending an authoritarian character, may discern some firm, determined, even authoritarian attitude in the speech of a person who tends to over-articulate the stop consonants. Stop consonants are abrupt, not continuous, aim at considerable accuracy, at a circumscribed point both in time and in place of articulation. Their overarticulation indicates control, exhibits strict, particular, and complete accordance with a standard, is marked by thorough consideration of minute details.

When we consider the particular articulatory gestures associated with each stop, some additional expressive potentials may become conspicuous. We will consider here only one of them. The overarticulation of bilabial consonants, mainly the abrupt oral stop [p] and affricate [pf], involves strong closure of the lips, followed by sudden opening. This articulatory gesture is very similar to spitting, and may be expressive of disgust or contempt. Thus, in Wittgenstein's term, there may be "aspect switching" between a determined, or contemptuous, or disgusted attitude, depending on the semantic component of the utterance. Even the bilabial nasal [m] can, with some effort, be pronounced contemptuously, as when at the height of political polemics against the Israeli left, Aric Sharon used to pronounce "smol", the Hebrew word for "left", with extreme contempt.4

A Brief Interpretation

     In his opening speech of the play, Gloucester takes the audience into his full confidence. He tells about his treacherous plans, about his relentless self-perception, and provides the necessary historical background information to the play's action. He emerges as a charismatic figure, who can evoke an immediate, personal assent of the audience to all his plots and villainies. This he does by his ironic comments on the new, emasculate social regime, and his cruel self-knowledge, relentless self-irony, and joyous flouting of moral taboos. This grants him almost unlimited power over his victims. Gloucester speaks in a subtle tone about "Grim-visaged war", who "capers nimbly in a lady's chamber / To the lascivious pleasing of a lute"; at the same time, he reveals a peremptory, determined attitude: "I am determinèd to prove a villain / And hate the idle pleasures of these days". His determination to become king informs the entire tragedy. Indeed, in this recording, Simon Russell Beale sometimes adopts an effeminate tone indicating subtle irony, as in speaking of "the lascivious pleasing of a lute"; at the same time, some of his vocal gestures provide indication of a peremptory, determined attitude.

Unlike Iago, who has been characterized as "a motive-hunting motiveless villain", some of Shakespeare's villains are driven by a very well-understood psychological motivation: they were wronged from the very moment of their birth, or even before. Edmund is a bastard; Shylock is a victim of the great historical injustice against the Jews; Gloucester was born premature and crippled: "Deform'd, unfinish'd, sent before my time / Into this breathing world; scarce half made up". While some of Shakespeare's wronged figures are of a melancholy, morose disposition, Gloucester is carried away by his own deformity.

1. But I, that am not shaped for sportive tricks,
    Nor made to court an amorous looking-glass;
    I, that am rudely stamp'd, and want love's majesty
    To strut before a wanton ambling nymph;
    I, that am curtail'd of this fair proportion,
    Cheated of feature by dissembling nature,
    Deform'd, unfinish'd, sent before my time
    Into this breathing world; scarce half made up,
    And that so lamely and unfashionable
    That dogs bark at me as I halt by them;
    Why, I, in this weak piping time of peace,
    Have no delight to pass away the time,
    Unless to spy my shadow in the sun
    And descant on mine own deformity [my italics -- RT]

Listen to Simon Russell Beale's continuous reading of the opening lines of the first soliloquy.


     It would appear that the line "But I, that am not shaped for sportive tricks" merely points up an essentially social contrast between himself and the rest of the society. That is the rhetorical function of "But I, that". But the passage becomes a ten-line-long catalogue of increasingly shocking deformities, presented with witty turns of phrase. From the syntactic point of view, the sentence remains incomplete for eleven lines, and the long-expected predicate occurs only in line 12: "But I ... Have no delight to pass away the time". Superficially, the repeated self-reference "But I" (italicized in excerpt 1) serves to remind the listener who is the referent of this long, syntactically incomplete list of deformities. But it may also be interpreted as the speaker's increasing amusement at his own hopeless situation (which does not prevent him from wooing and winning the beautiful Lady Anne, for instance, on the most improper occasion).

Two Preliminary Issues

     Before plunging into the main issues raised by this paper, we must briefly consider two preliminary issues. First, there is the problem of a twelve-line-long "enjambment". In one sense, it is exceptionally strained, owing to its sheer length. In another sense, however, it is rather mild. The shorter a syntactic unit, the more it resists being stretched over two prosodic units. The end of most lines in excerpt 1 coincides with the end of a well-articulated subordinate syntactic unit. Only in two instances may the reader become aware, after the event, that the unit is run on to the next line: "and want love's majesty", and "sent before my time". The loose-end chunk left in the first line is five and six syllables long, respectively. The complementary chunk in the next line coincides with a whole line in the former case, and with a six-syllable-long hemistich in the latter. So, these instances of enjambment aren't very strained. In such a structure, the best strategy for a performer would be to clearly articulate the end of each line except these two lines; and to impose some unifying pattern on the whole passage. In the present instance, an emphasis on the repeated referring phrase (italicized in excerpt 1) would do. In the performance under discussion, a clearly demonstrable "crescendo" pattern too has been superimposed on the repetitive pattern. In the recording under discussion there is a curious variant of this. In excerpt 2, one complex sentence is running through four lines. At the end of line 1, the syntax is incomplete, and a sequel is strongly expected. At the end of lines 2 and 3 no such incompleteness is perceived. Nonetheless, there is a feeling that the transition from line 3 to line 4 is rather hasty. The endings of lines 1, 2, and 4 in excerpt 2 are exceptionally well-articulated; whereas the end of line 3 is conspicuously underarticulated, against all syntactic and prosodic odds.
This is a well-known structural device in poetic structures too, namely, that the shape of the last but one unit must be considerably weakened, so as to increase the requiredness of the last unit, and the integration of the whole.5

The second preliminary issue concerns pauses. I will briefly recapitulate here two of my earlier discussions (Tsur, 1997; 1998: 301-315). There is a century-long controversy concerning the status of pauses in poetry. Are they part of poetic structure, or of performance? Some generative linguists have recently revived the former position. Consider, for instance, the reading reflected in figures 1-2. Figure 2 shows a huge pause following the first line of excerpt 2, after "peace" (806 msec [= millisecond]); but in midline there is a pause over one-and-a-half times longer, between "Why, I" and "in this weak" (1,238 msec). In the word "weak", there is a longish pause before the [k] (183 msec), and a slightly longer one after it (244 msec). Do they change the iambic pentameter nature of the verse line from which this stretch has been excised, or are they vocal manipulations to actualize it? I embrace the latter position.

In this case, the pauses after "Why, I" and before [k] represent two different kinds of pause, "macro-pause" and "micro-pause". The former is heard by a listener as a pause proper, the latter is not. It is perceived as part of an articulatory gesture rather than a period of silence. The pause after [k] is perceived as a minute period of silence. Stop consonants are "abrupt", and cannot be prolonged; that is, unless you insert a brief pause before them. The 183 msec pause between the vowel of weak and the release of the [k] is quite long for a midword pause. If you play "wea-" until the release of the [k], you hear what you see on the screen: [wi:] plus a pause; but if you include in the sequence the release of the [k] as well, you hear no pause, but an over-articulated [k]: the pause is re-interpreted as the time period when the articulatory organs are closed before the release. Thus, the perception of the pause is changed after the event; I call this "back-structuring".

Now how does an 806-msec-long, or 1,238-msec-long pause affect our perception of poetic rhythm? It depends on whether it occurs at the end of a line, or in midline. In the former case, it helps to clearly articulate the line ending, and so enhance the unity of the line. In the latter case, it may have a devastating effect on the line's unity; but not necessarily. According to a principle formulated by Gestalt psychologists, entities tend to reassert themselves in perception in the face of intruding events, up to a certain point; when the strength of the intruding event passes a certain point, the perceptual entity falls to pieces. The "certain point" depends, among other things, on whether such disintegrating forces as a midline pause are balanced by such appropriate integrating forces as clear-cut articulation of the line ending, or some perceptual force propelling across the pause (I will return to this example).

Redundancy, Conflicting Cues, Overdetermination

     Redundancy of phonetic cues is a state in which several phonetic cues combine to convey the same effect. In "ordinary speech" there is considerable redundancy at a variety of phonetic levels. For our purpose, one of the most important cases is when several cues combine to indicate the end of some syntactic (or prosodic) unit. Consider the reading of the first line of excerpt 2, reported in figures 1 and 2. The reading provides a variety of cues that indicate discontinuation. First of all, there is, as we have seen, the huge pause following the line, after "peace". Such a pause should unambiguously signal discontinuation. But there are at least two more cues. Most conspicuous is the long intonation contour at the end of the unit, which is a classic terminal contour. And the word "peace" is exceptionally prolonged, constituting, as it were, a fermata, as in music. As I suggested earlier, no one can tell how long a word or a phoneme ought to be. But here the word as a whole, as well as the closing consonant [s] are perceived as exceptionally long. To give a rough indication, we may compare them to "pass" in the next line. "Pass" is 287 msec long; the word-final [s] is 100 msec long. "Peace" is about two times longer, 566 msec; but the word-final [s] is well over three times longer than in the other word (339 msec). This shows that the phonetic cues for discontinuation are considerably redundant, and that the closural forces at the line boundary may be strong enough to counterbalance the disintegrating effect of the long midline pause.
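The comparison between "peace" and "pass" is simple arithmetic, and a minimal Python sketch makes it easy to check. The durations are those reported above; the variable and function names are illustrative assumptions, not part of any analysis tool used for this study.

```python
# Durations in milliseconds, as reported in the text for this recording.
durations_ms = {
    "peace": {"word": 566, "final_s": 339},
    "pass": {"word": 287, "final_s": 100},
}


def ratio(a, b):
    """Return how many times as long a is compared with b."""
    return a / b


# Whole-word comparison and comparison of the word-final [s] alone.
word_ratio = ratio(durations_ms["peace"]["word"], durations_ms["pass"]["word"])
s_ratio = ratio(durations_ms["peace"]["final_s"], durations_ms["pass"]["final_s"])

# Share of each word's duration taken up by its final [s].
s_share = {w: d["final_s"] / d["word"] for w, d in durations_ms.items()}

print(f"word: peace/pass = {word_ratio:.2f}")
print(f"final [s]: peace/pass = {s_ratio:.2f}")
for word, share in s_share.items():
    print(f"[s] share of '{word}': {share:.0%}")
```

The ratios come to roughly 1.97 for the whole word and 3.39 for the final [s]. The [s] takes up about 60 per cent of "peace" against about 35 per cent of "pass", so the lengthening of "peace" falls disproportionately on its closing consonant.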

Figure 1 Wave plot and pitch contour of "weak piping time of peace".6

2. Why, I, in this weak piping time of peace,
    Have no delight to pass away the time,
    Unless to spy my shadow in the sun
    And descant on mine own deformity :

Figure 2
Wave plot of "Why, I, in this weak piping time of peace, / Have no delight to pass away the time"
                (the entire text could not be squeezed into the available space, so only a few key words are printed).

Listen to Simon Russell Beale's reading of excerpt 2.


Figure 3
Wave plot and pitch contour of "Have no delight to pass away the time"

3. Now is the winter of our discontent
    Made glorious summer by this sun of York

Listen to Simon Russell Beale's reading of excerpt 3.

     A similar story (but with some significant differences) may be told about the end of the very first line of the play (excerpt 3). First, the line is followed by a 56 msec pause; this is negligible as an acoustic cue for line-ending (but, as we shall see, it has a different function). Second, the line ending is indicated by an unusually long, classical "terminal contour". Third, the last syllable "tent" is, as expected, considerably lengthened. It is, indeed, the longest syllable in this line (490 msec), even though, in English, sound sequences in polysyllables are usually shorter than comparable sequences in monosyllables (compare, for instance, tail vs. tailor; I have elsewhere discussed this issue at some length; Tsur, 1998: 156-157). The only (monosyllabic) word that approximates its duration is "Now" (486 msec). The length of this line-initial word is explained by rhetorical, not rhythmic, reasons. To appreciate the duration of this syllable (of a trisyllabic word), one might observe that the sequence wint- (in "winter") is slightly over half as long (291 msec). Fourth, the word-final oral stop [t] is excessively overarticulated (overarticulating, by the same token, the word boundary and the line boundary as well).

Figure 4
Wave plot and pitch contour of "Now is the winter of our discontent"

     My perception-oriented theory of metre assumes that certain rhythmic problems can be solved by the overarticulation of certain syllables. All the gurus who instructed me in empirical research told me they were not aware of any possibility for the machine to indicate overarticulation. It seems to me now that the machine can show overarticulation when, e.g., certain identifiable features of careful articulation are slightly or greatly exaggerated, such as duration; but there are some additional, quite interesting, features. Language in everyday conversation is usually underarticulated. Especially in English, certain articulatory features of word boundaries are almost always suppressed, and words run into one another. Consider the pairs of back-to-back [s]s in figure 5. The word-final [s] in "this" is run into the word-initial [s] in "sun". This is the normal way of speaking.7 By contrast, between the word-final [s] of "glorious" and the word-initial [s] of "summer" a minute 59 msec pause is inserted. The listener doesn't perceive it as a pause, but as an articulatory gesture intended to separate the back-to-back [s]s, a kind of "refractory period". There is a similar pair of back-to-back [t]s in the line "Have no delight to pass away the time" (figure 3). In ordinary connected speech they would be run into one another; here they are overarticulated by a stronger than usual release of the first [t], and an intervening 175-msec pause. Again, the pause is perceived as an articulatory gesture, and by no means as a pause. In addition to possible expressive functions, these back-to-back [t]s serve a conspicuous prosodic purpose: to articulate the caesura (it should be noted that in figure 5, the caesura is articulated by a conspicuous terminal contour (on "summer"), not the back-to-back [s]s).

Figure 5
Wave plot and pitch contour of "Made glorious summer by this sun of York"

     In isolation, a word cannot begin with a vowel; it must be preceded by a "glottal stop". Glottal stop is the speech sound we insert before "aim" when we say: "I said 'an aim', not 'a name'". In connected speech, the preceding word is usually run into the word-initial vowel, and the glottal stop is omitted. Likewise, when a word ends with an oral stop ([p], [t], [k], [b], [d], or [g]), the stop consists, in theory, of three stages: the speaker closes the vocal tract; this is followed by a minute period of silence, while the vocal tract is closed; this may be followed by a "stop release", when the vocal tract is opened, and a short plosion is heard. This plosion results from "the release of occluded breath". In connected speech, the word is usually run into the next one, and the word-final stop release is suppressed. I have quoted above Gerry Knowles, who suggests that, since glottal stops and stop releases are usually suppressed, in instances when they are properly articulated, they may indicate discontinuity, even where there is no measurable pause.

Consider, for instance, the verse line "But thou, contracted to thine own bright eyes", from Shakespeare's first sonnet. A native speaker of English would normally suppress all the glottal stops and stop releases in a phrase like "thine own bright eyes". Rhythmically, "bright" constitutes a deviation: it is a stressed syllable in a weak position, that is, where the iambic pattern requires an unstressed syllable. My perception-oriented theory of metre predicts that such a verse line can be performed rhythmically without demoting the deviating stress, by having recourse to a certain combination of vocal strategies, one of them being overarticulation. Indeed, the Marlowe Society, in their full recording of Shakespeare's Sonnets, insert (even emphasize) a glottal stop before "own" and "eyes", and a stop release after "bright". No native speaker of English would do that in "ordinary" speech.

Figure 6 Wave plots of "discontent" and "York" excised from a reading of excerpt 3.

Listen to the words "discontent" and "York", excised from the preceding reading of excerpt 3.

     Let us now have a look at figure 6. The line-final "discontent" ends with a stop release. This stop release is exceptionally loud and exceptionally long; and is preceded by a very minute pause (28 msec). The nature of this structure will be clarified by a comparison to the stop release at the end of "York" in the next line. Here the plosion, though still quite conspicuous, is much shorter and much weaker. It is preceded by an exceptionally long pause (169 msec -- in midword!). In spite of its excessive duration, it is not perceived as a pause, but as an articulatory gesture: extended closure of the vocal tract, to overarticulate the [k]. In this instance, at least, there appears to be a trade-off between the amplitude and duration of the release and the preceding pause. The brief 56-msec break after "discontent", too, is perceived not as a straightforward pause, but as some articulatory gesture that does not interrupt the stream of speech.
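To keep the measurements in view, the silent intervals reported so far can be gathered in a short Python sketch. The labels and the two category names ("gesture", "silence") are shorthand introduced here for the descriptions given in the text, not terms taken from any analysis software. The 28 msec closure in "discontent" is left out because the text does not say how it is heard.

```python
# Silent intervals (msec) reported in the text, with the way each is said to be heard.
# "gesture": heard as part of an articulatory gesture, not as a pause.
# "silence": heard as a pause or a period of silence.
intervals = [
    ("after 'Why, I' (midline)", 1238, "silence"),
    ("after 'peace' (line end)", 806, "silence"),
    ("after [k] of 'weak'", 244, "silence"),
    ("before [k] of 'weak'", 183, "gesture"),
    ("between 'delight' and 'to'", 175, "gesture"),
    ("before [k] of 'York'", 169, "gesture"),
    ("between 'glorious' and 'summer'", 59, "gesture"),
    ("after 'discontent' (line end)", 56, "gesture"),
]


def duration_range(heard_as):
    """Return (shortest, longest) duration among intervals heard in the given way."""
    values = [ms for _, ms, heard in intervals if heard == heard_as]
    return min(values), max(values)


for category in ("gesture", "silence"):
    shortest, longest = duration_range(category)
    print(f"{category}: {shortest}-{longest} msec")
```

In this small set the two groups do not overlap (56-183 msec against 244-1,238 msec), but the sample is far too small to suggest a threshold. The argument here rests on how each interval is heard in context, not on its length alone.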

There is a fairly mild enjambment from the first to the second line: the sentence is running on from one line to the other; even the verb phrase "is made" is straddled between the two lines. The line boundary requires discontinuation of the stream of speech; the run-on sentence requires continuation. The performer solves this problem remarkably well. The lack of perceptible pause between the two lines takes care of continuation; the terminal intonation contour, the prolongation of the last syllable, and the exceptionally well-articulated stop release at the end take care of discontinuation. That is what I have called "conflicting cues" in enjambment.

At the end of the second line, line boundary and sentence boundary coincide; so, there is no syntactic demand here for continuation. Indeed, the phonetic cues are, again, in harmony, all of them indicating discontinuation. This is what I have called redundancy. The final monosyllable, York, is the longest syllable in the first two lines (544 msec). The final rising-and-falling intonation curve too may effectively contribute to closure. Considering that there is no prosodic problem here to solve, the line-final stop release with the preceding excessive pause may be judged very much exaggerated. It is here that expressive force and overdetermination come in. The overarticulated line-final stop does not serve merely to clearly articulate a juncture of a line-boundary and sentence-boundary; it serves an expressive function too. When speaking of "triple encodedness", I suggested above that the distorted pronunciation of a phoneme may be decoded as a phoneme, as some expressive effect, and as some prosodic effect. We have just considered the prosodic function of the overarticulated oral stops at the end of the first two lines: to clearly articulate the line boundary. But I have also suggested above that a tendency to overarticulate oral stops may be an indication of certain personality traits, such as an assertive, determined, firm attitude. According to our foregoing analysis, this description fits Gloucester extremely well. I submit that the overarticulation of such word-final stop releases may also be a part of the means by which this particular actor, in this particular performance, characterizes Gloucester as a relentless, determined person. Such an interpretation can rely on many more overarticulated stops in this speech.

I will not scan the entire first speech for such instances; I will only pay some attention to the first line of excerpt 2, "Why, I, in this weak piping time of peace" (see figures 1 and 7). The stop release in weak isn't very remarkable; but is preceded by a most remarkable "articulatory" pause (183 msec), in midword (!). It is perceived as gross overarticulation of the [k]. Shakespeare provides in the rest of this line four more conspicuous oral stops in syllable-initial position: a [t] and three more [p]s, two of them in piping. As the wave plot shows (figures 1 and 7), each one of these [p]s begins with a vigorous perturbation of the sound wave, and is preceded by a pause.8

     The most intriguing pause is in the middle of piping, 92 msec long, whose only conceivable purpose is the overarticulation of the second [p]. This is further reinforced by the two discontinuous though steadily falling intonation contours assigned to them (see figure 1). The [p] of peace is preceded by a 103 msec pause. This pause conspicuously occurs in midphrase: the preposition "of" is run into the preceding "time", while there is no acoustic trace of [f]; and the two words are assigned one consistent intonation contour. The pause is reinforced by the notable pitch discontinuation: from the bottom of the intonation contour assigned to "timeof" there is a leap from 68 Hz to 158 Hz (wherefrom the curve falls again to 80 Hz). A long pause (perceived as a pause) precedes the first [p] of piping. This has a rhetorical purpose in the first place; but it affects the overarticulation of the [p] too. As an additional function, there is here a metric problem too. The overarticulated weak is a heavily stressed syllable in a weak (odd-numbered) position. This arouses a strong craving for the reinstatement of metre in the next strong position. The huge leap of pitch from weak to pi and the overarticulation by the preceding pause serve to counterbalance the infringement. The [t] (of time) too is preceded by a brief, 47 msec pause. The cumulative impact of overarticulated stops in general, and [p]s in particular, may be perceived as expressive of the speaker's attitudes in two respects: he is determined ("I am determinèd to prove a villain"), and is contemptuous of "this weak piping time of peace" ("And hate the idle pleasures of these days").

Figure 7
Wave plot of "piping time of peace"

     A look at the terminal intonation curve in figure 3 may reveal another illuminating aspect of the speaker's irony. The line ends with a fully developed terminal contour, but with a difference. In the preceding line (figure 1), as we have seen, the terminal contour falls from 158 Hz to 80 Hz (the terminal contour in figure 4 falls from 106 Hz to 66 Hz); in figure 3 it falls from 190 Hz, to 122 Hz, wherefrom it continues to 120 Hz. This relatively high pitch sequence appears to have both a rhythmic and an expressive function. In its rhythmic function it has two relevant aspects: it has a terminal shape, but is higher than usual. It clearly articulates the line boundary and, at the same time, suggests that something is still to come. In its expressive function, the listener can't help being struck by the effeminate character of the high voice. The phonetic application I am using gives the pitch range 80-150 Hz as the typical male range, and 120-280 Hz as the typical female range. Thus, when speaking of the effeminate delights "to pass away the time", the speaker's pitch goes well into the typically female range. Thus, we have in close proximity "ambitious" articulation and "effeminate" intonation.
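The comparison with the typical pitch ranges can be made explicit. The following is only a minimal sketch: the two ranges and the three pitch values are those quoted in the paragraph above, while the function name `classify` is a hypothetical helper introduced for illustration. Note that the two ranges overlap between 120 and 150 Hz, so a value may fall into both.

```python
# Typical ranges (Hz) as quoted in the text from the author's phonetic application.
MALE_RANGE = (80, 150)
FEMALE_RANGE = (120, 280)


def classify(pitch_hz):
    """Return the list of typical ranges that contain the given pitch value."""
    labels = []
    if MALE_RANGE[0] <= pitch_hz <= MALE_RANGE[1]:
        labels.append("male")
    if FEMALE_RANGE[0] <= pitch_hz <= FEMALE_RANGE[1]:
        labels.append("female")
    return labels


# Points of the terminal contour on "time" in figure 3, as reported in the text.
for pitch in (190, 122, 120):
    print(pitch, "Hz ->", classify(pitch))
```

On these figures the onset of the contour lies above the quoted male range, whereas its end points fall in the zone shared by both ranges.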

The nature of this terminal contour (indeed, of the whole issue) will be illuminated if we compare it to a similar contour on another line-final "time", in the enjambment "sent before my time / Into this breathing world". Here the conflicting cues required by the enjambment generate a similar contour, but with some slight differences. The falling portion of both curves is similar. But the first one is preceded by a rising curve; pitch rises in it from 127 Hz to 177 Hz and then falls to 90 Hz. The second curve is considerably higher and shorter: it falls from 190 Hz to 120 Hz (the first curve falls 87 Hz, the second one 70 Hz only). Considering duration, the first "time" is a few milliseconds shorter than the second one; its final [m] is less than half as long as the other one. What can we learn from these measurements about the reciter's vocal strategies at the two line boundaries? In the second "time", all the cues are redundant in signalling arrest: the last word of the line, and the last phoneme of the word are lengthened; it is followed by a considerable pause, and is closed by a terminal intonation contour. The only cue for expecting (not indicating) continuity is the relatively high pitch of the terminal contour. In the first "time" there are conflicting cues, for indicating continuation and discontinuation. There is no measurable pause between "time" and "Into"; the lengthened last word of the line indicates arrest, but the unexpectedly short word-final [m] would suggest continuation; and the long and low-falling intonation contour unambiguously signals arrest. Listening to the lines strongly confirms this analysis.
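The arithmetic behind the comparison of the two falling contours can be checked in a few lines. This is merely a sketch using the peak and end values reported above; the class name `Contour` and its fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Contour:
    label: str
    peak_hz: float  # pitch at the start of the falling portion
    end_hz: float   # pitch at the end of the falling portion

    @property
    def fall_hz(self):
        """Size of the fall from peak to end point."""
        return self.peak_hz - self.end_hz


first = Contour("sent before my time", 177, 90)
second = Contour("pass away the time", 190, 120)

for c in (first, second):
    print(f"{c.label}: falls {c.fall_hz:.0f} Hz, ending at {c.end_hz:.0f} Hz")
```

The second contour starts higher, falls less, and ends higher than the first, which is the relationship the paragraph describes.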

4. Deform'd, unfinish'd, sent before my time
    Into this breathing world; scarce half made up

Listen to Simon Russel Beale's reading of the above two lines.
Click here

Figure 8
Wave plots and pitch contours of the word "time", excised from
                "sent before my time" and "pass away the time".

Listen to two tokens of time
One excised from a reading of "Deform'd, unfinish'd, sent before my time",
the other from a reading of "Have no delight to pass away the time"

Click here

     I have pointed out above, in excerpt 1, a highly effective anaphoric repetition of the first-person pronoun "I": But I, that am ... I, that am ... I, that am ... Why, I. I suggested that this repetitive scheme may impart a considerable degree of unity to a passage in which the predicted predicate is postponed to the twelfth line. I also claimed that Beale superimposed on this repetitive pattern a "crescendo" pattern. I suggested that this catalogue can be interpreted as the speaker's increasing amusement over his own absurd situation. Now I propose to elaborate on this, and make an additional observation too. Consider figure 9. The increasing thickness of the wave plots of the first three items indicates here a pattern of increasing loudness. The intonation plots of "I" have strikingly similar shapes, especially the last two. The pitches of the four plots too yield a gradually ascending sequence. It is this rising sequence that reinforces the "increasing amusement" aspect of Gloucester's catalogue of his own deformities.

Figure 9
Wave plots and pitch contours of the sequence of phrases
                "But I that am... I that am... I that am" and "why, I".

Listen to the phrases "But I that am... I that am... I that am... why, I" excised from a reading of excerpt 1.
Click here

Listen to the last two items from the preceding list.
Click here

     Now this rising sequence illuminated for me an issue that caused me a considerable problem. I have discussed above the possible disruptive or reinforcing effect of the pause after "Why, I" in the first line of excerpt 2. Relying on a gestalt principle, I assumed that entities tend to reassert themselves in perception in the face of intruding events -- provided that the perceptual entity is sufficiently unified. I mentioned two types of unifying factors: closural devices, and some perceptual force propelling across the pause. When listening to the line, I did perceive such a propelling force, but had difficulty pinpointing its source. I had a feeling that the closing intonation contour on "I" didn't fall "deep" enough, and thus aroused strong expectations for continuation. The trouble is that we have no criteria for deciding what is "deep enough". The pitch curve of "I" in the last but one phrase in figure 9 rises from 107 Hz to 120 Hz, and then falls to 82 Hz. The curve of "why", though somewhat lower, is within roughly the same range (98 Hz, 102 Hz, 80 Hz). From here, there is a considerable leap to the pitch curve of "I", which moves from 127 Hz to 136 Hz, falling to 99 Hz. These comparisons suggest two possible solutions (perhaps both valid). First, the musicologists Cooper and Meyer (1960) pointed out that a steeply rising pitch sequence or intensity sequence (crescendo) has a marked forward grouping effect (it leads, so to speak, forward). Second, when you listen in figure 9 to the last two phrases only, you have a feeling that the intonation curve of the first "I" falls to a "base line", some stable reference point of the musical scale, even if you cannot tell by what criteria. When you listen to the second "I", you have a feeling that its pitch curve remains somehow "half way", strongly indicating that some continuation is to come. Thus, the rising pitch and amplitude curves are overdetermined.

Let us have now a close look at the last two lines of excerpt 2.

Figure 10
Wave plot of "Unless to spy my shadow in the sun an(d) /
                 Descant on mine own deformity".

Listen to the lines "Unless to spy my shadow in the sun and / Descant on mine own deformity".
Click here

Notice that "sunan(d)" is pronounced as a unit, and is followed by a minute pause.

     In this excerpt, one complex sentence is running through four lines. At the end of line 1, the syntax is incomplete, and a sequel is strongly expected. At the end of lines 2 and 3 no such incompleteness is perceived. Nonetheless, there is a feeling that the transition from line 3 to 4 is rather hasty. This is warranted neither by versification, nor by ordinary speech. In ordinary speech, we would expect the speaker to separate "And" from the preceding "sun", and run it into the ensuing "descant", pronouncing the two words with a single, shared [d]. In a rhythmical performance, the line boundary after "sun" would encourage such a separation. Here, on the contrary, the performer pronounces the two words sunan ("sun and") with no measurable pause between them, and further binds them together with one common intonation contour; there is no trace of the word-final [d] (which is quite common in "ordinary" speech). While in the other three lines in this excerpt the last syllable of each line as well as the word-final phoneme is conspicuously lengthened, the word sun as well as its closing phoneme are relatively short. What is more, the speaker inserts a minute 66 msec pause after sunan. We have said enough about such brief pauses to expect (as is, indeed, the case) that it would not be perceived as a pause, but as an articulatory gesture, overarticulating the ensuing [d].

Figure 11
Wave plot and pitch contour of "Unless to spy my shadow in the sunan(d)"

     The close phonological connection of And to the preceding sun blurs the preceding verse line as a whole. As I suggested above, this apparently unjustified connection may have a structural justification, nevertheless. It may have been meant to weaken the last but one prosodic-syntactic unit, so as to increase the requiredness and closural quality of the last line. By the same token, this would increase the "punch-line" quality of the last line, enhancing its jubilant sarcasm, as a perceptual quality. We have said enough about enjambments to claim that there are vocal strategies that may solve this problem, to have one's cake and eat it. In fact, I believe, this is a borderline case. Some listeners may judge that the falling terminal intonation contour does take care of weakly articulating the line boundary. I personally would prefer if the word-final [n] were a shade longer. I have attempted to lengthen the [n] electronically:

Compare now the original and the manipulated versions of lines 3-4 of Excerpt 2.

Original Version:

Click here

Manipulated Version
This version has been manipulated by electronic means: the vowel and [n] of sun have been slightly lengthened:
Click here

Violating Stress Pattern or Metre

     The deviation from both ordinary speech and versification appears to have rhetorical reasons too. Descant is unusually foregrounded in this reading, by a variety of means. As will be readily seen in figure 12, pitch resets high on de, and then falls on cant. In the wave plot we may observe that both phonemes of de are invested with exceptionally high energy. This will be apparent if we compare this overarticulated sound sequence in Figure 13, to a less extreme instance of overarticulation of the same sequence in Figure 14 (cf. below). The mini-pause before descant seems to serve the same purpose. Descant is a musical term meaning to write variations upon a simple theme; its use here in the sense "comment on", "dwell upon" may suggest a jubilant self-sarcastic tone. It would appear that the performer foregrounded this word to underpin the height of Gloucester's jubilant self-sarcasm. This decision, in turn, may have been influenced by a need to solve a rhythmic problem. In English, such words as present, subject, object, are pronounced with the stress on the last syllable when they are verbs, and on the first syllable when they are nouns. In Shakespeare, Milton, Shelley, Keats, and Yeats, but not in Pope, such verbs, as well as adjectives like extreme, supreme, sometimes occur with their second syllable in a weak position, usually followed by a stressed syllable in the next strong position. Such metric constructs have led some scholars of Shakespeare's or Milton's pronunciation to certain conclusions concerning these poets' pronunciation. 
It is more likely, however, that the aesthetic norms rather than the stress rules have changed back and forth from Shakespeare through Pope, through the romantics, through Yeats, to our day.9 At any rate, the Oxford English Dictionary (Online) assigns stress to the last syllable of descant as a verb, quoting this line among other examples; the Random House College Dictionary and Webster's New Twentieth Century Dictionary too assign stress to the last syllable. Merriam-Webster's Collegiate Dictionary, however, gives both possibilities (and, in the electronic version, both possibilities are recorded).10

Figure 12
Wave plot and pitch contour of "An(d) descant on mine own deformity"

     In this electronic version entries are recorded by a male or a female speaker with an exceptionally careful articulation, or even overarticulation, with a long, falling terminal intonation contour, suggesting that it is an independent, unconnected phonological entity. I have found that these recordings may fruitfully be compared to the same words in an artistic recitation of a poetic-dramatic text. Consider, for instance, the following three recordings of the word "descant", read by Beale in the context of Gloucester's soliloquy (Figure 13), and by the female reader of the Dictionary (Figure 14). Both readings in Figure 14 illustrate "descant" as a verb; the first token is beginning-stressed, the second end-stressed. Correspondingly, in the first token the onset of the intonation contour is higher on the first syllable, in the second token higher on the last syllable. The wave plot in the lower window conveys information about the relative loudness and relative duration of the speech sounds. In the first token of Figure 14 (that is, when the stress is at the beginning) the second syllable has less energy and less duration than in the second token, when the stress is at the end (though, in both words the second syllable is longer than the first one). Thus, relative stress is cued, simultaneously, by pitch, duration and loudness. The three cues are not necessarily congruent. Notice also that the pauses between the syllables and before the word-final [t] generate overarticulation. Later I will argue that overarticulation in the dictionary and in the artistic text serve different purposes.

Beale (Figure 13) utters "descant" with the stress on the penultimate syllable. Accordingly, the relative pitch peaks constitute, as in the first token in Figure 14, a downward step from the first to the second syllable. As to the other two phonetic cues, their relative weights are significantly different in the two readings in several respects. As the wave plot clearly indicates, the effect of the pitch contrast in Figure 13 is reinforced by a huge intensity contrast (in Figure 14 the intensity contrast is much smaller). Regarding duration, we find the obverse. In the first token in Figure 14 there is a huge duration contrast, which runs counter to the pitch movement: 214 vs 337 msec (whereas in figure 13 the difference is only 269 vs 284 msec).

Figure 13
Wave plot and pitch contour of
                 "descant", read in context by Beale.

Figure 14
Wave plot and pitch contour of two tokens of "descant",
                 read by a female reader in Merriam-Webster's Collegiate Dictionary.

     The overarticulation of "descant" in Beale's performance becomes most conspicuous in an additional detail, by comparison to Figure 14. In the latter, the first syllable "des-" is indicated in the lower window, twice, by a single bulk. In the former it is indicated by two clearly-discernible bulks: de + s; the /s/ alone is longer than the whole syllable in figure 14. This comparison will be underlined by a more detailed comparison of certain duration relationships between the two beginning-stressed readings of "descant", in the play and in the dictionary. In the dictionary reading the whole word is slightly longer than in Beale's reading (612 vs 603 msec). Nonetheless, the syllable "des-" is significantly longer in Beale's reading (269 vs 214 msec). Thus, the dictionary reading may serve as an objective standard (that is, without distortion of personal feelings or versification requirements), from which the artistic recital deviates.

The relative overarticulation in the dictionary entry and the dramatic recitation of Gloucester's speech can be compared in terms of the relative salience of various kinds of phonetic elements. The "abrupt" plosive consonants have, with one exception, roughly the same duration: /d/ is 14-msec-long in both readings 11; /k/ is insignificantly longer in Beale's reading (39 vs 33 msec); the only exception is the word-final /t/, which is more than twice as long in the dictionary's reading (50 vs 22 msec). Substantial differences can be observed in the "continuous" vowels and consonants, and the pauses. The precisionist articulation of the dictionary is reflected in the relative duration of the vowels and articulatory pauses: the first vowel is 116-msec-long in the dictionary, but only 87-msec-long in Gloucester's speech; the /a/ is 117-msec-long in the dictionary, but only 83-msec-long in Gloucester's speech. The pauses (before /k/ and /t/), too, are longer in the dictionary (66/53 msec and 34/19 msec, respectively). In Gloucester's speech, by contrast, the continuous fricative and nasal consonants are substantially longer than in the dictionary: /s/ 177 vs 84 msec; /n/ 114 vs 100 msec. Other things being equal, the high salience of consonants, even if continuants, may suggest a more vigorous attitude than the high salience of vowels. The pitch contours show an additional, quite significant albeit elusive, difference. In Figure 14 a falling (that is, terminal) intonation contour is assigned (twice) to the first syllable; this enhances the clear-cut, dispassionate, dictionary-like articulation, and has a settling effect. In Figure 13 a rising intonation contour is assigned to the first syllable, arousing expectations, suspense; this has a more dramatic, unsettling effect. The listener hears in the dictionary readings some measured, informatory quality, whereas in the stream of Gloucester's soliloquy he detects some impetuous dramatic effect. 
This difference results from the cumulative impact of these elusive, minute differences (the rising pitch contour assigned to the syllable "des-", for instance, can clearly be heard only when the first two phonemes are isolated on the computer).
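The segment durations quoted above can be tabulated and summed by class of phonetic element. The following is a sketch only, resting on two assumptions that are not stated outright in the text: the 116/87 msec pair is taken to belong to the vowel of "des-", and the pause pairs 66/53 and 34/19 are read in the order dictionary/Beale. The names `DURATIONS` and `totals_by_class` are hypothetical.

```python
# Segment durations in msec as reported in the text: (dictionary, Beale).
# Assumptions: the 116/87 msec pair is the vowel of "des-", and the pause
# pairs 66/53 and 34/19 are given in the order dictionary/Beale.
DURATIONS = [
    ("d", "plosive", 14, 14),
    ("vowel of des-", "vowel", 116, 87),
    ("s", "continuant", 84, 177),
    ("pause before k", "pause", 66, 53),
    ("k", "plosive", 33, 39),
    ("a", "vowel", 117, 83),
    ("n", "continuant", 100, 114),
    ("pause before t", "pause", 34, 19),
    ("t", "plosive", 50, 22),
]


def totals_by_class(rows):
    """Sum the dictionary and Beale durations for each class of segment."""
    totals = {}
    for _, kind, dictionary, beale in rows:
        d, b = totals.get(kind, (0, 0))
        totals[kind] = (d + dictionary, b + beale)
    return totals


for kind, (dictionary, beale) in totals_by_class(DURATIONS).items():
    longer = "Beale" if beale > dictionary else "dictionary"
    print(f"{kind:<11} dictionary {dictionary:>3}  Beale {beale:>3}  longer in: {longer}")
```

Under these assumptions, vowels and pauses take up more time in the dictionary reading, while the fricative and nasal take up more time in Beale's reading, in line with the contrast drawn in the paragraph.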

The foregoing discussion may suggest the following scenario: The verb descant has its second syllable in a weak position, not followed by a stressed syllable in a strong position. This would strongly violate the iambic metre (in Halle and Keyser's terms, it constitutes a "stress maximum in a weak position" -- see below). The performer could use a joyous self-sarcastic reading as an excuse for foregrounding descant by all possible phonetic means, among them -- the inversion of the stress pattern of the verb.

There is in excerpt 1 an additional instance of a verb with its stressed syllable in the fifth (weak) position:

5. I, that am curtail'd of this fair proportion

     In this instance, Beale places the stress, properly, on the second syllable, drastically violating metre. This is one of the rare instances in which one may observe how an outstanding British actor faces the rhythmic problem arising from a stress maximum in the fifth position (which is the weak position least tolerant of violation). So, the issue deserves a more systematic presentation.

Morris Halle and Jay Keyser were the founding fathers of generative metrics, proposing a parsimonious rule which, they claim, can generate all metrical lines, but no unmetrical lines. A metrical line is one in which no stress maximum occurs in a weak position. A stress maximum is, according to the latest version of the Halle-Keyser theory, a syllable that bears lexical stress, between two unstressed syllables, as the second syllable of "curTAIL'D of". 12 Since it occurs in the fifth position (which is odd-numbered and therefore weak position in the iambic metre), the verse line is ruled unmetrical under the Halle-Keyser theory. 13 Halle and Keyser and their critics all over the world found about twelve unmetrical lines under this theory in major English poetry. However, in Tsur (1977; 1998) I provided a list of 52 additional instances. Such a sample was big enough to suggest some method in this madness. In an iambic pentameter line there are four weak positions available for violation under this theory (positions 3, 5, 7, 9). A random distribution of violations would allocate, therefore, 25% to each one. About two thirds, however, occur in position 7; about one third in position 3. Most instances that occur in positions 5 and 9 are rather doubtful instances. The line under discussion is one of the very few indisputably genuine instances. I have argued that a stress maximum in a weak position is acceptable to such poets as Shakespeare, Milton and Shelley, for instance, provided that they can be performed rhythmically; and this distribution reflects the relative difficulty of doing this. This, in turn, is influenced by a hierarchy of metric boundaries: line ending, unmarked caesura, marked caesura -- in this descending order of "grouping potential".
As I have argued in several places, caesura articulates a verse line in the middle; in the iambic pentameter it may occur after positions 4, 5, or 6. When a pentameter line is divided into segments of 4 and 6 positions, "the shorter segment comes first" is the unmarked option (that is, when caesura occurs after position 4); "the longest comes first" is the marked option (that is, when caesura occurs after position 6).
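The definition of a stress maximum given above lends itself to a short illustration. What follows is a simplified sketch, not the Halle-Keyser formalism itself: each syllable is reduced to a stressed/unstressed flag, and the stress marking of the example line is a hand-made, hypothetical annotation. The function names are likewise introduced only for illustration.

```python
def stress_maxima(stresses):
    """Return the 1-based positions of stress maxima.

    `stresses` holds one boolean per syllable. A stress maximum is a stressed
    syllable flanked on both sides by unstressed syllables, so the first and
    last syllables of a line can never qualify.
    """
    return [
        i + 1
        for i in range(1, len(stresses) - 1)
        if stresses[i] and not stresses[i - 1] and not stresses[i + 1]
    ]


def is_weak(position):
    """In the iambic metre the odd-numbered positions are weak."""
    return position % 2 == 1


def violations(stresses):
    """Stress maxima that fall in weak positions."""
    return [p for p in stress_maxima(stresses) if is_weak(p)]


# Hypothetical hand-marked stresses for
# "I, that am curTAIL'D of this fair proPORtion" (eleven syllables).
line = [True, False, False, False, True, False, False, True, False, True, False]

print("stress maxima:", stress_maxima(line))
print("in weak positions:", violations(line))
```

With this marking the only stress maximum in a weak position is the second syllable of "curtail'd", in position 5; the emphatic "I" in position 1 is stressed but, having no preceding syllable, cannot count as a stress maximum.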

A stressed syllable in a weak position (a stress maximum even more so) disrupts metre, and arouses expectations for reinstatement, "presses forward" for resolution. When the stress pattern and meter have again a "coinciding downbeat", tension is resolved, and the metre becomes "fresh and new". Metre may be reconfirmed in the next or the next but one strong position. Only the latter may constitute a stress maximum. In this case, the period of uncertainty is longer, the threat to rhythm greater, and the resolution, if achieved, more gratifying. Position 1 is weak, but a stressed syllable displaced to it cannot be a stress maximum by definition. Furthermore, when such a displacement occurs, meter is reinstated by a "coinciding downbeat" in position 4, that is, just before the unmarked caesura, achieving considerable stability. So, it is perfectly acceptable even to Alexander Pope. A violation of metre in position 3 is compensated for in position 6, just before the marked caesura. After a stress maximum in the ninth position, metre cannot be reinstated in position 10 by definition. A stress maximum in the fifth position suppresses stress before both potential metric boundaries (in positions 4 and 6), and must be compensated for in position 8, which is not followed by a metric boundary, and where stability cannot be achieved. Consequently, stress maxima in the fifth and ninth positions are the least acceptable violations of the iambic pentameter. The greatest stability is achieved when the stress maximum occurs in position 7, that is, when the resolution effected by the next "coinciding downbeat" occurs in the tenth (last) position of the line, enhancing its closure.
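The pattern traced in the paragraph above can be summarised as a small lookup. This is a sketch under a simplifying assumption: it covers only the case in which metre is reinstated in the next but one strong position, three positions after the deviating stress, as in every example mentioned in the text. The names `BOUNDARY_AFTER` and `resolution` are hypothetical.

```python
LINE_LENGTH = 10

# Metric boundaries named in the text, keyed by the position they follow.
BOUNDARY_AFTER = {
    4: "unmarked caesura",
    6: "marked caesura",
    10: "line ending",
}


def resolution(position):
    """Map a deviating stress in a weak position to its point of resolution.

    Returns (strong position of the coinciding downbeat, boundary after it).
    Either element is None when no such position or boundary is available.
    """
    if position % 2 == 0:
        raise ValueError("only odd-numbered (weak) positions are considered")
    target = position + 3  # next but one strong position
    if target > LINE_LENGTH:
        return None, None
    return target, BOUNDARY_AFTER.get(target)


for weak in (1, 3, 5, 7, 9):
    target, boundary = resolution(weak)
    print(f"position {weak}: reinstated in {target}, followed by {boundary}")
```

Positions 5 and 9 come out as the awkward cases: the former resolves in position 8, which is followed by no metric boundary, and the latter has no position left in the line in which to resolve.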

Figure 15
Schematic mapping of possible stress valleys beginning with a stressed syllable
                 or stress maximum in a weak position; the upward arrows point to the positions
                 in which metre is reinstated by coinciding downbeats; the downward arrows
                 point at the unmarked and marked caesurae.

     In a series of empirical studies I have found that experienced readers tend to perform stress maxima in the seventh position without questioning, and find the results satisfying. Though not aware of the required solution, their solutions tend to be remarkably similar; and are in harmony with my predictions based on the Gestalt theory of grouping: the deviating stress will be over- rather than under-emphasised; it will begin a closed and symmetrical group of four syllables ("stress valley"), ending in the tenth position; and will tend to be isolated from the preceding stretch of syllables, while still taking care of syntactic continuity.

I have said that in the case of a stress maximum in a weak position, the deviating stress is over- rather than under-emphasised. According to Cooper and Meyer (1960: 8), in musical performance, the placing of some extra accent may affect the grouping of sounds. Since there is a tendency for accents to begin a group, the placing of accent on a strong beat tends to articulate the sequence in beginning-stressed groups; an accent on a weak beat presents the group as end-stressed. The extra accent in the seventh (weak) position of the line creates a drive to focus the stress valley on the last syllable, enhancing the feeling of strong closure (at the end of the stress valley and the line).

Figure 16
Wave plot and pitch contour of "I, that am curTAIL'D of this fair proportion".

Listen to the line "I, that am curTAIL'D of this fair proportion".
Click here

     Now let us have a close look at Figure 16. One thing that draws attention is the exceptionally high pitch on "-TAILED" jutting out from the pitch plot. A lesser curve of a similar shape is assigned to the line-initial "I". These two curves have two opposite effects each: on the one hand, they indicate extra stress in a weak position, generating a forward impetus toward the end of the stress valley; on the other hand, they are conspicuous terminal contours, grouping the utterance backward. The wave plot shows that there is no measurable pause between "I" and its sequel ("that am"). Thus, the intonation contour clearly articulates the boundary of "I", separating it for a rhetorical effect (to foreground and render it part of the anaphoric pattern discussed above). By the same token, it generates a forward drive, reinforced by the lack of measurable pause. There is no measurable pause between "-TAILED" and "of" either. What is more, the [d] and the [o] are co-articulated: there is no point in the sequence at the left of which there is an unambiguous [d], at the right an unambiguous [o]. The segment isolated at the "watershed" provides information about both a [d] and an [o]. This takes care of continuity, while the outjutting terminal contour constitutes an exceptionally well-articulated caesura. Concurrently, as the stressed syllable occurs in a weak position, it generates an exceptionally impetuous forward movement across the caesura.

This is one of the verse lines that put the reader's rhythmic competence to greater than usual trial. The first syllable ("I" with an emphatic stress) intrudes upon rhythmic regularity in a weak position, initiating a forward pressure for resolution. Regularity ought to be restored by a stressed syllable in the fourth position. There we find, however, the first (unstressed) syllable of "curtail'd", followed by its stressed syllable in the fifth position. This violates metre by a stress maximum in a weak position. The first stressed syllable in a strong position ("fair") occurs in this line as late as position 8, where it ought to achieve some degree of focal stability. It ought to, but doesn't. As I said above, a stress maximum in the fifth position suppresses both potential metric boundaries (in positions 4 and 6), and must be compensated for in position 8, which is not followed by some metric boundary, and where stability cannot be achieved. To make things worse, "fair" in position 8 is an adjective whose stress is subordinated to that of the ensuing noun ("proportion"). Thus, the metric pattern of excerpt 5 is disconfirmed or violated by metrically unexpected accents in positions 1 and 5, giving rise to end-stressed groupings; the arising forward pressure, however, is continually forwarded to ever-later strong positions, until it achieves, eventually, focal stability in the last strong position, position 10. This well-closed fluid unit becomes part of a wider fluid structure. The line ending in excerpt 5 is characterised by conspicuous conflicting cues. The unusually prolonged word-final [n] suggests completion and discontinuity; the rising intonation on the [n], however, suggests that something is still to come. What is more, the prolongation of [n] renders the rising intonation contour more salient. By the same token, this rising contour has a prominent unsettling emotional effect.

It is almost impossible to perform such a verse line rhythmically. Nevertheless, there is some evidence that irregularities at one rank are more acceptable if at the rank above greater regularity is preserved. An earlier generalisation of mine can be applied here too, with the necessary changes: the preservation of rhythmicality depends, among other things, on whether such disintegrating forces as a midline pause are balanced by such appropriate integrating forces as clear-cut articulation of the line ending, or some perceptual force propelling across the pause. People are more willing to accept irregularity in midline if at the line ending focal stability is achieved. In this reading of this line, this willingness is strained to the utmost. The continually forwarded drive is intended to reach the point of focal stability as fast as possible, to avoid chaos. Still, the time span required to reach that point exceeds the limited capacity of short-term memory. To solve this problem, the reciter effects two vocal manipulations in positions 4 and 5. First, as we have seen, the outjutting intonation on "-TAIL'D" clearly articulates the hemistich boundary (reinforced by the excessive prolongation of the [l]), while generating a perceptual force propelling across it (reinforced by co-articulation). Second, both syllables of "curtail" as well as the boundary between them are grossly overarticulated, so as to generate a syllable with a transitory stress, so to speak, later subordinated to the stress of the next syllable. Traditional metrists speak of "hovering stress", that is, when the stress is equally distributed over two adjacent syllables. In my corpus of performances this is extremely rare (I have encountered so far only one genuine instance). 
The overarticulation of "cur-" and the greater intensity of the first syllable "predict" such an equally distributed stress; but then the stress of this emphatic syllable turns out to be subordinated to an even more strongly stressed one. Thus, the syllable in the fourth (strong) position is momentarily stressed, and the verse line achieves some degree of articulated stability.

The difference between this and a "standard" pronunciation of this word can be observed in Figure 17, which provides information about two tokens of "curtail", one read by a male reader in Merriam-Webster's Collegiate Dictionary, the other excised from Gloucester's soliloquy. I am not going to provide the exact measurements in the two readings; the differences between the two diagrams are conspicuous, and can be seen directly. The second token (excised from the soliloquy) is considerably longer as a whole; each one of its syllables too is considerably longer, and the consonants are disproportionately long and of greater intensity. The voiceless plosives are, in addition, aspirated,14 generating an exceptionally strong emotive quality. In the artistic recitation, again, the first syllable is indicated by two separate, huge blots in the wave plot. The pitch plots of the two readings have roughly the same shape. The pitch of the second one reaches about 10 Hz higher, but covers a pitch range roughly 12 Hz narrower. Nonetheless, the pitch movement is much more readily discerned here, because it is spread over a longer time span. The listening ear can also distinguish a deviation of the long stressed vowel in the artistic reading from the vowel quality of the dictionary reading: it is somehow "fuller", but also "more open", "brighter".

Figure 17
Wave plot and pitch contour of two tokens of "curtail", one read by a male reader in the audio version of Merriam-Webster's Collegiate Dictionary, and one excised from Gloucester's soliloquy.

     Both readings are overarticulated, but with quite different effects (generated by the different cues pointed out above). The dictionary reading is precisionist: it emphasizes the minutiae of articulation in a way that would be unacceptable in connected speech, even in highly educated speech. This is intended as a prototype from which other performances deviate. As I have said, the dictionary reading may serve as an "objective" standard (that is, without distortion by personal feelings or versification requirements), from which the artistic recital deviates. The differences we have discerned between these two readings result from precisely such distortions. Consider, for instance, the overarticulated and aspirated plosives, and the prolonged [l]. I have pointed out their contribution to the solution of a rhythmic problem; at the same time, they indicate that these phonemes are exceptionally charged with such emotion as sarcasm or anger.

Figure 18
Wave plot and spectrogram of the two readings represented in Figure 17

     The New Critics attributed great aesthetic significance to the qualification which the various elements in a context receive from the context. This is, in fact, what I have called above "double-edgedness" or "aspect switching". The existence of an audio dictionary entry allows us to compare overarticulated readings in and outside a context. The long-falling intonation curves on the second syllable of "curtail" are conspicuously similar in the two tokens of the word, and obviously overarticulate the word boundaries. In the two instances, however, they have very different functions. In the dictionary the contour indicates that this is an unconnected, stand-alone word. When the same contour is perceived in the performance of a verse line like excerpt 5, at the middle of a sequence of ten alternating weak and strong positions, it cannot indicate a stand-alone dictionary entry, especially when its last consonant is coarticulated with the ensuing preposition. Rather, it assumes two opposing grouping functions. Having the shape of a terminal contour, it groups the first five positions backwards, away from the second half, dividing the sequence into two halves of equal length but unequal structures. Briefly, it confirms the caesura where the stress maximum in the fifth position rules out more conservative ways of confirming it. At the same time, by assigning greater than usual accent to an upbeat, it begins an end-stressed group, leading forward. The contour interacts with its context in an additional way. In the dictionary entry, the contour curves smoothly down, reaching a point of stability at the bottom. In the dramatic recital, it changes direction at the downmost point, and moves sidewards on the prolonged [l], interfering with the stability achieved. Normally, such an appendage may go unnoticed; here, as I said, it is quite salient, mainly because it is spread over a relatively long time span. 
In fact, the changing details of the smoothly falling long pitch contour are more readily perceptible in the "dramatic" reading. The aspirated plosives, as well as the voice quality in general, foreground its emotive potential. Listening to the two tokens of "curtail" suggests that the sound quality too affects the emotional quality of the second token. Without going into details, a look at the spectrograms 15 (Figure 18) confirms that the sound quality may be quite different.

In respect of the qualification which the various elements in a context receive from the context, it is most illuminating to listen to the reading of excerpt 5 within its context in the soliloquy, and in isolation from its context. In isolation, the reading tends to be perceived as a fairly regular trochaic line, with its sixth position unoccupied (even though there is no measurable pause at that point, only a terminal contour). In context, the stress maximum in the fifth position is perceived as a conspicuous deviation from regularity. But, remember, it is the same reading in both conditions (the reading of excerpt 5 is excised from the reading of excerpt 1).

I am reluctant to attribute some expressive meaning to such a deviation; I am more interested in how competent readers handle such a deviant line. But some critics and actors would, undoubtedly, consider it as an iconic underpinning of the meaning of CURTAIL. This verb means "cutting off", but, according to Merriam-Webster's Collegiate Dictionary, it "adds an implication that in some way deprives of completeness or adequacy". Perhaps this is not at all an "either / or" situation. Perhaps, the solution is that the stress maximum in the fifth position may serve as an iconic underpinning of the meaning of CURTAIL, provided that a rhythmical solution is offered to handle the metric deviation.

In this last section I have broadened the theoretical scope of my inquiry in two directions. I have explored a range of issues related to the violation of metre by stress maximum in a weak position. In the passage under discussion we have encountered two instances which, strictly speaking, qualify as a stress maximum in a weak position. In one instance, Beale eliminated the problem: he shifted the stress to the first syllable of the verb "descant", relying on a rather doubtful alternative pronunciation. He rendered this inversion credible by overemphasising the word in a variety of ways as though this reflected some vigorous sarcastic tone. The other instance is a stress maximum in the fifth position, which is the position least tolerant of violation in the iambic pentameter line. We have followed Beale's heroic efforts to save this line from disintegration, in which he was remarkably successful. I have elsewhere explored at great length how experienced readers perform a stress maximum in the seventh position (the position most tolerant of violation). The vocal manipulations typically involved overstressing and overarticulation of the deviant stress, generating a propelling force toward the next downbeat where the stress pattern and metre coincide. In this excerpt this process is exceptionally strained, postponing the achievement of "focal stability" to the last strong position. In Beale's performance, an exceptionally long terminal contour of intonation fulfils a double function: it generates an exceptionally strong propelling force; and effects a clearly articulated caesura where more conservative solutions are precluded by the stress maximum in the fifth position. By the same token, I have extended to phonetic and prosodic phenomena the New Critics' semantic notion of the qualification which the various elements in a context receive from the context.

     To conclude. We have explored the "triple encodedness" of phonetic cues in metered dramatic speech. The phonetic cues that serve to identify ordinary speech sounds are manipulated such that they provide information about two additional dimensions of the text: its emotive import and rhythmic organisation. When listeners encounter some deviation from ordinary pronunciation, in the appropriate circumstances they tend to decode the distorted speech sounds as parts of two or three different sets. 16 I have pointed out three types of structural relationships between phonetic cues and poetic effects: redundancy, conflicting cues, overdetermination. Skilled actors are usually aware only of the intended effect, not the details of the vocal manipulations, just as you and I are capable of verbal communication, without being aware of the phonetic cues we use. The poet does not indicate what phonetic cues should be used, and in what manner. The actor generates his speech applying his "phonetic competence" to the written text. And once the phonetic cues are introduced, he exploits them for multiple purposes. That's how artistic creation works in general, on other levels of poetry, and in other artistic media as well.


1. This research has been supported by a grant from the Israel Science Foundation. This article is an expanded version of my (2000) article "Phonetic Cues and Dramatic Function -- Artistic Recitation of Metered Speech". Assaph -- Studies in the Theatre: 173-196.

2. The four performers (Simon Russel Beale, Estelle Kohler, Clifford Rose, and Sarah Woodward) are members of the Royal Shakespeare Company; but there is no indication on the record who is reading what (I assume that the speech under consideration is spoken by Beale).

3. Consequently, the present theoretical part of this paper draws heavily upon that book.

4. Fónagy (1971: 160) discusses in great detail the articulatory gesture that produces the glottal stop, and the relation between aggression and the contraction of the glottal sphincter.

5. Though the speech goes on to reveal his plans, Gloucester's relentless self-description comes to an end.

6. The lower window presents the wave plot display which shows a plot of the wave amplitude (in volts) on the vertical axis, as a function of time (in milliseconds) on the horizontal axis. The upper window presents a fundamental frequency plot, which displays time on the horizontal axis and the estimated glottal frequency (F0) in Hz on the vertical axis.

7. We still hear here an elusive separation between the two [s]s. The spectrograms give illuminating information about this, but I can't discuss it here at any length.

8. Exactly the same kind of perturbation can be seen in the two [p]s of "proportion" in figure 16. It is illuminating to compare the perturbation of the [p]s of "proportion" in this reading (second token) to a reading of the same word in the audio version of Merriam-Webster's Collegiate Dictionary (first token) in the following figure:

Listen to two tokens of "proportion", one read by a female reader in the audio version of Merriam-Webster's Collegiate Dictionary, and one excised from Gloucester's soliloquy.

9. Halle and Keyser (1966) claim that the stress rules of English have not greatly changed from Chaucer's time to ours. I have elsewhere discussed at considerable length the theoretical issues involved and the possible rhythmic performances of such lines (Tsur, 1998: 144-158).

10. I should not be surprised if it turned out that this line of Shakespeare's was the evidence for a possible stress on the first syllable. Halle and Keyser, at any rate, are sceptical about such putative stress shifts.

11. Notwithstanding this equal duration, the /d/ in the reading reflected in Figure 13 is more salient than in those reflected in Figure 14, owing to the great contrast between the two syllables, and the separation of loudness chunks within the first syllable, as pointed out above.

12. I have elsewhere discussed at considerable length the nature of stress maxima in weak positions and their possible rhythmical performance (e.g., Tsur, 1997; Tsur, 1998: 31-35; 193-219), as well as the logical issues involved in metricality judgments (e.g., Tsur, 1998: 145; 195).

13. This construct would be unmetrical under Kiparsky's generative theory too. Kiparsky allows only stressed monosyllables in weak positions; he rules unmetrical all instances of polysyllables that have their stressed syllable in a weak position, even if no stress maximum is involved.

14. An aspirated speech sound is pronounced with or accompanied by aspiration. This can be illustrated by putting the back of one's hand before the mouth when uttering the pairs of words "which -- witch", or "pit -- spit". In the first member of each pair a stronger puff of air is felt. The phonetic feature [+/-aspirated] is phonemic in many languages; but when aspiration occurs where not required by phonemic contrast, it may suggest a wide range of emotions, including aggression and sarcasm (scornful or conceited attitudes and violent actions, for instance, are physiologically associated with the emission of abrupt blasts of air; consider the ambiguity of the verb "puff": "to blow in short gusts", and "to speak or act in a scornful, conceited, or exaggerated manner").

15. A spectrogram shows the frequency components of a soundwave, with the relative intensity (indicated by darkness or color) at each frequency (vertical axis) plotted as a function of time (horizontal axis). For the sake of transparency, I am using a gray scale rather than colour spectrogram. Gray plots use all the intensity gradations available between black and white on the computer monitor to delineate the energy levels in the spectrogram. The relative darkness at any point represents the relative energy level at that frequency and time.

16. There are, of course, additional sets as well, such as regional and social dialect.


Barney, Tom (1990) "The Forms of Enjambment". University of Lancaster unpublished MA dissertation.

Chatman, Seymour (1965) A Theory of Meter. The Hague: Mouton.

Chatman, Seymour (1966) "On the 'Intonational Fallacy'", QJS 52: 283-286.

Cooper, C. W. and L.B. Meyer (1960) The Rhythmic Structure of Music. Chicago: Chicago UP.

Fónagy, Iván (1971) "The Functions of Vocal Style", in Seymour Chatman (ed.), Literary Style: A Symposium. London: Oxford UP. 159-174.

Halle, Morris and Samuel Jay Keyser (1966) "Chaucer and the Study of Prosody", College English 28: 187-219.

Knowles, Gerry (1991) "Prosodic Labelling: The Problem of Tone Group Boundaries", in Stig Johansson and Anna-Brita Stenström (eds.), English Computer Corpora. Selected Papers and Research Guide. (Topics in English Linguistics 3) Berlin: Mouton de Gruyter. 149-163.

Preminger, Alex and T. V. F. Brogan (1993) The New Princeton Encyclopedia of Poetry and Poetics. Princeton: Princeton UP.

Tsur, Reuven (1977) A Perception-Oriented Theory of Metre. Tel Aviv: The Porter Institute for Poetics and Semiotics.

Tsur, Reuven (1992) What Makes Sound Patterns Expressive: The Poetic Mode of Speech-Perception. Durham N. C.: Duke UP.

Tsur, Reuven (1997a) "Poetic Rhythm: Performance Patterns and their Acoustic Correlates". Versification: An Electronic Journal Devoted to Literary Prosody.

Tsur, Reuven (1997b) "To Be Or Not To Be -- That is the Rhythm: A Cognitive-Empirical Study of Poetry in the Theatre". Assaph -- Studies in Theatre 13: 95-122.

Tsur, Reuven (1998) Poetic Rhythm: Structure and Performance -- An Empirical Study in Cognitive Poetics. Bern: Peter Lang.

Wellek, René & Austin Warren (1956) Theory of Literature. New York: Harcourt, Brace & Co.

Recorded Readings
Beale, Simon Russel et al. reading William Shakespeare: Great Speeches and Soliloquies. Naxos AudioBooks Na 20 1512.

The Marlowe Society and Professional Players reading Shakespeare: The Sonnets. Argo ZPR 254.

To cite this article, use this bibliographical entry: Reuven Tsur "Phonetic Cues and Dramatic Function Artistic Recitation of Metered Speech". PSYART: A Hyperlink Journal for the Psychological Study of the Arts. Available March 3, 2024 [or whatever date you accessed the article].
Received: July 22, 2002, Published: September 25, 2002. Copyright © 2002 Reuven Tsur