Effects of mode of presentation and mode of speech on listener perceptions of voice, speech and personality following supracricoid laryngectomy

Background: There is a paucity of information on listener perceptions of Individuals with a Laryngectomy (IWL) based on different modes of speech, in particular, speech following Supracricoid Laryngectomy (SCL). The purpose of this study was to determine whether listeners have different perceptions of an IWL based on type of surgery, mode of speech, and mode of presentation. Methods: 35 naïve listeners (29 female, 6 male, mean age 31.1 years) were randomly presented with recordings of a standard reading passage produced by 15 different speakers (5 modes of speech x 3 speakers per mode) in both audio-only and audio-visual presentation mode. Listeners rated each speaker using a visual analog scale (10 cm line) on factors related to personality, comfort of speech, and voice quality. Results: A multivariate Analysis of Variance (MANOVA) showed significant differences in mode of presentation (p < .001), mode of speech (p < .001), and a significant interaction effect between mode of presentation and mode of speech (p < .001). Conclusions: Overall results suggest the following: IWL are perceived more favorably in the audio-visual mode; normal laryngeal speakers are perceived more favorably than all modes of alaryngeal speech; and esophageal speech was perceived as the least favorable across most of the factors.


Introduction
Laryngeal cancer is the most common form of malignancy of the head and neck [1]. The American Cancer Society estimates that in 2020, 12,370 new cases of laryngeal cancer will be diagnosed and that there will be 3,750 laryngeal cancer related deaths in the United States. Traditionally, the Total Laryngectomy (TL) combined with a neck dissection was the treatment of choice for advanced laryngeal cancer [1]. However, more recent research suggests that an alternative to TL that highlights conservation surgery or 'functional laryngeal preservation' [2] has also been shown to be an effective treatment [3][4][5]. Aside from being an effective treatment option, the SCL procedure results in dramatically different visual and acoustic changes relative to TL. This surgical treatment option, however, may not be as familiar to Speech-Language Pathologists (SLPs). The following introduction offers a review of the SCL procedure along with resultant voice, speech, and quality of life information, followed by a review of the impact of visual information on speech and voice disorders.

Supracricoid laryngectomy
The relatively recent surgical development aimed at functional preservation of the larynx, the SCL, may be used as either an initial treatment or as a salvage option for advanced laryngeal cancers [6]. The SCL was initially developed in Europe in the 1950s but was not found in the medical literature in the United States until the 1990s [1]. Although the SCL has been used in other countries, the adoption of the SCL in the United States has been slow [7]. The reasons for the absence of the SCL are debatable, although technical difficulties of the procedure [1] and intense post-operative rehabilitation [5,8] have been suggested. Lai and Weinstein [6] argued that the technique was not embraced by surgeons in the United States until numerous studies with large sample sizes were published reporting the oncologic and functional successes. Schindler and colleagues [5] suggested that certain countries have not adopted the SCL due to the complexity of post-surgical management and increased variability in results. Oncologic outcomes following SCL have consistently shown the procedure to have excellent local control with low mortality rates [5,9]. The oncologic results are especially of note given that the SCL is completed without the need for a permanent tracheostoma, which is a required sequela following a TL. The presence of a stoma and the respiratory complications associated with the stoma have been identified as significant concerns for patients following a TL [10,11].
The body of research on SCL is growing, specifically with regard to oncologic and swallowing outcomes, but currently, there is minimal information on the impact of a SCL on a patient's overall Quality of Life (QoL) and psychosocial status.
Of the few existing studies, the consensus is that SCL results in improved psychosocial measures relative to TL [12,13]. Results of voice and speech related QoL measures following SCL are also limited. Makeieff and colleagues [14,15] used the Voice-Handicap Index (VHI) to assess the impact of altered voice function following SCL. Results of both studies suggested that the resultant speech following SCL has a substantial impact on social and professional activities, especially for those patients who rely on their voice. Dworkin, et al. [16] also used the VHI to compare voice handicap in SCL and TL patients. Results showed no significant difference between the two types of surgery and that both SCL and TL patients experience moderate difficulties in communication due to their voice function. In contrast, Saito, et al. [17] used the Voice-Related QoL (VRQoL) but interpreted their results to suggest that patients experience 'little inconvenience' in terms of speech after surgery. Weinstein, et al. [18] [12] showed increased communicative competence with patients in Greece who had conservation surgery for laryngeal cancer.
In summary, the limited research in this area presents with mixed results.
Research has also addressed the resultant voice signal following SCL, and a brief summary of the results is presented here.
Using a standardized perceptual rating system (GRBAS), most research describes the SCL voice as breathy and rough [4,17,19], or hoarse-strained [16]. Results of acoustic analyses typically show the SCL voice to be substantially different than normal laryngeal voice [17]. Although most studies identify the speech following SCL to be functional and intelligible [20,21], patients have reported their speech to be severely dysphonic [19].
To summarize, research exists on a variety of areas concerning the SCL procedure, from oncologic outcomes to voice related to QoL, among others. However, one important aspect that is missing from the literature is how listeners will perceive the resultant voice signal or even the person.
Comparisons to other treatments for laryngeal cancer or other alaryngeal speech modes may also be an important factor to consider in those instances where persons with laryngeal cancer are presented with treatment options. To date, there is no information on listeners' impressions of SCL speakers as compared to normal laryngeal speakers or to speakers with a TL who use a form of alaryngeal speech (tracheoesophageal, esophageal, electrolarynx). There are studies from the TL literature that suggest that social interaction has an effect on overall QoL. For example, Deshmane, et al. [22] reported that 70% of TL patients suffer from decreased social acceptance and 82% suffer from reduced social activity. Nalbadian and colleagues [23] reported that 57% of the TL patients had communication problems with unfamiliar people.
Such results may be related to the physical appearance of the TL speaker (e.g., presence of a stoma, alteration of the vocal tract) rather than solely related to a different voice.
In an experiment that involved tracking a communication partner's eye-gaze, Evitts and Gallop [12] found different patterns of eye-gaze dependent on the type of alaryngeal speech used. For instance, during conversation with a speaker who used proficient esophageal or proficient electrolaryngeal speech, partners would direct their gaze predominantly at the lower face of the speaker [12]. When conversing with a normal laryngeal speaker or a speaker who used proficient tracheoesophageal speech, 61% of the partners' gaze would be focused on the lower face and 38% of the gaze would be divided among the background, lower face, and eyes [12]. The authors attributed the difference in part to the inherent visual nature of esophageal and electrolaryngeal speech. That is, the extraneous facial movements associated with esophageal speech production (i.e., injection of air) and the addition of a mechanical device served to alter the eye gaze of the conversational partner and created a non-typical social interaction. and that these impressions are impacted by the inclusion of visual information. Due to the inherent differences between normal laryngeal and alaryngeal speech [26,27], it may not be appropriate to extrapolate information from laryngeal speakers to alaryngeal speakers. In addition, IWL can present with visual information that differs significantly from that of other disordered populations, including decreased vocal tract volume, presence of a stoma, or the use of a prosthetic or mechanical speaking device.

Visual information and disordered speech
Historically, studies in the field of alaryngeal speech and perception have recognized the importance of visual information in perceptual studies. One of the earliest references to this was in 1955 when Hyman called for the use of 'motion picture films with sound' to study the visual aspects of esophageal and electrolaryngeal speech. Numerous other researchers followed this suggestion by including, for example, real-time observations from gas station attendants speaking to a person who used electrolaryngeal speech [28]. Overall results have consistently shown that listeners perceived the esophageal speakers more negatively than the normal speakers across all measures. Although these studies provide important insight into how listeners perceive IWL in terms of personality, voice, speech, and acceptability, among others, there are currently no studies that compare TL to new surgical treatment methods for laryngeal cancer.
The primary purpose of this study was to provide insight into differences in listener impressions based on mode of presentation (audio-only vs. audio-visual) and mode of speech (normal laryngeal, tracheoesophageal, esophageal, electrolaryngeal, SCL). Specific research questions are as follows:
1. Is there a difference in listener impressions based on mode of presentation (audio-only, audiovisual)?
2. Is there a difference in listener impressions based on mode of speech (supracricoid, tracheoesophageal, esophageal, electrolaryngeal, normal laryngeal)?
Clinically, the information yielded from this study may provide important insight for people diagnosed with laryngeal cancer on how they may be perceived with their new form of voice. In addition, results of the study may have an impact on the type of surgery the surgeon recommends to the person with laryngeal cancer. For instance, if results show that supracricoid laryngectomy results in improved listener impressions and the person is a candidate for organ preservation surgery (see [29] for requirements of the surgeon and Turfano, 2002 [30] for key principles of organ preservation surgery), then these results should be taken into account.

Methods
The study was approved by the Institutional Review Boards of both Towson University and Johns Hopkins School of Medicine. All participants provided written consent agreeing to the use of their images and voices for the purposes of the research study.

Speaker selection
Five modes of speech (normal laryngeal, tracheoesophageal, esophageal, electrolaryngeal, SCL) were included in the study.
The methods and criteria for speaker selection were similar to recent studies [31,32]. Briefly, three speakers from each mode of speech were selected from a collection of recordings.
Inclusion criteria for all speakers included: standard Midwest dialect, English as their primary language, fluent and effortless speech production. Exclusion criteria included presence of facial hair, significant facial asymmetry, facial scars other than those associated with the laryngectomy, and a history of stroke or other neurological disorder that affects speech or cognition.
The speakers were then informally assessed by two licensed and certified SLPs with at least 10 years of clinical voice experience. The SLPs assessed whether or not each speaker was 'typical' for that mode of speech and informally rated their speech intelligibility using a Likert-style rating scale following the presentation of a reading passage produced by each speaker (poor-average-above average). From those speakers who had intelligibility ratings of 'average' or 'above average' and were rated as 'typical', a final group of speakers that were used for the experiment was collected. Efforts were then taken to age-

Speaker recording
Speakers were recorded in a quiet room while seated with a bare wall behind them. Following consent and a description of the study, a headset microphone (AKG, C 420 III) was placed on the head of each subject and the microphone itself was placed two inches from the corner of the mouth. The microphone was directly connected to a video recorder (Sony, DCR-HC30) and recorded on digital videotapes (Panasonic, DVM 60). Speakers were provided a copy of the grandfather passage [33] to review prior to being recorded. A tripod was used to elevate the video recorder on a table top which was placed in front of the speakers. Efforts were made to keep the speaker and video recorder in the same horizontal plane. The grandfather passage was displayed on an 8 ½" x 11" sheet of white paper with a 1" hole cut out of the center of the page. This was done so that the speaker would appear to maintain eye contact with the video recorder while reading. Speakers were asked to sit relatively still other than movements needed for voice production (e.g., digital occlusion). Each speaker was recorded with his entire head and neck in the frame and a small portion of the bare background.
Individual audiovisual files were created for each speaker using a video editing software program (Final Cut Pro X, Apple Inc.). Any noise or visual movements (e.g., stomal noise at the beginning of a sentence) associated with speech production for any modes of speech were included in the final file. The

Rating procedure
Participants were individually seated in a quiet room in front of a computer and a 22" LCD computer monitor (Acer AL2216W).
Seating was arranged so that there would be approximately two feet between the LCD monitor and the participant. Audio was provided through a pair of noise-cancelling headphones (Sony MDR-NC60) connected to the computer. Each participant was instructed that they would be presented with a series of people reading a standard reading passage and that some files would be audio-only and some would be both audio and visual. Following the passage, the participants were instructed to rate that speaker using the given rating sheet. Participants were then provided the rating sheet and a brief explanation of how to use a visual analog scale. Once each participant stated they understood the procedure and the scale, each participant was then presented all 30 files (15 speakers x 2 modes of presentation) in a randomized order.
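The randomized presentation described above can be illustrated with a short sketch. This is not the authors' procedure: the paper does not state how the order was generated, so the seeded shuffle, the function name build_presentation_order, and the speaker numbering 1-3 are assumptions made for illustration only.

```python
import random

# Mode names are taken from the paper; numbering speakers 1-3 within each
# mode is an assumed labeling, not something the paper specifies.
MODES_OF_SPEECH = [
    "normal laryngeal",
    "tracheoesophageal",
    "esophageal",
    "electrolaryngeal",
    "SCL",
]
MODES_OF_PRESENTATION = ["audio-only", "audio-visual"]
SPEAKERS_PER_MODE = 3


def build_presentation_order(seed=None):
    """Return all 30 stimulus files (15 speakers x 2 presentation modes) in random order."""
    rng = random.Random(seed)  # seeded generator so a listener's order can be reproduced
    files = [
        (speech, speaker, presentation)
        for speech in MODES_OF_SPEECH
        for speaker in range(1, SPEAKERS_PER_MODE + 1)
        for presentation in MODES_OF_PRESENTATION
    ]
    rng.shuffle(files)  # in-place shuffle of the full stimulus list
    return files


if __name__ == "__main__":
    # Hypothetical usage: one independent order per listener, seeded by listener number.
    for position, stimulus in enumerate(build_presentation_order(seed=1), start=1):
        print(position, *stimulus)
```

Seeding by listener number is one possible design choice; it gives every listener a different order while keeping each order reproducible for later checking.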
The visual analog rating scale was based on earlier versions [31,34] and uses positive and negative anchors at each end of a 10 cm line. Users are requested to make a mark on the line nearest the term that they feel best describes the person. Using digital calipers, the distance is then measured in millimeters   were selected [36]. Table 1 shows the factors and percent of variances for both audio-only and audiovisual. Once the factors are identified, the examiner then needs to identify the theme within those factor ratings. Ratings within each factor for both modes of presentation were consistent with the listener rating sheet. Thus, the ratings within Factor 1 were related to the speakers' personality (factor loadings .904-.728), the ratings within Factor 2 were related to comfort of speech (factor loadings .648-.849), and the ratings within Factor 3 were related to voice quality (factor loadings .831-.578). Factor loadings greater than .400 are considered to be strong [36].
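As a minimal sketch of how rating items can be grouped once loadings are available, the snippet below assigns each item to the factor on which it loads most heavily and applies the .400 cutoff cited above. The item names and loading values are invented for illustration and are not the study's data; the function name assign_items_to_factors is likewise hypothetical.

```python
STRONG_LOADING = 0.400  # cutoff cited in the text for a strong loading


def assign_items_to_factors(loadings, threshold=STRONG_LOADING):
    """Map each rating item to its dominant factor (1-based), or None if no loading is strong."""
    assignment = {}
    for item, row in loadings.items():
        # Index of the factor with the largest absolute loading for this item.
        best = max(range(len(row)), key=lambda i: abs(row[i]))
        assignment[item] = best + 1 if abs(row[best]) > threshold else None
    return assignment


if __name__ == "__main__":
    # Hypothetical loadings on three factors; NOT values from the study.
    example_loadings = {
        "item_a": [0.85, 0.10, 0.12],
        "item_b": [0.20, 0.70, 0.15],
        "item_c": [0.05, 0.22, 0.61],
        "item_d": [0.30, 0.25, 0.35],
    }
    print(assign_items_to_factors(example_loadings))
```

Naming the theme of each factor (here, personality, comfort of speech, and voice quality) remains a judgment made by the examiner after the items have been grouped.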

Descriptive analysis
The three factors accounted for 78.7% of the variance within the audio-only mode and 79.16% of the variance within the audio-visual mode.

Mode of presentation
In fact, the 14% difference in personality rating is nearly identical to the difference reported in an earlier study using similar methods (Evitts, et al. 2009). That study, however, only included one speaker from each mode of speech while this study increased that amount to three speakers per mode and added an additional mode of alaryngeal speech (SCL). The mean listener ratings in the audiovisual mode for personality were also relatively consistent between the two studies: 34 mm previously compared to 41 mm in the current study, suggesting increased validity for the current study. Across all three factors, listeners had similar perceptions of the SCL and tracheoesophageal speakers. As discussed earlier, the general hierarchy in the literature is that normal laryngeal speech is more favorable than tracheoesophageal speech, which is more favorable than esophageal speech, which is more favorable than electrolaryngeal speech. Including a relatively newer form of conservation surgery, specifically SCL, suggests that SCL speech approximates tracheoesophageal speech in that hierarchy. However, SCL speech may actually be closer to normal laryngeal speech in that it is produced with pulmonary airflow (as is tracheoesophageal) but also uses laryngeal tissue for the vibratory source, whereas tracheoesophageal speech uses upper esophageal sphincter and cricopharyngeal muscle fibers for the newly created pharyngoesophageal segment. This distinction may have importance when it comes to the brain processing the signal. Specifically, the brain has been shown to discriminate between human vocalizations and non-human vocalizations [43] and it appears voice or speech that most closely approximates human vocal fold vibration requires less
Rate of speech was also addressed across speakers, and a one-way ANOVA of words per minute for the grandfather passage by mode showed that esophageal speech was significantly slower than normal and SCL speech. This is consistent with previous research [45] and may provide additional evidence that the speakers used in this study fall within the category of 'typical'.
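For readers unfamiliar with the rate-of-speech comparison, the following sketch computes the F statistic of a one-way ANOVA from first principles. The words-per-minute values are invented placeholders (three speakers per mode, as in the study design) and are not the study's measurements; the paper does not report which software was used, and the pairwise differences it describes would additionally require a post hoc test that is not shown here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over a list of groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


if __name__ == "__main__":
    # Hypothetical words-per-minute values, three speakers per mode; NOT study data.
    wpm_by_mode = {
        "normal laryngeal": [178.0, 185.0, 190.0],
        "SCL": [170.0, 176.0, 182.0],
        "tracheoesophageal": [160.0, 168.0, 172.0],
        "electrolaryngeal": [150.0, 158.0, 163.0],
        "esophageal": [110.0, 118.0, 125.0],
    }
    f_value, df_b, df_w = one_way_anova_f(list(wpm_by_mode.values()))
    print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

With five modes and three speakers each, the degrees of freedom are 4 and 10, which is worth keeping in mind when interpreting any significance test on so few speakers.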
One particular mode of speech that may present with increased inherent differences is SCL. As discussed earlier, this surgery yields either one of two types, CHEP or CHP. Due to the decreased amount of tissue resected, CHEP has been shown to be more favorable [8]. In this study, two of the subjects were CHEP and one was CHP. Comparisons across speakers by type of SCL showed significant differences in all three factors in both audio-only and audiovisual mode, although the trend was that the speakers with a CHEP were perceived as more favorable than the speaker with a CHP. However, there were also significant differences present between the two CHEP speakers. These findings are consistent with previous research indicating that CHEP may be more favorable than CHP and may add to the validity of the current study. In addition, the differences present between the two CHEP speakers support the notion of increased heterogeneity in most disordered speaker populations.

Interaction effect of mode of presentation and mode of speech
Aside from there being significant effects of mode of presentation and mode of speech, there was also a significant interaction effect between the two variables. Subsequent analyses showed that this effect was only shown in the audiovisual condition and not in the audio-only condition. Overall results in the audiovisual condition for all three factors showed that normal laryngeal speech was more favorable than all modes of alaryngeal speech. Additionally, the voice quality of esophageal speech was found to be significantly less favorable than all other modes but only in the audiovisual condition. This finding highlights the importance of visual information when discussing listener perceptions. That is, there are specific visual components inherent to each mode of speech that influence how listeners perceive a speaker. The qualitative comments (Table 3) provide insight on this interaction. Comments from question 1 (i.e., words used to describe the speaker) were consistent with expectations. For example, numerous comments identified electrolaryngeal speech as mechanical sounding and esophageal speech as choppy, which is consistent with the reduced airflow and subsequent reduced rate of speech observed with esophageal speech.
Responses to question 2 (i.e., anything distracting) provide much more insight into what characteristics listeners found salient. Some of those inherent traits for each mode are visual in nature and the current results suggest that these inherent visual traits may impact listener perceptions. For instance, 30 listeners commented about the tracheoesophageal speakers touching their throat¹, 27 listeners commented about the facial movements for the esophageal speakers, and 32 listeners commented about being distracted by the device itself or the person touching their throat or hand movements. There were no responses in the audiovisual mode for the normal laryngeal or the SCL speakers. These results in combination with other results of the current study suggest that SCL speech may not only be closer to normal laryngeal speech in terms of speech production, but also with the visual component of speech production. Those inherent visual traits associated with other speech modes may not only impact listener perceptions of personality or voice quality, but may also impact speech perception overall. For example, head movements of normal speakers have been associated with the speakers' fundamental frequency and amplitude of the speech signal, whereas altered head movements were shown to result in decreased speech perception (Munhall, et al. 2004). Additionally, recent neuroimaging data showed that when listeners are presented with degraded auditory stimuli, listeners increased their attention to the visual information [46].
¹ It should be noted that hands-free tracheoesophageal prosthetics are available which would not require the speaker to occlude their stoma for voice production. However, for a variety of reasons only a small percentage of tracheoesophageal speakers use a hands-free device.
Although SCL speech may be considered a degraded auditory stimulus compared to normal laryngeal speech [47], the visual information most closely approximates that of normal laryngeal speech and thus, from a perceptual standpoint, the listener treats it in a similar fashion. Moreover, it may be that those inherent visual characteristics of speech production, from the esophageal or electrolaryngeal speakers in particular, are directly related to the current findings. That is, the more distracting the visual information, the more it impacts the listener. This incongruence between visual and auditory signal has been implicated in the reduced speech intelligibility observed in IWL, thus creating a McGurk effect of sorts [44].
Clearly, more research is needed to delineate the role of visual information in speech processing for speakers with an SCL.
Although significant results are reported here and additional insight into listener perceptions is provided, the low eta squared values for each of the variables suggest other factors are influencing listener perceptions. In a previous study on speech intelligibility, Evitts, et al. [44] reported that approximately 80% of the variability associated with speech intelligibility was accounted for by mode of speech. However, when the interaction between mode of speech and mode of presentation was considered, values of 6% to 23% were reported [44]. Individual speaker differences may have played a role in this study as it included three speakers from within each mode. Although this increases the ability to generalize, it may alternatively decrease the variability accounted for. This was originally argued by Kalb and Carpenter [48], who stated that individual speaker characteristics played a larger role in speech intelligibility than mode of speech. That same influence of individual speaker characteristics may be true for listener perceptions as well. More research is needed to shed light on this issue.
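As background for the effect-size discussion above, eta squared expresses the proportion of total variability attributed to an effect. The standard definitions are given below; the text does not state whether classical or partial eta squared was reported, so both are shown and the choice between them is left open.

```latex
% Classical eta squared: share of total variability attributed to the effect.
\eta^2 = \frac{SS_{\text{effect}}}{SS_{\text{total}}}

% Partial eta squared: share of variability after removing other effects.
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
```

Under the classical definition, a value of 0.80 corresponds to the "approximately 80% of the variability" wording used for the earlier intelligibility study, while low values indicate that most of the variability lies outside the modeled effect.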

Limitations
There are several limitations to this pilot study that make it difficult to generalize to other speakers with a laryngectomy.
First, the disordered speakers were selected based on experienced SLPs' rating as 'typical' and having average or above average intelligibility. As clinicians and health care professionals working with this population know so well, there is a great deal of heterogeneity in this population with regard to voice function following any form of treatment for laryngeal cancer. Although three speakers from each mode were included in this study and were considered by experienced SLPs to be 'typical' for their mode, additional research is warranted on those that may not represent 'typical'. Moreover, additional research is warranted on those with decreased intelligibility in an attempt to better understand the relationship between intelligibility and listener impressions. Ideally, future research would consist of a large sample size with numerous speakers in each mode of speech representing varying degrees of intelligibility and voice quality. Second, the sentence stimuli that were used were initially intended to balance phonemic information but not visual information. Future visual processing research should control for this and other possible effects, including semantic and syntactic predictability [49][50][51][52][53][54][55]. Third, the listeners used in the current study may not represent the peer group of the population. Listeners in this study were predominantly young females and future research should seek to include persons that would better represent the peer groups of the patient population. This would include the use of spouses as potential listeners. Finally, only males with a laryngectomy were included in this study. Since more women are being diagnosed and treated for laryngeal cancer, similar studies should include the effects on females.

Conclusion
The purpose of this experiment was to investigate the effect of mode of speech and mode of presentation on listeners' perceptions of speech following surgical treatment for laryngeal cancer. In particular, this study sought to include a relatively new form of conservation surgery, supracricoid laryngectomy, as this form of treatment has been associated with improved QoL compared to TL [47]. Although there is research on a variety of outcomes following SCL, there is a lack of research on how listeners perceive this mode. Mean listener perceptions across ratings suggest that all modes of speech were either perceived as favorable or neutral for items related to personality (30-59 mm on a 100 mm visual analog scale) or comfort of speech (14-48 mm). Mean listener ratings for voice quality showed that normal was perceived as favorable but all modes of alaryngeal speech were perceived as less than favorable (61-76 mm). Overall results of the current study suggest that normal laryngeal speech is perceived as more favorable than all modes of alaryngeal speech across ratings of personality, comfort of speech, and voice quality, and that SCL speech was found to be at least equal to tracheoesophageal speech in all three areas as well. Additionally, esophageal speech was consistently perceived as the least favorable across all ratings, and listener qualitative comments suggest that the extraneous facial movements of the esophageal speakers may be associated with this finding. Furthermore, the personality of SCL speakers was perceived as the most favorable among all the modes of alaryngeal speech, and the voice quality and comfort of speech of the SCL speakers were found to be more favorable than those of the esophageal or electrolaryngeal speakers. Supracricoid laryngectomy has been associated with improved QoL, which may be primarily due to the lack of a permanent stoma as is the case following TL. However, this improved QoL may also be a function of more favorable listener perceptions.
When treatment options are available for laryngeal cancer, this study supports the increased utilization of the SCL surgery.