Predicting mutual intelligibility in Chinese dialects

Chaoju Tang

2007


ICPhS XVI, ID 1266, Saarbrücken, 6-10 August 2007

PREDICTING MUTUAL INTELLIGIBILITY IN CHINESE DIALECTS

Tang Chaoju* & Vincent J. van Heuven (Note 1)
Phonetics Laboratory, Leiden University (* also at Chongqing Jiaotong University)
{C.Tang, V.J.J.P.van.Heuven}@Let.LeidenUniv.NL

ABSTRACT

We determined mutual intelligibility and linguistic similarity by presenting recordings of the same fable spoken in 15 Chinese dialects to naive listeners of the same set of dialects and asking them to rate the dialects along both subjective dimensions. We then regressed the ratings against objective structural measures (lexical similarity, phonological correspondence) for the same set of dialects. Our results show that subjective similarity is better predicted than subjective mutual intelligibility and that the relationship between objective and subjective measures is logarithmic. Best predicted was log-transformed subjective similarity with R² = .64.

Keywords: Dialectology, dialectometry, linguistic distance, (mutual) intelligibility, perceptual rating.

1. INTRODUCTION

1.1. Why study mutual intelligibility?

Distance between languages is used as a criterion when arguing about genealogical relationships between languages. The more two languages resemble each other, the more likely it is that they derive from the same parent language, i.e., belong to the same language family. However, it is difficult to quantify the distance between languages one-dimensionally since languages differ along many structural dimensions (e.g. phonetics, phonology, morphology, syntax). It is unclear how the various dimensions should be weighed against each other. Therefore, we select a single criterion: mutual intelligibility. Mutual intelligibility is an overall criterion that may tell us whether two languages are similar or close.

Useful work on structural measures of difference between related languages has been done, for instance, at Stanford University (for Gaelic Irish dialects, cf.
[1]) and at the University of Groningen (for Dutch [2] and Norwegian dialects [3]), using the Levenshtein distance. This is a distance metric that computes the mean number of string operations needed to convert a word in one language to its counterpart in the other language. This measure was then used to build a tree structure (through hierarchical cluster analysis) which matched the language family tree as constructed by linguists.

1.2. How to determine (mutual) intelligibility?

Although methods for determining intelligibility are well-established, for instance in the fields of speech technology and audiology, the practical problems are prohibitive when mutual intelligibility has to be established for, say, all pairs of varieties in a set of 15 dialects (yielding 225 pairs). Rather than measuring intelligibility by functional tests, opinion testing has been advanced as a shortcut. That is, indices of mutual intelligibility between languages are derived from listeners’ judgment scores. Once mutual intelligibility scores are available, the relative predictive power of structural dimensions can be found through regression analysis.

Such work has recently been done for 15 Norwegian dialects by Gooskens and Heeringa [3] (henceforth G&H). Their results show that subjectively judged distance between sample dialects and the listener’s own dialect correlated substantially with the objective Levenshtein distance (r² = 0.449). The Levenshtein distance increases rapidly when the word pairs in two languages are non-cognates. For non-cognates any sound correspondence is accidental, so that the Levenshtein distance will be close to 100. It might therefore be more informative to break the one-dimensional Levenshtein distance down into two separate parameters, i.e.
(i) the percentage of cognate words shared between the vocabularies of two language varieties and (ii) the phonological distance computed for the cognate part of the vocabulary only. This is what we did in our study. We included both predictors of mutual intelligibility in order to estimate the strengths of the two predictors as well as their intercorrelation.

The work done by G&H represents a complication relative to earlier work in that their Norwegian dialects are tone languages whilst the Gaelic Irish and Dutch dialects are not. Since it is unclear how tonal differences should be weighed in the distance measure, G&H collected distance judgments for the same reading passages resynthesized with and without pitch variations. The difference in judged distance between the pairs of versions (with and without pitch) would then be an estimate of the weight of the tonal information. Norwegian, however, is a language with a binary tone contrast. We want to test G&H’s method on full-fledged tone languages, with much richer tone inventories varying from four (e.g. Beijing/Mandarin) to as many as ten (e.g. Cantonese/Yue).

Finally, it should be realized that perceived distance between some dialect and one’s own is not necessarily the same as an intelligibility judgment. The third aim of our paper is to test to what extent judged distance and judged intelligibility actually measure the same property.

1.3. Earlier work

Chinese dialect classification is still controversial. Nevertheless, there is broad consensus on the primary relationships within the Sinitic languages: there is a first split between the Mandarin group (comprising the Northern, Eastern and South-western families) and the Southern group (comprising the Wu, Gan, Xiang, Min, Hakka and Yue families). Cheng [4] has computed structural similarity measures for all pairs of these Chinese dialects.
We have used two of his measures (see § 2.2) as predictors of mutual intelligibility between pairs of Chinese dialects in the present study.

2. METHODS

2.1. Collecting judgments

We targeted 15 Chinese dialects (a subset from [4]). From the Mandarin group: Beijing, Chengdu, Jinan, Xi’an, Taiyuan and Hankou. From the Southern group: Suzhou and Wenzhou (Wu family), Nanchang (Gan family), Meixian (Hakka family), Xiamen, Fuzhou and Chaozhou (Min family), Changsha (Xiang family), and Guangzhou/Cantonese (Yue family).

We used existing recordings of the fable “The North Wind and the Sun”. Since each version of the fable had been read by a different speaker (11 males and 4 females), we processed the recordings (using [5]) such that all speakers sounded like males, and all had roughly the same articulation rate, the same speech-pause ratio and the same mean pitch (Note 2). Also, each reading of the fable was produced in two melodic versions, i.e., one with the original pitch intervals kept intact, and one with all pitch movements replaced by a constant pitch (monotone), which was the same as the mean pitch of the fragment with melody (and the same as that of all other fragments). The 2 × 15 readings of the fable were recorded onto audio CDs in four different random orders. The 15 monotonized versions preceded the 15 versions with melody.

For each of the 15 dialects 24 native listeners were found in the middle to older generation (ages between 40 and 60), evenly divided between males and females. All 360 listeners were born and bred in their respective dialect areas. Listeners were mono-dialectal, i.e. they had no experience with any other Chinese dialects (though all had some familiarity with Standard Mandarin). Each CD was played through loudspeakers to six (three female, three male) listeners per dialect.
Listeners rated the materials twice: the first time they estimated on a scale from 0 to 10 how well they believed a monolingual listener of their own dialect, confronted with a speaker of the dialect in the recording for the first time in their life, would understand the other speaker. Here ‘0’ stood for ‘S/he will not understand a word of the other speaker’ whilst ‘10’ represented ‘S/he will understand the other speaker perfectly’. In the second judgment the listener rated the similarity between her/his own dialect and the dialect of the speaker in the recording, where ‘0’ meant ‘No similarity at all’ against ‘10’ meaning ‘This dialect is exactly the same as my own’. In all, 21,600 judgments were collected and statistically analyzed.

2.2. Structural measures

We used two objective measures of structural distance between pairs of Chinese dialects. Both measures were generated by [4]. The first measure, which we call the Lexical Similarity Index (LSI), can be conceived of as the percentage of cognates shared between the vocabularies of two language varieties. Obviously, the higher the number (and token frequencies) of cognate words a listener encounters in a non-native dialect, the easier it will be for her/him to understand the message. We simply copied the values published in appendix 3 of [4] (Note 3).

Cheng’s second measure basically captures the regularity of the sound correspondences in the sets of cognate words shared between two dialects. Cognates between two dialects will be easier to recognize if they contain the same sounds in the same positions in the words, or if the sounds can be converted from one dialect to the other by a simple and general rule. In [4] the counts were converted to a coefficient ranging from 0 (no phonological correspondence at all) to 1 (perfect sound correspondence). We call this measure the Phonological Correspondence Index (PCI). We copied the PCI values in appendix 5 of [4].

3. RESULTS

3.1. Objective and subjective measures

We generated 15 × 15 matrices for each of the six measures for the 15 target dialects: (a) objective lexical similarity (LSI, only 13 dialects), (b) objective phonological correspondence (PCI), (c-d) subjective intelligibility judgments for stimulus versions with and without melody, and (e-f) subjective similarity judgments for versions with and without melody. From the matrices (not presented due to lack of space) hierarchical cluster trees were derived using the method of average linking. The trees (not presented) show a rather poor congruence. Even the primary split between Mandarin and Southern dialects is not correctly reproduced in the trees. Typically, the arguably Southern dialects Changsha and/or Nanchang are incorrectly grouped with the Mandarin dialects. Generally, the degree of congruence is better between the two subjective ratings than between the objective measures. We will now first examine the relationship between the two subjective measures, and then see how well these subjective ratings can be predicted by some combination of objective similarity measures.

3.2. Predicting intelligibility from similarity

We used the proximity between the members of every single pair (N = 105) of dialects out of the set of 15 as our measure of closeness between the members. Proximity matrices are symmetrical; the redundant part of the matrices was deleted before we correlated the proximity values obtained from the intelligibility ratings and similarity ratings. The result shows that judged intelligibility correlates with judged similarity (N = 105 pairs of values) at r = .949 (p < .001). This means that the two sets of ratings can be predicted from each other with a very high degree of accuracy.
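The pair-wise correlation just described can be sketched in a few lines of Python. This is only an illustration of the procedure, not the analysis actually run for the paper: the function names and the toy 4 × 4 matrices are assumptions, and all values are invented. For a symmetric matrix over n dialects only the n(n − 1)/2 cells above the diagonal are kept, which gives the 105 pairs mentioned above for n = 15.

```python
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Return the cells above the main diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented, symmetric toy proximity matrices for four hypothetical dialects
# (NOT data from the study).
intelligibility = [
    [10, 7, 3, 2],
    [7, 10, 4, 3],
    [3, 4, 10, 6],
    [2, 3, 6, 10],
]
similarity = [
    [10, 6, 2, 1],
    [6, 10, 4, 2],
    [2, 4, 10, 5],
    [1, 2, 5, 10],
]

# Drop the diagonal and the redundant lower half, then correlate the rest.
pairs_a = upper_triangle(intelligibility)
pairs_b = upper_triangle(similarity)
print(len(pairs_a), round(pearson_r(pairs_a, pairs_b), 3))
```

Keeping only one half of each matrix avoids counting every dialect pair twice, which would inflate N without adding information.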
Moreover, visual inspection of the corresponding scatterplot (not presented) reveals no specific outliers, so that the conclusion follows that subjectively estimated similarity between pairs of languages is an exceptionally good predictor of, or even a near-perfect substitute for, estimated intelligibility.

3.3. From objective to subjective measures

In Table 1 we have specified how well judged intelligibility and judged similarity can be predicted from the objectively determined LSI and PCI measures. We also computed correlation coefficients between objective and log-transformed subjective measures; these generally yield higher r-values. A separate series of computations was done on the scores after excluding Beijing (which is almost identical to Standard Mandarin) as one of the dialects. Moreover, all the computations were done once with the judgments based on the sound stimuli with full melodic information and a second time with judgments based on the monotonized versions. Finally, we list the results of selected multiple regression analyses (with LSI and PCI entered in the analysis together for only the optimal combinations of conditions) in order to determine the cumulative effect of the predictors.

4. CONCLUSIONS

A number of conclusions can be drawn from Table 1. First, the two objective measures of structural similarity, PCI and LSI, are always significantly correlated with all of the subjective ratings. Moreover, the two predictors are only moderately intercorrelated so that there is potential room for improvement of the prediction through multiple regression. The success of multiple regression is demonstrated most clearly in the prediction of log-transformed similarity for versions with melody and Beijing dialect excluded: here the accuracy of the prediction (coefficient of determination, i.e. r² or R²) from both objective measures together (64%) is 7 percentage points better than that from the best single predictor (57%).
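The percentages quoted here are squared correlation coefficients from Table 1, and the arithmetic can be checked with a short Python sketch. The sketch also includes the standard textbook formula for the squared multiple correlation with two predictors, R² = (r1² + r2² − 2·r1·r2·r12) / (1 − r12²). The function name and the value used for the predictor intercorrelation r12 are assumptions for illustration only, since the intercorrelation for the subset without Beijing is not reported.

```python
def multiple_r_squared(r_y1, r_y2, r_12):
    """Squared multiple correlation of y on two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion y.
    r_12: correlation between the two predictors (must not be +1 or -1).
    """
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Values from Table 1, condition "log judged similarity, melody, no Beijing".
best_single_r = 0.753  # LSI alone
multiple_r = 0.798     # LSI and PCI together

single_pct = round(100 * best_single_r ** 2)  # variance explained, in percent
both_pct = round(100 * multiple_r ** 2)
gain = both_pct - single_pct
print(single_pct, both_pct, gain)  # prints: 57 64 7

# Illustration of the formula; r_12 = 0.7 is an ASSUMED intercorrelation,
# not a value reported in the paper.
print(multiple_r_squared(0.753, 0.696, 0.7))
```

The formula makes the role of the intercorrelation explicit: the lower the correlation between the two predictors, the more a second predictor can add to the first.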
The combined prediction is even 19 percentage points better than the single r² in G&H [3] (see § 1.2). The latter result shows that better prediction of judged similarity and intelligibility can be obtained when a one-dimensional objective phonological distance measure is broken down into two separate parameters, one covering the proportion of cognates shared between two vocabularies and the other targeting the phonological similarity in the shared cognates only, as was assumed all along by [4].

Second, similarity judgments can be predicted more successfully (higher r-values) than the corresponding mutual intelligibility judgments.

Third, the prediction of log-transformed judgments is better than that of the corresponding linear measures. This effect has been found in many other studies on the relationship between objective counts on language use and the subjective impression of such phenomena, e.g. in the area of word token frequency.

Fourth, the ratings based on versions with full melodic information can be predicted substantially better from the objective measures than those based on monotonized versions. This indicates that melodic information should carry a rather heavy weight in the ultimate prediction of ratings in the Chinese language situation.

Fifth, leaving out the Beijing dialect yields clearly better predictions of judged similarity and of mutual intelligibility. It would make sense, in the Chinese language context, where almost every language user has had some basic exposure to the standard language (which is very close to the Beijing dialect), that the naive raters may appreciate the structural difference between dialects better than the mutual intelligibility.

Table 1. Correlation coefficients (r) and number of dialect pairs involved (N) between two measures of objective structural similarity and subjective intelligibility and similarity ratings. Multiple R is indicated for optimal conditions only (see text).
Variables and conditions                            Cheng’s PCI      Cheng’s LSI
                                                    r        N       r        N
Cheng’s LSI                                         .763**   77
Judged intelligibility, melody                      .527**   105     .423**   77
Judged intelligibility, monotone                    .482**   105     .378**   77
Judged similarity, melody                           .622**   105     .558**   77
Judged similarity, monotone                         .523**   105     .482**   77
Log judged intelligibility, melody                  .647**   105     .591**   77
Log judged intelligibility, monotone                .600**   105     .536**   77
Log judged similarity, melody                       .703**   105     .694**   77
Log judged similarity, monotone                     .616**   105     .626**   77
Judged intelligibility, melody, no Beijing          .591**   91      .576**   65
Judged intelligibility, monotone, no Beijing        .548**   91      .537**   65
Judged similarity, melody, no Beijing               .648**   91      .701**   65
Judged similarity, monotone, no Beijing             .552**   91      .629**   65
Log judged intelligibility, melody, no Beijing      .703**   91      .710**   65
Log judged intelligibility, monotone, no Beijing    .658**   91      .667**   65
Log judged similarity, melody, no Beijing           .696**   91      .753**   65
Log judged similarity, monotone, no Beijing         .631**   91      .631**   65

Both predictors (multiple R, optimal conditions only): .636**, .742**, .753**, .798**. The value .798 belongs to log judged similarity, melody, no Beijing (see § 4).

**: p < .01 (two-tailed)

5. REFERENCES

[1] Kessler, B. 1995. Computational dialectology in Irish Gaelic. Proc. European ACL, Dublin, 60–67.
[2] Heeringa, W. 2004. Measuring dialect pronunciation differences using Levenshtein distances. Groningen dissertations in linguistics nr. 46, Groningen University.
[3] Gooskens, C., Heeringa, W. 2004. Perceptive evaluation of Levenshtein dialect distance measurements using Norwegian dialect data. Language Variation and Change 16, 189–207.
[4] Cheng, C.C. 1997. Measuring Relationship among Dialects: DOC and Related Resources. Computational Linguistics & Chinese Language Processing 2.1, 41–72.
[5] Boersma, P., Weenink, D. 1996. Praat, a system for doing phonetics by computer, version 3.4, Report 132, Institute of Phonetic Sciences, University of Amsterdam.

NOTES

1. The first author acknowledges the Leiden University Fund / Van Walsem Fund for a (partial) travel grant in order to attend the 16th ICPhS.
2. The mean pitch was normalized to the mean of the 11 male speakers. Relatively small shifts in pitch (in semitones) were performed (using the PSOLA pitch manipulation implemented in the Praat software) on the male speakers; larger shifts were required for the female voices. For the female speakers a gender transformation was carried out by decreasing the formants by 15%. Longer pauses were reduced to 500 ms, and the remaining speech was linearly speeded up or slowed down (in the same PSOLA manipulation that changed the pitch) such that the articulation rate (syll./s) was the same for all speakers (sound files on CD).
3. No LSI values are listed for Taiyuan and Hankou in [4].

