THE CONCEPTS OF EQUIVALENCE, GAIN AND LOSS (DIVERGENCE) IN ENGLISH-URDU WEB-BASED MACHINE TRANSLATION PLATFORMS
Sharmin Muzaffar* & Pitambar Behera**
(sharmin.muzaffar & pitambarbehera2)@gmail.com
Abstract: Equivalence, gain and loss (divergence) are well-established and highly prevalent concepts in both theoretical and applied translation studies. Equivalence denotes the ‘ideal or perfect translation’ from the SL to the TL and is inherently vital since it is reader-oriented or listener-oriented. In the process of translation, total equivalence is hardly ever achieved because of some gain and loss. Gain and loss can be attributed to the differences or divergences between the languages, viz. SL and TL. The divergences may pertain to linguistic, social, cultural (Vermeer, 1987 & Goodenough, 1964), religious and other knowledge paradigms, as translating a given text involves representing all of these paradigms in the TL text. Anton Popovic (1976) has identified four broad types of equivalence in translation: linguistic, paradigmatic, stylistic and textual. Dorr has classified divergence into two basic types: syntactic and lexical-semantic. To deal with the concept of equivalence, we have adopted Popovic’s theoretical model. So far as divergence is concerned, we have devised our own theoretical model. With regard to methodology, we have used a corpus of one thousand English sentences and analysed the translated Urdu output with respect to different areas of translational equivalence, gain and loss on web-based Machine Translation platforms such as Bing and Google Translate. We presume that gain and loss emerge as a major issue which may owe its genesis to socio-linguistic, cultural, religious and anthropological factors.
1. Introduction: Equivalence on one side and gain and loss on the other are opposed to each other, and both are well known in the field of translation studies. Achieving complete equivalence in translation (manual or machine) is next to impossible because of the divergent linguistic patterns between the SL and TL texts. This study focuses on both manual and machine translation and is applicable to theoretical and applied translation. A corpus of 1k English ILCI sentences has been provided as input to Google Translate and Bing Translate, and the Urdu output has been collected in bulk for observation, generalization and analysis. The outcomes of this study should prove beneficial for enhancing the accuracy of both of the aforementioned platforms. Below are the snapshots of Google Translate and Bing Translate.
Figure 1. Google Translate
Figure 2. Bing Translate
Affiliation: *Research Scholar, Dept. of Linguistics, Aligarh Muslim University, Aligarh, India ** Research Scholar, Centre for Linguistics, Jawaharlal Nehru University, New Delhi, India
2. Equivalence: Eugene Nida (1964, 1969) categorizes equivalence into two types, i.e. formal and dynamic. In formal equivalence, there is complete correspondence between the SL and TL texts with regard to both form and structure (e.g. sentence-to-sentence, word-for-word and concept-to-concept). It further attempts to convey as much information about the SL text as is feasible. A faithful translation is characterized by formal equivalence between the two texts. Dynamic equivalence, on the other hand, aims at recreating a similar relationship between the reader/listener and the text. Both forms of equivalence remain relevant in translation, each with its merits and demerits. Equivalence is considered to be a missing link between the dynamic (process-oriented) model and the static (product-oriented) model (Neubert, 1985). The logic behind equivalence can be represented mathematically by the biconditional, the binary truth-function that takes the value true when the propositions for the SL and TL texts are both true or both false. Symbol: ≡ or ↔, as in p ≡ q or p ↔ q.
2.1. Types of Equivalence: Anton Popovic (1976) has identified four broad types of equivalence in translation. They are discussed below. In addition, they have been quantitatively mapped on both platforms from English to Urdu.
2.1.1. Linguistic equivalence:
When there is word-for-word translation, there is equivalence (similarity, identity or homogeneity) between the languages (SL and TL). In the example below, ‘football practice’ is translated by Google as ‘fooTball kI meshk’ and kept as ‘football practice’ by Bing. Google renders the phrase with a typical Urdu word, while Bing borrows the phrase from the English input sentence. For instance,
English: Ben goes to football practice every Tuesday.
Urdu Google: ben fooTball kI meshk har mangal ko jaataa hai
Football practice
Urdu Bing: ben fooTball practice ke liye har mangal ko ʝaataa hai
Football practice
2.1.2. Paradigmatic equivalence:
It refers to the similarity in grammatical structures between the two texts. André Lefevere (1976) has emphasized preserving the structures of the SL text as closely as possible, but not so closely that the TL structures are distorted. In his opinion, translation as a process should be syntax-oriented. In the example below, the prepositional phrase ‘for cosmetic surgery’ is mapped by both Google and Bing onto the postpositional phrase ‘kasmeTik sarjari ke liye’ in Urdu. For example,
English: We produce lasers for cosmetic surgery.
Urdu Google: həm kasmeTik sarjari ke liye lasers ke paidaa karte hai.N
Cosmetic surgery for
Urdu Bing: həm lasers kasmeTik sarjari ke liye paidaa kar rahe hai.N
Cosmetic surgery for
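The preposition-to-postposition mapping in the example above can be sketched in Python. This is only a toy illustration: the two lexicon entries are taken from the example sentence, the function name `pp_to_postpositional` is hypothetical, and neither Google nor Bing is claimed to work this way.

```python
# Toy lexicon built only from the example sentence; it is an assumption
# made for illustration, not a real MT resource.
POSTPOSITIONS = {"for": "ke liye"}
NOUN_PHRASES = {"cosmetic surgery": "kasmeTik sarjari"}


def pp_to_postpositional(preposition, noun_phrase):
    """Map an English 'P NP' phrase onto the Urdu order 'NP P'."""
    urdu_np = NOUN_PHRASES[noun_phrase]
    urdu_p = POSTPOSITIONS[preposition]
    # English places the adposition before the noun phrase;
    # Urdu places it after, so the two parts swap positions.
    return f"{urdu_np} {urdu_p}"


print(pp_to_postpositional("for", "cosmetic surgery"))
# prints: kasmeTik sarjari ke liye
```

The sketch preserves the grammatical relation between the adposition and its noun phrase while changing their linear order, which is the sense in which the two outputs above are paradigmatically equivalent to the input.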
2.1.3. Stylistic equivalence:
It suggests the similarity in the perceived meaning, or its influence on the reader's mind, conveyed through the translated message. In other words, there is functional equivalence of elements in both the original and the translation, aiming at an expressive identity with an invariant of identical meaning. Idiomatic and multiword expressions are crucial for both manual and machine translation, as one needs to consider the socio-cultural milieu of a given language. Google translates the English idiomatic sentence perfectly, taking meaning and socio-cultural context into consideration, while Bing translates it word for word and thereby renders it incorrectly. For example,
English: Son is the apple of mother’s eyes.
Urdu Google: beTaa maa.N ki aankho.N kaa sitaaraa hai
Urdu Bing: beTaa maa.N ki aankho.N kaa sev hai
“In translation, there is substitution of TL meanings for SL meanings: not transference of TL meanings into the SL” (J.C. Catford, 1965). In transference, there is an implantation of SL meanings into the TL text. These two processes of substitution and transference must be clearly differentiated in any theory of translation. Lefevere (1976) has emphasised the approximation of meanings between the SL and TL texts. However, encoding the semantic aspect of language in a machine is a daunting task, and it is particularly relevant for MT when the input and output texts are linguistically quite divergent.
2.1.4.
Textual or syntagmatic equivalence:
It takes into consideration the similarity in the organizational structure and form of the texts. To put it differently, if there is equivalence of the syntagmatic structuring of a text, i.e. equivalence of both form and shape, it is known as textual equivalence. Preserving the form and shape of both texts while translating is somewhat difficult, which causes the translation output to break down. For instance,
English: She planned the event all by herself.
Urdu Google: wo tamaam khud kI taraf se wagiyaa kI mansubaa bandI kI
Urdu Bing: unho.N-ne taqrIb kI taraf se sab-ne khud kaa mansubaa banaayaa
2.2. Comparison of Equivalence on Google and Bing: The statistical data (see fig. 3) represents the rate of equivalence on the Google and Bing MT platforms at four major levels: linguistic, paradigmatic, stylistic and textual. On both platforms, the highest rate of equivalence is registered in the category of linguistic equivalence, whereas the lowest rate is found in stylistic equivalence. In paradigmatic equivalence, Google registers a 33% accuracy rate, which is around 2% higher than that of Bing Translate. Similarly, Bing shows a 3% lower accuracy rate than Google in textual equivalence. The analysis of the results obtained from both platforms demonstrates that Bing translates English texts into Urdu word for word, as a result of which it achieves higher linguistic equivalence than Google. So far as the other categories are concerned, there is not much difference between the two. With regard to stylistic equivalence, it is worth mentioning that both platforms break down when translating multiword expressions, which include pair and compound words as well as reduplicated, abbreviated and idiomatic expressions. It is quite natural that a machine can hardly be made to comprehend, process, parse and translate the higher levels of syntax and semantics needed for the best translations. Overall, Google performs better than Bing in all areas of equivalence with the exception of linguistic equivalence.
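A rate of equivalence of the kind compared above can be computed as the percentage of sentences judged equivalent in each category. The Python sketch below assumes a hypothetical annotation format of (platform, category, judged-equivalent) triples; the function name and the sample judgments are invented for illustration and are not the study's data.

```python
from collections import Counter

CATEGORIES = ("linguistic", "paradigmatic", "stylistic", "textual")


def equivalence_rates(judgments):
    """Return {platform: {category: percentage judged equivalent}}."""
    totals = Counter()
    hits = Counter()
    for platform, category, equivalent in judgments:
        totals[(platform, category)] += 1
        if equivalent:
            hits[(platform, category)] += 1
    rates = {}
    for (platform, category), count in totals.items():
        share = 100.0 * hits[(platform, category)] / count
        rates.setdefault(platform, {})[category] = share
    return rates


# Hypothetical judgments, invented for illustration only.
sample = [
    ("Google", "linguistic", True),
    ("Google", "linguistic", False),
    ("Bing", "linguistic", True),
    ("Bing", "linguistic", True),
    ("Google", "stylistic", True),
    ("Bing", "stylistic", False),
]

rates = equivalence_rates(sample)
for platform in sorted(rates):
    for category in CATEGORIES:
        if category in rates[platform]:
            print(f"{platform:6s} {category:12s} {rates[platform][category]:5.1f}%")
```

Keeping the counts per (platform, category) pair means the same function serves both platforms and all four of Popovic's categories without special cases.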
[Bar chart: Equivalence Rate in % (y-axis) for Google and Bing]
Figure 3. Equivalence Rate in Google and Bing
3. Gain and Loss: "Every translation entails a loss by comparison with the original" (Wolf Harranth, 1991). When a text or communication in one code is translated into another, it is inevitable that something is gained while some elements are lost, which results in miscommunication. Therefore, it is next to impossible to achieve complete equivalence. The issue of loss and gain owes its genesis to the cultural dissimilarity and divergent linguistic structures between two linguistic communities. The more divergent the structures are, the more error-prone the translation becomes. As a result, any MT platform must observe and incorporate the divergent patterns between a pair of languages so that its accuracy can be enhanced and its translations become more correct.
3.1. Motivation: In the translation of literary texts, gain and loss differ in nature from those in other genres of text, for instance technical texts, natural language texts and so on. Gain and loss in literary texts owe much to the figurative usage of the two languages, encoded and decoded, involved in the process of translation. By figurative usage, we refer to the fact that figures of speech such as simile, metaphor, irony, paradox, humor, word-play, metonymy, synecdoche and so on are abundantly employed to achieve flamboyance and sublimity in the language and its impact. The issue of ‘gain’ crops up when an overenthusiastic translator inadvertently over-translates the text at hand. Divergence is an umbrella term used in the area of Machine Translation to cover both gain and loss, which are responsible for the inefficiency of the systems.
Human translation is the process of translating from a source language (SL) to a target language (TL) in which a human translator applies meta-linguistic and contextual knowledge and takes almost all relevant factors into account. Therefore, it is quite natural that there are fewer erroneous linguistic patterns in human translation than in automated MT output, which is produced with the assistance of computers. Divergence emerges due to the parametric variations between the languages involved in the process. According to Dorr (1993), “translation divergence arises when the natural translation of one language into another results in a very different form than that of the original.” As discussed in Muzaffar et al. (2016b), English is a European language while Urdu belongs to the Indo-Aryan (IA) group of languages. There are many mutually incongruent features related to morphology, syntax, semantics and discourse. In consonance with the other IA languages, Urdu has a rich morphology and allows scrambling as a syntactic process. In contrast, English has a weak morphology and a fixed word order. Expletive ‘it’ and existential ‘there’ subjects are common in English but are not found in Urdu. In addition, Urdu has lexically marked honorifics in its verbs, whereas English does not. This divergence could be ascribed to the dissimilarity between the two cultures. Besides, some instances of gain and loss from natural language text are explicated in the following section.
4. Instances of Gain and Loss: This section has been divided into two broad sub-sections: Linguistic and Cultural.
4.1. Linguistic
4.1.1. Word order:
English is not a free word-order language, as it does not allow scrambling, unlike Urdu and other Indian languages. That is to say, in Urdu, if one changes the order of subject, object and verb, the sentence still reads meaningfully in all cases. English follows a rigid configurational SVO word order, whereas Urdu has several acceptable orders such as SOV, SVO and OVS (Muzaffar et al., 2016b), as in the following examples.
(Eng) Qasim is feeding the baby.
S V O
(Urdu 1) Qasim bacche ko khilaa rahaa hai.
S O V
(Urdu 2) bacche ko khilaa rahaa hai Qasim
O V S
(Urdu 3) Qasim khilaa rahaa hai bacche ko
S V O
4.1.2.
Gerunds and Participles:
Gerundive and participial constructions are crucial for gain and loss. In all the instances exemplified below, one can observe that gerunds and participles, both adjuncts and complements, and to + infinitive constructions are realized by different structures in Urdu. Therefore, these parametric differences between English and Urdu need to be taken into account. For instance,
(Eng) To do (doing) exercise is good for health
(Urdu) warzish karnaa sehad kI behatarI ke liye achhaa hai.
Exercise to do health of improvement for good be-PRS.IMPFV
(Eng) He is not able to do this.
(Urdu) wah yah karne ke qaabil nahI.N hai.N
He-3MSG.NOM it doing of able not be-3MSG.PRS.IMPFV.HON.
(Eng) we would like to read.
(Urdu) hum pa.Dhanaa chaahate hai.N (Sinha and Thakur, 2005)
We-1PL.NOM to read want-1PL.IMPFV be-1PL.PRS
(Eng) They have come to serve you.
(Urdu) wo log aapke khidmat mei.N haazir hai.N
Those people your service in present be-3PL.PRS.IMPFV.
4.1.3.
Mapping have-verbs in Urdu:
Some sentences with have-verbs and with copular verbs in English are quite difficult to map into Urdu. English sentences with have-verbs take the first person singular subject ‘I’ or the third person singular subject ‘he’ with no case inflection, and the have-verb inflects for tense. In contrast, the corresponding Urdu sentences have the same subjects with oblique case endings, and the verb agrees with the object. In addition, the Urdu counterparts of English have-verbs are morphologically identical to the copular verbs. For instance,
(Eng) He has courage.
(Urdu) usme
He-LOC ability-3FSG.NOM have-3SG.PRS.IMPFV (Eng) I have three watches. (Urdu) mere paas tIn gha.Diyaa.N
I with three watches-3FPL.NOM have-3FPL.PRS.IMPFV 4.1.4.
Optative Mood Constructions:
Optative constructions contain two clauses: independent and dependent. The former contains the finite verb whereas the latter contains the non-finite verb. The Urdu verb in the subordinate clause takes inflectional markers for the person, number and gender of the subject. In English, on the other hand, the verb forms remain constant and retain the root form in some cases, as in the following examples. The first instance is an exception, as it is a passive sentence in the optative mood. For instance,
(English) I want that my letter be sent to me.
(Urdu) Mai.N chaahataa
meraa khat mere hawaale kiyaa jaaye.
I-1SG.NOM want-1MSG.IMPFV do-1MSG.PRS my letter my control
(English) We want that Rahim succeed. (Urdu) hum log chahte
hEN ki rahim kaamyaab ho.
We-1PL.NOM want-1PL.IMPFV do-1PL.PRS succeed be
4.1.5. Conjunct verbs:
Conjunct verbs are those which consist of a noun or an adjective followed by a verb. In Hindi and Urdu, conjunct verbs are formed by combining a noun or an adjective with a verb and “semantically denote an action or a process or a state” as a complete whole (Begum et al., 2011; Muzaffar et al., 2015 & 2016). The most frequent verbalizers in Urdu are /karanA/ ‘to do’, /honA/ ‘to be’, /denA/ ‘to give’, /lenA/ ‘to take’, /AnA/ ‘to come’ and so on (Muzaffar et al., 2016). For example,
(English) I helped Raam.
(Urdu) Maine raam kI madad kI
I-1SG.ERG Ram GEN help-3FSG.PST.PRFV
4.1.6. Agreement:
“In Indian languages verbs agree with both the subject and the object; provided some conditions are fulfilled” (Behera et al., 2016). In Hindi (Jha et al., 2014), Urdu (Muzaffar et al., 2015; Muzaffar & Behera, 2014) and Marathi, oblique (both ergative and non-nominative) sentences usually show object-verb agreement. In the instance exemplified below, the verb agrees with the person, number and gender of the object ‘naak’. For instance,
(English) Afrin has made us ashamed.
(Urdu) AfarIn-ne naak kaTwaa dI
Afrina-3FSG.ERG nose-3FSG cut-CAUS. Make-3FSG.PST.PRFV
4.2. Socio-cultural: These factors are completely dependent on the societal and cultural norms that are reflected in the linguistic aspects. Both of these factors have been adapted from Sinha & Thakur (2005).
4.2.1. Honorifics:
In Urdu, honorific features are marked by the pluralization of the verb. There are also specific morphological forms of pronominal elements that are honorific in nature. These features are not marked in the English counterpart.
(English) My father has arrived.
(Urdu) Mere waalid aa chuke hEN
My father-3MSG.NOM. come have-HON be-3MSG.PRS.HON
4.2.2.
Mapping of Time
“Usually, people’s perception of different objects in the world is dependent upon several sociocultural beliefs. For instance, time is conceptualized in the Indian culture differently than that is done in the Western culture” (Sinha & Thakur, 2005). The concept of a.m. and p.m. can hardly be mapped into Urdu as exactly as required. In the examples below, a.m. corresponds to ‘fajr’ while p.m. covers temporal words such as ‘zohar’, ‘asar’, ‘maghrIb’ and ‘ayeshaa’.
(English) He came around 5 am.
(Urdu) wo fajra ke waqt aayaa.
(English) He came around 1 pm.
(Urdu) wo zohar ke waqt aayaa.
(English) He came around 5 pm.
(Urdu) wo asar ke waqt aayaa.
(English) He came around 6:30 pm.
(Urdu) wo maghrib ke waqt aayaa.
(English) He came around 8:00 pm.
(Urdu) wo aeshaa ke waqt aayaa.
5. Conclusion: In this paper, we have dealt with the concepts of equivalence, gain and loss for the English-Urdu language pair by classifying and analyzing data from Google and Bing. The rationale for taking up equivalence along with gain and loss is to observe the cases where the structures of the SL and TL are similar and where they are divergent. For equivalence, we have adhered to the framework provided by Popovic; for gain and loss, we have categorized the data using our own framework. This analytical study should prove fruitful for making machine translation platforms more efficient, as it provides a detailed analysis of the kinds of linguistic patterns that can complicate the translation process. From the above discussion, it can be averred that language and culture play an eminent role in deciding the margin of gain and loss between the SL and TL.
Acknowledgements: We are hugely indebted to the Google and Bing web-based MT platforms for the translation of English to Urdu texts.
References:
1. As-Safi, A. B. (2006). Loss & Gain and Translation Strategies with Reference to the Translations of the Glorious Qur’an. Atlas Stud. Res.
2. Bassnett, S. (1980). Translation Studies. Revised edition 1991. London: Routledge.
3. Begum, R., Jindal, K., Jain, A., Husain, S., & Sharma, D. M. (2011). Identification of conjunct verbs in Hindi and its effect on parsing accuracy. Computational Linguistics and Intelligent Text Processing, Springer Berlin Heidelberg, 29-40.
4. Behera, P., Maurya, N., & Pandey, V. (2016). Dealing with Linguistic Divergences in English-Bhojpuri Machine Translation. Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing, ACL, pp. 103-113, Osaka, Japan.
5. Behera, P., Muzaffar, S., Ojha, A. K., & Jha, G. N. (2016a). The IMAGACT4ALL Ontology of Animated Images: Implications for Theoretical and Machine Translation of Action Verbs from English-Indian Languages. Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing, ACL, pp. 64-73, Osaka, Japan.
6. Dorr, B. (1993). Machine Translation: a View from the Lexicon. The MIT Press, Cambridge, Mass.
7. Dorr, B. (1994). Classification of Machine Translation Divergences and a Proposed Solution. Computational Linguistics, 20(4), 597-633.
8. Dave, S., Parikh, J., & Bhattacharya, P. (2001). Interlingua-based English-Hindi Machine Translation. Journal of Machine Translation, 16(4), 251-304. http://dx.doi.org/10.1023/A:1021902704523 retrieved on 11.10.2015.
9. Dash, N. S. (2013). Linguistic Divergences in English to Bengali Translation. International Journal of English Linguistics, 3(1).
10. Gautam, T. R. (2012). Loss and gain in translation from Hindi to English: a stylistic study of multiple English translations of Premchand’s Godaan and Nirmala.
11. Gupta, D., & Chatterjee, N. (2003). Identification of Divergence for English to Hindi EBMT. In Proceedings of MT Summit-IX, pp. 141-148.
12. Harranth, W. (1991). Das Übersetzen von Kinder- und Jugendliteratur. JuLit Information, no. 1, 23-27.
13. Jha, G. N., Hellan, L., Beermann, D., Singh, S., Behera, P., & Banerjee, E. (2014). Indian Languages on the TypeCraft Platform: The Case of Hindi and Odia. In WILDRE-2, LREC 2014.
14. Muzaffar, S., & Behera, P. (2014). Error Analysis of the Urdu Verb Markers: A Comparative Study on Google and Bing Machine Translation Platforms. Aligarh Journal of Linguistics (ISSN 2249-1511), 4(1-2), pp. 199-208.
15. Muzaffar, S., Behera, P., Jha, G. N., Hellan, L., & Beermann, D. (2015). TypeCraft Natural Language Database: Annotating and Incorporating Urdu. Indian Journal of Science and Technology (ISSN 0974-5645), 8(27).
16. Muzaffar, S., Behera, P., & Jha, G. N. (2016). Issues and Challenges in Annotating Urdu Action Verbs on the IMAGACT4ALL Platform. LREC-2016.
17. Muzaffar, S., Behera, P., & Jha, G. N. (2016a). A Pāniniān Framework for Analyzing Case Marker Errors in English-Urdu Machine Translation. Procedia Computer Science, 96, 502-510.
18. Muzaffar, S., Behera, P., & Jha, G. N. (2016b). Classification and Resolution of Linguistic Divergences in English-Urdu Machine Translation. WILDRE: LREC.
19. Online Machine Translation System, The Bing Translator by Microsoft Inc. http://www.bing.com/translator/ retrieved on 14.10.2015.
20. Online Machine Translation System, The Google Translate by Google Inc. https://translate.google.co.in/ retrieved on 14.10.2015.
21. Saboor, A., & Khan, M. A. (2010). Lexical-semantic Divergence in Urdu-to-English Example Based Machine Translation. 6th International Conference on Emerging Technologies (ICET), pp. 316-320.
22. Shaheen, M. (1991). Theories of translation and their applications to the teaching of English/Arabic-Arabic/English translating (Doctoral dissertation, University of Glasgow).
23. Shukla, V., & Sinha, R. M. K. (2011). Divergence patterns for Urdu to English and English to Urdu Translation. In Human-Machine Interaction in Translation: Proceedings of the 8th International NLPCS Workshop (Vol. 41, p. 21). Samfundslitteratur.
24. Sinha, R. M. K., & Thakur, A. (2005). Divergence patterns in machine translation between Hindi and English. 10th Machine Translation Summit (MT Summit X), Phuket, Thailand, 346-353.
25. Syalies, F. N. (2016). A Loss and Gain in Equivalence Analysis of Noun Phrases in Strawberry Shortcake Bilingual Series Dandanan Kacau Makeover Madness.
26. Venuti, L. (Ed.). (2000). The Translation Studies Reader. London: Routledge.