Forensic linguists, amongst others, have a strong interest in plagiarism detection (Angélil-Carter, 2000; Coulthard & Johnson, 2007; Hänlein, 1998; Lobo, 2003; Semple, Kenkre, & Achilles, 2004), but relatively little research has addressed bilingual plagiarism. Where the borderline of plagiarism lies depends on its definition, on the author’s intention, and on the text genre: journalists’ use of large amounts of text with little or no attribution, for instance, is not usually regarded as plagiarism (Coulthard & Johnson, 2007). However, although the conventions and regulations governing the use of newswire copy are not universal, agencies require that the source be credited and forbid the use of ‘authored articles’.
Detecting verbatim copying of news agencies’ words is straightforward. However, plagiarism detection requires more sophisticated techniques when news items are plagiarised in languages other than English (e.g. Portuguese): journalists tend to translate the text intuitively into their mother tongue and make adjustments, while retaining a structure that is closer to that of the English counterpart than to that of the other news sections.
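As a rough illustration of why verbatim copying is the easy case, the sketch below measures how many word trigrams of a candidate text also occur in a source text. It is not the method used in this study; the function names `word_ngrams` and `containment`, the choice of trigrams, and the three example sentences are all assumptions made for illustration.

```python
import re

def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams found in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def containment(candidate, source, n=3):
    """Share of the candidate's n-grams that also occur in the source."""
    cand = word_ngrams(candidate, n)
    if not cand:
        return 0.0  # candidate too short to form any n-gram
    return len(cand & word_ngrams(source, n)) / len(cand)

# Invented example sentences (not taken from the corpus described here).
agency = "The prime minister said on Monday that talks would resume next week."
copied = "The prime minister said on Monday that talks would resume soon."
translated = "O primeiro-ministro disse na segunda-feira que as conversações seriam retomadas."

print(containment(copied, agency))      # high: near-verbatim reuse
print(containment(translated, agency))  # zero: no shared surface strings
```

A near-verbatim copy shares most of its trigrams with the source, whereas a translated version shares none, which is why surface string matching alone cannot flag plagiarism across languages.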
To investigate which mechanisms journalists use to write ‘their own’ texts from news agencies’ texts (and how they use them), we selected news pieces from the ‘World’ section of Portuguese quality newspapers and compared them to possible English sources. To carry out a suitable contrastive analysis, we created a comparable/translation corpus ("LREC 2008 Workshop on Comparable Corpora," 2008; McEnery & Wilson, 1996) using the Corpógrafo, a web-based environment for the creation and analysis of personal corpora (Sarmento, Maia, & Santos, 2004).
We then investigated how journalists usually translate and how (and when) authorship attribution is made explicit, and asked how much unacknowledged journalistic text can be accepted before it is called plagiarism, challenged by the news agencies, and taken to trial. The results obtained so far show that, even though quality papers may cite their sources (usually well-known international agencies), attribution is often inadequate, and there is no one-to-one match between the Portuguese and the English versions, i.e. the same piece of news often draws on several different releases from the foreign press and websites. Applications of this investigation to more forensic contexts will be discussed.
References
Angélil-Carter, S. (2000). Stolen Language? Plagiarism in Writing. Harlow: Longman.
Coulthard, M., & Johnson, A. (2007). An Introduction to Forensic Linguistics: Language in Evidence. London and New York: Routledge.
Hänlein, H. (1998). Studies in Authorship Recognition - A Corpus-based Approach. Frankfurt: Peter Lang.
Lobo, R. A. (2003). Plagiarism Revisited. Journal of the Society for Gynecologic Investigation, 10, 389-389.
LREC 2008 Workshop on Comparable Corpora (2008). Retrieved 02/11/2008, from http://www.limsi.fr/~pz/lrec2008-comparable-corpora
McEnery, T., & Wilson, A. (1996). Corpus Linguistics: An Introduction (2nd ed.). Edinburgh: Edinburgh University Press.
Sarmento, L., Maia, B., & Santos, D. (2004). The Corpógrafo - a Web-based environment for corpora research.
Semple, M., Kenkre, J., & Achilles, J. (2004). Student fraud: The need for clear regulations for dismissal or transfer from healthcare training programmes for students who are not of good character. Nursing Times Research, 9(4), 272-280.