A computer corpus of English as a lingua franca

The most widespread contemporary use of English throughout the world is that of English as a lingua franca (ELF), i.e. English used as a common means of communication among speakers from different first-language backgrounds. A Hungarian educationalist coming to Copenhagen to discuss qualification equivalences in European higher education with her Danish, Finnish and Portuguese colleagues; a Korean sales representative negotiating a contract with his German client in Luxembourg; a Spanish Erasmus student chatting with local colleagues in a student hall in Vienna: they all communicate in English as a lingua franca.

VOICE, the Vienna-Oxford International Corpus of English, is a structured collection of language data, the first computer-readable corpus capturing spoken ELF interactions of this kind.

VOICE, compiled at the Department of English at the University of Vienna, is funded by the Austrian Science Fund (FWF). These funds were further supplemented by a contribution from Oxford University Press in 2008. Supporting funds were also provided in the early pilot phase by Oxford University Press and by the Hochschuljubiläumsstiftung der Stadt Wien. The corpus currently comprises 1 million words of transcribed spoken ELF from professional, educational and leisure domains.

It is the ultimate aim of the VOICE project to open the way for a large-scale and in-depth linguistic description of this most common contemporary use of English by providing a corpus of spoken ELF interactions which will be accessible to linguistic researchers all over the world.

The widespread use of ELF in the world and the availability of a description of its linguistic characteristics are likely to have considerable implications for the way objectives of English teaching might be defined. It is important to stress, however, that a consideration of such pedagogic implications is not within the scope of the VOICE project itself.

Corpus description

VOICE comprises transcripts of naturally occurring, non-scripted face-to-face interactions in English as a lingua franca (ELF). The recordings made for VOICE are keyboarded by trained transcribers and stored as a computerized corpus. Currently VOICE comprises 1 million words of spoken ELF interactions, equalling approximately 120 hours of transcribed speech. In addition, the recordings of 23 transcribed speech events are available for listening.

The speakers recorded in VOICE are experienced ELF speakers from a wide range of first language backgrounds. So far, VOICE includes approximately 1250 ELF speakers with approximately 50 different first languages (disregarding varieties of the respective languages). In the initial phase, VOICE focuses mainly, though not exclusively, on European ELF speakers.

The ELF interactions recorded cover a range of different speech events in terms of domain (professional, educational, leisure), function (exchanging information, enacting social relationships), and participant roles and relationships (acquainted vs. unacquainted, symmetrical vs. asymmetrical). They are classified into the following speech event types:

  • interviews
  • press conferences
  • service encounters
  • seminar discussions
  • working group discussions
  • workshop discussions
  • meetings
  • panels
  • question-answer-sessions
  • conversations