Tool Gallery 8.1.: Spoken Corpus Linguistics and Open Access: Usability and Technology of VOICE 3.0 Online
Open access (OA) is a gold standard for any language corpus. Yet, after a project’s completion, it is often challenging to keep open-access web applications ‘alive’ long-term, even though compiling (spoken) corpora is time- and cost-intensive. The Tool Gallery 8.1. addresses this challenge by sharing insights into the development and usability of the new web application for the Vienna-Oxford International Corpus of English (VOICE, first released in 2009), developed recently in the VOICE CLARIAH project (2020-2021).
The first day of the Tool Gallery is targeted at researchers, PhD candidates and advanced students interested in working with and analyzing transcribed spoken data. We will introduce VOICE, a one-million-word corpus of spoken interactions in English as a lingua franca (ELF), and then engage with its usability as an open-access tool for linguistic research. We will discuss specific properties of spoken corpora (such as field work, data collection, detailed transcription, conversational mark-up, and metadata) and provide an in-depth introduction to the new VOICE 3.0 Online OA web interface and its functionalities through numerous hands-on activities.
Programme Day 1:
14.00-14.10 Welcoming words
14.10-14.40 Spoken corpora and the challenge of long-term open access: The case of VOICE
14.40-15.00 Introducing VOICE: Corpus structure and text properties
15.00-15.10 The VOICE CLARIAH project: Developing VOICE 3.0 Online
15.10-15.30 Coffee break
15.30-16.00 Introducing VOICE 3.0 Online
16.00-17.00 Hands-on activities VOICE 3.0
17.00-17.15 Closing discussion
The second day of this ACDH-CH Tool Gallery takes a look behind the scenes: it focuses on the OA technologies used and developed for the new VOICE 3.0 Online web interface. We introduce key properties of VOICE 3.0 XML, outline the process of setting up a local NoSketch Engine to run queries, and provide details on technology stacks and OA software packages. This day is targeted primarily at researchers, PhD candidates, advanced students and programmers with an interest in building OA web applications for language corpora and related resources. Some technological expertise in corpus linguistics, web design, XML technologies or software development may be advantageous but is not a prerequisite.
The ACDH-CH Tool Gallery will end with a closing panel on Day 2, in which core members of the VOICE CLARIAH project team will answer questions on project management and implementation, interdisciplinary collaboration and the challenges of planning for long-term OA availability.
Programme Day 2:
10.15-10.45 VOICE 3.0 XML and NoSketch Engine
10.45-11.15 The technological infrastructure behind VOICE 3.0 Online
11.15-11.25 Demo: Applying VOICE technologies to other data
11.25-11.45 Discussion: OA technologies of VOICE and re-usability
11.45-12.15 Coffee break
12.15-13.00 Q&A and closing panel: Challenges of long-term OA for corpora