Using Large-Scale Text Collections for Research

Chair: Karina van Dalen-Oskam, karina.van.dalen@huygens.knaw.nl

ICT tools and methods, such as information retrieval and extraction methods (including, for example, text and data mining), can reveal new knowledge from large amounts of textual data, extracting hidden patterns by analysing the results and summarising them in a useful format. This working group will examine practices in this area, building on the work of corpus linguistics and related disciplines to develop a greater understanding of how large-scale text collections can be used for research.

The working group's first activity was the workshop ‘Using Large-Scale Text Collections for Research: Status and Needs’, held at Huygens ING, The Hague, Netherlands. The meeting was used to assess the availability of text corpora for researchers from different disciplines in the participating countries and languages. How large are the available corpora? For what purposes were they created? What kinds of mark-up do they contain? Which tools are available to help mine the corpora? What is missing in both texts and tools to make a corpus useful for research disciplines other than the one it was originally created for? The meeting was attended by participants and speakers from Portugal, the Netherlands, Germany, the UK, Romania and Croatia.

On April 1 and 2, 2014, Christoph Schoech convened the second and third workshops of the Working Group in Würzburg, Germany. The first of these workshops, Corpora, focussed on the interface between linguistic annotation and textual annotation for historical and literary research, and aimed to bring together corpus builders and corpus users other than linguists. The second, Research, focussed on new kinds of analysis of large text corpora, explicitly from the perspective of literary or historical research questions. The workshops were well attended, with 19 speakers and participants from Luxembourg, Germany, Ireland, Poland, Croatia, the United States, the Netherlands and the UK.

Publications

All of the NeDiMAH working groups will produce publications related to their activities. A full list of published and forthcoming publications will appear on this page in November 2014.