CHALLENGES IN BUILDING WRITTEN AND SPOKEN COMPARABLE CORPORA IN CORPUS LINGUISTICS
Keywords:
ICC corpus; contrastive linguistics; comparable corpus; ICE corpus; data sustainability; copyright.
Abstract
This article discusses the challenges of building written and spoken comparable corpora. It also suggests possible solutions to some of the problems that arise in creating and using various linguistic corpora.
References
Aijmer, Karin and Bengt Altenberg eds. 2013. Advances in Corpus-based Contrastive Linguistics: Studies in Honour of Stig Johansson. Amsterdam: John Benjamins.
Bański, Piotr, Joachim Bingel, Nils Diewald, Elena Frick, Michael Hanl, Marc Kupietz, Piotr Pęzik, Carsten Schnober and Andreas Witt. 2013. KorAP: The new corpus analysis platform at IDS Mannheim. In Zygmunt Vetulani and Hans Uszkoreit eds. Human Language Technologies as a Challenge for Computer Science and Linguistics. Proceedings of the 6th Language and Technology Conference. Poznań: Uniwersytet im. Adama Mickiewicza w Poznaniu, 586–587.
Calzolari, Nicoletta, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asunción Moreno, Jan Odijk and Stelios Piperidis eds. 2016.