IJCoL (Jun 2016)

Quantitative computational syntax: some initial results

  • Paola Merlo

DOI
https://doi.org/10.4000/ijcol.347
Journal volume & issue
Vol. 2, no. 1

Abstract

In the computational study of human intelligence, the language sciences are in the unique position of resting both on sophisticated theories and representations and on large amounts of observational data available for many languages. In this paper, we discuss some recent results in which large-scale, data-intensive computational modelling techniques are used to address fundamental linguistic questions about the quantitative properties of abstract grammatical representations. Specifically, we present a programme of research, exemplified in three case studies, that aims to identify the causes of frequency differentials. In the area of word order, we discuss work that investigates whether typological and corpus frequencies are systematically correlated with abstract syntactic structures and with higher-level structural principles of minimisation and efficiency. In the area of verb meaning, we discuss corpus-based computational models that investigate how frequencies are correlated with well-known lexical effects in causative alternations and morphological marking. The large corpus-based, cross-linguistic component of the work and the abstract grammatical hypotheses on word order and verb meaning contribute new empirical and computational evidence to the important debate on language variation, its extent and its limits, and illustrate how to bring corpus-based computational methodology to bear on theoretical syntactic issues. In so doing, we help reduce the current gap between theoretical and computational linguistics.