When parsing isn't about parsing

🕑 7 min • 👤 Thomas Graf • 📆 June 18, 2020 in Discussions • 🏷 syntax, morphology, parsing, formal language theory, movement

As a student I didn’t care much for work on syntactic parsing since I figured all the exciting big-picture stuff was in the specification of possible syntactic structures, not in how we infer these structures from strings. It’s a pretty conventional attitude, widely shared by syntacticians and a natural corollary of the competence-performance split, or so it seems. But as is so often the case, what seems plausible and obvious at first glance falls apart when you probe deeper. Even if you don’t care one bit about syntactic processing, parsing questions still have merit because they quickly turn into questions about syntactic architecture. This is best illustrated with a concrete example, in that abstract sense of “concrete” that everyone’s so fond of here at the outdex headquarters.

Semantics: Corrections and further thoughts

🕑 6 min • 👤 Thomas Graf • 📆 January 08, 2020 in Discussions • 🏷 semantics, donkey sentences, parsing

This is a follow-up to my previous post on semantics. It has been pointed out to me that that post contains several inaccuracies and grave omissions. Some of them are in the summary of Lucas’ talk, and they would probably have been noticed earlier if I had provided a link to the slides or the paper. Thanks to Lucas for sending me those by email and for walking me through the account again. I’ll briefly explain some of the misleading points later on in this post.

But the much bigger issue is that I failed to point out that Lucas wasn’t just presenting his own work. He made it very, very clear that this was joint work with Dylan Blumford (UCLA) and Robert Henderson (UArizona). I’m really upset with myself about that one. In some sense, giving partial credit is even worse than giving no credit at all, and the latter is already a dick move. My sincerest apologies to Dylan and Robert.

If I had run the post past Lucas before publishing it, a lot of this could have been avoided, so I’ll make that a priority for future posts that talk about work I’m not well-acquainted with. Alright, so let’s talk a bit about what I got wrong and how that affects the central message of the previous post.

Semantics should be like parsing

🕑 5 min • 👤 Thomas Graf • 📆 December 28, 2019 in Discussions • 🏷 semantics, donkey sentences, parsing

I spent a few days before Christmas at the Amsterdam Colloquium, which exposed me to a much heavier dose of semantics than I’m used to. I’ve always had a difficult relationship with semantics. On the one hand, I like that it has its fair share of KISS theories, and generalized quantifier theory is aesthetically very pleasing to me. On the other hand, most of semantics is pretty dull, and I think that’s because semanticists put way too much stuff in their theories that has nothing to do with natural language semantics. I’ve previously had a hard time putting this into concrete terms, but Lucas Champollion’s invited talk on donkey sentences finally presented me with a specific example.

News from the MG frontier

🕑 3 min • 👤 Aniello De Santo • 📆 June 24, 2019 in Discussions • 🏷 MGs, parsing, NLP

True to my academic lineage, I’m a big fan of Minimalist grammars (MGs): they are a pretty malleable formalism, their core mechanisms are very easy to grasp on an intuitive level, and they are close enough to current minimalist syntax to allow for interesting computational insights into mainstream syntax. However, I often find that MGs’ charms don’t work that well on my more NLP-oriented colleagues — especially when compared to some very close cousins like TAGs or CCGs. There are very practical reasons for this, of course, but two in particular come to mind right away: the lack of any large MG corpus (and/or automatic ways to generate such corpora) and, relatedly, the lack of efficient, state-of-the-art, probabilistic parsers.

This is why I’m very excited about this upcoming paper by John Torr and co-authors (henceforth TSSC) on a (the first ever?) wide-coverage MG parser. The parser adapts to MGs the \(A^*\) search strategy that Lewis and Steedman (2014) developed for CCGs (basically, a CKY chart plus a priority queue), and couples it with a complex neural network supertagger trained on an MG treebank.
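To make “a CKY chart plus a priority queue” a bit more tangible, here is a minimal sketch of agenda-based \(A^*\) chart parsing in Python. It is emphatically not TSSC’s parser. It assumes a hypothetical toy table of binary rules in place of MG Merge and Move, and hypothetical per-word supertag log probabilities in place of the neural supertagger’s output. The outside estimate is in the spirit of Lewis and Steedman’s supertag-factored approach: for the words outside a span, add up their best tag scores. This never underestimates the true outside score, so the first goal item popped off the queue is the best parse.

```python
import heapq
import math


def astar_parse(tag_scores, rules, goal):
    """Agenda-based A* over a CKY-style chart (toy sketch, not TSSC's parser).

    tag_scores: one non-empty dict per word, mapping a hypothetical
                supertag/category to its log probability (stand-in for a
                supertagger's output).
    rules:      dict (left_cat, right_cat) -> parent_cat; a toy binary rule
                table used here in place of MG Merge/Move, with zero cost.
    goal:       category that must span the whole sentence.
    Returns (log_prob, tree) for the best parse, or None if there is none.
    """
    n = len(tag_scores)
    # prefix[k] = sum of the best tag score of words 0..k-1; summing best
    # scores outside a span gives an admissible outside estimate.
    prefix = [0.0]
    for scores in tag_scores:
        prefix.append(prefix[-1] + max(scores.values()))

    def outside(i, j):
        return prefix[i] + (prefix[n] - prefix[j])

    agenda = []  # min-heap on negated priority = max-priority queue
    counter = 0  # tie-breaker so heapq never has to compare trees

    def push(i, j, cat, inside, tree):
        nonlocal counter
        priority = inside + outside(i, j)
        heapq.heappush(agenda, (-priority, counter, i, j, cat, inside, tree))
        counter += 1

    # Seed the agenda with every supertag of every word; leaves are (cat, index).
    for i, scores in enumerate(tag_scores):
        for cat, logp in scores.items():
            push(i, i + 1, cat, logp, (cat, i))

    finished = set()                        # (i, j, cat) already popped
    ends_at = [[] for _ in range(n + 1)]    # finished items ending at k
    starts_at = [[] for _ in range(n + 1)]  # finished items starting at k

    while agenda:
        _, _, i, j, cat, inside, tree = heapq.heappop(agenda)
        if (i, j, cat) in finished:
            continue  # a better-scoring copy was already popped
        finished.add((i, j, cat))
        if i == 0 and j == n and cat == goal:
            return inside, tree  # first goal item popped is optimal
        ends_at[j].append((i, cat, inside, tree))
        starts_at[i].append((j, cat, inside, tree))
        # Combine with finished neighbors on the left...
        for k, left_cat, left_in, left_tree in ends_at[i]:
            parent = rules.get((left_cat, cat))
            if parent is not None:
                push(k, j, parent, left_in + inside, (parent, left_tree, tree))
        # ...and on the right.
        for k, right_cat, right_in, right_tree in starts_at[j]:
            parent = rules.get((cat, right_cat))
            if parent is not None:
                push(i, k, parent, inside + right_in, (parent, tree, right_tree))
    return None


if __name__ == "__main__":
    # Hypothetical supertag scores for "the cat sleeps".
    tags = [
        {"D": math.log(0.9), "N": math.log(0.1)},
        {"N": math.log(0.8), "V": math.log(0.2)},
        {"V": math.log(0.7), "N": math.log(0.3)},
    ]
    rules = {("D", "N"): "DP", ("DP", "V"): "S"}
    print(astar_parse(tags, rules, "S"))
```

The design choice that makes this work: rule applications cost nothing and log probabilities are never positive, so a parent item’s priority can never exceed that of its children. Items therefore leave the queue in non-increasing order of priority, and most of the chart never has to be built.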
