KISSing syntax

🕑 7 min • 👤 Thomas Graf • 📆 July 12, 2019 in Discussions • 🏷 methodology, syntax

Here’s a question I first heard from Hans-Martin Gärtner many years ago. I don’t remember the exact date, but I believe it was in 2009 or 2010. We both happened to be in Berlin, chowing down on some uniquely awful sandwiches. Culinary cruelties notwithstanding, the conversation was very enjoyable, and we quickly got to talking about linguistics as a science, at which point Hans-Martin offered the following observation (not verbatim):

It’s strange how linguistic theories completely lack modularity. In other sciences, each phenomenon gets its own theory, and the challenge lies in unifying them.

Back then I didn’t share his sentiment. After all, phonology, morphology, and syntax each have their own theory, and eventually we might try to unify them (an issue that’s very dear to me). But the remark stuck with me, and the more I’ve thought about it in the last few years the more I have to side with Hans-Martin.

Everything and the kitchen sink

Once we look at individual subareas in linguistics, modularity is sorely missing. This goes particularly for syntax (or maybe it’s a problem in every subarea and I only notice it in syntax because that’s where I do most of my linguistic reading). No matter what phenomenon is studied, it always comes with the baggage of the full formalism. In the case of Minimalism this means Merge, Move, phases, Agree, features, bare phrase structure, lack of linear order, constituency, that subjects move from Spec,vP to Spec,TP, and so on. Pick a random syntax paper of your choice and take a tally of how many pieces of Minimalist machinery are involved. And then compare that against how many of those assumptions actually matter for the paper. The discrepancy is staggering.

Now contrast this with, say, this recent Quanta article on modelling insect swarms as fluids. Or if you prefer a more detailed reading, here’s the actual paper. You have a very narrowly defined problem with a very clean formal apparatus for just this problem. Yes, the formalism is borrowed from other domains, but it is also self-contained. There aren’t half a dozen principles that need to be worked around because they were required for phenomenon X but make things more difficult here. The authors didn’t start with a general theory of animal behavior and then try to squeeze in swarms. They handcrafted a unique tool for a unique problem. At least to my layman’s eyes, their account is maximally KISS.

KISS is short for Keep It Simple, Stupid!: simple, independent components that are easily combined to handle complex tasks. Those of you who are familiar with UNIX-style command line interfaces know exactly what I mean. A complex task like monitoring in real time how full the filesystem holding your home directory is gets carried out by combining watch, df, and grep:

watch -n1 'df -h | grep home'

How exactly this works isn’t the point here. What matters is that each component is simple, crafted for one specific purpose, but also freely combinable with others in order to handle more complex tasks. It’s a grammar of tools, elegantly making infinite use of finite means.

That’s not how things work in syntax (or linguistics at large). Linguistic formalisms are more like a cable subscription. You first have to choose a provider (= formalism), and there might not be much choice depending on where you live. The provider then gives you a base package that’s non-negotiable. Doesn’t matter if you don’t watch Fox News or MSNBC, it’s in there. Just like you can’t say “Sure, I’ll do syntax, but I’ll drop tree structures because they don’t matter for my problem.” When you submit your paper to LI, it will have to talk in terms of trees. But of course the base package isn’t everything, you can add on HBO, some sports channels, whatever you really want. That’s when your inner linguist goes on a shopping spree and says “Ooh, I’ll add split probes, and DP-phases, and Merge-over-Move because transderivational constraints are bound to make a comeback any day now.” And then you have one hugely complicated thing where it is hard to tell what depends on what and how the research findings could be phrased independently of the formalism’s machinery.

I understand why this is the current state of affairs, or at least I think I do. The idea is that the phenomena themselves aren’t the object of study, they’re just a window into what truly matters: the cognitive machinery that we call the faculty of language. So of course you want to keep the formalism constant across phenomena, that’s the only way to get truly important insights. Or is it?

Shock and surprise, I disagree. You only need to keep the formalism constant if you don’t know how to reliably transfer insights between different formalisms. Mathematical linguistics is very good at this. Sometimes it’s downright trivial. With complex formalisms, it tends to be a harder problem, admittedly. But it’s not an insurmountable one. And frankly, sticking with complex theories just creates a vicious circle: complex theories make it hard to transfer insights, so we’ll stick to a single theory, which by necessity will be very complex, making it hard to transfer insights, and so on. If you want one formalism for everything, one ring to rule them all, then you will indeed end up with a very complex piece of machinery. So how about we just don’t do that anymore?

My own KISS theories

Software engineering and science are two very different enterprises, so why would one want scientific theories to follow the KISS principle? Well, as pointed out by Hans-Martin Gärtner and the example of the Quanta article, that’s pretty common in other sciences. You identify a narrow domain and design the simplest possible story that covers the widest range of data within that domain. I think the same can be done in linguistics, and I will shamelessly plug my own work to illustrate this:

The monotonicity view of morphosyntax I blogged about recently qualifies as a KISS theory. It makes no assumptions about morphosyntactic representations or operations, it doesn’t have a single thing to say about features. It deliberately abstracts away from these things. Instead, it identifies hierarchies based on the available data and combines that with a single principle on how these hierarchies limit the space of possible realizations. Yes, there’s all kinds of things it has exactly zilch to say about. For instance, my account of the PCC has no universal asymmetry between direct and indirect objects — which turned out to be a good thing because Adrian Stegovec showed that the asymmetry can go the other way. That’s why focusing on a narrow empirical domain can be a good thing; it increases robustness.

Another KISS-like theory is my account of the Adjunct Island constraint. It relies on one simple mathematical fact: if adjuncts are optional and displacement involves some kind of dependency at the target site, then adjuncts are necessarily islands. That’s because optionality creates (un)grammaticality entailments such that if a structure is ill-formed without an adjunct, adding an adjunct cannot rescue it. By assumption, movement out of an adjunct involves a dependency at the target site. That dependency would be unsatisfied without the adjunct, so by the ungrammaticality entailment the sentence with the adjunct is also illicit, even though the dependency is satisfied in this case. So the Adjunct Island constraint isn’t a case of forbidden movement, it’s a case of optionality enforcing ungrammaticality.

It doesn’t matter for this story what kind of tree structure you subscribe to, it doesn’t matter if you use movement or TAG-style adjunction, it doesn’t matter if you use feature checking or some other way of tracking dependencies. All you need is an agreed-upon notion of adjunct and a system for how dependencies are encoded and resolved. That notion induces entailment relations between distinct representations, and these entailments give you the adjunct island effects. Oh, btw, parasitic gaps are still allowed. The same story also works for the Coordinate Structure Constraint, with ATB-extraction falling out as an analogue of parasitic gaps.
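For readers who like to see the entailment logic run, here is a toy sketch in Python. It is not the formalization from the actual account, just an illustration under loud simplifying assumptions: a structure is reduced to a hypothetical `Structure` record with at most one filler at the target site, a flag for a gap in the obligatory part of the clause, and one flag per adjunct saying whether it contains a gap. All names (`Structure`, `dependency_ok`, `without_adjuncts`, `grammatical`) are made up for this example. The only principle encoded is optionality: a structure counts as well-formed only if it stays well-formed after any of its adjuncts are removed.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Structure:
    """Toy clause (hypothetical encoding, not a real syntactic representation)."""
    filler: bool              # is there a displaced element at the target site?
    core_gap: bool            # is there a gap in the obligatory part of the clause?
    adjunct_gaps: tuple = ()  # one bool per adjunct: does that adjunct contain a gap?


def dependency_ok(s: Structure) -> bool:
    """Local check: a filler needs at least one gap, and gaps need a filler."""
    has_gap = s.core_gap or any(s.adjunct_gaps)
    return s.filler == has_gap


def without_adjuncts(s: Structure):
    """Yield every structure obtained by deleting one or more adjuncts."""
    n = len(s.adjunct_gaps)
    for k in range(n):  # keep k of the n adjuncts, with k < n
        for kept in combinations(range(n), k):
            yield Structure(s.filler, s.core_gap,
                            tuple(s.adjunct_gaps[i] for i in kept))


def grammatical(s: Structure) -> bool:
    """Optionality: s is well-formed only if every adjunct-reduced variant is too."""
    return dependency_ok(s) and all(dependency_ok(t) for t in without_adjuncts(s))


# Extraction from the core clause, no adjuncts involved.
print(grammatical(Structure(filler=True, core_gap=True)))                        # True
# Extraction out of an adjunct: fine locally, but dropping the adjunct strands the filler.
print(grammatical(Structure(filler=True, core_gap=False, adjunct_gaps=(True,)))) # False
# Parasitic gap: the core gap keeps the filler satisfied even without the adjunct.
print(grammatical(Structure(filler=True, core_gap=True, adjunct_gaps=(True,))))  # True
```

Nothing in the sketch mentions trees, features, or a movement operation. The island effect comes entirely from checking the adjunct-free variants, which is the sense in which optionality enforces ungrammaticality. Exceptions such as Truswell sentences are outside what this toy model can express.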

Now I don’t think those proposals are perfect. One advantage of the KISS approach is that one can custom-tailor the machinery so that it easily accommodates rare or marked phenomena that require complex stipulations and addenda in a full-blown formalism. I haven’t done that (yet?). My island story falls short because I have precious little to say about cases where one does get to extract from adjuncts, e.g. Truswell sentences. Under my story, it isn’t the existence of islands that’s surprising, it’s that there are exceptions. The monotonicity approach still has a lot of empirical ground to cover, too, as I’ve only looked at a handful of phenomena so far. So they’re not perfect examples of a KISS theory. The perfect KISS theory also has to have top-notch empirical coverage through years of tweaking and refining. But in my totally unbiased opinion the proposals above are decent starts, and I really like that they’re different from the usual methodology in linguistics.