As a student I didn’t care much for work on syntactic parsing since I figured all the exciting big-picture stuff is in the specification of possible syntactic structures, not how we infer these structures from strings. It’s a pretty conventional attitude, widely shared by syntacticians and a natural corollary of the competence-performance split — or so it seems. But as so often, what seems plausible and obvious at first glance quickly falls apart when you probe deeper. Even if you don’t care one bit about syntactic processing, parsing questions still have merit because they quickly turn into questions about syntactic architecture. This is best illustrated with a concrete example, in that abstract sense of “concrete” that everyone’s so fond of here at the outdex headquarters.
An abstract result…
Here is a result by John Hale and Ed Stabler that sounds rather dry: for every Minimalist grammar G, the string yields of the derivation trees of G form a deterministic context-free string language (Hale and Stabler 2005). This is actually mind-blowing, but it takes a fair bit of unpacking to see why.
First, let’s make sure we all agree on what an MG derivation tree is. While a phrase structure tree encodes the constituency of a sentence, a derivation tree encodes how this structure is built by a Minimalist grammar. The figure below shows a (simplified) phrase structure for which woman did Mary kiss, with the corresponding derivation tree right next to it.
Two points are crucial here:
- Feature annotation
Every lexical item in the derivation tree is annotated with a string of features that fully determines what syntactic operations the lexical item has to participate in.
- No displacement
The derivation tree is just a record of the sequence of derivational operations; it does not contain the result of performing these operations. As a result, no phrase actually moves to a higher position, everything stays low. We merely note that movement takes place but do not directly encode the result of this movement.
Because movement is encoded only implicitly, the string yield of a derivation tree usually won’t match the string yield of the phrase structure tree that is built from this derivation. If we read the leaves of the phrase structure tree above from left to right, we get which woman do-ed Mary kiss. If we do the same with the derivation tree, we get do -ed Mary kiss which woman. Actually, we get do[T+ h+ wh+ C-] ed[V+ nom+ T- h-] Mary[D- nom-] kiss[D+ D+ V-] which[N+ D- wh-] woman[N-] because the feature annotations are part of the leaf nodes. That’s clearly not the same string as which woman do-ed Mary kiss. It is the result of undoing all the movement steps while adding explicit feature annotations to all lexical items.
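To make the notion of a derivation tree's string yield a bit more tangible, here is a minimal Python sketch. The class names (`Leaf`, `Node`) and the exact shape of the tree are my own assumptions for illustration, since the figure is the authoritative version; only the lexical items, their feature strings, and the left-to-right leaf order are taken from the text above.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    word: str
    features: List[str]  # feature annotation, e.g. ["N+", "D-", "wh-"]


@dataclass
class Node:
    op: str  # "Merge" or "Move"; only recorded, never executed
    children: List[Union["Node", Leaf]]


def leaves(tree: Union[Node, Leaf]) -> List[Leaf]:
    """Collect the leaves of a derivation tree from left to right."""
    if isinstance(tree, Leaf):
        return [tree]
    collected: List[Leaf] = []
    for child in tree.children:
        collected.extend(leaves(child))
    return collected


def plain_yield(tree: Union[Node, Leaf]) -> str:
    """String yield without feature annotations."""
    return " ".join(leaf.word for leaf in leaves(tree))


def annotated_yield(tree: Union[Node, Leaf]) -> str:
    """String yield with the feature annotations kept on each leaf."""
    return " ".join(
        f"{leaf.word}[{' '.join(leaf.features)}]" for leaf in leaves(tree)
    )


# Hypothetical tree shape (an assumption, not a copy of the figure).
# Move nodes are unary: they note that movement happens, nothing is displaced.
which_woman = Node("Merge", [Leaf("which", ["N+", "D-", "wh-"]),
                             Leaf("woman", ["N-"])])
vp = Node("Merge", [Leaf("Mary", ["D-", "nom-"]),
                    Node("Merge", [Leaf("kiss", ["D+", "D+", "V-"]),
                                   which_woman])])
tp = Node("Move", [Node("Merge", [Leaf("ed", ["V+", "nom+", "T-", "h-"]),
                                  vp])])
derivation = Node("Move", [Node("Move", [
    Node("Merge", [Leaf("do", ["T+", "h+", "wh+", "C-"]), tp])])])

print(plain_yield(derivation))
print(annotated_yield(derivation))
```

Because the Move nodes are mere bookkeeping, reading off the leaves leaves which woman in its base position, which is exactly why the yield differs from that of the phrase structure tree.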
In the statement above, then, the phrase “string yields of the derivation trees of G” intuitively corresponds to the strings generated by the grammar if we completely factor out the effects of movement on word order. That may seem odd from a Minimalist perspective where Merge and Move are interspersed, but it fits well with the older Aspects model where D[eep]-structure establishes the basic head-argument relations and is then contorted by transformations into the actually observed S[urface]-structure. From that perspective, the statement above is about such “D-structure strings”, except that we also have fully explicit feature annotations.
Okay, so now we know what kind of object we’re talking about, but it’s still unclear what kind of property holds of them. Hale and Stabler show that these strings form a deterministic context-free string language, which I’ll abbreviate as deterministic CFL from here on out (in an effort to pretend that this terminology is silky-smooth and rolls right off the tongue). Now here’s the crucial bit: deterministic CFLs are very easy to parse compared to non-deterministic CFLs. The syntax of programming languages, for instance, is deliberately designed to yield deterministic CFLs to ensure speedy parsing. The step from deterministic CFLs to non-deterministic ones is where parsing performance takes a nosedive.
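For a feel of what “deterministic” buys you, here is a toy sketch using a textbook example rather than anything from the Hale-Stabler paper. Palindromes with an explicit center marker (strings of the form w c w^R) are a deterministic CFL: a single left-to-right pass with one stack suffices, and the recognizer never has to guess or backtrack. Remove the marker and the recognizer no longer knows where the middle is, which is the standard example of a CFL that is not deterministic. The function name and the choice of `c` as marker are assumptions made purely for illustration.

```python
def recognize_marked_palindrome(s: str, marker: str = "c") -> bool:
    """Deterministically recognize strings of the form w + marker + reversed(w).

    One pass, one stack, no guessing: the marker says exactly when to
    switch from pushing symbols to popping and comparing them.
    """
    stack = []
    seen_marker = False
    for ch in s:
        if not seen_marker:
            if ch == marker:
                seen_marker = True  # the explicit cue removes all ambiguity
            else:
                stack.append(ch)
        else:
            # After the marker, every symbol must mirror the stack top.
            if not stack or stack.pop() != ch:
                return False
    return seen_marker and not stack


print(recognize_marked_palindrome("abcba"))  # marked palindrome
print(recognize_marked_palindrome("abba"))   # no marker: rejected by this recognizer
```

The loose analogy to the main text: an overt cue in the string plays the role that explicit feature annotations play in the derivation tree yields, telling the parser what to do next without any search.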
The Hale-Stabler result thus tells us something very surprising about syntax: if languages would instruct their syntax to simply omit all that overt movement nonsense and force their morphology to clearly spell out all the syntactic features of each lexical item, parsing would be easy-peasy. So why the heck isn’t there even a single one that does that? Languages are happy to fold numerous arcane distinctions into their morphology, but the one thing that would really help is a no-go? Talk about messed up priorities.
…and the questions it raises
I like the Hale-Stabler finding because it subverts the standard narrative: movement is something syntax gets for free, and since it is free, all languages make abundant use of it. If one buys into the notion that syntax is shaped by third-factor considerations (Chomsky 2005), then parsing complexity should push us towards a movement-free syntax because that is what allows us to stay within the realm of deterministic CFLs.
There are many ways to poke holes in this argument. One is to simply shrug off that whole third-factors idea and keep it at “language is just as messy as any other biological system”. Or we can go full Chomsky and contend that parsing considerations are irrelevant because language developed for internal reasoning, not communication, and nobody needs to parse their own thoughts. All of that might well be true, but when given a choice between a likely truth and productive questions, the latter is often the better route to take.
A more interesting argument posits that the split between deterministic CFLs and non-deterministic CFLs with respect to parsing performance abstracts away from any potential memory limitations. It won’t shock you to hear that memory limitations are indeed a major factor in human sentence processing. So if movement allows you to shift that heavy NP to the right and thus reduce your parser’s memory load, you will gladly do that. Basically: who cares about theoretical parsing performance without movement when in practice things work better with movement?
One could also object that the Hale-Stabler result only holds if all features are directly encoded by morphology, and since they’re not, the whole point is moot. After all, if your morphology doesn’t allow you to get a deterministic CFL with or without movement, you might just as well keep movement in syntax. No point in dropping movement if you don’t stand to gain anything from it because of that obtuse morphology module.
I think those are good points, but they’re not fully convincing. Many cases of movement actually make things worse with respect to memory usage. If our current model of MG parsing is on the right track, then languages could greatly reduce memory usage if unaccusative subjects stayed in their underlying object position. In fact, objects should never appear to the left of their selecting head. What’s more, the arguments of a head should be put in increasing order of size. Simply doesn’t happen. If movement’s raison d’être is to reduce memory usage, it’s asleep behind the wheel most of the time.
And that point about morphology being uncooperative isn’t the whole story, either. Many movement features have some overt realization, even if it isn’t directly part of morphology. Wh-movers are headed by a wh-phrase, topicalization comes with a specific prosody, and so on. We might not get a nice agglutinative system where each feature corresponds to a separate affix, but it’s not like there are no reflexes at all.
The general upshot is that this isn’t really a conceptual issue, it’s an empirical one. There aren’t any conclusive answers yet because we haven’t really looked at those issues in detail:
- What kind of information about a lexical item’s feature make-up is conveyed by morphology and/or prosody? What ratio of its feature make-up is encoded overtly? Does this vary across languages, and if so, by how much? Does this ratio interact with what kind of movement configurations the language allows, or at least their relative frequency? Can any of this be explained in generative models of morphology or prosody? Does any of this contradict the inverted T-model?
- Which instances of movement reduce memory usage? Which increase it? What kinds of movement (don’t) exist even though they worsen (ameliorate) memory usage? Why do they (not) exist?
Those are interesting questions in their own right, but it’s the connection to parsing and the role of movement that truly make them intriguing to me. We have a formal argument that clearly pulls in the direction of movement-free syntax. We have some conceptual counterarguments to that, and interesting data points, but nothing fully worked out, no formal model with clear predictions. We should have a model that lets us plug in the various constraints for and against movement to compute a space of movement systems, with peaks corresponding to optimal solutions and valleys to really bad solutions. When we map natural languages into that space, they should tend to cluster around the peaks. That doesn’t have anything to do with parsing, but it’s the interaction of movement and parsing that gives rise to the issue in the first place.
Hale, John T., and Edward P. Stabler. 2005. Strict deterministic aspects of Minimalist grammars. In Philippe Blache, Edward P. Stabler, Joan Busquets, and Richard Moot (eds.), Logical aspects of computational linguistics: 5th international conference. Berlin, Heidelberg: Springer. doi:10.1007/11422532_11. http://dx.doi.org/10.1007/11422532_11.