A tribute to Joost Engelfriet

🕑 6 min • 👤 Thomas Graf • 📆 September 02, 2020 in Discussions • 🏷 formal language theory, tree transductions

Ed Stabler sent me a link to the most recent paper by Joost Engelfriet, which concludes with the following message:

That’s all folks! This was my last paper. Thank you, dear reader, and farewell.

That’s bitter-sweet. On the one hand, I admire that he can draw a line in the sand like this. On the other hand, I wish he’d erase that line and keep going for a few more years. Even though Engelfriet isn’t a mathematical linguist — and might not even be aware of the more linguistic side of that field, the one that we serve here at the Outdex Café — he has had a profound influence on the field, including a lot of my own work.

I’ve never met Joost Engelfriet, I only know him through his papers. But those are some impressive papers. They have been my go-to source for anything related to tree transductions. And if you’re a regular reader of this blog, you know that this isn’t just some mathematical curiosity, tree transductions are the lens that allows us to make sense of syntax. Movement is a tree transduction. The bimorphism perspective of the inverted T-model involves tree transductions. The derivation tree perspective of MGs wouldn’t be possible without tree transductions, and that, in turn, means that we would have missed the subregular nature of syntax. Engelfriet wasn’t the first to work on tree transductions — that honor goes to James Thatcher and William Rounds (who cites Transformational Grammar as a direct inspiration for tree transductions). Engelfriet was about 5 years late to the party, yet he’s shaped the field more than anybody else. I think it’s actually impossible to find a paper on tree transductions that doesn’t cite Engelfriet. He’s that integral to the field; Engelfriet is the Chomsky of tree transductions.

Engelfriet has also worked hard on connecting the automata-theoretic view of tree transductions to mathematical logic, in particular monadic second-order logic (MSO). And because tree transductions apparently aren’t complicated enough for him, he also teamed up with Bruno Courcelle to produce the definitive reference on MSO over graphs. And again this isn’t just some pointless mathematical exercise, MSO has been an integral part of mathematical linguistics since the mid 90s, pioneered by Jim Rogers, Thomas Cornell, and Uwe Mönnich, among others. It allowed us to study derivational formalisms as if they were representational formalisms, and that, too was an important stepping stone for me towards subregular syntax.

Almost every paper I have written incorporates work by Engelfriet in one way or another. And it’s not just that his research is essential to my work and the field at large, his papers are also fun. I won’t pretend that they’re easy reads, at least they’re not for me. Tree transductions are much more complex than string transductions, and topics like tree transducer decompositions or the equivalence of MSO-definable tree transductions and macro tree transductions of linear size increase are not for the faint of heart. But an Engelfriet paper never feels like it is any more complex than it has to be. It’s the same feeling of awe that I experience when watching somebody speedrun Super Metroid: I don’t quite get everything they’re doing, and I certainly can’t do it myself, but I can recognize the beauty in it.

Engelfriet’s work is also like Super Metroid speedruns in that many Outdex readers were probably unaware of its existence. Well, now you’re in the know. And if you now want to become a true Engelfriet aficionado, here’s a list of papers (and one book) that were essential reading for me:

  • Engelfriet (1975): Back in the days when tree transductions were the wild west, this paper established the key differences between bottom-up and top-down tree transductions that all modern work builds on.

  • Engelfriet (1977): Another classic, this one introduced the notion of regular look-ahead for top-down tree transductions; look-ahead is now a fundamental parameter of tree transducers. I’ve been thinking about look-ahead a lot recently in an attempt to expand sensing tree automata into transducers for movement. Don’t get your hopes up, though, it’s not ready for a post yet.

  • Engelfriet and Vogler (1985): The arrival of macro tree transducers. Every kind of tree transduction that you’ll come across in computational syntax is equivalent to some constrained macro tree transducer. Once you understand macro tree transducers, everything else is just a special case of a very general top-down device.

  • Engelfriet and Hoogeboom (2001): Engelfriet isn’t just all about trees and graphs, he also did some work on string transductions. This paper shows how two-way finite state transducers over strings are related to MSO string transductions, and this result is now spreading in the subregular phonology community.

  • Engelfriet and Maneth (2003): The paper that establishes the lovely result I mentioned above: MSO tree-to-tree transductions are equivalent to macro tree transductions of linear size increase. This is an important bridge that allows to connect the logical view to the automata-theoretic one. Both are used in mathematical linguistics, e.g. in work on MGs, so it’s good to have some way of translating between them, even if we still need to learn a lot more about how the weaker subclasses relate to each other.

  • Engelfriet, Lilin, and Maletti (2009): While this paper is focused on a particular type of tree transductions (extended multi bottom-up tree transducers), it gives a good idea of what the modern tree transducer landscape looks like and what the advantages and shortcomings of various transduction types are. I don’t know how many times I’ve looked up Table 1 and Figure 5 in this paper.

  • Maletti and Engelfriet (2012): Another paper that’s not directly about tree transductions, but is more closely aligned with the interests of computational linguists. This is a follow-up to a paper by Marco Kuhlmann and Giorgio Satta where they show that TAGs are not closed under (strong) lexicalization. Kuhlmann & Satta left open what kind of grammar formalism would be needed to strong lexicalize TAG, and this paper provides the answer: simple context-free tree grammars. This one stings a bit every time I see it because I was working on a reply to Kuhlmann & Satta, but Maletti & Engelfriet were faster and had a much better result.

  • Courcelle and Engelfriet (2012): The massive tome on MSO graph transductions that I mentioned earlier. No, I haven’t read the whole thing. Yes, I do keep coming back to it.

  • Engelfriet, Fülöp, and Maletti (2017): Tree transducers break down into two macro-classes, the top-down transducers and the bottom-up transducers. Standard bottom-up transducers don’t cut it for many tasks, including handling movement, so nowadays a lot of the attention goes to multi bottom-up transducers. Top-down transducers are also very limited, prompting the introduction of linear extended top-down tree transducers (also known as synchronous tree substitution grammars — because transductions can be regarded as grammars, too). But extended top-down tree transducers aren’t closed under composition, which means that a cascade of these transductions can do things a single extended top-down tree transducer cannot do. This raises the question how powerful these cascades are, and this paper provides the answer.

  • Engelfriet, Maletti, and Maneth (2018): This paper studies multiple context-free tree grammars, which are closely related to (set-local) multi-component TAG and multiple context-free string grammars. As you might know, the latter two are weakly equivalent to MGs. I haven’t fully absorbed this paper yet, but I have long wondered if my translation from standard TAG to MGs with lowering could be extended to multi-component TAG. With a bit more time, I might find the answer in this paper.

There’s many other papers I could’ve listed here. Feel free to link to your favorite in the comments. Happy reading everyone!

References (The Engelfriet Paperpalooza)

Courcelle, Bruno, and Joost Engelfriet. 2012. Graph structure and monadic second-order logic: A language-theoretic approach. Cambridge, UK: Cambridge University Press.

Engelfriet, Joost. 1975. Bottom-up and top-down tree transformations — a comparison. Mathematical Systems Theory 9.198–231. doi:10.1007/BF01704020.

Engelfriet, Joost. 1977. Top-down tree transducers with regular look-ahead. Theory of Computing Systems 10.289–303. doi:10.1007/BF01683280.

Engelfriet, Joost, Zoltán Fülöp, and Andreas Maletti. 2017. Composition closure of linear extended top-down tree transducers. Theory of Computing Systems 60.129–171. doi:10.1007/s00224-015-9660-2.

Engelfriet, Joost, and Hendrik Jan Hoogeboom. 2001. MSO definable string transductions and two-way finite-state transducers. ACM Transactions of Computational Logic 2.216–254. doi:10.1145/371316.371512.

Engelfriet, Joost, Eric Lilin, and Andreas Maletti. 2009. Extended multi bottom-up tree transducers. Acta Informatica 46. doi:10.1007/s00236-009-0105-8.

Engelfriet, Joost, Andreas Maletti, and Sebastian Maneth. 2018. Multiple context-free tree grammars: Lexicalization and characterization. Theoretical Computer Science 728.29–99. doi:10.1016/j.tcs.2018.03.014.

Engelfriet, Joost, and Sebastian Maneth. 2003. Macro tree translations of linear size increase are MSO definable. SIAM Journal on Computing 32.950–1006. doi:10.1137/S0097539701394511.

Engelfriet, Joost, and Heiko Vogler. 1985. Macro tree transducers. Journal of Computer and System Sciences 31.71–146.

Maletti, Andreas, and Joost Engelfriet. 2012. Strong lexicalization of tree adjoining grammars. Proceedings of the 50th annual meeting of the association for computational linguistics (acl 2012). https://www.aclweb.org/anthology/P12-1053.