LaTeX pet peeves (Thomas Graf, 2020-07-07)
<p>Somehow I wound up with five students writing their theses this Spring semester, and you know what this means: lots and lots of reading. And when reading, I can’t help but get riled up every time I see one of my LaTeX pet peeves. I also like to read the source files in parallel with the PDF, and over the years I’ve come across some nightmare-fuel coding in those files.</p>
<p>So, in a (futile?) attempt to save my future self’s sanity, here’s a list of all my LaTeX pet peeves. Many of them are covered in your average LaTeX tutorial, but people rarely read those cover to cover and instead just go to specific parts that they need to solve whatever problem they’re wrestling with. Compiling it all into a single list might make for a more useful reference. Future students of mine, read this and adhere to it. You have been warned! </p>
<h1 id="using-latex-where-it-isnt-needed">Using LaTeX where it isn’t needed</h1>
<p>Alright, let’s start with the most important one: only use LaTeX if it’s the best tool for the job. LaTeX is the unrivaled champion when you have to do a lot of heavy lifting: bibliographies, references, multi-part documents, math, trees, autosegmental tiers, automata, glossed examples, data plotting, a single document for producing both handouts and slides; LaTeX handles all of that, um, perhaps not well, but better than any other solution on the market. Beware, though: with great power comes great responsibility, and a lot of my pet peeves are actually examples of the writer not paying attention to the subtle details of LaTeX that are the source of its power.</p>
<p>Great power also means a certain degree of clunkiness. For simple jobs, simpler tools are more efficient. If all you have to show me is a todo list, write that in a markdown dialect and convert it to PDF with <a href="https://pandoc.org/">pandoc</a>. If it’s powerful enough for an outdex post, it’s probably powerful enough for whatever collection of notes you want to show me in our meeting.</p>
<h1 id="not-using-labels-and-references">Not using labels and references</h1>
<p>If I see a hardcoded reference like <code>Section 1</code> in the source code, my blood pressure spikes. The whole point of LaTeX is that you don’t have to do these things. Liberally assign labels, and then use them, e.g. <code>Section~\ref{sec:some_section}</code>.</p>
<p>Be systematic with your labels. For instance, I like the format <code>\label{type:container_name}</code>, where type could be</p>
<ul>
<li><code>cha</code> for chapters,</li>
<li><code>sec</code> for sections,</li>
<li><code>ssec</code> for subsections,</li>
<li><code>fig</code> for figures,</li>
<li><code>tab</code> for tables,</li>
<li><code>ex</code> for examples.</li>
</ul>
<p>And <code>container_name</code> is the name of the containing unit. If a thesis contains a subsection in a chapter with label <code>\label{cha:foo}</code>, then the subsection would get the label <code>\label{ssec:foo_bar}</code>. That system isn’t perfect, as it creates quite a hassle when I decide to make a subsection its own section, but it provides a rudimentary typing system for the document.</p>
<p>And if you reference a section that doesn’t exist and/or doesn’t have a label yet, don’t just omit the reference. Use <code>\ref{type:container_name}</code> based on what the label should be. That way, you won’t forget to insert the reference later on, and when you finally get around to writing that section, you already have a record of which other sections tie into it.</p>
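<p>To make the scheme concrete, here is a minimal sketch (all names are made up for illustration):</p>
<div class="sourceCode"><pre class="sourceCode latex"><code class="sourceCode latex">\chapter{Learnability}\label{cha:learn}

\section{Results}\label{sec:learn_results}

\subsection{Negative evidence}\label{ssec:learn_results_negative}

As we saw in Section~\ref{sec:learn_results}, negative evidence is rare.</code></pre></div>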
<p>Oh, and because I’ve seen this mistake an estimated 328 times by now: labels in a float (figures, tables) must be specified <strong>after</strong> <code>\caption</code>, not before.</p>
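<p>In other words (a minimal sketch; the file and label names are invented):</p>
<div class="sourceCode"><pre class="sourceCode latex"><code class="sourceCode latex">\begin{figure}
  \centering
  \includegraphics{derivation_tree.pdf}
  % \label must come after \caption; otherwise \ref
  % picks up the wrong counter (e.g. the section number)
  \caption{An example derivation tree}
  \label{fig:example_derivation}
\end{figure}</code></pre></div>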
<h1 id="not-using-macros">Not using macros</h1>
<p>The whole point of LaTeX is to separate content from presentation. Don’t write something like <code>$\text{the} :: =\mathrm{N} \mathrm{D} -\mathrm{nom}$</code>. It’s tedious, prone to errors, and lacks semantics. Just define a bunch of custom macros:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode latex"><code class="sourceCode latex"><a class="sourceLine" id="cb1-1" title="1"><span class="fu">\newcommand</span>{<span class="ex">\featfont</span>}[1]{<span class="ss">\ensuremath{</span><span class="sc">\mathrm</span><span class="ss">{#1}}</span>}</a>
<a class="sourceLine" id="cb1-2" title="2"><span class="fu">\newcommand</span>{<span class="ex">\fsel</span>}[1]{<span class="ss">\ensuremath{=</span><span class="sc">\featfont</span><span class="ss">{#1}}</span>}</a>
<a class="sourceLine" id="cb1-3" title="3"><span class="fu">\newcommand</span>{<span class="ex">\fcat</span>}[1]{<span class="ss">\ensuremath{</span><span class="sc">\featfont</span><span class="ss">{#1}}</span>}</a>
<a class="sourceLine" id="cb1-4" title="4"><span class="fu">\newcommand</span>{<span class="ex">\flcr</span>}[1]{<span class="ss">\ensuremath{+</span><span class="sc">\featfont</span><span class="ss">{#1}}</span>}</a>
<a class="sourceLine" id="cb1-5" title="5"><span class="fu">\newcommand</span>{<span class="ex">\flce</span>}[1]{<span class="ss">\ensuremath{-</span><span class="sc">\featfont</span><span class="ss">{#1}}</span>}</a>
<a class="sourceLine" id="cb1-6" title="6"><span class="fu">\newcommand</span>{<span class="ex">\mlex</span>}[2]{<span class="ss">\ensuremath{</span><span class="sc">\text</span>{#1}<span class="ss"> :: #2}</span>}</a></code></pre></div>
<p>With those macros, you can rewrite the code above as <code>\mlex{the}{\fsel{N} \fcat{D} \flce{nom}}</code>. That’s easier to read, and it conveys clearly that you’re defining a lexical item with selector feature <code>N</code>, category feature <code>D</code>, and licensee feature <code>nom</code>. And if you decide later on that you want to use a different notation for features, you only have to change the macros.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode latex"><code class="sourceCode latex"><a class="sourceLine" id="cb2-1" title="1"><span class="fu">\newcommand</span>{<span class="ex">\featfont</span>}[1]{<span class="ss">\ensuremath{</span><span class="sc">\mathrm</span><span class="ss">{#1}}</span>}</a>
<a class="sourceLine" id="cb2-2" title="2"><span class="fu">\newcommand</span>{<span class="ex">\fsel</span>}[1]{<span class="ss">\ensuremath{</span><span class="sc">\featfont</span><span class="ss">{#1}^+}</span>}</a>
<a class="sourceLine" id="cb2-3" title="3"><span class="fu">\newcommand</span>{<span class="ex">\fcat</span>}[1]{<span class="ss">\ensuremath{</span><span class="sc">\featfont</span><span class="ss">{#1}^-}</span>}</a>
<a class="sourceLine" id="cb2-4" title="4"><span class="fu">\newcommand</span>{<span class="ex">\flcr</span>}[1]{<span class="ss">\ensuremath{</span><span class="sc">\featfont</span><span class="ss">{#1}^+}</span>}</a>
<a class="sourceLine" id="cb2-5" title="5"><span class="fu">\newcommand</span>{<span class="ex">\flce</span>}[1]{<span class="ss">\ensuremath{</span><span class="sc">\featfont</span><span class="ss">{#1}^-}</span>}</a>
<a class="sourceLine" id="cb2-6" title="6"><span class="fu">\newcommand</span>{<span class="ex">\mlex</span>}{2}{<span class="ss">\ensuremath{</span><span class="sc">\text</span>{#1}<span class="ss"> :: #2}</span>}</a></code></pre></div>
<h1 id="subscripts">Subscripts</h1>
<p>One quirk of LaTeX is that it only provides subscripts in math mode. So a writer that desires, say, labeled bracketing with subscripts, might try the following code:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode latex"><code class="sourceCode latex"><a class="sourceLine" id="cb3-1" title="1">John [<span class="ss">$_{VP}$</span> arrived <span class="ss">$t$</span>]</a></code></pre></div>
<p>This is wrong, <em>wrong</em>, <strong>wrong</strong>. In math mode, LaTeX treats each character as a separate mathematical variable and inserts some white space between them. So <code>$_{VP}$</code> is actually interpreted as <code>$_{V P}$</code>. Instead of a single subscript <em>VP</em>, you get two subscripts <em>V</em> and <em>P</em>.</p>
<p>The difference is pretty blatant once you’re aware of it. Here’s the output produced from the code above.</p>
<figure>
<img src="https://outde.xyz/img/thomas/tutorial_latex/subscript_wrong.svg" alt="The subscript letters are too far apart" /><figcaption>The subscript letters are too far apart</figcaption>
</figure>
<p>And here’s what it should look like:</p>
<figure>
<img src="https://outde.xyz/img/thomas/tutorial_latex/subscript_correct.svg" alt="Correct subscript spacing" /><figcaption>Correct subscript spacing</figcaption>
</figure>
<p>Notice the decreased spacing between V and P.</p>
<p>The second output is produced by replacing <code>$_{VP}$</code> with <code>$_\mathit{VP}$</code>:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode latex"><code class="sourceCode latex"><a class="sourceLine" id="cb4-1" title="1">John [<span class="ss">$_</span><span class="sc">\mathit</span><span class="ss">{VP}$</span> arrived <span class="ss">$t$</span>]</a></code></pre></div>
<p>This now typesets VP as a single variable in math italics.</p>
<p>Alternatively, you could also use <code>$_\text{VP}$</code> or <code>$_\textrm{VP}$</code> to have the label typeset as normal text.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode latex"><code class="sourceCode latex"><a class="sourceLine" id="cb5-1" title="1">John [<span class="ss">$_</span><span class="sc">\text</span>{VP}<span class="ss">$</span> arrived <span class="ss">$t$</span>]</a></code></pre></div>
<figure>
<img src="https://outde.xyz/img/thomas/tutorial_latex/subscript_text.svg" alt="Subscript typeset with \textrm" /><figcaption>Subscript typeset with <code>\textrm</code></figcaption>
</figure>
<p>Quite generally, may I suggest you define a custom macro for the subscripts of labeled brackets?</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode latex"><code class="sourceCode latex"><a class="sourceLine" id="cb6-1" title="1"><span class="fu">\newcommand</span>{<span class="ex">\labsub</span>}[1]{<span class="ss">\ensuremath{_</span><span class="sc">\text</span>{#1}<span class="ss">}</span>}</a></code></pre></div>
<p>This way, you can change the definition of the macro to fit the layout of your paper. This separation of code and output is exactly what makes LaTeX so powerful, so make generous use of it! In fact, why don’t you just use the following macro for your labeled bracketing:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode latex"><code class="sourceCode latex"><a class="sourceLine" id="cb7-1" title="1"><span class="fu">\newcommand</span>{<span class="ex">\labbrack</span>}[2][<span class="fu">\unskip</span>]{[<span class="ss">\ensuremath{_</span><span class="sc">\text</span>{#1}<span class="ss">}</span> #2]}</a></code></pre></div>
<p>This macro allows you to produce the output above from <code>\labbrack{CP}{John \labbrack{VP}{arrived $t$}}</code>.</p>
<h1 id="ensure-math-with-well-ensuremath">Ensure math with, well, <code>\ensuremath</code></h1>
<p>Some of the macros above use a command you might not have seen before: <code>\ensuremath</code>. The name tells you exactly what it does: it ensures that its argument is typeset in math mode. When you need math mode inside a macro, you should always use <code>\ensuremath</code>.</p>
<p>Crucially, don’t try to do something like the following:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode latex"><code class="sourceCode latex"><a class="sourceLine" id="cb8-1" title="1"><span class="fu">\newcommand</span>{<span class="ex">\labsub</span>}[1]{<span class="ss">${_</span><span class="sc">\text</span>{#1}<span class="ss">}$</span>}</a></code></pre></div>
<p>This is a disaster waiting to happen. If you use this command while you’re already in math mode, then the first <code>$</code> will switch you back into text mode and LaTeX will complain that you can’t use <code>_</code> in text mode. This doesn’t happen with <code>\ensuremath</code>: if you’re not in math mode, it switches you into math mode, and if you’re already in math mode, it keeps you there.</p>
<h1 id="gb4es-automath">gb4e’s automath</h1>
<p>Some of you might say that we don’t need any of that subscript hackery because the package <code>gb4e</code> redefines <code>_</code> so that it can be used in text mode. This is a bad solution as it tends to break packages in weird ways that are difficult to debug. Just spare yourself the hassle and turn that “feature” off:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode latex"><code class="sourceCode latex"><a class="sourceLine" id="cb9-1" title="1"><span class="bu">\usepackage</span>{<span class="ex">gb4e</span>}</a>
<a class="sourceLine" id="cb9-2" title="2"><span class="fu">\noautomath</span></a></code></pre></div>
<p>And while you’re at it, you might also want to add a bit more boilerplate around <code>gb4e</code> to make sure it doesn’t throw some odd error messages on recent TeX distros.</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode latex"><code class="sourceCode latex"><a class="sourceLine" id="cb10-1" title="1"><span class="fu">\makeatletter</span></a>
<a class="sourceLine" id="cb10-2" title="2"><span class="fu">\def</span>\new@fontshape{}</a>
<a class="sourceLine" id="cb10-3" title="3"><span class="fu">\makeatother</span></a>
<a class="sourceLine" id="cb10-4" title="4"><span class="bu">\usepackage</span>{<span class="ex">gb4e</span>}</a>
<a class="sourceLine" id="cb10-5" title="5"><span class="fu">\noautomath</span></a></code></pre></div>
<p>Yes, <code>gb4e</code> is pretty broken nowadays. But it’s still better than <code>linguex</code> in that it actually respects LaTeX’s split between commands and environments and doesn’t encourage bad coding choices (context-sensitive commands like <code>\Next</code> and <code>\Last</code> undermine the strengths of the LaTeX labeling mechanism). Here’s hoping somebody comes up with a completely new package, based on tikz matrices — what are you looking at me for?</p>
<h1 id="punctuation">Punctuation</h1>
<p>Alright, this one is really a major flaw in LaTeX’s design, so I don’t blame anybody who gets this wrong. LaTeX makes a distinction between a sentence-ending dot and other uses of dots, in particular as part of an abbreviation. The former has more whitespace after it to create a visual separation between sentences. So far so good, that’s a nice feature.</p>
<p>The problem is that LaTeX, in a misguided attempt to simplify the writer’s life, automatically switches between those two types of dots. By default, dots after lowercase characters are taken to be sentence-ending, and dots after uppercase characters are not taken to be sentence-ending. This is a major pain in the butt. Consider the following example.</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode latex"><code class="sourceCode latex"><a class="sourceLine" id="cb11-1" title="1">The phrase then moves to the specifier of the CP.</a>
<a class="sourceLine" id="cb11-2" title="2">This movement can be driven by various features, e.g. a wh-feature.</a></code></pre></div>
<figure>
<img src="https://outde.xyz/img/thomas/tutorial_latex/punctuation_wrong.svg" alt="The space after CP is too narrow, the one after e.g. is too wide." /><figcaption>The space after <em>CP</em> is too narrow, the one after <em>e.g.</em> is too wide.</figcaption>
</figure>
<p>LaTeX’s punctuation rules do exactly the wrong thing here. The dot after <code>CP</code> will be interpreted as part of an abbreviation when it is in fact sentence-ending. And the dot after <code>e.g.</code> will be interpreted as sentence-ending when it is in fact part of an abbreviation.</p>
<p>In order to fix this, we have to hack the code to override LaTeX’s default choices. We can put <code>\@</code> in front of a dot that we always want to be sentence-ending, and we can put a backslash before the space right after an abbreviation.</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode latex"><code class="sourceCode latex"><a class="sourceLine" id="cb12-1" title="1">The phrase then moves to the specifier of the CP<span class="fu">\@</span>.</a>
<a class="sourceLine" id="cb12-2" title="2">This movement can be driven by various features, e.g.<span class="fu">\ </span>a wh-feature.</a></code></pre></div>
<figure>
<img src="https://outde.xyz/img/thomas/tutorial_latex/punctuation_correct.svg" alt="Correct spacing for sentence-ending dot and abbreviation dot" /><figcaption>Correct spacing for sentence-ending dot and abbreviation dot</figcaption>
</figure>
<p>So LaTeX forces you to not only keep in mind what kind of dot you want, but to also track the context of the dot to determine if it needs special diacritics to get what you want. This is crappy design, but at least you’ll quickly develop the required muscle memory to work around this crappy design.</p>
<h1 id="is-your-friend"><code>~</code> is your friend</h1>
<p>There is actually a third solution for the spacing issue pointed out above. Instead of adding a backslash before the space, we could have replaced the space with a tilde.</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode latex"><code class="sourceCode latex"><a class="sourceLine" id="cb13-1" title="1">The phrase then moves to the specifier of the CP<span class="fu">\@</span>.</a>
<a class="sourceLine" id="cb13-2" title="2">This movement can be driven by various features, e.g.~a wh-feature.</a></code></pre></div>
<figure>
<img src="https://outde.xyz/img/thomas/tutorial_latex/punctuation_tilde.svg" alt="Tilde can be used instead of backslash to avoid linebreaks" /><figcaption>Tilde can be used instead of backslash to avoid linebreaks</figcaption>
</figure>
<p>This also produces a normal-width space, but in addition it tells LaTeX that it shouldn’t put a linebreak right after the abbreviation. Either <em>e.g. a</em> stays on the current line or it all goes onto the next one; LaTeX can’t keep <em>e.g.</em> on the current line and put <em>a</em> on the next. Abbreviations at the end of a line look rather odd, so I generally put <code>~</code> after every abbreviation I use. It’s pretty much become muscle memory by now, and I just don’t use the backslash after abbreviation dots anymore.</p>
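<p>You can even fold the tilde into a macro so you never forget it (the names <code>\eg</code> and <code>\ie</code> are my own suggestion, not from any package):</p>
<div class="sourceCode"><pre class="sourceCode latex"><code class="sourceCode latex">% "e.g." followed by a non-breaking space; TeX gobbles the
% space after a control word, so "\eg a wh-feature" works
\newcommand{\eg}{e.g.~}
\newcommand{\ie}{i.e.~}

This movement can be driven by various features, \eg a wh-feature.</code></pre></div>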
<p>You can go with whichever one of the two you prefer. But you should use one of them. Don’t just write <code>e.g. a</code>; the spacing will be wrong. It’s a very minor thing, but again, once you’re aware of it you’ll be annoyed whenever you see a paper that gets this wrong.</p>
<h1 id="ellipsis">Ellipsis</h1>
<p>Since we’re already on the subject of punctuation, <code>...</code> is not how you typeset an ellipsis in LaTeX. The command for that is <code>\ldots</code>, and again the difference is in the spacing. Just compare:</p>
<figure>
<img src="https://outde.xyz/img/thomas/tutorial_latex/ellipsis_end.svg" alt="Ellipsis at end of a sentence" /><figcaption>Ellipsis at end of a sentence</figcaption>
</figure>
<figure>
<img src="https://outde.xyz/img/thomas/tutorial_latex/ellipsis_middle.svg" alt="Ellipsis in the middle of a sentence" /><figcaption>Ellipsis in the middle of a sentence</figcaption>
</figure>
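<p>Here is how you would use it (example sentences of my own). Note that <code>\ldots</code>, like any control word, gobbles the space that follows it, so mid-sentence you have to escape the space:</p>
<div class="sourceCode"><pre class="sourceCode latex"><code class="sourceCode latex">% ellipsis at the end of a sentence
And the tree keeps growing\ldots

% ellipsis in the middle of a sentence: escape the space after \ldots
The tree grows \ldots\ and finally converges.</code></pre></div>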
<h1 id="quotation-marks">Quotation marks</h1>
<p>My next pet peeve is actually covered early on in every LaTeX tutorial, yet I still get papers that consistently do it wrong: don’t use <code>"</code> for quotation marks. You want <code>``</code> (that’s two backticks) for opening quotation marks and <code>''</code> (that’s two single quotes) for closing quotation marks. Again the difference in typesetting is immediately apparent:</p>
<figure>
<img src="https://outde.xyz/img/thomas/tutorial_latex/quotation.svg" alt="Quotation marks must not be " in the source code" /><figcaption>Quotation marks must not be <code>"</code> in the source code</figcaption>
</figure>
<p>Yes, this is just ridiculous and could easily be handled automatically (many LaTeX editors will do it for you), but LaTeX gonna LaTeX.</p>
<h1 id="and-slash-are-not-the-same-slash"><code>/</code> and <code>\slash</code> are not the same slash</h1>
<p>Another bit of LaTeX voodoo. While <code>/</code> does indeed produce a slash, sometimes it’s recommended that you use <code>\slash</code> instead. The difference is that <code>/</code> is a non-breaking character, which means that LaTeX isn’t allowed to insert a linebreak before or after it. Sometimes that’s a good thing. You really don’t want the CG type <code>S/NP</code> to be split across two lines. But with, say, <code>consonantal/non-vocalic</code>, you’ll probably get weird spacing if LaTeX isn’t allowed to break this up into <code>consonantal/</code> and <code>non-vocalic</code>. So even though their output looks the same, <code>/</code> and <code>\slash</code> don’t do the same thing.</p>
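<p>A quick sketch of both uses (the example phrases are my own):</p>
<div class="sourceCode"><pre class="sourceCode latex"><code class="sourceCode latex">% \slash allows a linebreak after the slash
a consonantal\slash non-vocalic segment

% plain / is non-breaking: the CG type stays on one line
the type S/NP combines with an NP to its right</code></pre></div>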
<p>Personally, I think it would’ve been better to have a compositional system where we can use a special marker like <code>|</code> to indicate that the preceding symbol is a non-breaking character. Then <code>~</code> would just be a shorthand for a space followed by <code>|</code>. Oh well, the LaTeX code base is 40 years old by now, it’s bound to have its quirks.</p>
<h1 id="epsilon-isnt-the-epsilon-you-want"><code>\epsilon</code> isn’t the epsilon you want</h1>
<p>You thought we were done with LaTeX curiosities, hmm?</p>
<p>This one is not the mistake I see most often, but that’s just because not every student paper has a need for the epsilon symbol. But when a paper does use epsilon, there’s a good chance it won’t be the symbol that the student had in mind. For some reason, most LaTeX fonts use a shape for epsilon that doesn’t look like the typical epsilon. They produce <span class="math inline">\(\epsilon\)</span> instead of <span class="math inline">\(\varepsilon\)</span>. If you want <span class="math inline">\(\varepsilon\)</span> instead, you have to use <code>\varepsilon</code> rather than <code>\epsilon</code>.</p>
<h1 id="and-and-arent-tuple-brackets">…and <code><</code> and <code>></code> aren’t tuple brackets</h1>
<p>Again this one doesn’t show up that often simply because not every paper needs to typeset tuples, but when it’s necessary, people get it wrong oh so often. And in this case, it’s not even LaTeX being needlessly obtuse. Tuple brackets are not the same as <span class="math inline">\(<\)</span> and <span class="math inline">\(>\)</span>, they have very different angles: <span class="math inline">\(\langle\)</span> and <span class="math inline">\(\rangle\)</span>. If you want the latter, you have to use <code>\langle</code> and <code>\rangle</code> (left and right angle).</p>
<p>Tuple brackets also have completely different spacing:</p>
<ul>
<li>Using <code>&lt;</code> and <code>&gt;</code>: <span class="math inline">\(&lt;a, b&gt;\)</span></li>
<li>Using <code>\langle</code> and <code>\rangle</code>: <span class="math inline">\(\langle a, b \rangle\)</span></li>
</ul>
<figure>
<img src="https://outde.xyz/img/thomas/tutorial_latex/tuples.svg" alt="Tuple brackets are \langle and \rangle, not < and >" /><figcaption>Tuple brackets are <code>\langle</code> and <code>\rangle</code>, not <code><</code> and <code>></code></figcaption>
</figure>
<p>Again I’d suggest using a custom macro to make your life a little easier.</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode latex"><code class="sourceCode latex"><a class="sourceLine" id="cb14-1" title="1"><span class="fu">\newcommand</span>{<span class="ex">\tuple</span>}[1]{<span class="ss">\ensuremath{</span><span class="sc">\langle</span><span class="ss"> #1 </span><span class="sc">\rangle</span><span class="ss">}</span>}</a></code></pre></div>
<p>You could also use a version with automatic sizing of the tuple brackets, but this will often give you problems if LaTeX needs to insert a linebreak.</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode latex"><code class="sourceCode latex"><a class="sourceLine" id="cb15-1" title="1"><span class="fu">\newcommand</span>{<span class="ex">\Tuple</span>}[1]{<span class="ss">\ensuremath{</span><span class="sc">\left</span><span class="ss"> </span><span class="sc">\langle</span><span class="ss"> #1 </span><span class="sc">\right</span><span class="ss"> </span><span class="sc">\rangle</span><span class="ss">}</span>}</a></code></pre></div>
<h1 id="math-relations-and-operators">Math relations and operators</h1>
<p>The spacing difference between <code>&lt;</code> and <code>&gt;</code> on the one hand and <code>\langle</code> and <code>\rangle</code> on the other is actually an instance of a more general principle. In math mode, LaTeX distinguishes between normal symbols, operators, and relations, and they each have different spacing properties. So instead of <code>a R b</code>, you should write <code>a \mathrel{R} b</code> to get the correct spacing. If you want <code>|</code> to act as a binary operator, you should write <code>a \mathbin{|} b</code> (<code>\mathop</code> is reserved for big operators like <code>\sum</code>). Again it’s a good idea to use custom macros for that.</p>
<figure>
<img src="https://outde.xyz/img/thomas/tutorial_latex/relations.svg" alt="Math relations and operators have different spacing" /><figcaption>Math relations and operators have different spacing</figcaption>
</figure>
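<p>Here, too, a couple of custom macros keep the markup readable (the macro names are my own invention):</p>
<div class="sourceCode"><pre class="sourceCode latex"><code class="sourceCode latex">% R as a relation: spacing like = or \leq
\newcommand{\relR}[2]{\ensuremath{#1 \mathrel{R} #2}}
% | as a binary operator: spacing like + or \cup
\newcommand{\pipeop}[2]{\ensuremath{#1 \mathbin{|} #2}}</code></pre></div>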
<h1 id="so-many-dashes">So, many, dashes</h1>
<p>Another subtle difference, this time with respect to hyphens and dashes.</p>
<ul>
<li><code>-</code> is a hyphen and is used word-internally, e.g. in <code>single-dashed</code></li>
<li><code>--</code> denotes a range, e.g. <code>pages 55--68</code> (yes, this means you should use <code>--</code> for page ranges in bibtex entries)</li>
<li><code>---</code> is an em-dash and is used with parentheticals, e.g. <code>John---as far as I'm aware---snores</code></li>
<li><code>$-$</code> is a minus and should only be used in equations: <span class="math inline">\(15 - 3\)</span></li>
</ul>
<figure>
<img src="https://outde.xyz/img/thomas/tutorial_latex/dashes.svg" alt="LaTeX has various kinds of dashes" /><figcaption>LaTeX has various kinds of dashes</figcaption>
</figure>
<p>I even like to add spaces around the super-wide <code>---</code>, and every single copy editor has tried to dissuade me from that. I like the extra space, but I’m sure it’s somebody’s pet peeve, so, sorry about that.</p>
<h1 id="bibtex-capitalization">Bibtex capitalization</h1>
<p>As you might know, your bibtex style automatically handles the capitalization of your references. But it only works well if</p>
<ol type="1">
<li>your bibtex references are in Title Case, and</li>
<li>you explicitly indicate which characters should not be lowercased.</li>
</ol>
<p>So don’t write something like this:</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode bibtex"><code class="sourceCode bibtex"><a class="sourceLine" id="cb16-1" title="1"><span class="co">title = {Some paper on MGs}</span></a></code></pre></div>
<p>There is no easy way to convert this to Title Case if that’s what the publisher wants. And if the publisher wants lowercase, you’ll get <em>mgs</em> instead of <em>MGs</em>.</p>
<p>Instead, it should be:</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode bibtex"><code class="sourceCode bibtex"><a class="sourceLine" id="cb17-1" title="1"><span class="co">title = {Some Paper on {MG}s}</span></a></code></pre></div>
<p>This way, you will get the correct lowercase or sentence case depending on the publisher’s stylesheet, and either way <code>MGs</code> will stay MGs and won’t be lowercased to <em>mgs</em>.</p>
<h1 id="more-to-come">More to come</h1>
<p>Okay, this is already a laundry list of things that are easy to miss when writing but leap out at the reader who’s aware of them. But we’re not done yet, I have another list that’s all about doing figures with <a href="https://en.wikibooks.org/wiki/LaTeX/PGF/TikZ">TikZ</a>. In comparison to LaTeX, TikZ feels a lot more like an actual programming language, so there’s a much higher risk of writing working but really nasty code. Getting proficient with TikZ is hard; the manual is over 1,000 pages by now. But I think there are some general design principles that even newbies can follow easily and that make TikZ code a lot more fun to work with. So the next time we return to LaTeX (and I’m not saying that’s anytime soon), be prepared for some unsolicited TikZ advice by yours truly.</p>
<p>In the meantime, feel free to share your personal LaTeX pet peeves in the comments section. There are so many subtle issues that I’m sure there are common mistakes I still make myself.</p>
When parsing isn't about parsing (Thomas Graf, 2020-06-18)
<p>As a student I didn’t care much for work on syntactic parsing since I figured all the exciting big-picture stuff is in the specification of possible syntactic structures, not how we infer these structures from strings. It’s a pretty conventional attitude, widely shared by syntacticians and a natural corollary of the competence-performance split — or so it seems. But as so often, what seems plausible and obvious at first glance quickly falls apart when you probe deeper. Even if you don’t care one bit about syntactic processing, parsing questions still have merit because they quickly turn into questions about syntactic architecture. This is best illustrated with a concrete example, in that abstract sense of “concrete” that everyone’s so fond of here at the outdex headquarters. </p>
<h2 id="an-abstract-result">An abstract result…</h2>
<p>Here is a result by John Hale and Ed Stabler that sounds rather dry: it holds for every Minimalist grammar <em>G</em> that the string yields of the derivation trees of <em>G</em> form a deterministic context-free string language <span class="citation" data-cites="HaleStabler05">(Hale and Stabler 2005)</span>. This is actually mind-blowing, but it requires a lot of unpacking to see why it is such an intriguing result.</p>
<p>First, let’s make sure we all agree on what an MG derivation tree is. While a phrase structure tree encodes the constituency of a sentence, a derivation tree encodes how this structure is built by a Minimalist grammar. The figure below shows a (simplified) phrase structure tree for <em>which woman did Mary kiss</em>, with the corresponding derivation tree right next to it.</p>
<figure>
<img src="https://outde.xyz/img/thomas/beyond_parsing/derivationtree_example.svg" alt="Phrase structure tree and MG derivation tree for which woman did Mary kiss" /><figcaption>Phrase structure tree and MG derivation tree for <em>which woman did Mary kiss</em></figcaption>
</figure>
<p>Two points are crucial here:</p>
<ol type="1">
<li><strong>Feature annotation</strong><br />
Every lexical item in the derivation tree is annotated with a string of features that fully determines what syntactic operations the lexical item has to participate in.</li>
<li><strong>No displacement</strong><br />
The derivation tree is just a record of the sequence of derivational operations; it does not contain the result of performing these operations. As a result, no phrase actually moves to a higher position; everything stays low. We merely note that movement takes place but do not directly encode the result of this movement.</li>
</ol>
<p>Because movement is encoded only implicitly, the string yield of a derivation tree usually won’t match the string yield of the phrase structure tree that is built from this derivation. If we read the leaves of the phrase structure tree above from left to right, we get <em>which woman do-ed Mary kiss</em>. If we do the same with the derivation tree, we get <em>do -ed Mary kiss which woman</em>. Actually, we get <em>do[T<sup>+</sup> h<sup>+</sup> wh<sup>+</sup> C<sup>-</sup>] ed[V<sup>+</sup> nom<sup>+</sup> T<sup>-</sup> h<sup>-</sup>] Mary[D<sup>-</sup> nom<sup>-</sup>] kiss[D<sup>+</sup> D<sup>+</sup> V<sup>-</sup>] which[N<sup>+</sup> D<sup>-</sup> wh<sup>-</sup>] woman[N<sup>-</sup>]</em> because the feature annotations are part of the leaf nodes. That’s clearly not the same string as <em>which woman do-ed Mary kiss</em>. It is the result of undoing all the movement steps while adding explicit feature annotations to all lexical items.</p>
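<p>To make the notion of string yield concrete, here is a minimal Python sketch (my own illustration, not code from the post): a tree is encoded as a nested list whose leaves are strings, and the yield is just the leaves read off from left to right.</p>

```python
def string_yield(tree):
    """Concatenate the leaves of a tree from left to right.

    A tree is either a string (a leaf) or a list of subtrees.
    """
    if isinstance(tree, str):
        return [tree]
    leaves = []
    for child in tree:
        leaves.extend(string_yield(child))
    return leaves

# Simplified derivation tree for "which woman did Mary kiss":
# movement is never carried out, so "which woman" stays low,
# next to the verb that selects it (feature annotations omitted).
derivation = [["do", "-ed"], ["Mary", ["kiss", ["which", "woman"]]]]
print(" ".join(string_yield(derivation)))  # do -ed Mary kiss which woman
```

<p>The same function applied to the phrase structure tree, where <em>which woman</em> sits in Spec,CP, would return the surface order instead.</p>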
<p>In the statement above, then, the phrase <em>string yields of the derivation trees of G</em> intuitively corresponds to the strings generated by the grammar if we completely factor out the effects of movement on word order. That may seem odd from a Minimalist perspective where Merge and Move are interspersed, but it fits well with the older Aspects model where D[eep]-structure establishes the basic head-argument relations and is then contorted by transformations into the actually observed S[urface]-structure. From that perspective, the statement above is about such “D-structure strings”, except that we also have fully explicit feature annotations.</p>
<p>Okay, so now we know what kind of object we’re talking about, but it’s still unclear what kind of property holds of them. Hale and Stabler show that these strings form a deterministic context-free string language, which I’ll abbreviate as deterministic CFL from here on out (in an effort to pretend that this terminology is silky-smooth and rolls right off the tongue). Now here’s the crucial bit: deterministic CFLs are very easy to parse compared to non-deterministic CFLs. The syntax of programming languages, for instance, is deliberately designed to yield deterministic CFLs to ensure speedy parsing. The step from deterministic CFLs to non-deterministic ones is where parsing performance takes a nosedive.</p>
<p>The Hale-Stabler result thus tells us something very surprising about syntax: if languages instructed their syntax to simply omit all that overt movement nonsense and forced their morphology to clearly spell out all the syntactic features of each lexical item, parsing would be easy-peasy. So why the heck isn’t there even a single one that does that? Languages are happy to fold numerous arcane distinctions into their morphology, but the one thing that would really help is a no-go? Talk about messed up priorities.</p>
<h2 id="and-the-questions-it-raises">…and the questions it raises</h2>
<p>I like the Hale-Stabler finding because it subverts the standard narrative: movement is something syntax gets for free, and since it is free, all languages make abundant use of it. If one buys into the notion that syntax is shaped by third-factor considerations <span class="citation" data-cites="Chomsky05a">(Chomsky 2005)</span>, then parsing complexity should push us towards a movement-free syntax because that is what allows us to stay within the realm of deterministic CFLs.</p>
<p>There are many ways to poke holes into this argument. One is to simply shrug off that whole third-factors idea and keep it at “language is just as messy as any other biological system”. Or we can go full Chomsky and contend that parsing considerations are irrelevant because language developed for internal reasoning, not communication, and nobody needs to parse their own thoughts. All of that might well be true, but when given a choice between a likely truth and productive questions, the latter is often the better route to take.</p>
<p>A more interesting argument posits that the split between deterministic CFLs and non-deterministic CFLs with respect to parsing performance abstracts away from any potential memory limitations. It won’t shock you to hear that memory limitations are indeed a major factor in human sentence processing. So if movement allows you to shift that heavy NP to the right and thus reduce your parser’s memory load, you will gladly do that. Basically: who cares about theoretical parsing performance without movement when in practice things work better with movement?</p>
<p>One could also object that the Hale-Stabler result only holds if all features are directly encoded by morphology, and since they’re not, the whole point is moot. After all, if your morphology doesn’t allow you to get a deterministic CFL with or without movement, you might just as well keep movement in syntax. No point in dropping movement if you don’t stand to gain anything from it because of that obtuse morphology module.</p>
<p>I think those are good points, but they’re not fully convincing. Many cases of movement actually make things worse with respect to memory usage. If our current model of MG parsing is on the right track, then languages could greatly reduce memory usage if unaccusative subjects stayed in their underlying object position. In fact, objects should never appear to the left of their selecting head. What’s more, the arguments of a head should be put in increasing order of size. Simply doesn’t happen. If movement’s <em>raison d’être</em> is to reduce memory usage, it’s asleep behind the wheel most of the time.</p>
<p>And that point about morphology being uncooperative isn’t the whole story, either. Many movement features have some overt realization, even if it isn’t directly part of morphology. Wh-movers are headed by a wh-phrase, topicalization comes with a specific prosody, and so on. We might not get a nice agglutinative system where each feature corresponds to a separate affix, but it’s not like there are no reflexes at all.</p>
<p>The general upshot is that this isn’t really a conceptual issue, it’s an empirical one. There aren’t any conclusive answers yet because we haven’t really looked at those issues in detail:</p>
<ol type="1">
<li><p><strong>Feature visibility</strong><br />
What kind of information about a lexical item’s feature make-up is conveyed by morphology and/or prosody? What ratio of its feature make-up is encoded overtly? Does this vary across languages, and if so, by how much? Does this ratio interact with what kind of movement configurations the language allows, or at least their relative frequency? Can any of this be explained in generative models of morphology or prosody? Does any of this contradict the inverted T-model?</p></li>
<li><p><strong>Assistive movement</strong><br />
Which instances of movement reduce memory usage? Which increase it? What kinds of movement (don’t) exist even though they worsen (ameliorate) memory usage? Why do they (not) exist?</p></li>
</ol>
<p>Those are interesting questions in their own right, but it’s the connection to parsing and the role of movement that truly make them intriguing to me. We have a formal argument that clearly pulls in the direction of movement-free syntax. We have some conceptual counterarguments to that, and interesting data points, but nothing fully worked out, no formal model with clear predictions. We should have a model that lets us plug in the various constraints for and against movement to compute a space of movement systems, with peaks corresponding to optimal solutions and valleys to really bad solutions. When we map natural languages into that space, they should tend to cluster around the peaks. That doesn’t have anything to do with parsing, but it’s the interaction of movement and parsing that gives rise to the issue in the first place.</p>
<h2 id="references" class="unnumbered">References</h2>
<div id="refs" class="references">
<div id="ref-Chomsky05a">
<p>Chomsky, Noam. 2005. Three factors in language design. <em>Linguistic Inquiry</em> 36.1–22. doi:<a href="https://doi.org/10.1162/0024389052993655">10.1162/0024389052993655</a>. <a href="http://dx.doi.org/10.1162/0024389052993655">http://dx.doi.org/10.1162/0024389052993655</a>.</p>
</div>
<div id="ref-HaleStabler05">
<p>Hale, John T., and Edward P. Stabler. 2005. Strict deterministic aspects of Minimalist grammars. In <em>Logical aspects of computational linguistics: 5th international conference</em>, ed. by Philippe Blache, Edward P. Stabler, Joan Busquets, and Richard Moot. Berlin, Heidelberg: Springer. doi:<a href="https://doi.org/10.1007/11422532_11">10.1007/11422532_11</a>. <a href="http://dx.doi.org/10.1007/11422532_11">http://dx.doi.org/10.1007/11422532_11</a>.</p>
</div>
</div>
<h1>MR movement: Freezing effects &amp; monotonicity</h1>
<p><em>Thomas Graf, 2020-05-19</em></p>
<p>As you might know, I love reanalyzing linguistic phenomena in terms of monotonicity (see <a href="https://outde.xyz/2019-05-31/omnivorous-number-and-kiowa-inverse-marking-monotonicity-trumps-features.html">this earlier post</a>, <a href="http://dx.doi.org/10.15398/jlm.v7i2.211">my JLM paper</a>, and <a href="https://github.com/somoradi/somoradi/blob/master/nels49_Moradi.pdf">this NELS paper by my student Sophie Moradi</a>). I’m now in the middle of writing another paper on this topic, and it currently includes a section on freezing effects. You see, freezing effects are obviously just bog-standard monotonicity, and I’m shocked that nobody else has pointed that out before. But perhaps the reason nobody’s pointed that out before is simple: my understanding of freezing effects does not match the facts. In the middle of writing the paper, I realized that I don’t know just how much freezing effects limit movement. So I figured I’d reveal my ignorance to the world and hopefully crowd source some sorely needed insight. </p>
<h1 id="freezing-effects-primer">Freezing effects primer</h1>
<p>Freezing is the idea that once a phrase starts moving, it becomes opaque to extraction. Below you have a prototypical example of a sentence that violates the freezing condition — to keep things readable, I’m using copies instead of traces, but that’s just a descriptive device.</p>
<ol class="example" type="1">
<li>* [<sub>CP</sub> [which car] did [<sub>TP</sub> [the driver of <del>which car</del>] T [<sub><em>v</em>P</sub> <del>the driver of which car</del> <em>v</em> cause a scandal]]]</li>
</ol>
<p>Here the subject DP <em>the driver of which car</em> undergoes movement from the base subject position Spec,<em>v</em>P to the surface subject position in Spec,TP. As a result, the DP effectively turns into an island, which makes it impossible to move the wh-phrase <em>which car</em> from within the subject into Spec,CP. That’s the essence of freezing, and it can be summarized in the form of a catchy slogan:</p>
<ol start="2" class="example" type="1">
<li><strong>Freezing in a nutshell</strong><br />
Once you’ve escaped, nothing escapes from you.</li>
</ol>
<p>Freezing is the Citizen Kane of movement: a free-spirited phrase that is eager to move finally achieves success but is corrupted by it and now uses its power to keep down all the other free-spirited phrases in its domain that would like to move.</p>
<p>Freezing has a well-known loophole: since a phrase P isn’t opaque to extraction until it starts moving, other movers can escape from P as long as they do so before P moves. This still allows for instances of remnant movement as in the German example below.</p>
<ol start="3" class="example" type="1">
<li>[<sub>CP</sub> [<sub>VP</sub> <del>das Buch</del> gelesen] hat [<sub>TP</sub> das Buch der Hans T [<sub><em>v</em>P</sub> <del>der Hans</del> <em>v</em> <del>[<sub>VP</sub> das Buch gelesen]</del>]]]<br />
[<sub>CP</sub> [<sub>VP</sub> <del>the book</del> read] has [<sub>TP</sub> the book the Hans T [<sub><em>v</em>P</sub> <del>the Hans</del> <em>v</em> <del>[<sub>VP</sub> the book read]</del>]]]<br />
‘Hans <strong>read</strong> the book.’</li>
</ol>
<p>Yeah, unless you’re already familiar with the analysis, this example is a lot harder to make sense of than (1). Let’s switch out the German for English glosses, just to make things a bit easier. Then the sentence starts out with the structure [<sub><em>v</em>P</sub> the Hans <em>v</em> [<sub>VP</sub> the book read]], where <em>the book</em> is the object of the verb <em>read</em> and <em>the Hans</em> is the subject. At this point, <em>the Hans</em> undergoes the usual subject movement to Spec,TP. Then, the object <em>the book</em> moves out of the VP into some part of what’s called the <em>Mittelfeld</em>, which may be some kind of TP-specifier position. Both movement steps are allowed because neither phrase was extracted from a moving phrase. Now, finally, the whole VP moves to Spec,CP. This, too, is a licit step — freezing effects do not say that you cannot move once something has moved out of you; they say that nothing can move out of you once you start moving. And that’s definitely not the case here: nothing moves out of the VP once the VP starts moving. So the whole VP gets to move to the left edge of the sentence without any issues. Since the object had already moved out of the VP before, only the head of the VP is visible at the left edge of the surface string, giving us a sentence where it looks like just the V-head underwent movement.</p>
<p>If you’re still confused, here are the bare phrase structure trees for (1) and (3).</p>
<figure>
<img src="https://outde.xyz/img/thomas/monotonicity_freezing/bpstree_eng.svg" alt="Bare phrase structure tree for (1)" /><figcaption>Bare phrase structure tree for (1)</figcaption>
</figure>
<figure>
<img src="https://outde.xyz/img/thomas/monotonicity_freezing/bpstree_ger.svg" alt="Bare phrase structure tree for (3)" /><figcaption>Bare phrase structure tree for (3)</figcaption>
</figure>
<h1 id="connection-to-monotonicity">Connection to monotonicity</h1>
<p>For the two examples above, there is a straightforward account in terms of monotonicity. Remember that monotonicity is an order preservation principle (<a href="https://outde.xyz/2019-05-31/omnivorous-number-and-kiowa-inverse-marking-monotonicity-trumps-features.html">check this earlier post for details</a>). Given two structures <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span> with orders <span class="math inline">\(\leq_A\)</span> and <span class="math inline">\(\leq_B\)</span>, a function <span class="math inline">\(f\)</span> from <span class="math inline">\(A\)</span> to <span class="math inline">\(B\)</span> is monotonically increasing iff <span class="math inline">\(x \leq_A y\)</span> implies <span class="math inline">\(f(x) \leq_B f(y)\)</span>. For our purposes, it will be sufficient to think of monotonicity as a generalized ban against crossing branches.</p>
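<p>The definition translates directly into code. Here is a minimal sketch (the representation and names are my own): an order is given as a set of pairs (x, y) with x below y, and we simply test that the defining implication holds for every such pair.</p>

```python
def is_monotone(f, leq_A, leq_B):
    """Check that x <=_A y implies f(x) <=_B f(y).

    leq_A and leq_B are sets of (smaller, larger) pairs;
    f is a dict mapping elements of A to elements of B.
    """
    return all((f[x], f[y]) in leq_B for (x, y) in leq_A)

# The chain 1 <= 2 <= 3, as a set of ordered pairs (reflexive pairs included).
chain = {(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)}

print(is_monotone({1: 1, 2: 2, 3: 3}, chain, chain))  # True: order preserved
print(is_monotone({1: 1, 2: 3, 3: 2}, chain, chain))  # False: 2 and 3 swap,
                                                      # i.e. branches cross
```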
<p>We can apply the notion of monotonicity directly to the dependency tree representation provided by Minimalist grammars (MGs). In this format, the phrase structure trees above are represented as the trees below, except that I have simplified things a bit by omitting all features and instead indicating movement dependencies via arrows.</p>
<figure>
<img src="https://outde.xyz/img/thomas/monotonicity_freezing/deptree_eng.svg" alt="MG dependency tree for (1)" /><figcaption>MG dependency tree for (1)</figcaption>
</figure>
<figure>
<img src="https://outde.xyz/img/thomas/monotonicity_freezing/deptree_ger.svg" alt="MG dependency tree for (3)" /><figcaption>MG dependency tree for (3)</figcaption>
</figure>
<p>Each dependency tree defines a partial order over the lexical items in the sentence. Intuitively, this partial order encodes syntactic prominence in terms of head-argument relations, or in Minimalist terms, (external) Merge. That is to say, if X is the daughter of Y, then Y is more prominent than X, and so is the mother of Y, and the mother of the mother of Y, and so on. Okay, so our first order for monotonicity comes straight from the MG dependency trees. Strictly speaking there are some extra steps to be taken for mathematical reasons, but I’ll ignore those here to keep things simple. So MG dependency trees will be our way of getting a partial order that I call the <strong>Merge order</strong>.</p>
<p>For our second order we construct a truncated version of the dependency trees that encodes prominence with respect to movement (internal Merge). The construction is a bit more complicated, but putting aside some edge cases it’s enough to take the dependency tree and remove all lexical items that don’t provide the landing site for some mover. This gives us the reduced structures below. I’ll call orders of this kind <strong>Move orders</strong>.<a href="#fn1" class="footnote-ref" id="fnref1"><sup>1</sup></a></p>
<figure>
<img src="https://outde.xyz/img/thomas/monotonicity_freezing/movetree_eng.svg" alt="Move order for (1)" /><figcaption>Move order for (1)</figcaption>
</figure>
<figure>
<img src="https://outde.xyz/img/thomas/monotonicity_freezing/movetree_ger.svg" alt="Move order for (3)" /><figcaption>Move order for (3)</figcaption>
</figure>
<p>Now we define a mapping <em>f</em> from the Move order to the Merge order such that each node M in the Move order is mapped to the node N in the Merge order iff M provides the final landing site for N. Again it helps to look at this in terms of pictures. As you can see, <em>f</em> essentially encodes the reverse of the arrows I added to the original dependency trees.</p>
<figure>
<img src="https://outde.xyz/img/thomas/monotonicity_freezing/mapping_eng.svg" alt="Mapping for (1)" /><figcaption>Mapping for (1)</figcaption>
</figure>
<figure>
<img src="https://outde.xyz/img/thomas/monotonicity_freezing/mapping_ger.svg" alt="Mapping for (3)" /><figcaption>Mapping for (3)</figcaption>
</figure>
<p>Notice how the lines for the verb and the object cross in the illicit English sentence, but not in the well-formed German one (the crossing of lines with the subject is just an artefact of how we draw relations in two-dimensional space). So perhaps crossing branches aren’t okay, and since monotonicity is essentially a ban against crossing branches, that would suggest that the problem with the English sentence is that it does not obey monotonicity. Freezing effects, then, amount to the requirement that a sentence’s Move order must preserve its Merge order. The only permitted form of movement is <strong>m</strong>onotonicity <strong>r</strong>especting movement, or simply MR movement.</p>
<h1 id="why-this-might-not-work">Why this might not work</h1>
<p>Alright, that’s a nifty story, but it might not actually work. MR movement is both more limited and more permissive than freezing effects. And I’m not sure if that’s a problem.</p>
<p>Let’s first look at why MR movement is more restrictive. Freezing effects tell us that once N has been extracted from M, it is free to move to wherever it pleases. MR movement, on the other hand, can never move N to a position that’s higher than the final landing site of M. Does this ever happen? I’m not sure. German certainly furnishes cases that look like that.</p>
<ol start="4" class="example" type="1">
<li>[<sub>CP</sub> [<sub>DP</sub> das Buch] hat [<sub>TP</sub> [<sub>VP</sub> <del>das Buch</del> gelesen] <del>[<sub>DP</sub> das Buch]</del> der Hans T [<sub><em>v</em>P</sub> <del>der Hans</del> <em>v</em> <del>[<sub>VP</sub> das Buch gelesen]</del>]]]<br />
[<sub>CP</sub> [<sub>DP</sub> the book] has [<sub>TP</sub> [<sub>VP</sub> <del>the book</del> read] <del>[<sub>DP</sub> the book]</del> the Hans T [<sub><em>v</em>P</sub> <del>the Hans</del> <em>v</em> <del>[<sub>VP</sub> the book read]</del>]]]<br />
‘The book, Hans read.’</li>
</ol>
<p>But to be frank, German is a bad example to begin with because scrambling can do all kinds of stuff that won’t fly for standard movement. I can’t think of cases for other languages, but I’m also pretty bad at remembering data points, so that’s not saying much. So, yes, MR movement might be too restrictive if it is stated with respect to the final landing site.</p>
<p>One way to fix that is to redefine the Move order so that it keeps track of the first landing sites instead of the final ones. But for some reason I find that more ad hoc. It should either be all landing sites or the last one; there is no reason why the first one should enjoy some privileged status. But that’s neither here nor there, so I don’t know, maybe? Nah, I’d rather stick to my guns and reanalyze data that conflicts with MR movement.</p>
<p>But then there’s also the fact that MR movement is less restrictive. Once again it’s because I chose to focus on the final landing site instead of the first one. This means that MR movement can extract N from M after M has already started movement, provided that M eventually winds up in a higher position than N. Again I’m not sure if that’s a problem. Whenever such a case arises, one could also make it fit with freezing movement by simply positing an additional movement step that extracts N at the very beginning before M starts to move. Testing for the presence or absence of this initial movement step would be hard, so I’m not sure how things would pan out empirically. Again I’m inclined to stick with MR movement simply because it provides a different perspective on freezing effects. Maybe MR movement works, maybe it doesn’t, but either result would provide useful insights into the nature of freezing effects.</p>
<h1 id="the-crowd-sourcing-part">The crowd sourcing part</h1>
<p>Overall, freezing effects can be regarded as an instance of monotonicity, just not in the way I prefer. I define the Move order in terms of the final landing site, but to get an exact match for the standard definition of freezing one has to use the initial landing site. That’s still noteworthy as it allows us to reduce freezing to the more general principle of monotonicity, and I have argued many times that monotonicity really has a fundamental role to play in language.</p>
<p>But I’d really like to push for the MR movement perspective instead. I just find it more pleasing, and I like that it differs from the standard view of freezing on some edge cases. So what do you think? Does MR movement have a shot, or is there robust evidence against it?</p>
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>While the Move orders in these examples are linear orders, more complex examples would produce partial orders. An example of that is <em>John slept and Mary snored</em>.<a href="#fnref1" class="footnote-back">↩</a></p></li>
</ol>
</section>
<h1>Martian substructures</h1>
<p><em>Thomas Graf, 2020-05-06</em></p>
<p>Sometimes students get hung up on the difference between <strong>substring</strong> and <strong>subsequence</strong>. But the works of Edgar Rice Burroughs have given me an idea for an exercise that might just be silly enough to permanently edge itself into students’ memory. </p>
<p>Enter John Carter, Jeddak of Jeddaks, Warlord of Mars, depicted here with his wife, Dejah Thoris, Princess of Helium, daughter of Mors Kajak, who is Jed of Helium and son to Tardos Mors, Jeddak of Helium.</p>
<figure>
<img src="https://vignette.wikia.nocookie.net/barsoom/images/2/26/Frazetta_PoM.jpg/revision/latest?cb=20090528221427" alt="John Carter and Dejah Thoris, as portrayed by Frank Frazetta" /><figcaption>John Carter and Dejah Thoris, as portrayed by Frank Frazetta</figcaption>
</figure>
<p>And here we have two of John Carter’s most loyal friends. First, Tars Tarkas, Jeddak of Thark.</p>
<figure>
<img src="https://vignette.wikia.nocookie.net/barsoom/images/a/ac/Tars_Tarkas_and_John_Carter.jpg/revision/latest?cb=20101128213452" alt="Tars Tarkas helps John Carter in his fight against the Plant Men of the Valley Dor, at the end of the River Iss; art by Michael Whelan" /><figcaption>Tars Tarkas helps John Carter in his fight against the Plant Men of the Valley Dor, at the end of the River Iss; art by Michael Whelan</figcaption>
</figure>
<p>And next Kantos Kan, Jedwar of the Heliumetic navy.</p>
<figure>
<img src="https://vignette.wikia.nocookie.net/barsoom/images/a/a2/Kantos-Kan.jpg/revision/latest?cb=20120713232546" alt="Kantos Kan, in a Barsoomian painting of unmatched verisimilitude" /><figcaption>Kantos Kan, in a Barsoomian painting of unmatched verisimilitude</figcaption>
</figure>
<p>Kantos Kan has the rare distinction of having a last name that is a <strong>substring</strong> of the first name. After all, <strong>Kan</strong><em>tos</em> = <strong>Kan</strong> + <em>tos</em>.</p>
<p>That’s not the case for Tars Tarkas. While the first name and last name share a common <strong>prefix</strong> <em>Tar</em>, there are no strings we can put before and after one of them to get the other. But Tars is a <strong>subsequence</strong> of Tarkas, as the latter is <strong>Tar</strong> + <em>ka</em> + <strong>s</strong>. We can build Tarkas from Tars by splicing in additional material.</p>
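<p>The two notions are easy to pin down in code. Here is a minimal Python sketch (the function names are mine, not standard terminology):</p>

```python
def is_substring(x, y):
    """x is a substring of y iff y = u + x + v for some strings u, v."""
    return x in y

def is_subsequence(x, y):
    """x is a subsequence of y iff y can be built from x by splicing in
    extra symbols, i.e. the letters of x appear in y in the same order."""
    it = iter(y)
    return all(ch in it for ch in x)

print(is_substring("Kan", "Kantos"))     # True: Kantos = Kan + tos
print(is_substring("Tars", "Tarkas"))    # False: Tars is not contiguous in Tarkas
print(is_subsequence("Tars", "Tarkas"))  # True: Tarkas = Tar + ka + s
```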
<p>So remember, substrings are <strong>Kantos Kan strings</strong>, and subsequences are <strong>Tars Tarkas sequences</strong>. Okay, your turn. For each one of the following pairs, say whether one is</p>
<ul>
<li>a Kantos Kan string of the other,</li>
<li>a Tars Tarkas sequence of the other,</li>
<li>neither.</li>
</ul>
<ol type="1">
<li><em>Jed</em> and <em>Jeddak</em></li>
<li><em>Tars</em> and <em>Thark</em></li>
<li><em>Jeddak</em> and <em>Jedwar</em></li>
<li><em>Thoris</em> and <em>Tardos Mors</em></li>
<li><em>Tars</em> and <em>Tardos Mors</em></li>
<li><em>Dejah Thoris</em> and <em>Dor</em></li>
</ol>
<p><strong>Bonus exercise</strong>: Explain why I didn’t allow a fourth option “both” in the exercise.</p>
<p><strong>Bonus bonus exercise</strong>: Design similar exercises for other entries from <a href="https://goodman-games.com/blog/2018/03/26/what-is-appendix-n/">Appendix N</a>.</p>
<h1>Categorical statements about gradience</h1>
<p><em>Thomas Graf, 2020-04-28</em></p>
<p>Omer has a <a href="https://omer.lingsite.org/blogpost-on-so-called-degrees-of-grammaticality/">great post on gradience in syntax</a>. I left a comment there that briefly touches on why gradience isn’t really that big of a deal thanks to <strong>monoids</strong> and <strong>semirings</strong>. But in a vacuum that remark might not make a lot of sense, so here’s some more background. </p>
<h2 id="gradience-in-the-broad-sense">Gradience in the broad sense</h2>
<p>My central claim is that linguists’ worries about gradience are overblown because there isn’t that much of a difference between categorical systems, which only distinguish between well-formed and ill-formed, and gradient systems, which have more shades of gray than that. In particular, the difference doesn’t matter for those aspects of grammar that linguists really care about. A grammar with only a categorical distinction isn’t irredeemably impoverished, and if your formalism gets the linguistic fundamentals wrong, adding gradience won’t fix that for you.</p>
<p>Brief note: In practice, gradient systems are usually probabilistic, but there’s no need for that. The familiar system of rating sentences as well-formed, <code>?</code>, <code>??</code>, <code>?*</code>, and <code>*</code> would also be gradient. This is an important fact that’s frequently glossed over. I really wish researchers wouldn’t always jump right to probabilistic systems when they want to make something gradient. Sure, probabilities are nice because they are easy to extract from the available data, but that doesn’t mean that this is the right notion of gradience.</p>
<p>That said, this post will frequently use probabilistic grammars to illustrate more general points about gradience. The take-home message, though, applies equally to all gradient systems, whether they’re probabilistic or not.</p>
<h2 id="a-formula-for-categorical-grammars">A formula for categorical grammars</h2>
<p>Let’s start with a very simple example in the form of a <a href="https://outde.xyz/2019-08-19/the-subregular-locality-zoo-sl-and-tsl.html">strictly local grammar</a>. SL grammars are usually negative, which means that they list all the <em>n</em>-grams that must not occur in a string. But for the purposes of this post, it is preferable to convert the negative grammar into an equivalent positive grammar, which lists all the <em>n</em>-grams that may occur in a string. For example, the positive SL-2 grammar <span class="math inline">\(G\)</span> below generates the language <span class="math inline">\((ab)^+\)</span>, which contains the strings <span class="math inline">\(\mathit{ab}\)</span>, <span class="math inline">\(\mathit{abab}\)</span>, <span class="math inline">\(\mathit{ababab}\)</span>, and so on.</p>
<ol class="example" type="1">
<li><strong>Positive SL-2 grammar for <span class="math inline">\(\mathbf{(ab)^+}\)</span></strong>
<ol type="1">
<li><span class="math inline">\(\mathit{\$a}\)</span>: the string may start with <span class="math inline">\(a\)</span></li>
<li><span class="math inline">\(\mathit{ab}\)</span>: <span class="math inline">\(a\)</span> may be followed by <span class="math inline">\(b\)</span></li>
<li><span class="math inline">\(\mathit{ba}\)</span>: <span class="math inline">\(b\)</span> may be followed by <span class="math inline">\(a\)</span></li>
<li><span class="math inline">\(\mathit{b\$}\)</span>: the string may end with <span class="math inline">\(b\)</span></li>
</ol></li>
</ol>
<p>Now let’s consider how one actually decides whether a given string is well-formed with respect to this grammar. There’s many equivalent ways of thinking about this, but right now we want one that emphasizes the algebraic nature of grammars.</p>
<p>Suppose we are given the string <span class="math inline">\(\mathit{abab}\)</span>. As always with an SL grammar, we first add edge markers to it, giving us <span class="math inline">\(\mathit{\$abab\$}\)</span>. That’s just a mathematical trick to clearly distinguish the first and last symbol of the string. The SL grammar decides the well-formedness of the string <span class="math inline">\(\mathit{\$abab\$}\)</span> based on whether the bigrams that occur in it are well-formed. Those bigrams are (including repetitions)</p>
<ol type="1">
<li><span class="math inline">\(\mathit{\$a}\)</span>,</li>
<li><span class="math inline">\(\mathit{ab}\)</span>,</li>
<li><span class="math inline">\(\mathit{ba}\)</span>,</li>
<li><span class="math inline">\(\mathit{ab}\)</span>,</li>
<li><span class="math inline">\(\mathit{b\$}\)</span>.</li>
</ol>
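<p>For the programmatically inclined, this decomposition is easy to sketch in Python (the function below is my own illustration, not part of any toolkit):</p>

```python
def bigrams(s):
    """Add edge markers and return the list of bigrams, repetitions included."""
    marked = "$" + s + "$"
    return [marked[i:i+2] for i in range(len(marked) - 1)]

print(bigrams("abab"))  # ['$a', 'ab', 'ba', 'ab', 'b$']
```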
<p>We can write this as a single formula that doesn’t make a lick of sense at this point:</p>
<p><span class="math display">\[G(\mathit{\$abab\$}) := f(\$a) \otimes f(ab) \otimes f(ba) \otimes f(ab) \otimes f(b\$)\]</span></p>
<p>It sure looks fancy, but I haven’t really done anything substantial here. Let’s break this formula down into its components:</p>
<ul>
<li><span class="math inline">\(G(\mathit{\$abab\$})\)</span> is the value that the grammar <span class="math inline">\(G\)</span> assigns to the string <span class="math inline">\(\mathit{\$abab\$}\)</span>. Since <span class="math inline">\(G\)</span> is categorical, this can be <span class="math inline">\(1\)</span> for <em>well-formed</em> or <span class="math inline">\(0\)</span> for <em>ill-formed</em>.</li>
<li><span class="math inline">\(:=\)</span> means “is defined as”.</li>
<li><span class="math inline">\(f\)</span> is some mystery function that maps each bigram to some value.</li>
<li><span class="math inline">\(\otimes\)</span> is some mystery operation that combines the values produced by <span class="math inline">\(f\)</span>.</li>
</ul>
<p>The formula expresses in mathematical terms the most fundamental rule of SL grammars: the value that <span class="math inline">\(G\)</span> assigns to <span class="math inline">\(\mathit{\$abab\$}\)</span> depends on the bigrams that occur in the string. Each bigram in the string is mapped to some value, and then all these values are combined into an aggregate value for the string. The only reason the formula looks weird is because I haven’t told you what <span class="math inline">\(f\)</span> and <span class="math inline">\(\otimes\)</span> are.</p>
<p>The cool thing is, <span class="math inline">\(f\)</span> and <span class="math inline">\(\otimes\)</span> can be lots of things. That’s exactly what will allow us to unify categorical and gradient grammars. But let’s not get ahead of ourselves, let’s just focus on <span class="math inline">\(f\)</span> and <span class="math inline">\(\otimes\)</span> for our categorical example grammar <span class="math inline">\(G\)</span>.</p>
<p>We start with <span class="math inline">\(f\)</span>. This function maps a bigram <span class="math inline">\(b\)</span> to <span class="math inline">\(1\)</span> if it is a licit bigram according to our grammar <span class="math inline">\(G\)</span>. If <span class="math inline">\(b\)</span> is not a licit bigram, <span class="math inline">\(f\)</span> maps it to <span class="math inline">\(0\)</span>.</p>
<p><span class="math display">\[
f(b) :=
\begin{cases}
1 & \text{if } b \text{ is a licit bigram of } G\\
0 & \text{otherwise}
\end{cases}
\]</span></p>
<p>Let’s go back to the formula above and fill in the corresponding values according to <span class="math inline">\(f\)</span> and <span class="math inline">\(G\)</span>.</p>
<p><span class="math display">\[
\begin{align*}
G(\mathit{\$abab\$}) := & f(\$a) \otimes f(ab) \otimes f(ba) \otimes f(ab) \otimes f(b\$)\\
= & 1 \otimes 1 \otimes 1 \otimes 1 \otimes 1\\
\end{align*}
\]</span></p>
<p>Compare this to the formula for the illicit string <span class="math inline">\(\mathit{\$abba\$}\)</span>.</p>
<p><span class="math display">\[
\begin{align*}
G(\mathit{\$abba\$}) := & f(\$a) \otimes f(ab) \otimes f(bb) \otimes f(ba) \otimes f(a\$)\\
= & 1 \otimes 1 \otimes 0 \otimes 1 \otimes 0\\
\end{align*}
\]</span></p>
<p>Notice how we get <span class="math inline">\(1\)</span> or <span class="math inline">\(0\)</span> depending on whether the bigram is licit according to grammar <span class="math inline">\(G\)</span>.</p>
<p>This only leaves us with <span class="math inline">\(\otimes\)</span>. The job of this operation is to combine the values produced by <span class="math inline">\(f\)</span> such that we get <span class="math inline">\(1\)</span> if the string is well-formed, and <span class="math inline">\(0\)</span> otherwise. A string is well-formed iff it does not contain even one illicit bigram, or equivalently, iff there isn’t a single bigram that was mapped to <span class="math inline">\(0\)</span> by <span class="math inline">\(f\)</span>. If there is even one <span class="math inline">\(0\)</span>, the whole aggregate value must be <span class="math inline">\(0\)</span>. We can replace <span class="math inline">\(\otimes\)</span> with any operation that satisfies this property — multiplication, for instance, will do just fine.</p>
<p><span class="math display">\[
\begin{align*}
G(\mathit{\$abab\$}) := & f(\$a) \otimes f(ab) \otimes f(ba) \otimes f(ab) \otimes f(b\$)\\
= & 1 \otimes 1 \otimes 1 \otimes 1 \otimes 1\\
= & 1 \times 1 \times 1 \times 1 \times 1\\
= & 1\\
\end{align*}
\]</span> <span class="math display">\[
\begin{align*}
G(\mathit{\$abba\$}) := & f(\$a) \otimes f(ab) \otimes f(bb) \otimes f(ba) \otimes f(a\$)\\
= & 1 \otimes 1 \otimes 0 \otimes 1 \otimes 0\\
= & 1 \times 1 \times 0 \times 1 \times 0\\
= & 0\\
\end{align*}
\]</span></p>
<p>Tada, the well-formed string gets a 1, the ill-formed string a 0, just as intended. Any string that contains at least one illicit bigram will be mapped to 0 because whenever you multiply by 0, you get 0. The only way for a string to get mapped to 1 is if it consists only of well-formed bigrams. This is exactly the intuition we started out with: the well-formedness of a string is contingent on the well-formedness of its parts; in this case, bigrams.</p>
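<p>The whole computation fits in a few lines of Python. This is only an illustrative sketch — <code>f</code> is a set lookup, ⊗ is ordinary multiplication, and all names are my own invention:</p>

```python
LICIT = {"$a", "ab", "ba", "b$"}

def f(bigram):
    # 1 if the bigram is licit according to G, 0 otherwise
    return 1 if bigram in LICIT else 0

def G(s):
    marked = "$" + s + "$"
    value = 1
    for i in range(len(marked) - 1):
        value *= f(marked[i:i+2])  # ⊗ instantiated as multiplication
    return value

print(G("abab"))  # 1
print(G("abba"))  # 0
```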
<h2 id="a-formula-for-gradient-grammars">A formula for gradient grammars</h2>
<p>While it’s certainly refreshing to think of a grammar as a device for multiplying <span class="math inline">\(1\)</span>s and <span class="math inline">\(0\)</span>s, there is a deeper purpose to this view. Here’s the crucial twist: the formula above also works for gradient SL grammars, we just have to change <span class="math inline">\(f\)</span> and <span class="math inline">\(\otimes\)</span>. If we use probabilities, we can even keep <span class="math inline">\(\otimes\)</span> the same. The math works exactly the same for categorical and probabilistic grammars.</p>
<p>First, let’s turn our categorical example grammar into a probabilistic one by assigning each bigram a probability. I’ll use arbitrary numbers here; in the real world, those probabilities would usually come from a corpus.</p>
<ol start="2" class="example" type="1">
<li><strong>Probabilistic SL-2 grammar for <span class="math inline">\(\mathbf{(ab)^+}\)</span></strong>
<ol type="1">
<li><span class="math inline">\(\mathit{\$a}\)</span>: the probability that a string starts with <span class="math inline">\(a\)</span> is 100%</li>
<li><span class="math inline">\(\mathit{ab}\)</span>: the probability that <span class="math inline">\(a\)</span> is followed by <span class="math inline">\(b\)</span> is 100%</li>
<li><span class="math inline">\(\mathit{ba}\)</span>: the probability that <span class="math inline">\(b\)</span> is followed by <span class="math inline">\(a\)</span> is 75%</li>
<li><span class="math inline">\(\mathit{b\$}\)</span>: the probability that <span class="math inline">\(b\)</span> is not followed by anything is 25%</li>
</ol></li>
</ol>
<p>Now that the grammar is probabilistic, we also have to change our formula. Except that we don’t! We keep everything the way it is and only interpret <span class="math inline">\(f\)</span> differently. The function <span class="math inline">\(f\)</span> no longer tells us whether a bigram is licit, it instead gives us the probability of the bigram according to <span class="math inline">\(G\)</span>. The probability for bigrams that aren’t listed in the grammar is set to <span class="math inline">\(0\)</span>.</p>
<p><span class="math display">\[
\begin{align*}
G(\mathit{\$abab\$}) := & f(\$a) \otimes f(ab) \otimes f(ba) \otimes f(ab) \otimes f(b\$)\\
= & 1 \otimes 1 \otimes .75 \otimes 1 \otimes .25\\
= & 1 \times 1 \times .75 \times 1 \times .25\\
= & .1875\\
\end{align*}
\]</span> <span class="math display">\[
\begin{align*}
G(\mathit{\$abba\$}) := & f(\$a) \otimes f(ab) \otimes f(bb) \otimes f(ba) \otimes f(a\$)\\
= & 1 \otimes 1 \otimes 0 \otimes .75 \otimes 0\\
= & 1 \times 1 \times 0 \times .75 \times 0\\
= & 0\\
\end{align*}
\]</span></p>
<p>Compare that to the formula we had for the categorical grammar — it’s exactly the same mechanism! Nothing here has changed except the values. The value of the whole is still computed from the values of the same parts.</p>
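<p>Here is the same sketch with the probabilistic grammar plugged in. Note that only <code>f</code> changes; the aggregation loop is untouched (the numbers are the made-up probabilities from above):</p>

```python
PROB = {"$a": 1.0, "ab": 1.0, "ba": 0.75, "b$": 0.25}

def f(bigram):
    # the probability of the bigram according to G; unlisted bigrams get 0
    return PROB.get(bigram, 0.0)

def G(s):
    marked = "$" + s + "$"
    value = 1.0
    for i in range(len(marked) - 1):
        value *= f(marked[i:i+2])  # ⊗ is still multiplication
    return value

print(G("abab"))  # 0.1875
print(G("abba"))  # 0.0
```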
<h2 id="a-trivalent-sl-grammar">A trivalent SL grammar</h2>
<p>What if we want to do a trivalent system, with well-formed, borderline, and ill-formed? Let’s modify our categorical grammar so that it marginally allows <span class="math inline">\(\mathit{bb}\)</span>.</p>
<ol start="3" class="example" type="1">
<li><strong>Trivalent SL-2 grammar for <span class="math inline">\(\mathbf{(ab)^+}\)</span></strong>
<ol type="1">
<li><span class="math inline">\(\mathit{\$a}\)</span>: the string may start with <span class="math inline">\(a\)</span></li>
<li><span class="math inline">\(\mathit{ab}\)</span>: <span class="math inline">\(a\)</span> may be followed by <span class="math inline">\(b\)</span></li>
<li><span class="math inline">\(\mathit{ba}\)</span>: <span class="math inline">\(b\)</span> may be followed by <span class="math inline">\(a\)</span></li>
<li><span class="math inline">\(\mathit{b\$}\)</span>: the string may end with <span class="math inline">\(b\)</span></li>
<li><span class="math inline">\(\mathit{bb}\)</span>: <span class="math inline">\(b\)</span> may be marginally followed by <span class="math inline">\(b\)</span></li>
</ol></li>
</ol>
<p>The corresponding formula once again will stay the same. But instead of <span class="math inline">\(0\)</span> and <span class="math inline">\(1\)</span>, we will use three values:</p>
<ul>
<li><span class="math inline">\(1\)</span>: well-formed</li>
<li><span class="math inline">\(?\)</span>: borderline</li>
<li><span class="math inline">\(*\)</span>: ill-formed</li>
</ul>
<p>Instead of multiplication, <span class="math inline">\(\otimes\)</span> is now an operation <span class="math inline">\(\mathrm{min}\)</span> that always returns the least licit value, as specified in the table below.</p>
<table>
<thead>
<tr class="header">
<th style="text-align: right;"><span class="math inline">\(\mathrm{min}\)</span></th>
<th style="text-align: center;"><span class="math inline">\(\mathbf{1}\)</span></th>
<th style="text-align: center;"><span class="math inline">\(\mathbf{?}\)</span></th>
<th style="text-align: center;"><span class="math inline">\(\mathbf{*}\)</span></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: right;"><span class="math inline">\(\mathbf{1}\)</span></td>
<td style="text-align: center;"><span class="math inline">\(1\)</span></td>
<td style="text-align: center;"><span class="math inline">\(?\)</span></td>
<td style="text-align: center;"><span class="math inline">\(*\)</span></td>
</tr>
<tr class="even">
<td style="text-align: right;"><span class="math inline">\(\mathbf{?}\)</span></td>
<td style="text-align: center;"><span class="math inline">\(?\)</span></td>
<td style="text-align: center;"><span class="math inline">\(?\)</span></td>
<td style="text-align: center;"><span class="math inline">\(*\)</span></td>
</tr>
<tr class="odd">
<td style="text-align: right;"><span class="math inline">\(\mathbf{*}\)</span></td>
<td style="text-align: center;"><span class="math inline">\(*\)</span></td>
<td style="text-align: center;"><span class="math inline">\(*\)</span></td>
<td style="text-align: center;"><span class="math inline">\(*\)</span></td>
</tr>
</tbody>
</table>
<p>And here are the corresponding formulas for our familiar example strings <span class="math inline">\(\mathit{\$abab\$}\)</span> and <span class="math inline">\(\mathit{\$abba\$}\)</span>.</p>
<p><span class="math display">\[
\begin{align*}
G(\mathit{\$abab\$}) := & f(\$a) \otimes f(ab) \otimes f(ba) \otimes f(ab) \otimes f(b\$)\\
= & 1 \otimes 1 \otimes 1 \otimes 1 \otimes 1\\
= & 1 \mathrel{\mathrm{min}} 1 \mathrel{\mathrm{min}} 1 \mathrel{\mathrm{min}} 1 \mathrel{\mathrm{min}} 1\\
= & 1\\
\end{align*}
\]</span> <span class="math display">\[
\begin{align*}
G(\mathit{\$abba\$}) := & f(\$a) \otimes f(ab) \otimes f(bb) \otimes f(ba) \otimes f(a\$)\\
= & 1 \otimes 1 \otimes ? \otimes 1 \otimes *\\
= & 1 \mathrel{\mathrm{min}} 1 \mathrel{\mathrm{min}} ? \mathrel{\mathrm{min}} 1 \mathrel{\mathrm{min}} *\\
= & *\\
\end{align*}
\]</span></p>
<p>Note how the second string is still considered ill-formed. While the presence of the bigram <span class="math inline">\(\mathit{bb}\)</span> degrades it to borderline status, the presence of the illicit bigram <span class="math inline">\(\mathit{a\$}\)</span> means that we cannot assign a higher value than <span class="math inline">\(*\)</span>.</p>
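<p>A sketch of the trivalent system, where ⊗ is the min operation from the table above. Encoding the three values as strings and making their ordering explicit are my own choices for illustration:</p>

```python
# The three values, ordered from least licit to most licit
ORDER = {"*": 0, "?": 1, "1": 2}
GRAMMAR = {"$a": "1", "ab": "1", "ba": "1", "b$": "1", "bb": "?"}

def f(bigram):
    # unlisted bigrams are ill-formed
    return GRAMMAR.get(bigram, "*")

def G(s):
    marked = "$" + s + "$"
    value = "1"  # start from the most licit value
    for i in range(len(marked) - 1):
        # ⊗ is min: always keep the least licit value seen so far
        value = min(value, f(marked[i:i+2]), key=ORDER.get)
    return value

print(G("abab"))  # 1
print(G("abb"))   # ?
print(G("abba"))  # *
```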
<h2 id="beyond-acceptability">Beyond acceptability</h2>
<p>We can even use this formula to calculate aspects of the string that have nothing at all to do with well-formedness or acceptability. Suppose that <span class="math inline">\(f\)</span> once again maps each bigram to <span class="math inline">\(1\)</span> or <span class="math inline">\(0\)</span> depending on whether it is licit according to <span class="math inline">\(G\)</span>. Next, we instantiate <span class="math inline">\(\otimes\)</span> as addition. Then we have a formula that calculates the number of licit bigrams in the string.</p>
<p><span class="math display">\[
\begin{align*}
G(\mathit{\$abab\$}) := & f(\$a) \otimes f(ab) \otimes f(ba) \otimes f(ab) \otimes f(b\$)\\
= & 1 \otimes 1 \otimes 1 \otimes 1 \otimes 1\\
= & 1 + 1 + 1 + 1 + 1\\
= & 5\\
\end{align*}
\]</span> <span class="math display">\[
\begin{align*}
G(\mathit{\$abba\$}) := & f(\$a) \otimes f(ab) \otimes f(bb) \otimes f(ba) \otimes f(a\$)\\
= & 1 \otimes 1 \otimes 0 \otimes 1 \otimes 0\\
= & 1 + 1 + 0 + 1 + 0\\
= & 3\\
\end{align*}
\]</span></p>
<p>Or maybe <span class="math inline">\(f\)</span> replaces each bigram <span class="math inline">\(g\)</span> with the singleton set <span class="math inline">\(\{g\}\)</span>. And <span class="math inline">\(\otimes\)</span> will be <span class="math inline">\(\cup\)</span>, the set union operation. Then the formula maps each string to the set of bigrams that occur in it.</p>
<p><span class="math display">\[
\begin{align*}
G(\mathit{\$abab\$}) := & f(\$a) \otimes f(ab) \otimes f(ba) \otimes f(ab) \otimes f(b\$)\\
= & \{\$a\} \otimes \{ab\} \otimes \{ba\} \otimes \{ab\} \otimes \{b\$\}\\
= & \{\$a\} \cup \{ab\} \cup \{ba\} \cup \{ab\} \cup \{b\$\}\\
= & \{\$a, ab, ba, b\$\}\\
\end{align*}
\]</span> <span class="math display">\[
\begin{align*}
G(\mathit{\$abba\$}) := & f(\$a) \otimes f(ab) \otimes f(bb) \otimes f(ba) \otimes f(a\$)\\
= & \{\$a\} \otimes \{ab\} \otimes \{bb\} \otimes \{ba\} \otimes \{a\$\}\\
= & \{\$a\} \cup \{ab\} \cup \{bb\} \cup \{ba\} \cup \{a\$\}\\
= & \{\$a, ab, bb, ba, a\$\}\\
\end{align*}
\]</span></p>
<p>Is there a point to these instantiations of <span class="math inline">\(f\)</span> and <span class="math inline">\(\otimes\)</span>? They can be useful for certain computational tasks, but from a linguistic perspective there really isn’t much point to them. But, you know what, I’d say the same is true for all the other instantiations we’ve seen so far. If you’re a linguist, you shouldn’t worry at all about how <span class="math inline">\(f\)</span> and <span class="math inline">\(\otimes\)</span> are instantiated.</p>
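<p>Both of these instantiations can be expressed with a single generic evaluation function that is parameterized by <code>f</code>, ⊗, and the identity element of ⊗ — a hypothetical sketch, not any standard API:</p>

```python
LICIT = {"$a", "ab", "ba", "b$"}

def evaluate(s, f, otimes, identity):
    """Fold the f-values of all bigrams of s together with otimes."""
    marked = "$" + s + "$"
    value = identity
    for i in range(len(marked) - 1):
        value = otimes(value, f(marked[i:i+2]))
    return value

# Count the licit bigrams: f maps bigrams to 0/1, ⊗ is addition
print(evaluate("abba", lambda b: 1 if b in LICIT else 0,
               lambda x, y: x + y, 0))  # 3

# Collect the occurring bigrams: f maps bigrams to singleton sets, ⊗ is union
print(evaluate("abab", lambda b: {b}, lambda x, y: x | y, set()))
```

<p>Note that <code>evaluate</code> fixes the shape of the formula once and for all; switching between counting, set collection, categorical, or probabilistic evaluation is just a matter of plugging in a different monoid.</p>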
<h2 id="grammars-combine-they-dont-calculate">Grammars combine, they don’t calculate</h2>
<p>The general upshot is this: a grammar is a mechanism for determining the values of the whole from values of its parts. The difference between grammars is what parts they look at and how they relate them to each other.</p>
<p>A TSL grammar, for instance, would have a different formula. In a TSL grammar, we ignore irrelevant symbols in the string. So if we have a grammar that cares about <span class="math inline">\(a\)</span> but not <span class="math inline">\(b\)</span>, the corresponding formula for the string <span class="math inline">\(\mathit{abba}\)</span> would be <span class="math inline">\(f(\$a) \otimes f(aa) \otimes f(a\$)\)</span>. This is only a minor change because TSL grammars are very similar to SL grammars. The formula for, say, a finite-state automaton would differ by quite a bit more. That’s what linguistic analysis is all about. Linguistics is about determining the <strong>shape of the formula</strong>!</p>
<p>But that’s not what the categorical vs. gradient divide is about. That divide only kicks in once you have determined the overall shape of the formula and need to define <span class="math inline">\(f\)</span> and <span class="math inline">\(\otimes\)</span>. And that choice simply isn’t crucial from a linguistic perspective.</p>
<p>There’s many different choices for <span class="math inline">\(f\)</span> and <span class="math inline">\(\otimes\)</span> depending on what you want to do. But the choices that are useful for a linguist will always be limited in such a way that they form a particular kind of algebraic structure called a <strong>monoid</strong>. I won’t bug you with <a href="https://en.wikipedia.org/wiki/Monoid">the mathematical details of monoids</a>. Whether you prefer a categorical system or a gradient system, rest assured there’s a suitable monoid for that. And that’s all that matters. That’s why linguists shouldn’t worry about the categorical vs. gradient divide — linguistic insights are about the overall shape of the formula, not about calculating the result.</p>
<h2 id="from-string-to-trees-semirings">From strings to trees: semirings</h2>
<p>Okay, there’s one minor complication that I’d like to cover just to cross all <em>t</em>s and dot all <em>i</em>s. If you’re already worn out, just skip ahead to the wrap-up.</p>
<p>Beyond the pleasant valleys of string land lies the thicket of tree land. In tree land, things can get a bit more complicated depending on what your question is. Not always, though. It really depends on what kind of value you’re trying to compute.</p>
<p>If you just want to know whether a specific tree is well-formed, nothing really changes. Take your standard phrase structure grammar. A rewrite rule of the form <code>S -> NP VP</code> is a tree bigram where the mother is <code>S</code> and the daughters are <code>NP</code> and <code>VP</code>. Just like we can break down a string into its string bigrams, we can break down a tree into its tree bigrams. And the value of the whole tree according to a phrase structure grammar is computed by combining the values of its tree bigrams. With more expressive formalisms like MGs, things are once again more complicated, just like a finite-state automaton uses a more complicated formula in string land than the one for SL grammars above. But the general principle remains the same: once you have a formula for how the parts interact, you can plug in the operators you want. As before, we can switch between gradient and categorical systems by tweaking the values of <span class="math inline">\(f\)</span> and <span class="math inline">\(\otimes\)</span>, under the condition that this still gets us a monoid.</p>
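<p>Here is a rough sketch of how a tree, encoded as nested tuples, can be broken down into its tree bigrams. The encoding is my own; a real implementation would use proper tree classes:</p>

```python
def tree_bigrams(tree):
    """A tree is (label, child1, child2, ...); a leaf is a bare string.
    Returns the (mother, daughters) pairs, i.e. the rewrite rules used."""
    if isinstance(tree, str):
        return []
    mother, children = tree[0], tree[1:]
    daughters = tuple(c if isinstance(c, str) else c[0] for c in children)
    pairs = [(mother, daughters)]
    for child in children:
        pairs.extend(tree_bigrams(child))
    return pairs

t = ("S", ("NP", "I"), ("VP", ("V", "eat"), ("NP", "sushi")))
print(tree_bigrams(t))
# [('S', ('NP', 'VP')), ('NP', ('I',)), ('VP', ('V', 'NP')),
#  ('V', ('eat',)), ('NP', ('sushi',))]
```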
<p>I think this is actually enough for syntax. But perhaps you want to talk about the value of a string, rather than a tree. This is a more complex value because one string can correspond to multiple trees. For instance, in probabilistic syntax the probability of the string</p>
<ol start="4" class="example" type="1">
<li>I eat sushi with edible chopsticks.</li>
</ol>
<p>is the sum of the probabilities of two distinct trees:</p>
<ol start="5" class="example" type="1">
<li>[I eat [sushi with edible chopsticks]]</li>
<li>[I [[eat sushi] [with edible chopsticks]]]</li>
</ol>
<p>So <span class="math inline">\(\otimes\)</span> by itself is not enough, there is yet another operation. For probabilistic grammars it’s <span class="math inline">\(+\)</span>, but we may again replace it with a more general mystery operation <span class="math inline">\(\oplus\)</span>. The job of <span class="math inline">\(\oplus\)</span> is to combine all the values computed by <span class="math inline">\(\otimes\)</span>. Like <span class="math inline">\(\otimes\)</span>, <span class="math inline">\(\oplus\)</span> has to yield a monoid of some kind, and the combination of <span class="math inline">\(\oplus\)</span> and <span class="math inline">\(\otimes\)</span> has to form a <strong>semiring</strong>. Again I’ll completely <a href="https://en.wikipedia.org/wiki/Semiring">gloss over the math</a>. Let’s focus only on the essential point: once again the split between categorical systems and gradient systems is not very large because either way we end up with a semiring. The nature of the grammar stays the same, only the system for computing compound values uses different functions and operators.</p>
<p>You might be wondering what a categorical grammar looks like from the semiring perspective. What is the mysterious operation <span class="math inline">\(\oplus\)</span> in that case? It can’t be addition because <span class="math inline">\(1 + 1\)</span> would give us <span class="math inline">\(2\)</span>, which isn’t a possible value in a categorical system. No, with categorical systems, <span class="math inline">\(\oplus\)</span> behaves like logical <em>or</em>: it returns 1 if there is at least one 1. Suppose, then, that we want to know if some string <em>s</em> is well-formed according to some categorical grammar <span class="math inline">\(G\)</span>. Here is how this would work in a very simplified manner:</p>
<ol type="1">
<li>We look at all possible trees that yield the string <em>s</em>, even if those trees are ill-formed according to <span class="math inline">\(G\)</span>.</li>
<li>We use <span class="math inline">\(\otimes\)</span> to compute the compound value for each tree. As before, <span class="math inline">\(\otimes\)</span> is multiplication (but it could also be logical <em>and</em>, if you find that more pleasing). Well-formed trees will evaluate to <span class="math inline">\(1\)</span>, ill-formed ones to <span class="math inline">\(0\)</span>.</li>
<li>We then use <span class="math inline">\(\oplus\)</span>, i.e. logical <em>or</em>, to combine all those compound values into a single value for the string <em>s</em>. Then <em>s</em> will get the value <span class="math inline">\(1\)</span>, and hence be deemed well-formed, iff there is at least one well-formed tree that yields <em>s</em>.</li>
</ol>
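<p>To make the division of labor concrete: ⊗ computes a value per tree, and ⊕ then folds those per-tree values into a single value for the string. A toy sketch with made-up per-tree values:</p>

```python
from functools import reduce

# Two parses for the same string, with per-tree values already
# computed by ⊗ (the numbers are made up for illustration):
prob_tree_values = [0.25, 0.125]  # probabilistic: each tree's probability
cat_tree_values = [1, 0]          # categorical: one good parse, one bad one

# Probabilistic ⊕ is addition: the string's total probability
print(reduce(lambda x, y: x + y, prob_tree_values, 0))  # 0.375

# Categorical ⊕ is logical or: 1 iff at least one parse is well-formed
print(reduce(lambda x, y: x or y, cat_tree_values, 0))  # 1
```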
<p>Okay, that’s not how we usually think about well-formedness. We view the grammar as a system for specifying a specific set of well-formed trees, rather than a function that maps every logically conceivable tree to some value. But as you hopefully remember from your semantics intro, there is no difference between a set and its characteristic function. The procedure above treats the grammar as the characteristic function of the set of well-formed trees. Most of the time that’s not very illuminating for linguistics, but when it comes to the split between categorical and gradient it is really useful because it reveals the monoid/semiring structure of the grammar formalism.</p>
<h2 id="wrapping-up-dont-worry-be-happy">Wrapping up: Don’t worry, be happy</h2>
<p>Monoids and semirings are a very abstract perspective on grammars, and I rushed through them in a (failed?) attempt to keep the post at a manageable length. But behind all that math is the simple idea that syntacticians, and linguists in general, don’t need to worry that a categorical grammar formalism is somehow irreconcilable with the fact that acceptability judgments are gradient. Even if we don’t factor out gradience as a performance phenomenon, even if we want to place it at the heart of grammar, that does not require us to completely retool our grammar formalisms. The change is largely mathematical in nature and doesn’t touch on the things that linguists care about. Linguists care about representations and how specific parts of those representations can interact. In the mathematical terms I used in this post, those issues are about the shape of the formula for computing <span class="math inline">\(G(o)\)</span> for some object <span class="math inline">\(o\)</span>. It is not about the specific values or operators that appear in the formula.</p>
<p>In many cases, there’s actually many different operators that give the same result. We interpreted <span class="math inline">\(\otimes\)</span> as multiplication for categorical SL grammars, but we could have also used logical <em>and</em> or the <em>min</em> function. They all produce exactly the same values. No linguist would ever worry about which one of those functions is the right choice. The choice between categorical, probabilistic, or some other kind of gradient system isn’t all that different. Again you are needlessly worrying about the correct way of instantiating <span class="math inline">\(f\)</span>, <span class="math inline">\(\otimes\)</span>, and possibly <span class="math inline">\(\oplus\)</span>.</p>
<p>That’s not to say that switching out, say, a categorical semiring for a probabilistic one is a trivial affair. It can create all kinds of problems. But those are mathematical problems, computational problems, they are not linguistic problems. It’s stuff like computing infinite sums of bounded reals. It’s decidedly not a linguistic issue. So don’t worry, be happy.</p>
<h1 id="just-your-regular-regular-expression">Just your regular regular expression</h1>
<p>Thomas Graf, 2020-04-24</p>
<p>Outdex posts can be a dull affair, always obsessed with language and computation (it’s the official blog motto, you know). Today, I will deviate from this with a post that’s obsessed with, wait for it, computation and language. Big difference. Our juicy topic will be regular expressions. And don’t you worry, we’ll get to the “and language” part. </p>
<h2 id="some-simple-boring-examples">Some simple, boring examples</h2>
<p>If you don’t know what a regular expression is, think of it as search (or search and replace) on steroids. If you work with a lot of text files — surprise, I do — regular expressions can make your life a lot easier, but they also have a nasty habit of turning into byzantine symbol salad that’s impossible to decipher. Allow me to demonstrate. Or maybe skip ahead to the next section, this one here is just a slow introductory build-up to the interesting stuff.</p>
<p>Suppose you want to change every instance of <em>regular expression</em> to the shorter <em>regex</em>. If you’re like me, you will use <code>sed</code> for this, the <strong>s</strong>tream <strong>ed</strong>itor. Here’s the relevant command.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb1-1" title="1"><span class="ex">s/regular</span> expression/regex/g</a></code></pre></div>
<p>Okay, that’s not exactly user-friendly in these days of GUIs and colorful buttons to click on, but it’s manageable. The command breaks down into a few simple components.</p>
<ol type="1">
<li><code>s</code>: substitute</li>
<li><code>/</code>: argument separator</li>
<li><code>regular expression</code>: anything matching this regular expression should be replaced</li>
<li><code>/</code>: argument separator</li>
<li><code>regex</code>: replace the match with this string instead</li>
<li><code>/</code>: argument separator</li>
<li><code>g</code>: do a global replace; that is to say, process the whole line, don’t just stop after the first match on the line</li>
</ol>
<p>Suppose we have the input text below.</p>
<ol class="example" type="1">
<li>A Note on Regular Expressions: Since “regular expression” is a long term, regular expressions are also called regexes.</li>
</ol>
<p>This will be rewritten as follows.</p>
<ol start="2" class="example" type="1">
<li>A Note on Regular Expressions: Since “regex” is a long term, regexs are also called regexes.</li>
</ol>
<p>Note that in either case the rewriting targets every instance of <em>regular expression</em>, even if it is followed by other characters like <em>s</em>. But without <code>g</code>, only the first instance of <em>regular expression</em> would have been replaced.</p>
<ol start="3" class="example" type="1">
<li>A Note on Regular Expressions: Since “regex” is a long term, regular expressions are also called regexes.</li>
</ol>
<p>As you can see in the examples above, capitalization matters, so by default <code>regular expression</code> does not match <code>Regular Expression</code>. We can fix that by specifying alternatives (there’s better ways for handling upper and lower case, but then I wouldn’t get to demonstrate alternatives).</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb2-1" title="1"><span class="ex">s</span>/[<span class="ex">Rr</span>]egular [Ee]xpression/regex/g</a></code></pre></div>
<p>Here <code>[Rr]egular [Ee]xpression</code> will match <em>Regular Expression</em>, <em>Regular expression</em>, <em>regular Expression</em>, and <em>regular expression</em>. So now we would get this output.</p>
<ol start="4" class="example" type="1">
<li>A Note on regexs: Since “regex” is a long term, regexs are also called regexes.</li>
</ol>
<p>But these instances of <em>regexs</em> are pretty ugly. Let’s extend the match so that it can include an optional <em>s</em>.</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb3-1" title="1"><span class="ex">s</span>/[<span class="ex">Rr</span>]egular [Ee]xpressions\?/regex/g</a></code></pre></div>
<p>We use <code>\?</code> to indicate that the preceding character is optional for the match. If there is an <em>s</em>, include it in the rewriting, otherwise ignore whatever comes after the <em>n</em>. With this, we get yet another output.</p>
<ol start="5" class="example" type="1">
<li>A Note on regex: Since “regex” is a long term, regex are also called regexes.</li>
</ol>
<p>We could play this game for a few more rounds, but I think you get the gist. Now let’s look at how quickly regular expressions can get nasty.</p>
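<p>If you want to follow along, the whole progression can be tried out directly in a shell. The sample sentence below is my own rather than the one from the examples above, and the <code>\?</code> relies on GNU <code>sed</code>:</p>

```shell
input='Regular expressions are useful; every regular expression describes a language.'

# Case-sensitive replace misses the capitalized instance:
echo "$input" | sed 's/regular expression/regex/g'
# → Regular expressions are useful; every regex describes a language.

# Alternatives catch both cases, but leave a stray plural s behind:
echo "$input" | sed 's/[Rr]egular [Ee]xpression/regex/g'
# → regexs are useful; every regex describes a language.

# An optional s absorbs the plural (\? is a GNU extension to basic regexes):
echo "$input" | sed 's/[Rr]egular [Ee]xpressions\?/regex/g'
# → regex are useful; every regex describes a language.
```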
<h2 id="cranking-up-the-weird">Cranking up the weird</h2>
<p>Things have been perfectly reasonable so far. Just to mix things up a bit, here’s a regular expression I use a lot to rewrite things like <code>**foo**</code> as <code><b>foo</b></code> (don’t ask why I need to do that, it’s a quick hack while the long-term solution is still being worked on).</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb4-1" title="1"><span class="ex">s</span>/<span class="dt">\*\*\(</span>[^<span class="ex">*</span>]*<span class="dt">\)\*\*</span>/<span class="op"><</span>b<span class="op">></span>\<span class="op">1<</span>\/b<span class="op">></span>/g</a></code></pre></div>
<p>If you’re curious, here’s how you read that regex:<a href="#fn1" class="footnote-ref" id="fnref1"><sup>1</sup></a></p>
<ol type="1">
<li><code>s</code>: substitute</li>
<li><code>/</code>: argument separator</li>
<li><code>\*\*</code>: match **</li>
<li><code>\(</code>: start matching group 1</li>
<li><code>[^*]*</code>: match any string of 0 or more characters that are not *</li>
<li><code>\)</code>: close matching group 1</li>
<li><code>\*\*</code>: match **</li>
<li><code>/</code>: argument separator</li>
<li><code><b></code>: insert <b></li>
<li><code>\1</code>: insert the content of matching group 1</li>
<li><code><\/b></code>: insert </b></li>
<li><code>/</code>: argument separator</li>
<li><code>g</code>: do a global replace</li>
</ol>
<p>It actually makes a lot of sense if you come up with it yourself and remind yourself every 5 minutes how it works. In all other cases, it’s a Lovecraftian nightmare that will drive you mad.</p>
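<p>If you want to convince yourself that the beast actually works, here it is applied to a sample line (the input is mine, the command is the one dissected above, run with GNU <code>sed</code>):</p>

```shell
# Rewrite **foo** as <b>foo</b>; the group [^*]* cannot cross a * so each
# pair of markers is matched separately:
echo 'This is **bold**, and so is **this**.' | sed 's/\*\*\([^*]*\)\*\*/<b>\1<\/b>/g'
# → This is <b>bold</b>, and so is <b>this</b>.
```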
<p>But this is just the tip of the iceberg. True regex wizards can do stuff that is so crazy it tears apart the fabric of reality. Did you ever wonder how you can match lines of the form <code>a b c</code> such that <code>a + b = c</code>? Well, <a href="http://www.drregex.com/2018/11/how-to-match-b-c-where-abc-beast-reborn.html?m=1">somebody wrote a <code>sed</code> program for that</a>, because why wouldn’t they? Here’s a part of the very first command:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb5-1" title="1"><span class="kw">(</span><span class="ex">?</span>=[-+]?(?:0\B<span class="kw">)</span><span class="ex">*+</span>(\d*?)<span class="kw">((</span>?:(?=\d+(?:\.\d+)?<span class="dt">\ </span>[-+]?(?:0\B)*+(\d*?)(\d(?(4)\4))(?:\.\d+)?<span class="dt">\ </span>[-+]?(?:0\B)*+(\d*?)(\d(?(6)\6))(?:\.\d+)?$)\d)++)\b)</a></code></pre></div>
<p>Don’t look at me, I have no clue what’s going on here. But it works, somehow. If you want to figure it out, it might help to use <a href="https://github.com/SoptikHa2/desed/">desed</a>, a debugger for <code>sed</code>. If you give it a try, please also try this <a href="https://tildes.net/~comp/b2k/programming_challenge_find_path_from_city_a_to_city_b_with_least_traffic_controls_inbetween#comment-2run"><code>sed</code>-based solution for a shortest path problem</a>. I’d really appreciate an in-depth explanation.</p>
<p>Regular expressions weren’t designed to handle any of that stuff. But somebody with way too much time on their hands hunkered down and pushed them to their limits. It’s insane, but it works. And that’s the point that gets me to the bit of linguistics I need as an excuse for showing off some cool regex stuff.</p>
<h2 id="regexes-in-linguistics">Regexes in linguistics</h2>
<p>The regex examples above show that there is a big difference between what a system can do and what a system can do in a manner that’s easily digestible for a human reader. And that distinction is too often glossed over in linguistics. The literature is full of claims of the form “proposal X cannot account for phenomenon Y”. And very often, that’s not true, just like it isn’t true that you can’t use regular expressions to calculate a shortest path. For instance, you don’t need copy movement to produce pronounced copies, but oh boy will the grammar look weird. What these claims actually mean is “proposal X cannot elegantly account for phenomenon Y”. And that’s a big difference.</p>
<p>Elegance is a tricky criterion. For one thing, elegance depends a lot on the specification language. What may look clunky as a (standard) regular expression may be very elegant as a formula of monadic second-order logic, even though the two are intertranslatable. And the elegance of a specification language depends a lot on how much has been abstracted away. Specification X may be better than Y as long as you only have to account for phenomena P, Q, and R, but throw in S and T and all of a sudden Y scales much better and wins. It’s all very fuzzy, very tentative, mostly based on hunches, personal taste, aesthetics.</p>
<p>That’s okay. In general, researchers should do whatever makes them more productive, and in general it is the case that elegance = simplicity = productivity. But we should acknowledge that this is a methodological criterion. Lack of elegance is not a knockout argument and does not tell us much about the cognitive reality of a proposal. Reality might in fact be messy. Even though that <code>a + b = c</code> program is just a one-liner in Python, the human brain might actually be using the <a href="http://www.drregex.com/2018/11/how-to-match-b-c-where-abc-beast-reborn.html?m=1">humongous <code>sed</code> clusterfuck</a>. That doesn’t mean our theories have to be ugly — there’s nothing wrong with being better than reality — but we should be much more cautious with the use of elegance criteria in theory comparison.</p>
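<p>For contrast with the <code>sed</code> monstrosity, the sane version of the <code>a + b = c</code> check really is a one-liner. Here is an <code>awk</code> rendering of my own (not taken from the linked post):</p>

```shell
# Print only the lines "a b c" for which a + b = c:
printf '1 2 3\n2 2 5\n10 32 42\n' | awk '$1 + $2 == $3'
# → 1 2 3
# → 10 32 42
```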
<p>And if you think learning considerations provide a natural push towards elegance, may I introduce you to <a href="https://github.com/MaLeLabTs/RegexGenerator">this lovely regex generator</a> that infers the intended regex from a data sample? Yes, I only brought up learning so that I could link to that.</p>
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>The second part isn’t a regular expression in the original sense of formal language theory because it uses backreferences, which require unbounded copying and simply aren’t regular. For the specific rewriting step I’m doing there, it would be trivial to specify a finite-state transducer, though. And the existence of backreferences is an interesting point in its own right: Even though every regex (in the formal language theory sense) can be converted to an equivalent deterministic finite-state automaton, most regex implementations actually use a context-free parsing mechanism — and once you have that, backreferences are an easy addition. Sometimes, a powerful thing can be more efficient than a very restricted thing.<a href="#fnref1" class="footnote-back">↩</a></p></li>
</ol>
</section>
<h1 id="against-math-when-sets-are-a-bad-setup">Against math: When sets are a bad setup</h1>
<p>Thomas Graf, 2020-04-06</p>
<p>Last time I gave you a piece of my mind when it comes to <a href="https://outde.xyz/2020-03-30/against-math-kuratowskis-spectre.html">the Kuratowski definition of pairs and ordered sets</a>, and why we should stay away from it in linguistics. The thing is, that was a conceptual argument, and those tend to fall flat with most researchers. Just like most mathematicians weren’t particularly fazed by Gödel’s incompleteness results because it didn’t impact their daily work, the average researcher doesn’t care about some impurities in their approach as long as it gets the job done. So this post will discuss a concrete case where a good linguistic insight got buried under mathematical rubble.</p>
<p>Our case study is a <a href="https://doi.org/10.1353/lan.2018.0037">2018 paper</a> by <a href="http://departament-filcat-linguistica.ub.edu/directori-organitzatiu/jordi-fortuny-andreu">Jordi Fortuny</a>, which refines the ideas first presented in <span class="citation" data-cites="Fortuny08">Fortuny (2008)</span> and <span class="citation" data-cites="FortunyCorominasMurtra09">Fortuny and Corominas-Murtra (2009)</span>. The paper wrestles with one of the foundational issues of syntax: the interplay of structure and linear order, and why the latter seems to play second fiddle at best in syntax. Let’s first reflect a bit on the nature of this problem before we look at Fortuny’s proposed answer.</p>
<h2 id="structure-vs-linear-order">Structure vs. linear order</h2>
<p>The primacy of structure is pretty much old hat to linguists. You’ve all seen the standard auxiliary fronting example before:</p>
<ol class="example" type="1">
<li>The man who is talking is tall.</li>
<li>Is the man who is talking _ tall?</li>
<li>*Is the man who _ talking is tall?</li>
</ol>
<p>Why is there no language that defines auxiliary fronting in terms of linear precedence such that the leftmost — or alternatively the rightmost — auxiliary in the sentence is fronted? Quite generally, why doesn’t syntax allow constraints that are based entirely on linear order, such as:</p>
<ol type="1">
<li><strong>Sentence-final decasing</strong><br />
Don’t display case if you are the last word in the sentence.</li>
<li><strong>RHOL subject placement</strong><br />
The subject of a clause <em>C</em> is the rightmost DP with at least two lexical items. If no such DP exists in <em>C</em>, the subject is the leftmost DP instead.</li>
<li><strong>Linear movement blocking</strong><br />
No adjunct may linearly intervene between a mover and its trace.</li>
<li><strong>Modulo binding</strong><br />
Every reflexive must be an even number of words away from the left edge of the sentence.</li>
</ol>
<p>That’s exactly the kind of question that keeps me up at night. As you know, I like the idea that syntax and phonology are actually very similar at a computational level, so the non-existence of the constraints above is problematic because they are all modeled after phenomena from the phonological literature. How can we explain the absence of those constraints?</p>
<p>There’s two answers here, and neither one is satisfying. We might reject the initial assumption that linear order doesn’t matter in syntax. That’s basically <a href="https://udel.edu/~bruening/">Benjamin Bruening</a>’s <a href="https://www.linguisticsociety.org/sites/default/files/342-388_0.pdf">story for binding</a> <span class="citation" data-cites="Bruening14">(Bruening 2014)</span>. I have a weak spot for contrarian takes, but the Bruening stance doesn’t answer why we still can’t get constraints like the ones listed above. Perhaps Bruening is right and linear order matters to some extent, but if so the extent seems to be much more limited than one could imagine.</p>
<p>This leaves us with option 2, which is the standard story: syntactic representations have no linear order, hence syntax can’t make reference to linear order. The idea goes back to <span class="citation" data-cites="Kayne94">Kayne (1994)</span> and is also a major reason for the use of sets in Bare Phrase Structure <span class="citation" data-cites="Chomsky95">(Chomsky 1995)</span>. But it simply doesn’t work because syntax is inherently asymmetric. And this is where <span class="citation" data-cites="Fortuny18">Fortuny (2018)</span> enters the picture.</p>
<h2 id="order-from-computation">Order from computation</h2>
<p>I take Fortuny’s basic point to be one that I’m very sympathetic to: linear order emerges naturally from the asymmetry that is implicit in syntactic computation. Hence it is hopeless to stipulate linear order out of syntax; the best one can do is to systematically bound the role of linear order in syntax.</p>
<p>Here’s my take on this, which I think is close in spirit to what Fortuny is driving at, but without any reliance on sets. A (strict) linear order arises when you have a relation that is transitive, irreflexive, asymmetric, and trichotomous:</p>
<ul>
<li><strong>transitive</strong>: whatever can be reached in <span class="math inline">\(n\)</span> steps can be reached in one step<br />
(<span class="math inline">\(x \mathrel{R} y\)</span> and <span class="math inline">\(y \mathrel{R} z\)</span> implies <span class="math inline">\(x \mathrel{R} z\)</span>)</li>
<li><strong>irreflexive</strong>: nothing is related to itself<br />
(<span class="math inline">\(x \not\mathrel{R} x\)</span> for all <span class="math inline">\(x\)</span>)</li>
<li><strong>asymmetric</strong>: no two elements are mutually related<br />
(<span class="math inline">\(x \mathrel{R} y \rightarrow y \not\mathrel{R} x\)</span>)</li>
<li><strong>trichotomous</strong>: no two elements are unrelated<br />
(<span class="math inline">\(x \mathrel{R} y \vee y \mathrel{R} x \vee x = y\)</span>)</li>
</ul>
<p>If you look at syntax in terms of binary Merge (or something <a href="https://outde.xyz/2019-05-15/underappreciated-arguments-the-inverted-t-model.html">slightly</a> <a href="https://outde.xyz/2019-09-18/the-subregular-complexity-of-merge-and-move.html">more</a> <a href="https://outde.xyz/2020-03-06/trees-for-free-with-tree-free-syntax.html">abstract</a>), you already have an order that satisfies three of those properties: transitivity, irreflexivity, and asymmetry. That’s the (strict) partial order we usually call <strong>proper dominance</strong>, but you can also think of it as <strong>derivational prominence</strong> or some other, more abstract concept. Not really the point here. Either way you are already dealing with something that’s inherently asymmetric and ordered, and this asymmetry can be inherited by any relation that piggybacks on this. This recourse to proper dominance (<span class="math inline">\(\triangleleft^+\)</span>) is exactly how linear order (<span class="math inline">\(\prec\)</span>) is usually defined in formal terms: <span class="math display">\[
x \prec y \Leftrightarrow
x \mathrel{S} y
\vee
\exists z_x, z_y [
z_x \triangleleft^+ x
\wedge
z_y \triangleleft^+ y
\wedge
z_x \mathrel{S} z_y
]
\]</span> This says that precedence is inherited via dominance. If <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> are properly dominated by <span class="math inline">\(z_x\)</span> and <span class="math inline">\(z_y\)</span>, respectively, and <span class="math inline">\(z_x\)</span> precedes <span class="math inline">\(z_y\)</span>, then <span class="math inline">\(x\)</span> precedes <span class="math inline">\(y\)</span>. But hold on a second, that’s circular, we’re defining precedence in terms of precedence. And if you look at the formula, there’s actually a completely different symbol there, <span class="math inline">\(S\)</span>, which isn’t the precedence symbol <span class="math inline">\(\prec\)</span>. So what’s <span class="math inline">\(S\)</span>? It’s the successor relation, and in contrast to precedence it’s only defined for siblings. So <span class="math inline">\(x \mathrel{S} y\)</span> is true iff <span class="math inline">\(y\)</span> is both a sibling of <span class="math inline">\(x\)</span> and its successor. Aha, so this is where it all breaks down, syntax doesn’t actually have such a successor relation because there is no linear order in syntax!</p>
<p>Nope, sorry, this particular hook you can’t get off that easily. You see, <span class="math inline">\(S\)</span> doesn’t actually need to tell us anything about linear order. The term successor applies to any asymmetric order. So <span class="math inline">\(S\)</span> just needs to be some relation that establishes an asymmetry between <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span>. And there is a relation in syntax that does this for us, it’s the head-argument relation. Merge tends to be presented as a symmetric operation, but it’s not. One of the guys is more important because it has a bigger influence on the behavior of the newly formed constituent. That’s the head. Instead of successor, you may interpret <span class="math inline">\(S\)</span> as some kind of <strong>superior</strong> relation, and the formula above will still work fine.</p>
<p>What this shows us is that linear order cannot be simply stipulated away. Syntax furnishes all the asymmetries that make up linear order, and thus a computation device that can keep track of these asymmetries is perfectly aware of linear order. The problem, then, has to be with the computational complexity of determining linear order from those asymmetries. That is to say, the formula for <span class="math inline">\(\prec\)</span> above is too hard to compute. Something about inheritance via proper dominance is beyond the computational means of syntax. If so, this would dovetail quite nicely with my pet idea that syntax overall has very low subregular complexity. For instance, I’ve argued together with <a href="https://aniellodesanto.github.io/">Aniello De Santo</a> that <a href="https://www.aclweb.org/anthology/W19-5702.pdf">sensing tree automata furnish an upper bound for syntax</a>, and these automata are indeed incapable of fully tracking linear order. So, yes, sign me up for the idea that linear order constraints don’t show up in syntax because linear order is too hard to compute from the combination of proper dominance and head-argument asymmetries. But that’s very different from the standard story that syntax lacks linear order because its representations don’t directly encode linear order.</p>
<p>Fortuny provides a technically different answer, but the core idea is very similar in nature to the story I just sketched. He first shows that syntax naturally produces linear orders, and then tries to explain why the impact of that is so limited. But he does it with sets, and that opens up a whole can of worms.</p>
<h2 id="fortunys-approach-in-detail">Fortuny’s approach in detail</h2>
<p>Fortuny starts out with a generalization of the Kuratowski definition from pairs to tuples (btw, upon rereading his paper I noticed that he actually cites <span class="citation" data-cites="Kuratowski21">Kuratowski (1921)</span>; kudos!). With this generalized definition, an <span class="math inline">\(n\)</span>-tuple <span class="math inline">\(\langle a_1, \ldots, a_n \rangle\)</span> is encoded as the <strong>nest</strong> <span class="math display">\[
\{ \{a_1\}, \{a_1, a_2\}, \{a_1, a_2, a_3\}, \ldots, \{a_1, a_2, a_3, \ldots, a_n\} \}
\]</span> Based on earlier work <span class="citation" data-cites="Fortuny08 FortunyCorominasMurtra09">(Fortuny 2008; Fortuny and Corominas-Murtra 2009)</span>, he then postulates that syntactic derivations produce sets of this form, rather than the standard bare phrase structure sets. Think of it as follows: suppose that the syntactic workspace currently contains only <span class="math inline">\(a\)</span>, which by itself forms the constituent <span class="math inline">\(\{ a \}\)</span>. Now if we Merge some <span class="math inline">\(b\)</span> into this constituent, we get <span class="math inline">\(\{ a, b \}\)</span> as the output. Fortuny then says that the actual constituent is the set of the sets built by Merge, i.e. <span class="math inline">\(\{ \{a\}, \{a,b\} \}\)</span>. Personally, I’d say it makes more sense to think of this as a derivation rather than a constituent, but my infatuation with derivation trees should be well-known by now. I won’t quibble over terminology and just follow Fortuny in calling these complex sets constituents. So we have an <strong>output</strong> <span class="math inline">\(\{a,b\}\)</span>, but a <strong>constituent</strong> <span class="math inline">\(\{ \{a\}, \{a,b\} \}\)</span>. If we merge a <span class="math inline">\(c\)</span> into the current output, we get <span class="math inline">\(\{ a, b, c\}\)</span>, and the constituent grows to <span class="math inline">\(\{ \{a\}, \{a,b\}, \{a,b,c\} \}\)</span>. In a nutshell, Merge just inserts a lexical item into a set, and the nested structure arises if we collect all the outputs into a single set, which Fortuny calls a constituent.</p>
<p>But that’s also where we run into the first problem. Well, two problems. Three, actually. First, <a href="https://outde.xyz/2020-03-30/against-math-kuratowskis-spectre.html">and at the risk of repeating myself</a>, this kind of definition only works for specific axiomatizations of sets, and that’s a lot of baggage to attach to your linguistic proposal. Second, redefining Merge in that way means that large parts of the audience will immediately check out. A major deviation from established machinery is always a tough sell, so you should avoid that if you can. And then there’s still problem three, which in a sense is the worst because it brings with it a rat’s tail of other problems.</p>
<p>You see, the set-theoretic representation of tuples used by Fortuny doesn’t work in full generality. Consider the following tuples and their set-theoretic representation as nests:</p>
<ol type="1">
<li><span class="math inline">\(\langle a, b \rangle = \{ \{a\}, \{a,b\} \}\)</span></li>
<li><span class="math inline">\(\langle a, b, a \rangle = \{ \{a\}, \{a,b\}, \{a,b,a\} \} = \{ \{a\}, \{a,b\}, \{a,b\} \} = \{ \{a\}, \{a,b\} \}\)</span></li>
<li><span class="math inline">\(\langle a, b, b \rangle = \{ \{a\}, \{a,b\}, \{a,b,b\} \} = \{ \{a\}, \{a,b\}, \{a,b\} \} = \{ \{a\}, \{a,b\} \}\)</span></li>
</ol>
<p>As you can see, three distinct triples all end up with the same set-theoretic encoding. That’s not good. This means that if your syntax outputs <span class="math inline">\(\{ \{a\}, \{a,b\} \}\)</span>, you don’t actually know if it gave you the tuple <span class="math inline">\(\langle a, b \rangle\)</span>, <span class="math inline">\(\langle a, b, a \rangle\)</span>, or <span class="math inline">\(\langle a, b, b \rangle\)</span>. If your encoding can’t keep things distinct that should be distinct, it’s not a great encoding.</p>
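<p>The collapse is easy to verify mechanically. Here is a toy simulation of my own in shell: each inner set is written as a comma-joined string, and building a set out of them just means sorting and deduplicating, so the nests for <span class="math inline">\(\langle a, b \rangle\)</span> and <span class="math inline">\(\langle a, b, a \rangle\)</span> come out identical:</p>

```shell
# Encode a nest as the sorted, deduplicated list of its inner sets,
# where each inner set is itself already written in deduplicated form.
nest() { printf '%s\n' "$@" | sort -u | paste -sd ';' -; }

nest 'a' 'a,b'        # the nest for <a, b>: {a}, {a,b}
# → a;a,b
nest 'a' 'a,b' 'a,b'  # the nest for <a, b, a>: {a,b,a} deduplicates to {a,b}
# → a;a,b
```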
<p>Fortuny is aware of that, and he has a workaround. Since the problem only arises for tuples that contain identical elements, one has to ensure that there are no identical elements. To this end, he subscripts each entry with the derivational step at which it was introduced. Here’s what this would look like for the counterexamples above:</p>
<ol type="1">
<li><span class="math inline">\(\{ \{a_1\}, \{a_1,b_2\} \} = \langle a_1, b_2 \rangle\)</span></li>
<li><span class="math inline">\(\{ \{a_1\}, \{a_1,b_2\}, \{a_1,b_2,a_3\} \} = \langle a_1, b_2, a_3 \rangle\)</span></li>
<li><span class="math inline">\(\{ \{a_1\}, \{a_1,b_2\}, \{a_1,b_2,b_3\} \} = \langle a_1, b_2, b_3 \rangle\)</span></li>
</ol>
<p>Alright, that fixes the math problem, but it creates even bigger problems — I told you it’s a rat’s tail. Now that Fortuny has added subscripts, he has to allow for arbitrarily many of them. From a computational perspective, that’s not that great. At best it’s clunky, at worst it creates serious issues with computational power. And from a linguistic perspective, it violates the Inclusiveness condition <span class="citation" data-cites="Chomsky95a">(Chomsky 1995)</span>, according to which syntax does not enrich lexical items with any mark-up, diacritics, or other encodings of non-lexical information. I certainly am not gonna lose any sleep over somebody’s proposal violating the Inclusiveness condition, but I’d wager that this attitude isn’t shared by the main audience for a pure theory paper on Merge and linearization. The set-based view has forced Fortuny into a formalization that makes his argument, which ultimately doesn’t hinge on sets, less attractive to his target audience.</p>
<p>That said, let’s assume you’re willing to accept all those modifications and look at the payoff. You now have a system where linear order is baked directly into syntax. But Fortuny still has to tell us why linear order nonetheless doesn’t seem to matter all that much in syntax. He relates this to a crucial limitation of Merge. As you might have noticed, the nesting system gets a bit more complicated when you try to merge a complex specifier. Suppose you have already built the complex specifier <span class="math inline">\(\{d, e\}\)</span>; I omit subscripts because the notation is cluttered enough as is. Suppose furthermore that <span class="math inline">\(\{d, e\}\)</span> belongs to the constituent <span class="math inline">\(\{ \{d\}, \{d, e\} \}\)</span>. Let’s try to merge <span class="math inline">\(\{d, e\}\)</span> into <span class="math inline">\(\{ a, b, c \}\)</span>, which is part of the constituent <span class="math inline">\(\{ \{a\}, \{a,b\}, \{a,b,c\} \}\)</span>. What should be the output? Fortuny says that the whole constituent <span class="math inline">\(\{ \{d\}, \{d, e\} \}\)</span> is merged with the previous output <span class="math inline">\(\{a,b,c\}\)</span>, yielding the new output <span class="math inline">\(\{ a,b,c, \{ \{d\}, \{d,e\}\}\}\)</span>. Adding this to the previous constituent <span class="math inline">\(\{ \{a\}, \{a,b\}, \{a,b,c\} \}\)</span>, we get the new constituent <span class="math display">\[
\{ \{a\}, \{a,b\}, \{a,b,c\}, \{ a,b,c, \{\{d\}, \{d,e\}\} \} \}
\]</span> Not the most readable, but internally consistent.</p>
<p>Fortuny then observes that in general we do not want to allow movement from such subconstituents — the Specifier Island Constraint and the Adjunct Island Constraint strike again. Under the assumption that Move is just a variant of Merge, he defines a single application domain for Merge that does not allow this operation to target any proper part of the subconstituent <span class="math inline">\(\{\{d\}, \{d,e\}\}\)</span>. But if you take that as a general constraint on syntax, it also means that syntax cannot directly relate <span class="math inline">\(d\)</span> and <span class="math inline">\(e\)</span> to <span class="math inline">\(a\)</span>, <span class="math inline">\(b\)</span>, or <span class="math inline">\(c\)</span>. Consequently, syntax cannot define a linear order over all of <span class="math inline">\(a\)</span>, <span class="math inline">\(b\)</span>, <span class="math inline">\(c\)</span>, <span class="math inline">\(d\)</span>, and <span class="math inline">\(e\)</span>. And that’s why linear order in syntax has a very limited role, even though linear order is directly baked into syntax.</p>
<h1 id="did-the-sets-help">Did the sets help?</h1>
<p>Alright, time to take stock. If we compare Fortuny’s set-theoretic operations to the more high-level story I presented above, do the sets actually illuminate anything? I don’t think so.</p>
<p>You don’t need nests to establish that the syntactic computation naturally furnishes all the asymmetries that are needed to establish linear order. Actually, nests muddle this point because they force you into dealing with occurrences, subscripts, the Inclusiveness condition, and all that other stuff that’s completely orthogonal to the core issue. Nor do sets really help us understand why the role of linear order is limited. Fortuny stipulates a specific notion of domain based on empirical observations about Move, but that’s completely independent of sets as it’s a generalized version of the Specifier and Adjunct Island constraints. And those are all just more specific instances of the assumption that sensing tree automata are a computational upper bound on syntactic expressivity. I’d also say that Fortuny’s set-based definition of domain is much harder to make sense of than sensing tree automata. Overall, the set-based presentation is a handicap for the paper, not a boon.</p>
<p>It’s unfortunate, because Fortuny’s big picture points are right on the money imho. But they’re buried under the mathematical clutter of sets, sets, and more sets.</p>
<h2 id="references" class="unnumbered">References</h2>
<div id="refs" class="references">
<div id="ref-Bruening14">
<p>Bruening, Benjamin. 2014. Precede-and-command revisited. <em>Language</em> 90.342–388. doi:<a href="https://doi.org/10.1353/lan.2014.0037">10.1353/lan.2014.0037</a>.</p>
</div>
<div id="ref-Chomsky95">
<p>Chomsky, Noam. 1995. Bare phrase structure. <em>Government and binding theory and the Minimalist program</em>, ed. by Gert Webelhuth, 383–440. Oxford: Blackwell.</p>
</div>
<div id="ref-Chomsky95a">
<p>Chomsky, Noam. 1995. Categories and transformations. <em>The Minimalist program</em>, 219–394. Cambridge, MA: MIT Press. doi:<a href="https://doi.org/10.7551/mitpress/9780262527347.003.0004">10.7551/mitpress/9780262527347.003.0004</a>.</p>
</div>
<div id="ref-Fortuny08">
<p>Fortuny, Jordi. 2008. <em>The emergence of order in syntax</em>. Amsterdam: John Benjamins.</p>
</div>
<div id="ref-Fortuny18">
<p>Fortuny, Jordi. 2018. Structure dependence and linear order: Clarifications and foundations. <em>Language</em> 94.611–628. doi:<a href="https://doi.org/10.1353/lan.2018.0037">10.1353/lan.2018.0037</a>.</p>
</div>
<div id="ref-FortunyCorominasMurtra09">
<p>Fortuny, Jordi, and Bernat Corominas-Murtra. 2009. Some formal considerations on the generation of hierarchically structured expressions. <em>Catalan Journal of Linguistics</em> 8.99–111. <a href="https://www.raco.cat/index.php/CatalanJournal/article/view/168906/221175">https://www.raco.cat/index.php/CatalanJournal/article/view/168906/221175</a>.</p>
</div>
<div id="ref-Kayne94">
<p>Kayne, Richard S. 1994. <em>The antisymmetry of syntax</em>. Cambridge, MA: MIT Press.</p>
</div>
<div id="ref-Kuratowski21">
<p>Kuratowski, Kazimierz. 1921. Sur la notion de l’ordre dans la théorie des ensembles. <em>Fundamenta Mathematicae</em> 2.161–171.</p>
</div>
</div>
<h1 id="against-math-kuratowskis-spectre">Against math: Kuratowski’s spectre</h1>
<p>Thomas Graf, 2020-03-30</p>
<p>As some of you might know, <a href="https://thomasgraf.net/output/graf13thesis.html">my dissertation</a> starts with a quote from <em>My Little Pony</em>. By Applejack, to be precise, the only pony that I could see myself have a beer with (and I don’t even like beer). <a href="https://youtu.be/k3NkMTV9r5U">You can watch the full clip,</a> but here’s the line that I quoted:</p>
<blockquote>
<p>Don’t you use your fancy mathematics to muddy the issue.</p>
</blockquote>
<p>Truer words have never been spoken. In light of my obvious mathematical inclinations this might come as a surprise for some of you, but I don’t like using math just for the sake of math. Mathematical formalization is only worth it if it provides novel insights. </p>
<p>Some work falls short of this bar (your call whether mine does). And some work is actively worse because of its use of math. Both things have happened and are still happening in Minimalist syntax. Ever since the publication of <em>Bare Phrase Structure</em> <span class="citation" data-cites="Chomsky95">(Chomsky 1995)</span>, there has been a line of Minimalist research that wants to formalize Merge in set-theoretic terms and derive linguistic properties from mathematical set theory. This is, for lack of a better term, ass-backwards.</p>
<p>Today’s post is the start of a two-part series. It covers the general conceptual and mathematical problems with a lot of this work. The next post will discuss a concrete example of how bringing in math can actively undermine a linguistic proposal rather than strengthening it. So, without further ado, let’s talk Kuratowski.</p>
<h2 id="kuratowski-and-the-confusion-of-sets-and-set-theory">Kuratowski and the confusion of sets and set theory</h2>
<p>Quick show of hands, who has seen this before: <span class="math display">\[\{ \{a\}, \{a, b\}\} = \langle a, b \rangle\]</span> This is the <strong>Kuratowski definition</strong> of pairs in terms of sets <span class="citation" data-cites="Kuratowski21">(Kuratowski 1921)</span>. In contrast to sets, pairs have an intrinsic order, so that <span class="math inline">\(\langle a, b \rangle \neq \langle b, a \rangle\)</span> (unless <span class="math inline">\(a = b\)</span>). Instead of <span class="math inline">\(\{ \{a\}, \{a, b\} \}\)</span> one can also use <span class="math inline">\(\{ a, \{a, b\} \}\)</span>, which is called the <strong>short Kuratowski definition</strong>.</p>
<p>I can’t think of any other mathematical tidbit that has been invoked more often in syntax (although I have yet to find a paper that actually cites <span class="citation" data-cites="Kuratowski21">Kuratowski (1921)</span>). Minimalists like this definition because it looks very similar to the set-theoretic objects <span class="citation" data-cites="Chomsky95">Chomsky (1995)</span> uses to encode syntactic structure: Merge takes two syntactic objects <span class="math inline">\(a\)</span> and <span class="math inline">\(b\)</span> and combines them into the syntactic object <span class="math inline">\(\{ a, \{a, b\} \}\)</span>. Even though the object is a set and thus unordered, we can use the (short) Kuratowski definition to establish a connection to pairs, which are ordered. And from there we can develop all kinds of ideas about linear order in syntax. Except that we actually can’t because the (short) Kuratowski definition only holds in a specific version of set theory. It’s not a theorem about the connection between sets and linear order, it’s a particular mathematical definition of pairs that works in a particular version of mathematical set theory.</p>
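<p>A quick sanity check of the long Kuratowski definition, using Python’s <code>frozenset</code> (the code and the function name are mine, purely for illustration):</p>

```python
def kuratowski(a, b):
    """Long Kuratowski encoding {{a}, {a, b}} of the pair <a, b>."""
    return frozenset({frozenset({a}), frozenset({a, b})})

# Order is recoverable: <1, 2> and <2, 1> receive distinct encodings.
assert kuratowski(1, 2) != kuratowski(2, 1)

# And <a, a> collapses to {{a}}, since sets ignore repetitions.
assert kuratowski(1, 1) == frozenset({frozenset({1})})
```

<p>Keep in mind that frozensets are just one model of “set”; they bake in well-foundedness from the start, which will matter later in this post.</p>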
<h2 id="why-does-the-kuratowski-definition-work">Why does the Kuratowski definition work?</h2>
<p>Now before we go on any further, let’s demystify the Kuratowski definition. Why is this the set-theoretic definition of pairs? First of all, it’s not <strong>the</strong> set-theoretic definition of pairs, it’s <strong>one</strong> set-theoretic definition. As always in math, there’s a million ways to set things up. Wiener’s definition represents the pair <span class="math inline">\(\langle a, b \rangle\)</span> as <span class="math inline">\(\{ \{ \{a\}, \emptyset \}, \{\{b\}\} \}\)</span>. Hausdorff uses the much more intuitive <span class="math inline">\(\{ \{a, 1\}, \{b, 2\} \}\)</span>. And there’s many other alternatives. So don’t attach too much metaphysical importance to the Kuratowski definition, it’s just a definition that happens to work because it captures a specific property of pairs.</p>
<p>Pairs are characterized by an essential equivalence: <span class="math display">\[\langle a, b \rangle = \langle c, d \rangle \text{ iff } a = c\ \&\ b = d\]</span> That’s what separates pairs from sets, where the expression <span class="math inline">\(\{ a, b \}\)</span> is the same as <span class="math inline">\(\{ b, a \}\)</span> because of the lack of order. With pairs, on the other hand, <span class="math inline">\(\langle a, b \rangle \neq \langle b , a \rangle\)</span> (unless <span class="math inline">\(a = b\)</span>, in which case we would have <span class="math inline">\(\langle a, a \rangle = \langle a, a \rangle\)</span>).</p>
<p>The Kuratowski definition works because sets of the form <span class="math inline">\(\{ \{a\}, \{a, b\} \}\)</span> satisfy the same equality condition: <span class="math display">\[\{ \{a\}, \{a,b\} \} = \{ \{c\}, \{c,d\} \} \text{ iff } a = c\ \&\ b = d\]</span> The right-to-left direction is easy to see. That is to say, if <span class="math inline">\(a = c\)</span> and <span class="math inline">\(b = d\)</span>, then it is pretty much inevitable that <span class="math inline">\(\{ \{a\}, \{a,b\} \} = \{ \{c\}, \{c,d\} \}\)</span>. It’s the left-to-right direction of the <em>iff</em> that’s tricky. In order to show that <span class="math inline">\(\{ \{a\}, \{a,b\} \} = \{ \{c\}, \{c,d\} \}\)</span> entails <span class="math inline">\(a = c\)</span> and <span class="math inline">\(b = d\)</span>, we have to consider several cases.</p>
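<p>For the skeptical, the equality condition can be brute-forced over a small universe (a sketch in Python, not a proof — it only checks finitely many cases in one particular model of sets):</p>

```python
from itertools import product

def kuratowski(a, b):
    """Long Kuratowski encoding {{a}, {a, b}} of the pair <a, b>."""
    return frozenset({frozenset({a}), frozenset({a, b})})

# Encodings coincide exactly when the pairs coincide componentwise.
for a, b, c, d in product(range(4), repeat=4):
    assert (kuratowski(a, b) == kuratowski(c, d)) == (a == c and b == d)
```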
<h3 id="case-1-a-b">Case 1: <span class="math inline">\(a = b\)</span></h3>
<p>Suppose <span class="math inline">\(a = b\)</span>. Remember that sets are <strong>idempotent</strong>, which means that repetitions are ignored. For instance, <span class="math inline">\(\{ a, b, c, b, a, a \} = \{a, b, c\}\)</span>. If <span class="math inline">\(a = b\)</span>, then <span class="math inline">\(\{ \{a\}, \{a,b\} \} = \{ \{a\}, \{a, a\} \} = \{ \{a\}, \{a\} \} = \{ \{a\} \}\)</span>. But then <span class="math inline">\(\{ \{a\}, \{a,b\} \} = \{ \{c\}, \{c,d\} \}\)</span> is actually <span class="math inline">\(\{ \{a\} \} = \{ \{c\}, \{c,d\} \}\)</span>. This is possible only if <span class="math inline">\(\{ c \} = \{c ,d\}\)</span>, which implies <span class="math inline">\(c = d\)</span>. So we actually have <span class="math inline">\(\{ \{c\}, \{c,d\} \} = \{ \{c\}, \{c,c\} \} = \{ \{c\}, \{c\} \} = \{\{c\}\} = \{\{a\}\}\)</span>, and hence <span class="math inline">\(a = c\)</span>. Overall, then, we have <span class="math inline">\(a = b = c = d\)</span>, which necessarily entails <span class="math inline">\(a = c\)</span> and <span class="math inline">\(b = d\)</span>.</p>
<h3 id="case-2-a-neq-b">Case 2: <span class="math inline">\(a \neq b\)</span></h3>
<p>Now suppose <span class="math inline">\(a \neq b\)</span>. Then either <span class="math inline">\(\{a\} = \{c,d\}\)</span> or <span class="math inline">\(\{a\} = \{c\}\)</span>.</p>
<ol type="1">
<li><p>Since two sets with distinct cardinality cannot be identical, the equality <span class="math inline">\(\{a\} = \{c,d\}\)</span> holds only if <span class="math inline">\(c = d\)</span>. Then <span class="math inline">\(\{ \{a\}, \{a,b\} \} = \{ \{c\}, \{c,d\} \} = \{ \{c\}, \{c\} \} = \{ \{c\} \}\)</span>, but <span class="math inline">\(\{c\} \neq \{a, b\}\)</span> because <span class="math inline">\(a \neq b\)</span>. This is a contradiction, so it must be the case that <span class="math inline">\(\{a\} \neq \{c,d\}\)</span>.</p></li>
<li><p>Assume, then, that <span class="math inline">\(\{a\} = \{c\} \neq \{c,d\}\)</span>. Then <span class="math inline">\(a = c\)</span> and <span class="math inline">\(c \neq d\)</span>, so the remaining elements must match: <span class="math inline">\(\{a, b\} = \{c, d\}\)</span>, which, given <span class="math inline">\(a = c\)</span> and <span class="math inline">\(a \neq b\)</span>, forces <span class="math inline">\(b = d\)</span>. Overall, then, we have <span class="math inline">\(a = c\)</span> and <span class="math inline">\(b = d\)</span>, as required.</p></li>
</ol>
<h2 id="when-does-the-kuratowski-definition-work">When does the Kuratowski definition work?</h2>
<p>Did you notice something in the proof above? The proof is mathematically sound, but it relies on specific assumptions of set theory. For instance, that it is impossible for both <span class="math inline">\(c \neq d\)</span> and <span class="math inline">\(\{a\} = \{c,d\}\)</span> to be true because a set with one member can never be identical to a set with two members. Alright, that’s intuitive enough, but things get worse.</p>
<p>For Minimalist syntax, we don’t really care about the long Kuratowski definition with <span class="math inline">\(\{ \{a\}, \{a,b\} \} = \langle a,b \rangle\)</span>, we want the short one with <span class="math inline">\(\{ a, \{a, b\} \} = \langle a,b \rangle\)</span> because that’s the kind of set-theoretic object that’s built by Merge. The proof above, however, runs into a problem with the short definition. Suppose <span class="math inline">\(\{a, \{a, b\}\} = \{c, \{c,d\} \}\)</span> and <span class="math inline">\(a = \{c, d\}\)</span>. Then either <span class="math inline">\(\{a, b\} = c\)</span> or <span class="math inline">\(\{a, b\} = \{c, d\}\)</span>. We have to rule out the case that <span class="math inline">\(\{a, b\} = c\)</span> — otherwise, the connection to pairs will break down as we could have really weird sets that are equivalent even though they wouldn’t be equivalent when viewed as pairs.</p>
<p>Intuitively, <span class="math inline">\(\{a, b\} = c\)</span> is easy to rule out. If <span class="math inline">\(\{a, b\} = c\)</span> and <span class="math inline">\(a = \{c, d\}\)</span>, then we get some kind of infinite loop by substituting <span class="math inline">\(\{c, d\}\)</span> for <span class="math inline">\(a\)</span> and <span class="math inline">\(\{a, b\}\)</span> for <span class="math inline">\(c\)</span>: <span class="math display">\[a = \{c, d\} = \{ \{a, b\}, d \} = \{ \{ \{c,d\}, b \}, d \} = \{ \{ \{ \{a, b\}, d\}, b\}, d\} = \ldots\]</span> Clearly that’s not okay, right? Actually, it is.</p>
<p>Ruling out such cases of infinite recursion requires the <strong>axiom of regularity</strong>. This axiom is part of the standard formalization of set theory known as ZFC, <strong>Zermelo-Fraenkel with the axiom of choice</strong>. That is actually a really weird axiomatization because it is a first-order theory, which means that sets and the objects contained by sets have the same type. If you still think a set is a collection of objects, you’re not thinking in ZFC terms, where there is no distinction between objects and collections of objects. ZFC is about as far away from our informal understanding of sets as one can get.</p>
<p>And to add insult to injury, the axiom of regularity does precious little work for ZFC. All the important theorems for ZFC set theory hold irrespective of whether one enforces the axiom of regularity. And there is a giant cottage industry of non-standard set theories that all eschew the axiom of regularity plus a truckload of other ZFC axioms. There is no such thing as <strong>the</strong> definition of sets, there’s many competing formalizations that support completely different theorems. Many of them do not support the short Kuratowski definition. The short Kuratowski definition simply does not work unless one makes very specific commitments about the nature of sets.</p>
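<p>As a point of comparison, here is the short definition brute-forced in Python (again my own sketch, not anyone’s official formalization). The check passes, but only because <code>frozenset</code> is well-founded by construction: a frozenset is built bottom-up and can never contain itself, so the problematic case <span class="math inline">\(\{a, b\} = c\)</span> with <span class="math inline">\(a = \{c, d\}\)</span> is simply unconstructible here. In other words, the check confirms the well-founded case and says nothing about set theories without regularity:</p>

```python
from itertools import product

def short_kuratowski(a, b):
    """Short Kuratowski encoding {a, {a, b}} of the pair <a, b>."""
    return frozenset({a, frozenset({a, b})})

# Mix atoms with sets, so that elements can masquerade as the inner {a, b}.
universe = [0, 1, frozenset(), frozenset({0}), frozenset({0, 1})]
for a, b, c, d in product(universe, repeat=4):
    assert (short_kuratowski(a, b) == short_kuratowski(c, d)) == (a == c and b == d)
```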
<h2 id="the-folly-of-mathing-your-syntax">The folly of mathing your syntax</h2>
<p>I think what this shows is that this kind of set-theoretic work in syntax is trying to have its cake and eat it, too. On the one hand, nobody wants to say that syntax literally operates with a notion of set that corresponds to the ZFC axiomatization of set theory. That would entail a commitment to the psychological reality of its highly abstract and counter-intuitive axioms, including the axiom of regularity. And as far as cognitive commitments go, that’s pretty far out there. In general, the set-theoretic view of Merge is taken to be either a convenient metaphor or to be rooted in naive set theory. I don’t know of a single paper that uses the short Kuratowski definition and explicitly states that the sets built by Merge are assumed to obey the laws of ZFC. So that’s one side of the cake: naive notions of set, rather than mathematical set theory.</p>
<p>But on the other hand this work then drags out mathematical properties such as idempotency and the short Kuratowski definition, without acknowledging that these do not work with the intuitive notion of sets. Because, let’s face it, sets simply aren’t intuitive. The closest thing we have to sets in the real world is bags, and those still aren’t sets because they lack idempotency: a bag with two <span class="math inline">\(a\)</span>-objects is not the same as a bag with one <span class="math inline">\(a\)</span>-object. There is no such thing as an intuitive notion of sets; sets are intrinsically unintuitive.</p>
<p>And this takes me to the broader point I want to make in this series. All that mathematical quibbling about definitions, equivalences, and axiomatizations is completely pointless. Why would you ever open yourself up to criticism of that kind? None of the syntactic proposals that use the Kuratowski definition actually need it to make their point. The ideas could be stated in very different terms, and they would be none the poorer for it. Dressing up your linguistic idea in terms of sets doesn’t magically make it better or derive some linguistic property without further stipulations. Quite to the contrary: the moment you invoke the Kuratowski definition, you’re implicitly stipulating the cognitive reality of half a dozen first-order axioms. And in service of what? If your idea works, we can define it in a million ways and it doesn’t really matter what it looks like when it is hashed out in terms of sets. If your idea doesn’t work, then it doesn’t work and is bunk no matter how elegantly you derived it from set theory.</p>
<h2 id="references" class="unnumbered">References</h2>
<div id="refs" class="references">
<div id="ref-Chomsky95">
<p>Chomsky, Noam. 1995. Bare phrase structure. <em>Government and binding theory and the Minimalist program</em>, ed. by Gert Webelhuth, 383–440. Oxford: Blackwell.</p>
</div>
<div id="ref-Kuratowski21">
<p>Kuratowski, Kazimierz. 1921. Sur la notion de l’ordre dans la théorie des ensembles. <em>Fundamenta Mathematicae</em> 2.161–171.</p>
</div>
</div>
"Star-Free Regular Languages and Logic" at KWRegan's Blog2020-03-23T00:00:00-04:002020-03-23T00:00:00-04:00Jeffrey Heinztag:outde.xyz,2020-03-23:/2020-03-23/star-free-regular-languages-and-logic-at-kwregans-blog.html<p>Bill Idsardi brought this to my attention. Enjoy your reading!</p>
<p><a href="https://rjlipton.wordpress.com/2020/03/21/star-free-regular-languages-and-logic/">Star-Free Regular Languages and Logic</a></p>
<p>on the <a href="https://rjlipton.wordpress.com/">Gödel’s Lost Letter and P=NP</a> blog.</p>
Trees for free with tree-free syntax2020-03-06T00:00:00-05:002020-03-06T00:00:00-05:00Thomas Graftag:outde.xyz,2020-03-06:/2020-03-06/trees-for-free-with-tree-free-syntax.html<p>Here’s another quick follow-up to the <a href="https://outde.xyz/2020-02-20/unboundedness-is-a-red-herring.html">unboundedness argument</a>. As you might recall, that post discussed a very simple model of syntax whose only task it was to adjudicate the well-formedness of a small number of strings. Even for such a limited task, and with such a simple model, it quickly became clear that we need a more modular approach to succinctly capture the facts and state important generalizations. But once we had this more modular perspective, it no longer mattered whether syntax is actually unbounded. Assuming unboundedness, denying unboundedness, it doesn’t matter because the overall nature of the approach does not hinge on whether we incorporate an upper bound on anything. Well, something very similar also happens with another aspect of syntax that is beyond doubt in some communities and highly contentious in others: syntactic trees. </p>
<h2 id="the-first-and-only-example">The first and only example</h2>
<p>Remember that finite-state automata (FSAs) can be represented much more compactly via recursive transition networks (RTNs). As long as we put an upper bound on the number of recursion steps, every RTN can be compiled out into an FSA, although the FSA might be much larger and contain numerous redundancies. Here’s the RTN I provided for a tiny fragment of English:</p>
<figure>
<img src="https://outde.xyz/img/thomas/underappreciated_unboundedness/ftn_factored_embedding.svg" alt="An RTN with center-embedding" /><figcaption>An RTN with center-embedding</figcaption>
</figure>
<p>And indubitably you also remember how this device would generate <em>the fact that the fact surprised me surprised me</em>, which I explained with such remarkable lucidity that it should be indelibly etched into your mind:</p>
<blockquote>
<p>We start at S0 and take the NP edge, which puts us at NP0. At the same time, we put S1 on the stack to indicate that this is where we will reemerge the next time we exit an FSA at one of its final states. In the NP automaton, we move from NP0 all the way to NP3, generating <em>the fact that</em>. From NP3 we want to move to NP4, but this requires completing the S-edge. So we go to S0 and put NP4 on top of the stack. Our stack is now [NP4, S1], which means that the next final state we reach will take us to NP4 rather than S1. Anyways, we’re back in S0, and in order to go anywhere from here, we have to follow an NP-edge. Sigh. Back to NP0, and let’s put S1 on top of the stack, which is now [S1, NP4, S1]. We make our way from NP0 to NP2, outputting <em>the fact</em>. The total string generated so far is <em>the fact that the fact</em>. NP2 is a final state, and we exit the automaton here. We consult the stack and see that we have to reemerge at S1. So we go to S1 and truncate the stack to [NP4, S1]. From S1 we have to take a VP-edge to get to S2. Alright, you know the spiel: go to VP0, put S2 on top of the stack, giving us [S2, NP4, S1]. The VP-automaton is very simple, and we move straight from VP0 to VP2, outputting <em>surprised me</em> along the way. The string generated so far is <em>the fact that the fact surprised me</em>. VP2 is a final state, so we exit the VP-automaton. The stack tells us to reemerge at S2, so we do just that while popping S2 from the stack, leaving [NP4, S1]. Now we’re at S2, but that’s a final state, too, which means that we can exit the S-automaton and go… let’s query the stack… NP4! Alright, go to NP4, and remove that entry from the stack, which is now [S1]. But you guessed it, NP4 is also a final state, so we go to S1, leaving us with an empty stack. From S1 we have to do one more run through the VP-automaton to finally end up in a final state with an empty stack, at which point we can finally stop. 
The output of all that transitioning back and forth: <em>the fact that the fact surprised me surprised me</em>.</p>
</blockquote>
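<p>For readers who prefer code to prose, the stack discipline in that walkthrough can be sketched as a tiny RTN interpreter (the state names follow the quoted passage; the encoding itself is my own invention for this sketch):</p>

```python
# One entry per state: a list of (label, next_state) edges. A label that
# names a subautomaton ("S", "NP", "VP") is a call; anything else is a word.
EDGES = {
    "S0": [("NP", "S1")], "S1": [("VP", "S2")], "S2": [],
    "NP0": [("the", "NP1")], "NP1": [("fact", "NP2")],
    "NP2": [("that", "NP3")], "NP3": [("S", "NP4")], "NP4": [],
    "VP0": [("surprised", "VP1")], "VP1": [("me", "VP2")], "VP2": [],
}
FINAL = {"S2", "NP2", "NP4", "VP2"}
START = {"S": "S0", "NP": "NP0", "VP": "VP0"}

def generate(state="S0", stack=(), depth=3):
    """Yield every word tuple the RTN accepts, with the stack (and hence
    the amount of center-embedding) bounded by `depth`."""
    if state in FINAL:
        if stack:                          # exit: reemerge at the stack's top
            yield from generate(stack[0], stack[1:], depth)
        else:
            yield ()
    for label, nxt in EDGES[state]:
        if label in START:                 # call edge: push the return state
            if len(stack) < depth:
                yield from generate(START[label], (nxt,) + stack, depth)
        else:                              # terminal edge: emit the word
            for tail in generate(nxt, stack, depth):
                yield (label,) + tail

outputs = {" ".join(words) for words in generate()}
assert outputs == {"the fact surprised me",
                   "the fact that the fact surprised me surprised me"}
```

<p>With the stack capped at three entries — the maximum reached in the walkthrough — the interpreter produces exactly the unembedded sentence and the singly center-embedded one; raising <code>depth</code> admits deeper embeddings, and removing the cap gives the unbounded RTN.</p>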
<p>But believe it or not, this miracle of exposition can be represented more compactly in the form of a single diagram.</p>
<figure>
<img src="https://outde.xyz/img/thomas/underappreciated_unboundedness_trees/ftn_trace.svg" alt="A graph depicting how the subautomata call each other" /><figcaption>A graph depicting how the subautomata call each other</figcaption>
</figure>
<p>Looks familiar? There, let me rearrange it a bit and add an S at the top.</p>
<figure>
<img src="https://outde.xyz/img/thomas/underappreciated_unboundedness_trees/ftn_tree.svg" alt="OMG, it’s a tree" /><figcaption>OMG, it’s a tree</figcaption>
</figure>
<p><a href="https://www.smbc-comics.com/comic/sob">Son of a gun!</a></p>
<h2 id="trees-computational-traces">Trees = Computational traces</h2>
<p>The graph that we have up there is called a <strong>computational trace</strong>. It is a record of the steps of the computation that lead to the observed output. Computational traces aren’t anything fancy or language-specific, they arise naturally wherever computation takes place.</p>
<p>Computational traces don’t necessarily exhibit tree-like structures. They can just be strings, or they can be more complex objects, e.g. directed acyclic graphs (which a linguist would call multi-dominance trees that can have multiple roots). The interesting thing is that models of syntax inevitably give rise to computational traces that are at least trees. And the reason is once again that syntax pushes us in the direction of factorization, the direction of many small systems that invoke each other. The computational nature of syntax is intrinsically tree-like.</p>
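<p>Here’s a toy illustration of that point (my own sketch, nothing more): instrument a recursive rewriting procedure so that it records which calls invoke which. The output is a flat string, but the computational trace that falls out is exactly a constituent tree:</p>

```python
def expand(symbol, rules, trace):
    """Rewrite `symbol` with `rules`, appending a record of the call
    structure to `trace`. Returns the generated string."""
    node = {"symbol": symbol, "children": []}
    trace.append(node)
    words = []
    for piece in rules[symbol]:
        if piece in rules:                 # a call to another category
            words.append(expand(piece, rules, node["children"]))
        else:                              # a terminal word
            node["children"].append({"symbol": piece, "children": []})
            words.append(piece)
    return " ".join(words)

# A toy grammar: the generated object is just a string...
rules = {"S": ["NP", "VP"], "NP": ["the", "fact"], "VP": ["surprised", "me"]}
trace = []
assert expand("S", rules, trace) == "the fact surprised me"
# ...but the trace is the familiar tree: S over NP and VP.
assert [child["symbol"] for child in trace[0]["children"]] == ["NP", "VP"]
```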
<h2 id="closing-thoughts">Closing thoughts</h2>
<p>So there you have it. Even if syntax may just generate strings, like an FSA or RTN, it nonetheless exhibits tree structure in the accompanying computations. It doesn’t hinge on unboundedness. It doesn’t hinge on the self-embedding property or recursion, either — even if the RTN were just a finite transition network, the process of moving between automata would induce tree structure. And that’s why this is an underappreciated argument: it depends on so little, it avoids all the usual hot-button issues like recursion, and yet it is used so rarely.</p>
<p>Btw, the connection between trees and computation isn’t some fancy new insight. Mark Steedman has long argued for this view of syntactic structure. Heck, trees made their way into generative syntax as a compact way of representing the derivations of context-free grammars. But they also got reified very quickly, changing from records of syntactic computation to the primary data structure. This had the unintended consequence that the connection between trees and computation has slowly fallen into oblivion, and that makes trees look a lot more stipulative to outsiders.</p>
<p>I personally believe that the reification of trees has largely been a bad thing for the field. The original insights got shortened to the dogma that a syntactic formalism that doesn’t produce trees can’t possibly be right, even though the structure of the generated object has no direct connection to the structure of the generation mechanism. The reification of trees has erased that distinction, resulting in an overly narrow space of analytical options. One of the most important developments in computational syntax in the last twenty years was to tease them apart again and study the computational traces independently of what output they produce. This has been a very productive enterprise, and the insights obtained this way suggest that this is really what syntax is about.</p>
<p>It also fits naturally with the computational view of the inverted T-model. The <a href="https://outde.xyz/2019-05-15/underappreciated-arguments-the-inverted-t-model.html">bimorphism perspective</a> puts syntax in the position of an interpolant, a means of succinctly describing a computational system of bidirectional mappings. From this perspective, syntax simply is computation; syntactic structure is computational structure.</p>