Tuesday, March 17, 2020
Irregularity in language
you wouldn't believe the kind of hate mail I get about my work on irregular verbs
— Steven Pinker, in an interview with The Guardian, 2007.
Assembling my prototype conlang Lamlosuo transformed my understanding of irregularity in language. That was unexpected. The prototype was supposed to be a learning vehicle, yes — for learning about the language model I'd devised. Irregularity wasn't mentioned on the syllabus.
I set out to create an experimental prototype conlang with semantic underpinnings radically different from those of human natural languages (natlangs). (This blog is littered with evidence of my penchant for studying the structure of things by devising alternative structures to contrast with them.) The prototype was meant as a testbed for trying out features for conlangs based on the envisioned semantics; it had no strong stake in regularity one way or the other, aside from an inclination not to deliberately build in irregularities that would make the testbed less convenient to work with. The effect of the experiment, though, was rather like scattering iron filings on a piece of paper above a magnet, thereby revealing unsuspected, ordinarily-invisible structure. From contemplating the shape of the prototype that emerged, I've both revised my thinking on irregularity in general, and drawn take-away lessons on the character of the language structure the prototype is actually meant to explore.
My first post on Lamlosuo, several years ago now, laid out the premise of the project and a limited set of its structural consequences, while deferring further complications —such as an in-depth discussion of irregularity— to some later post. This post is its immediate sequel, describing major irregular elements of Lamlosuo as they emerged, as well as what I learned from them about irregularity in general and about the language model in particular.
[Overall insights about the language project are largely —though by no means entirely— concentrated in the final section below. Insights into irregularity are distributed through the discussion, as they arise from details of the language.]
Contents
Irregularity
Vector language
Regularity
Routine idiosyncrasies
Patterns of variation
Extraordinary idiosyncrasies
Whither Lamlosuo?
Irregularity

From our early-1970s hardcopy Britannica (bought by my parents to support their children's education), I gathered that commonly used words tend to accumulate irregularities, while uncommonly used words tend to accumulate regularity by shedding their irregularities. From 1990s internet resources on conlanging (published there by scattered conlangers as they reached out through the new medium to form a community), I gathered that irregularity may be introduced into a conlang to make it feel more naturalistic. All of which I still believe; but these credible ideas can easily morph into a couple of major misapprehensions about irregularity, both of which I was nursing by the time I committed to my first conlanging project at the turn of the century: first, that the only reason natlangs have irregularity is that natlangs evolve randomly in the field, so that a planned conlang would only have irregularity if the designer deliberately put it there; and second, that irregularity serves no useful function in a language, so that desire for naturalism would be the only reason a conlang designer would put it there.
Twenty years later, I'd put my current understanding this way: Irregularity is a natural consequence of the impedance mismatch between the formal structure of language and the sapient semantics communicated through it (a mismatch I last blogged about yonder). Sapient thought structures are too volatile to fit neatly into a single rigid format; large parts of a language, relatively far from its semantic core, may be tolerably regular, but the closer things get to its semantic core, the more often they call for variant structure. It may even be advantageous for elements near the core to be just slightly out of tune with each other, so they create (to use another physics metaphor) a complex interference pattern that can be exploited to slip sapient-semantic notions through the formal structure. Conversely, one may be able to deduce where the semantic core of the language is, from where this effect stirs up irregularity. By similar feedback, also, structural nonuniformities can orient sapient users of the language as they work intensively with the semantic core; I analogize this with the bumps on the F and J keys of a QWERTY keyboard, which allow a touch-typist to feel when their fingers are in standard position.
These effects are likely to apply as well to programming languages, which are ultimately vehicles for sapient thought. Note that the most peculiar symbol names of Lisp are concentrated at its semantic core: lambda, car, cdr.
Vector language

My central goal for this first conlanging project was to entirely eliminate nouns and verbs, in a grammatical sense, by replacing the verb-with-arguments structural primitive of human natlangs with some fundamentally different structural primitive. The verb-with-arguments structural pattern induces asymmetry between the grammatical functions of the central "verb" and the surrounding "nouns", which afaics is where the grammatical distinction between verbs and nouns comes from. (My notes also call these "being-doing" languages, as verbs commonly specify "doing" something while nouns specify simply "being" something.) In the structure I came up with to replace this, each content element would be, uniformly, an act of motion ("going"), understood to entail a thing that goes (the cursor), where it's going from and to, and perhaps some other elements such as the path by which it goes. For the project as a whole I hoped to have several related languages and some grammatical variance between them, but figured I'd need first to understand better how a language of this sort can work, to understand the kinds of variation possible. So I set out to build a prototype language, to serve as a testbed for studying whether-and-how the language model could work.
In the prototype language, there is just one open class of vocabulary words, called vectors, each of which has five participant slots, called roles. The five roles are: cursor, start, end, path, and pivot. The name pivot suggests that the action is somehow oriented about the pivot element, but really the pivot role is a sort of catch-all, a place to put an additional object associated with the action in some way. The pivot role in itself says something about irregularity. In lexicon building, each vector has definitions for each of its occupied roles. Defining all these roles for a given vector, I've found, establishes the meaning of the vector with great clarity. The cursor is the only absolutely mandatory role: there can't be a going without something that goes. The start and end are usually clear. The path is usually fairly straightforward as well, though sometimes occupied by an abstract process rather than a physical route of travel. But each vector is, in the end, semantically unique; and its uniqueness rebels against being pinned down precisely into a predetermined form —I analogize this to the Heisenberg uncertainty principle, where constraining one part of a particle's description requires greater leeway for another part— so that while the cursor, start, and end are usually quite uniform, and the path has limited flexibility, the pivot provides more significant slack to accommodate the idiosyncrasy of each vector.
For example: The first meaning I worked out for the language was a vector meaning speak. This was before the language even had a phonology; it was meant to verify, before investing further in the structural primitive, that it was capable of handling abstracts; and speak, as a meaning in a conlang, was appealingly meta. In a speech act, it seemed the thing that goes from somewhere to somewhere is the message; so I reckoned the cursor should be the message, the thing said. The start would then be the speaker; and the end would be whomever receives it, the audience. It was unclear whether the path would be more usefully assigned to the route by which the message travels, or the transmission medium through which it travels (such as the air carrying sound, or aether carrying radio waves); waiting for a preference to emerge, I toyed with one or the other in my notes but ultimately the path role of that vector has remained unoccupied. For the pivot, I struck on the idea of making it the language in which the message is expressed (such as English — or Lamlosuo).
The "escape-valve" pattern —regularity with an outlet to accommodate variant structure that doesn't neatly fit the regularity— recurred a number of times in the language design as it gradually emerged. The various escape mechanisms accommodate different grades of variant structure, and while the relations between these devices are more complex than mere nesting, the whole reminds me somewhat of a set of matryoshka dolls. With that image in mind, I'm going to try to order my description of these devices from the outside in, from the broadest and mildest irregularities to the narrowest and most extreme.
It's a fair question, btw, where all this emergent structure in the prototype emerges from. It all comes through my mind; the question is, what was I tapping into? (I'll set the origin of the vector primitive itself outside the scope of the question, as the initial inspiration seems rather discontinuous whereas the process after that may be somewhat fathomable.) My intent has been to access the platonic structure of the language model; that's platonic with a lower-case p, meaning, structure independent of particular minds in the same sense that mathematical structure is independent of particular minds. Given the chosen language primitive, I've tried through the prototype to explore the contours of the platonic structural space around that chosen primitive, letting natural eddies in that space shape the design while, hopefully, reducing perturbations from biases-of-thought enough to let the natural eddies dominate. (I also have some things I could say on the relationship between platonic structure and sapient thought, which I might blog about at some point if I can figure out how to address it without getting myself, and possibly even those who read it, hopelessly mired in a quag of perspective bias.)
Regularity

The outermost nesting shell; the outer matryoshka doll, as it were; is, in theory, the entirely regular structure of the language. I shall attempt to enumerate just those parts in this section, as briskly as may be. This arrangement turns out to be somewhat challenging, both because language features aren't altogether neatly arranged by how regular they are, and because the noted concentration of irregularity toward the semantic core assures there will be some irregularity in nearly all remotely interesting examples in Lamlosuo (on top of the limitations of Lamlosuo's thin vocabulary). Much of this material, with a bit more detail on some things and less on others, is included in the more leisurely treatment in the earlier post.
Ordinarily, a syllable has five possible onsets: f s l j w (as in fore, sore, lore, yore, wore); five possible nuclei: i u e o a (close front, close back, mid front, mid back, open; in my idiolect, roughly as in teem, tomb, tame, tome, tom); and two possible codas: n m (as in nor, more). In writing a word, if a front vowel (i or e) is followed by j and another vowel, or if a back vowel (u or o) is followed by w and another vowel, the consonant between those vowels is omitted; for example, lamlosuwo would be shortened to lamlosuo. Two other sounds occasionally arise: an allophone of f, written as t (the initial sound of thorn); and one plosive written as an apostrophe, ' (the initial sound of tore).
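The written shortening rule is mechanical enough to sketch in a few lines of code. Here's a minimal sketch in Python (the function name and string representation are mine, not anything from the language notes):

```python
# Sketch of the written shortening rule: the consonant j after a front
# vowel, or w after a back vowel, is omitted when another vowel follows.
FRONT, BACK, VOWELS = set("ie"), set("uo"), set("iueoa")

def shorten(word):
    out = []
    for i, c in enumerate(word):
        if i and i + 1 < len(word) and word[i + 1] in VOWELS and (
                (c == "j" and word[i - 1] in FRONT) or
                (c == "w" and word[i - 1] in BACK)):
            continue  # the consonant between the two vowels is omitted
        out.append(c)
    return "".join(out)

print(shorten("lamlosuwo"))  # lamlosuo
```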
A basic vector word consists of an invariant stem and a mandatory class suffix. The stem is two or more consonant-vowel syllables (accent on the first syllable of the stem), and the class suffix is one consonant-vowel syllable. There are eleven classes: the neutral class, and ten genders; a neutral vector is sort-of a lexical verb, an engendered vector is sort-of a lexical noun (though this distinction lacks grammatical punch, as they're all still vectors). The neutral suffix after a back vowel (u or o) is ‑wa, otherwise it's ‑ja (so, the suffix consonant is omitted unless the stem ends with a). Genders identify role (one of the five) and volitionality (volitional or non-volitional). Non-volitional genders use front vowels, volitional genders use back vowels; the onset determines the role: ‑li/‑lu cursor, ‑ti/‑tu start, ‑se/‑so end, ‑je/‑jo path, ‑we/‑wo pivot. Somewhat relevant to irregularity, btw: start and end genders deliberately use different vowels to strengthen their phonological contrast since they have relatively weak semantic contrast; while, on the other hand, an earlier experiment in the language determined that assigning the vowels in consistent sets (either i/u or e/o, never i/o or e/u) is a desirable regularity to avoid confusion.
For example: The vector meaning speak has stem losu-. The neutral form is losua; engendered forms are losuli (message, non-volitional), losutu (speaker, volitional), losuso (audience, volitional), losuo/losue (living language/non-living language). My first thought for the non-volitional pivot, losue, was dead language; but then it occurred to me that that gender would also suit a conlang.
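Together with the shortening rule, suffix attachment is regular enough to formalize. This is my own reconstruction from the role/volitionality table above (the function names are assumptions, not the author's notation); note how the shortening rule automatically yields losua and losuo:

```python
# Hypothetical sketch of class-suffix attachment: append the suffix,
# then apply the written shortening rule, so the suffix consonant
# drops exactly where the rule says it should.
FRONT, BACK, VOWELS = set("ie"), set("uo"), set("iueoa")

SUFFIX = {  # (role, volitional) -> gender suffix
    ("cursor", False): "li", ("cursor", True): "lu",
    ("start",  False): "ti", ("start",  True): "tu",
    ("end",    False): "se", ("end",    True): "so",
    ("path",   False): "je", ("path",   True): "jo",
    ("pivot",  False): "we", ("pivot",  True): "wo",
}

def shorten(word):
    # drop j after a front vowel, or w after a back vowel, before a vowel
    out = []
    for i, c in enumerate(word):
        if i and i + 1 < len(word) and word[i + 1] in VOWELS and (
                (c == "j" and word[i - 1] in FRONT) or
                (c == "w" and word[i - 1] in BACK)):
            continue
        out.append(c)
    return "".join(out)

def neutral(stem):
    # -wa after a back vowel, otherwise -ja
    return shorten(stem + ("wa" if stem[-1] in BACK else "ja"))

def engender(stem, role, volitional):
    return shorten(stem + SUFFIX[(role, volitional)])

print(neutral("losu"))                  # losua
print(engender("losu", "pivot", True))  # losuo, the living language
```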
Vector words can also take any of a limited set of prefixes, each of the form consonant-vowel-consonant; as the two coda consonants are very similar (m and n), I try to avoid using two prefixes that differ only by coda. In ideal principle, each prefix would modify its vector in a uniform way. A vector prefix can also be detached from the vector it modifies, to become a preposition.
A simple clause is a chain of vectors, where each pair of consecutive vectors in the chain are connected by means of role alignment. Generically, one puts between the two vectors first a dominant role particle, which specifies a role of the first vector (the dominant vector in the alignment), then a subordinate role particle specifying a role of the second vector (the subordinate vector in the alignment), indicating that the same object occupies those two roles. Ordinarily, the dominant role particles are just the volitional gender suffixes, the subordinate role particles are just the non-volitional gender suffixes, all now as standalone words, except using f rather than t for the start particles. For instance, losua fu li susua would equate the start of losua with the cursor of susua. If a vector is engendered, one may omit its role particle from an alignment, in which case by default it aligns on its engendered role (though an engendered vector can be explicitly aligned on any of its roles). There are also a set of combined role particles, using the usual role consonants with vowel a; a combined role particle aligns both vectors on that role.
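One way to picture role alignment is as unification of slots between two five-slot records: the alignment asserts that a single object fills a role of the dominant vector and a role of the subordinate vector. A toy sketch, with data structures of my own invention (nothing here is the author's formalism):

```python
# Toy model of role alignment: each vector has five role slots,
# and an alignment puts the same entity in one slot of each vector.
class Vector:
    ROLES = ("cursor", "start", "end", "path", "pivot")

    def __init__(self, stem):
        self.stem = stem
        self.roles = {r: None for r in self.ROLES}

def align(dom, dom_role, sub, sub_role):
    """Assert that one object fills dom_role of dom and sub_role of sub."""
    entity = dom.roles[dom_role] or sub.roles[sub_role] or object()
    dom.roles[dom_role] = sub.roles[sub_role] = entity
    return entity

# losua fu li susua: the start of losua (the speaker) is the
# cursor of susua (the sleeper).
losua, susua = Vector("losu"), Vector("susu")
align(losua, "start", susua, "cursor")
assert losua.roles["start"] is susua.roles["cursor"]
```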
Each of the fifteen basic role particles (five dominant, five subordinate, five combined) has a restrictive variant; the distinction being that a non-restrictive alignment asserts a relationship between vectors whose meanings are determined by other means, while a restrictive alignment must be taken into account in determining the meanings of the vectors. Each restrictive role particle prefixes the corresponding non-restrictive particle with its own vowel; thus, ja → aja, etc.
A clause can be packaged up as an object by preceding it with a subordinate content particle. A subordinate content particle is simply a single vowel, as a standalone word. The five subordinate content particles determine the mood of the objectified clause (and can also be used at the front of a sentence to assign a mood to the whole thing): a, indicative; i, invitational; u, imperative; e, noncommittal; o, tentative. Having bundled up a clause as an object, one can then treat it as the subordinate half of a role alignment with a dominant vector. There are also dominant content particles, which package up the dominant vector (just the one vector) as an object to align with some role of the subordinate vector, thus beginning a subordinate relative clause. Dominant content particles prefix ow- to the corresponding subordinate content particles (the w attaches to the second syllable, and then is dropped since preceded by a back vowel) — with a lone exception for the dominant tentative content particle, which by strictly regular construction should be oo but uses leading vowel u (thus, uo) to avoid confusion with the dominant restrictive pivot particle (oo). (In crafting that detail, I was reminded of English "its" versus "it's".)
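The derivation of the dominant content particles, including that lone exception, is small enough to write out explicitly (a sketch under my own naming):

```python
# Sketch of dominant content particle derivation: prefix ow-, with the
# w dropped after the back vowel o; the tentative form is irregular,
# since the regular oo would collide with the dominant restrictive
# pivot particle.
SUBORDINATE = {"a": "indicative", "i": "invitational", "u": "imperative",
               "e": "noncommittal", "o": "tentative"}

def dominant_content(sub):
    assert sub in SUBORDINATE
    if sub == "o":
        return "uo"   # irregular, to avoid confusion with oo
    return "o" + sub  # ow + vowel, w dropped after the back vowel

print(dominant_content("a"))  # oa, as in: latu fi losua oa siletu
```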
The image of a subordinate content particle packaging up a subordinate clause and objectifying it for alignment with a dominant role seems to have built into it a phrase-structure view of the situation. Possibly there is a way to view the same thing in a dependency-grammar framework (rather like wave-particle duality in physics); the whole constituency/dependency thing is not yet properly clear to me, and when I designed that part of Lamlosuo I was unaware of the whole controversy: phrase-structure was the only approach to grammar I'd even seen, somewhat in grade-school and intensively in compiler parsing algorithms. So, this particular part of the language design might or might not contain an embedded structural bias.
A provector has a stem of the form vowel-consonant and a class suffix. The provector stems are in- (interrogative), um- (recollective), en- (indefinite), on- (relative), an- (demonstrative). The recollective provector has an antecedent earlier in the clause, and does not align with its syntactic predecessor; where ordinarily alignment can only align a vector with two others (the one before it and the one after it), as antecedent of a recollective provector it can participate in any number of additional alignments. (The demonstrative provector, btw, serves the function of a third-person pronoun, using cursor anlu/anli in general, volitional start antu for a person of the same sex/gender as the speaker, volitional end anso for a person of different sex/gender from the speaker; but I digress.)
A vector can incorporate a simple clause. Position the vector at the front of the simple clause, and join the entire clause together with plosives (') between its words; the whole then aligns as its first vector, with the rest of the incorporated clause aligned to it independent of any other surrounding context. Recollective provectors may be disambiguated by incorporating a copy of the antecedent vector.
Routine idiosyncrasies

Beyond a vector's definitions of its neutral form and up-to-ten genders, each vector has a number of conventions associated with it that accommodate low-to-medium-grade vector-idiosyncrasies of the sort that occur broadly throughout the vocabulary. Role alignment is not as simple as "the object that occupies this role of this vector is the same object that occupies that role of that vector": that isn't always the sort of relation-between-vectors that's wanted, and when it is, there may be refinements needed to clarify what is meant. The meaning of an alignment is resolved primarily by alignment conventions of the dominant vector. My notes on the language design suggest that exceptions to the regular sense of alignment are most often associated with vectors corresponding, in a verb-with-arguments language, to conjunctions and helping verbs.
Combined role particles play a significant part in this because, it turns out, the "standard" meaning of the combined role particles —to align the same role of both vectors, thus la = lu li, sa = so se, etc.— is rarely wanted. The combined role particles are therefore an especially likely choice for reassignment by convention based on more realistic uses of a particular vector. A given vector often has some practical use, due to the particular meaning of that vector, for alignments that involve multiple roles of each vector (as a simple example, one might equate the cursor of both vectors, and at the same time equate the end of the first vector with the start of the second); or, sometimes, for some other more peculiar alignment strategy appropriate to the vector's particular meaning; and combined role particles are routinely drafted for the purpose.
Several rather ordinary vectors have some role that, by the nature of their meaning, is often a complex information structure described by a subordinate clause, and therefore they use the combined role particle on that role to imply a subordinate noncommittal content particle (e): losua la — (say that —), lawaja la — (teach that —), susua wa — (dream that —); sofoa (deduce) and sooa (imply) do this on multiple roles. A more mundane example of variant alignment conventions (not involving implied content particles) applies to stem seli-, go repeatedly, whose cursor is the set of instances of repeated going. When dominant in an alignment with combined cursor particle la, the subordinate vector is what is done repeatedly (a restrictive alignment); subordinate start, path, and end are assigned to those dominant roles, while subordinate cursor is assigned to dominant pivot. Preceding seli- by a number indicates the number of repetitions; for example, siwea selia la jasua = sneeze three times. (In fact, this can be shortened to siwe seli jasua; see my earlier remark on interesting examples.)
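One might model such conventions as a lookup keyed on the dominant vector's stem, falling back to the regular same-role-on-both reading. A speculative sketch (the table layout and names are mine; only a few attested combined particles are included, and the implied-content-particle conventions are omitted for simplicity):

```python
# Regular reading of a combined role particle: align the same role
# of both vectors (dominant role, subordinate role).
REGULAR = {"la": [("cursor", "cursor")], "sa": [("end", "end")],
           "ja": [("path", "path")],   "wa": [("pivot", "pivot")]}

# Per-vector overrides: seli- (go repeatedly) dominant with la maps
# subordinate start/path/end to the same dominant roles, and the
# subordinate cursor to the dominant pivot.
CONVENTIONS = {
    "seli": {"la": [("start", "start"), ("path", "path"),
                    ("end", "end"), ("pivot", "cursor")]},
}

def resolve(dom_stem, particle):
    """Role pairs implied by this particle with this dominant vector."""
    return CONVENTIONS.get(dom_stem, {}).get(particle, REGULAR[particle])

print(resolve("losu", "sa"))  # the regular reading: [('end', 'end')]
print(resolve("seli", "la"))  # the vector's own convention
```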
A moderately irregular configuration is two neutral vectors used consecutively in a clause with no explicit particle between them. The strictly regular language assigns no meaning to this arrangement, as there are no gender suffixes on the vectors to determine default roles when omitting the role particles; the configuration has to depend on conventions of the dominant (or, less likely, the subordinate) vector. The language notes stipulate that this type of alignment is restrictive.
Patterns of variation

The alignment idiosyncrasies of particular vectors fall into overall patterns. At the start of Lamlosuo I didn't see this coming, which in retrospect seems part of my general failure to appreciate that irregularity is more than skin deep. As more vectors were explored in the lexicon, though, I began to sense the shapes of these patterns beneath the surface, and then tried to work out what some of them were.
Because these are patterns that arise in other patterns that arise in the language, they compound the ambiguity between (again) the language's platonic structure versus my latent biases of thinking: each lexicon entry is subject to this ambiguity, both in the choice of the entry and in its articulation, while the perception of patterns amongst the data points is ambiguous again. This blog post has a lopsided interest in the platonic structure —my biases would be entirely irrelevant if not for the drive to subtract them from the picture— but I'd recommend here to not stint on even-handed skepticism. Vulnerable as the process is to infiltration by biases of thinking (the phrase "leaky as a sieve" comes to mind), it should be no less vulnerable to infiltration from the platonic structure of the language. Influences from the platonic realm can seep in both directly by perturbing interplay of language elements, and indirectly by perturbing biases of thought at any point in the backstory of the thought. Biased influence can therefore be platonic influence; or, putting the same thing another way, the only biases we'd want to subtract from the picture are those that aren't ultimately caused by the platonic structure. However murky the process gets, I'd still hope for the emergent patterns to carry evidence of the platonic structure.
Very early on, I'd speculated consecutive neutral vectors might align by chaining sequentially, cursor-to-cursor and end-to-start. This in its pure form looked less plausible as the lexicon grew, as it became clear that many vectors were of the wrong form. (For instance, aligning losua susua in this way —susua means sleep— would equate the message with the person who sleeps, and the audience with the act of falling asleep.) Another early notion was that some vectors would be used to modify other vectors, by aligning in parallel with them — equating cursor-to-cursor, path-to-path, start-to-start, end-to-end. I've called these modifiers advectors. Parallel alignment could be assigned, by dominant-vector-driven convention, to consecutive neutral vectors, and perhaps to the combined path particle (ja, which in this case would take on restrictive effect by convention). The sequential/parallel preference also arises in the semantics of more general alignments, such as the sentence (mentioned earlier) losua fu li susua, which describes a speech act and a sleep act, both by the same person (dominant start, the speaker, is aligned with subordinate cursor, the sleeper); to understand the import of the alignment, one has to know whether the speaking and sleeping events take place in parallel (so that the person is speaking while sleeping) or in series (so that the person speaks and then sleeps).
When the merger of two vectors allows their combination to be treated as a single vector, the vector stems may be concatenated directly, forming a compound stem which can then be engendered after the merge. For example, stem lolulelo- means father, sequentially combining lolu- (impregnate) and lelo- (give birth to). According to current notes, btw, lolulelo- has shortened form lole-.
When a series of consecutive neutral vectors forms a compact clause, short of merging into a single compact vector, I've considered a convention that the neutral class suffix may be omitted from all but the last of the sequence — "in all but the most formal modern usage", as the language notes say. (Evidently I hesitated over this, as the LaTeX document has a boolean flag in it to turn this language feature on or off; but it's currently on.)
Accumulating vocabulary gradually revealed that pivots generally fell into several groupings: a reference point defining the action (whence the term pivot); an intermediate point on the path; a motivation for the action; an agent causing the action; an instrument; a vehicle. Listing these now, it seems apparent these are the sorts of things that —in a typical European natlang— might well manifest as a clutter of more-or-less-obscure noun cases. I'd honestly never thought of those sorts of clutters-of-noun-cases as a form of intermediate-grade irregularity (despite having boggled at, say, Finnish locatives); and now I'm wondering why I hadn't.
Eventually, I worked out a tentative system of three logical roles —patient, agent, instrument— superimposed on the five concrete roles. These logical roles would map to concrete roles identifying the associated noun primarily affected by the action (patient), initiating the action (agent), and implementing the action (instrument). Of the three, only patient is mandatory; agent and instrument often occur, but sometimes either or both don't. Afaics, agent and instrument are always distinct from each other, but either may map to the same concrete role as patient.
Patient is usually either cursor or end, though occasionally pivot or start; "path patient", say my notes, "is unattested". Agent is usually either cursor, start, or pivot; if the patient is the cursor, usually the agent is either pivot or cursor. Instrument is usually either cursor or pivot: pivot when cursor is agent, cursor when cursor isn't agent. Patient also correlates with natlang verb valency: when a vector corresponds to an intransitive verb, its patient is almost always the cursor, when to a transitive verb, its patient is typically the end.
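Some of these tendencies can be stated as hard constraints; here's a hedged sketch of a validity check for a logical-to-concrete role assignment (my own formalization, keeping only the absolute claims above and deliberately leaving out the "usually" clauses):

```python
# Sketch of the absolute constraints on logical roles: patient is
# mandatory, path patient is unattested, and agent and instrument
# are always distinct from each other (though either may coincide
# with the patient).
def check_logical_roles(mapping):
    """mapping: logical role name -> concrete role name (or absent)."""
    patient = mapping.get("patient")
    assert patient, "patient is the only mandatory logical role"
    assert patient != "path", "path patient is unattested"
    agent, instrument = mapping.get("agent"), mapping.get("instrument")
    if agent and instrument:
        assert agent != instrument, "agent and instrument are distinct"
    return True

# e.g. a transitive-verb-like vector: patient on end, agent on start
check_logical_roles({"patient": "end", "agent": "start",
                     "instrument": "pivot"})
```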
For some time it remained unclear whether the logical roles should be considered a platonic feature. I've often taken a "try it, see if it works" attitude toward adding things to the language, which is after all meant to be a testbed; the eventual rough indicator of a feature's platonic authenticity (platonicity?) is then how well it takes hold in the language once added. A few of the things I've added just sat there inertly in the language design, until eventually discarded as failing to resonate with the design (such as a vector equivalent of the English verb be; which in retrospect clashes with the Lamlosuo design, both as copula which is what role particles are for, and as pure being whereas vectors impose motion on everything they portray). Given some time to settle in, logical roles appear reasonably successful, having deeply integrated into some inner workings of the language: various sorts of alignments both guide and are guided by logical roles. Alignment guides logical roles, notably, in restrictive sequential or parallel alignments; for example, an advector inherits the logical roles of the other vector in parallel alignment. Logical roles guide alignment in the highly irregular vector(s) at the apparent heart of the language, which I'll describe momentarily.
I wondered about aspect —the structure of an activity with respect to time (as opposed to its placement in time, which is tense)— for the prototype language, since aspect is a prominent feature of human natlangs. Aspect has arisen in Lamlosuo mainly through the principle that the action of a neutral vector is usually supposed by default to happen once, whereas the action of an engendered vector is usually supposed to happen habitually. Thus, in losua fu li susua someone speaks and then sleeps, whereas in losutu li susua a habitual speaker sleeps. Usually, in a restrictive alignment, aspect too is inherited by the dominant vector, which affords some games with aspect by particular vectors (deferred to the next section below). If one wanted more nuanced sorts of aspect in the testbed language, one might introduce them through alignments with particular vectors that exist to embody those aspects; however, I never actually did this. Allowing myself to be guided by whatever "felt" natural to pursue (so one may speculate what sort of butterfly started the relevant breeze), my explorations led me instead to something... different. Not relating a vector to time, but rather taking "tangents" to the basic vector at various points and in various abstract-spatial directions. As the trend became more evident, I dubbed that sort of derived relation attitude. (My language notes assert, within the fictional narrative, that the emphasis on attitude rather than aspect is a natural consequence of the language speakers' navigational mindset.) Some rather mundanely regular particular vectors were introduced to support attitudes; looking through the lexicon, I see stems jali- (leave), jeli- (go continuously), joli- (arrive), supporting respectively the inceptive, progressive, and terminative attitudes.
Extraordinary idiosyncrasies

In any given language, it seems, there's likely to be some particular hotspot in the vocabulary where idiosyncrasies cluster. Hopefully, the location of such a hotspot ought to say something profound about the language model, though as usual there's always potential bias-of-thought to take into account. The English verb be is a serious contender for the most irregular verb in the language, with do coming in a respectable second to round out this semantic heart of the language structure. As noted earlier, I've sometimes referred to human languages as "being-doing languages"; and occasionally my notes have called vector languages "going languages". Early on, I simplistically imagined that a generic vector meaning go might be the center of the language. Apparently not, though; in the central neighborhood, yes, but not at the very heart. The stand-out vector that's accumulated irregularity like it's going out of style is fajoa — meaning change state.
A sort of microcosm for this hotspot effect occurs in the finitely bounded set of Lamlosuo's vector prefixes (which, by the phonotactics described earlier, are each consonant-vowel-consonant, so there are at most 5×5×2 = 50 of them, or 25 if no two prefixes differ only in their final consonant; the current lexicon has 12, which is about 50% build-out and feels fairly dense). Most of the prefixes are fairly straightforward in function (since prefix jun- makes a vector reflexive, junlosua would be talk to oneself; and so on). The most exceptional prefix, consistently through the evolution of the language, has been lam-, which makes the vector deictic, i.e., makes it refer to the current situation. The deictic prefix, as I've used it, is rather strongly transformative and I've used it only sparingly, on a few vectors where its effect is especially useful; in particular, stems losu-, sile-, jilu-. (Though I would expect a fluent speaker to confidently use lam- in less usual ways when appropriate, as fluent speakers are apt to occasionally bend their language to the bafflement of L2 speakers.)
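As a quick sanity check on that arithmetic, here's a sketch in Python. Note the inventories are my own inference from the example words in this post (consonants f, j, l, s, w; vowels a, e, i, o, u; prefix-final consonants m, n, from attested lam-, fam-, jun-, sum-), not an authoritative statement of the phonology:

```python
# Count the possible CVC vector prefixes, using inventories inferred
# (hypothetically) from the example words in this post.
from itertools import product

consonants = "fjlsw"   # inferred from attested stems/prefixes
vowels = "aeiou"
finals = "mn"          # inferred from lam-, fam-, jun-, sum-

prefixes = ["".join(p) for p in product(consonants, vowels, finals)]
assert len(prefixes) == 50            # 5 × 5 × 2 possible CVC prefixes

# Merge prefixes that differ only in their final consonant:
distinct_cv = {p[:2] for p in prefixes}
assert len(distinct_cv) == 25

attested = 12                         # current lexicon, per the text
print(f"build-out: {attested / len(distinct_cv):.0%}")   # → 48%, "about 50%"
```

This is just the counting argument from the paragraph above made mechanical; the interesting part is how quickly a 25-slot space starts to feel dense at 12 occupants.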
Stem lamlosu- is the speaking in which the word itself occurs. Several of its engendered forms are particularly useful; lamlosuo (volitional pivot) is the living language of the speaking in which the word occurs, hence the conlang itself (viewed fictionally as a living language); lamlosuso (volitional end) is the audience, thus the second-person pronoun; lamlosutu (volitional start) is the speaker, thus the first-person pronoun. The latter two are contracted (a purely syntactic form of irregularity, motivated by convenience/practicality) to laso and latu.
Stem sile- means experience the passage of time; the cursor is the experiencer; path, time; start, the experiencer's past; end, their future; pivot, the moment of experience. lamsile- is the immediate experience-of-time, whose pivot is now; after working with it for a while, I adopted a convention that the past/present/future might colloquially omit the prefix. Tense is indicated by aligning a clause with engendered siletu (past), silewo (present, if one wants to specify that explicitly), or sileso (future). Hence, latu fi losua oa siletu = siletu a latu fi losua = I spoke.
Stem jilu- means go or travel in a generic sense (whereas go in a directional sense is wilu-). lamjilua is the going we're now engaged in; its cursor is an inclusive first-person pronoun (we who are going together); path, the journey we're all on (i.e., the activity we're engaged in); pivot, here; end (or occasionally start), there. With preposition sum indicating a long path, this enters into the formal phrase sum lamsiletu sum lamjiluse: long ago and far away.
Now, fajo-. Change state. The cursor is the thing whose state changes. Non-volitional path is the process of state-change, volitional path is the instrument of state-change. Non-volitional pivot is an intermediate state, volitional pivot is the agent of state-change. Start and end, both non-volitional, are the state before and after the change.
When dominant fajo- aligns its cursor with some role of a subordinate vector, fajo- is the state change undergone by the aligned subordinate role during the action of the subordinate vector. Either the dominant role, the subordinate role, or both may be elided; the dominant role when unspecified defaults to cursor —even if fajo- is engendered, an extraordinary exception— while the subordinate role when unspecified defaults to patient, making the meaning of the construct overtly dependent on which concrete role of the subordinate vector is the patient. Along with all this, the dominant pivot aligns to the subordinate agent, and dominant path to subordinate instrument (when the subordinate vector has those logical roles). According to the language notes, if the subordinate vector doesn't have an agent, and the subordinate pivot is an intermediate point on the subordinate path (as e.g. for sile-), and the subordinate cursor aligns with the dominant cursor, the dominant pivot is the state of the subordinate cursor as it passes through the subordinate pivot.
One thus has such creatures as fajoti losutu, the state of having not yet spoken; and fajose losuso, the state of having been spoken to. (Notice that these things take many more words to say in English than in Lamlosuo, whereas the past tense took many more words to say in Lamlosuo than in English.)
Cursor-aligned fajo- can also take the form of a preposition fam or prefix fam-, the difference between the two being that engenderment of the vector is applied after a prefix, but before a preposition. Thus, fam〈stem〉〈gender〉 = fajo〈gender〉 〈stem〉a. For example, susue = event of dreaming, famsusue = fajoe susua = state of dreaming.
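The rewrite rule lends itself to a mechanical statement. A minimal sketch in Python, under the assumption that the gender-suffix list below (just those suffixes attested in this post's examples) is adequate; the function itself is my own construction, not part of the language notes:

```python
# Hedged sketch of the rule  fam<stem><gender> = fajo<gender> <stem>a.
# GENDERS lists only suffixes attested in examples in this post,
# two-letter suffixes first so they match before the bare vowels.
GENDERS = ["tu", "so", "ti", "se", "wo", "lu", "jo", "o", "e", "a"]

def expand_fam(word: str) -> str:
    """Rewrite a fam-prefixed form as the equivalent fajo- phrase."""
    assert word.startswith("fam")
    body = word[3:]
    for g in GENDERS:
        if body.endswith(g) and len(body) > len(g):
            stem = body[:-len(g)]
            return f"fajo{g} {stem}a"
    raise ValueError(f"no recognized gender suffix in {word!r}")

assert expand_fam("famsusue") == "fajoe susua"    # state of dreaming
assert expand_fam("famsileti") == "fajoti silea"  # youth (a later example)
```

The longest-suffix-first matching is doing real work here: a naive split on the final vowel would mangle forms like famsileti, whose gender suffix is two letters.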
When dominant fajo- aligns its path with a subordinate content clause, fajo- is the state change vector of the complex process described by the content clause. Combined role particle ja initiates a noncommittal content clause by implying subordinate content particle e. The dominant cursor is then the situation throughout the process, dominant start the situation before the process, dominant end the situation after the process, dominant pivot the agent of the process.
fajoa has siblings lajoa and wajoa.
lajoa describes a change of mental state. Dominant path of lajo- doesn't align with a subordinate clause, but dominant cursor aligns similarly to fajo-, describing the change of mental state of whichever participant in the subordinate action; note that the agent, if not otherwise determined, is the cursor's inclination toward the change (always available in the volitional pivot engendered form, lajoo). For example, recalling siletu = past (earlier point in time), where fajoti silea = famsileti = youth (external state at earlier point in time), lajoti silea = inexperience (internal state at earlier point in time). When the subordinate vector already describes the mind, fajo- describes mental state, and lajo- is not used; e.g., famsusue = state of dreaming is primarily an internal state.
wajoa describes the abstract process of being used as an instrument. Cursor, instance of use; non-volitional path, process of use; volitional path, person who uses; (volitional/non-volitional) pivot, agent of use; (volitional/non-volitional) start, instrument of use; end, patient of use. Alignment is similar to fajo-, but subordinate role defaults to instrument rather than patient. For example, wajoo jilua = person who uses a vehicle or riding beast, wajoo jilue = person who uses a vehicle, wajoo jiluo = person who uses a riding beast.
On the periphery of this central knot of irregularity is jilu-, meaning (again) go or travel in a generic sense. When dominant in an alignment with combined path particle ja, the role particle implies subordinate noncommittal content particle e, and jilu- aligns in parallel (it's an advector) to whatever complex process is described by the following subordinate clause. (I don't group this with the larger family of mundane vectors using combined role particles to imply subordinate content, because here the alignment is implicitly restrictive and doesn't follow from complexity in the semantics of the vector, as with teach (lawa-), imply (soo-), etc.) Here the alignment is purely a grammatical device; it unifies a complex process from the subordinate clause into a coherent vector, and objectifies it as the volitional path (engendered form jilujo). More subtly, jilua ja with an engendered subordinate vector can provide a neutral vector with habitual aspect: jilua wo jeoe = go using a fast vehicle (once), jilua ja jeoe = habitually go using a fast vehicle.
One can (btw) also play games with habitual aspect in using fajo-, exactly because it doesn't inherit the aspect of the subordinate clause: engendering fajo- gives the state change habitual aspect, but gender in the subordinate clause does not. Thus, latu we fajoa ja susua lu laso = I (once) cause you to (once) sleep; latu we fajoa ja susulu laso = I (once) cause you to habitually sleep; latu fajoo ja susua lu laso = I habitually cause you to (once) sleep; latu fajoo ja susulu laso = I habitually cause you to habitually sleep. (Why I would have this soporific effect, we may suppose, is explained by the context in which the sentence occurs.)
Whither Lamlosuo?

After a while —perhaps a year or more of tinkering— Lamlosuo began to take on an increasingly organic texture. Natlangs draw richness from being shaped by many different people; a personal project, I think, when carried on for a long time starts to accrue richness from essentially the same source: its single author is never truly the same person twice. If you set aside the project and come back to it a week or a month later, you're not the same person you were when you set it aside; besides the additional things you've experienced in that time, most people would also no longer be quite immersed in some project details and would likely develop a somewhat different experience of them while reacquiring. So the personal project really is developed by many people: all the people that its single author becomes during development. This enrichment cannot be readily duplicated over a short time, because the author doesn't change much in a short time. This may be part of why the most impressive conlangs tend to be decades-long efforts; of course total labor adds up, but also, richness adds up.
The most active period of Lamlosuo development tailed off after about three years, due to a two-part problem in the vocabulary — phonaesthetic and semantic.
The phonology and phonotactics of Lamlosuo (whose conception I discussed a bit in the earlier post) are flat-out boring. There are just-about no internal markers indicating morphological structure within a vector stem —even the class suffix is generally hard to recognize as not part of the stem— so there has been a bias toward two-syllable vector stems; it's been my perception that uniformly two-syllable simple stems help a listener identify the class suffix, so that nonuniform stem lengths (especially, odd-syllable-count stems) can be disorienting. There are only a rather small number of two-syllable stems possible (basically, 5⁴ = 625) and, moreover, packing those stems too close together within the available space not only makes them harder to remember, but harder even to distinguish. After a while I reformed the lexicon a bit by easing in some informal principles about distance between stems (somewhat akin to Hamming distance) and some mnemonic similarities between semantically related stems. The most recent version of the language document has 70 simple vector stems.
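The distance idea can be sketched mechanically. Here's a minimal illustration in Python, assuming a simple position-count (Hamming-style) distance over equal-length CVCV stems; this is my own construction for illustration, and the informal principles in the actual language notes are surely subtler:

```python
# Hamming-style distance between two-syllable CVCV stems: the number of
# positions at which the stems differ.
def stem_distance(a: str, b: str) -> int:
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

assert 5 ** 4 == 625                        # the whole CVCV stem space

# Stems quoted earlier in the post. Semantically related attitude stems
# are kept mnemonically close (distance 1)...
assert stem_distance("jali", "jeli") == 1   # leave / go continuously
assert stem_distance("jeli", "joli") == 1   # go continuously / arrive
# ...while semantically opposed stems sit a little farther apart.
assert stem_distance("fulo", "jolo") == 2   # go wrongly / go rightly
```

The tension described above falls straight out of this: mnemonic similarity pulls related stems to distance 1, while distinguishability wants the lexicon spread out in a space of only 625 points.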
Semantically, a large part of the extant vocabulary is about the mechanics of saying things — attitude, conjunctions, numbers. One also wants to have something to talk about. Not wanting to build social biases into a vocabulary that didn't yet have a culture attached to it, I started with vocabulary for rather generic biological functions (eat, sleep...) and navigational maneuvers (go fast/slow, go against the current...) on the —naive— theory this would be "safe". Later, with the mechanics-oriented vocabulary more complete, a small second wave of content-oriented words ventured into emotional, intellectual, and spiritual matters. (The notes outline somewhat more ambitious spiritual structure than has been implemented yet; though I do rather like the stems deployed so far (speaking of bias) — fulo-, go wrongly, go contrary to the proper order of things; jolo-, go rightly, go with the proper order of things; wio-, inform emotion with reason; wie-, inspire reason with emotion.)
I did take away some lessons from building content vocabulary for Lamlosuo. The vector approach has a distinctly dynamic effect on the language's outlook, since it doesn't lend itself to merely labeling things but asks for some sort of "going" to underlie each word. This led, for instance, to the coining of two different words for blood, depending on what activity it's engaged in — jesalu (circulating blood) and fesalu (spilt blood). Also, just as the vector concept induces conception of motions for a given noun, the identification of roles for each vector induces conception of participants for a given activity; for instance, in trying to provide a vector corresponding to English adjective fast, one has first advector jeoa, go at high speed, from which one then gets jeolu (fast goer), jeoe (fast vehicle), jeoo (fast riding beast).
The dynamism of everything being in motion is accompanied by a secondary effect playing in counterpoint: whereas human languages tend to provide each verb with a noun actor, Lamlosuo is more concerned to provide each vector with a noun affected. This is a rather subtle difference. The human-language tendency manifests especially in the nominative case (which of course ergative languages don't have, but then, accusative languages are more common); the Lamlosuo tendency is visible in the stipulation that the patient logical role is mandatory while the agent role is optional (keeping in mind, my terms patient and agent for Lamlosuo have somewhat different connotations than those terms as commonly used in human grammar: affected is not quite the same as acted upon). The distinction seems to derive from the relatively calm, measured nature of the vector metaphor for activity: while going is more dynamic than being, it is on balance less dynamic than most forms of doing. (If there's a bias there in my patterns of thought, I'm not sure its effect on this feature could be distinguished from its effect on the selection of the vector primitive in the first place.)
From time to time as Lamlosuo has developed, I've wondered about personal names. If even labeling ordinary classes of things requires the conception of underlying motions, how is one to handle a label meant to signify the unique identity of a particular person? I would resist a strategy for names that felt too much like backsliding into "being-doing" mentality, since much of the point of the exercise is to try something different (and since, into the bargain, any such backsliding would be highly suspect of bias on my part; not that I'd absolutely stonewall such a development, but the case for it would have to be pretty compelling). Early in the development of Lamlosuo, I was able to simply defer the question, as at that point questions about the language that had answers were the exception, and this was just one more in the general sea of unknowns. Lately, though, in closely studying the evolution of abstract thought in ancient Greece (reading Bruno Snell's The Discovery of the Mind, as part of drafting a follow-up to my post on storytelling), I'm struck by how Lamlosuo's ubiquity of process sits in relation to Snell's analysis of abstract nouns, concrete nouns, and proper nouns (and verbs, to which Snell ascribes a special part in forming potent metaphors). The larger conlanging project, within which Lamlosuo serves, posits a long timeline of development of the conspeakers' civilization, and as I look at it now this raises the question of how their abstractions evolved. Mapping out the evolution might or might not provide, or inspire, a solution to the naming problem; at any rate, it's deeply unclear at this point what these factors imply for Lamlosuo, as well as for the larger project.
Avoiding cultural assumptions in the vocabulary created a highly clinical atmosphere (which is why I called the hope of culture-neutrality "naive": lack of cultural features is a kind of culture; also note, human culture ought to contain traces from the evolution of abstract thought). Each word tended to be given a rather pure, antiseptic meaning (until late in the game when I started deliberately working in a bit of flavor), heightening a trend already latent in the cut-and-dried mechanics of the language that arose from its early intent, as a testbed, to not bother with naturalism (so, in a way all this traces back to regularity). For example: hoping to insulate the various sex-related vocabulary words from lewd overtones, I set out to fashion an advector corresponding to the English adjective obscene, so that one might then claim the various other words weren't obscene without the advector (which of course amounts to making those other words more clinical). The result took on a life of its own. Advector josu-, do something obscene (with absolutely no implication whatever as to what is done); start agent; end patient; pivot instrument. One is naturally led to consider the difference between a non-volitional instrument and a volitional instrument. Throw in the reflexive prefix and, for extra vitriol, an invitational mood particle, and you've got i junjosua, which the language notes translate as "FU", but really it's more precise, more... clinical than that.
One natural next major step for Lamlosuo —if there were to be a next major step, rather than moving on to the other languages it was meant to prepare the way for— would be a push to significantly expand the vocabulary, to allow testing the dynamics of larger discourses. (I wrote a long post a while back about long discourses.) However, the bland, narrow vocabulary space seemed an obstacle to this sort of major vocabulary-expansion operation. A serious naturalistic conlang would combat this sort of problem partly through the richness that, as noted, comes from developing in many sessions over a long time; but ultimately one also has to mix this with some technological methods. Purely technological methods would always create something with an artificial feel, so one really wants to find ways of using technological methods to amplify whatever sapient richness is input to the system; and that sounds like an appropriate study for a testbed language such as Lamlosuo. Moreover, I just don't readily track all the complex details of a linguistic project like this — not if it's skipping like a pebble across time, with intervals between development sessions ranging from a few hours to a few years; I therefore imagined some sort of automated system that would help keep track of the parts of the language design, noting which parts are more, or less, conformant to expected patterns — and why. (I'm very much aware that, in creating such designs, to maintain an authentic sapient pattern you need to be able to explain an exception just once and not have the system keep hounding you about it until you give the answer the automated system favors.)
And at this point, things take an abrupt turn toward fexprs. (Cf. the law of the instrument.) My internal document describing the language is written in LaTeX. Yet, as just described, I'd like it to do more, and do it ergonomically. As it happens, I have a notion how to approach this, dormant since early in the development of my dissertation: I've had in mind that, if (as I've been inclined to believe for some years now) fexprs ought to replace macros in pretty much all situations where macros are used, then it follows that TeX, which uses macros as its basic extension mechanism, should be redesigned to use fexprs instead. LaTeX is a (huge) macro package for TeX.
So, Lamlosuo waits on the speculative notion of a redesign of TeX. It seems I ought to come out of such a redesign with some sort of deeper understanding of the practical relationship between macro-based and fexpr-based implementations, because Knuth's design of TeX is in essence quite integrated — a daunting challenge to contemplate tampering with. (One also has to keep in mind that the extreme stability of the TeX platform is one of its crucial features.) It's rather sobering to realize that a fexpr-based redesign of TeX isn't the most grandiose plan in my collection.
Saturday, September 22, 2018
Discourse and language
"Of course, we practiced with computer generated languages. The neural modelers created alien networks, and we practiced with the languages they generated."
Ofelia kept her face blank. She understood what that meant: they had created machines that talked machine languages, and from this they thought they had learned how to understand alien languages. Stupid. Machines would not think like aliens, but like machines.
— Remnant Population, Elizabeth Moon, 1996, Chapter Eighteen.

A plague o' both your houses!

— Mercutio, Romeo and Juliet, William Shakespeare, circa 1591–5, act 3 scene 1.
I've realized lately that, for my advancement as a conlanger, I need to get a handle on discourse, by which I mean, the aspects of language that arise in coordinating texts beyond the scope of the ordinary grammatical rules of sentence formation. Turns out that's a can of worms; in trying to get a handle on discourse, I find myself confronting what are, as best I can work out, some of the messiest controversies in linguistics in the modern era.
I think conlanging can provide a valuable service for linguistics. My purpose in this post is overtly conlinguistic: for conlanging theory, I want to understand the linguistic conceptual tangle; and for conlanging practice, I want methodology for investigating the discursive (that would be the adjectival form of discourse) dynamics of different language arrangements. But linguistics —I've in mind particularly grammar— seems to me to have gotten itself into something of a bind, from a scientific perspective. It's got a deficit of falsifiable claims. Since we've a limited supply of natural languages, we'd like to identify salient features of the ones we have; but how can we make falsifiable claims about which features matter unless we have an unconstrained space of possibilities within which we can see that natural languages are not randomly distributed? We have no such unconstrained space of possibilities; it seems we can only define a space of possibilities by choosing a particular model of how to describe language, and inability to choose between those models is part of the problem; they're all differently constrained, not unconstrained. Conlanging, though, lets us imagine possibilities that may defy all our constrained models — if we don't let the models constrain our conlanging — and any tool that lets us do conlang thought-experiments without choosing a linguistic model should bring us closer to glimpsing an unconstrained space of possibilities.
As usual, I mean to wade in, and document my journey as well as my destination; such as it is. In this case, though a concrete methodology is my practical goal, it will be all I can manage, through mapping the surrounding territory (theoretical and practical), to see the goal more clearly; actually achieving the goal, despite all my striving for it here, will be for some future effort. The search here is going to take rather a lot of space, too, not least when I start getting into examples, since it's in the nature of the subject —study of large language structures— that the examples be large. If one wants to get a handle on a technique suitable for studying larger language structures, though, I reckon one has to be willing to pay the price of admission.
Contents
Advice
Academe
Gosh
Easy as futhorc
Revalency
Relative clauses
The deep end
Some bits of advice I've picked up from on-line conlangers:
The concept of morphemes — word parts that compose to make a word form, like in‑ describe ‑able → indescribable — carries with it a bunch of conceptual baggage, about how to think about the structure of language, that is likely counterproductive to inject into conlanging. David J. Peterson makes this point.
Many local features of language are best appreciated when one sees how they work in an extended discourse. This has been something of a recurring theme on the Conlangery Podcast, advice they've given about a variety of features.
Ergativity, a feature apparently fascinating to many conlangers, may not even be a thing. I was first set on to this objection by the Conlangery Podcast, who picked it up from the linguistic community, where it occurred, afaict, in the keynote address of a 2005 conference on ergativity.
The point about morphemes is that they are just one way of thinking about how word forms arise. In fact, they are an unfalsifiable way to think about it: anything that language does, one can invent a way to describe using morphemes (ever hear of a subtractive morpheme?). That might be okay if you're trying to make sense of the overwhelming welter of natural languages, but it doesn't make morphemes a good way to construct a language; at least, not a naturalistic artlang. If you try to reverse-engineer morphemic analysis to construct a language, you'll end up with a conlang whose internal structure is the sort of thing most naturally described by morphemic analysis. It's liable to feel artificial; which, from a linguistic perspective, may suggest that morphemic analysis isn't what we're doing when we use language.
There are a couple of alternatives available to morphemic analysis. There's lexeme-based morphology; that's where you start with a "stem" and perform various processes on it to get its different word forms. For a noun that's called declining, for a verb it's conjugating. The whole collection of all the forms of the word is called a lexeme; yes, that's another -eme word for an abstract entity defined by some deeper structure (like a phoneme, abstracted from a set of different allophones that are all effectively the same sound in the particular language being studied; or a morpheme that is considered a single abstract entity although it may take different concrete forms in particular words, like English plural -s versus -es that might be analyzed as two allomorphs of the same morpheme; though afaik there's no allo- term corresponding to lexeme). The third major alternative approach is word-based morphology, in which the whole collection of word forms is considered as a set, rather than trying to view it as a bunch of different transformations applied to a stem. For naturalistic artlanging — or for linguistic experimentation — word-based morphology has the advantage that it doesn't try to artificially impose some particular sort of internal structure onto the word; but then again, not all natlangs are equally chaotic. For example, morpheme-based morphology is more likely to make sense if you're studying an agglutinative language (which prefers to simply concatenate word parts, each with a single meaning), while becoming more difficult to apply with a fusional language (where a single word part can combine a whole bunch of meanings, like a Latin noun suffix for a particular combination of gender, case, and number).
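For concreteness, here's a toy contrast of two of these approaches in Python (entirely my own illustration, using the English examples from this post; no claim that real morphological analysis works this way):

```python
# Morpheme-based view: a word form is a concatenation of meaning-bearing
# parts. Even this simple case needs an adjustment rule (dropping silent e
# before a vowel), a hint of the conceptual baggage mentioned above.
def compose(prefix: str, stem: str, suffix: str) -> str:
    if stem.endswith("e") and suffix[0] in "aeiou":
        stem = stem[:-1]                    # describe + able -> describable
    return prefix + stem + suffix

assert compose("in", "describe", "able") == "indescribable"

# Word-based view: the lexeme is simply the whole set of its forms, with
# no claim about internal structure, so a suppletive form like "went"
# is no more remarkable than a regular one.
lexeme_go = {"go", "goes", "went", "gone", "going"}
assert "went" in lexeme_go
```

The asymmetry is the point: compose has to encode rules and exceptions up front, while the set-of-forms view pays for its flexibility by explaining nothing about why the forms look as they do.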
As I've tried to increase my sophistication as a conlanger, more and more I've come up against things for which discourse is recommended. But, I perceive a practical problem here. Heavyweight conlangers tend to be serious polyglots. Such people tend to treat learning a new language as something done relatively casually. Study the use of this feature in longer discourses, such a person might suggest, to get a feel for how it works. But to do that properly, it seems you'd have to reach a fairly solid level of comfort in the language. Not everyone will find that a small investment. And if you want to explore lots of different ways of doing things, multiplying by a large investment for each one may be prohibitive.
So one would really like to have a method for studying the discursive properties of different language arrangements without having to acquire expensive fluency in each language variant first. Keeping in mind, it's not all that easy even to give one really extended example of discourse, nor to explain to the reader what they ought to be noticing about it.
Okay, so, what's the story with ergativity? Here's how the 2005 paper explains the general concern:
when we limit a collection to certain kinds of specimens, there is a question whether a workshop on "ergativity" is analogous to an effort to collect, say, birds with talons — an important taxonomic criterion —, birds that swim — which is taxonomically only marginally relevant, but a very significant functional pattern —, or, say, birds that are blue, which will turn out to be pretty much a useless criterion for any biological purpose.

— Scott Delancey, "The Blue Bird of Ergativity", 2005.

The paper goes on to discuss particular instances of ergativity in languages, and the sense I got in reading these discussions was (likely as the author intended) that what was going on in these languages was really idiosyncratic, and calling it "ergativity" didn't begin to do it justice.
Now, another thing often mentioned by conlangers in the next breath after ergativity is trigger languages. Trigger languages are yet another way of marking the relations between a verb and its arguments, different again from nominative-accusative languages (like English) or ergative-absolutive languages (like Basque). There's a catch, though. Trigger languages are a somewhat accidental invention of conlangers. They were meant to be a simplification of Austronesian alignment — but (a) there may have been some misunderstanding of what linguists were saying about how Austronesian alignment works, and (b) linguists are evidently still trying to figure out how Austronesian alignment works. From poking around, the sense I got was that Austronesian alignment is only superficially about the relation of the verb to its arguments, really it's about some other property of nouns — specificity? — and, ultimately, one really ought to... study its use in longer discourses, to get a feel for how it works. Doh.
Another case where exploratory discourse is especially recommended is nonconfigurationality. Simply put, it's the property of some languages that word order isn't needed to indicate the grammatical relationships of words within a sentence, so that word order within a sentence can be used to indicate other things — such as discursive structure. Here again, though, there's a catch. There's a distinct difference between "configurational" languages like English, where word order is important in determining the roles of words in a sentence (the classic example is dog bites man versus man bites dog), and "nonconfigurational" languages like ancient Greek (or conlang Na'vi), where word order within a given sentence is mostly arbitrary. However, the massively polysyllabic terms for these phenomena, "configurational" and "nonconfigurational", come specifically from the phrase-structure school of linguistics. That's, more-or-less, the Chomskyists. Plenty of structural assumptions there, with a side order of controversy. So, just as you take on more conceptual baggage by saying "bound morpheme" than "inflection", you take on more conceptual baggage by saying "nonconfigurational" than "free word order".
Is there another way of looking at "nonconfigurationality", without the phrase-structure? The Wikipedia article on nonconfigurationality notes that in dependency grammar, the configurational/nonconfigurational distinction is meaningless. One thing about Wikipedia, though: more subtle than merely "don't take it as gospel", you should think about the people who wrote what you're reading. In this case, some digging turns up a remark in the archives of Wikipedia's Linguistics project, to the effect that, we really appreciate your contributions, but please try for a more balanced presentation of different points of view, as for example in the nonconfigurationality article you spend more time talking about how dependency grammar doesn't like it than actually talking about the thing itself.
It seems Chomskyism is not the only linguistic school that has its enthusiasts.
The thought did cross my mind, at this point, that if dependency grammar fails to acknowledge the existence of a manifest distinction between configurational and nonconfigurational languages, that's not really something for dependency grammar to brag about. (Granting, this failure-to-acknowledge may be based on a narrower interpretation of "nonconfigurationality" than the phrase-structurists actually had in mind.)
In reading up on morpheme-like concepts, I came across the term phonaestheme — which caught my attention since I was aware J.R.R. Tolkien's approach to language, both natural and constructed, emphasized phonaesthetics. A phonaestheme is a bit of the form of a word that's suggestive of the word's meaning, without actually being a "unit of meaning" as a morpheme would be; that is, a word isn't composed of phonaesthemes, it just might happen to contain one or more of them. Notable phonaesthemes are gl- for words related to light or vision, and sn- for words related to the nose or mouth. Those two, so we're told, were mentioned two and a half millennia ago, by Plato.
The whole idea of phonaesthemes flies in the face of the principle of the arbitrariness of signs. More competing schools of thought. This is a pretty straightforward disagreement: Swiss linguist Ferdinand de Saussure, 1857–1913, apparently put great importance on the principle that the choice of a sign is completely arbitrary, absolutely unrelated to its meaning; obviously, the idea that words of certain forms tend to have certain kinds of meanings is not consistent with that.
That name, Ferdinand de Saussure, sounded familiar to me. Seems he was hugely influential in setting the course for twentieth-century linguistics, and he's considered the co-founder, along with C.S. Peirce, of semiotics — the theory of signs.
Semiotics definitely rang a bell for me. Not a particularly harmonious one, alas; my past encounters with semiotics had not been altogether pleasant.
Academe

Back around 1990, when I first started thinking about abstraction theory, I did some poking around to get a broad sense of who in history might have done similar work. There being no systematic database of such things (afaik) on the pre-web internet, I did my general poking around in a hardcopy Encyclopædia Britannica. Other than some logical terminology to do with defining sets, and specialized use of the term abstraction for function construction in λ-calculus, I found an interesting remark that (as best I can now recall) while Scholasticism, the dominant academic tradition in Europe during the Dark Ages, was mostly concerned with theological questions, one (or did the author claim it was the one?) non-theological question Scholastics extensively debated was the existence of universals — what I would tend to call "abstractions". There were three schools of thought on the question of universals. One school of thought said universals have real existence, perhaps even more real than the mundane world we live in; that's called Platonism, after Plato who (at least if we're not misreading him) advocated it. A second school of thought said universals have no real existence, but are just names for grouping things together. That's called nominalism; perhaps nobody believes it in quite the most extreme imaginable sense, but a particularly prominent representative of that school was William of Ockham (after whom Occam's razor is named). In between these two extremes was the school of conceptualism, saying universals exist, but only as concepts; John Locke is cited as representative (he wrote An Essay Concerning Human Understanding, quoted at the beginning of the Wizard Book for its definition of abstraction).
That bit of esoterica didn't directly help me with abstraction theory. Many years later, though, in researching W.V. Quine's dictum To be is to be the value of a variable (which I'd been told was the origin of Christopher Strachey's notion of first-class value), when I read a claim by Quine that the three early-twentieth-century schools of thought on the foundations of mathematics — logicism, intuitionism, and formalism — were latter-day manifestations of the three medieval schools of thought on universals, I was rather bemused to realize I understood what he was saying.
I kept hoping, though, I'd find some serious modern research relevant to what I was trying to do with abstraction theory. So I expanded the scope of my literature search to my alma mater's university library, and was momentarily thrilled to find references to treatment of semiotics (I'd never heard the term before) in terms of sets of texts, which sounded a little like what I was doing. It took me, iirc, one afternoon to be disillusioned. Moving from book to book in the stacks, I gathered that the central figure in the subject in modern times was someone (whom I'd also never heard of before) by the name of Jacques Derrida. But it also became very clear to me that the material was coming across to me as meaningless nonsense — suggesting that either the material was so alien it might as well have been ancient Greek (I hadn't actually learned the term nonconfigurational at that point, but yes, same language), or else that the material was, in fact, meaningless nonsense.
The modern growth of the internet, all of which has happened since my first literature searches on that subject, doesn't necessarily improve uniformly on what could be done by searching off-line through physical stacks of books and journals in a really good academic library (indeed, it may be inferior in some important ways), but what is freely available on-line can be found with a lot less effort (if you can devise the right keywords to search for; which was less of a problem for off-line searches before the old physical card catalogs were destroyed — but I digress). Turns out I'm not alone in my reaction to Derrida; here are some choice quotes about him from Wikiquote:
"Derrida's special significance lies not in the fact that he was subversive, but in the fact that he was an outright intellectual fraud — and that he managed to dupe a startling number of highly educated people into believing that he was onto something."
— Mark Goldblatt, "The Derrida Achievement," The American Spectator, 14 October 2004.

"Those who hurled themselves after Derrida were not the most sophisticated but the most pretentious, and least creative members of my generation of academics."
— Camille Paglia, "Junk Bonds and Corporate Raiders: Academe in the Hour of the Wolf", Arion, Spring 1991.

"Many French philosophers see in M. Derrida only cause for silent embarrassment, his antics having contributed significantly to the widespread impression that contemporary French philosophy is little more than an object of ridicule. M. Derrida's voluminous writings in our view stretch the normal forms of academic scholarship beyond recognition. Above all — as every reader can very easily establish for himself (and for this purpose any page will do) — his works employ a written style that defies comprehension."
— Barry Smith et al., "Open letter against Derrida receiving an honorary doctorate from Cambridge University", The Times (London), Saturday, May 9, 1992.

This is the intellectual climate in which, in the 1990s, physicist Alan D. Sokal submitted a nonsense article to peer-reviewed scholarly journal Social Text, to see what would happen — and it was published (link).
One might ask what academic tradition (if any) Derrida's work came from. Derrida references Saussure. Derrida's approach is sometimes called post-structuralism, as it critiques the structuralist tradition of the earlier twentieth century. Structuralism, I gather, said that the relation between the physical world and the world of ideas must be mediated by the structures of language. (In describing post-structuralism one may cover a multitude of sins with a delicate term like "critique", such as denying that the gap between reality and ideas can be bridged, or denying that there is such a thing as reality.) Structuralism, in turn, grew out of structural linguistics, the theory that language could be understood as a hierarchy of discrete structures — phonemes, morphemes, lexical categories, and so on. Structural linguistics is due in significant part to Ferdinand de Saussure.
It doesn't seem fair to blame Saussure for Derrida. Apparently a large part of all twentieth-century linguistic theory traces back through Saussure. Saussure's tidily structured approach to linguistics does appear to have led to both the Chomskyist and (rather less directly, afaict) the dependency grammar approaches — the phrase-structure approach is also called constituency grammar to contrast with dependency, as the key difference is whether one looks at the parts (constituents) or the connections (dependencies). Despite my suspicion that both of those approaches may have inherited some over-tidiness, I'm not inclined to "blame" Saussure for them, either; it seems to me perfectly possible that the structural strategy may have been a good way to move things forward in its day, and also not be a good way to move things forward from where we are now. The practical question is, where-to next?
That term phonaestheme, which reminded me of phonaesthetics associated with J.R.R. Tolkien? Turns out phonaestheme was coined by J.R. Firth, an English contemporary of J.R.R. Tolkien. Firth was big on the importance of context. "You shall know a word by the company it keeps", he's quoted from 1957. Apparently he favored "polysystematism", which afaict means that you don't limit yourself to just one structural system for studying language, but switch between systems opportunistically. Since that's pretty much what a conlanger has to do — whatever works — I rather like the attitude. "His work on prosody," says Wikipedia somewhat over-sagaciously, "[...] he emphasised at the expense of the phonemic principle". It took me a few moments to untangle that; it says a lot less than it seems; as best I can figure, prosody is aspects of speech sound that extend beyond individual sound units (phonemes), and the phonemic principle basically says all you need are phonemes, i.e., you don't need prosody. So... he emphasized prosody at the expense of the principle that you don't need prosody? Doesn't sound as impressive, somehow. Unsurprisingly, Saussure comes up in the early history of the idea of phonemes.
In hunting around for stuff about discourse, I've been aware for a while there's another whole family of approaches to grammar called functional grammar — as opposed to structural grammar. So the whole constituent/dependency thing is all structural, and this is off in a different world altogether. Words are considered for their purpose, which naturally puts a lot of attention on discourse because a lot of the purpose of words has to do with how they fit into their larger context (hence the advice to consider discourses in the first place). There are a bunch of different flavors of functional grammar, including one — systemic functional grammar — due to Firth's student Michael Halliday; Wikipedia notes that Halliday approaches language as a semiotic system, and lists amongst the influences on systemic functional grammar both Firth and Saussure. (Oh what a tangled web we weave...) I keep hoping to find, somewhere in this tangle, a huge improvement on the traditional —evidently Saussurean— approach to grammar/linguistics I'm more-or-less familiar with. Alas, I keep being disappointed to find alien vocabulary and alien concepts, and keep gradually coming to suspect that a lot of what the structural approach can do well (there are things it does well) has been either sidelined or simply abandoned, while at the same time the terminology has been changed more than necessary, for the sake of being different.
It's a corollary to the way new scientific paradigms seek to gain dominance (somewhat relevant previous post: yonder) that the new paradigm will always change more than it needs to. A new paradigm will not succeed if it tries merely to improve those things about the old paradigm that need improvement. Normal science gets its effectiveness from the fact that normal scientists don't have to spend time and effort defending their paradigm, so they can put all that energy into working within the paradigm, and thereby make rapid progress at exploring that paradigm's space of possible research. Eventually this leads to clear recognition of the inadequacies of the paradigm; but even then, many folks will stick to the old paradigm, and we probably shouldn't think too poorly of them for doing so, even though we might think they're being shortsighted in the particular case. Determination to make one or another paradigm work is the wind in science's sails. But, exactly because abandoning the old paradigm for a new one is so traumatic, nobody's going to want to do it for a small reason. And those who do want to do it are likely to want to disassociate themselves from the old paradigm entirely. That means changing way more than necessary. Change for its own sake, far in excess of what was really needed to deal with the problems that precipitated the paradigm shift in the first place.
Another thread in the neighborhood of functional grammar is emergent grammar, a view of linguistic phenomena proposed in a 1987 paper by British-American linguist Paul Hopper. Looking over that paper gave me a better appreciation of the structuralism/functionalism schism as a struggle between rival paradigms. As Thomas Kuhn noted, rival paradigms aren't just alternative theories; they determine what entities there are, what questions are meaningful, what answers are meaningful — so followers of rival paradigms can fail to communicate by not even agreeing on what the subject of discussion is. Notably, even Hopper's definition of discourse isn't the same as what I thought I was dealing with when I started. My impression, starting after all with traditional (structural) grammar by default, was that discourse is above the level of a sentence; but for functional grammarians, to my understanding, that sentence boundary is itself artificial, and they'd object to making any such strong distinction between intra-sentence and inter-sentence. Hopper's paper is fairly willing to acknowledge that traditional grammatical notions aren't altogether illusions; its point is that they are only approximations of the pattern-matching reality assembled by language speakers, for whom the structure of language — abstract rules, idioms, literary allusions, whatever — is perpetually a work in progress.
Which sounds great... but, looking through some abstracts of more recent work in the emergent grammar tradition, one gets the impression that much of it amounts to "we don't yet have a clue how to actually do this". So once again, it seems there's more backing away from traditional structural grammar than replacing it; I've sympathy for their plight, as anyone trying to develop an alternative to a well-established paradigm is sure to have a less developed paradigm than their main competition, but that sympathy doesn't change my practical bottom line.
It was interesting to me, looking through Hopper's paper, that while much of it was quite accessible, the examples of discourse were not so much.
Gosh

Fascinating though the broad sweep of these shifting paradigmatic trends may be, it seems kind of like overkill. I do believe it's valuable big picture, but now that we've oriented to that big picture, it seems we ought to come down to Earth a bit if we're to deal with the immediate problem; I started just wanting a handy way to explore how different conlang features play out in extended discourse. As a conlanger I've neither been a great fan of linguistic universals (forsooth), nor felt any burning need to overturn the whole concept of grammatical structure. As I've remarked before, a structural specification of a conlang is likely to be the conlang's primary identity; most conlangs don't have a lot of L1 speakers with which to do field interviews. Granting Kuhn's observation that a paradigm determines what questions and answers are possible, if a linguistic paradigm doesn't let me effectively answer the questions I need to answer to define my conlang, I won't be going in whole-hog for that linguistic paradigm.
Also, as remarked earlier, the various modern approaches — both structural and functional — analyze (natural) language, and there's no evident reason to suppose that running that analysis in reverse would make a good way to construct a language, certainly not if one hopes for a result that doesn't have the assumptions of that analysis built in.
So, for conlanging purposes, what would an ideal approach to language look like?
Well, it would be structural enough to afford a clear definition of the language. It would be functional enough to capture the more free-form aspects of discourse that intrude even on sentences in "configurational" languages like English. In all cases it would afford an easily accessible presentation of the language. Moreover, we would really like it to — if one can devise a way to achieve this — avoid pre-determining the range of ways the conlang could work. It might be possible, following a suitable structuralist paradigm, to reduce the act of building a language to a series of multiple-choice questions and some morpheme entries (or just tell the wizard to use a pseudo-random-number generator), but the result would not be art, just as paint-by-numbers isn't art; and, in direct proportion to its lack of artistry, it would lack value as an exploration of unconstrained language-space. For my part, I see this as an important and rather lovely insight: the art of conlanging is potentially useful to the science of linguistics only to the extent that conlanging is an art rather than a science.
The challenge has a definitional aspect and a descriptive aspect. One way to define how a language works is to give a classical structural specification. This can be relatively efficient and lucid, for the amount of complexity it can encompass. As folks such as Hopper point out, though, it misses a lot of things like idioms, and proverbs, and overall patterns of discourse. Not that they'd necessarily deny the classical structural description has some validity; it's just not absolute, nor complete. We'd like to be able to specify such linguistic patterns in a way that includes the more traditional ones and also includes all these other things, in a spectrum. Trouble is, we don't know how. One might try to do it by giving examples, and indeed with a sufficient amount of work that might more-or-less do the job; but then the descriptive aspect rears its head. Some of these patterns are apparently quite complicated and subtle, and by-example is quite a labor-intensive way to describe, and quite a labor-intensive way to learn, them. Insisting on both aspects at once, definitional and descriptive, isn't asking "too much", it's asking for what is actually needed for conlanging — which makes conlanging a much more practical forum for thrashing this stuff out; an academic discipline isn't likely to reject a paradigm on the grounds that it isn't sufficiently lucid for the lay public. The debatable academic merits of some occult theoretical approach to linguistics are irrelevant to whether an artlang's audience can understand it.
So what we're looking for is a lucid way to describe more-or-less-arbitrary patterns of the sort that make up language, ranging from ordinary sentence structure through large-scale discourse patterns and whatnot. Since large-scale discourse patterns are, afaict, already both furthest from lucidity and furthest from being covered by the traditional structural approach, they seem a likely place to start.
Easy as futhorc

Let's take one of those extended examples that I found impenetrable in Hopper's paper. It's a passage from the Anglo-Saxon Chronicle, excerpted from the entry for Anno Domini 755; that's the first year that has a really lengthy report (it's several times the length of any earlier year). Here is the passage as rendered by Wikisource (as Hopper's paper did not fare well in conversion to html). It's rather sparsely punctuated; instead it's liberally sprinkled with the symbol ⁊, shorthand for "and" (even at junctures where there is a period). The alphabet used is a variant of Latin with several additional letters — æ and ð derived from Latin letters, þ and ƿ derived from futhorc runes (in which Anglo-Saxon had been written in earlier times, whose first six runes are feoh ur þorn os rad cen — futhorc).
⁊ þa geascode he þone cyning lytle werode on wifcyþþe on Merantune ⁊ hine þær berad ⁊ þone bur utan beeode ær hine þa men onfunden þe mid þam kyninge wærun

The point Hopper is making about this passage has to do with the way its verbs and nouns are arranged, which wouldn't have to be arranged that way under a traditional structural description of the "rules" of Anglo-Saxon grammar. Truthfully, coming to it cold, his point fell completely flat for me because only laborious scrutiny would allow me to even guess which of those words are verbs and which are nouns, let alone how the whole is put together. And that is the basic problem, right there: the pattern meant to be illustrated can't be seen without first achieving a level of comfort with the language that may be expensive. If, moreover, you want to consider a whole range of different ways of doing things (as I have sometimes wanted to do, in my own conlanging), the problem is greatly compounded.
Since Hopper's point involves logical decomposition of the passage into segments, he does so and sets next to each its translation (citing Charles Plummer's 1899 translation); as Hopper's paper (at least, in html conversion) rather ran together each segment with its translation, making them hard to separate by eye, I've added tabular format:
| Anglo-Saxon | English |
| --- | --- |
| ⁊ þa geascode he þone cyning | and then he found the king |
| lytle werode | with a small band of men |
| on wifcyþþe | a-wenching |
| on Merantune | in Merton |
| ⁊ hine þær berad | and caught up with him there |
| ⁊ þone bur utan beeode | and surrounded the hut outside |
| ær hine þa men onfunden | before the men were aware of him |
| þe mid þam kyninge wærun | who were with the king |

Table 1.

I looked at that and struggled to reason out which Anglo-Saxon word contributes what to each segment (and even then it was just a guess). The problem is further highlighted by Hopper's commentary, where he chooses to remark particularly on which bits are verb-initial and which are verb-final — as if I (his presumed interested, generally educated but lay, reader) could see at a glance which words are the verbs, or, as he may have supposed, just see at a glance the whole structure of the thing.

We can glimpse another part of the same elephant from Tolkien's classic 1936 lecture "Beowulf: The Monsters and the Critics", in which he promoted the wonderfully subversive position that Beowulf is beautiful poetry, not just a subject for dry academic scholarship. His lecture has been hugely influential ever since; but my point here is that he was one of those polyglots I was talking about earlier, and was able to appreciate the beauty of the poem because he was fluent in Old English (as well as quite a lot of other related languages, including, of all things, Gothic). I grok that such beauty is best appreciated from the inside; but it really is difficult for mere mortals to get inside like that. One suspects a shortfall of deep fluency even amongst the authors of academic treatises on Beowulf may have contributed significantly to the dryness Tolkien was criticizing. My concern here is that we want to be able to illustrate (and even investigate) facets of the structure of discourses without requiring prior fluency; if these illustrations also contribute to later fluency, that'd be wicked awesome.
The two problems with Table 1 are, evidently, that it's not apparent what's going on with the individual words, and that it's not apparent what's going on with the significant part (whatever that is) of the high-level structure. There's a standard technique meant to explain what the individual words are doing: glossing. There are a couple of good reasons why one would not expect glossing to be a good fit here, but we need to start somewhere, so here's an attempt at an interlinear gloss for this passage:
⁊ þa geascode he þone cyning
and | then | intensive-ask;3rd;sg;past | he;3rd;nom;sg | the;acc;msc;sg | king
"and then found he the king"

lytle werode
small;instrumental;sg | troop;dative;sg
"with a small band of men"

on wifcyþþe
on/at/about | woman.knowledge;dative;sg
"about woman-knowledge"

on Merantune
on/in/about | Merton;dative
"in Merton"

⁊ hine þær berad
and | he;acc;sg | there;adverb | catch.up.with.by.riding;3rd;sg;past
"and him there caught up with"

⁊ þone bur utan beeode
and | the;acc;msc;sg | room | from.without;adverb | bego/surround;3rd;sg;past
"and the hut outside surrounded"

ær hine þa men onfunden
before | he;acc;sg | the;nom;pl | man;nom;pl | en-find;subjunctive;pl;past
"before him the men became aware of"

þe mid þam kyninge wærun
who/which/that | with | he;dative | king;dative;sg | be;pl;past
"who with the king were"

Table 2.

Okay. Those two reasons I had in mind, why glossing would not be a good fit here, are both in evidence. Basically we get simultaneously too much and too little information.

I remarked in a previous post that it's very easy for glossing to fail to communicate its information ("too little" information). I didn't "borrow" the above gloss from someone else's translation (though I did occasionally compare notes with one); I put it together word-by-word, and got far more out of that than is available in Table 2. The internal structures of some of those words are quite fascinating. Hopper was talking about verb-initial and verb-final clauses, and I was sidetracked by the fact that his English translations didn't preserve the positions of the verbs; I've tried to fix that in Table 2, by tying the translation more closely to the original; but I was also thrown off by the translation "a-wenching", because it gave me the impression that was a verb-based segment. I do like the translation a-wenching rather more than other translations I've found, as it doesn't beat around the bush; I also found womanizing, with a woman, and, just when I thought I'd seen it all, visiting a lady, which forcefully reminded me of Malory's Le Morte Darthur. The original is a prepositional phrase, with preposition on and object of the preposition wifcyþþe.
I first consciously noticed about thirty years ago that prepositions are spectacularly difficult to translate between languages, an awareness that has shaped the directions of my conlanging ever since. Wiktionary defines Old English (aka Anglo-Saxon) on as on/in/at/among. wifcyþþe is even more fun; not listed as a whole in Wiktionary, an educated guess suggests it's a compound whose parts are individually listed — wif, woman/wife, and an inflected form of cyþþu, suggested definitions either knowledge, or homeland (country which is known to you). So the king was on about woman-knowledge. Silly me; I'd imagined that "biblical knowledge" thing was a euphemism for the sake of Victorian sensibilities, which perhaps it was in part, but the origin is at least a thousand years earlier and not apparently trying to spare anyone's sensibilities. It also doesn't involve any verb, so I adjusted the translation to reflect that.
The declension of cyþþu was rather insightful, too. The Wiktionary entry is under cyþþu because that's the nominative singular. The declension table has eight entries; columns for singular and plural, rows for nominative, accusative, genitive, and dative; no separate row for the instrumental case, though instrumental does show up separately for some Old English nouns. But here's the kicker: cyþþe is listed in all the singular entries except nominative, and as an alternative for the plural nominative and accusative. I've listed it as dative singular, because in this context (as best I can figure) it has to be dative to be the object of the preposition, and as a dative it has to be singular, but that really isn't an intrinsic property of the word. It really seems very... emergent. This word is somewhere in an intermediate state between showing these different cases and not showing them. Putting it another way, the cases themselves are in an intermediate state of being: the "reality" of those cases in the language depends on the language caring about them, and evidently different parts of the language are having different ideas about how "real" they should be (in contrast to unambiguous, purely regular inflections in non-naturalistic conlangs such as Esperanto or, for that matter, my own prototype Lamlosuo).
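That intermediate state of being can be made concrete by representing the declension as data and asking which cells of the paradigm a given surface form could occupy. Here's a minimal Python sketch; it fills in only the forms mentioned above (cells whose primary form isn't discussed here are left as unknowns, and the exact layout is my reading of the Wiktionary table, not a definitive one):

```python
# Declension of Old English cyþþu, filled in only where discussed above;
# None marks cells whose primary form isn't given here.
paradigm = {
    ("nominative", "singular"): ["cyþþu"],
    ("accusative", "singular"): ["cyþþe"],
    ("genitive",   "singular"): ["cyþþe"],
    ("dative",     "singular"): ["cyþþe"],
    ("nominative", "plural"):   [None, "cyþþe"],  # cyþþe listed as alternative
    ("accusative", "plural"):   [None, "cyþþe"],  # cyþþe listed as alternative
    ("genitive",   "plural"):   [None],
    ("dative",     "plural"):   [None],
}

def possible_cells(form):
    """All (case, number) cells a surface form could occupy."""
    return [cell for cell, forms in paradigm.items() if form in forms]

# The form cyþþe is ambiguous across five cells of the paradigm,
# while cyþþu pins down exactly one; the cases are only partly "real".
print(possible_cells("cyþþe"))
print(possible_cells("cyþþu"))
```

Run this and the asymmetry jumps out: the word only distinguishes cases to the extent that some other part of the sentence (here, the preposition) forces a choice.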
There's also rather more going on than the gloss can bring out in geascode and berad, which involve productive prefixes ge- and be- added to ascian and ridan — ge-ask = discover by asking/demanding (interrogating?), be-ride = catch up with by riding. All that, beneath the level of what the gloss brings out — as well as the difficulty the gloss has bringing it out. The gloss seems most suited to providing additional information when focusing in on what a particular word is doing within a particular small phrase; it can't show the backstory of the word ("too little" information) at the same time that it clutters any attempt to view a large passage ("too much" information; the sheer size of Table 2 underlines this point). Possibly, for this purpose, the separate line for the translation is largely redundant, and could be merged with the gloss to save space; but there's still too much detail there. The next step would be to omit some of the information about the inflections; but this raises the question of just which information about the words does matter for the sort of higher-level structure we're trying to get at.
Here's a compactified form based on the gloss, merging the gloss with the translation and omitting most of the grammatical notes.
⁊ þa geascode he þone cyning
and | then | ge-asked (found) | he (nominative) | the (accusative) | king

lytle werode
with a small (instrumental) | band of men (dative)

on wifcyþþe
on/about | woman.knowledge (dative; wenching)

on Merantune
in | Merton (dative)

⁊ hine þær berad
and | him (accusative) | there (adverb) | be-rode (caught up with)

⁊ þone bur utan beeode
and | the (accusative) | hut | outside (adverb) | be-goed (surrounded)

ær hine þa men onfunden
before | him (accusative) | the men (nominative) | en-found (noticed)

þe mid þam kyninge wærun
who | with | the (dative) | king (dative) | were

Table 3.
Imho this is better, bringing out a bit more of the most important low-level information, less of the dispensable low-level clutter, and perhaps leaving more opportunity for glimpses of high-level structure. In this particular case, since the higher-level structure Hopper wants to bring out is simply where the verbs are, one might do that by putting the verbs in boldface, thus:

⁊ þa **geascode** he þone cyning
and | then | **ge-asked** (found) | he (nominative) | the (accusative) | king

lytle werode
with a small (instrumental) | band of men (dative)

on wifcyþþe
on/about | woman.knowledge (dative; wenching)

on Merantune
in | Merton (dative)

⁊ hine þær **berad**
and | him (accusative) | there (adverb) | **be-rode** (caught up with)

⁊ þone bur utan **beeode**
and | the (accusative) | hut | outside (adverb) | **be-goed** (surrounded)

ær hine þa men **onfunden**
before | him (accusative) | the men (nominative) | **en-found** (noticed)

þe mid þam kyninge **wærun**
who | with | the (dative) | king (dative) | **were**

Table 4.

Hopper's point is, broadly, that this follows the pattern of "a verb-initial clause, usually preceded by a temporal adverb such as a 'then'; [...] [which] may contain a number of lexical nouns introducing circumstances and participants [...] followed by a succession of verb-final clauses". And indeed, we can now see that that's what's going on here.

The technique used by Table 4, with some success, also has a couple of limitations. (1) It is specific to this one type of structure, with no apparent generalization. (2) It appears to be a means only for showing the reader a pattern that the linguist already recognizes, rather than for the linguist to discover patterns, or, even more insightfully, for the linguist to experiment with how the high-level dynamics would be changed by an alteration in the rules of the language. Are those other things too much to ask? Heck no. Ask, otherwise ye should expect not to receive.
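One small step toward those other things is to treat the glossed segments as data rather than typography: once each word carries a machine-readable verb flag, a script can both do the highlighting and classify clauses, instead of merely displaying a pattern the linguist already recognized. A minimal Python sketch — the segment data and is_verb annotations below are hand-entered for two of the clauses above, not produced by any parser, and the helper names are my own:

```python
# Each segment: a list of (word, compact_gloss, is_verb) tuples, hand-annotated.
segments = [
    [("þa", "then", False), ("geascode", "ge-asked (found)", True),
     ("he", "he", False), ("þone", "the", False), ("cyning", "king", False)],
    [("ær", "before", False), ("hine", "him", False), ("þa", "the", False),
     ("men", "men", False), ("onfunden", "en-found (noticed)", True)],
]

def highlight(segment):
    """Render the segment, marking verbs with *asterisks*."""
    return " ".join(f"*{w}*" if v else w for w, _, v in segment)

def verb_position(segment):
    """Classify a clause as verb-initial, verb-final, or verb-medial."""
    verbs = [i for i, (_, _, v) in enumerate(segment) if v]
    if not verbs:
        return "verbless"
    # Allow one leading conjunction/adverb slot before the verb, since
    # Hopper's verb-initial pattern admits a temporal adverb like "then".
    if verbs[0] <= 1:
        return "verb-initial"
    if verbs[-1] == len(segment) - 1:
        return "verb-final"
    return "verb-medial"

for seg in segments:
    print(f"{verb_position(seg):12}  {highlight(seg)}")
```

The highlighting and the classification now come from the same annotations, so changing the data (or the rules of the toy language) automatically changes what the reader sees — a tiny move from illustration toward experiment.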
Revalency

For a second case study to move things forward, I have in mind something qualitatively different; not a single extended passage with some known property(-ies) to be conveyed to the reader, but a battery of examples exploring different ways to arrange a language, meant to be exhaustive within a limited range of options. It is, in fact, a variant of the case study that set me on the road to the discourse-representation problem. About ten years ago, after first encountering David J. Peterson's essay on Ergativity, I dreamed up a verb alignment scheme, alternative to nominative-accusative (NA) or ergative-absolutive (EA), called valent-revalent (VR), and was curious enough to try to get a handle on it by an in-depth systematic comparison with NA and EA. The attempt was both a success and a failure. I learned some interesting things about VR that were not at all apparent to start with, but I'm unsure how far to credit the lucidity of the presentation — by which we want to elucidate things for both the conlanger and, hopefully, their audience — for those insights; it seems to some extent I learned those things by immersing myself in the act of producing the presentation. I also came away from it with a feeling of artificiality about VR, but it's taken me years to work out why; and in the long run I didn't stay satisfied with the way I'd explored the comparison between the systems, which is part of why I'm writing this blog post now.
First of all, we need to choose what form our illustrations will take — that is, we have to choose our example "language". Peterson's essay defines a toy conlang — Ergato — with only a few words and suffixes so that simply working through the examples, with a wide variety of different ways for the grammar to work, is enough to confer familiarity. I liked his essay and imitated it, using a subset of Ergato for an even smaller language, to illustrate just the specific issues I was interested in. Another alternative, since we're trying to explore the structure itself, might be to use pseudo-English with notes, like the translational gloss in the previous section but without the Old English at the top. Some objections come to mind, though pseudo-English is well worth keeping handy in the toolkit. The pseudo-English may be distracting; Ergato is, gently, more immersive. The pseudo-English representation would be less compact than Ergato. And a micro-conlang Ergato has more of the fun of conlanging in it.
The basic elements of reduced Ergato:
Verbs (English / Ergato):
to sleep / sapu
to pet / lamu
to give / kanu

Nouns (English / Ergato):
panda / palino
woman / kelina
book / kitapo
man / hopoko
fish / tanaki

Pronoun (English / Ergato):
she / li

Conjunction (English / Ergato):
and / i

Verb suffixes (English / Ergato):
valency reduction / -to
past tense / -ri
plural / -ne

Noun case suffixes (English / Ergato):
default case / (unmarked)
special case / -r
recipient/dative case / -s
oblique case / -k
extra case / -m

Table 5.

Peterson's essay had more verbs, especially, so he could explore various subtle semantic distinctions; for the structures I had in mind, I just chose one intransitive (sleep), one transitive (pet), and one ditransitive (give).
Quick review: NA and EA concern core thematic roles of noun arguments to a verb: S = subject, the single argument of an intransitive verb; A = agent, the actor argument of a transitive verb; P = patient, the acted-upon argument of a transitive verb. In pure NA and pure EA, two of the three core thematic roles share a case while one is different; in pure NA the odd one out is the patient, marked accusative, and the other two share the nominative; in pure EA the odd one out is the agent, marked ergative, and the other two share the absolutive. There are other systems for aligning verb arguments, but I was, mostly, only looking at those two and VR.
Word order was a question. Peterson remarks that he finds SOV (subject object verb) most natural for an ergative language, and I find that too. (I'll have a suggestion as to why, a bit further below.) I'm less sure of my sense that SVO (subject verb object) is natural for a nominative language, because my native English is nominative and SVO, which might be biasing me (or, then again, the evolution of English might be biased in favor of SVO because of some sort of subtle affinity between SVO and nominativity). But I found verb-initial order (VSO) far the most natural arrangement for VR. So, when comparing these, should one use a single word order so as not to distract from the differences, or let each one put its best foot forward by using different orders for the three of them? I chose at the time to use verb-initial order for all three systems.
Okay, here's how VR works, in a nutshell (skipping some imho not-very-convincing suggestions about how it could have developed, diachronically); illustrations to follow. Argument alignment is by a combination of word order with occasional case-like marking. By default, all arguments have the unmarked case, called valent; the first argument is the subject/agent, the second is the patient, and if it's ditransitive the third is the recipient. Arguments can be omitted by simply leaving them off the end. If an argument is marked with suffix -t, it's in the revalent case, which means that an argument was omitted just before it; the omitted argument can be added back onto the end. To cover a situation that can only come up with a ditransitive verb, there's also a double-revalent case, marked by -s, that means two arguments were omitted. (The simplest, though not the only, reason VR prefers verb-initial order is that, in order to deduce the meaning of an argument from its VR marking, you have to already know the valency of the verb.)
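Mechanical as VR is, it can be pinned down in a few lines of code. Here is a minimal Python sketch of a decoder. To be clear, this is my own formalization (the role-slot queue, the names, all of it), inferred from the behavior shown in the example tables below rather than taken from the original exploration:

```python
# Sketch of decoding VR argument marking (my own formalization).
# Each verb has an ordered list of core role slots; the unfilled slots
# form a queue.  An unmarked noun takes the slot at the front; revalent
# -t first rotates one slot to the back (that slot was "omitted just
# before" and waits to be filled later); double-revalent -s rotates two.

VALENCY = {
    "sapu": ["agent"],                          # to sleep (intransitive)
    "lamu": ["agent", "patient"],               # to pet (transitive)
    "kanu": ["agent", "patient", "recipient"],  # to give (ditransitive)
}

def decode_vr(sentence):
    """Return a dict mapping each noun to its thematic role."""
    words = sentence.rstrip(".").lower().split()
    verb, args = words[0], words[1:]
    queue = list(VALENCY[verb])   # unfilled role slots, in order
    roles = {}
    for word in args:
        if word.endswith("t"):    # revalent: one slot skipped here
            skip, noun = 1, word[:-1]
        elif word.endswith("s"):  # double-revalent: two slots skipped
            skip, noun = 2, word[:-1]
        else:
            skip, noun = 0, word
        for _ in range(skip):     # skipped slots wait at the back
            queue.append(queue.pop(0))
        roles[noun] = queue.pop(0)
    return roles
```

For instance, given "Kanu palinos kitapot kelina." this assigns palino the recipient role, kitapo the patient, and kelina the agent: the panda is being given the book by the woman.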
A first step is to illustrate the three systems side-by-side for ordinary sentences. To try to bring out the structure, such as it is, the suffixes are highlighted. This would come out better with a wider page; but we'd need a different format if there were more than three systems being illustrated, anyway.
The woman is sleeping.
The woman is petting the panda.
The woman is giving the book to the panda.
NA: Sapu kelina. / Lamu kelina palinor. / Kanu kelina kitapor palinos.
EA: Sapu kelina. / Lamu kelinar palino. / Kanu kelinar kitapo palinos.
VR: Sapu kelina. / Lamu kelina palino. / Kanu kelina kitapo palino.

Table 6.

Valency reduction changes a transitive verb to an intransitive one. Starting with a transitive sentence, the default-case argument is dropped, the special-case argument is promoted to default-case, the verb takes valency-reduction marking to make it intransitive, and the dropped argument may be reintroduced as an oblique. In NA, this is passive voice; in EA, it's anti-passive voice. VR is different from both, in that the verb doesn't receive a valency-reduction suffix at all (in fact, I chose revalent suffix -t on the premise that it was descended from a valency-reduction suffix that somehow moved from the verb to one of its noun arguments), and both the passive and anti-passive versions are possible.
There may be a hint, here, of why ergativity would have an affinity for SOV word order. In this VSO order, anti-passivization moves the absolutive (unmarked) argument from two positions after the verb to one position after the verb (or to put it another way, it changes the position right after the verb from ergative to absolutive). Under SVO, anti-passivization would move the absolutive argument from after the verb to before it (or, change the position before the verb from ergative to absolutive). But under SOV, the absolutive would always be the argument immediately before the verb.

The woman is petting the panda.
The woman is petting.
The panda is being petted.
The panda is being petted by the woman.

NA: Lamu kelina palinor. / Lamuto palino. / Lamuto palino kelinak.
EA: Lamu kelinar palino. / Lamuto kelina. / Lamuto kelina palinok.
VR: Lamu kelina palino. / Lamu kelina. / Lamu palinot. / Lamu palinot kelina.

Table 7.

This reasoning doesn't associate NA specifically with SVO, but does tend to discourage NA from using SOV, since then passivization would move the nominative relative to the verb. On the other hand, considering more exotic word orders (which conlangers often do), this suggests NA would dislike VOS but be comfortable with OVS or OSV, while EA would dislike OVS and OSV but be comfortable with VOS.
Passivization omits the woman. Anti-passivization omits the panda. Pure NA Ergato has no way to omit the woman, pure EA Ergato has no way to omit the panda. Pure VR can omit either one. English can also omit either one, because in addition to allowing passivization of to pet, English also allows it to be an active intransitive verb — "The woman is petting." (One could say that to pet can be transitive or intransitive, or one might maintain that there are two separate verbs to pet, a transitive verb and an intransitive verb; it's an artificial question about the structural description of the language, not about the language itself.)
Table 7 is a broad and shallow study; and by that standard, imho rather successful, as the above is a fair amount of information to have gotten out of it. However, it's too shallow to provide insight into why one might want to use these systems (if, in fact, one would, which is open to doubt since natlangs generally aren't "pure" NA or EA in this sense, let alone VR). A particularly puzzling case, as presented here, is why a speaker of pure EA Ergato would want to drop the panda and then add it back in exactly the same position but with the argument cases changed; but on one hand this is evidently an artifact of the particular word order we've used, and on the other hand Delancey was pointing out that different languages may have entirely different motives for ergativity.
Using VR, it's possible to specify any subset of the arguments to a verb, and put them in any order.
VR / English:
Kanu kelina kitapo palino. = The woman is giving the book to the panda.
Kanu kelina palinot kitapo. = The woman is giving to the panda the book.
Kanu kitapot palino kelina. = The book is being given to the panda by the woman.
Kanu kitapot kelinat palino. = The book is being given by the woman to the panda.
Kanu palinos kelina kitapo. = The panda is being given by the woman the book.
Kanu palinos kitapot kelina. = The panda is being given the book by the woman.
Kanu kelina kitapo. = The woman is giving the book.
Kanu kelina palinot. = The woman is giving to the panda.
Kanu kitapot palino. = The book is being given to the panda.
Kanu kitapot kelinat. = The book is being given by the woman.
Kanu palinos kelina. = The panda is being given to by the woman.
Kanu palinos kitapot. = The panda is being given the book.
Kanu kelina. = The woman is giving.
Kanu kitapot. = The book is being given.
Kanu palinos. = The panda is being given to.

Table 8.

And this ultimately is why it fails. You can do this with VR; and why would you want to? In shallow studies of the pure NA and pure EA languages, we could suspend disbelief that there would turn out to be some useful way to exploit them at a higher level of structure, because we know those pure systems at least approximate systems that occur in natlangs. But VR isn't approximating something from a natlang. It was dreamed up from low-level structural concerns; there's no reason to expect it will have some higher-level benefit. It's not something one would do without need, either. It requires tracking not just word order, not just case markings, but a correlation between the two such that case markings have only positional meaning about word order, rather than any direct meaning about the roles of the marked nouns, which seems something of a mental strain. It's got no redundancy built into it, and it's perfectly unambiguous in exhaustively covering the possibilities — much too tidy to occur in nature. There's also no leeway in it for the sort of false starts and revisions that take place routinely in natural speech; you can't decide, after you've spoken a revalent noun argument, to use a different word and then say that instead, because the meaning of the revalent suffix will be different the second time you use it. It's still a useful experiment for exploring the dynamics of alignment systems, though.
But just a bit further up in scale, we meet a qualitatively different challenge.

Relative clauses

Consider relative clauses. In my revalency explorations ten years ago, I seem to have simply chosen a way for relative clauses to work, and run with it. There was a Conlangery Podcast about relative clauses a while back (2012), which made clear there are a lot of ways to do this. Where to start? Not with English; too worn a trail. My decade-past choice looks rather NA-oriented; so, how about an EA language? Lots of languages have bits of ergativity in them — even English does — but deeply ergative languages are thinner on the ground. Here's a sample sentence from a 1972 dissertation on relative clauses in Basque (link).
Aitak irakurri nai du amak erre duen liburua.

I had a lot more trouble assembling a gloss for this sentence than for the earlier example in Anglo-Saxon. You might think it would be easier, since Basque is a living language actively growing in use over recent decades, whereas Anglo-Saxon has been dead for the better part of a thousand years; and since the example is specifically explicated in a dissertation by a linguist — one would certainly like to think that being explicated intensely by a linguist would be in its favor. The dissertation did cover more of these words than Wiktionary did. My main problem was with du/duen; I worked out from general context, with steadily increasing confidence, that they had to be finite auxiliary verbs, but my sources were most uncooperative about confirming that.
Father wants to read the book that mother has burned.

Basque is a language isolate — a natlang that, as best anyone can figure, isn't related to any other language on Earth. Suggested origins include Cro-Magnons and aliens.

aitak / father (ergative)
irakurri / to read (infinitive)
nai du / wants (desire has)
amak / mother (ergative)
erre / to burn (infinitive)
duen / has done (relativized)
liburua / book (absolutive)

Basque is thoroughly ergative (rather than merely split ergative — say, ergative only for the past tense). It's not altogether safe to classify Basque by the order of subject, object, and verb, because Basque word order apparently isn't about which noun is the subject and which is the object; it's about which is the topic and which is the focus. I haven't tackled that fully enough to grok it, but it makes all kinds of sense to me that a language that thoroughly embraces ergativity would also not treat subject as an important factor in choosing word order, since subject in this sense is essentially the nominative case. That whole line of reasoning about why SOV would be more natural for an ergative language than SVO or VSO exhibits, in retrospect, a certain inadequacy of imagination. Also, most Basque verbs don't have finite forms. Sort-of as if most verbs could only be gerunds (-ing). Nearly all conjugation is on an auxiliary verb, which also determines whether the clause is transitive or intransitive — as if instead of "she burned the book" you'd say "she did burning of the book" (with auxiliary verb did). There are also more verbal inflections than in typical Indo-European languages; the auxiliary verb agrees with the subject, the direct object, and the indirect object (if those objects occur). I was reminded of the noted conlang Kēlen, which arranges to have, in a sense, no verbs; if you took the Basque verbal arrangement a bit further by having no conjugating verbs at all beyond a small set of auxiliaries, and replaced the non-finite verbs with nouns, you'd more-or-less have Kēlen.
When a relative clause modifies a noun, one or another of the nouns in the relative clause refers to the antecedent — although in Basque the relative clause occurs before the noun it modifies, so say rather one of them refers to the postcedent. In my enthusiastically tidy mechanical tinkering ten years ago, I worried about how to specify such things unambiguously. Basque's solution? Omit the referring word entirely. Which also means you omit all the affixes that would have been on that noun in the relative clause; and Basque really pours on the affixes. So, as a result of omitting the shared noun from the relative clause, you may be omitting important information about its role in the relative clause, thus important information about how the relative clause relates to the noun it modifies, leaving lots of room for ambiguity which the audience just resolves from context. Now that's a natural language feature; I love it.
The 1972 dissertation took time out (and space, and effort) to argue, in describing this omission of the shared noun, that the omitted noun is deleted in place, rather than moved somewhere else and then deleted. This struck me as a good example of what can happen when you try to describe something (here, Basque) using a structure (here, conventional phrase-structure grammar) that mismatches the thing described, and have to go through contortions to make it come out right. It reminded me of debates over how many angels can dance on the head of a pin. The sense of mismatch only got stronger when I noticed, early in the dissertation's treatment, the parenthetical remark "(questions of definiteness versus indefiniteness will not be raised here)". He'd put lots of attention into things dictated by his paradigm even though they don't correspond to obvious visible features of the language, while dismissing obvious visible things his paradigm said shouldn't matter.
Like I was saying earlier: determination to make one or another paradigm work is the wind in science's sails.
It's tempting to perceive Basque as a bizarre and complicated language. Unremitting ergativity. Massive agglutinative affixing. Polypersonal agreement on auxiliary verbs. Even two different "s" phonemes (two sounds that, to an English ear, are both just allophones of /s/). I'm given to understand such oddness continues as one drills down into details of the language. The Conlangery Podcast's discussion of Basque notes that it has a great many exceptions, things that only occur in one corner of the language. But there's something wrong with this picture. All I've picked up over the years suggests there is no such thing as an especially complicated or bizarre natlang. Basque is agglutinative, the simply composable morphological strategy that lends itself particularly well to morpheme-based analysis. The Conlangery discussion notes that Basque verbs are extremely regular. Standard Basque phonology has the most boring imaginable set of vowels (if you're looking for a set of vowels for an international auxlang, and you want phonemes basically everyone on Earth will be able to handle, you choose the same five basic vowel sounds as Basque). From what I understand of the history of grammar, our grammatical technology traces its lineage back to long-ago studies of Sanskrit, Greek, and Latin, three Indo-European languages whose obvious similarities famously led to the proposal of a common ancestor language. It's to be expected that a language bearing no apparent genetic relationship whatsoever to any of those languages would not fit the resultant grammatical mold. If somehow our theories of grammatical structure had all been developed by scholars who only knew Basque, presumably the Indo-European languages wouldn't fit that mold well, either.
The deep end

All this discussion provides context for the problem and a broad sense of what is needed. The examples thus far, though, have been simple; even the Anglo-Saxon, despite its length. There's not much point charging blindly into complex examples without learning first what there is to be learned from more tractable ones. Undervaluing conventional structural insights seems a likely hazard of the functional approach.
My objective from the start, though, has been to develop means for studying the internal structure of larger-scale texts. Not these single sentences, about which hangs a pervasive sense of omission of larger structures intruding on them from above (I'm reminded (tangentially?) of the "network" aspect of subterms in my posts on co-hygiene). Sooner or later, we've got to move past these shallow explorations, to the deep end of the pool.
We've sampled kinds of structure that occur toward the upper end of the sentence level. (I could linger on revalency for some time, but for this post that's only a means to an end.) Evidently we can't pour dense information into our presentation without drowning out what we want to exhibit — interlinear glosses are way beyond what we can usefully do — so we should expect an effective device to let us exhibit aspects of a large text one aspect at a time, rather than displaying its whole structure at once for the audience to pick things out of. It won't be "automatic", either; we expect any really useful technique to be used over a very wide range of structural facets with sapient minds at both the input and output of the operation — improvising explorations on the input side and extracting patterns insightfully from the output. (In other words, we're looking not for an algorithm, but for a means to enhance our ability to create and appreciate art.)
It would be a mistake, I think, to scale up only a little, say to looking at how one sentence relates to another; that's still looking down at small structures, rather than up at big ones. It would also be self-defeating to impose strict limitations on what sort of structure might be illustrable, though it's well we have some expectations to provide a lower bound on what might be there to find. One limitation I will impose, for now: I'm going to look at reasonably polished written prose, rather than the sort of unedited spoken text sometimes studied by linguists. Obviously the differences between polished prose and unedited speech are of interest — for both linguistics and conlanging — but ordinary oral speech is a chaotic mess of words struggling for the sort of coherent stream one finds in written prose. So it should be possible to get a clearer view of the emergent structure by studying the polished form, and then as a separate operation one might try to branch outward from the relatively well-defined structures to the noisily spontaneous compositional process of speech. The definition of a conlang seems likely to be more about the emergent structure than the process of emergence, anyway.
So, let's take something big enough to give us no chance of dwelling in details. The language has got to be English; the point is to figure out how to illustrate the structure, and a prerequisite to that is having prior insight (prior to the illustrative device, that is) into all the structure that's there to be illustrated. Here's a paragraph from my Preface to Homer post; I've tried to choose it (by sheer intuition) to be formidably natural yet straightforward. I admit, this paragraph appeals to me partly because of the unintentional meta-irony of a rather lyrical sentence about, essentially, how literate society outgrows oral society's dependence on poetic devices.
Such oral tradition can be written down, and was written down, without disrupting the orality of the society. Literate society is what happens when the culture itself embraces writing as a means of preserving knowledge instead of an oral tradition. Once literacy is assimilated, set patterns are no longer needed, repetition is no longer needed, pervasive actors are no longer needed, and details become reliably stable in a way that simply doesn't happen in oral society — the keepers of an oral tradition are apt to believe they tell a story exactly the same way each time, but only because they and their telling change as one. When the actors go away, it becomes possible to conceive of abstract entities. Plato, with his descriptions of shadows on a cave wall, and Ideal Forms, and such, was (Havelock reckoned) trying to explain literate abstraction in a way that might be understood by someone with an oral worldview.

Considering this as an example text in a full-fledged nominative-accusative SVO natlang, with an eye toward how the nouns and verbs are arranged to create the overall effect — there's an awful lot going on here. The first sentence starts out with an example of topic sharing (the second clause shares the subject of the first; that's another thing I explored for revalency, back when), and then an adverbial clause modifying the whole thing; just bringing out all that would be a modest challenge, but it's only a small part of the whole. I count a little over 150 words, with at least 17 finite verbs and upwards of 30 nouns; and I sense that almost everything about the construction of the whole has a reason to it, to do with how it relates to the rest. But even I (who wrote it) can't see the whole structure at once. How to bring it into the light, where we can see it?

The only linguistic tradition I've noticed marking up longer texts like this is incompatible with my objectives.
Corpus linguistics is essentially data mining from massive quantities of natural text; in terms of the functions of a Kuhnian paradigm, it's strong on methodology, weak on theory. The method is to do studies of frequencies of patterns in these big corpora (the Brown Corpus, for example, has a bit over a million words); really the only necessary theoretical assumption is that such frequencies of patterns are useful for learning about the language. There is, btw, interestingly, no apparent way to reverse-engineer the corpus-linguistics method so as to construct a language. There is disagreement amongst researchers as to whether the corpus should be annotated, say for structure or parts of speech (which does entail some assumption of theory); but annotation, even if provided, is still meant to support data mining of frequencies from corpora, whereas I'm looking to help an audience grok the structure of a text of perhaps a few hundred words. Philosophically, corpus linguistics is about algorithmically extracting information from texts that cannot be humanly apprehended at once, whereas I'm all about humanly extracting information from a text by apprehension.
We'd like display techniques to bring out issues in how the text is constructed; why various nouns were arranged in certain ways relative to their verbs and to other nouns, say. Why did the first sentence say "the orality of the society" rather than "the society's orality"? Why did the second sentence say "Literate society is what happens when" rather than "Literate society happens when" (or, for that matter, "When [...], literate society happens")? More broadly, why is most of the paragraph written in passive voice? We wouldn't expect to answer these directly, but they're the sorts of things we want the audience to be able to get insight into from looking at our displays.
Patterns of use of personal pronouns (first, second, third, fourth), and/or animacy, specificity, or the like are also commonly recommended for study 'to get a feel for how it works'; though this particular passage is mostly lacking in pronouns.
A key challenge here seems to be getting just enough information into the presentation without swamping it in too much information. We can readily present the text with a few elements —words, or perhaps affixes— flagged out, by means of bolding or highlighting, and show a small amount of text structure by dividing it into lines and perhaps indenting some of them. Trying to use more than one means of flagging out could easily get confusing; multiple colors would be hard to reconcile with various forms of color-blindness, conceivably one might get away with about two forms of flagging by some monochromatic means. But, how to deal with more than two kinds of elements; and, moreover, how to show complex relationships?
One way to handle more complex flags would be to insert simple tags of some sort into the text and flag the tags rather than the text itself. Relationships between the tags, one might try to make somewhat more apparent through the text formatting (linebreaks and indentation).
Trying to ease into the thing, here is a simple formatting of the text, with linebreaks and a bit of indentation.
Such oral tradition can be written down,
    and was written down,
  without disrupting the orality of the society.
Literate society is what happens when the culture itself embraces
    writing as a means of preserving knowledge
    instead of an oral tradition.
Once literacy is assimilated,
  set patterns are no longer needed,
  repetition is no longer needed,
  pervasive actors are no longer needed,
  and details become reliably stable
    in a way that simply doesn't happen in oral society —
the keepers of an oral tradition are apt to believe
  they tell a story exactly the same way each time,
  but only because they and their telling change as one.
When the actors go away,
  it becomes possible to conceive of abstract entities.
Plato, with his descriptions of shadows on a cave wall,
    and Ideal Forms,
    and such,
  was (Havelock reckoned) trying to explain literate abstraction
    in a way
      that might be understood
        by someone
          with an oral worldview.

This brings out a bit of the structure, including several larger or smaller cases of parallelism; just enough, perhaps, to hint that there is much more there that is still just below the surface. One could imagine discussing the placement of each noun and verb relative to the surrounding structures, resulting in an essay several times the length of the paragraph itself. No wonder displaying the structure is such a challenge, when there's so much of it.

One could almost imagine trying to mark up the paragraph with a pen (or even multiple colors of pens), circling various words and drawing arrows between them. Probably creating a tangled mess and still not really conveying how the whole is put together. Though this does remind us that there's a whole other tradition for representing structure, called sentence diagramming. Granting that sentence diagramming, besides its various controversies, doesn't bring out the right sort of structure, brings out too much else, and is limited to structure within a single sentence, it's another sort of presentational strategy to keep in mind.
Adding things up: we're asking for a simple, flexible way to flag out a couple of different kinds of words in an extended text and show how they're grouped... that can be readily implemented in html. The marking-two-kinds-of-words part is relatively easy; set the whole text in, say, grey, one kind of marked words in black, and a second kind of marked words (better perhaps to choose the less numerous marked kind) in black boldface. For grouping, indentation such as above seems rather clumsy and extremely space-consuming; as an experimental alternative, we could try red parentheses.
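As a sanity check that this really is readily implemented in html, here is a minimal Python sketch that generates such markup. Everything here is illustrative assumption rather than anything from the original write-up: the function name, the interface, and the deliberately naive exact-match word lookup.

```python
import html

def flag_html(text, plain_marks=(), bold_marks=()):
    """Set text in grey; words in plain_marks in black, words in
    bold_marks in black boldface; parentheses in red."""
    out = []
    for token in text.split():
        word = token.strip('.,;:!?()"').lower()   # crude normalization
        esc = html.escape(token)
        if word in bold_marks:
            esc = f'<b style="color:black">{esc}</b>'
        elif word in plain_marks:
            esc = f'<span style="color:black">{esc}</span>'
        # grouping parentheses in red, per the scheme above
        esc = esc.replace("(", '<span style="color:red">(</span>')
        esc = esc.replace(")", '<span style="color:red">)</span>')
        out.append(esc)
    return '<span style="color:grey">' + " ".join(out) + "</span>"
```

A real version would have to mark individual occurrences rather than word types, since the same word can call for different markings in different places.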
Taking this one step at a time, here are the nouns (marked _thus_) and verbs (marked *thus*):

Such oral _tradition_ *can be written down*, and *was written down*, without *disrupting* the _orality_ of the _society_. Literate _society_ *is* what *happens* when the _culture_ itself *embraces* _writing_ as a _means_ of *preserving* _knowledge_ instead of an oral _tradition_. Once _literacy_ *is assimilated*, set _patterns_ *are no longer needed*, _repetition_ *is no longer needed*, pervasive _actors_ *are no longer needed*, and _details_ *become* reliably stable in a _way_ that simply *doesn't happen* in oral _society_ — the _keepers_ of an oral _tradition_ *are apt to believe* they *tell* a _story_ exactly the same _way_ each _time_, but only because they and their _telling_ *change* as one. When the _actors_ *go away*, it *becomes* possible to *conceive* of abstract _entities_. _Plato_, with his _descriptions_ of _shadows_ on a cave _wall_, and Ideal _Forms_, and such, *was* (_Havelock_ *reckoned*) *trying to explain* literate _abstraction_ in a _way_ that *might be understood* by _someone_ with an oral _worldview_.

Marking that up was something of a shock for me. The first warning sign, if I'd recognized it, was the word "disrupting" in the first sentence; should that be marked as a noun, or a verb? Based on the structure of the sentence, it seemed to belong at the same level as, and parallel to, the two preceding forms of write, so I marked "disrupting" as a verb and moved on. The problem started to dawn on me when I hit the word "writing" in the second sentence, which from the structure of that sentence wanted to be a noun. The word "preserving", later in the sentence, seems logically more of an activity than a participant, so feels right as a verb, although one might wonder whether it has some common structure with "writing". The real eye-opener though —for me— was the word "descriptions" in the final sentence. Morphologically speaking, it's clearly a noun. And yet. Structurally, it's parallel with "trying to explain"; that is, it's an activity rather than a participant.

The activity/participant semantic distinction is a common theme in my conlanging.
I see this semantic distinction as unavoidable, although the corresponding grammatical and lexical noun/verb distinctions are more transitory. My two principal conlang efforts each seek to eliminate one of these transitory distinctions. Lamlosuo, the one I've blogged about, shuns grammatical nouns and verbs, though it has thriving lexical noun and verb classes. My other conlang, somewhat younger and less developed, with the current working name Refactor, has thriving grammatical nouns and verbs yet no corresponding lexical classes. (The semantic distinction is scarcely mentioned in my post on Lamlosuo; my draft post on Refactor, not nearly ready for prime time, has a bit more to say about activities and participants.)
In this case, had "descriptions" been replaced by a gerund —which grammatically could have been done, though the prose would not have flowed as smoothly (and why that should be is a fascinating question)— we already have the precedent, from earlier in the paragraph, of choosing to call a gerund a noun or verb depending on what better fits the structure of the passage. Imagine replacing "descriptions", or perhaps "descriptions of", by "describing". (An even more explicitly activity-oriented transformation would be to replace "with his descriptions of" by "when describing".)
The upshot is that I'm now tempted to think of noun and verb as "blue birds", in loose similarity to DeLancey's doubts about ergativity. I'm starting to feel I no longer know what grammar is. Which may be in part a good thing, if you believe (as I do; cf. my physics posts) that shaking up one's thinking keeps it limber; but let's not forget, we're trying to aid conlanging, and the grammar of a conlang is apt to be its primary definition.
Meanwhile, building on the noun/verb assignments such as they are, here's a version with grouping parentheses:
(Such oral tradition ((can be written down), and (was written down)), without (disrupting (the orality (of the society.)))) (Literate society (is what happens (when the culture itself (embraces (writing as a (means of (preserving knowledge))) instead of (an oral tradition.))))) ((Once literacy (is assimilated,)) (set patterns (are no longer needed,)) (repetition (is no longer needed,)) (pervasive actors (are no longer needed,)) and (details (become reliably stable (in a way that simply (doesn't happen (in oral society)))))) — (the keepers (of an oral tradition) (are apt to believe (they (tell (a story) (exactly the same way (each time,))))) but only because ((they and their telling) (change (as one)))). (When (the actors (go (away,))) it (becomes possible (to (conceive of (abstract entities.))))) (Plato, (with his descriptions of (shadows (on a cave wall,)) and (Ideal Forms,) and (such,)) (was (Havelock reckoned) trying to explain literate abstraction (in a way that (might be understood by someone (with an oral worldview.)))))

Maybe I should have been prepared for it this time, after the noun/verb marking shook my confidence in the notions of noun and verb. Struggling to decide where to add parentheses here, showing the nested, tree structure of the prose, has convinced me that the prose is not primarily nested/tree-structured. This fluent English prose (interesting word, fluent, from Latin fluens meaning flowing) is more like a stream of key words linked into a chain by connective words, occasionally splitting into multiple streams depending in parallel from a common point — very much in the mold of Lamlosuo. Yes, that would be the conlang whose structure I figured could not possibly occur in a natural human language, motivating me to invent a thoroughly non-human alien species of speakers; another take on the anadew principle of conlanging, in which conlang structures judged inherently unnatural turn out to occur in natlangs after all.
In fairness, imho Lamlosuo is more extreme about the non-tree principle than English, as there really is an element of "chunking" apparent in human language that Lamlosuo studiously shuns; but I'm still not seeing, in this English prose, anything like the sort of syntax tree that grade-school English classes, or university compiler-construction classes, had primed me to expect. (The tree-structured approach seems, afaict, to derive from sentence diagramming, which was promulgated in 1877 as a teaching method.)

So here I am. I want to be able to illustrate the structure of a largish prose passage, on the order of a paragraph, so that the relationships between words, facing upward to large-scale structure, leap out at the observer. I've acquired a sense of the context for the problem. And I've discovered that I'm not just limited by not knowing how to display the structure — I don't even know what the structure is, not even in the case of my own first language, English. Perhaps the tree-structure idea is due to having looked at the structure facing inward toward small-scale structure rather than outward to large-scale; but I'm facing outward now, and thinking our approach to grammatical structure may be altogether wrong-headed. Which, as a conlanger, is particularly distressing, since conlangs tend to use a conventionally structured grammar in the primary definition of the language.
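To make the tree-versus-chain contrast concrete, here is a toy sketch of my own (purely illustrative; the clause, the bracketing, and the helper function are my inventions, not anything from the marking exercise above). It renders a short clause once as a nested tree, the compiler-class view, and once as a flat chain of key words, closer to the stream-like structure I've been describing; the point is simply that the tree carries nesting depth the chain does not.

```python
# Toy contrast between two representations of clause structure.
# Tree: nested (head, children) tuples -- the sentence-diagramming view.
# Chain: a flat sequence of key words -- the stream-of-keywords view.

def tree_depth(node):
    """Nesting depth of a (head, children) tree."""
    head, children = node
    if not children:
        return 1
    return 1 + max(tree_depth(child) for child in children)

# A hypothetical bracketing of "Plato was trying to explain literate abstraction".
tree = ("trying",
        [("Plato", []),
         ("explain",
          [("abstraction",
            [("literate", [])])])])

# The same key words as a flat chain; connectives would merely link neighbors.
chain = ["Plato", "trying", "explain", "literate", "abstraction"]

print(tree_depth(tree))  # the tree view has nontrivial nesting depth
print(len(chain))        # the chain view is flat: one level, however long
```

The design choice being dramatized is exactly the one at issue above: the tree representation forces every word into a unique nesting level, while the chain representation only commits to linear order and links, leaving "depth" out of the picture entirely.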
Saturation point reached. Any further and I'd be supersaturated, and start to lose things as I went along. Time for a "reset", to clear away the general clutter we've accumulated along the path of this post. Give it some time to settle out, and a fresh post with a new specific focus can select the parts of this material it needs and start on its own path.