Thursday, July 6, 2017

What's the relationship between language and thought? The Optimal Semantic Expressivity Hypothesis

(This post came directly out of a conversation with Alex Carstensen. I'm writing a synthesis of others' work, but the core hypotheses here are mostly not my own.)

What is the relationship between language and thought? Do we think in language? Do people who speak different languages think about the world differently? Since my first exposure to cognitive science in college, I've been fascinated with the relationship between language and thought. I recently wrote about my experiences teaching about this topic. Since then I've been thinking more about how to connect the Whorfian literature – which typically investigates whether cross-linguistic differences in grammar and vocabulary result in differences in cognition – with work in semantic typology, pragmatics, language evolution, and conceptual development.

Each of these fields investigates questions about language and thought in different ways. By mapping cross-linguistic variation, typologists provide insight into the range of possible representations of thought – for example, Berlin & Kay's classic study of color naming across languages. Research in pragmatics describes the relationship between our internal semantic organization and what we actually communicate to one another, a relationship that can in turn lead to language evolution (see e.g., Box 4 of a review I wrote with Noah Goodman). And work on children's conceptual development can reveal effects of language on the emergence of concepts (e.g., as in classic work by Bowerman & Choi on learning to describe motion events in Korean vs. English).

All of these literatures provide their own take on the issue of language and thought, and the issue is further complicated by the many different semantic domains under investigation. Language and thought research has taken color as a central case study for the past fifty years, and there is also an extensive tradition of research on spatial cognition and navigation. But there are also more recent investigations of object categorization, number, theory of mind, kinship terms, and a whole host of other domains. And different domains provide more or less support to different hypothesized relationships. Color categorization seems to suggest a simple model where it's faster to categorize different colors because the words help with encoding and memory. In contrast, exact number may require much more in the way of conceptual induction, where children bootstrap wholly new concepts.

The Optimal Semantic Expressivity Hypothesis. Recently, a synthesis has begun to emerge that cuts across a number of these fields. Lots of people have contributed to this synthesis, but I associate it most with work by Terry Regier and collaborators (including Alex!), Dedre Gentner, and to a certain extent the tradition of language evolution research from Kenny Smith and Simon Kirby (also with a great and under-cited paper by Baddeley and Attewell).* This synthesis posits that languages have evolved over historical time to provide relatively optimal, discrete representations of particular semantic domains like color, number, or kinship. Let's call this the optimal semantic expressivity (OSE) hypothesis.** 

What does it mean to say that linguistic representations are optimal? Language users have non-linguistic representations in a particular domain, say color space. Languages map these non-linguistic representations to discrete linguistic expressions that can be used to transmit speakers' representations of objects, events, and relations to listeners. A particular expression is more informative to the extent that it conveys a more precise estimate of the speaker's intended representation. The average informativeness of a particular set of linguistic expressions is roughly the informativeness of its terms, averaged over the speakers' distribution of communicative needs. This need distribution governs how frequently particular non-linguistic representations are invoked and how precise the communication must be, for example how often you need to express particular color distinctions, or particular patterns of kinship relationships between individuals. Finally, an optimal set of linguistic representations for a particular domain should be learnable – fewer terms are good, and terms with less complex meanings are also good.
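To make the two factors concrete, here is a minimal sketch in Python. It is a toy of my own construction, not the measure used in any of the papers mentioned here: the domain is just four items, a "language" is a list giving the word used for each item, informativeness is scored as the listener's expected surprisal (in bits) about the intended item, and complexity is simply the number of distinct words. The function names and the need distributions are hypothetical.

```python
import math

def communicative_cost(labels, need):
    """Expected listener surprisal (bits) about the speaker's intended item.

    Assumption: on hearing a word, the listener guesses among the items
    that word covers, in proportion to how often each item is needed.
    Lower cost means a more informative system.
    """
    # Total need mass covered by each word.
    mass = {}
    for word, p in zip(labels, need):
        mass[word] = mass.get(word, 0.0) + p
    cost = 0.0
    for word, p in zip(labels, need):
        if p > 0:
            cost += p * -math.log2(p / mass[word])
    return cost

def complexity(labels):
    """Crude stand-in for learnability: the number of distinct words."""
    return len(set(labels))

# Uniform need over four items: more words means lower cost.
uniform_need = [0.25, 0.25, 0.25, 0.25]
one_word = ["a", "a", "a", "a"]
two_words = ["a", "a", "b", "b"]
four_words = ["a", "b", "c", "d"]

# Skewed need: the first item is talked about far more often.
skewed_need = [0.7, 0.1, 0.1, 0.1]
isolate_frequent = ["a", "b", "b", "b"]
split_in_half = ["a", "a", "b", "b"]

for name, labels, need in [
    ("one word, uniform need", one_word, uniform_need),
    ("two words, uniform need", two_words, uniform_need),
    ("four words, uniform need", four_words, uniform_need),
    ("isolate frequent item, skewed need", isolate_frequent, skewed_need),
    ("split in half, skewed need", split_in_half, skewed_need),
]:
    print(name, complexity(labels), round(communicative_cost(labels, need), 3))
```

Under uniform need, the one-, two-, and four-word systems cost 2, 1, and 0 bits respectively, so informativeness is bought with extra terms. Under the skewed need distribution, the two-word system that gives the frequent item its own word has a lower cost than the even split, which is the sense in which communicative need shapes which partition is best at a given complexity.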

In sum, an optimal language balances two factors: (1) informativeness relative to communicative need, and (2) learnability/complexity. I take OSE to be roughly the view expressed by this chapter by Regier, Kemp, & Kay. This synthesis leads to a number of important predictions, several of which have their own names in the theoretical landscape across fields. The contribution of this post is to help me get these predictions straight, since I think they’ve been under-explored in previous work.

A brief digression first though – one that will become relevant later. For the OSE to make most of its currently testable predictions, non-linguistic representations must be shared across speakers. I'll call this the "semantic stability" assumption. That's for reasons that affect both factors in the OSE calculus. First, if speakers of English and Tagalog don't share the same underlying color space, then what's informative for English speakers (in the technical definition of informativeness) is not the same as what's informative for Tagalog speakers. So then we would need to measure non-linguistic semantic spaces in every language before we could tell what OSE predicted. And then, our predictions about cross-linguistic diversity would be based on cross-linguistic diversity – making the whole thing circular. Second, if non-linguistic representations in a semantic domain are substantially different across cultures (beyond, e.g., the distribution of communicative need, which is clearly going to be different), then we can't make the complexity/learnability predictions that OSE needs, at least not without introducing the same circularity as above. So hold onto this idea as a critical underpinning of the hypothesis.

Now on to predictions of OSE.

OSE in typology. In typology, the prediction is that languages should reflect optima in the broader space of communicative systems defined as above. There are many possible representations of particular semantic systems, but only a small number are realized in human languages. Those that we observe are very close to the optimal boundary between the two factors, informativity and learnability. This prediction has been the focus of multiple tests by Regier and colleagues, most notably in the domains of color and kinship but extending to others as well, using a variety of increasingly sophisticated operationalizations of this boundary. But semantic typology is hard – it requires work like the World Color Survey or EOSS – so experimental language evolution can be a great tool to test predictions another way.
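One way to picture the "optimal boundary" is as a Pareto frontier: the set of systems that no other system beats on both factors at once. The sketch below is a toy illustration under my own assumptions, not the operationalization used in the typological work: the domain is six ordered items, a system is any cut of those items into contiguous categories, cost is the listener's expected surprisal in bits, and complexity is the number of terms. All names and the need distribution are hypothetical.

```python
import math
from itertools import product

def cost_bits(labels, need):
    """Expected listener surprisal (bits); lower means more informative.

    Assumption: the listener guesses within a category in proportion to need.
    """
    mass = {}
    for word, p in zip(labels, need):
        mass[word] = mass.get(word, 0.0) + p
    return sum(
        p * -math.log2(p / mass[word])
        for word, p in zip(labels, need)
        if p > 0
    )

def contiguous_partitions(n):
    """Yield every way to cut n ordered items into contiguous categories."""
    for cuts in product([False, True], repeat=n - 1):
        labels, current = [0], 0
        for cut in cuts:
            current += cut  # start a new category wherever there is a cut
            labels.append(current)
        yield tuple(labels)

def pareto_front(systems):
    """Keep systems that no other system beats on both terms and cost."""
    front = []
    for s in systems:
        dominated = any(
            o["terms"] <= s["terms"]
            and o["cost"] <= s["cost"]
            and (o["terms"] < s["terms"] or o["cost"] < s["cost"])
            for o in systems
        )
        if not dominated:
            front.append(s)
    return front

# Hypothetical need distribution: the first items are talked about most.
need = [0.4, 0.2, 0.1, 0.1, 0.1, 0.1]
systems = [
    {"labels": p, "terms": len(set(p)), "cost": cost_bits(p, need)}
    for p in contiguous_partitions(len(need))
]
front = sorted(pareto_front(systems), key=lambda s: s["terms"])

for s in front:
    print(s["terms"], round(s["cost"], 3), s["labels"])
```

In this toy, the typological prediction would amount to saying that attested systems should sit on or near `front`, while most of the 32 logically possible systems sit above it. The real analyses use much richer domains and similarity-based measures, so this is only meant to show the shape of the argument.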

OSE in language evolution. In a slightly different form, OSE emerges in experimental work on language evolution as well. (It's this convergence between the two literatures – noticed by participants in both, I believe – that piqued my recent interest in the topic.) In iterated learning experiments, participants communicate about a particular semantic domain – often a very simple one, like objects with different shapes – repeatedly across "generations," such that the language created by one set of participants is then used to train another. Much of the work in this tradition has focused on how iterated learning can reveal underlying non-linguistic representations. But an important recent paper by Kirby et al. formalizes the idea that linguistic structures (abstractions beyond simple word-object mappings) emerge, once again, from the tradeoff between informativeness and complexity. Without this tradeoff you get either degenerate languages that are easy to learn but uninformative, or very informative languages that are not easily learnable. Compositional structure only emerges when these two factors compete.

Kirby et al. suggest that compositionality (a core feature of language) emerges from the competition of the OSE factors. So that's a big win for OSE. But it is difficult to use this set of tools to consider particular semantic domains. Typically iterated learning experiments focus on novel domains (e.g., novel objects or abstract geometric features) so as to avoid adult participants' knowledge of language biasing the process. Because of this choice, it can be hard to study whether OSE applies in any particular cognitive domain (with the notable exception of color). For that reason, people often turn to developmental research to look at how language interacts with conceptual development.

OSE in development. In development, the OSE hypothesis connects to a number of other important theoretical positions – and these links are (to my mind) underexplored. The first link is to the “core knowledge hypothesis,” the thesis that prelinguistic infants are born with evolved, domain-specific mechanisms for understanding specific content domains. While some claims of the core knowledge hypothesis are controversial, I take the general idea that we have innate domain structure for some semantic domains to be a prerequisite for OSE. For example, the claim that particular color partitions are more optimal than others relies on the fact that our perceptual color space is not symmetric. One point of tension here, however, is how domain-specific the constraints on representation need to be. For example, maybe the relational terms in kinship language are restricted by general constraints on compositional semantic representation rather than domain-specific machinery. But in other domains the domain-specificity claim is likely less controversial; for example, languages likely provide grammatical marking of small exact numbers (e.g., singular/plural or singular/dual/plural systems) due to our ability to perceive the exact quantity of small sets of objects but only the approximate quantity of larger sets (review).

A second connection in development is to the “typological prevalence hypothesis” of Gentner and Bowerman. This hypothesis is that the developmental ordering of words/partition labels in language learning should reflect their cross-linguistic prevalence. They state this (very clearly) as follows: "all else being equal, within a given domain, the more frequently a given way of categorizing is found in the languages of the world, the more natural it is for human cognizers, hence the easier it will be for children to learn." Gentner and Bowerman provide some evidence for this hypothesis in the domain of spatial prepositions, and there is also some intriguing positive evidence for color overgeneralizations.

At first glance, the typological prevalence hypothesis fits perfectly with OSE, but there is one issue. On OSE, typological distribution is the product of both learnability and informativity given need, whereas it seems to me that acquisition ordering might plausibly reflect something about learnability and the communicative need distribution, but not so much informativity. That's because there may be some things that are useful to communicate about but also conceptually hard. A hokey example of this is that the concept "ten" is quite informative and quite useful but not very learnable (you have to be pretty good at numbers to reason about ten-ness). In contrast, children can know "three" without yet knowing the meanings of the rest of the count list (they're called "three-knowers"). Yet in adult language, "ten" gets used as much as or more than "three." So it is a bit problematic to map OSE directly to acquisition, even though components of the OSE tradeoff should be related to development.

A third connection, and one that fascinates me personally, is what I'll call the “pragmatic overextension hypothesis.” The idea here is that children's use of words in individual semantic domains is related to what competitors they have in those domains. For example, if you have a word for blue but no word for green, you might be more likely to overextend "blue" based on the absence of a competitor. This hypothesis is described glancingly in a wonderful chapter by Wagner, Tillman, and Barner in which they discuss the relationship between core knowledge and the acquisition of language for complex conceptual domains like number, time, and color. The evidence for pragmatic overextension is probably strongest in the domain of color, where Wagner's work showed pragmatic overextension of meanings before other competing words are learned. There is a lot left to do to test this hypothesis, but if it is true in other domains as well, it suggests that data on children's use of words describing semantic domains is a function not just of their semantics but also their pragmatics (like Eve Clark's classic work on referential overextension) – further complicating the typological prevalence hypothesis. 

Finally, here's a potential challenge to OSE from development. To the extent that there is "Quinian Bootstrapping" in a domain – a representational discontinuity in which a new representational system is created – this domain may violate a core tenet of what makes OSE work. Remember above, when I said that non-linguistic representations needed to be shared across speakers for OSE to make testable predictions? This is precisely the assumption that might be violated by bootstrapping. If by bootstrapping all we mean is creating new linguistic representations then that's fine. But if there are new non-linguistic representations, then all of the worries above about violating semantic stability apply. If you can create a new non-linguistic representation in a particular language community, then what's informative is different in that community, and so OSE should make different predictions about what words that community needs, etc. The primary example of Quinian Bootstrapping developed by Susan Carey in her book is number knowledge, and this is linguistic bootstrapping (at least that's my argument; some people deny that it's bootstrapping entirely), so that kind doesn't violate semantic stability. But any examples of non-linguistic Quinian bootstrapping – if those are possible! – would be important problems for OSE.

OSE in cross-linguistic cognition. I reviewed the theoretical landscape on cross-linguistic variation and Whorfian effects in my previous post. The "thinking for speaking" idea – that cross-linguistic differences are more apparent in more linguistically- and communicatively-demanding tasks – is quite consistent with OSE. The idea is simply that the "core knowledge" (shorthand for whatever underlying non-linguistic semantic representation we have) is best revealed by non-linguistic tasks, while we tap our variable linguistic representations in more communicative tasks.

In contrast, a stronger and more permanent Whorfian hypothesis – the most viable version of which I called the "habits of mind" hypothesis in my previous post – challenges the OSE quite directly, for the same reasons as non-linguistic Quinian bootstrapping above. I take "habits of mind" to be the claim that continuous use of particular linguistic coding may lead to changes in non-linguistic representations – so, the more you talk using cardinal directions, the better you get at tracking them in general, even in non-linguistic tasks. This pattern doesn't fit OSE. If the underlying non-linguistic representation is altered by practice with language, then again, semantic stability is violated.

Maybe this connection is obvious, but I don't think so. In fact, I missed it almost entirely in my previous post, where I sloppily drew an arc from "habits of mind" to functionalist explanations like OSE. So, let me say it again: If strong Whorfian conclusions are right for a semantic domain, then the predictions of OSE for that domain are not the same as those assumed by the standard typological tests. To the extent that we think that navigation practice really does change people's spatial representations beyond language, we should assume that the typological distribution of large-scale spatial language will be odd from a received OSE perspective. That's because if you practice cardinal directions (which many people find difficult, and hence not easily learnable) they should actually become both more informative and more learnable. And that fact should imply that there is a stable equilibrium of cardinal direction languages that is not predicted by the pattern of data from speakers of non-cardinal direction languages. E.g., on a Kemp and Regier-style approach, cardinal direction languages should be over-represented. Clearly this line of thinking is just speculation, but it seems like an interesting direction for the future.***

Conclusions. OSE is one of the most exciting big ideas in Cognitive Science (and let me reiterate that I'm not taking any credit for it myself!). But as it's emerged, I personally haven't understood how it interacted with other important parts of the literature on language and thought – for example, its critical relationship with the semantic stability assumption. In development, in particular, more work is needed to understand whether OSE contributes meaningfully to describing developmental change.

* One possible source for this hypothesis, given by Regier, Kemp, & Kay, is a chapter by Rosch (1978/1999), where she writes that the "task of category systems is to provide maximum information with the least cognitive effort." I love this Rosch chapter, but actually wonder if this is a little too generous a citation – the form of the claim is right but the content is a little different. In my reading, Rosch is talking about cognitive efficiency, not communication and communicative efficiency; the link to communication here is critical for understanding the mechanism.

** Two notes on OSE. First, I’ve put the word “optimal” in the name of OSE, which some might not agree with. Optimal here for me is a shorthand for “reflecting an approximation to the normative distribution.” Just as in claims about optimality in cognition, where every subject need not be optimal on every trial, OSE doesn’t need to claim that every language is optimal, only that the distribution of languages approximates the optimal one. Second, OSE is a functionalist hypothesis, in the sense that it proposes that language emerges from (among other things) its efficiency for communication. It's maybe a little weird to call it functionalist in that the semantic variant of OSE that I'm writing about is mostly concerned with lexical systems (color words, spatial prepositions, etc.) and not specific syntactic rules or constructions. Many people working in this tradition seem to assume in some moments that the same arguments go through for syntax, but that's mostly an article of faith right now, I think.

*** The reverse inference here is also kind of interesting, which is that if you can't predict typological distribution from non-linguistic informativity and learnability, then you should assume a strong Whorfian claim. That's a pretty cool prediction, but in practice I think you'd be on thin ice making it...
