Pilot Study

Reconstruction beyond proto-languages in the middle Andes

Willem Adelaar

Leiden University


Andean Proto-Languages
Historical Language Contact
Linguistic Reconstruction


The perception that the numerous similarities in lexicon, phonology and structure which unite the Quechuan and Aymaran language families in the Middle Andes region are due to intensive language contact prior to the stage of their proto-languages, rather than to a common genetic source as was previously assumed, has made it possible to visualize some of the originally inherited characteristics of each of the two linguistic lineages. This new perspective opens up multiple fields of further investigation, for instance, (a) determining the directionality of loan words (mainly from Quechuan to Aymaran, and rarely the other way around); (b) reconstructing the structural profile of each of the two lineages prior to the beginning of their contact relation; and (c) creating the conditions for a separate external comparison of each lineage with other language families and isolates in the wider surroundings. In more general terms, it now appears possible to access stages in the historical development of Quechuan and Aymaran that are earlier than those of the two proto-languages, to locate the original homeland of each lineage in relation to the newly established chronology, and to speculate on the societal context of the initial contact.


The last few decades have witnessed substantial advances in our knowledge of the lesser-known languages of the world, including their structure and typological features, their vocabulary and loan relations, as well as the social and historical conditions which determine their survival or imminent disappearance (Cf. REHG & CAMPBELL, 2018[1]). Along with the intensification of academic research, methods have been developed to delay or prevent extinction, even though such efforts may not always be successful in the long run. The state of the world’s languages is now thoroughly inventoried and monitored, due to the support of international organizations such as UNESCO and academic as well as private initiatives. Lexical and grammatical research on languages in danger of disappearing, as well as the development of modern means of documentation geared toward their preservation, have received considerable attention (GIPPERT, HIMMELMANN & MOSEL, 2006[2]).

Nevertheless, research and preservation efforts focusing on the world’s threatened languages tend to address their present situation and their future in terms of documentation, standardization efforts and perspectives of survival. The study of the history of these languages and the developments that led to their genesis, their past existence, their progressive endangerment and their recent or imminent extinction, often in the midst of other languages in equally precarious conditions, has received far less attention. The study of the past of languages belongs to the terrain of historical linguistics, which for some parts of the world can benefit from a wealth of written documentation and historical records. However, in other areas, such as Latin America, the study of the linguistic past is dramatically restricted by a scarcity and incompleteness of data and information, the legacy of massive language shifts and extinctions which took place before the languages in question could be properly documented. In South and Middle America, the disappearance of languages without proper documentation started before the arrival of the European invaders and continued well into the 20th century.

Naturally, there is a limit to our capacity for tracing back the genesis and past development of individual languages throughout time, but the limitations of our knowledge of the linguistic past become even more notable and vexing when the shortage of information concerns recent centuries or decades. Particularly problematic is the case of extinct languages whose former existence is known or suspected but for which there is no significant documentation at all. Should we abandon such languages to oblivion, as if they had never existed, and concentrate on documenting the surviving ones, or is it legitimate to continue to search for data, however minimal and fragmentary, which may help to construct a profile, if not a proper description, of the languages in question? One might consider that, even though languages became extinct, they may have played a role in the formation of languages still spoken today, by transmitting elements in the areas of phonology, morphosyntactic structure and lexicon, through processes of inheritance, borrowing and remodeling.

Detailed knowledge of past language situations can also be important for communities and individuals who seek to reconnect with their historical roots in search of a lost identity. In Andean countries such as Peru and Ecuador, for instance, hundreds of family names originating from poorly known and undocumented extinct languages continue to be in use. In some areas, these names appear in clusters as a legacy of ancestral socio-cultural entities which became invisible, sometimes even in the historical records. The descendants of such entities may be at a loss as to how to interpret these names. The same holds for place names which in some areas openly reflect the distribution and historical layering of lost languages, while providing information about specific features of their sound system and word structure. Hybrid place names are proof that communities of surviving languages and extinct languages once shared the same space and influenced each other at a given point in time.

1. Linguistic Diversity in the Middle Andes Region

The Middle Andes region, which roughly coincided with the cordilleran and coastal sections of the former Inca empire at its greatest extension (around 1532), constitutes a striking example of an area with a poorly and unevenly documented linguistic past. According to 16th and 17th century sources, it was a region with a particularly great linguistic diversity (cf. Mannheim 1991). However, most of the languages at issue have vanished. A majority of the indigenous Andean languages spoken today belong to only two language families which both have survived with substantial numbers of speakers: Quechuan (+ 7,000,000 speakers) and Aymaran (+ 2,500,000 speakers). These language families are relatively well documented thanks to numerous linguistic studies, grammars, dictionaries, and text collections, ranging from the 16th century to the present. However, their pre-contact history can only be recovered by family-internal comparison, internal reconstruction, and the study of past contact relations which by default can shed light on some of the past stages of development of these language families. Such techniques have proven to be particularly productive in the case of the Quechuan and Aymaran lineages, which share a long history of sustained and repeated contact with a varying intensity and geographical scope (Cf. CERRÓN-PALOMINO, 2000[3], 2008[4]; EMLEN 2017[5]).

Of the other indigenous languages attested in the Middle Andes, almost all of them extinct, only a handful are documented in any significant way. They include small language families and individual language isolates, such as the Uru-Chipaya group on the Bolivian and Peruvian Altiplano (extinct in the 20th century, except for Chipaya), Puquina in the same area (including the professional jargon of Callahuaya healers in Bolivia, which is partly based on Puquina; Puquina itself became extinct in the early 19th century), Mochica (language of the Peruvian north coast, extinct since the mid-20th century), Hibito-Cholón (Amazonian language group with historical extensions into the northern Peruvian Andes, extinct in the late 20th century), Esmeraldeño (on the Ecuadorian north coast, extinct in the late 19th century), Atacameño (in northern Chile, extinct around 1900), Lule (in northwestern Argentina, extinct since the late 18th century), and the Huarpean languages Allentiac and Millcayac (in the Cuyo region of Argentina, extinct since the 17th century). These partly documented languages provide a small but appreciable testimony of the impressive linguistic diversity that existed in the Middle Andes only a few centuries ago. For none of these languages and small families have external genealogical relations been attested, except in a tentative manner for Lule (with Vilela) and for Puquina (possibly with the Arawak languages). Connections with Quechuan and Aymaran are limited to borrowings.

Several other languages of the Middle Andes are known from historical records, but remain virtually undocumented. Such languages were mainly found on the coast and in the highlands of Ecuador (Cañar, Cara, Huancavilca, Malacato, Palta, Panzaleo, Puruhá, et al.) and of northern Peru (Chacha, Culli, Quingnam, Sechura, Tallán, et al.); some of them, such as Culli, may have survived well into the 20th century. For a detailed overview of what is known about the extinct languages of the northern Peruvian coast see a recent study by Urban (2019); Jijón y Caamaño (1940-1945) and Paz y Miño (1940-1942, 1961) remain the main sources for the extinct languages of the Ecuadorian highland.

At the northern border of the Middle Andes region, several well-documented languages of the Barbacoan language family are still spoken today, three in the Pacific foothills of Ecuador (Tsafiki, Cha’palaa, Awa Pit), and two more in Colombia (Guambiano and Totoró, in addition to Awa Pit). This family also had ramifications in the Ecuadorian and Colombian highlands (the extinct Cara and Pasto languages). At the southern border of the Middle Andes region, the Mapudungun language, now confined to areas of southern Chile and Argentina, was once also used within the borders of the Inca empire. It still has a substantial number of speakers, although it is also very much endangered (LONCÓN, 2017[6]). Other once important languages of northern Chile and northern Argentina, such as Diaguita (Cacán) and Tonocoté (often mentioned in connection with Lule), were reportedly documented in the 16th century, but the grammars in question are sadly lost.

Finally, a substantial number of languages with predominantly Amazonian roots have been preserved on the eastern counterforts of the Andean cordillera. Some of them exhibit specifically Andean features or traces of sustained interaction with highland languages. Doubtless, the most extreme example of such interaction is that of the Amuesha or Yanesha’ language, a member of the Arawak family, which borrowed a large amount of vocabulary from a Central Peruvian Quechuan language, including more than 60 basic verbal roots. This case of intensive language contact between an Andean and an originally Amazonian language probably pre-dates the consolidation of Inca power in the region. Meanwhile, Yanesha’ also shows traces of contact with Panoan languages and a pre-Arawak substrate of unknown identity, which once must have been prevalent in the Yanesha’ homeland, a witness of the layered complexity of past linguistic contact relations in this pre-Andine region (WISE, 1976[7]; ADELAAR, 2006[8]).

2. Linguistic Reconstruction in the Middle Andes Region

After the assessment of linguistic diversity in the Middle Andes region, we turn to the reconstruction of the past evolution of the two major Andean language groups, Aymaran and Quechuan, and its relevance for our knowledge and understanding of the Andean linguistic past. Although Aymara and Quechua were traditionally referred to as languages with an extensive internal diversification at the geographical micro-level, it has been established since the mid-twentieth century (PARKER, 1963[9]; TORERO, 1964[10]) that part of their internal differences had been systematically underestimated. In fact, they should be treated as language families comprising several distinct languages, rather than as languages subdivided into dialects. Nevertheless, the distinction between language and dialect, particularly in the Quechuan family, remains largely fluid, which is why some authors prefer to use more neutral terms such as ‘varieties’ [our choice] or ‘lects’, instead of ‘languages’ or ‘dialects’. Furthermore, part of the internal differentiation of the Quechuan family pre-dates the arrival of the European invaders by centuries, whereas another part is of colonial or relatively recent origin and linked to Inca expansion, Spanish colonial penetration of new territories, and the subsequent campaigns of evangelization. Spanish colonial administrators used specific varieties of Quechua as a lengua general (‘general language’) and made efforts to artificially standardize it as a unified language (Cf. DURSTON, 2007[11]). It stands to reason that a proper understanding of Andean colonial history is of major importance for the interpretation of the development and diversification of the main Andean language groups, because the effects of colonial interference have been considerable.

For a superficial observer it is not always easy to distinguish between ‘old’ and ‘recent’ diversity, as all the present-day varieties can be viewed a-historically as languages or dialects in their own right in accordance with the observer’s point of view, whereas early colonial documentation of local Quechuan varieties is rather scarce. Essential for the reconstruction of the oldest retrievable stages of the historical development of Quechuan is the establishment of a solid and reliable chronology of the changes and replacements that underlie linguistic differentiation within the Quechuan language family. For that purpose it is necessary to study all the existing varieties in the perspective of their complex interrelations throughout the past and also in their tiniest phonological, morphological, and lexical detail.

As for the Aymaran language family today, it consists of no more than two (possibly three) languages, but it almost certainly comprised additional undocumented varieties which became extinct in the colonial and early independence periods, or even as late as the 20th century. Some of this Aymaran presence and diversity can be witnessed in 16th and 17th century sources (TORERO, 1970[12]; CERRÓN-PALOMINO, 2000[3]). At the same time, some modern Quechuan varieties that are not presently in touch with an Aymaran language feature traces of close contact with Aymaran speaking populations. An illustrative case is that of the moribund Pacaraos Quechua in the upper Chancay valley, north of Lima. This variety harbors a substantial amount of Aymaran lexicon that cannot easily be linked to any of the surviving Aymaran languages, nor has the presence of any extinct group of Aymaran speakers been assessed in its immediate neighborhood (Cf. ADELAAR, 1982[13]).

The internal differentiation within both Quechuan and Aymaran not only facilitates reconstruction within the families themselves, but also provides crucial information on their contact history in a convincing chronological context. Identification of the effects of historical contact in both language families makes it possible to sketch the original profiles of their respective lineages with all their pertinent characteristics and restrictions. It also helps to understand the nature and the different stages of evolution that affected each language family. Eventually, the reconstruction of pre-contact stages of both Quechuan and Aymaran through the isolation of elements that were clearly borrowed between the two families can offer new benchmarks for comparison with other languages and language groups in the Middle Andes region or elsewhere in the Americas, as well as new perspectives on still earlier stages in the evolution of the Middle Andean languages.

3. Historical Relations Between Aymaran and Quechuan

Aymaran and Quechuan are not only the best documented indigenous language groups of the Middle Andean region, they are also very similar in many respects. These similarities were already noticed in the 17th century by Jesuit grammarians and chroniclers, such as Cobo (1653), and have preoccupied philologists and historical linguists ever since. The elements shared by both language groups include (1) an approximate 20% to 25% of nearly identical lexical roots which can be reconstructed to the proto-languages; (2) largely coinciding phoneme inventories, including such elements as a persistent three-vowel system, otherwise seldom found in South America, /a/, /i [~e]/, /u[~o]/, a distinctive opposition between velar and post-velar stops /k/, /q/, and a distinctive opposition between palatal and retroflex affricates /č/, /č̣/; and (3) morphosyntactic structures that are almost identical, not in form but in function and content, including the very specific semantics of a large number of verbal derivational categories. Other Andean languages generally do not share the systematic similarities that unite Aymaran and Quechuan. These similarities form a clear indication of the likelihood that both language groups existed in close contact with each other for at least part of their history. Nevertheless, the emphasis on similarities should not make us overlook the notable dissimilarities that separate the two language groups, in particular, a radically different basic vocabulary in the non-shared sector, crucial differences in the phonotactics of roots and words, and a highly specialized morphophonemic behavior of affixes in affix sequences which is exclusive for the Aymaran languages. Last but not least, the existence of glottalized and aspirated stops and affricates is held to be an original feature of the Aymaran languages, although an external origin of this phenomenon through contact with other neighboring languages might be considered (EMLEN, 2017[5]).

The question of the nature of the Quechuan-Aymaran relationship, also known as the Quechumaran hypothesis, has occupied historical linguists and other observers for a long time in a controversy that often boiled down to the question: “Are Quechuan and Aymaran genetically related or not?” (Cf. CERRÓN-PALOMINO, 2000[3]). Implicit to this debate was the assumption of an exclusive genealogical relationship which excluded all other languages. It may be prudent to assume that a majority of the obvious similarities between the two families are due to borrowing and contact. Such a conclusion does not preclude the possibility of an eventual common source for the two language families, but it would have to be found at a much deeper historical level than the one defined by the (nearly) identical elements on which the Quechumaran hypothesis was traditionally based (e.g., in ORR & LONGACRE, 1968[14]). The implied historical depth would inevitably involve comparisons with other New World languages and language families, depriving the Quechuan-Aymaran hypothesis of its self-sufficient exclusive character. As is often the case in such situations, there is a notable imbalance between borrowed vocabulary items, which are nearly identical in both languages, especially at the level of the proto-languages, and non-borrowed items, which normally show no similarity at all. It is, therefore, not possible to detect any gradual historical divergence as is characteristically found between languages with uncontroversial genealogical affinities.

The historical intertwinement of the Quechuan and Aymaran languages has often been referred to in the literature as a case of linguistic ‘convergence’, but this concept also requires further refinement (ADELAAR, 2012[15]). As a matter of fact, the effects of profound Quechuan-Aymaran contact can be reconstructed right back to the stage of the Quechuan and Aymaran proto-languages. In other words, there is no modern variety of Quechuan that does not bear the structural marks of intensive contact with Aymaran, and the opposite probably holds true for Aymaran with respect to Quechuan lexical influence. Both the intensity and the systematic character of the contact situation suggest that the ancestors of the two linguistic lineages shared the same geographical area during the formation of the proto-languages, a deviation from earlier representations in which Quechuan and Aymaran used to be located in separate, albeit marginally overlapping, areas (e.g., Torero [1970], who incidentally refers to Aymaran as Aru ‘language’). The most likely candidate for such a shared area is the Andean highlands of Central Peru. The evidence seems to point to a situation in which Aymaran and Quechuan speakers shared communities and locations throughout a large area, a situation which is still locally found in parts of the Bolivian and Southern Peruvian Andes (BASTIEN, 1978[16]; HOWARD, 1995[17]). The simultaneous success of Aymaran and Quechuan expansion towards the southern boundaries of the Middle Andes region suggests a kind of concerted action facilitated by kinship ties and possibly a division of labor and economic activity based on ecological levels (Cf. EMLEN & ADELAAR, 2017[5]).

4. Disentangling the Quechuan-Aymaran Knot

The determination of the directionality of lexical and structural borrowings between Quechuan and Aymaran, which had remained stagnant for a long time, was made possible by the recognition of fundamental phonotactic differences between the two linguistic lineages and the hypothesis that on this basis criteria could be developed in order to establish this directionality (ADELAAR, 1986[18]). This line of research has been elaborated by Emlen (2017[19]), who separated the lexicon shared by Quechuan and Aymaran from non-shared lexicon exclusively found in either one of the two families. It appeared that most of the ancient lexicon shared by the proto-languages underlying both language groups formally resembled the non-shared Quechuan lexicon in that it frequently contained consonant clusters or reflexes of consonant clusters that were absent from the lexicon exclusive to Aymaran. Quechuan has a preference for disyllabic roots, many of which contain complex word-internal consonant clusters with few combinatory restrictions. These clusters are also frequently found in the shared lexicon. By contrast, roots that are specifically Aymaran do not normally contain consonant clusters consisting of two stops or affricates in their canonical form (that is, without accompanying suffixes affecting the shape of these roots by morphologically conditioned vowel suppression, a common phenomenon in the Aymaran languages). In roots which are exclusively Aymaran, the first consonant of an internal cluster has to be a resonant (/n/, /m/, /l/, /lj/, /r/) or a sibilant (s, š), open syllables are preferred, and tri-syllabic roots are relatively frequent. Naturally, the cluster criterion is meaningless for shared lexicon that does not contain any internal cluster, e.g. the shared verb root apa- ‘to carry’, but there is little reason to assume that the directionality of the borrowing would have been different in those cases. 
This distribution of consonant clusters speaks in favor of a massive lexical borrowing flow from pre-proto-Quechuan to proto-Aymaran, rather than the other way around. An important byproduct of Emlen’s work is the discovery of some unexpected phonological features of the Aymaran fricatives and glides in terms of frequency and distinctive load, which invite further comparative research.
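The cluster criterion described above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the procedure actually used in the studies cited: roots are assumed to be pre-segmented into phonemes, the function names and the three verdict labels are hypothetical, and the form čakta is an invented root used solely to show a stop-plus-stop cluster. The forms apa- and warku- are taken from the text.

```python
# Sketch of the cluster criterion (hypothetical helper names and labels).
# Segment classes follow the lists given in the text.
VOWELS = {"a", "i", "u", "e", "o"}
RESONANTS = {"n", "m", "l", "lj", "r"}
SIBILANTS = {"s", "š"}


def internal_clusters(segments):
    """Return all pairs of adjacent consonants that are not word-initial."""
    clusters = []
    for i in range(1, len(segments) - 1):
        first, second = segments[i], segments[i + 1]
        if first not in VOWELS and second not in VOWELS:
            clusters.append((first, second))
    return clusters


def classify_root(segments):
    """Apply the cluster criterion to one pre-segmented root."""
    clusters = internal_clusters(segments)
    if not clusters:
        # No internal cluster: the criterion says nothing about this root.
        return "uninformative"
    if all(first in RESONANTS | SIBILANTS for first, _ in clusters):
        # Cluster opens with a resonant or sibilant: allowed in both profiles.
        return "fits both profiles"
    # Any other cluster is excluded from exclusively Aymaran roots.
    return "Quechuan-type"


print(classify_root(["a", "p", "a"]))            # apa- 'to carry'
print(classify_root(["w", "a", "r", "k", "u"]))  # warku- 'to hang'
print(classify_root(["č", "a", "k", "t", "a"]))  # invented form, stop + stop
```

A verdict of "Quechuan-type" only marks a root as incompatible with the canonical shape of exclusively Aymaran roots; on its own it does not prove that the root was borrowed.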

The situation on the structural level appears to be the opposite of that on the lexical level. Although Quechuan and Aymaran structure are very similar in terms of categories, semantic distinctions and use of affixes, Aymaran clearly preserves a more archaic state of affairs. For instance, in the domain of personal reference, it has maintained a four-person system based on the inclusion or exclusion of the Speaker and the Addressee (Cf. HARDMAN et al., 1988[20]) with a straightforward match of form and meaning. Such four-person systems can be reconstructed for several other Native American language families, in which they are often embedded in more elaborate structural settings. Quechuan features a similar four-person system, but it appears to be built on a basis of affixes and fragments of affixes which originally may have had other functions, thus betraying the system’s relatively recent adaptation to the Aymaran model. Also in Aymaran, combinations of two Speech Act Participants expressed in the same verb are codified by means of complex endings (nine in total including the four that express only one Speech Act Participant). These complex endings appear to be indivisible and difficult to analyze. They differ in form according to tense and mood. Quechuan has a similar system, but its expression is based on a limited number of personal reference affixes, most of which can appear in endings consisting of a combination of two separable markers. This is suggestive of a process in which existing affixes were recycled in an economical manner, building a set of combinations as required by the Aymaran model with the smallest possible amount of material already available. Similar considerations of economy can be invoked to account for further changes and replacements affecting the verbal inflectional morphology of individual Quechuan varieties.
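The logic of a four-person system defined by the inclusion or exclusion of Speaker and Addressee can be spelled out as the cross-product of two binary features. The sketch below is illustrative only: the numbering of the persons (with "4th person" for the inclusive combination) is a labeling convention assumed here and not stated in the text, and the names `PERSON_LABELS` and `person_system` are hypothetical.

```python
from itertools import product

# Keys are (includes_speaker, includes_addressee).
# The person numbers are an assumed labeling convention.
PERSON_LABELS = {
    (True, False): "1st person",
    (False, True): "2nd person",
    (False, False): "3rd person",
    (True, True): "4th person (inclusive)",
}


def person_system():
    """Enumerate every combination of the two binary features."""
    return [
        (speaker, addressee, PERSON_LABELS[(speaker, addressee)])
        for speaker, addressee in product((True, False), repeat=2)
    ]


for speaker, addressee, label in person_system():
    print(f"speaker={speaker!s:5} addressee={addressee!s:5} -> {label}")
```

Two binary features yield exactly four categories, each with its own label, which is the straightforward match of form and meaning attributed to Aymaran above.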

5. Conclusion

Without going into too much detail, we may conclude that the Aymaran linguistic lineage provided a structural model for what was to become Quechuan as we know it, whereas the Quechuan lineage contributed a large part of its original lexicon to the shared vocabulary of both families. As a result, no modern representative of either language group is free of this type of heritage. A point that must be emphasized here is that the reconstructed interaction must have preceded the entire history of each of the two language families as they appear today, their proto-languages being a product of precisely that interaction. Considering the time needed for the internal diversification of both families to develop in a gradual way, it is defensible to assume that the initial convergence occurred around the beginning of our era, and that it was possibly the result of an invasion by the Quechuan lineage of what was originally Aymaran territory.

The removal of borrowed elements and structural principles that are alien to either one of the two families leaves us with two retrospectively modified languages. Apparently, the Quechuan lineage may originally have had a three-person system, and its inventory of complex personal reference endings may have been limited to the combination of a 1st person subject acting upon a 2nd person object (compare the situation in modern Hungarian). Furthermore, the phonetically complex structure of Quechuan roots calls for further investigation. Were the complex word-internal clusters always there, or are they the product of some sort of derivation which remains to be identified? Several sets of semantically related roots appear to share an initial element, as, for instance, wa- in verbs referring to ‘hanging or suspension in the air’, or ya- in verbs of going (e.g., in Southern Peruvian Quechua warku- ‘to hang’ and Central Peruvian Quechua yarku- ‘to climb’, in which the element -rku- can be identified as a marker for ‘upward motion’). Such reconstructed roots are generally monosyllabic, and often have a canonical CV shape, suggesting a rather simple verbal root structure for the Quechuan lineage in early pre-convergence times (EMLEN, 2017[5]). This finding also calls for a further analysis of the derivational elements with which these monosyllabic roots are combined. On the Aymaran side the high frequency of initial /h/ and /s/ in contrast with their marginal occurrence and low distinctiveness in Quechuan also requires further scrutiny.

Nevertheless, the systematic separation of inherited Aymaran and inherited Quechuan elements does not yield two language families that are spectacularly different in their main characteristics. The reconstructed sound systems of the two pre-contact lineages still look very much the same. The reassignment of Proto-Aymaran to the same region in Central Peru where Central Peruvian Quechuan varieties are spoken today brings into doubt the inherited character of Aymaran glottalization and aspiration, which may have been an areal feature of the languages of the Titicaca basin much further south and could have entered Aymaran after its occupation of that region (for instance, from local languages, such as Uru-Chipaya, Leco or Atacameño). This would eliminate one of the most significant differences between the original sound inventories of Quechuan and Aymaran (with the proviso that the occurrence of glottalization and aspiration in Jaqaru and Kawki, the Central Peruvian branch of Aymaran, would remain to be explained; see EMLEN, 2017[5]). Furthermore, there is no indication that the two language groups did not already share their suffix-based agglutinative character and SOV structure before entering into contact. One of the main advantages of the separation of the inherited characteristics of the two families is that they can now freely be compared to any other language or language family in the region or beyond without the burden of borrowed foreign elements.

From a social-historical point of view, the cohabitation of the descendants of the Aymaran and Quechuan lineages in a situation of bilingualism during a formative period of their existence many centuries ago may provide an explanation for much of their behavior in later years. Linked by age-old family ties and loyalties, Quechuan and Aymaran speakers may have chosen to operate jointly in their conquests and distribution of new territories. As far as the linguistic evidence goes, it would appear that the Incas used to spare their Aymaran allies more than any other nation in the Middle Andes region, leaving their language and customs largely intact. In that perspective, the retreat of the Aymaran languages in favor of Quechuan may have been a product of colonial language policy rather than the continuation of a pre-contact trend.

How to Cite

ADELAAR, W. Reconstruction beyond proto-languages in the middle Andes. Cadernos de Linguística, [S. l.], v. 1, n. 1, p. 01–13, 2020. DOI: 10.25189/2675-4916.2020.v1.n1.id274. Accessed on: 4 Oct. 2023.



© All Rights Reserved to the Authors
