Tuesday, September 13, 2011

Genetic Substrates In Afro-Asiatic Language Speaking Populations

What follows is a collection of factual observations about the population genetics of Afro-Asiatic language family speakers and of populations in some ways related to or distinct from them, together with some analysis of those facts. It is a work in progress toward making sense of the hard-to-fit-together puzzle pieces of a complex linguistic family's origins.

There are four Y-DNA haplogroups associated to some extent with Afro-Asiatic language speaking populations: E1b1b, the distinctive, almost exclusive Afro-Asiatic Y-DNA haplogroup in Berbers, which is found in all Afro-Asiatic groups except Chadic speakers; R1b-V88, associated with Chadic speakers; J, associated most strongly with Semitic speakers; and T, associated most strongly with Cushitic, Ethiosemitic and Coptic speakers (including Arabic or Berber language speakers of Egyptian descent).

R1b-V88, J and T, when found in Africa, are back migrations. E1b1b is East African (probably Ethiopian) in origin. With a few exceptions, the combined set of Afro-Asiatic Y-DNA haplogroups accounts for 70% or more of the men in each publicly available sample of an Afro-Asiatic population, and in some cases that percentage exceeds 90%. But, apart from one modest-sized sample of Algerian Arabs, every sample also contains Y-DNA haplogroups beyond those associated with Afro-Asiatic populations.

The overall Afro-Asiatic percentage and the nature of the "background" haplogroups vary by region. In the Levant and Egypt, the largest components of the background tend to be Y-DNA haplogroups G and I (suggestive of Anatolian and Caucasian influences). In Mesopotamia and Arabia, the largest component of the background tends to be Y-DNA haplogroup R1a (suggestive of West Asian influences). In Chadic, Berber and North African Arab populations, the largest component of the background tends to be Y-DNA haplogroup E1b1a (suggestive of West African influences). In East Africa (Ethiosemitic, Cushitic and Omotic, as well as some Sudanese populations), the largest components of the background tend to be Y-DNA haplogroups A and B (suggestive of Khoisan and Pygmy affinities). The background for Jews tends to reflect their ancestral geographic neighbors (e.g. Europeans).
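The signal/background split used here is simple arithmetic on haplogroup counts. As a minimal sketch, assuming a sample is just a list of top-level Y-DNA haplogroup labels (subclades such as J1 and J2 would first need to be collapsed to "J"), the Afro-Asiatic share is the fraction of men carrying one of the four haplogroups listed above, and the background is everything else. The function name `signal_and_background` and the sample counts are hypothetical illustrations, not real data.

```python
from collections import Counter

# The four Y-DNA haplogroups this post associates with Afro-Asiatic speakers.
AFRO_ASIATIC_HGS = {"E1b1b", "R1b-V88", "J", "T"}


def signal_and_background(sample):
    """Return (Afro-Asiatic share, largest background haplogroup or None).

    `sample` is a list of top-level haplogroup labels, one per man sampled.
    """
    counts = Counter(sample)
    if not counts:
        raise ValueError("sample must contain at least one haplogroup call")
    total = sum(counts.values())
    signal = sum(n for hg, n in counts.items() if hg in AFRO_ASIATIC_HGS)
    # Everything outside the four Afro-Asiatic haplogroups counts as "background".
    background = {hg: n for hg, n in counts.items() if hg not in AFRO_ASIATIC_HGS}
    largest = max(background, key=background.get) if background else None
    return signal / total, largest


# Hypothetical counts for an invented sample of 100 men (not real data).
sample = ["E1b1b"] * 80 + ["J"] * 5 + ["E1b1a"] * 10 + ["R1a"] * 5
share, largest = signal_and_background(sample)
print(f"Afro-Asiatic share: {share:.0%}, largest background: {largest}")
```

With these invented counts the sketch reports an 85% Afro-Asiatic share with E1b1a as the largest background component, the pattern described above for Berber populations. A real analysis would also have to decide how to treat subclades and would use published sample counts instead.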

Of these backgrounds, I find the East African background, which I would naively have expected to be dominated by E1b1a, the most interesting. It suggests that haplogroups A and B were predominant in the region (locally, mostly one or the other of them) prior to the later expansion of E1b1b, and it provides inferential evidence against the hypothesis that Niger-Congo languages, or something fairly closely related, were spoken in East Africa before the rise of the Afro-Asiatic languages.

The two Afro-Asiatic-descended populations in Africa and the Middle East that show the largest backgrounds are East Africans and Jews, each of which includes some communities with background percentages as high as 40%-50% of the men sampled. The large background in East Africa could indicate either the petering out of a wave of advance at its periphery in the face of indigenous resistance, or the limited ability of E1b1b bearers to overcome already well-established local competition during their expansion.

The "signal" in Berber and Chadic populations comes strongly from a single Y-DNA haplogroup (E1b1b and R1b-V88 respectively), but in other populations the Afro-Asiatic markers tend to include some mix of haplogroups. Haplogroup T, for example, while strongly associated with Afro-Asiatic populations in Africa, is almost never the only Afro-Asiatic Y-DNA marker found in a sample. Isolated traces of haplogroup J in Berbers are probably attributable to Arab/Bedouin influences.

The great distinction in both signal and background between the populations on either side of the Gate of Tears at the Southern extent of the Red Sea suggests that the back migrations of Eurasian haplogroups associated with Afro-Asiatic populations took place via the Levant and across the Sinai or along its coast (and from there down the Egyptian coast of the Red Sea, via the Nile to its source, parallel to the Nile during wet Sahara periods, or along the North African coast), and to a lesser extent via contacts between North Africa and Iberia (and, to a still lesser extent, the entire Southern European coast during classical times), rather than at the Gate of Tears location often associated with the Out of Africa event.

It is a bit hard to tell whether the Somali Y-DNA haplogroup T population has coastal origins, or whether it followed the Blue Nile to its source and then made the short hop from the Blue Nile basin to the Indian Ocean drainage basin. Y-DNA haplogroup T could have gone Southwest to Egypt and Northwest to Europe, while Y-DNA haplogroup L could have split from T and K somewhere in the Fertile Crescent and migrated to Pakistan as part of the Harappan population (along with R2, which also has a well-defined Indus River Basin distribution).

The relatively clear delineation between Chadic populations and their neighbors who share religious and food production similarities with them (e.g. the Fulani and the Saharan subgroups of Nilo-Saharans), compared with other families of Afro-Asiatic populations, suggests to me that Chadic is quite young among the main linguistic subgroups of Afro-Asiatic. Ethiosemitic is also known to have emerged in approximately the Bronze Age from a single language of Southwest Asian origin on a Cushitic/Coptic substrate. Arabic's expansion likewise took place in historic times, in the 1st and 2nd millennia CE.

We know that Semitic was thriving as the language of the Akkadian and subsequent Assyrian empires in the Bronze Age. We know from recovered and dated artifacts that there was trade from Arabia (probably via Yemen) as far as Zanzibar at least as far back as 2,400 BCE; that the Indian Ocean trade was operating by the 1st century CE, connecting at least Zanzibar, Arabia and South Asia; that the African groups that fused with Austronesians to settle Madagascar were very likely East African Bantus; and that Zanzibar probably acquired Iron Age Bantu settlement around the 9th century CE. An island off Kenya that was part of the same trading system as Zanzibar is called Lamu (a name suggestive of Lemuria, the legendary homeland of the Tamils, and possibly given in the same place-naming tradition, although the sleepy island is an unlikely homeland for anyone). We also know that there was a healthy Atlantic maritime trade network in the pre-Celtic megalithic culture from around the 4th millennium BCE until roughly the Bronze Age collapse ca. 1200 BCE, which proves that the necessary maritime technology existed at that time.

The subhaplogroup structure of E1b1b suggests that this haplogroup originated in Ethiopia, where its diversity is greatest and which is closer to the origins of E1b1a and E2, while Berbers are at its periphery.

Berbers are also at the tail end of expansions of mtDNA M1 and U6, which are probably back migrations to Africa despite the fact that they are largely confined to Afro-Asiatic Africa. The mtDNA haplogroup L3* found in Berbers is old, rather than a recent introduction via the Transafrican slave trade. About 10%-20% of Berber mtDNA is Subsaharan African in origin, while about 60% involves haplogroups usually associated with Caucasians of European and Middle Eastern origins. The mtDNA pools of Chadic, Cushitic and Omotic populations are composed to a great extent of mtDNA haplogroups L2 and L3, similar to those of their neighbors who speak other languages, suggesting a male-dominated pattern of Afro-Asiatic language expansion.

Ancient DNA shows genetic continuity between Berbers and North African populations from 12,000 years ago.

The notion that Omotic is an admixture of Nilo-Saharan and Cushitic influences is plausible and could explain its outgroup status.

Also, genetically, some Nilo-Saharan populations look more like Afro-Asiatic populations than they do like Niger-Congo or prototypical Nilo-Saharan populations.

There are several key historical transitions that it would be nice to be able to link to linguistic expansions in Africa.

The arrival of Fertile Crescent food production technologies in Egypt, and the Egyptian domestication of the donkey (one or two thousand years after they were developed, ca. 6000-7000 BCE). The arrival of certain Fertile Crescent herd animals in North Africa, East Africa and the Sahel (a few centuries later). The Afro-Asiatic linguistic groups seem primarily connected to the Fertile Crescent rather than to the Sahel agricultural complex. Egyptian trade routes, which can be corroborated to some extent not only by historical accounts but also by the products that reached Egypt along them, may have extended as far as modern Uganda and surely reached at least Ethiopia.

The expansion of Sahel agriculture (by some accounts contemporaneous with the Fertile Crescent agricultural complex and by some accounts several thousand years later, perhaps as late as 4000 BCE). The domestication of select East African domesticates (e.g. coffee and certain Ethiopian grains). There is an argument that Sahel agriculture only came into its own when East African domesticates and Sahel domesticates merged.

It is probably fair to associate the expansion of E1b1a, the Niger-Congo languages and mtDNA haplogroup L2 with the development of Sahel agriculture, and to associate the later Bantu expansion, starting ca. 3,000 BCE, with the development of tropical agriculture (including some crops of Austronesian origin) and Bantu iron metallurgy.

The proto-Nilo-Saharans, the proto-Chadic peoples, and the proto-Berbers all appear to have had nomadic pastoral cultures. The transition of the Sahara from its last Holocene wet phase, when Lake Chad was very large, to the current arid phase was probably formative for at least some of them. There was a major, regionally disruptive drought around 2000 BCE, and another, less severe one around 1200 BCE, accompanied by other disasters such as volcanic eruptions.

The origin stories of the Semitic peoples are also nomadic pastoral, a tradition that may have persisted at least until the rise of the Akkadian empire, the Assyrian empire, and the Phoenicians. Biblical accounts portray the early Hebrews as an initially nomadic pastoral culture that began to transition to agriculture as they settled in the Levant after their exile in Egypt (an event suggestive, at least, of some connection with the Hyksos era and the monotheistic Pharaoh of Egypt, although much of Genesis draws on Mesopotamian legends, something not necessarily inconsistent with the Hyksos).

The Coptic historical record is the oldest in existence apart from the Sumerian one, dating to about 3000 BCE, and is corroborated by the Sumerian historical record which is slightly earlier, ca. 3500 BCE.

The Coptic Egyptian record is less than clear about the origins of the Berbers, Cushitic, Omotic, Chadic and Semitic peoples on their fringes, and archaeology helps only a little to fill the gap.

The Afro-Asiatic languages have considerable time depth relative to the Indo-European and at least some Asian and Altaic language family expansions. They were preceded in Mesopotamia by Sumerian, which was also very likely the source for the Harappan civilization in modern day Pakistan at about the same time as the Egyptian civilization, although the linguistic affinity of Harappan is highly disputed.

The Afro-Asiatic language families do not show a clear linear family tree relationship to each other; almost every combination of groupings has been suggested by legitimate professional linguists. The Northern tier of Afro-Asiatic languages (Berber, Coptic and Semitic) is non-tonal, while Cushitic, Omotic and Chadic do have grammatical tone. This suggests either substrate influences or areal influences.

My intuition is that the Afro-Asiatic languages either arose from a proto-Afro-Asiatic language in and around Jericho that spread to Egypt and from Egypt to the other Afro-Asiatic language families, or radiated from Egypt in all directions. Thus, I am ambivalent about the direction of the Semitic-Coptic link. The close connection to the earliest food production centers makes a Levantine origin for Afro-Asiatic attractive, but the early adoption of a written language could have given Coptic an edge, and Coptic could have originated as the language of an indigenous fishing population (fishing populations were the most culturally advanced societies prior to food production technologies). The lack of a monolithic Y-DNA or mtDNA signature suggests that some of the transitions were predominantly cultural transfers while others were demic.

My intuition is that the Berber E1b1b/mtDNA L3* combination, quite possibly as a unit, arrived in North Africa in pre-Neolithic times (ca. 12,000 BCE) as a hunter-gatherer population that had in some way made a leap that set it apart from prior cultures in the area and caused it to expand from East Africa before the Afro-Asiatic languages arose. Berbers then transitioned culturally, with very little genetic impact, to a nomadic pastoralist society with an Afro-Asiatic language derived from Coptic when herd animals arrived from Egypt ca. 5000-6000 BCE. The mtDNA M1/U6 could be fellow travellers with Y-DNA E1b1b or with T, which would also have been a back migration.

My intuition is that Cushitic is the product of the expansion of Coptic society towards the source of the Blue Nile at about the time of agricultural technology and trade influences from Egypt, with quite heavy substrate influence, and that Omotic is basically Cushitic under heavy Nilo-Saharan influence on the border of the two language groups.

My intuition is that Y-DNA J in Africa is a late influence (Ethiosemitic and later) driven by Semitic peoples, and that Y-DNA haplogroup T is an ancient one (quite possibly the most ancient Afro-Asiatic marker given its presence in the Levant, Europe, Egypt and Cushitic areas). Of course, J is almost surely present in Southwest Asia much earlier. I suspect that J1 was originally more closely associated with Semitic languages (possibly in connection with or after Y-DNA T), but that J2 introgressed into the mix to some extent. The specific time depth of J in Southwest Asia and West Asia is a topic for another day.

R1b-V88, as a basal branch of R1b, probably split off at about the time R1b expanded elsewhere, but if it survived in a refugium in the Dead Sea area, for example, it could have formed the basis for the Chadic peoples later in time. I suspect that the Chadic peoples could have originated in the early period of Egyptian written history, going almost unmentioned, or shortly before it. There is some indication from their distribution that the Chadic peoples may have arrived via oasis hopping in a wetter Holocene Sahara, parallel to but West of the Nile, rather than down the Nile, although tracing the White Nile to its source and then hopping into the Chad basin would also make sense.

Note: This post is light on links and actual data, because I want to get my analysis down and my data are in my favorites bar and a hand written journal, neither of which is prone to memory lapses.

6 comments:

Maju said...

Totally in disagreement, sorry. J1 is too widespread to be old and trying to claim E1b as Egyptian ("Coptic"?, what the heck, that's a Christian sect!) is misleading and surely held only on race prejudices.

E1b originated at the Nile, Sudan or Ethiopia. And so did Afroasiatic languages. However I do not think all E1b correlates with Afroasiatic but only E1b1b1a1-M78. Other E1b is surely much older.

Andrew Oh-Willeke said...

I'm not sure you are understanding me correctly. I date J in Africa to ca. 2500 BCE-1500 BCE ish and in SW Asia to more or less the formative period of the Semitic languages which is older than that but very vague. Certainly J is common in Semitic language speakers (particularly J1).

I don't disagree that E1b1b originated in Ethiopia or thereabouts, or that E1b is older than E1b1b.

Coptic was the name of the ancient Egyptian language long before it was the name of the Christian sect that uses the ancient Egyptian language as its liturgical language. The point of using the word "Coptic" is to focus on populations that were ancestrally part of the greater Egyptian sphere of influence, as opposed to the citizenry of the modern Egyptian nation-state. For example, Greek-speaking minorities and immigrant Arabs from Saudi Arabia might be "Egyptians" but not ethnically Coptic.

E1b1b correlates with and is found in most branches of Afroasiatic, and is not found in most non-Afroasiatic populations. E1b1a is predominantly West African and Southern African (mostly via the Bantu), and is decidedly not Afroasiatic, let alone Egyptian.

My analysis was focused on looking at what non-Afro-Asiatic Y-DNA haplogroups are most common in populations that are mostly Afro-Asiatic from different language groups. E1b1a, for example is the most common non-Afroasiatic Y-DNA type in the predominantly E1b1b Berbers.

The place of origin of the Afroasiatic languages is a highly controversial area that I'm assembling facts for the purposes of figuring out.

I am trying to figure out which of the Y-DNA hgs found at elevated levels in Afroasiatic peoples is common in Egypt, which has a mix of E1b1b and T and a lot of other stuff.

My conclusion, basically, is that E1b1b made its spread from Ethiopia through the region currently inhabited by Afroasiatic language speakers prior to the advent of Afroasiatic languages, and that Afroasiatic languages, which came later, spread with Egyptian or Levantine food production technology.

Since I conclude that E1b1b was preexisting in North Africa, the Nile and Ethiopia, my inclination is to conclude that T was likely the earliest and leading player in the earliest spread of Afroasiatic, but that it infused itself into an existing E1b1b-rich population that had spread through the region beforehand for different reasons.

I honestly don't recall talking much about the origins of E1b at all.

Maju said...

"I'm not sure you are understanding me correctly. I date J in Africa to ca. 2500 BCE-1500 BCE"

I'm sure I understand that, and I cannot agree with it. First, there's no known archaeological flow from West Asia to North Africa in those dates and, second, the North African cluster of J1 looks very deep within the clade, following Semino 04 (fig. 4). Of course, if there has been more recent research on the matter I am interested in hearing of it, but I'm unaware of any so far.

"E1b1b correlates with and is found in most branches of Afroasiatic, and is not found in most non-Afroasiatic populations".

Indeed. However I'd make a distinction between E1b1b1a1-M78, which has a distribution that strongly parallels that of Afroasiatic languages and other clades, notably E1b1b1b1-M81, which looks so specifically NW African that I cannot associate it with anything but some old settlement of this region, long before Capsian culture and Afroasiatic languages.

"The place of origin of the Afroasiatic languages is a highly controversial area that I'm assembling facts for the purposes of figuring out".

I do not think it is controversial at all, just some racist prejudice reluctant to see a "white" language and associated cultural elements originating in Black Africa. If you can be neutral on this matter, there is no reasonable doubt that Afroasiatic languages must have coalesced by the Nile, probably in modern Sudan (Nubia or further South?), expanding Northwards with Epipaleolithic cultural flows, of which the most notable is Capsian culture (Berber) and maybe Kebaran/PPNA (or rather the desert-oriented parts of it: the Harifian). Later the Circum-Arabian Pastoralist Complex brewed Semitic, which would expand in the 4th millennium BCE (this is partly historical already, because Sumerians documented it as "the flood": amaru, which is a play on amurru or a-maru (Semitic peoples)). This is of course the same "flood" later inserted into the Bible, starring Noah.

"I am trying to figure out which of the Y-DNA hgs found at elevated levels in Afroasiatic peoples is common in Egypt, which has a mix of E1b1b and T and a lot of other stuff".

It can be an interesting line of research indeed.

"My conclusion, basically, is that E1b1b made its spread from Ethiopia through the region currently inhabited by Afroasiatic language speakers prior to the advent of Afroasiatic languages, and that Afroasiatic languages, which came later, spread with Egyptian or Levantine food production technology".

I am not in agreement. Afroasiatic highest diversity is and seems to have always been in the Upper Nile. Also there is no possible "vehicle" for an expansion as the one you say. I think you are, like so many other people, mystifying Neolithic without any proper archaeological or genetic support.

Kevin Borland said...

I've also been pondering where Afroasiatic matches up on the y-dna phylogenetic tree. I find the possibility that T may represent Afroasiatic very interesting. That would make its closest linguistic neighbor Dravidian, I guess. I especially like this possibility since, out of the choices, T is the only haplogroup I haven't matched with a language group or groups. The only problem I have is that the geographic distribution of T doesn't seem to match the geographic distribution of Afroasiatic languages. But if a new technology made it advantageous for xT populations to learn the T language as a second language, that could do it. I believe that E1b1b corresponds to Nilo-Saharan, E1b1a Niger-Congo and Mande, J1 Northeast Caucasian and J2 Kartvelian.

Andrew Oh-Willeke said...

There are clearly some problems with that classification. As noted, Afroasiatic does not have the relatively consistent genetic markers that some other language groups do. It has a group of about four haplogroups whose frequencies are such that no one of them can account for all of the Afroasiatic linguistic clades. Thus, some large population, at some point in time, must have experienced a major language shift.

J1, which you associate with Northeast Caucasian, is also a dominant hg in some Semitic populations. J2 has at least as much of an Anatolian root as it does a Kartvelian one. Hg T, if I were to associate it with any language family, I would associate with Sumerian, with Hg T joining a larger pan-Neolithic mix at low frequencies early on and as Sumerian became a dead language in favor of Afro-Asiatic Akkadian.

The fact that the Sahel crops central to the South Indian Neolithic, which coincides closely in time with the rise of the Dravidian languages, are native to Africa and were domesticated there before they appeared in already domesticated form in South India, is compelling evidence in favor of an African connection with the proto-Dravidians.

Hg T is in my view the only plausible candidate for Dravidian if it indeed has origins outside of India. It has the right distribution geographically in India, both in where it is found and where it is not found, to be a proto-Dravidian marker there (although clearly T originates somewhere closer to Mesopotamia than India).

Dravidian languages and cultures have no real similarities to Afroasiatic languages or cultures at all, while showing some significant similarities to Niger-Congo languages linguistically which are buttressed by other cultural linkages to Niger-Congo cultures in addition to the founder crops.

Yet, the only extant Niger-Congo population with any meaningful Y-DNA hg T presence is the Fulani, particularly the Fulani from the North Cameroon region.

Moreover, the analysis that I conducted in this post tends to disfavor a Niger-Congo speaking substrate in parts of Africa where hg T is found, and the formative era of the Dravidian expansion appears to precede the time at which the Bantu expansion reached hg T rich parts of East Africa. The Y-DNA hg A/B substrate in hg T rich areas of Africa is suggestive, instead, of a hunter-gatherer Khoisan or Pygmy-like substrate in East Africa.

Yet, there is no evidence of maritime navigation extending as far as Senegal via the Mediterranean, or around the Cape of Good Hope to West Africa, ca. 2500 BCE to 1500 BCE, when it would have had to exist for the Fulani to reach India other than via East Africa, and the Sahara would have been pretty much impassable at that point in time.

Anonymous said...

Language family is the sign of the success of a military-cultural form, so the direction of the family's diffusion is from the center of the culture, the ME. Egyptian roots are there too, though that evidence was described as "racist". Semitic is the most ancient historically known language in the ME (the first Sumerian kings had Semitic names). I think Semitic is the root language of the family, and that its diffusion through cultural influence does not match the genes of the populations.