Friday, December 9, 2011

New Study Supports ANI/ASI Description of South Asian Whole Genomes

Indian populations are characterized by two major ancestry components, one of which is spread at comparable frequency and haplotype diversity in populations of South and West Asia and the Caucasus. The second component is more restricted to South Asia and accounts for more than 50% of the ancestry in Indian populations. Haplotype diversity associated with these South Asian ancestry components is significantly higher than that of the components dominating the West Eurasian ancestry palette.

Modeling of the observed haplotype diversities suggests that both Indian ancestry components are older than the purported Indo-Aryan invasion 3,500 YBP.

Consistent with the results of pairwise genetic distances among world regions, Indians share more ancestry signals with West than with East Eurasians.

From here (open access) via here.

Portions of the paper that I excerpt below go on to state (greater-than and less-than signs are written out in words in places because of their impact on HTML formatting in a blog post; internal references to sources and figures are omitted):

Reich et al. have also made an argument for a sizeable contribution from West Eurasia to a putative ancestral north Indian (ANI) gene pool. Through admixture between an ancestral south Indian (ASI) gene pool, this ANI variation was found to have contributed significantly to the extant makeup of not only north (50%–70%) but also south Indian populations (greater than 40%). This is in contrast with the results from mtDNA studies, where the percentage of West Eurasian maternal lineages is substantial (up to 50%) in Indus Valley populations but marginal (less than 10%) in the south of the subcontinent. . . .

[W]e used the model-based structure-like algorithm ADMIXTURE that computes quantitative estimates for individual ancestry in constructed hypothetical ancestral populations. Most South Asians bear membership in only two of the constructed ancestral populations at K = 8. These two main ancestry components—k5 and k6, colored light and dark green—are observed at all K values between K = 6 and K = 17. These correlate (r > 0.9; p < 0.00001) perfectly with PC4 and PC2 in West Eurasia, respectively. Looking at the Pakistani populations (0.51) and Baluchistan (Balochi, Brahui, and Makrani) in particular (0.59), the proportion of the light green component (k5) is significantly higher than in the Indian populations, (on average 0.26). Importantly, the share of this ancestry component in the Caucasus populations (0.50) is comparable to the Pakistani populations.

There are a few populations in India who lack this ancestry signal altogether. These are the thus-far sampled Austroasiatic tribes from east India, who originated in Southeast Asia and represent an admixture of Indian and East Asian ancestry components, and two small Dravidian-speaking tribes from Tamil Nadu and Kerala. However, considering the geographic spread of this component within India, there is only a very weak correlation (r = 0.4) between probability of membership in this cluster and distance from its closest core area in Baluchistan. Instead, a more steady cline (correlation r = 0.7 with distance from Baluchistan) of decrease of probability for ancestry in the k5 light green ancestral population can be observed as one moves from Baluchistan toward north (north Pakistan and Central Asia) and west (Iran, the Caucasus, and, finally, the Near East and Europe).

If the k5 light green ancestry component originated from a recent gene flow event (for example by a demic diffusion model) with a single center of dispersal where the underlying alleles emerged, then one would expect different levels of associated haplotypic diversity to suggest the point of origin of the migration. . . . Our simulations show that differences in haplotype diversity between source and recipient populations can be detected even for migration events that occurred 500 generations ago (∼12,500 years ago assuming one generation to be 25 years). For alleles associated with k5, haplotype diversity is comparable among all studied populations across West Eurasia and the Indus basin. However, we found that haplotypic diversity of this ancestry component is much greater than that of those dominating in Europe (k4, depicted in dark blue) and the Near East (k3, depicted in light blue), thus pointing to an older age of the component and/or long-term higher effective population size. Haplotype diversity flanking Asian alleles (k7) is twice greater than that of European alleles—this is probably because the k7 ancestry component is a composite of two Asian components ([at] K > 10).
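(An aside on the arithmetic in the excerpt above: the 12,500-year detection limit is simply 500 generations at the assumed 25 years per generation, and on the same assumption a migration 3,500 years ago is only about 140 generations back. Below is a minimal sketch of that conversion in Python. The function and variable names are mine for illustration, not the paper's.)

```python
# Generation length assumed in the excerpt (25 years per generation).
YEARS_PER_GENERATION = 25


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a number of generations into calendar years."""
    return generations * years_per_generation


def years_to_generations(years, years_per_generation=YEARS_PER_GENERATION):
    """Convert calendar years into a number of generations."""
    return years / years_per_generation


# Detection limit quoted in the excerpt: 500 generations.
detection_limit_years = generations_to_years(500)

# The putative Indo-Aryan migration, 3,500 years ago, expressed in generations.
indo_aryan_generations = years_to_generations(3500)

print(detection_limit_years, indo_aryan_generations)
```

Changing the assumed generation length changes both figures proportionally, so the comparison between the two dates does not depend on the exact value chosen.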

In contrast to widespread light green ancestry, the dark green ancestry component, k6 is primarily restricted to the Indian subcontinent with modest presence in Central Asia and Iran. Haplotype diversity associated with dark green ancestry is greatest in the south of the Indian subcontinent, indicating that the alleles underlying it most likely arose there and spread northwards. It is notable that this ancestry component also exhibits greater haplotype diversity than European or Near Eastern components[.] . . .

[G]enetic diversity among Pakistani populations (average pairwise FST 0.0056, although this measure excludes the Hazara, who show substantial admixture with Central Asian populations) is less than one third of the diversity observed among all South Asian populations (0.0184), even when excluding the most divergent Austroasiatic and Tibeto-Burman speaking groups of east India. . . . all South Asian populations, except for Indian Tibeto-Burman speakers, show lower FST distances to Europe than to East Asia. This could be either because of Indian populations sharing a common ancestry with West Eurasian populations because of recent gene flow or because East Asian populations have relatively high pairwise FST with other non-African populations, probably because of their history of genetic bottlenecks.
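(A quick check of the "less than one third" statement in the excerpt above, using only the two average pairwise FST values it quotes. This is a sketch for the reader's convenience. The variable names are hypothetical and nothing here reproduces the paper's own calculations.)

```python
# Average pairwise FST values quoted in the excerpt.
fst_pakistan = 0.0056      # among Pakistani populations (excluding the Hazara)
fst_south_asia = 0.0184    # among all South Asian populations

# Ratio of Pakistani to overall South Asian between-population diversity.
ratio = fst_pakistan / fst_south_asia

# The excerpt's claim is that this ratio is below one third.
is_less_than_one_third = ratio < 1 / 3

print(round(ratio, 3), is_less_than_one_third)
```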

Similarly, the clines we detect between India and Europe (e.g., PC1 and PC2) might not necessarily reflect one major episode of gene flow but be rather a reflection of complex demographic processes involving drift and isolation by distance. Nevertheless, the correlation of PC1 with longitude within India might be interpreted as a signal of moderate introgression of West Eurasian genes into western India, which is consistent with previous studies on uniparental and autosomal markers.

Overall, the contrasting spread patterns of PC2 and PC4, and of k5 and k6 in the ADMIXTURE analysis, could be seen as consistent with the recently advocated model where admixture between two inferred ancestral gene pools (ancestral northern Indians [ANI] and ancestral southern Indians [ASI]) gave rise to the extant South Asian populace. The geographic spread of the Indian-specific PC2 (or k6) could at least partly correspond to the genetic signal from the ASI and PC4 (or k5), distributed across the Indus Valley, Central Asia, and the Caucasus, might represent the genetic vestige of the ANI. However, within India the geographic cline (the distance from Baluchistan) of the Indus/Caucasus signal (PC4 or k5) is very weak, which is unexpected under the ASI-ANI model, according to which the ANI contribution should decrease as one moves to the south of the subcontinent. This can be interpreted as prehistorical migratory complexity within India that has perturbed the geographic signal of admixture.

Overall, the locations of the Indian populations on the PC1/PC2 plot reflect the correlated interplay of geography and language. In concordance with the geographic spread of the respective language groups, the Indian Indo-European- and Dravidic-speaking populations are placed on a north to south cline. The Indian Austroasiatic-speaking populations are, in turn, in agreement with their suggested origin in Southeast Asia drawn away from their Indo-European speaking neighbors toward East Asian populations. In this respect, it is interesting to note that, although represented by only one sample each, the positions of Indo-European-speaking Bhunjia and Dhurwa amidst the Austroasiatic speakers probably corroborates the proposed language change for these populations.

[I]t was first suggested by the German orientalist Max Müller that ca. 3,500 years ago a dramatic migration of Indo-European speakers from Central Asia (the putative Indo Aryan migration) played a key role in shaping contemporary South Asian populations and was responsible for the introduction of the Indo-European language family and the caste system in India. A few studies on mtDNA and Y-chromosome variation have interpreted their results in favor of the hypothesis, whereas others have found no genetic evidence to support it.

However, any nonmarginal migration from Central Asia to South Asia should have also introduced readily apparent signals of East Asian ancestry into India. Because this ancestry component is absent from the region, we have to conclude that if such a dispersal event nevertheless took place, it occurred before the East Asian ancestry component reached Central Asia. The demographic history of Central Asia is, however, complex, and although it has been shown that demic diffusion coupled with influx of Turkic speakers during historical times has shaped the genetic makeup of Uzbeks (see also the double share of k7 yellow component in Uzbeks as compared to Turkmens and Tajiks), it is not clear what was the extent of East Asian ancestry in Central Asian populations prior to these events.

Another example of an heuristic interpretation appears when we look at the two blue ancestry components that explain most of the genetic diversity observed in West Eurasian populations (at K = 8), we see that only the k4 dark blue component is present in India and northern Pakistani populations, whereas, in contrast, the k3 light blue component dominates in southern Pakistan and Iran. This patterning suggests additional complexity of gene flow between geographically adjacent populations because it would be difficult to explain the western ancestry component in Indian populations by simple and recent admixture from the Middle East.

Several aspects of the nature of continuity and discontinuity of the genetic landscape of South Asia and West Eurasia still elude our understanding. Whereas the maternal gene pool of South Asia is dominated by autochthonous lineages, Y chromosome variants of the R1a clade are spread from India (ca 50%) to eastern Europe and their precise origin in space or time is still not well understood. In our analysis we find genetic ancestry signals in the autosomal genes with somewhat similar spread patterns. Both PC2 and k5 light green at K = 8 extend from South Asia to Central Asia and the Caucasus (but not into eastern Europe).

In an attempt to explore diversity gradients within this signal, we investigated the haplotypic diversity associated with the ancestry components revealed by ADMIXTURE. . . our current results indicate that the often debated episode of South Asian prehistory, the putative Indo-Aryan migration 3,500 years ago falls well within the limits of our haplotype-based approach. We found no regional diversity differences associated with k5 at K = 8. Thus, regardless of where this component was from (the Caucasus, Near East, Indus Valley, or Central Asia), its spread to other regions must have occurred well before our detection limits at 12,500 years. Accordingly, the introduction of k5 to South Asia cannot be explained by recent gene flow, such as the hypothetical Indo-Aryan migration. The admixture of the k5 and k6 components within India, however, could have happened more recently—our haplotype diversity estimates are not informative about the timing of local admixture.

Both k5 and k6 ancestry components that dominate genetic variation in South Asia at K = 8 demonstrate much greater haplotype diversity than those that predominate in West Eurasia. This pattern is indicative of a more ancient demographic history and/or a higher long-term effective population size underlying South Asian genome variation compared to that of West Eurasia. Given the close genetic relationships between South Asian and West Eurasian populations, as evidenced by both shared ancestry and shared selection signals, this raises the question of whether such a relationship can be explained by a deep common evolutionary history or secondary contacts between two distinct populations. Namely, did genetic variation in West Eurasia and South Asia accumulate separately after the out-of-Africa migration; do the observed instances of shared ancestry component and selection signals reflect secondary gene flow between two regions, or do the populations living in these two regions have a common population history, in which case it is likely that West Eurasian diversity is derived from the more diverse South Asian gene pool.

Most of this analysis makes sense (although it is extremely equivocal). But I disagree with the conclusion that a lack of regional differences in genetic diversity implies a time depth of more than 12,500 years. It is more plausible, in my view, given Indo-European historical and pre-historical evidence from multiple other sources, to assume a recent and simultaneous dispersal of a large population to many different regions, with the size of the population entering South Asia being somewhat larger than elsewhere, thus sustaining similar levels of diversity in South Asia and the other regions. In part, their conclusion seems driven by probably inaccurate assumptions about the time depth of the principal European and Near Eastern autosomal components (k4 and k3, respectively). Further, if k5 is Indo-Aryan and it had a significant Caucasian source, it may have had considerable age prior to Indo-Aryan expansion, during which it could have accumulated its diversity. The weak geographic pattern in k5 within India may reflect a fairly complete subcontinental imposition of a high-caste ANI ruling class.

I am also inclined to think that the West Asian/ANI components (k3, k4 and k5) probably represent at least two waves of migration, at least one of which is Harappan (probably k3), and at least one of which is Indo-Aryan (probably k5 with a minority component of k4).

The observation that Central Asia now has East Asian components that it may not have had earlier, an inference drawn from the fact that those components did not enter the South Asian gene pool from Central Asia, is notable and probably correct. (The current East Asian component appears from historical evidence and ancient DNA to have its origins in the last 2,000 years.)

The study also confirms prior studies in finding that Tibeto-Burman populations are more distinct genetically (and hence probably more recent arrivals) than other South Asian populations.

The autosomal profiles show significant instances of African origins in Pakistani populations (and a Dravidian-speaking Brahui population no different genetically from its Indo-European-speaking neighbors) but strong ASI components and no discernible African components in Dravidian speakers, even in Andhra Pradesh, where Y-DNA haplogroup T frequencies are highest. Andhra Pradesh also has very low percentages of Near Eastern (k3) or European (k4) components in this breakdown, although the sample is small enough that it may not be representative for genetic types that appeared relatively recently, occur at moderate frequency, and have not reached fixation in the area.
