Self-Identified vs. Genetic Race
Racial reality. In all cases, emphasis added.
Large-scale multi-ethnic cohorts offer unprecedented opportunities to elucidate the genetic factors influencing complex traits related to health and disease among minority populations. At the same time, the genetic diversity in these cohorts presents new challenges for analysis and interpretation. We consider the utility of race and/or ethnicity categories in genome-wide association studies (GWASs) of multi-ethnic cohorts. We demonstrate that race/ethnicity information enhances the ability to understand population-specific genetic architecture. To address the practical issue that self-identified racial/ethnic information may be incomplete, we propose a machine learning algorithm that produces a surrogate variable, termed HARE. We use height as a model trait to demonstrate the utility of HARE and ethnicity-specific GWASs.
So, self-identified race/ethnicity (SIRE) is going to be compared to genetically inferred ancestry (GIA), the basis of “population-specific genetic architecture.” If race is a “social construct,” one would expect a poor correlation between those entities. If, on the other hand, race is primarily a biological construct, the correlation will be good.
An example of ethnicity-specific locus is CD36 (MIM: 173510) for high-density lipid cholesterol (HDL), for which the putative causal variant (rs2366858) is only polymorphic in populations of African descent.
But, but, but…I thought we were all exactly the same, and race is purely a social construct?
A well-known example of heterogeneous genetic effect is the APOE (MIM: 107741) e4 allele, which is polymorphic in many populations but confers greater risk of Alzheimer disease in Asians compared to other populations.
Ditto.
To date, most GWASs stratify on SIRE and adjust GIA within SIRE as covariates. The stratification by SIRE often implicitly occurs at the recruitment or genotyping stages, which focus on populations described by a single SIRE, such as Hispanics, Europeans/European Americans, African Americans/Afro-Caribbean, or East Asians, among others.
OK.
…GIA—in the form of principal components or admixture proportions—can be estimated for every GWAS participant.
Of course, depending on how the estimation is done, the admixture proportions should not be treated as definitive; as they write:
…as the 1000 Genomes individuals included in this analysis did not fully represent ancestry diversity in MVP, various model assumptions in ADMIXTURE were violated; therefore, we caution quantitative interpretation of the estimated admixture proportions.
Back to the main points:
Previous population genetic studies have demonstrated that GIA and self-identified racial/ethnic information have a high correlation, but one does not unambiguously determine the other.
High correlation = biological construct.
As for “but one does not unambiguously determine the other,” that is because of admixture between biologically defined groups:
Specifically, in admixed groups such as African Americans and Hispanics, genetic ancestries vary continuously among individuals along axes that represent admixture proportions; defining strata based on GIA requires thresholds that are often ad hoc. Conversely, the distribution of ancestry proportions may partially overlap between different racial/ethnic groups and cannot be separated based on GIA alone.
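To see why the authors call such thresholds ad hoc, here is a minimal Python sketch. The admixture proportions and the cutoffs are made-up illustrative values, not data from the paper; the only point is that the size of a GIA-defined stratum depends on where the cutoff is placed.

```python
# Hypothetical admixture proportions for one ancestry component (not real data).
proportions = [0.55, 0.62, 0.71, 0.78, 0.83, 0.90]


def stratum_size(values, cutoff):
    """Count individuals whose proportion is at or above the cutoff."""
    return sum(1 for v in values if v >= cutoff)


# Same individuals, three plausible cutoffs, three different strata.
sizes = {cutoff: stratum_size(proportions, cutoff) for cutoff in (0.6, 0.7, 0.8)}
for cutoff, n in sizes.items():
    print(f"cutoff {cutoff:.1f}: {n} of {len(proportions)} individuals in stratum")
```

Because ancestry varies continuously in admixed groups, none of these cutoffs is more natural than the others, which is the practical problem HARE is meant to get around.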
Moving on:
Motivated by these practical challenges, we propose a supervised learning algorithm that defines a categorical stratification variable in a multi-ethnic GWAS. The variable, termed HARE (harmonized ancestry and race/ethnicity), uses GIA to refine SIRE for genetic association studies in three ways: identify individuals whose SIRE is likely inaccurate, reconcile conflicts among multiple SIRE sources, and impute missing racial/ethnic information when the predictive confidence is high.
That is their basic strategy; it seems fairly reasonable, at least at first glance.
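A rough sketch of that strategy may help. This is not the authors' implementation: the paper uses a support vector machine, whereas the code below swaps in a simple nearest-centroid classifier with a softmax confidence score so that it runs with the Python standard library alone. The two-dimensional points (stand-ins for principal components), the group names "A" and "B", the 0.9 confidence threshold, and the function names are all hypothetical. It covers two of the three uses listed in the quote (flagging likely inaccurate SIRE and imputing missing SIRE) and leaves out reconciling multiple SIRE sources.

```python
import math
from collections import defaultdict


def fit_centroids(points, labels):
    """Mean position of each labelled group; unlabelled points are skipped."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x, y), label in zip(points, labels):
        if label is None:
            continue
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    return {g: (s[0] / counts[g], s[1] / counts[g]) for g, s in sums.items()}


def group_probabilities(point, centroids):
    """Softmax over negative squared distances: a crude confidence score."""
    scores = {g: -((point[0] - cx) ** 2 + (point[1] - cy) ** 2)
              for g, (cx, cy) in centroids.items()}
    top = max(scores.values())
    weights = {g: math.exp(s - top) for g, s in scores.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def harmonize(points, labels, threshold=0.9):
    """Return one harmonized label per individual, or None if unassigned."""
    centroids = fit_centroids(points, labels)
    result = []
    for point, label in zip(points, labels):
        probs = group_probabilities(point, centroids)
        best = max(probs, key=probs.get)
        confident = probs[best] >= threshold
        if label is None:
            # Missing self-report: impute only when confidence is high.
            result.append(best if confident else None)
        elif confident and best != label:
            # Self-report strongly contradicted by ancestry: leave unassigned.
            result.append(None)
        else:
            result.append(label)
    return result


# Hypothetical data: two clusters, one discordant label, two missing labels.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 0.2),
          (10, 0), (11, 0), (10, 1), (11, 1),
          (0.5, 0), (6.45, 0.47)]
labels = ["A", "A", "A", "A", "A",
          "B", "B", "B", "B",
          None, None]
hare = harmonize(points, labels)
print(hare)
```

In this toy data the fifth individual reports "A" but sits in the "B" cluster and is left unassigned, the tenth has no self-report and is imputed as "A", and the last lies midway between the clusters and stays unassigned.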
To select causal variants, we considered rare and common causal variants separately because the LD pattern around these causal variants are likely to differ. For rare causal variants, we randomly selected 125 unlinked SNPs such that the minor allele frequency (MAF) was less than 1% in one HARE minority strata while absent in all other HARE strata; these included 105 variants that were polymorphic only in non-Hispanic black and 20 that were polymorphic only in Hispanics.
I thought we were all exactly the same?
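The selection rule in that passage (rare in one stratum, absent in all others) can be sketched as a simple filter. This is only an illustration under stated assumptions: the variant IDs, stratum names, and frequencies are invented, the function name is hypothetical, and the sketch skips the random sampling and the "unlinked" (LD) requirement that the authors also apply.

```python
def private_rare_variants(freqs, stratum, max_maf=0.01):
    """Variants with 0 < MAF < max_maf in `stratum` and frequency 0 elsewhere.

    freqs maps variant id -> {stratum name: allele frequency}.
    """
    selected = []
    for variant, by_stratum in freqs.items():
        f = by_stratum[stratum]
        maf = min(f, 1.0 - f)
        absent_elsewhere = all(
            v == 0.0 for s, v in by_stratum.items() if s != stratum
        )
        if 0.0 < maf < max_maf and absent_elsewhere:
            selected.append(variant)
    return selected


# Hypothetical allele frequencies in three strata (not real data).
freqs = {
    "var1": {"S1": 0.004, "S2": 0.0, "S3": 0.0},    # rare, private to S1
    "var2": {"S1": 0.004, "S2": 0.002, "S3": 0.0},  # rare but shared
    "var3": {"S1": 0.150, "S2": 0.0, "S3": 0.0},    # private but common
    "var4": {"S1": 0.0, "S2": 0.008, "S3": 0.0},    # rare, private to S2
}
print(private_rare_variants(freqs, "S1"))
print(private_rare_variants(freqs, "S2"))
```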
Of 351,820 individuals, all but 6,257 (1.78%) were assigned to one of the four non-overlapping HARE groups: Hispanics, non-Hispanic white, non-Hispanic black, and non-Hispanic Asian…
Which means that 98.22% were so assigned.
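The arithmetic is easy to check from the two counts in the quote; the variable names below are just for illustration.

```python
total = 351_820      # individuals in the quoted passage
unassigned = 6_257   # not placed in any of the four HARE groups

assigned = total - unassigned
pct_unassigned = round(100 * unassigned / total, 2)
pct_assigned = round(100 * assigned / total, 2)

print(f"{assigned} assigned: {pct_assigned}% (unassigned: {pct_unassigned}%)")
```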
…As expected, the ancestries of individuals in the non-Hispanic black group varied along PC1 that described the difference among European ancestry and African ancestry (Figures 2B, 3B, and S3B). Likewise, Hispanic individuals showed varying proportions of European, African, and Native American ancestry (Figures 2C, 3C, and S3C).
As expected, due to the known biological histories of these groups.
The non-Hispanic Asian group consisted of two components, corresponding to the East and South Asian populations, respectively, according to the admixture analysis (Figures 2D, 3D, and S3D). Interestingly, European admixture (greater than 20%) were inferred in 12% (n = 364) of the individuals in the HARE non-Hispanic Asian group. Among this group, 46% (n = 166) individuals had “Asian” as the only SIRE information; an additional 25% (n = 91) indicated both Asian and European ancestries. This likely reflected recent admixture between Asian Americans and European Americans.
Is that admixture due to yellow fever fetishism, and many HAPAs identifying only as Asian?
Although it would have been feasible to train the support vector machine to learn East Asian and South Asian as two separate HARE categories, we chose to group them into one stratum because the statistical power of subsequent genetic association analysis would likely be low in this group due to relatively small sample size (n = 3,054).
With a larger sample size, these Asian groups would be separated. In any case, South Asians are appropriately binned into Asia, contra the “Aryan” fantasies of Desis and Traditionalists.
Among nearly 202,000 individuals with SIRE, 1,079 (0.53%) had GIA strongly indicating a different racial/ethnic group.
This is the main point: for the other 99.47% of individuals with SIRE, GIA did not strongly indicate a different group, so SIRE and GIA more or less matched. Self-identified race matches genetics, as other studies have previously shown. Race is primarily a biological construct.
Among the nearly 150,000 individuals whose SIRE was missing or not used in the training procedure, 144,711 (96.55%) were assigned into one of four HARE groups.
Even when SIRE was not available, people could be binned at a rate of 96.55%, and I am confident that binning would match the racial perceptions we would have when considering those individuals.
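As a consistency check, the rounded figures ("nearly 202,000", "nearly 150,000") can be backed out from the exact counts in the quotes. One assumption here is mine, not the paper's: that the group with SIRE and the group whose SIRE was missing or not used in training together make up the full 351,820.

```python
total = 351_820                         # full cohort, from the earlier quote
imputed, pct_imputed = 144_711, 96.55   # assigned without usable SIRE
discordant = 1_079                      # GIA strongly indicating another group

# Denominator implied by "144,711 (96.55%)".
no_sire = round(imputed / (pct_imputed / 100))

# Assumption: the two groups partition the cohort.
with_sire = total - no_sire
pct_discordant = round(100 * discordant / with_sire, 2)

print(no_sire, with_sire, pct_discordant)
```

Under that assumption, the implied denominators agree with the "nearly 150,000" and "nearly 202,000" in the quotes, and the discordance rate reproduces the quoted 0.53%.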
A total of 372 distinct loci reached genome-wide significance for height in one of the HARE groups; as expected, the number of significant loci was positively related to the sample size...Of these, 21 loci were found in exactly one HARE group and would have been missed in the mega-analysis of the entire MVP cohort (Table S3). Nineteen of these loci were found in the non-Hispanic white group…
But I thought we were all the same?
HARE combines genetic ancestry and race/ethnicity information and is motivated by the empirically observed correlation between continental level genetic ancestry and major race/ethnicity.
“…the empirically observed correlation”…the “race is a social construct” crowd are LYING to you.
Our study focuses on stratified analysis by major race/ethnicity (as defined by the US Census), currently the most commonly adopted stratifying unit used in multi-ethnic GWASs. It is well appreciated that finer-scale structure exists within each race/ethnicity; researchers may wish to focus on strata defined within a race/ethnic group. For example, Conomos et al. aims to perform association studies within the Hispanics by defining strata corresponding to Cuban, Dominican, Puerto Rican, Mexican, Central American, or South American.
Fair enough. Groups can be further subdivided. Among Europeans, we know that the first major split is North/South, followed by East/West. That of course doesn’t alter the sharp distinctions between the major continental population groups (races).