Genetic Ancestry and Natural Selection Drive Population Differences in Immune Responses to Pathogens

Yohann Nédélec, Joaquín Sanz, Golshid Baharian, Zachary A. Szpiech, Alain Pacis, Anne Dumaine, Jean-Christophe Grenier, Andrew Freiman, Aaron J. Sams, Steven Hebert, Ariane Pagé Sabourin, Francesca Luca, Ran Blekhman, Ryan D. Hernandez, Roger Pique-Regi, Jenny Tung, Vania Yotova, Luis B. Barreiro

Cell, October 2016


What do you mean by expression QTL, response eQTL, and alternative splicing QTL?

We studied the impact of genetic variants (SNPs in our case) on three gene regulatory traits:

  • Expression levels (expression QTL or eQTL): These refer to genetic variants that affect the expression levels of a gene.
  • Response to infection (response QTL or reQTL): These refer to genetic variants that are associated with the magnitude of change in expression levels after infection.
  • Transcript isoform usage (alternative splicing QTL or asQTL): These refer to genetic variants that affect the relative usage of a gene's alternative isoforms (i.e., the percentage of the gene's expression attributable to each isoform).
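To make the three definitions concrete, here is a minimal Python sketch of how each trait could be turned into a per-individual phenotype and regressed on genotype dosage (0, 1, or 2 copies of an allele). It assumes the response trait is the log2 fold change (infected minus non-infected log2 expression) and the splicing trait is the fraction of a gene's reads assigned to one isoform; the helper names `ols_slope_t` and `qtl_phenotypes` are hypothetical. Matrix eQTL fits the same kind of linear model per SNP-gene pair and converts the t-statistic to a p-value; this sketch stops at the t-statistic.

```python
import math
from statistics import mean

def ols_slope_t(genotypes, phenotype):
    """Simple linear regression phenotype ~ genotype dosage.

    Returns (beta, t_statistic) for the genotype effect.
    """
    n = len(genotypes)
    gx, py = mean(genotypes), mean(phenotype)
    sxx = sum((g - gx) ** 2 for g in genotypes)
    sxy = sum((g - gx) * (p - py) for g, p in zip(genotypes, phenotype))
    beta = sxy / sxx
    alpha = py - beta * gx
    # Residual sum of squares -> standard error of the slope.
    rss = sum((p - alpha - beta * g) ** 2 for g, p in zip(genotypes, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, beta / se

def qtl_phenotypes(expr_ni, expr_inf, isoform_reads, gene_reads):
    """Build the three per-individual traits (hypothetical helper).

    expr_ni / expr_inf: log2 expression before / after infection.
    isoform_reads / gene_reads: reads assigned to one isoform / to the whole gene.
    """
    eqtl = expr_ni                                               # eQTL: expression level
    reqtl = [b - a for a, b in zip(expr_ni, expr_inf)]           # reQTL: log2 fold change
    asqtl = [i / g for i, g in zip(isoform_reads, gene_reads)]   # asQTL: isoform usage
    return eqtl, reqtl, asqtl
```

Each of the three returned lists can be passed as `phenotype` to `ols_slope_t` together with the same SNP's dosages.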

Which tool and parameters were used for QTL analyses?

All linear regressions were performed using the R package Matrix eQTL.

We tested SNPs with a minor allele frequency (MAF) above 5% located within ±100 kb of each gene (~3 million SNPs in our data).
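Matrix eQTL itself is an R package; the Python sketch below only illustrates which SNP-gene pairs would enter the cis analysis under the two filters stated above (MAF above 5%, computed over all individuals combined, and a ±100 kb window). The tuple layout, the function names, and the choice to measure the window from the gene's start and end coordinates (rather than from its transcription start site) are assumptions.

```python
def minor_allele_frequency(dosages):
    """MAF from allele dosages (0, 1, 2 copies of one allele), all individuals pooled."""
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1 - alt_freq)

def cis_snps(gene_chrom, gene_start, gene_end, snps, window=100_000, min_maf=0.05):
    """IDs of SNPs within +/- window bp of the gene whose MAF is above min_maf.

    snps: iterable of (snp_id, chrom, position, dosages) tuples (hypothetical layout).
    Assumption: the window is measured from the gene's start/end coordinates.
    """
    selected = []
    for snp_id, chrom, pos, dosages in snps:
        if chrom != gene_chrom:
            continue
        if not (gene_start - window <= pos <= gene_end + window):
            continue
        if minor_allele_frequency(dosages) > min_maf:
            selected.append(snp_id)
    return selected
```

A SNP that fails either check simply never appears in the results for that gene, which is one reason a SNP of interest can be absent.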

Why is my gene or SNP of interest missing from the results?

A SNP can be missing because:

  • it had a minor allele frequency (MAF) below 5% in our sample (all individuals combined); or
  • it did not pass our post-imputation quality control checks.

A gene can be missing because:

  • it was not expressed, or was expressed at very low levels, in all of the experimental conditions tested (we required a median TPM above 0.5 in at least one of the three conditions); or
  • no SNPs were available to test in the cis window around the gene.
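The two gene-level criteria above can be expressed as a small filter. This is a minimal sketch assuming TPM values are stored per condition in a dictionary; the function names and the dictionary layout are hypothetical.

```python
from statistics import median

def passes_expression_filter(tpm_by_condition, threshold=0.5):
    """True if the median TPM is above the threshold in at least one condition.

    tpm_by_condition: dict mapping condition name -> per-sample TPM values.
    """
    return any(median(values) > threshold for values in tpm_by_condition.values())

def gene_in_results(tpm_by_condition, n_cis_snps):
    """A gene is tested only if it is expressed and has at least one cis SNP."""
    return passes_expression_filter(tpm_by_condition) and n_cis_snps > 0
```

Because the criterion is "above 0.5 in at least one condition", a gene that is induced only after infection is still retained.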

Why is only a single p-value reported for population differences in isoform usage (rather than one for each isoform)?

To detect differences in isoform usage between African-Americans and European-Americans, we applied a multivariate generalization of Welch's t-test, which tests for differences in the distribution of isoform usage across all of a gene's isoforms at once and therefore yields a single p-value per gene.
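The FAQ does not specify which multivariate generalization of Welch's test was used, so the following is only a sketch of the general idea under stated assumptions. It computes a Hotelling-type statistic in which each group keeps its own covariance matrix, T² = dᵀ(S_A/n_A + S_B/n_B)⁻¹d, and refers it to a chi-square distribution with K − 1 degrees of freedom (a large-sample approximation; small-sample corrections such as James's or Nel and van der Merwe's are not implemented). Because isoform proportions sum to 1, one isoform is dropped to avoid a singular covariance matrix. All function names are hypothetical.

```python
import math

def mean_vec(rows):
    """Column means of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def cov_matrix(rows):
    """Sample covariance matrix (n - 1 denominator)."""
    n, mu = len(rows), mean_vec(rows)
    p = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def chi2_sf(x, df):
    """Chi-square survival function P(X > x) for integer df (closed form)."""
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        return sum(math.exp(-y + i * math.log(y) - math.lgamma(i + 1))
                   for i in range(df // 2))
    return math.erfc(math.sqrt(y)) + sum(
        math.exp(-y + (i - 0.5) * math.log(y) - math.lgamma(i + 0.5))
        for i in range(1, (df - 1) // 2 + 1))

def multivariate_welch(group_a, group_b):
    """Compare isoform-usage vectors (each summing to 1) between two groups.

    Returns (T2, p_value) using a chi-square approximation with K - 1 df.
    """
    a = [row[:-1] for row in group_a]  # drop last isoform: proportions sum to 1
    b = [row[:-1] for row in group_b]
    ma, mb = mean_vec(a), mean_vec(b)
    ca, cb = cov_matrix(a), cov_matrix(b)
    p = len(ma)
    # Welch-style: each group keeps its own covariance (no pooling).
    s = [[ca[i][j] / len(a) + cb[i][j] / len(b) for j in range(p)] for i in range(p)]
    d = [x - y for x, y in zip(ma, mb)]
    t2 = sum(di * wi for di, wi in zip(d, solve(s, d)))
    return t2, chi2_sf(t2, p)
```

Testing all isoforms jointly in this way is what produces one p-value per gene instead of one per isoform.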

What does the "footprint" column refer to?

To investigate the regulatory mechanisms underlying the immune QTL, we profiled the genome-wide chromatin accessibility landscape of non-infected, Listeria-infected, and Salmonella-infected cells using ATAC-seq. Using ATAC-seq footprinting, we identified transcription factor (TF) binding motifs likely to be occupied by their respective TFs in each experimental condition. The footprint column lists, for each SNP, the TF binding sites it overlaps (if any).
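Reporting the footprints a SNP falls in amounts to a point-in-interval lookup, done separately for each condition. The sketch below assumes footprints are given as `(tf_name, chrom, start, end)` tuples in 0-based, half-open coordinates that share a coordinate convention with the SNP position; both the layout and the function names are hypothetical.

```python
def overlapping_footprints(snp_chrom, snp_pos, footprints):
    """Names of TF footprints that contain the SNP.

    footprints: list of (tf_name, chrom, start, end), 0-based half-open [start, end),
    assumed to use the same coordinate convention as snp_pos.
    """
    return [tf for tf, chrom, start, end in footprints
            if chrom == snp_chrom and start <= snp_pos < end]

def footprint_column(snp_chrom, snp_pos, footprints_by_condition):
    """Per-condition list of overlapped footprints (empty list if none)."""
    return {cond: overlapping_footprints(snp_chrom, snp_pos, fps)
            for cond, fps in footprints_by_condition.items()}
```

A linear scan is enough to show the logic; for genome-wide footprint sets an interval index (for example, footprints sorted by start position per chromosome) would be the practical choice.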

What are the mean PST and ΔPST values?

PST is the phenotypic analog of the population genetic parameter FST: it measures the proportion of overall gene expression variance explained by between-population phenotypic divergence. PST values range from 0 to 1; values close to 1 indicate that most of a gene's expression variance is due to differences between populations. ΔPST quantifies the proportion of ancestry-associated expression differences that can be attributed to the strongest cis-associated variant for that gene.
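The FAQ does not give the estimators, so the following is only a sketch of one plausible computation. It assumes PST is estimated with the conventional QST-style formula σ²_B / (σ²_B + 2σ²_W) (i.e. taking c/h² = 1), with σ²_B and σ²_W from a one-way ANOVA, and that ΔPST is the relative drop in PST after regressing expression on the genotype of the strongest cis-associated SNP. Both formulas, and all function names, are assumptions that may differ from the authors' exact procedure.

```python
from statistics import mean

def variance_components(groups):
    """One-way ANOVA estimates of between- and within-population variance.

    groups: one list of expression values per population.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    var_between = max((ms_between - ms_within) / n0, 0.0)  # truncate at 0
    return var_between, ms_within

def pst(groups):
    """Assumed QST-style estimator: var_B / (var_B + 2 * var_W), i.e. c/h^2 = 1."""
    vb, vw = variance_components(groups)
    return vb / (vb + 2 * vw)

def residualize(genotypes, values):
    """Remove the linear effect of genotype dosage (pooled over all individuals)."""
    gx, vy = mean(genotypes), mean(values)
    sxx = sum((g - gx) ** 2 for g in genotypes)
    beta = sum((g - gx) * (v - vy) for g, v in zip(genotypes, values)) / sxx
    return [v - beta * (g - gx) for g, v in zip(genotypes, values)]

def delta_pst(expr, genotypes, population):
    """Hypothetical delta-PST: relative drop in PST after regressing out the top cis SNP."""
    labels = sorted(set(population))
    def split(values):
        return [[v for v, p in zip(values, population) if p == lab] for lab in labels]
    raw = pst(split(expr))
    if raw == 0:
        return float("nan")  # no population difference to explain
    adjusted = pst(split(residualize(genotypes, expr)))
    return (raw - adjusted) / raw
```

If a gene's population difference is driven entirely by a cis variant whose allele frequency differs between populations, regressing out that variant removes the between-population variance and ΔPST approaches 1; if the variant explains none of it, ΔPST stays near 0.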