Genetic Complexity of Crohn’s Disease in Two Large Ashkenazi Jewish Families

Background & Aims: Crohn’s disease (CD) is a highly heritable disease that is particularly common in the Ashkenazi Jewish population. We studied 2 large Ashkenazi Jewish families with a high prevalence of CD in an attempt to identify novel genetic risk variants. Methods: Ashkenazi Jewish patients with CD and a positive family history were recruited from the University College London Hospital. We used genome-wide, single-nucleotide polymorphism data to assess the burden of common CD-associated risk variants and for linkage analysis. Exome sequencing was performed, and rare variants that were predicted to be deleterious and were observed at a high frequency in cases were prioritized. We undertook within-family association analysis after imputation and assessed candidate variants for evidence of association with CD in an independent cohort of Ashkenazi Jewish individuals. We examined the effects of a variant in DUOX2 on hydrogen peroxide production in HEK293 cells. Results: We identified 2 families (1 with >800 members and 1 with >200 members) containing 54 and 26 cases of CD or colitis, respectively. Both families had a significant enrichment of previously described common CD-associated risk variants. No genome-wide significant linkage was observed. Exome sequencing identified candidate variants, including a missense mutation in DUOX2 that impaired its function and a frameshift mutation in CSF2RB that was associated with CD in an independent cohort of Ashkenazi Jewish individuals. Conclusions: In a study of 2 large Ashkenazi Jewish families with multiple cases of CD, we found the genetic basis of the disease to be complex, with a role for common and rare genetic variants. We identified a frameshift mutation in CSF2RB that was replicated in an independent cohort. These findings show the value of family studies and the importance of the innate immune system in the pathogenesis of CD.

1 Supplementary Methods

Ancestry assessment
The Ashkenazi Jewish (AJ) ancestry of all individuals was confirmed by principal component analysis (PCA) using snpStats (v1.14.0 [1]) with a reference dataset of 471 unrelated individuals with four AJ grandparents [2] and non-Jewish populations (CEU (Utah residents with North and Western European ancestry) and TSI (Toscani in Italy)) from HapMap [3]. Related individuals and poorly genotyped samples and SNPs were removed. Common SNPs were extracted and pruned for LD (r² < 0.2) in each dataset separately.

Marker set for linkage analysis
AJ-specific RAFs for linkage were obtained using SNP data in 1,502 individuals of AJ ancestry, confirmed by PCA (as above) and extracted from a larger cohort on dbGaP [4] (phs000448.v1.p1) [5]. SNPs shared with those genotyped in the families were pruned for LD in the reference data at r² < 0.2. The heterozygosity and RAF were computed using PLINK, and the SNP with the highest heterozygosity was selected sequentially within sliding windows of 0.1 and 0.5 cM for the linkage map using a custom Python script.
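The marker selection step can be sketched as follows. This is a minimal illustration and not the custom script used in the study: the `Snp` record, the function names and the example SNPs are hypothetical, the windows are assumed to be consecutive and non-overlapping, and heterozygosity is assumed to be the expected value 2p(1 − p) computed from the RAF.

```python
from collections import namedtuple

# Hypothetical record: marker name, genetic position (cM) and allele frequency.
Snp = namedtuple("Snp", ["name", "cm", "raf"])


def heterozygosity(raf):
    """Expected heterozygosity of a biallelic SNP, 2p(1 - p) (assumed definition)."""
    return 2.0 * raf * (1.0 - raf)


def select_markers(snps, window_cm):
    """Keep the most heterozygous SNP in each consecutive window of window_cm."""
    best = {}
    for snp in snps:
        idx = int(snp.cm / window_cm)  # index of the window containing this SNP
        current = best.get(idx)
        if current is None or heterozygosity(snp.raf) > heterozygosity(current.raf):
            best[idx] = snp
    return [best[i] for i in sorted(best)]


# Illustrative SNPs only, not data from the study.
snps = [
    Snp("rsA", 0.01, 0.10),
    Snp("rsB", 0.04, 0.45),
    Snp("rsC", 0.12, 0.30),
    Snp("rsD", 0.18, 0.20),
    Snp("rsE", 0.46, 0.50),
]
dense_map = select_markers(snps, 0.1)   # 0.1 cM map
sparse_map = select_markers(snps, 0.5)  # 0.5 cM map
```

In this toy input the 0.1 cM map keeps one SNP per occupied window, while the 0.5 cM map keeps a single SNP because all five markers fall in the same window.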

Linkage analysis
Linkage analysis was performed using SwiftLink. To account for the unknown disease penetrance, only affected individuals were considered. The SwiftLink MCMC was parallelised across four processor cores using default parameters, and the average of ten replicates was taken.

Samples used for exome sequencing
In addition to the affected individuals from the families, exome sequencing was undertaken on a selection of unaffected family members and AJ controls. Specifically, in Family A, 23 unaffected family members were sequenced, comprising eight non-founder parents of affected individuals, six founders with no affected children and nine founders with one or more affected children. In Family B, 18 unaffected family members were sequenced, comprising two non-founder parents of affected individuals, four founders with one or more affected children and 12 unaffected siblings or cousins of affected individuals. In addition, 31 unrelated AJ controls were sequenced.

Haplotype flow reconstruction
The full pedigrees, a total of 322 individuals in Family A and 132 individuals in Family B, were manually divided into non-overlapping subfamilies of ≤ 28 bits (twice the number of non-founders minus the number of founders) such that each subfamily was within the computational capabilities of Merlin [6]. For Family A there were 21 subfamilies and for Family B, seven. Within each subfamily, the most likely pattern of gene flow was estimated by Merlin using the 0.1 cM SNP map. For each non-founder within each subfamily, the founder source of each allele for each marker was thus determined. However, as the maternal/paternal classification of founder alleles is random (as the parents are unobserved), reassembling the split haplotype flow data is not straightforward. This is further complicated when founders are ungenotyped. To achieve haplotype founder source matching, hypotheses representing the different haplotype matching scenarios may be tested by comparing the sum of the observed probabilities of identical-by-descent inheritance for all individuals sharing each haplotype for each marker with that expected assuming they do indeed match. Pairwise identical-by-descent probabilities for each marker were estimated across all pairs of individuals in each full pedigree using a multiple splitting approach similar to that described by Thomson et al. [7]. Sub-pedigrees were generated for all x(x − 1)/2 pairs of individuals across the entire (pre-split) pedigree, each containing just the two individuals of interest, one genotyped sibling (if available, to assist with phasing) and their connecting ancestral relatives; this gave a total of 33,411 sub-pedigrees for Family A and 4,465 for Family B. Identical-by-descent probabilities using these sub-pedigrees were estimated using Merlin. In cases in which pairs of individuals appeared in multiple sub-pedigrees, the average identical-by-descent probability per marker across all observations of that pair was computed.
Utilising these probabilities, haplotype matching was performed across the subfamilies progressively building up by adding one subfamily at a time. The maximum number of affected individuals sharing a founder haplotype within a particular family or subfamily was subsequently computed. These results were verified using Combinatorial Conflicting Homozygosity analysis [8].
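The pair counts quoted above follow from x(x − 1)/2; the reported totals of 33,411 and 4,465 correspond to x = 259 and x = 95, values inferred here from the arithmetic rather than stated in the text. The sketch below illustrates the pair count and the per-marker averaging over repeated observations of a pair. The function names and input layout are hypothetical, and each marker's identical-by-descent sharing is reduced to a single number per pair for simplicity; it is not the pipeline used in the study.

```python
from collections import defaultdict


def n_pairs(x):
    """Number of unordered pairs among x individuals."""
    return x * (x - 1) // 2


def average_ibd(observations):
    """Average per-marker IBD values over repeated observations of each pair.

    observations: iterable of (individual_a, individual_b, [value per marker]).
    Returns {(a, b): [mean value per marker]} with each pair stored in sorted order.
    """
    sums = {}
    counts = defaultdict(int)
    for a, b, probs in observations:
        key = tuple(sorted((a, b)))  # the same pair may be listed in either order
        if key not in sums:
            sums[key] = [0.0] * len(probs)
        sums[key] = [s + p for s, p in zip(sums[key], probs)]
        counts[key] += 1
    return {key: [s / counts[key] for s in total] for key, total in sums.items()}


# Illustrative values only: the pair (ind1, ind2) is observed in two sub-pedigrees.
observations = [
    ("ind1", "ind2", [0.50, 0.25]),
    ("ind2", "ind1", [0.70, 0.75]),
    ("ind1", "ind3", [0.10, 0.20]),
]
averaged = average_ibd(observations)
```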

Within-family imputation
For each variant, the imputation proceeded by first assigning both founder haplotypes of all homozygous reference (wild type) individuals as reference and both founder haplotypes of all homozygous alternate individuals as alternate. Next, all heterozygous individuals were considered: if one of their founder haplotypes had already been assigned as reference or alternate, the other founder haplotype was assigned as the opposite allele. For each variant, this was repeated until no more founder haplotypes were updated. Consistency checks were performed at each step to verify that the two founder haplotypes and genotypes for each individual were compatible. If the allele frequency of the variant in ExAC and in the AJ control data (AJex) was <0.01, it was assumed that unobserved founder haplotypes would be wild type for the variant. Finally, the genotypes of all individuals for which both founder haplotypes were known were imputed. A similar approach has been described by Song et al. [9]. When an imputation conflict arose (in which the imputed genotype differed from that observed by direct genotyping for those individuals from whom it was available, or when a haplotype was assigned as harbouring both the reference and alternate alleles), all imputed genotypes for that variant were discarded and only the genotypes in those individuals directly sequenced or genotyped were retained.
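The procedure above can be sketched for a single variant as follows. This is an illustrative reading of the description, not the authors' implementation: the function names, the input layout (each individual mapped to the identifiers of the two founder haplotypes carried at the variant position) and the example pedigree are hypothetical, and "unobserved" founder haplotypes are taken to mean haplotypes not carried by any directly genotyped individual.

```python
class ImputationConflict(Exception):
    """Raised when a haplotype or genotype assignment is inconsistent."""


def _assign(hap_allele, hap, allele):
    """Set a founder haplotype to 0 (reference) or 1 (alternate); True if new."""
    if hap_allele.get(hap, allele) != allele:
        raise ImputationConflict(hap)
    is_new = hap not in hap_allele
    hap_allele[hap] = allele
    return is_new


def impute_variant(haplotypes, observed, rare=False):
    """haplotypes: {individual: (hap_id, hap_id)}; observed: {individual: 0, 1 or 2}."""
    hap_allele = {}
    try:
        # Step 1: homozygous individuals fix both of their founder haplotypes.
        for ind, gt in observed.items():
            if gt in (0, 2):
                for hap in haplotypes[ind]:
                    _assign(hap_allele, hap, gt // 2)
        # Step 2: heterozygotes resolve an unknown haplotype from a known one,
        # repeated until no further haplotype is updated.
        changed = True
        while changed:
            changed = False
            for ind, gt in observed.items():
                if gt != 1:
                    continue
                h1, h2 = haplotypes[ind]
                if h1 in hap_allele:
                    changed |= _assign(hap_allele, h2, 1 - hap_allele[h1])
                elif h2 in hap_allele:
                    changed |= _assign(hap_allele, h1, 1 - hap_allele[h2])
        # Step 3: for rare variants, haplotypes seen in no genotyped individual
        # are assumed to carry the reference allele.
        if rare:
            seen = {hap for ind in observed for hap in haplotypes[ind]}
            for pair in haplotypes.values():
                for hap in pair:
                    if hap not in seen:
                        hap_allele.setdefault(hap, 0)
        # Step 4: impute everyone whose two founder haplotypes are both known.
        result = dict(observed)
        for ind, (h1, h2) in haplotypes.items():
            if h1 in hap_allele and h2 in hap_allele:
                gt = hap_allele[h1] + hap_allele[h2]
                if observed.get(ind, gt) != gt:
                    raise ImputationConflict(ind)
                result[ind] = gt
        return result
    except ImputationConflict:
        # On any conflict, keep only the directly observed genotypes.
        return dict(observed)


# Hypothetical pedigree: founder haplotypes A-D, two genotyped individuals.
haplotypes = {
    "parent1": ("A", "B"),
    "parent2": ("C", "D"),
    "child1": ("A", "C"),
    "child2": ("B", "C"),
    "child3": ("B", "D"),
}
observed = {"parent1": 1, "child1": 0}
common = impute_variant(haplotypes, observed)
rare = impute_variant(haplotypes, observed, rare=True)
conflict = impute_variant(haplotypes, {"parent1": 1, "child1": 0, "child2": 0})
```

In this example, child2 is imputed as heterozygous in both runs, whereas parent2 and child3 are imputed only under the rare-variant assumption because haplotype D is otherwise unresolved. The third call returns only the observed genotypes because haplotype B would need to carry both alleles.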

Candidate variant genotyping
The DUOX2 and CSF2RB variants were genotyped in a selection of sequenced and imputed individuals for validation purposes. This was done by Sanger sequencing following PCR amplification of the flanking sequence. For the CSF2RB variant (chr22:37333972 GC/G, p.S709LX22), the forward primer was GTGGGAGGACAGGACCAAAA and the reverse primer was GGGAACTAGGGAGACAGACG, yielding a product of 150 bp. For the DUOX2 variant (chr15:45402883 G/C, rs151261408, p.P303R), the forward primer was GCTGGAGAGATTTCCCTACTAAGC and the reverse primer was TCCTGTCTGAGTTGCTTCTCC, yielding a product of 600 bp. In both cases, PCR was conducted using an annealing temperature of 60°C.

[...] tions were 5'-gactacaaggacgacgatgacaagGCACTCTCACTGCCCTGGGA-3' and 5'-cttgtcatcgtcgtccttgtagtcGTCCTGACTGCCCGATGGA-3' (FLAG encoding sequence in lower case). The products of the fusion PCRs were directionally cloned into the KpnI and PshAI sites of the HA-DUOX2 plasmid. The DUOXA2, DUOXA1 and DUOXA2-EGFP expression vectors were prepared as previously described [10]. All constructs were verified by bidirectional DNA sequencing.

Cell culture and transfection
HEK 293 cells were maintained in DMEM (Life Technologies, Carlsbad, CA, USA) supplemented with 10% heat-inactivated fetal bovine serum. Adherent cells were transfected at 50-60% confluence using FuGENE 6 reagent (Promega, Madison, WI, USA). Plasmids encoding DUOX2 maturation factor (either DUOXA2 or DUOXA2-EGFP) were cotransfected at 13 ng/cm² of cell monolayer, whereas the amount of DUOX2-encoding plasmids was varied from 2.1 to 21 ng/cm². Under these conditions, DUOXA2 is available in significant excess and does not limit DUOX2/DUOXA2 heterodimerization [11,12]. In all experiments, the total amount of DNA transfected per square centimeter of cell monolayer was kept constant by adjusting with empty pcDNA3.1 vector.

Hydrogen peroxide production assay
Release of hydrogen peroxide was determined by reaction with cell-impermeable 10-acetyl-3,7-dihydroxyphenoxazine (Amplex Red reagent; Life Technologies) in the presence of excess peroxidase, producing fluorescent resorufin. Briefly, cell monolayers in 24-well plates were incubated in HBSS/10 mM HEPES (pH 7.4) supplemented with 50 µM Amplex Red reagent and 0.1 U/ml horseradish peroxidase for one hour at 37°C. For stimulation of DUOX2 NADPH-oxidase activity, ionomycin (1 µM) and 12-O-tetradecanoylphorbol-13-acetate (TPA; 400 nM) were included in the reaction buffer. Fluorescence (ex/em, 530/595 nm) of the medium was measured within the linear range of the hydrogen peroxide concentration response curve and corrected for Amplex Red oxidation in wells containing cells transfected with empty pcDNA3.1 vector only. As an internal control for transfection efficiency, Renilla luciferase activity from cotransfected pRL-Tk plasmid (2 ng/well) was determined in the remaining cells.
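As a rough illustration of how such readings might be combined, the sketch below subtracts the empty-vector background from a well's fluorescence and divides by its Renilla luciferase activity. Division by the Renilla signal is an assumption about how the internal control was applied, and the function name and numbers are hypothetical.

```python
def normalised_signal(sample_fluorescence, empty_vector_fluorescence, renilla_activity):
    """Background-corrected fluorescence per unit of Renilla luciferase activity."""
    if renilla_activity <= 0:
        raise ValueError("Renilla activity must be positive")
    # Correct for Amplex Red oxidation measured in empty-vector wells.
    corrected = sample_fluorescence - empty_vector_fluorescence
    # Assumed normalisation for transfection efficiency.
    return corrected / renilla_activity


# Illustrative numbers only (arbitrary units), not data from the study.
signal = normalised_signal(5200.0, 1200.0, 800.0)
```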
The genetics of Crohn's disease in two large Ashkenazi Jewish families Levine AP, et al.

Superoxide release assay
Extracellular superoxide release of cells resuspended in Krebs-Ringer-HEPES buffer (pH 7.4) was detected using Diogenes reagent (National Diagnostics, Atlanta, GA, USA). Chemiluminescence was recorded following the addition of TPA/ionomycin to 2 × 10⁵ cells/200 µl reaction. Cells co-transfected with DUOX2 and DUOXA1 plasmids were used as a positive control for O₂⁻ release [13,14]. Specificity of the assay for superoxide was ascertained by including superoxide dismutase (10 µg/ml) in parallel reactions.

Flow cytometry
To detect surface-expressed epitopes, cells were washed twice in PBS and incubated for 30 min (15 min at room temperature, 15 min on ice) with either rat anti-HA (0.5 ng/µl of clone 3F10; Roche) or mouse anti-FLAG (1 ng/µl of clone M2; Sigma) diluted in HBSS/10 mM HEPES pH 7.4/1% BSA. Cells were washed twice in cold PBS, detached in 2 mM EDTA and fixed at 4°C in 0.75% formaldehyde. Bound antibodies were detected using Alexa Fluor 647-conjugated anti-rat or anti-mouse IgG, respectively. Cytometry data for 100,000 events per sample were acquired on a BD Accuri C6 Flow Cytometer (BD Biosciences) and appropriate FSC/SSC gates were employed to exclude cellular debris. Data were analyzed using FlowJo 8.8.7 software. Relative surface expression of DUOX2 was determined by calculating differences in total fluorescence intensity between the samples and an equal-sized population of control cells overexpressing DUOX2 but not its maturation factor. Without the latter, DUOX2 could not be detected in non-permeabilized cells (Figure 4D). For detection of intracellular DUOX2, detached cells were first fixed in 0.75% formaldehyde/PBS at 4°C, then washed and permeabilized with 0.2% saponin in PBS/0.1% BSA. Binding of antibodies was done as above, but in the presence of 0.2% saponin. Total DUOX2 expression was determined by calculating differences in total fluorescence intensity between the samples and an equal-sized population of control cells transfected with empty pcDNA3.1 plasmid. To test whether expression of the 303R variant interferes specifically with the surface expression of wild type (303P) DUOX2, cells were transfected with equal amounts of HA-tagged and FLAG-tagged DUOX2 constructs of either the 303P and/or 303R variants and the surface expression of the HA-tagged variant determined under each condition.
Surface expression of the FLAG-tagged DUOX2 constructs is depicted in Supplementary Figure 4. For each experiment, the values for the 303R-DUOX2 variant were normalized to those for wild type (303P) DUOX2 (set to 100%). Bars indicate mean (± SD) from n=4 independent transfection experiments. *** p<0.001, ** p<0.01 (ratio paired t-test).
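The quantities described in this section can be sketched as follows: the difference in total fluorescence between equal-sized populations, the expression of variant values as a percentage of wild type, and the test statistic of a ratio paired t-test (a paired t-test on the logarithms of the paired ratios). The function names and numbers are hypothetical and purely illustrative, and the p-value lookup from the t distribution is omitted.

```python
import math
from statistics import mean, stdev


def relative_expression(sample_events, control_events):
    """Difference in total fluorescence between equal-sized event populations."""
    if len(sample_events) != len(control_events):
        raise ValueError("populations must contain the same number of events")
    return sum(sample_events) - sum(control_events)


def percent_of_wild_type(variant, wild_type):
    """Express each variant value relative to its paired wild type value (100%)."""
    return [100.0 * v / w for v, w in zip(variant, wild_type)]


def ratio_paired_t(variant, wild_type):
    """t statistic and degrees of freedom for a ratio paired t-test."""
    logs = [math.log(v / w) for v, w in zip(variant, wild_type)]
    n = len(logs)
    t = mean(logs) / (stdev(logs) / math.sqrt(n))
    return t, n - 1


# Illustrative values from four hypothetical paired experiments, not study data.
wild_type = [1000.0, 1200.0, 900.0, 1100.0]
variant = [400.0, 600.0, 360.0, 495.0]
percentages = percent_of_wild_type(variant, wild_type)
t_stat, dof = ratio_paired_t(variant, wild_type)
```

A negative t statistic here indicates that the variant values are consistently lower than their paired wild type values.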