Mucosal Genomics Implicate Lymphocyte Activation and Lipid Metabolism in Refractory Environmental Enteric Dysfunction

Background & Aims Environmental enteric dysfunction (EED) limits the Sustainable Development Goals of improved childhood growth and survival. We applied mucosal genomics to advance our understanding of EED. Methods The Study of Environmental Enteropathy and Malnutrition (SEEM) followed 416 children from birth to 24 months in a rural district in Pakistan. Biomarkers were measured at 9 months and tested for association with growth at 24 months. The duodenal methylome and transcriptome were determined in 52 undernourished SEEM participants and 42 North American controls and patients with celiac disease. Results After accounting for growth at study entry, circulating insulin-like growth factor-1 (IGF-1) and ferritin predicted linear growth, whereas leptin correlated with future weight gain. The EED transcriptome exhibited suppression of antioxidant, detoxification, and lipid metabolism genes, and induction of anti-microbial response, interferon, and lymphocyte activation genes. Relative to celiac disease, suppression of antioxidant and detoxification genes and induction of antimicrobial response genes were EED-specific. At the epigenetic level, EED showed hyper-methylation of epithelial metabolism and barrier function genes, and hypo-methylation of immune response and cell proliferation genes. Duodenal coexpression modules showed association between lymphocyte proliferation and epithelial metabolic genes and histologic severity, fecal energy loss, and wasting (weight-for-length/height Z < -2.0). Leptin was associated with expression of epithelial carbohydrate metabolism and stem cell renewal genes. Immune response genes were attenuated by giardia colonization. Conclusions Children with reduced circulating IGF-1 are more likely to experience stunting. Leptin and a gene signature for lymphocyte activation and dysregulated lipid metabolism are implicated in wasting, suggesting new approaches for EED refractory to nutritional intervention. ClinicalTrials.gov, Number: NCT03588013. (https://clinicaltrials.gov/ct2/show/NCT03588013)


Supplementary Methods
-page 4  Anthropometry data were collected monthly. Child length was measured from birth to 24 months, and we refer to length/height throughout. Blood, urine and fecal samples were collected at 9 months of age. Nutritional intervention according to Pakistan's Community Management of Acute Malnutrition protocol using high calorie AchaMun therapeutic food [520 kcal, 13g protein (10%), 29g fat (50%) and essential fatty acids (EFA)] and close monitoring was offered to 189 cases with WHZ < − 2 at age 9-10 months up to the age of 12months.
The subset of malnourished children (63 children) who failed to respond to educational and nutritional interventions was evaluated by Esophagogastroduodenoscopy (EGD) at the AKU hospital to identify treatable causes of malnutrition and biopsy specimens were obtained for detailed assessment of histopathology, gene expression, and methyl-chip to better characterize the pathophysiology of EED. Histology was evaluated centrally at AKU (Table S2)

Immunohistochemistry
Formalin fixed and paraffin embedded (FFPE) tissue samples were obtained. The AKU tissue samples were placed in tissue microarrays using a semi-automated apparatus (TMArrayer,

Biomarkers
Blood Cytokine/Chemokine Assays: The MILLIPLEX MAP Human Cytokine / Chemokine panel (EMD Millipore corporation, Billerica, MA). Serum samples were diluted 1:100 in a serum matrix provided in the kit. The coating beads are pre-mixed and were used as per manufacturer instruction. Beads for nine analytes were multiplexed on magnetic beads (IFNγ, TNF-α). Beads were mixed and washed with bead diluent. Samples, and standards were incubated with capture beads at room temp for 2hrs, followed by addition of detection antibodies (steptavidin-phycoerythrin). The intensity of fluorescence is directly proportional to the amount of analyte concentration. Both Mean MFI and concentration of analytes were calculated using standard curve (10,000 -3.2 pg/ml). Plates were run on Bioplex 200 platform using exponent software.

Blood Pre-Albumin
Pre-albumin is a marker of malnutrition. A concentration less than 10 mg/dL is associated with malnutrition 4 . This test measures pre-albumin level based on immunoturbidimetric assay or nephlometry techniques. Pre-albumin is detected in blood using goat polyclonal antibody specific for human pre-albumin. Concentration of pre-albumin determined in mg/dL by measuring the turbidity of immune complex at 340nm. The distribution of intensity of the scattered light depends on the ratio of the particle size of the antigen-antibody complexes to the radiated wavelength.
Blood Alpha-1-acid Glycoprotein (AGP), Ferritin, and C-Reactive Protein (CRP) AGP or Orosomucoid, Ferritin, and CRP were analyzed on automated biochemistry analyzer, Roche/Hitachi 902 (Basel, Switzerland). The principle of assay is the agglutination of analyte using specific antibodies present in commercial reagents. The sensitivity of AGP assay is 10mg/dL.
The lower limit of detection of Ferritin is 0.1 ng/mL and for CRP is 0.001 mg/dL. All three inflammatory biomarkers were analyzed previously in Pakistani EED cohort 5 .
Blood Insulin Like Growth Factor-1 (IGF-1) IGF-I was analyzed on LIAISON Diasorin (Italy). The principle of the assay is "Chemiluminescence technology" with paramagnetic microparticle solid phase. The detection limit was 3 ng/ml 5 .
Blood Glucagon Like Peptide 2 (GLP2) GLP2 is a marker of enterocyte proliferation and regeneration. GLP2 was measured in neat serum sample using competitive inhibition ELISA (Cloud-Clone Corp. Fern Hurst Dr., Katy, TX, USA).  In those children who underwent endoscopy at AKUH, fecal calorimetry (6200 Isoperibol Calorimeter; Parr Instrument Company, Moline, IL, USA) was performed (n=47) to obtain macronutrient specific determination of fecal energy. Caloric energy content of a fecal aliquot was calculated. All samples were run in duplicate and the final value (cal/g) was averaged. For each run, the calorimeter was calibrated using a Benzoic Acid tablet (known energy=6318 cal/g). Benzoic Acid tablet was used as a spike-in with each stool sample, and samples were run with and without spike-in and the Avg cal/g (minus Benzoic Acid) was calculated.

TAC giardia
For the subset of AKU cases that underwent endoscopic evaluation (n=50) a duodenal aspirate was obtained. Duodenal aspirates were measured for presence of giardia (TAC Assay, CT<35 was and RNA targets, respectively for validation and monitor the efficiency of extraction, reverse transcription, and amplification steps. TAC protocol was performed as described previously 10 .
Briefly, a total of 100 μL of TNA was eluted from the MagNa Pure analyzer, 40 μL of TNA was added with 60 μL of Agpath one-step RT PCR master mix, containing nuclease free water, 2X Agpath buffer and enzyme (Thermofisher Scientific, US), followed by loading on a TAC card. The card was sealed and ran on QuantStudio 7 Flex platform (Applied Biosystems, Thermo Fisher). A total of eight samples were run on a single card, extraction blanks (consisting of nuclease-free water, spiked with PhHV and MS2) and PCR blank (consisting of nuclease-free water only) were also ran after every 3rd card to rule out false positivity/negativity aspects. Sample was considered valid positive if 1) the sample's target Ct value was less than 35.0, 2) the reference extraction blank was negative for each target, and 3) the internal controls (PhHV, MS2) had a Ct value less than 35.0. These TAC cards were customized to detect common enteropathogens. Of the 50/52 that had TAC analyses of their duodenal aspirates; 32 (64%) were positive for giardia, 6 (12%) were positive for campylobacter, and 5 (10%) for cryptosporidium, while other pathogens showed less frequent positive results.

Transcriptome analysis
The duodenal biopsy global pattern of gene expression was determined using RNAseq on the Illumina platform as previously described 12,13 . In short, RNA/DNA were isolated from one duodenal biopsy obtained during diagnostic esophagogastroduodenoscopy using the Qiagen The mean(SD) RPKM APOA1 gene expression at diagnosis was equal to 927(1469) in Crohn disease and was equal to 3012(3080) in healthy controls. We anticipated similar differences between environmental enteropathy and healthy controls in the current study. Based on these results, 30 healthy controls and 50 environmental enteropathy subjects without a specific treatable infection provided 90% power to detect such a difference with α = 0.05.

Identifying co-expressed gene modules and intramodular hubs
Weighted gene co-expression network analysis (WGCNA) was implemented to identify modules of co-expressed genes 21 . The WGCNA framework identifies co-expressed gene clusters using pairwise correlations (eq. 1) between gene expression profiles across all the samples. We have used a signed version of the WGCNA framework to distinguish between positively and negatively correlated genes. Additionally, negatively correlated genes often belong to different biological categories and therefore should be placed in different modules. The gene co-expression similarities are converted to signed adjacency values using the power adjacency function (eq. 2). The adjacency matrix representing the connection strengths between genes is non-negative. , where , is the Pearson correlation coefficient between the expression profiles of a pair of genes and . The parameter of the power function is usually selected based on the scale-free topology criterion. In this study, we found the optimal value of the power parameter to be equal to 5. The adjacency matrix is then used to calculate topological overlaps (eq. 3) between each gene pair.
The topological similarities represent the interconnectedness of two genes in a given gene co- where denotes the expression profile of a gene and is the eigengene of module .
Similarly, the gene significance (GS) score is the correlation value between the expression vector and the clinical trait (disease diagnosis in this case) (eq. 5) , where is the clinical trait of choice. We computed for each gene as the harmonic mean of module membership and gene significance scores (eq. 6).
Any gene with a high hub score is not only central to a given module but also strongly correlated with disease diagnosis. Finally, we identified hub genes with strong (top 10%) MM and GS scores within each of the candidate modules. These hub genes are hypothesized to be the key driver genes associated with the disease and were used for additional downstream analyses.  (Table S3) and each batch was analyzed separately. PCA was done as part of the quality control (QC) with no separation on PCA plot by gender after excluding sex chromosomes with separation by diagnosis observed for both batches (EED and controls, Figure   S7). Identification of rDMRs: Following identification and annotation of significantly differentially methylated regions (DMRs) and differentially expressed genes or co-expression hub genes, rDMRs were defined as the overlap of both groups according to their unique gene symbol annotation.

Statistical methods
SEEM is reported as per the STROBE statement for observational cohort studies. The primary objective of this study was to identify the association of clinical risk factors and biomarkers with the HAZ at 24 months, the outcome of interest in this study. In addition, we investigated factors linked with 24 months WAZ, and WHZ at 24 months.
Sample size: we planned to enroll 350 children from 0 to 6 months with weight for length/height Z score (WHZ) <-2 at the time of enrollment and 50 children of the same age who would be growing normally, with WHZ > 0, to serve as controls. This cohort will serve as a validation set of the promising biomarkers identified in our previous Pakistani EED cohort 5   *Biomarkers measured at 9 months were measured in blood unless indicated elsewhere. § Entry refers to requirement around birth. AKU; Aga Khan University, HAZ; length/height-for-age Z score, WAZ; weight-for-age Z score; WHZ; weight-for-length/height Z score. Data are expressed as median (25 th , 75 th percentile) for all continuous variables.

Figure S5. Duodenal Histology Scores for the Cincinnati Controls and Celiac Disease
Patients and the AKU-EED Participants. A. Cincinnati control (n=10), Celiac Disease (n=10), and AKU-EED (n=52) duodenal biopsies were read centrally by two AKU pathologists using the EED histology scoring system as in Table S2. All controls were scored as zero for the four characteristic features of EED shown, with the AKU-EED duodenal biopsies exhibiting an intermediate score between celiac disease and controls for intra epithelial lymphocytes and villus architecture, and higher scores for reduced Paneth cell density. Differences between groups were tested using Kruskal-Wallis with Dunn's multiple comparisons test, *p<0.05, **p<0.01, ***p<0.001. B. Representative images are shown illustrating villous blunting in celiac disease and EED cases. The bar represents 1000 microns. Figure S6. Duodenal Immunohistochemical analysis for the Cincinnati Controls and Celiac Disease Patients and the AKU-EED Participants. Immunohistochemistry was performed using antibodies against DUOX2 (yellow chromogen) and LCN2 (teal chromogen) in a dual stain, and against GNZB (brown chromogen) in a separate single stain. A. Immunohistochemical analysis for LCN2 and DUOX2 in control show no staining. Original magnification x200. B. Immunohistochemical analysis for GZMB in control, Celiac Disease, and EED. Original magnification x200, and the inlet area for each image is shown. C. Data for the GZM cell count normalized against the total area of tissue in each sample, are shown for controls (n=10), celiac disease (n=10), and EED (n=57). Poisson regression, sample type change from EE to celiac expected count decreases by 88% and from EED to control by 89%, **p<0.01.  (Table S3) and each batch was analyzed separately. Unsupervised principal component analysis (PCA) to 2 components (PC1 and PC2) shows separation by diagnosis in both batches and no separation by gender as expected after excluding XY genes.