|dc.description.abstract||Two Castanea species are native to the USA: C. dentata (the American chestnut) and C. pumila, with var. pumila (the Allegheny chinkapin) and var. ozarkensis (the Ozark chinkapin). It has been difficult to differentiate the species based on morphological characters because of intra-specific variability and the incidence of chestnut blight, which has prevented trees from maturing to the point of flower and fruit production. To develop species-specific markers and to infer historical processes associated with the geographical distribution of plant populations, chloroplast DNA, nuclear DNA and 454 sequences were generated, with special emphasis on one Castanea population at Ruffner Mountain, Alabama.
The Ruffner Mountain Castanea population was analyzed using leaf morphology and sequence analysis of several chloroplast DNA regions (trnT-L, trnL, ndhF, ndhC, orf62, and rpl16) and two informative nuclear regions (17 and 126). Comparative analysis with C. dentata, C. pumila var. pumila and C. pumila var. ozarkensis populations was conducted to infer the biogeographic history of the Alabama population. Five cpDNA haplotypes were detected in the Ruffner Mountain population; they divide the population into two main groups, a C. dentata type and a C. pumila var. pumila type. Some mutational sites (two deletions in the trnT-L region, one indel in the ndhF region, one deletion in the ndhC region, one SNP in the rpl16 region, one SNP in nuclear region 17 and one SNP in nuclear region 126) can be considered species-specific markers to varying degrees. However, species identification should be based on both morphology and combined sequence analyses. Phylogenetic analyses of the cpDNA data provided some evidence of the relationships among samples from different Castanea populations in North America. Moreover, phylogenetic analyses of the nuclear data indicated the possible origin of hybrid taxa.
To obtain more species-specific markers, cDNA from leaves of five individual C. pumila trees was isolated and sequenced on the 454 GS-FLX at the Genomics Core Facilities of Penn State University. A total of 1221540 reads, about 372 Mb of cDNA sequence, were generated. Read lengths ranged from 36 to 603 bp, with an average of 305 bp. The 454 sequencing analyses yielded 47565 contigs and 77547 singletons, for a total of 125112 unigenes. Alignment of the individual reads against the assembled contigs detected 143792 SNPs, an average of one SNP per 222 bp. Transitions (29273, 21%) were far less frequent than transversions (109775, 78.9%). In addition, there were 2415 complex SNPs (variations of more than two nucleotides). Alignment of C. dentata and C. pumila contigs using Model SNP of the CLC Genomics Workbench software detected a total of 267874 inter-SNPs. Nineteen contigs with possible species-specific markers were analyzed and two were preliminarily validated. A multiple alignment of the C. pumila and C. dentata contigs with 3714 Arabidopsis single-copy genes was conducted. Contigs from both species with a good match to a single-copy gene were selected and re-aligned. Ten possible species-specific marker sites were examined and two showed species-specificity. Additional species-specific markers could be obtained with this approach. Gene ontology analysis of the C. pumila assembly showed high similarity to transcriptomes of other Castanea species with known genome sequences in the NCBI database.||en_US