This Is Auburn Electronic Theses and Dissertations

Understanding Teleost Genome Structure and Organization: Alternative Splicing, Gene Duplication, and Whole Genome Assembly

Date

2012-06-19

Author

Lu, Jianguo

Type of Degree

dissertation

Department

Fisheries and Allied Aquacultures

Abstract

We conducted both same-species and cross-species analyses using the Genome Mapping and Alignment Program (GMAP) and an alternative splicing (AS) pipeline (ASpipe) to study AS in four genome-enabled species (Danio rerio, Oryzias latipes, Gasterosteus aculeatus, Takifugu rubripes) and in one species lacking a complete genome sequence, Ictalurus punctatus. AS frequency was lowest in the highly duplicated genome of zebrafish (17% of mapped genes), whereas the compact genome of the pufferfish showed the highest occurrence of AS (43% of mapped genes). An inverse correlation between AS frequency and genome size was consistent across all analyzed species. Approximately 50% of the AS genes identified by same-species comparisons were shared among two or more species.

We analyzed gene duplication patterns and duplication types among the available teleost genomes and found that a large number of genes were tandemly and intrachromosomally duplicated, suggesting that they arose through independent and continuous duplication. This is particularly true for the zebrafish genome. Further analysis of the duplicated gene sets indicated that a significant portion of the duplicated genes in the zebrafish genome arose from recent, lineage-specific duplication events. Most strikingly, genes involved in immune or sensory response pathways were enriched among the recently duplicated genes.

Because of rapid reductions in the cost and improvements in the quality of sequencing data, de novo sequencing and assembly is now possible not only in large sequencing centers but also in small labs. This project addressed MPI-Velvet, a Message Passing Interface (MPI) version of the Velvet assembler software. It can process high-coverage data sets and quickly reconstruct the underlying sequences.

The catfish genome database, cBARBEL (abbreviated from catfish Breeder And Researcher Bioinformatics Entry Location), is an online open-access database for the genome biology of ictalurid catfish (Ictalurus spp.).
It serves as a comprehensive, integrative platform for all aspects of catfish genetics, genomics, and related data resources. cBARBEL provides BLAST-based, fuzzy, and specific search functions; visualization of catfish linkage, physical, and integrated maps; a catfish EST contig viewer with SNP information overlay; and GBrowse-based organization of catfish genomic data according to sequence similarity with zebrafish chromosomes.
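The distinction between tandem and intrachromosomal duplicates drawn in the abstract depends only on where the two copies of a duplicated pair sit in the genome. The following Python sketch is a hypothetical illustration, not the classification procedure used in this dissertation. It assumes each gene is described by a chromosome name and its ordinal position along that chromosome. It then treats same-chromosome copies within an assumed `max_distance` of gene positions as tandem, other same-chromosome copies as intrachromosomal, and copies on different chromosomes as interchromosomal. The `Gene` type, the function names, the default cutoff, and the toy gene records are all illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: a name, a chromosome, and an ordinal position."""
    name: str
    chrom: str
    index: int  # position of the gene in chromosome order (1st, 2nd, ...)


def duplication_type(a: Gene, b: Gene, max_distance: int = 1) -> str:
    """Classify one duplicated pair by where its two copies lie.

    max_distance is an assumed cutoff: same-chromosome copies at most this
    many gene positions apart are called tandem (1 = adjacent genes only).
    """
    if a.chrom != b.chrom:
        return "interchromosomal"
    if abs(a.index - b.index) <= max_distance:
        return "tandem"
    return "intrachromosomal"


def tally(pairs, max_distance: int = 1) -> Counter:
    """Count duplication types over an iterable of (Gene, Gene) pairs."""
    return Counter(duplication_type(a, b, max_distance) for a, b in pairs)


# Toy pairs with made-up names and positions, for illustration only.
pairs = [
    (Gene("g1", "chr1", 10), Gene("g2", "chr1", 11)),
    (Gene("g3", "chr1", 10), Gene("g4", "chr1", 40)),
    (Gene("g5", "chr1", 5), Gene("g6", "chr7", 5)),
]
print(tally(pairs))
```

Because tandem duplicates also lie on a single chromosome, this sketch reports them as a separate category rather than as a subset of intrachromosomal duplicates. Counting them together only requires merging the two labels.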
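Velvet reconstructs sequences from a de Bruijn graph built from the k-mers in the reads. One common way to spread that work across MPI processes is to give every k-mer a single owning process chosen by a hash of the k-mer. The sketch below simulates that idea in plain Python, without MPI. It is an assumption-laden illustration, not a description of how MPI-Velvet actually distributes its data. The hash-based ownership rule, the use of `zlib.crc32`, the function names, and the formulation in which each k-mer is an edge between its (k-1)-mer prefix and suffix are all choices made here, and reverse complements are ignored for brevity.

```python
import zlib
from collections import defaultdict


def kmers(read: str, k: int):
    """Yield every overlapping k-mer of a read (none if the read is shorter than k)."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def owner_rank(kmer: str, n_ranks: int) -> int:
    """Deterministically assign a k-mer to one of n_ranks processes.

    crc32 is used instead of Python's built-in hash(), which is salted per run
    for strings and would give different owners on different processes.
    """
    return zlib.crc32(kmer.encode("ascii")) % n_ranks


def partition_de_bruijn(reads, k: int, n_ranks: int):
    """Build one de Bruijn edge table per simulated rank.

    Each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer
    suffix; the edge and its occurrence count are stored only on the rank
    that owns the k-mer, so no edge appears in more than one table.
    """
    tables = [defaultdict(int) for _ in range(n_ranks)]
    for read in reads:
        for km in kmers(read, k):
            tables[owner_rank(km, n_ranks)][(km[:-1], km[1:])] += 1
    return tables


# Two toy reads, k = 4, split across two simulated ranks.
for rank, table in enumerate(partition_de_bruijn(["ACGTAC", "CGTACG"], k=4, n_ranks=2)):
    print(rank, dict(table))
```

Here `n_ranks` stands in for the number of MPI processes. In a real distributed implementation, each process would build only its own table, and k-mers parsed from reads on one process would be sent to their owners, for example with an all-to-all collective such as `MPI_Alltoallv`.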