This Is Auburn Electronic Theses and Dissertations

Comparative genome analysis of related catfish species

Date

2014-12-12

Author

Zhang, Jiaren

Type of Degree

dissertation

Department

Fisheries and Allied Aquacultures

Abstract

Catfish belong to the order Siluriformes, one of the largest orders of teleost fish, with over 5,000 living species, accounting for approximately 15% of teleost fish species, or 7.8% of all vertebrates. Catfish are important commercial aquaculture species in many countries. In particular, channel catfish (Ictalurus punctatus) is the leading aquaculture species in the United States, accounting for over 60% of all U.S. aquaculture production and representing a two-billion-dollar industry. Genetic and breeding programs have been established to enhance the quality and quantity of catfish production: selective breeding and genetic enhancement approaches such as strain evaluation, selection, intraspecific crossbreeding and interspecific hybridization have been used to improve various catfish traits. Major genomic resources have been developed in catfish, including ESTs, RNA-Seq datasets, various molecular markers, genetic linkage maps, physical maps, and BAC end sequences. These resources supported the effort to generate a whole genome reference sequence and can be used in genetic enhancement programs and functional studies. Recently, the channel catfish genome was sequenced from a doubled haploid template at over 60X coverage (unpublished data), and a total of 29,704 genes were predicted and annotated. Accuracy, completeness, contiguity and connectivity are the four parameters most often used to evaluate whole genome assemblies. Accuracy refers to the percentage of correctly called bases. Completeness refers to the percentage of the genome represented in the assembly. Contiguity refers to the length of the assembled contigs. Connectivity refers to the scaffolding of contigs at the chromosome level.
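Contiguity is usually summarized by the N50 contig size, which this abstract reports for both armored catfish assemblies. As a minimal sketch of the standard definition (not the code used in this dissertation, which is not described here), the Python function below sorts contigs from longest to shortest and returns the length of the contig at which the running total first reaches half of the total assembled length. The function names and toy contig lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (bp).

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half the assembly size.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe form of running >= total / 2
            return length


def contiguity_summary(lengths):
    """Contig count, N50 and mean contig length (hypothetical helper)."""
    return {
        "contigs": len(lengths),
        "n50": n50(lengths),
        "mean": sum(lengths) / len(lengths),
    }


# Hypothetical toy assembly of five contigs (20,000 bp in total).
toy_contigs = [8000, 5000, 4000, 2000, 1000]
print(contiguity_summary(toy_contigs))
# {'contigs': 5, 'n50': 5000, 'mean': 4000.0}
```

Because N50 is driven by the longest contigs, it is normally reported together with the contig count and mean contig size, as in the assembly statistics of this study.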
Draft genome sequences of related catfish species, ictalurid catfish (channel catfish and blue catfish), common plecostomus (Hypostomus plecostomus) and striped Raphael catfish (Platydoras armatulus), were generated and compared to assess the completeness of the channel catfish genome sequence. The two selected armored catfish species, common plecostomus and striped Raphael catfish, bear scute-type scales, whereas channel catfish and blue catfish (Ictalurus furcatus) are both scaleless. Comparative genome analysis, particularly comparison of gene content, was therefore used to infer possible genes or gene pathways involved in scale formation. A total of 565.11 million and 326.33 million 100 bp paired-end reads were obtained on the Illumina HiSeq 2000 platform for common plecostomus and striped Raphael catfish, respectively. The genome sizes of the two armored catfish species were estimated at 1.1 Gb for common plecostomus and 0.8 Gb for striped Raphael catfish. The ABySS assembler was used to generate high-quality de novo assemblies. The common plecostomus assembly comprised 472,063 contigs, with an N50 contig size of 4,151 bp and an average contig size of 2,156 bp. Similarly, the striped Raphael catfish assembly comprised 381,982 contigs, with an N50 of 4,413 bp and an average contig size of 2,086 bp. After annotation, a total of 28,122 and 26,190 unique genes were identified for common plecostomus and striped Raphael catfish, respectively. Twelve fish genomes were used for comparative analysis of gene content: zebrafish (Danio rerio), Atlantic cod (Gadus morhua), medaka (Oryzias latipes), tilapia (Oreochromis niloticus), fugu (Takifugu rubripes), cave fish (Astyanax mexicanus), stickleback (Gasterosteus aculeatus), spotted gar (Lepisosteus oculatus), coelacanth (Latimeria chalumnae), platyfish (Xiphophorus maculatus), tetraodon (Tetraodon nigroviridis), and ictalurid catfish.
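The reported read counts, read length, genome size estimates and contig statistics can be combined into two rough sanity checks: the expected raw sequencing depth, C = N × L / G (the Lander-Waterman expectation for N reads of length L over a genome of size G), and the approximate fraction of the estimated genome spanned by contigs (contig count × average contig size / G). This is a back-of-the-envelope sketch under stated assumptions, not a calculation reported in the dissertation: it treats the reported counts as individual reads rather than read pairs, and because the average contig sizes are rounded, the spanned fractions are only approximate.

```python
READ_LENGTH_BP = 100  # 100 bp reads, as reported

# Figures taken from the abstract. ASSUMPTION: "reads" counts individual
# reads, not read pairs.
SPECIES = {
    "common plecostomus": {"reads": 565.11e6, "genome_bp": 1.1e9,
                           "contigs": 472_063, "mean_contig_bp": 2_156},
    "striped Raphael catfish": {"reads": 326.33e6, "genome_bp": 0.8e9,
                                "contigs": 381_982, "mean_contig_bp": 2_086},
}


def sequencing_depth(reads, read_length_bp, genome_bp):
    """Expected fold coverage C = N * L / G."""
    return reads * read_length_bp / genome_bp


def assembled_fraction(contigs, mean_contig_bp, genome_bp):
    """Approximate contig span (count x mean length) over estimated genome size.

    This is not a completeness measure on its own: redundant or chimeric
    contigs inflate the span without adding genome coverage.
    """
    return contigs * mean_contig_bp / genome_bp


for name, s in SPECIES.items():
    depth = sequencing_depth(s["reads"], READ_LENGTH_BP, s["genome_bp"])
    frac = assembled_fraction(s["contigs"], s["mean_contig_bp"], s["genome_bp"])
    print(f"{name}: ~{depth:.1f}X raw depth, contigs span ~{frac:.1%} of estimated genome size")
```

Under these assumptions the script reports about 51.4X and 40.8X raw depth, with contigs spanning roughly 92.5% and 99.6% of the estimated genome sizes. If the counts instead refer to read pairs, the depth estimates double.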
Genes from these species were compared with those of zebrafish, a species with a relatively complete genome sequence. The number of genes identified in the zebrafish genome but not in the genome of another species ranged from 780 (channel catfish) to 2,594 (fugu). Compared with the zebrafish genome, the ictalurid catfish genome 'missed' the smallest number of genes (987), followed by cave fish (1,537), spotted gar (1,559), platyfish (1,853) and tilapia (1,882). This comparison suggests that the ictalurid catfish genome sequence is nearly as complete as the zebrafish genome, and more complete than the other ten fish genomes. After filtering against all ictalurid catfish genomic resources and conducting a reverse BLAST search, a total of 167 genes were identified as armored catfish-specific genes. Of these, 93 were found in the common plecostomus genome assembly and 91 in the striped Raphael catfish genome assembly, with 17 genes shared by both. Functions could be inferred from sequence identity for 11 of these 167 genes. Conserved synteny analysis was then applied to determine whether any of the armored catfish-specific genes were truly absent from the ictalurid catfish genome. Of the 167 armored catfish-specific genes, 22 were found in highly conserved syntenic blocks, suggesting that they might be truly missing from the catfish genome. Of these 22 genes, interphotoreceptor matrix proteoglycan 1, osteocrin-like, and protein pitchfork were annotated as functional genes. The osteocrin gene has been reported to modulate osteoblastic differentiation, and the pitchfork protein is closely associated with the Sonic hedgehog (Shh) signaling pathway; these two genes may therefore play important roles in scale formation. In conclusion, the ictalurid catfish genome is more complete than those of the other fish species examined and close to the completeness of the zebrafish genome.
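The "missing gene" counts and the size of the armored catfish-specific gene set rest on simple set operations. In the minimal Python sketch below, the genes missing from a query genome are the reference (zebrafish) genes without a match in that genome, and the 167 armored catfish-specific genes follow from inclusion-exclusion over the two assemblies: 93 + 91 - 17 = 167. The gene identifiers are hypothetical, and the sketch assumes presence or absence has already been decided; in the study this relied on homology (BLAST) searches whose thresholds are not given in the abstract.

```python
def missing_from(reference_genes, query_genes):
    """Genes in the reference set (e.g. zebrafish) not detected in the query genome."""
    return set(reference_genes) - set(query_genes)


def union_size(n_a, n_b, n_shared):
    """Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|."""
    return n_a + n_b - n_shared


# Hypothetical gene identifiers, for illustration only.
zebrafish_genes = {"geneA", "geneB", "geneC", "geneD"}
query_genes = {"geneA", "geneC", "geneE"}
print(sorted(missing_from(zebrafish_genes, query_genes)))  # ['geneB', 'geneD']

# Counts from the abstract: plecostomus-specific, Raphael-specific, shared.
print(union_size(93, 91, 17))  # 167
```

The union count confirms that the per-assembly figures (93 and 91, with 17 shared) are consistent with the total of 167 armored catfish-specific genes.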
Through comparative genome analysis, a total of 167 armored catfish-specific genes were identified in this dissertation. Conserved synteny analyses indicated that 22 of these genes were present in zebrafish or other genomes but absent from the corresponding conserved syntenic blocks in channel catfish, providing stronger evidence that they might be truly missing from the catfish genome, although additional analyses are required to confirm this. This study is the first comparative genomics study among catfish species. It will supplement catfish genomic resources for researchers and provide a model of comparative genome analysis for evolutionary studies.
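The conserved synteny test can be illustrated with a deliberately simplified sketch: a candidate gene is treated as truly absent only when its immediate neighbors in the reference gene order are both present in the query assembly, on the same scaffold and (nearly) adjacent, so that a scaffold break or assembly gap is unlikely to explain the missing gene. The function name, the one-neighbor window and the `max_gap` threshold are hypothetical choices for illustration, not the criteria used in the dissertation.

```python
def absent_in_conserved_block(gene, ref_order, query_pos, max_gap=1):
    """Return True if `gene` is missing from the query assembly even though
    its two immediate neighbors in the reference order lie on the same query
    scaffold within `max_gap` positions of each other.

    ref_order: gene order along one reference chromosome (list of names).
    query_pos: mapping gene -> (scaffold, index) in the query assembly.
    """
    if gene in query_pos:
        return False  # gene was found, so it is not missing
    i = ref_order.index(gene)
    if i == 0 or i == len(ref_order) - 1:
        return False  # no neighbor on one side; cannot judge synteny
    left, right = ref_order[i - 1], ref_order[i + 1]
    if left not in query_pos or right not in query_pos:
        return False  # flanking genes missing too: block not conserved
    (scaf_l, pos_l), (scaf_r, pos_r) = query_pos[left], query_pos[right]
    return scaf_l == scaf_r and abs(pos_l - pos_r) <= max_gap


# Hypothetical example: reference order a-b-x-c-d; the query assembly has
# a, b, c, d adjacent on one scaffold but no copy of x.
ref_order = ["a", "b", "x", "c", "d"]
query_pos = {"a": ("scf1", 0), "b": ("scf1", 1), "c": ("scf1", 2), "d": ("scf1", 3)}
print(absent_in_conserved_block("x", ref_order, query_pos))  # True

# If b and c fall on different scaffolds, the absence could be an assembly break.
broken = dict(query_pos, c=("scf2", 0))
print(absent_in_conserved_block("x", ref_order, broken))  # False
```

Requiring intact flanking synteny is what separates "missing inside a conserved block" from "missing near a contig or scaffold end", which is why the 22 genes in conserved blocks are the stronger candidates for true absence.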