This Is Auburn: Electronic Theses and Dissertations

Assembly and Annotation of the Channel Catfish Transcriptome and Assessment of Pervasive Expression




Jiang, Chen

Type of Degree



Fisheries and Allied Aquacultures


For a long time, it was widely believed that only a small fraction of the genome is transcribed, and researchers focused mainly on the expression of protein-coding genes. In the last few years, mounting evidence has made pervasive transcription apparent, and increasing attention has therefore been given to the entire transcriptome, including both coding and non-coding regions. However, the ability to assess the extent of pervasive transcription and expression depends on continuous and thorough analysis of the transcriptome over space, time, and various conditions. Catfish is the primary aquaculture species in the United States, yet its transcriptome has not been well characterized. In recent years, a number of RNA-Seq studies have been conducted, most of which focused on expression profiling under specific stress conditions, such as after disease infection, under high temperature exposure, or under hypoxic conditions. Together, these studies have made available a large number of RNA-Seq reads, totaling approximately six billion. These datasets make it possible to assemble a reference transcriptome for channel catfish. At the same time, these RNA-Seq studies were conducted using various tissues, allowing transcriptome-level analysis of expression profiles among tissues. Likewise, systematic analysis of stress-induced expression is possible using the RNA-Seq datasets obtained after various stress treatments.
The objectives of this study were to 1) assemble a reference transcriptome using all channel catfish RNA-Seq datasets; 2) annotate the protein-coding genes from the transcriptome; 3) identify a set of full-length transcripts from the transcriptome; 4) analyze expression patterns of protein-coding genes along the channel catfish genome; 5) identify long non-coding RNAs (lncRNAs) and determine their expression patterns; 6) assess the extent of pervasive transcription in channel catfish; and 7) identify correlated expression of protein-coding genes and lncRNAs. The reference transcriptome was assembled using both de novo and genome-guided assembly approaches. A total of 27,448 protein-coding genes were identified, of which 25,489 were homologous to known genes in other species and 1,959 were unknown genes. Of the 27,448 protein-coding genes, 800 were not included in the catfish genome. Full-length transcripts were reconstructed for 20,371 of the protein-coding genes. In addition to the protein-coding genes, a total of 36,266 lncRNAs were assembled and identified. By mapping all the short reads, coding or non-coding, to the catfish genome, 79.7% of the channel catfish genome was found to be transcribed. Mapping the short reads to the genome also allowed analysis of tissue-specific and stress-induced protein-coding genes and lncRNAs. A total of 1,455 genes and 2,599 lncRNAs were expressed in a tissue-specific manner, while 8,560 genes and 748 lncRNAs were differentially expressed after stress treatments such as disease infections, high temperature and short-term deprivation. Notably, 45 sets of co-induced, co-localized genes and lncRNAs were identified in this study, suggesting coordinated regulation of the protein-coding genes and lncRNAs. My dissertation work accomplished these objectives.
The reference transcriptome will be a valuable resource for functional research and digital gene expression analysis in catfish. The full-length transcripts will further assist in improving genome annotation and in conducting in-depth phylogenetic or structural analyses of orthologs. The identified set of tissue-specific genes and lncRNAs enables a greater understanding of organismal development and complexity at the system level. The identification of the lncRNAs, followed by the initial characterization of their expression profiles alongside those of the protein-coding genes, could contribute to the future understanding of the function and mechanisms of lncRNAs.
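
As an illustration of how a figure such as the 79.7% transcribed fraction can be derived from mapped reads, the following is a minimal sketch, not the procedure used in the dissertation. It assumes read alignments have already been reduced to half-open genomic intervals per chromosome; the function names, the interval representation, and the toy data are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: an alignment is a half-open interval [start, end) in base pairs.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def transcribed_fraction(
    alignments: Dict[str, List[Interval]],
    chrom_lengths: Dict[str, int],
) -> float:
    """Fraction of genome bases covered by at least one aligned read."""
    covered = 0
    for intervals in alignments.values():
        covered += sum(end - start for start, end in merge_intervals(intervals))
    return covered / sum(chrom_lengths.values())


# Toy data (hypothetical): two small "chromosomes" and a handful of alignments.
chrom_lengths = {"chr1": 100, "chr2": 50}
alignments = {
    "chr1": [(0, 30), (20, 50), (70, 80)],
    "chr2": [(10, 25)],
}
fraction = transcribed_fraction(alignments, chrom_lengths)
print(f"{fraction:.1%} of the toy genome is covered by reads")
```

Merging intervals before summing matters because reads from different tissues and treatments overlap heavily; summing raw alignment lengths would overstate the covered fraction.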