This Is Auburn Electronic Theses and Dissertations

Development of the Catfish 250K SNP Array for Genome-Wide Association Studies




Liu, Shikai

Type of Degree



Fisheries and Allied Aquacultures


Catfish is the primary aquaculture species in the United States. In recent years, the catfish industry has encountered unprecedented challenges, including increased feed costs, devastating diseases, and fierce international competition. Traditional selective breeding has been conducted to develop catfish breeds with improved traits such as fast growth, high feed efficiency, and high levels of disease resistance. Genomic selection, which uses genome-wide markers to assist selective breeding, holds promise for increasing selection intensity and accuracy. However, its implementation requires genome-scale genetic markers, which has been a major limitation for most farmed animals, including aquaculture species. In recent years, next-generation sequencing technologies have enabled efficient and cost-effective identification of genome-scale genetic markers such as single nucleotide polymorphisms (SNPs). With a large number of SNPs available, the challenge becomes how to genotype them efficiently and economically. One of the most efficient approaches is to design a high-density array that includes hundreds of thousands of SNPs covering the entire genome. Toward genomic selection in catfish, my research, as presented here, encompasses both of these advances: the generation of genome-scale SNPs and the development of a high-density SNP array in catfish. Using the RNA-Seq approach, over two million gene-associated SNPs were identified from channel catfish and blue catfish, two of the most important catfish species, providing a large pool of SNP resources for designing the SNP array. Criteria-based filtering resulted in hundreds of thousands of quality SNPs that are intra-specific in channel catfish, intra-specific in blue catfish, or inter-specific between the two species. This is the first application of next-generation sequencing technology in catfish for genome-wide SNP identification.
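As a rough illustration of the three SNP categories named above, the following minimal Python sketch labels a site from the alleles observed in each species. The function name `classify_snp`, the set-based input, and the exact category rules are assumptions made for illustration; the abstract does not specify the filtering criteria actually used.

```python
from typing import Set


def classify_snp(channel_alleles: Set[str], blue_alleles: Set[str]) -> str:
    """Label a site from the alleles seen in each species (hypothetical rules)."""
    # A site with no observed alleles in one species cannot be compared.
    if not channel_alleles or not blue_alleles:
        return "unclassified"

    channel_polymorphic = len(channel_alleles) > 1
    blue_polymorphic = len(blue_alleles) > 1

    if channel_polymorphic and blue_polymorphic:
        return "intra-specific in both species"
    if channel_polymorphic:
        return "intra-specific in channel catfish"
    if blue_polymorphic:
        return "intra-specific in blue catfish"
    # Each species is fixed for one allele; they differ only between species.
    if channel_alleles != blue_alleles:
        return "inter-specific"
    return "monomorphic"
```

Under these assumed rules, a site that is fixed for different alleles in the two species is the only case labeled inter-specific; a real pipeline would also apply read-depth and quality thresholds before classification.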
With such a large number of SNPs, it is important to select SNPs that represent each gene, because SNPs within the same gene are tightly linked. In addition, pseudo-SNPs can be detected due to misalignment of paralogous sequences from duplicated genes. To generate reference gene transcript sequences for SNP selection and to detect potential pseudo-SNPs derived from duplicated genes, catfish transcriptome assembly and annotation were conducted using RNA-Seq of a doubled haploid channel catfish, which harbors two identical sets of chromosomes and should therefore show no sequence variation. A comprehensive set of catfish transcript sequences was obtained, including over 14,000 full-length transcripts. A set of genes putatively duplicated in the catfish genome was identified, which aided the detection of pseudo-SNPs. With these resources, the catfish 250K SNP array was developed using the state-of-the-art Affymetrix Axiom technology and includes over 250,000 SNPs. This is the first high-density SNP array developed for catfish, and it should be valuable for both the catfish industry and research, for example in genomic selection, genome-wide association studies, fine linkage mapping, and haplotype analysis.
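The doubled haploid reasoning can be sketched in code: because a doubled haploid individual should show no variation, a site where its reads support two alleles is a candidate pseudo-SNP arising from paralog misalignment. The Python sketch below is a minimal illustration only; the function name `is_putative_pseudo_snp` and both thresholds are hypothetical and are not taken from the thesis.

```python
from typing import Dict


def is_putative_pseudo_snp(
    allele_counts: Dict[str, int],
    min_minor_reads: int = 2,         # hypothetical threshold
    min_minor_fraction: float = 0.1,  # hypothetical threshold
) -> bool:
    """Flag a site in doubled haploid reads that appears to carry two alleles."""
    total = sum(allele_counts.values())
    if total == 0:
        return False

    counts = sorted(allele_counts.values(), reverse=True)
    if len(counts) < 2:
        # Only one allele observed, as expected for a doubled haploid.
        return False

    # Read support for the second most common allele.
    minor = counts[1]
    return minor >= min_minor_reads and minor / total >= min_minor_fraction
```

For example, `is_putative_pseudo_snp({"A": 20, "G": 15})` flags the site, whereas a single stray read, as in `{"A": 40, "G": 1}`, is treated as likely sequencing error and not flagged.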