The Computer‐Assisted Sequence Annotation (CASA) workflow for enzyme discovery

Gemma R. Takahashi; Franchesca M. Cumpio; Carter T. Butts; Rachel W. Martin

doi:10.1002/aps3.70009

Applications in Plant Sciences (Jul 2025)

Affiliations

Gemma R. Takahashi: Department of Molecular Biology and Biochemistry, University of California, Irvine, 92697‐3900, California, USA
Franchesca M. Cumpio: Department of Molecular Biology and Biochemistry, University of California, Irvine, 92697‐3900, California, USA
Carter T. Butts: Departments of Sociology, Statistics, Computer Science, and Electrical Engineering and Computer Science, University of California, Irvine, 92697, California, USA
Rachel W. Martin: Department of Molecular Biology and Biochemistry, University of California, Irvine, 92697‐3900, California, USA

DOI: https://doi.org/10.1002/aps3.70009
Journal volume & issue: Vol. 13, no. 4

Abstract

Premise: With the advent of inexpensive nucleic acid sequencing and automated annotation at the level of basic functionality, the central problem of enzyme discovery is no longer finding active sequences; it is determining which ones are suitable for further study. This requires annotation that goes beyond sequence similarity to known enzymes and provides information at the sequence and structural levels.

Methods: Here we introduce a workflow for generating highly informative, richly annotated sequence alignments from protein sequence data. Computer‐Assisted Sequence Annotation (CASA) is a freely available Python‐based workflow designed to automate portions of novel protein characterization while producing a human‐interpretable final output.

Results: We demonstrate CASA using one enzyme from the Drosera capensis genome. The workflow generates detailed annotations providing comparisons to known reference sequences. In addition to sequence similarity and predicted function, user‐specified features such as active site residues, disulfide bonds, and substrate‐binding residues can be displayed, and these can then be combined with downstream analyses to gain new insights into enzyme structure and function.

Discussion: This work demonstrates the utility of detailed annotations and protein structure prediction for choosing protein targets for biochemistry or structural biology from nucleic acid sequence data. The toolchain is freely available along with instructions and representative examples.

Published in Applications in Plant Sciences

ISSN: 2168-0450 (Online)
Publisher: Wiley
Country of publisher: United States
LCC subjects: Science: Biology (General); Science: Botany
Website: https://bsapubs.onlinelibrary.wiley.com/journal/21680450
