Applications in Plant Sciences (Jul 2025)

The Computer‐Assisted Sequence Annotation (CASA) workflow for enzyme discovery

  • Gemma R. Takahashi,
  • Franchesca M. Cumpio,
  • Carter T. Butts,
  • Rachel W. Martin

DOI
https://doi.org/10.1002/aps3.70009
Journal volume & issue
Vol. 13, no. 4

Abstract

Premise: With the advent of inexpensive nucleic acid sequencing and automated annotation at the level of basic functionality, the central problem of enzyme discovery is no longer finding active sequences but determining which ones are suitable for further study. This requires annotation that goes beyond sequence similarity to known enzymes and provides information at the sequence and structural levels.

Methods: Here we introduce a workflow for generating highly informative, richly annotated sequence alignments from protein sequence data. Computer‐Assisted Sequence Annotation (CASA) is a freely available Python‐based workflow designed to automate portions of novel protein characterization while producing a human‐interpretable final output.

Results: We demonstrate CASA using one enzyme from the Drosera capensis genome. The workflow generates detailed annotations that provide comparisons to known reference sequences. In addition to sequence similarity and predicted function, user‐specified features such as active site residues, disulfide bonds, and substrate‐binding residues can be displayed, and these can then be combined with downstream analyses to gain new insights into enzyme structure and function.

Discussion: This work demonstrates the utility of detailed annotations and protein structure prediction for choosing protein targets for biochemistry or structural biology from nucleic acid sequence data. The toolchain is freely available along with instructions and representative examples.

Keywords