BMC Bioinformatics (May 2005)

BioCreAtIvE Task1A: entity identification with a stochastic tagger

  • Kinoshita Shuhei,
  • Cohen K Bretonnel,
  • Ogren Philip V,
  • Hunter Lawrence

DOI
https://doi.org/10.1186/1471-2105-6-S1-S4
Journal volume & issue
Vol. 6, no. Suppl 1
p. S4

Abstract


Background: Our approach to Task 1A was inspired by Tanabe and Wilbur's ABGene system [1, 2]. Like Tanabe and Wilbur, we approached the problem as one of part-of-speech tagging, adding a GENE tag to the standard tag set. Where their system uses the Brill tagger, we used TnT, the Trigrams 'n' Tags HMM-based part-of-speech tagger [3]. Based on careful error analysis, we implemented a set of post-processing rules to correct both false positives and false negatives. We participated in both the open and the closed divisions; for the open division, we made use of data from NCBI.

Results: Our base system without post-processing achieved a precision and recall of 68.0% and 77.2%, respectively, giving an F-measure of 72.3%. The full system with post-processing achieved a precision and recall of 80.3% and 80.5%, respectively, giving an F-measure of 80.4%. For the open division, adding a dictionary-based post-processing step gave a slight further improvement (F-measure = 80.9%). We placed third in both the open and the closed divisions.

Conclusion: Our results show that a part-of-speech tagger can be augmented with post-processing rules, resulting in an entity identification system that competes well with other approaches.
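
The F-measures reported in the abstract are consistent with the balanced harmonic mean of precision and recall. The following is a minimal sketch, assuming the balanced (beta = 1) form of the F-measure; the function name `f_measure` and its `beta` parameter are illustrative and are not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    With beta == 1 this is the balanced F-measure (F1), which is the
    assumption made here; the abstract does not state the weighting.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision/recall pairs as reported in the abstract.
base = f_measure(0.680, 0.772)   # base system, no post-processing
full = f_measure(0.803, 0.805)   # full system with post-processing

print(f"base: {base:.1%}, full: {full:.1%}")
```

Under that assumption, the two computed values round to the 72.3% and 80.4% given in the abstract. The open-division figure (80.9%) cannot be checked the same way because its precision and recall are not reported here.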