Genomics & Informatics (Jun 2020)

Improving accessibility and distinction between negative results in biomedical relation extraction

  • Diana Sousa,
  • Andre Lamurias,
  • Francisco M. Couto

DOI
https://doi.org/10.5808/GI.2020.18.2.e20
Journal volume & issue
Vol. 18, no. 2
p. e20

Abstract

Accessible negative results are relevant for researchers and clinicians, not only to limit their search space but also to prevent the costly re-exploration of research hypotheses. However, most biomedical relation extraction datasets do not distinguish between a false and a negative relation between two biomedical entities. Furthermore, datasets created using distant supervision techniques also contain some false negative relations that constitute undocumented or unknown relations (missing from a knowledge base). We propose to improve the distinction between these concepts by revising a subset of the relations marked as false in the phenotype-gene relations corpus, and we take the first steps toward automatically distinguishing between false (F), negative (N), and unknown (U) results. Our work resulted in a sample of 127 manually annotated FNU relations and a weighted-F1 of 0.5609 for their automatic distinction. This work was developed during the 6th Biomedical Linked Annotation Hackathon (BLAH6).
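
For readers unfamiliar with the metric, a weighted-F1 over the three classes is conventionally the mean of the per-class F1 scores, each weighted by the number of gold examples in that class. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `weighted_f1` and the toy labels are assumptions made for illustration and are not drawn from the corpus.

```python
from collections import Counter

LABELS = ("F", "N", "U")  # false, negative, unknown


def weighted_f1(y_true, y_pred, labels=LABELS):
    """Return the support-weighted mean of the per-class F1 scores."""
    support = Counter(y_true)  # number of gold examples per class
    total = len(y_true)
    score = 0.0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * support[label] / total
    return score


# Toy labels, invented for illustration only (not taken from the corpus).
gold = ["F", "F", "F", "N", "N", "U"]
pred = ["F", "F", "N", "N", "U", "U"]
print(round(weighted_f1(gold, pred), 4))  # prints 0.6778
```

Because each class is weighted by its support, the majority class dominates the score, which matters when F, N, and U relations are unevenly represented in a small sample.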

Keywords