E3S Web of Conferences (Jan 2024)
Bayesian Belief Networks to handle NLP problems
Abstract
In corpus linguistics, part-of-speech tagging (POS tagging or PoS tagging or POST), also called grammatical tagging, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context. Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms which associate discrete terms, as well as hidden parts of speech, with a set of descriptive tags. POS-tagging algorithms fall into two distinct groups: rule-based and stochastic. Whereas rule-based algorithms are extremely complicated and expensive because they must take a large number of rules into account, stochastic algorithms appear more appropriate. POS tagging is the first step toward named entity tagging, which is important for understanding the semantics of text. Recently, many deep learning models for POS tagging have emerged. Most of them are based on supervised learning and require considerable processing power and time to obtain weights that yield correct results on new data. Is it possible to use another probabilistic model for these purposes without training and on small data? We believe Bayesian Belief Networks could be such a model.
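For context only, the snippet below is a minimal sketch of the POS tagging task described above, using NLTK's off-the-shelf stochastic (averaged perceptron) tagger as a conventional baseline; the sentence and setup are illustrative assumptions and do not reflect the Bayesian Belief Network approach proposed in this paper.

```python
# Illustrative baseline: a conventional stochastic POS tagger from NLTK.
# This is NOT the Bayesian Belief Network method of the paper; it only
# demonstrates the input/output of the POS tagging task.
import nltk

# One-time model downloads (assumption: network access is available).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "Bayesian networks can tag words without large training corpora."
tokens = nltk.word_tokenize(sentence)   # split the sentence into word tokens
tags = nltk.pos_tag(tokens)             # assign a Penn Treebank tag to each token

print(tags)
# e.g. [('Bayesian', 'JJ'), ('networks', 'NNS'), ('can', 'MD'), ('tag', 'VB'), ...]
```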
Keywords