Tagger

9/18/2023

Depending on the language and the NLP task, different tagsets can be applied. For English, a Universal Tagset is sometimes used (e.g. in NLTK). Some tagsets distinguish a large number of different tags, others only a few. How fine-grained a tagset is depends on two factors:

The NLP task: for some tasks a fine-grained differentiation is not required.

The language: If a language is quite irregular, it does not make sense to distinguish PoS in a fine-grained manner, because a tagger would then have to model all these irregular cases, which may be too complex. For example, in German there is no unique rule for differentiating noun-singular from noun-plural. Therefore the Stuttgart-Tübingen-Tagset does not distinguish these two noun categories. In English, however, there is such a rule (append -s), which is applicable in nearly all cases. Therefore English tagsets differentiate these two cases.

During training, the Unigram-Tagger determines for each word in the corpus which PoS-Tag is associated most often with that word in the training corpus. The result of the training is a table of two columns: the first column is a word and the second is the most frequent PoS of this word.

Tagging: The learned mapping is the two-column table of word and associated most frequent PoS. This table can be applied to tag each word with its PoS.

Properties: A unigram tagger is simple to learn and to apply. However, it suffers from the drawback that only the word itself, but not its context, is used to determine the tag. Consequently a word is always tagged with the same PoS, independent of its context. Unigram tagging is therefore wrong whenever the PoS that applies to a word in its context is not the PoS that appeared most often with this word in the training corpus.

An N-Gram-Tagger extends this mapping by the PoS-Tags of the \(N-1\) preceding words:

\[
(PoS(word_{i-N+1}), \ldots, PoS(word_{i-1}), word_i) \rightarrow PoS(word_i), \quad \forall word_i \in V
\]

Figure: A 3-Gram-Tagger determines the PoS-Tag of the current word by taking into account the current word and the PoS-Tags of the 2 preceding words.

Training an N-Gram-Tagger: As for the Unigram-Tagger, a large PoS-tagged corpus is required. During training, the N-Gram-Tagger determines for each combination of word plus \(N-1\) preceding PoS-Tags in the corpus which PoS-Tag is associated most often with the word.
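The training and tagging steps for unigram and N-gram taggers can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the function names `train_ngram_tagger` and `tag`, the three-sentence toy corpus and its tags are made up for this example and are not taken from NLTK or any other library. With `n=1` the learned table reduces to the unigram mapping from a word to its most frequent PoS.

```python
from collections import Counter, defaultdict


def train_ngram_tagger(tagged_sentences, n=1):
    """Learn a table: (tags of the n-1 preceding words, word) -> most frequent tag.

    Hypothetical helper for illustration. With n=1 the history is always
    empty, so the table is the unigram table word -> most frequent PoS.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        tags = [t for _, t in sentence]
        for i, (word, t) in enumerate(sentence):
            # Up to n-1 preceding tags (fewer at the start of a sentence).
            history = tuple(tags[max(0, i - n + 1):i])
            counts[(history, word)][t] += 1
    # Keep only the most frequent tag per context.
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def tag(words, table, n=1, default=None):
    """Tag words left to right by looking up each context in the table.

    Contexts never seen in training receive the default tag.
    """
    tags = []
    for word in words:
        start = max(0, len(tags) - (n - 1))
        history = tuple(tags[start:])
        tags.append(table.get((history, word), default))
    return list(zip(words, tags))


# Toy corpus (assumed data): "book" occurs twice as NOUN and once as VERB.
corpus = [
    [("the", "DET"), ("book", "NOUN")],
    [("a", "DET"), ("book", "NOUN")],
    [("they", "PRON"), ("book", "VERB")],
]

unigram_table = train_ngram_tagger(corpus, n=1)
bigram_table = train_ngram_tagger(corpus, n=2)

# The unigram tagger always assigns the most frequent tag, ignoring context.
print(tag(["they", "book"], unigram_table, n=1))
# [('they', 'PRON'), ('book', 'NOUN')]

# The 2-gram tagger also uses the preceding tag and gets the verb reading.
print(tag(["they", "book"], bigram_table, n=2))
# [('they', 'PRON'), ('book', 'VERB')]
```

In this sketch a context that never occurred in training simply gets the default tag. Larger values of `n` make such unseen contexts more frequent, which is one reason a large PoS-tagged corpus is needed.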
