In computational lexicography, why does parsing a corpus to automatically extract candidate headwords often yield inaccurate frequency counts for morphologically complex languages such as Turkish?

A) Ambiguous part-of-speech tagging is resolved randomly.

B) Rare word senses bias overall frequencies.

C) Suffix stripping distorts semantic analysis.

D) Agglutination creates spurious independent entries. ✓

💡 Explanation

Turkish is agglutinative: a single root can surface in many inflected and derived forms (for example, ev, evler, evlerde). A frequency count over raw tokens, with no morphological normalization, treats each surface form as a separate lexical item. The candidate list then fills with spurious independent entries, and the frequency of the underlying root is split across them and underestimated. The core problem is this missing normalization, not semantic analysis.
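The effect can be illustrated with a minimal Python sketch. The token list is a made-up toy corpus, and `ROOT_OF` is a hypothetical hand-written lookup that stands in for a real morphological analyzer; neither comes from an actual lexicographic pipeline.

```python
from collections import Counter

# Toy corpus (assumption): inflected forms of "ev" (house) and "kitap" (book).
tokens = ["ev", "evler", "evde", "evlerde", "evin",
          "kitap", "kitaplar", "kitapta"]

# Hypothetical lookup table standing in for a real morphological analyzer.
ROOT_OF = {
    "ev": "ev", "evler": "ev", "evde": "ev", "evlerde": "ev", "evin": "ev",
    "kitap": "kitap", "kitaplar": "kitap", "kitapta": "kitap",
}

# Naive count: every surface form becomes its own candidate headword.
surface_counts = Counter(tokens)

# Normalized count: each token is mapped to its root before counting.
root_counts = Counter(ROOT_OF.get(token, token) for token in tokens)

print(len(surface_counts), "surface-form entries:", dict(surface_counts))
print(len(root_counts), "root entries:", dict(root_counts))
```

In this toy data the naive count produces eight separate entries with a frequency of 1 each, while the normalized count gives two roots with frequencies 5 and 3. In practice the lookup would be replaced by a proper analyzer, since real Turkish suffixation involves vowel harmony and consonant alternations that a fixed table cannot cover.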

