A linguist analyzes a large corpus of historical medical texts. Which failure mode most likely corrupts the accuracy of frequency-based analysis if OCR misinterprets similar-looking archaic characters?

A)Syntax errors increase exponentially

B)Semantic drift amplifies noise

C)Type-token ratio skews significantly✓

D)Part-of-speech tagging becomes inconsistent

💡 Explanation

OCR errors lead to incorrect identification of word tokens. Because the type-token ratio measures vocabulary richness, systematic misinterpretation of specific characters skews this ratio significantly; therefore, this ratio changes artificially, rather than syntax or semantic errors directly altering results.

🏆 Up to £1,000 monthly prize pool

Ready for the live challenge? Join the next global round now.
*Terms apply. Skill-based competition.

⚡ Enter Arena

A linguist analyzes a large corpus of historical medical texts. Which failure mode most likely corrupts the accuracy of frequency-based analysis if OCR misinterprets similar-looking archaic characters?

💡 Explanation

Related Questions