Why does statistical machine translation quality degrade when translating legal texts with a system trained primarily on a corpus of news articles?

A) News corpora lack named entities

B) News articles utilize simpler morphology

C) Domain mismatch skews word co-occurrence ✓

D) Legal texts avoid common collocations

💡 Explanation

Statistical machine translation estimates its probabilities from word co-occurrence patterns in the training corpus. Legal and news text have different co-occurrence distributions, so a system trained mostly on news assigns misleading probabilities to legal terms and phrasing. Translation quality degrades because the statistical relationships learned from news do not carry over to legal language.
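The effect can be sketched with a toy lexical translation table. This is a minimal illustration, not a real SMT pipeline: the word-aligned English-French pairs and their counts are invented, and `lexical_translation_probs` is a hypothetical helper that estimates p(target | source) by relative frequency.

```python
from collections import Counter, defaultdict


def lexical_translation_probs(aligned_pairs):
    """Relative-frequency estimate of p(target | source) from word-aligned pairs."""
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        counts[src][tgt] += 1
    return {
        src: {tgt: c / sum(tgts.values()) for tgt, c in tgts.items()}
        for src, tgts in counts.items()
    }


# Invented toy counts (assumption): in news text "sentence" mostly aligns with
# "phrase" (a grammatical sentence); in legal text it mostly aligns with
# "peine" (a court-imposed penalty).
news_pairs = [("sentence", "phrase")] * 9 + [("sentence", "peine")] * 1
legal_pairs = [("sentence", "peine")] * 9 + [("sentence", "phrase")] * 1

news_model = lexical_translation_probs(news_pairs)
legal_model = lexical_translation_probs(legal_pairs)

# The most probable translation under each model.
best_news = max(news_model["sentence"], key=news_model["sentence"].get)
best_legal = max(legal_model["sentence"], key=legal_model["sentence"].get)

print("news-trained choice: ", best_news, news_model["sentence"])
print("legal-trained choice:", best_legal, legal_model["sentence"])
```

Under these assumed counts, the news-trained table prefers "phrase" for "sentence", which is the wrong sense in most legal contexts. The same kind of skew affects phrase tables and n-gram language models, which are also estimated from corpus counts.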
