Language Identification for Multilingual Machine Translation
Abstract
Machine translation is the process of
translating text from one natural language into another
using a computer system. Translating a document written
in a single source language is comparatively
straightforward, but when the source document mixes
several languages, the languages involved must first be
identified. Language identification is the
task in natural language processing that automatically
identifies the natural language in which the content of
a given document is written. Language identification
is a fundamental and crucial step in many NLP
applications. In this paper, n-gram-based and
machine-learning-based language identifiers are trained
and used to identify three Indian languages, Hindi,
Marathi and Tamil, present in a document given for
machine translation. Including a language
identification component in the machine translation
pipeline improved the quality of translation. Google
Translate is then used to translate text in each
identified language into English.
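
As an illustration of the n-gram-based approach mentioned above, the following is a minimal sketch of a character n-gram identifier that compares ranked n-gram profiles using an out-of-place distance. The abstract does not specify which n-gram method was used, so this particular technique is an assumption. The function names (ngram_profile, out_of_place, identify), the parameters n_max and top_k, and the three one-sentence training samples are hypothetical and chosen only for illustration; a usable identifier would need far larger training corpora, especially to separate Hindi from Marathi, which share the Devanagari script.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Rank character n-grams (lengths 1..n_max) by descending frequency."""
    counts = Counter()
    for token in text.split():
        padded = f" {token} "  # spaces mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by frequency, then by the n-gram itself so ties are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    return {gram: rank for rank, gram in enumerate(ranked[:top_k])}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams unseen in the language get a fixed penalty."""
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def identify(text, profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Toy training data (hypothetical, one sentence per language).
SAMPLES = {
    "Hindi": "यह एक किताब है और मैं इसे पढ़ रहा हूँ",
    "Marathi": "हे एक पुस्तक आहे आणि मी ते वाचत आहे",
    "Tamil": "இது ஒரு புத்தகம் நான் அதை படிக்கிறேன்",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}

if __name__ == "__main__":
    for lang, text in SAMPLES.items():
        print(lang, "->", identify(text, PROFILES))
```

In a multilingual document, a function such as identify could be applied sentence by sentence, so that each segment is routed to the translation step with the correct source language.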
