GSoC/GCI Archive
Google Code-in 2012 Apertium

Investigate how to extract parallel text from Armenian-Russian-English website

completed by: Ulysses

mentors: Francis Tyers, Jonathan

This site appears to have many articles in Armenian, English and Russian:

http://www.aravot.am/

The task is to investigate the multilingual structure of the site and recommend how best to identify, download and align the articles.

For example, consider these three URLs, which differ only in the language prefix of the path:

http://www.aravot.am/2012/12/04/136979/

http://www.aravot.am/ru/2012/12/04/136979/

http://www.aravot.am/en/2012/12/04/136979/
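
One possible starting point for the identification step is sketched below. It assumes, based only on the three example URLs above, that an article path has the form /YYYY/MM/DD/ID/, that Armenian pages carry no prefix, and that the same ID under /ru/ and /en/ points to a translation of the same article. None of this has been verified against the site as a whole; the function name and the language codes are hypothetical and chosen for illustration.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from language code to path prefix, inferred only
# from the three example URLs (the Armenian page has no prefix).
LANG_PREFIXES = {"hy": "", "ru": "/ru", "en": "/en"}

# An optional /ru or /en prefix followed by /YYYY/MM/DD/ID/ (assumed layout).
ARTICLE_PATH = re.compile(r"^(?:/(?:ru|en))?(/\d{4}/\d{2}/\d{2}/\d+/?)$")


def language_variants(url):
    """Return the candidate URL for each language, given any one article URL."""
    parts = urlsplit(url)
    match = ARTICLE_PATH.match(parts.path)
    if match is None:
        raise ValueError(f"not an article URL: {url}")
    article_path = match.group(1)  # path with the language prefix removed
    return {
        lang: urlunsplit((parts.scheme, parts.netloc, prefix + article_path, "", ""))
        for lang, prefix in LANG_PREFIXES.items()
    }


if __name__ == "__main__":
    for lang, variant in language_variants(
        "http://www.aravot.am/en/2012/12/04/136979/"
    ).items():
        print(lang, variant)
```

The sketch only builds candidate URLs and makes no network requests. Whether each candidate exists, and whether the pages really are translations of one another, would still have to be checked when downloading.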

 

 
