
Thread: My "Moses Starter Kit": Easy Machine Translation Implementation

 
  1. #1
    Forums BodyGuard AriSanguinetti
    Join Date
    Sep 2006
    Age
    30
    Posts
    31
    Rep Power
    100

    My "Moses Starter Kit": Easy Machine Translation Implementation

    Hi!

    Since I'm doing some research on Machine Translation, I would like to share with all of you what I call a "Moses Starter Kit":

    First of all, you need to install the Moses decoder. I recommend DoMY (Do Moses Yourself) because it's really easy to install, well documented, and has a lot of potential (http://www.precisiontranslationtools...=26&Itemid=22, login needed).

    You will probably also need tools to handle bilingual data (cleaning, filtering, sorting, etc.) and to export from TMX or XLIFF to txt, since DoMY can only translate txt files.

    1. Apertium Tools: Apertium is a toolbox to build open-source shallow-transfer machine translation systems, especially suitable for related language pairs: it includes the engine, maintenance tools, and open linguistic data for several language pairs. I've downloaded only the tools: (svn co https://apertium.svn.sourceforge.net...pertium-tools/)
    2. OKAPI Framework: The Okapi Framework is a cross-platform and free open-source set of components and applications that offer extensive support for localizing and translating documentation and software. (Downloads - okapi - Cross-platform framework for localization and translation tools - Google Project Hosting)
    3. M4Loc: The goal of this project is to provide tools to translate localization-specific formats with Moses and to integrate Moses in localization workflows (http://code.google.com/p/m4loc/downloads/list). Note: this tool requires the Okapi Framework to be installed.
    4. LFAligner: LF Aligner helps translators create translation memories from texts and their translations. It relies on Hunalign for automatic sentence pairing. Input: txt, doc, docx, rtf, pdf, html. Output: tab delimited txt, TMX and xls. With web features (LF Aligner - Browse Files at SourceForge.net)
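To illustrate the "export from TMX to txt" and "clean/filter" steps mentioned above, here is a minimal Python sketch. It is not taken from any of the listed tools: the function names (`tmx_to_pairs`, `clean_pairs`, `write_parallel`) and the default thresholds are assumptions chosen for illustration, and real TMX files may need more careful handling of inline tags and language codes.

```python
import xml.etree.ElementTree as ET

# TMX 1.4 stores the language in xml:lang; some older files use a plain "lang" attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_pairs(tmx_text, src_lang="en", tgt_lang="es"):
    """Return (source, target) segment pairs from a TMX document string."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Join all text inside <seg> (including text nested in inline
                # elements) and collapse whitespace so each segment is one line.
                text = " ".join("".join(seg.itertext()).split())
                # Reduce "en-US" to "en" so regional variants still match.
                segs[lang.split("-")[0]] = text
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs


def clean_pairs(pairs, min_len=1, max_len=80, max_ratio=9.0):
    """Drop pairs that are empty, too long, or badly mismatched in token count.

    The thresholds are illustrative defaults, not values from the thread.
    """
    kept = []
    for src, tgt in pairs:
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if not (min_len <= n_src <= max_len and min_len <= n_tgt <= max_len):
            continue
        if max(n_src, n_tgt) / max(1, min(n_src, n_tgt)) > max_ratio:
            continue
        kept.append((src, tgt))
    return kept


def write_parallel(pairs, src_path, tgt_path):
    """Write one segment per line to two line-aligned plain-text files."""
    with open(src_path, "w", encoding="utf-8") as src_f, \
         open(tgt_path, "w", encoding="utf-8") as tgt_f:
        for src, tgt in pairs:
            src_f.write(src + "\n")
            tgt_f.write(tgt + "\n")
```

A typical use would be `write_parallel(clean_pairs(tmx_to_pairs(text)), "corpus.en", "corpus.es")`, where the two output file names are placeholders. Keeping the two files line-aligned is the important part: line N of one file must be the translation of line N of the other.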

    My own "Moses Starter Kit" runs Ubuntu 10.04 on a VirtualBox virtual machine with 2 processors and 1 GB of RAM. It took about 5 hours to train on the ONU + EC corpus (3-gram model, EN-ES).

    Hope this helps, and you are welcome to suggest other Machine Translation related tools.

    Saludos!

  2. #2
    Administrator
    Join Date
    Jul 2007
    Posts
    12
    Rep Power
    118

    Re: My "Moses Starter Kit": Easy Machine Translation Implementation

    Excellent post!! Will try the tools and give you feedback!

    Kind Regards,
    James

  3. #3
    New Member
    Join Date
    Jan 2012
    Posts
    5
    Rep Power
    63

    Re: My "Moses Starter Kit": Easy Machine Translation Implementation

    Thank you for recommending DoMY CE, Leonardo. We continue our work to improve the program and add new features. For example, one task on our roadmap is to integrate M4Loc into the DoMY workflow. That will require developing an automated installation for the Okapi Framework, which is also a significant task. DoMY ships with the Champollion sentence aligner, although how to use it is not documented in the manual. I also noticed you mentioned a 3-gram model. This is fine for testing, but for real work, you'll likely have to increase to 7 or 8, which means your VirtualBox environment will probably reach its limit.
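As a rough illustration of why a higher n-gram order puts more pressure on a small VM, the following Python sketch counts how many distinct n-grams a model would have to store for each order. It is a toy counter written for this thread, not how Moses or DoMY build their language models; `ngram_counts` and `cumulative_entries` are hypothetical names.

```python
from collections import Counter


def ngram_counts(sentences, order):
    """Count n-grams of the given order, with sentence boundary markers."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(len(tokens) - order + 1):
            counts[tuple(tokens[i:i + order])] += 1
    return counts


def cumulative_entries(sentences, max_order):
    """Distinct n-grams stored by a model that keeps every order up to n."""
    totals = {}
    running = 0
    for n in range(1, max_order + 1):
        running += len(ngram_counts(sentences, n))
        totals[n] = running
    return totals
```

Running `cumulative_entries` on a sample of your own corpus for orders 3 through 8 gives a feel for how the number of stored entries grows before you commit to a long training run.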

    As a free open source project, we are looking to build a community that can contribute in all areas of our development. So, anyone with the interest and skills is welcome. There is also a professional upgrade for $50, which includes better data selection algorithms and huge file support. Soon, it will also include automatic downloads of the EuroParl corpus so users can build a corpus after using the demo system.

    Thanks again,
    Tom Hoar
    Managing Director
    Precision Translation Tools

  4. #4
    New Member
    Join Date
    Jan 2012
    Posts
    4
    Rep Power
    0

    Re: My "Moses Starter Kit": Easy Machine Translation Implementation

    Hi Ari and others

    Thanks a lot for the "Moses starter kit". To me it seems that all of its components are open source, including the operating system. Would it be possible to share/publish the whole virtual machine that contains the whole package? That would make using Moses even simpler!

    Best regards


    Niko
    MULTILIZER

  5. #5
    New Member
    Join Date
    Jan 2012
    Posts
    5
    Rep Power
    63

    Re: My "Moses Starter Kit": Easy Machine Translation Implementation

    @Niko, Interesting you should ask. We had considered publishing a VMWare Server network appliance, but then VMWare deprecated the host environment.

    Digital Silk Road created an Amazon Web Services (EC2) image of Moses (not DoMY) that's still available. One of our users created their own EC2 image for a project last year. A few other customers use DoMY on a variety of VM architectures.

    IMO, a virtual environment is a great start for a learning platform. At some point, a serious user will need to invest in hardware. I remember one report on moses-support's listserv that one user ran tests on a host machine with 48 CPUs (I think that's a bit extreme).

    Likewise, I find it hard to believe recent blog references that some new Moses users have invested $10,000 in hardware only to have an unusable system. Our DoMY Pro users receive a platform specification with 6 CPUs and excellent performance that tops out at about US$1,500.

    Nonetheless, virtual environments have their place. What VM architecture do you suggest? Maybe we can put it back on our roadmap this year.

  6. #6
    Forums BodyGuard AriSanguinetti
    Join Date
    Sep 2006
    Age
    30
    Posts
    31
    Rep Power
    100

    Re: My "Moses Starter Kit": Easy Machine Translation Implementation

    @Niko I will try to export my VirtualBox VM and share it with the Machine Translation community. Uploading it won't be easy, though: with all the data and training runs I've done (I'm now playing with n-grams), its overall size is approx. 90 GB.

    @Tom as I mentioned before, I'm playing with n-grams to see the difference between each output. When you suggest 7- or 8-grams, is that for the LM, for the TM, or for both? I'm still having issues running the 'Translate' graph (I followed your instructions on my thread at the DoMY forum, but nothing happens), so at the moment I'm only using DoMY for cleaning data and training engines.

    Regarding the VM architecture, I think the best option would be OpenStack: it's very easy to implement, open source, and runs on Ubuntu Server.

    Saludos!

  7. #7
    New Member
    Join Date
    Jan 2012
    Posts
    4
    Rep Power
    0

    Re: My "Moses Starter Kit": Easy Machine Translation Implementation

    Hello!

    Thanks for your message. I was thinking about publishing the very virtual machine that Ari created. On VirtualBox, that is. I personally have experience only on VMWare, though.

    For me, an important point in favor of virtual machines is that one can easily upgrade the hardware without touching the virtual machine.


    Niko

  8. #8
    New Member
    Join Date
    Jan 2012
    Posts
    4
    Rep Power
    0

    Re: My "Moses Starter Kit": Easy Machine Translation Implementation

    Oh, I can see that Ari just replied too (when I was writing my own reply). Just ignore my previous post.

    Ari, that would be great! It would surely make using Moses a lot easier. Maybe you can upload it as a compressed file.

  9. #9
    New Member
    Join Date
    Jan 2012
    Posts
    5
    Rep Power
    63

    Re: My "Moses Starter Kit": Easy Machine Translation Implementation

    @Ari, Some users (but not all) have reported a problem with the translate graph. So, we are working on that problem.

    Re the n-gram values: the manual includes short descriptions for the train-* commands' --lmgrams (-n) and tmgrams (-g) values. Yes, they can be, and often are, different values. The optimal values vary significantly with the language pair and the style of the text (children's books with short sentences versus DGT legislation with long sentences). So you need to experiment with every new corpus to find the optimal combination.
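The search for a good combination can be organized as a simple sweep over candidate orders. The sketch below is only a skeleton under stated assumptions: `train_and_score` is a hypothetical callback that you would write yourself to train an engine with the given language-model and translation-model orders and return a quality score where higher is better (for example, BLEU on a held-out set). It does not call any DoMY or Moses command.

```python
from itertools import product


def sweep_orders(lm_orders, tm_orders, train_and_score):
    """Try every (LM order, TM order) pair and return the best one.

    train_and_score(lm_n, tm_n) is a user-supplied, hypothetical function
    that returns a score where higher means better output.
    """
    results = {}
    for lm_n, tm_n in product(lm_orders, tm_orders):
        results[(lm_n, tm_n)] = train_and_score(lm_n, tm_n)
    best = max(results, key=results.get)
    return best, results
```

Since each call to `train_and_score` may mean hours of training, it is worth starting with a short list of candidate orders and a reduced sample of the corpus.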

    Finally, our website user forum is horrible. We need a better forum package to share these exchanges and tips with our users. Does anyone know a good package, or does anyone have time to help us re-work the forum? Thanks.
    Last edited by tahoar60; 01-10-2012 at 12:24 AM.

  10. #10
    New Member
    Join Date
    Jan 2012
    Posts
    4
    Rep Power
    0

    Re: My "Moses Starter Kit": Easy Machine Translation Implementation

    @tahoar60, can you use Facebook as your user forum? Or LinkedIn?
