How to create parallel texts for language learning – Part 1

I’d like to say a bit about ways to make parallel texts. I consider parallel texts a very valuable learning resource, as I’ve mentioned in the past. They let you learn a language much faster than textbooks do, because they make an enormous amount of content instantly comprehensible.

Unfortunately, it’s nearly impossible to find parallel texts. The most common commercially available ones seem to be books of poetry and “classic” works of literature. Call me uncultured, but I usually get easily bored by books from the 1800s. I want something with an *interesting* plot, and I’ve been known to read a lot of fantasy and sci-fi, for which there are basically zero parallel texts commercially available. Also, the commercial ones are not usually sentence-aligned or even paragraph-aligned…at best they’re page-aligned, if that. For easy learning, you want all the little translated bits right beside each other for easy comparison.

So, for that reason, it’s more realistic to assume that you’re going to have to either make your parallel texts yourself, or get someone else to make them for you. To this end, I’ll give you a bit of info about how I do it, so that you can perhaps give it a try.

Ok, first the basics. What you’re going to start with is two ebooks. I don’t care where you get them; that’s not my problem. You might find public domain works at Project Gutenberg, or maybe you buy modern ebooks from online booksellers (for example, I found some Danish ebooks and mp3 audiobooks for sale here). Or maybe you borrow them from a friend. Ideally you want a place that doesn’t sell crippled files, like the bastards at audible.com. I really really want to buy a lot of their audiobooks, but I just can’t play them on my operating system due to their crippling DRM. Some places sell ebooks with DRM as well, which makes them viewable only on certain devices and prevents you from sharing them with your neighbour. This is bad…you should help your neighbour 🙂

Anyway, back to ebooks. So you need an ebook in your target language, and another one in a language that you understand really well (hopefully your native language, if such a translation exists). The next step is that you probably want a text format version of these ebooks, since that’s much easier to process than things like PDF and EPUB. There are some software programs that will convert between several different ebook formats, but I just use a document viewer called Okular, which is able to view a PDF or EPUB and then “export to text” to give me a clean file.

Next, you need a way to align these texts. What this means is that you’re going to create a file in which the equivalent paragraphs or sentences will match up with each other. For example, the one I’m currently reading has individual Dutch sentences on the left-hand column, and each sentence is matched with its English translation in the right-hand column. There are two main ways to achieve this. One is more time-consuming but technically very simple, and the other involves a bit of computer know-how but is much more time-efficient.

In this article, I’ll be describing the “easy” way, and then my next article will be for the people who know what I mean when I say things like “emacs”, “regular expressions”, and “Makefile”. You know who you are. For those who don’t recognize the software terms, but are still keen to put your growing computer skills to the test, be sure to take a look at that article when I publish it in a few days. That method requires much less manual repetitive work. But for now, the less-technical way!

First, replace every empty line (i.e. [ENTER][ENTER]) in the book with something unique, such as a character like Ĉ that doesn’t occur in that language, in order to save the paragraph breaks. Then remove all remaining [ENTER]s from the document so it’s all one line (replacing each with a space if the lines don’t already end in one, so that words don’t run together). Now go back and restore the paragraph breaks by changing Ĉ to [ENTER], which puts each paragraph on its own line. Do this for both copies of the book. This step gets rid of the [ENTER]s that were needlessly breaking individual sentences into pieces. In this method, you only want the paragraphs to be divided.
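If you’re comfortable running a small script, the same unwrapping can be done without a search-and-replace session. This is only a minimal sketch, not the method described above: instead of a placeholder character it splits the text on blank lines directly, and the function name `unwrap_paragraphs` is made up for illustration. It assumes paragraphs in your text file are separated by at least one blank line.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Return the text with each paragraph on a single line."""
    # Normalise Windows (\r\n) and old Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # One or more blank lines mark a paragraph break.
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside a paragraph, join the broken lines with single spaces.
    joined = (" ".join(p.split()) for p in paragraphs)
    # Drop empty leftovers and put one paragraph on each line.
    return "\n".join(p for p in joined if p)


sample = "Een zin die\nis gebroken.\n\nTweede\nalinea.\n"
print(unwrap_paragraphs(sample))
```

Note that words hyphenated across a line break will end up with a space after the hyphen, so those still need a manual pass.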

Now that you have a collection of separate paragraphs, you open up a spreadsheet program (such as openoffice.org, gnumeric, koffice, or maybe that famous one from Microsoft, if you’re desperate), and you create a table with two columns and one row. Paste one language on the left-hand cell, and the other language on the right-hand cell. Now you just have to make sure that each paragraph lines up with its appropriate neighbour by adding extra [ENTER]s to make them even out. Sometimes you may have to bust 1 paragraph into smaller ones to do that.

At the end, once you know that they all line up, remove the excess lines by changing [ENTER][ENTER] into just [ENTER] (repeating as many times as necessary), so that you have one paragraph per line. Now copy-paste the text as table cells with one line per row, so the whole text of each language is still in one column. Since each paragraph now has its own row, the matching paragraphs show up beside each other!
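Once both texts have one paragraph per line and you have already checked that they line up, the final pairing step could also be scripted. The sketch below is an assumption-laden illustration rather than part of the method above: `pair_paragraphs` and `to_tsv` are hypothetical names, and the code does no aligning of its own. It simply puts line N of one text next to line N of the other and produces tab-separated text that a spreadsheet program can open.

```python
import csv
import io
from itertools import zip_longest


def pair_paragraphs(left_text, right_text):
    """Pair line N of one text with line N of the other."""
    left = left_text.splitlines()
    right = right_text.splitlines()
    # If one side is longer, pad the other with empty cells so nothing is lost.
    return list(zip_longest(left, right, fillvalue=""))


def to_tsv(rows):
    """Render the pairs as tab-separated text, one pair per row."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


dutch = "Het regende.\nZe bleef thuis."
english = "It was raining.\nShe stayed home."
print(to_tsv(pair_paragraphs(dutch, english)))
```

If the two texts have a different number of lines, the extra rows show up with an empty cell on one side, which is a quick sign that the alignment still needs work.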

Now, just as a disclaimer, I’ve never actually done this method myself, so you might have to experiment a bit if you get stuck. I just wanted to mention a method that doesn’t require tons of in-depth computer hackery. I heard about this method from people who have used it successfully many times, and I’ve seen the result of their work (such as a paragraph-aligned Chinese / English Harry Potter, for example), so I know it can work well for some people.

Next time I’ll elaborate on my more automated process, but first I want to try and automate it a bit more. I think I can save a couple of steps in the sentence-dividing stage by using another little script, so then I might be able to automate the whole thing from start to finish. Hopefully this will also make it a bit more accessible to others as well.

Until then, keep reading!

24 Responses to How to create parallel texts for language learning – Part 1

  1. WC says:

    I haven’t tried to create my own parallel text before. Do you find they usually match up fairly well?

    I would think you could always rely on the chapters lining up, almost always rely on the paragraphs, and often rely on the individual sentences.

    If the grammar is a lot different, I think it would be harder to line up, too.

    • doviende says:

      The translations that I’ve seen for modern books have been very close. There are times when two sentences get merged into one, or vice-versa, but that doesn’t hinder me at all. I’ve been very satisfied with my sentence-aligned texts that I’ve made, and they’ve been really helpful.

      For something like Chinese with English, I would guess that it might be better paragraph-aligned, but honestly I’ve never tried yet. I’m sure some other people could comment about that here.

  2. Brian says:

    I am awaiting the second post as I want to be more familiar with your more advanced method. The way that I line up texts is by using a program called LF_aligner. Seems to do the job fairly well, but is far from perfect. I am looking forward to trying your method!

  3. Judith says:

    Just a quick note: you won’t want to change all these [ENTER]’s by hand. In Microsoft Word or other office programs, one [ENTER] is represented as ^p . So you can use the search & replace function to change ^p^p to ^p for example. In other tools, [ENTER] is the equivalent of \n or (if the file is Windows-encoded) \r\n .

    I’ll release a video of this on Youtube soon.

  4. […] to create parallel texts for language learning, the original of which can be found here. The narration is in the first person, from a male point of view […]

  5. durak says:

    There are tools to align texts.
    ABBYY Aligner, hunalign, etc.

    For Japanese and Chinese, you have to do it manually, but it is fast, if you know the two languages.

    The fastest way, though, is to make somebody else to do it. I have hundreds, if not thousands, of parallel novels, the majority of them aligned by somebody else.

    Aligning is not a problem, proofreading is a HUGE problem.

    I know nothing about computers, so I do everything the old fashioned way.

    But…
    if you want to have a job done really well, do it yourself.

    Anyway, the most important thing is how fast you read in your native language (between the lines included) and how fast you process what you hear.
    Parallel texts are very useful at the beginning, but after a while, they are not necessary, you just read the L1 text and listen to the L2 recording, even the L2 text is not necessary then.
    No scanning, no proofreading, no aligning! Just a book and a recording. What a relief!

    For Japanese and Chinese, parallel e-texts are a must, though, or else you’ll get stuck. And they are EXTREMELY useful to learn kanji and hanzi.

    • sleepymeepy says:

      Duran, you are way ahead of me. I think I need children’s books – perhaps up to HSK 4 vocabulary.

      I’m just wondering if you have already made some parallel ebooks at this level and whether I could buy from you.

      I’m severely disabled. My concentration and memory are both poor, so I will definitely have to get someone else to do it for me

      JJ

  6. Max says:

    @ author and commenters: You guys wouldn’t happen to remember where you came across those Chinese/English parallel texts? 😀

  7. Juzekk says:

    I just found a nice tool that does everything quickly and quite well; it’s based on the above-mentioned hunalign.

  8. Andrew says:

    Are there places where people share and trade their own parallel texts? You mentioned the English/Chinese Harry Potter one, I’m thinking there just must be a website or torrent tracker where people exchange these things.

    Cheers,
    Andrew

  9. Judith says:

    http://www.bilingual-texts.com has some, if you click on “Library”. I don’t think anyone will openly post Harry Potter parallel texts on such a website though (even though people commonly create these for their own use) because copyright enforcers are quick to warn people posting the text of the Harry Potter novels online. Best figure out who might have the parallel texts you’re looking for and then arrange for a private exchange.

  10. Aaron says:

    I love the idea of parallel texts, but this seems like a lot of work. Especially for someone like me who isn’t too technically adept. What about just buying the English and the target language copy of your favorite book and reading a chapter at a time? They are not side by side, but it’s easy enough to read a bit in one, and then skip back to read the other.

    I do like the idea of a community generated library of texts though that Andrew mentions. I’ll have to check out Judith’s site.

    • durak says:

      Aaron,
      you won’t be able to tell the difference until you actually try vertically aligned parallel novels with good audio. Add to that a mouse-over pop-up dictionary (and line-by-line audio).

  11. McSalty says:

    Something worth looking into is using Amazon’s Mechanical Turk (mturk) for laborious time-intensive tasks like this. Basically how mechanical turk works is you post a task and how much you’re willing to pay for it ($0.01 to $10). The thousands of people who use the website to find work then scan through tasks, and decide if it’s worth the price to complete it. It’s a really cheap way to outsource simple tasks.

    I’d be willing to pay a couple bucks just to have the bulk of this work done for me.

  12. Lynn says:

    Have you considered watching some of the thousands and thousands of Ted talks that are translated into dozens and dozens of languages?
    http://www.ted.com/pages/287
    I found this a great way to learn expressions not presented in a text book, e.g., “this blows me away.”
    The range of topics of the “ideas worth spreading” is phenomenal
    http://www.ted.com/themes
    and the speakers are impressive, including such folks as Bill Clinton, Bono, Jane Goodall, the co-founders of Google, etcetera
    http://www.ted.com/speakers
    The TEDx events are conducted independently in countries throughout the world under the TED guidelines. But watching these in your target language (particularly if translated to your native language) is helpful for your language learning.
    Importantly, for language learners, when you achieve a high level you can challenge yourself by becoming a volunteer translator yourself – a notable goal that can be measured, while “giving back” and contributing to the world of “ideas worth spreading.”

  13. atzaatza says:

    Well, the best thing is to align texts manually (because auto-alignment tools are never 100% accurate).
    I use Nova Text Aligner, which has all the tools (I manage to align a novel-sized text in less than 1 hour, and it is perfectly aligned). Best of all, you can export to ebook formats like epub, mobi or pdf:
    http://www.supernova-soft.com/c5/index.php/products/text_aligner/

  14. Yanis Batura says:

    You might be interested in ParallelBook (an ebook format for parallel texts) and Aglona Reader, a program for reading and (manually) aligning parallel texts in this format. The main feature of this format is that it lets you set correspondences not only between sentences, but also between consecutive parts of longer sentences. Aglona Reader is a free and open source program for Windows+.NET 4.0. More details here: http://goo.gl/7VMEK

  15. George says:

    Isn’t there a simple way, in Word, to have 2 columns with parallel text in translation? I do not find a template on Office.com and the Help function doesn’t help.

  16. alex says:

    Take a look at Nova Text Aligner, the best tool I have found to date
    http://www.supernova-soft.com/c5/index.php/products/text_aligner/

  17. Michael says:

    You can translate texts from a foreign language into your native language in writing, and then you can orally translate sentences from your native language into the foreign language you learn.
    I specialise in issues of learning and teaching English and took English as an example on the value of oral translation into English.

    Have you noticed that interpreters have to possess the most thorough knowledge of a foreign language, especially of conversation, vocabulary and grammar? Perhaps foreign learners of English can achieve fluency in English also through oral translation from their native language into English. It is possible to check oneself this way when practising speaking in English every sentence in ready-made materials with both a native language and English versions. I also believe that the value of oral translation from a native language into English with self-check is underestimated by English teaching specialists for self-study and self-practice of English conversation, vocabulary and grammar. Oral translation practice should cover English grammar, conversation and vocabulary. Thematic dialogues, questions and answers on conversation topics, thematic texts (informative texts and narrative stories), grammatical usage sentences and sentences with difficult vocabulary on various topics, especially with fixed phrases and idioms can be used in practising English through oral translation from one’s native language into English.

    I firmly believe that oral translation from a native language into English is effective in practising English speaking, vocabulary and grammar on one’s own with ready-made materials using self-check in a more logical, thorough, in-depth way as to content than casual talking to native English speakers. Practising English on one’s own through oral translation into English with self-check may be a quicker way for developing fluency in speaking English than casual talking to native English speakers with limited content.

    Of course everyday long-term talking to native English speakers on a multitude of topics is a top priority and a paramount factor for developing good English speaking skills by learners of English. Exercises in listening, speaking and reading in English that also cover English pronunciation, grammar, vocabulary and conversation on various topics belong to major English learning and teaching activities. I do not advocate oral translation into English as the only or the most important method in learning English grammar, vocabulary and speaking.

    However self-study and practising English on one’s own are indispensable, and substantially accelerate success in English. Communication with native English speakers can’t encompass all aspects of mastering English adequately and thoroughly, especially vocabulary, grammar, potential in-depth content of conversations suitable for real life needs of students for using English. It’s possible and effective to practise English (including listening comprehension and speaking) on one’s own through self-check using transcripts, books, audio and video aids.

    Oral translation into English allows speaking a wide variety of sentences on a multitude of topics with sophisticated important content (sentences) that are rarely widely used in daily life because of limited opportunity and limited content of communication of foreign learners with native speakers of English. Oral translation from a native language into English is very important and effective for foreign learners of English because oral translation into English creates solid additional extensive practice of English that is rarely possible in terms of comprehensive content in daily communication with native speakers of English.

  18. Linas says:

    I would suggest you try my project for this: http://interlinearbooks.com/

  19. Henri Verschelde says:

    Does anyone have a Russian English parallel or Russian Dutch or Russian French parallel text for The Brothers Karamazov? I made some Russian-Dutch parallel texts myself and found some on internet which I can share but did not find the brothers Karamazov.
