I'm an (American) English speaker, but I watch quite a bit of non-English movies and TV. Although I can ask for the location of the library or train station in multiple languages, I can only catch the occasional French or German word when watching TV.
I need the English-language subtitles. Quite often, they are included with the program material. In that case, they're usually the best choice. (I fully understand why they might not match a literal translation of the spoken dialog or even the foreign language subtitles, which might also not match a literal translation of the spoken dialog. And they also often don't match the spoken dubbed English; I don't like watching dubbed content.)
(I happen to know the difference between subtitles, closed captions, open captions, and subtitles for the deaf and hard of hearing. For the sake of simplifying the discussion, especially for readers who don't know the difference and don't care, I'm referring to them all as subtitles in this document even though most of them are not technically subtitles at all.)
If I can get a subtitle file in the original language of the show, there are numerous websites that can translate it to English for me. Although this translation does avoid a subset of the mismatches mentioned above, it has minor glitches of its own. For example, it often makes mistakes in pronouns when translating gendered languages, and sometimes a word with multiple meanings in the original language will be translated into the wrong choice in English. I can live with this small amount of noise.
I recently became interested in a TV series called The Danish Woman, aka Danska konan (not to be confused with the unrelated 2015 movie The Danish Girl). It's a French and Icelandic co-production. It takes place in Iceland. The spoken dialog in that series is Icelandic, Danish, and English. So far, I have not seen any French dialog. Getting a good set of subtitles in English so I can watch it without losing too much is quite challenging for this particular series because of the mixture of spoken languages and the video production treatment of them.
The rest of this essay is a description of how I worked through it to get "good enough" English subtitles.
Before I found out it was a French/Icelandic co-production, I assumed the best version would be what they show in Denmark. I obtained a copy of that, and boy was that incredibly wrong. It did have Danish subtitles, and I got a passable English translation of those. The audio track was the problem. When Danish was spoken, things were as you would expect. When Icelandic or English was spoken, the dialog was dubbed, but it was dubbed in a way I had never experienced before. The same dubbing voice was used for all characters, and the volume level of the dubbing was much louder than the non-dubbed Danish dialog. It was really distracting to try to watch that. I don't know if that's normal for Danish foreign-language content or if this is some weird one-off.
I went in search of something better. It was at that point that I learned that it was a French/Icelandic co-production, so I obtained the French and Icelandic versions of the series.
The French version's French subtitles gave a complete and usable English translation, but I didn't find that out until after I had spent quite a bit of time working through obstacles with the Icelandic version.
In the Icelandic version, for reasons I don't fully understand, selected bits of Icelandic dialog were burned into the video itself. In other words, they were not subtitle tracks or closed caption tracks that could be turned on or off. They were always present on the screen. That's distracting, but even worse was that the Icelandic subtitles did not include those burned-in bits, so they were not included in the files I sent through translation to English. I used ccextractor to read those burned-in subtitles. It took a couple of minutes to process a 1.4 GB file and worked reasonably well. It found 268 fragments. Some of those were extraneous OCR noise. Indeed, the original Icelandic subtitle file had only 233 fragments.
$ ccextractor --hardsubx --conf-thresh 60 --out=webvtt-full somefile.mp4
Now I had a file with just the burned-in dialog (plus some noise) and a file with everything except the burned-in dialog. My plan was to use Subtitle Edit to combine those into one file. I could either combine the Icelandic files and then translate that to English, or I could combine the English translations of both files. I don't think it makes any difference which way it's done.
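For anyone who would rather script the combining step than do it in Subtitle Edit, here is a minimal sketch in Python. It is not what I actually ran. It assumes both files have already been converted to SRT (ccextractor can also write SRT directly), and the function names parse_srt and merge_srt are made up for illustration. It simply interleaves the cues from both files by start time and renumbers them. It does not try to resolve overlapping cues or remove OCR noise.

```python
import re

# Matches SRT-style timestamps such as 00:01:02,345 (a dot is tolerated too).
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_ms(stamp):
    """Convert an HH:MM:SS,mmm timestamp to milliseconds."""
    h, m, s, ms = (int(x) for x in TIME_RE.match(stamp.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def parse_srt(text):
    """Return a list of (start_ms, timing_line, body) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        # Find the timing line; the cue number before it is dropped.
        for i, line in enumerate(lines):
            if "-->" in line:
                start = to_ms(line.split("-->")[0])
                cues.append((start, line.strip(), "\n".join(lines[i + 1:])))
                break
    return cues


def merge_srt(text_a, text_b):
    """Interleave the cues of two SRT documents by start time and renumber."""
    cues = sorted(parse_srt(text_a) + parse_srt(text_b), key=lambda c: c[0])
    out = []
    for n, (_, timing, body) in enumerate(cues, start=1):
        out.append(f"{n}\n{timing}\n{body}\n")
    return "\n".join(out)
```

To use it, read the two files into strings, pass them to merge_srt, and write the result to a new file. Because cue numbers are regenerated, the output should load in any player that accepts SRT.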
That was starting to seem like a lot of bother, so before going further I decided to take another look at the French version that I had set aside earlier. I had assumed that it would be dubbed in French, which would be undesirable for me. In that case, I might use the translated French subtitle files with the Icelandic video files. I would have to adjust the time codes with some offset, which is no big deal with widely available tools like Subtitle Edit. I'd still have the problem with the burned-in fragments of Icelandic dialog.
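The time code adjustment itself is simple arithmetic. As a rough sketch of what a tool like Subtitle Edit does for a fixed offset, the Python below shifts every timestamp on the timing lines of an SRT or WebVTT file by a number of milliseconds. The function names are hypothetical. It assumes timestamps include the hours field (HH:MM:SS,mmm or HH:MM:SS.mmm), and it clamps negative results to zero. It handles a constant offset only, not drift caused by different frame rates.

```python
import re

# Timestamps with an hours field; the separator (comma or dot) is preserved.
STAMP_RE = re.compile(r"(\d+):(\d{2}):(\d{2})([,.])(\d{3})")


def shift_stamp(match, offset_ms):
    """Return the matched timestamp moved by offset_ms milliseconds."""
    h, m, s, sep, ms = match.groups()
    total = ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms) + offset_ms
    total = max(total, 0)  # never produce a negative time
    h, rest = divmod(total, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"


def shift_subtitles(text, offset_ms):
    """Shift only the timing lines (those containing '-->'); leave dialog alone."""
    out = []
    for line in text.splitlines():
        if "-->" in line:
            line = STAMP_RE.sub(lambda mt: shift_stamp(mt, offset_ms), line)
        out.append(line)
    return "\n".join(out) + "\n"
```

A positive offset delays the subtitles and a negative one makes them appear earlier. For example, shift_subtitles(text, 2500) moves every cue 2.5 seconds later.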
Those turned out to be wrong assumptions. The spoken dialog was not dubbed at all. Furthermore, the French version did not have the burned-in Icelandic fragments. That dialog was included in the French subtitle files. So, in the end, I just translated the French subtitles to English and used the French video files. Sheesh, why didn't someone tell me that in the first place? Of course, the subtitles will now have been translated from Icelandic, Danish, and English to French, with that translation done by humans or at least some video production process. Then the translation from French to English was done by some AI model or something, which has the warts mentioned above. Would it be better to translate the Icelandic subtitles to English and merge them with the translated French subtitles? Perhaps, but I am currently too lazy to do that experiment.
Of the Icelandic versions, two episodes did have alternative versions that already had English subtitles. I used those as-is since I expect them to be the best versions. I wouldn't be able to turn the subtitles off since they are burned in, but that's OK for my use case.
WJCarpenter