
Translating Word files to Braille

A project log for BrailleRAP DIY Braille embosser

An Open source Braille embosser in the spirit of RepRap

Stephane, 06/08/2023 at 22:45

Since we tried MusicXML files a few days ago, I was wondering whether a pandoc module is available for Python.

Pandoc is a well-known open-source command-line tool for converting between document formats: you can use it to convert HTML to PDF, Markdown to HTML, and much more.

The main issue with word processor formats is that they can contain many features that have no equivalent in Braille. Different fonts and font sizes simply do not exist in Braille, because each Braille character is a standardized, fixed-size cell of 6 or 8 dots, depending on the Braille standard.
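To make the fixed-size cell concrete, here is a small Python sketch (not part of AccessBrailleRAP) that treats a cell as a set of raised dots and renders it with the Unicode Braille Patterns block, where dot n sets bit n-1 of an offset from U+2800. The helper name `cell` is our own. A 6-dot cell allows 64 patterns and an 8-dot cell 256, and there is simply no room for a font or a font size.

```python
# Sketch: a Braille cell as a set of raised dots, rendered with the Unicode
# "Braille Patterns" block (U+2800 to U+28FF), where dot n sets bit n-1.
BRAILLE_BASE = 0x2800


def cell(dots, eight_dot=False):
    """Return the Unicode character for a cell with the given raised dots."""
    max_dot = 8 if eight_dot else 6
    mask = 0
    for d in dots:
        if not 1 <= d <= max_dot:
            raise ValueError(f"dot {d} is outside a {max_dot}-dot cell")
        mask |= 1 << (d - 1)
    return chr(BRAILLE_BASE + mask)


# 6 dots give 2**6 = 64 possible cells, 8 dots give 2**8 = 256.
print(cell([1]))                     # dot 1 alone: letter "a" in literary Braille
print(cell([1, 2, 4, 5]))            # dots 1-2-4-5: letter "g"
print(cell([7, 8], eight_dot=True))  # dots 7 and 8 only exist in 8-dot Braille
```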

So to convert an OpenOffice .odt file or a Word .docx file, you need a tool that extracts plain text from these files. This is where pandoc is useful: it can extract plain text from many file formats.
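To give an idea of what "extracting plain text" involves, here is a rough sketch for .odt only, using Python's standard library instead of pandoc. It relies on the fact that an OpenDocument file is a ZIP archive whose body lives in content.xml; the function name `odt_plain_text` is hypothetical. It keeps one line per paragraph or heading, turns every table cell into its own line, and ignores special space and tab elements, lists, and footnotes (which it would duplicate). That is exactly the kind of work a tool like pandoc already does well.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
BLOCK_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}  # paragraphs and headings


def odt_plain_text(source):
    """Return the text of every paragraph and heading, one per line."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    # iter() walks the tree in document order; itertext() drops inline markup
    return "\n".join("".join(el.itertext()) for el in root.iter() if el.tag in BLOCK_TAGS)


# Build a tiny in-memory archive to try it (a real .odt contains more files).
content = (
    "<office:document-content "
    'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
    f'xmlns:text="{TEXT_NS}">'
    "<office:body><office:text>"
    "<text:h>Title</text:h>"
    "<text:p>Some <text:span>bold</text:span> text</text:p>"
    "</office:text></office:body></office:document-content>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("content.xml", content)
print(odt_plain_text(buf))
```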

After some searching on the internet, I found pypandoc, a Python module that bridges pandoc and Python code.

So I started a little test with AccessBrailleRAP. Just as we did with MusicXML, I added a backend Python function that asks the user for a file, converts it to plain text with pandoc, and returns the result to the JavaScript frontend.

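import json
import tkinter as tk
import tkinter.filedialog

import eel
import pypandoc

@eel.expose
def import_pandoc():
    """Ask the user for a document, convert it to plain text with pandoc,
    and return the text, JSON-encoded, to the JavaScript frontend."""
    js = ""
    # Hidden Tk root window, only needed to display the file dialog
    root = tk.Tk()
    root.withdraw()
    fname = tkinter.filedialog.askopenfilename(title="Select file", filetypes=(("all files", "*.*"),))
    root.destroy()
    if fname:  # empty when the dialog is cancelled
        # Braille line length from the application settings (app_options is
        # defined elsewhere in AccessBrailleRAP); computed but not used yet
        linel = int(app_options['nbcol']) - 1

        # Plain-text output; the simple_tables extension renders tables as text
        data = pypandoc.convert_file(fname, "plain+simple_tables", extra_args=(), encoding='utf-8', outputfile=None)
        js = json.dumps(data)

    return js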
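The function computes `linel` from the `nbcol` option but does not use it yet. One possible use, sketched below as an assumption rather than what AccessBrailleRAP actually does, is to re-wrap the pandoc output so that no line is longer than the embosser line. `wrap_for_embosser` is a hypothetical helper; pandoc's own `--columns` option, passed through `extra_args`, would be another way to set the wrapping width. Either way this counts print characters, not Braille cells: capital and number signs can make a Braille line longer than its print source, and re-wrapping line by line would also break table rows wider than the embosser line.

```python
import textwrap


def wrap_for_embosser(text, nbcol):
    """Hypothetical helper: re-wrap each line of `text` to at most nbcol - 1
    characters (the same value as `linel`), keeping blank lines."""
    width = nbcol - 1
    out = []
    for line in text.splitlines():
        if not line.strip():
            out.append("")  # keep paragraph breaks
        else:
            out.extend(textwrap.wrap(line, width=width))
    return "\n".join(out)


sample = "Pandoc converts documents between many formats.\n\nSecond paragraph."
print(wrap_for_embosser(sample, nbcol=20))
```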

Then I gave it a try with a small OpenOffice test document: a single line of formatted text and a small table.

Starting AccessBrailleRAP, I tested the new import button

and selected our OpenOffice .odt test file.

Not bad: we get all the text, and the table is preserved.

Converting that to Braille
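The actual print-to-Braille translation in AccessBrailleRAP is not shown in this log. As a toy illustration of the idea only, the sketch below maps unaccented lowercase letters and spaces to uncontracted literary Braille cells: letters a to j, then k to t by adding dot 3, then u, v, x, y, z by adding dots 3 and 6, with w as the exception. It has no capitals, digits, punctuation, accents or contractions, all of which a real translation table handles. The names `letter_dots` and `to_braille` are hypothetical.

```python
# Toy uncontracted Braille for lowercase a-z and space (illustration only).
# Dots for a..j; k..t add dot 3; u, v, x, y, z add dots 3 and 6; w is special.
A_TO_J = [[1], [1, 2], [1, 4], [1, 4, 5], [1, 5],
          [1, 2, 4], [1, 2, 4, 5], [1, 2, 5], [2, 4], [2, 4, 5]]


def letter_dots(ch):
    """Return the raised dots for one lowercase letter."""
    if ch == "w":
        return [2, 4, 5, 6]
    i = ord(ch) - ord("a")
    if 0 <= i < 10:
        return A_TO_J[i]
    if 10 <= i < 20:
        return A_TO_J[i - 10] + [3]
    if ch in "uvxyz":
        return A_TO_J["uvxyz".index(ch)] + [3, 6]
    raise ValueError(f"no mapping for {ch!r} in this toy table")


def to_braille(text):
    """Translate text to Unicode Braille cells (dot n sets bit n-1 above U+2800)."""
    cells = []
    for ch in text.lower():
        if ch == " ":
            cells.append("\u2800")  # blank cell
            continue
        mask = sum(1 << (d - 1) for d in letter_dots(ch))
        cells.append(chr(0x2800 + mask))
    return "".join(cells)


print(to_braille("braille rap"))
```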

Gotcha, we have a proof of concept. We definitely need to build an installer that bundles all the software AccessBrailleRAP depends on (drivers, pandoc, ...), but this is a promising feature: anybody can open a word processor file, convert it into Braille, and emboss it. You don't even need to know anything about Braille!
