About a week ago I was working on another piece of software that uses this library (which I may release if it ever works), and I needed a shape to fit in a very small box (less than one unit square) because I hadn't set up the camera yet. (That's an OpenGL technicality.) So instead of scaling an SVG and running CAM on the result, I told the CAM to use twice as many pixels per inch, which in theory halves the size of the model. As it turns out, that makes the output file huge. I was trying to process it when I noticed that the program had been running for five minutes with no results. Confused, I opened the file in Atom, and that took forever too. The file turned out to be 35 MB, which explained why everything involving it was so slow. I wanted to fix that, because the parser got through 1.6 million lines before crashing with an unfortunate segmentation fault. So, work on multithreading support got under way.
To understand why this is so difficult, one must understand something about how computers and program optimization work. Without going into incredible detail, the general rule is that a program can be fully optimized for CPU efficiency or for memory efficiency, but not both, so compromises must be made. There are two "models", if you will, for these optimizations.
The stream model is memory efficient but CPU intensive. Rather than storing anything in memory, this model works on one item at a time, using only as much memory as is absolutely needed for the current input. There is no input buffer and no output buffer, so compute time depends solely on CPU power and other intrinsic factors (e.g., bus speed and MMU latency). Before multithreading was added, this program used the stream model.
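As a rough illustration of the stream model, here is a minimal C++ sketch (not this library's code): it reads one line, tokenizes it, writes the result, and moves on, so memory use is bounded by the longest line rather than by the file size. `Word`, `tokenizeLine`, and `streamParse` are hypothetical names, and the tokenizer assumes simple G-code words with `;` comments.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// One G-code "word": a letter followed by a number, e.g. G1 or X10.5.
struct Word {
    char letter;
    double value;
};

// Hypothetical helper: split a single line into words, dropping any
// ';' comment. Only this one line is ever held in memory.
std::vector<Word> tokenizeLine(const std::string& line) {
    std::istringstream in(line.substr(0, line.find(';')));
    std::vector<Word> words;
    char letter;
    double value;
    while (in >> letter >> value) {
        const char upper = static_cast<char>(std::toupper(static_cast<unsigned char>(letter)));
        words.push_back({upper, value});
    }
    return words;
}

// Stream model: read a line, process it, write the result, forget it.
// Memory use is bounded by the longest line, not by the file size.
std::size_t streamParse(std::istream& in, std::ostream& out) {
    std::string line;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        for (const Word& w : tokenizeLine(line)) {
            out << w.letter << w.value << ' ';
        }
        out << '\n';
        ++count;
    }
    return count;
}
```

Pointed at a `std::ifstream`, `streamParse` could work through a 35 MB file without ever holding more than one line, but nothing can happen in parallel: each line must be read before the next one is available.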
The block model is CPU efficient but memory intensive. It does the opposite of the stream model: it buffers everything it needs in memory, which lets the program do things "concurrently" with an actor/event model, and switch states and resume later without having to finish one task before moving to the next. Many, many programs use this model because it is easier to work with than the stream model.
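For contrast, here is a minimal sketch of the block model under the same assumptions (hypothetical names `loadAll` and `processSome`; the "processing" just counts lines that start with `G`). The whole input is loaded first, after which work can be done in slices and resumed later from a saved cursor.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Block model, step 1: buffer the whole input. Memory now grows with the
// file, but every line is randomly accessible afterwards.
std::vector<std::string> loadAll(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) lines.push_back(line);
    return lines;
}

// Block model, step 2: process at most `budget` lines starting at `cursor`,
// then return. The caller can switch to other work and resume later, because
// the buffer plus the saved cursor is the complete state.
// Hypothetical "work": count lines that begin with 'G'.
std::size_t processSome(const std::vector<std::string>& lines, std::size_t& cursor,
                        std::size_t budget, std::size_t& gLines) {
    assert(budget > 0);  // a zero budget would never make progress
    std::size_t done = 0;
    while (cursor < lines.size() && done < budget) {
        const std::string& l = lines[cursor];
        if (!l.empty() && l[0] == 'G') ++gLines;
        ++cursor;
        ++done;
    }
    return done;
}
```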
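Since multithreading is about CPU efficiency, it must make use of the block model: each thread needs its own slice of buffered input to work on, whereas a stream can only be consumed one line at a time, in order. Unfortunately, that meant this program/library could not be multithreaded without an overhaul.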
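To make that dependency concrete, here is a hypothetical C++ sketch using `std::thread` (not the library's implementation). Once the lines are in a `std::vector`, each thread can be handed its own index range and start immediately, whereas a single `std::istream` has one read position that every thread would have to wait on. `parallelCountG` is an illustrative name, and counting `G` lines stands in for real parsing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Counts lines that start with 'G' using `threads` worker threads.
// This only works because the input is already buffered: each thread gets
// its own [begin, end) slice and never waits on the other threads.
std::size_t parallelCountG(const std::vector<std::string>& lines, unsigned threads) {
    assert(threads > 0);
    std::vector<std::size_t> partial(threads, 0);  // one slot per thread, so no locking
    std::vector<std::thread> pool;
    const std::size_t chunk = (lines.size() + threads - 1) / threads;  // ceiling division
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = std::min(lines.size(), t * chunk);
        const std::size_t end = std::min(lines.size(), begin + chunk);
        pool.emplace_back([&lines, &partial, t, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                if (!lines[i].empty() && lines[i][0] == 'G') ++partial[t];
        });
    }
    for (std::thread& th : pool) th.join();  // wait for every slice
    std::size_t total = 0;
    for (std::size_t p : partial) total += p;
    return total;
}
```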
I opted for what most people would consider an overly complicated rewrite. Rather than going about it with a "Git-R-Done" attitude à la Larry the Cable Guy, I took an idealistic approach, hoping to keep meeting the original goals of the program (and now library) while using multithreading to take advantage of modern computer systems. I split the original RS274 class into three new classes: RS274Tokenizer, RS274Worker, and RS274. Each one builds on the previous one. RS274Tokenizer is just like the old RS274 class: it is stream-based, and the only extra data it holds is the regex strings needed for the search-and-parse operation, so it is extremely lightweight. RS274Worker contains a tokenizer, plus a buffer for the data it works on and utility functions to set up and run. Finally, RS274 now contains only automation, turning loading, thread creation, data movement, parsing, reconstruction, and cleanup into a simple process.
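The three-layer split described above might look roughly like the following C++ sketch. `LineTokenizer`, `ChunkWorker`, and `ParallelParser` are hypothetical stand-ins for the roles of RS274Tokenizer, RS274Worker, and RS274; their members, the word regex, and the one-chunk-per-thread split are assumptions, not the library's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Layer 1 (hypothetical, role of RS274Tokenizer): stream-style and
// lightweight; its only data is the regex. Normalizes one line at a time.
class LineTokenizer {
public:
    std::string tokenize(const std::string& line) const {
        const std::string code = line.substr(0, line.find(';'));  // drop comment
        std::string out;
        for (std::sregex_iterator it(code.begin(), code.end(), word_), end; it != end; ++it) {
            if (!out.empty()) out += ' ';
            out += (*it)[1].str() + (*it)[2].str();  // letter + number, spaces removed
        }
        return out;
    }
private:
    std::regex word_{R"(([A-Z])\s*([-+]?[0-9]*\.?[0-9]+))"};  // assumed word syntax
};

// Layer 2 (hypothetical, role of RS274Worker): a tokenizer plus its own
// input and output buffers, so it can run independently on a thread.
class ChunkWorker {
public:
    explicit ChunkWorker(std::vector<std::string> lines) : input_(std::move(lines)) {}
    void run() {
        output_.clear();
        for (const std::string& l : input_) output_.push_back(tok_.tokenize(l));
    }
    const std::vector<std::string>& output() const { return output_; }
private:
    LineTokenizer tok_;
    std::vector<std::string> input_;
    std::vector<std::string> output_;
};

// Layer 3 (hypothetical, role of RS274): automation only. Split the buffered
// lines, spawn one thread per worker, join, then reassemble in order.
class ParallelParser {
public:
    static std::vector<std::string> parse(const std::vector<std::string>& lines, unsigned threads) {
        assert(threads > 0);
        const std::size_t chunk = (lines.size() + threads - 1) / threads;
        std::vector<ChunkWorker> workers;
        for (std::size_t begin = 0; begin < lines.size(); begin += chunk) {
            const std::size_t end = std::min(lines.size(), begin + chunk);
            workers.emplace_back(std::vector<std::string>(
                lines.begin() + static_cast<std::ptrdiff_t>(begin),
                lines.begin() + static_cast<std::ptrdiff_t>(end)));
        }
        // All workers exist before any thread starts, so `&w` stays valid.
        std::vector<std::thread> pool;
        for (ChunkWorker& w : workers) pool.emplace_back(&ChunkWorker::run, &w);
        for (std::thread& t : pool) t.join();
        std::vector<std::string> result;  // reconstruction: chunk order == line order
        for (const ChunkWorker& w : workers)
            result.insert(result.end(), w.output().begin(), w.output().end());
        return result;
    }
};
```

Giving every worker its own tokenizer (and therefore its own `std::regex`) means the threads share no mutable state, and because the chunks are contiguous and joined in order, reconstruction is plain concatenation.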
The functionality remains the same, and the per-thread parsing speed is unchanged as far as I can tell: about 125,000 lines of G-code in 33 seconds on a 3.1 GHz Sandy Bridge processor. The difference is that the program can now saturate the processor, using all available threads to work in parallel. Watching it work on a 2.6 GHz 8-core Ivy Bridge i7 is amazing.
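A back-of-the-envelope check using the figures above, assuming the 125,000-lines-in-33-seconds rate also applies to the large file and that eight threads scale perfectly (real runs will fall somewhat short, since loading and reconstruction remain serial):

$$
r = \frac{125\,000\ \text{lines}}{33\ \text{s}} \approx 3.8\times10^{3}\ \text{lines/s},
\qquad
t_1 = \frac{1.6\times10^{6}\ \text{lines}}{r} \approx 420\ \text{s}\ (\approx 7\ \text{min}),
\qquad
t_8 \ge \frac{t_1}{8} \approx 53\ \text{s}
$$

The single-threaded estimate of roughly seven minutes is in line with the program still showing no results after five.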
Most of the bugs are squashed at this point, I think. Now I'm just trying to make it "memcheck clean," with no errors or leaks reported by Valgrind's memcheck tool.