The lossy text compressor consists of a Perl script and an accompanying thesaurus. On startup, the script opens the thesaurus file and builds a hash mapping each word to the shortest synonym of that word. Some words are filtered out, e.g. abbreviations and 2-letter synonyms, because those don't meet quality standards.
After that, it's a simple substitution of every word in the supplied text. This should reduce the file size while keeping the text readable to most people. Further compression levels can be reached by running the compressor repeatedly - but note that text degradation ("generational loss") can leave the document unreadable.
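The build-a-hash-then-substitute step could be sketched like this. This is a Python translation of the idea, not the original Perl; the comma-separated thesaurus layout, the helper names, and the exact filtering rules are assumptions for illustration:

```python
import re

def build_shortest_map(thesaurus_lines):
    """Map each root word to its shortest acceptable synonym.

    Assumes one entry per line: root,synonym1,synonym2,...
    """
    shortest = {}
    for line in thesaurus_lines:
        words = line.strip().split(",")
        root, syns = words[0], words[1:]
        # Filter out low-quality candidates, as described above:
        # abbreviations (anything with a ".") and 2-letter synonyms.
        candidates = [s for s in syns if "." not in s and len(s) >= 3]
        if candidates:
            best = min(candidates, key=len)
            if len(best) < len(root):  # only keep substitutions that shrink
                shortest[root] = best
    return shortest

def compress(text, mapping):
    # Substitute every word; punctuation and unknown words pass through.
    return re.sub(r"[A-Za-z]+",
                  lambda m: mapping.get(m.group(0).lower(), m.group(0)),
                  text)

thesaurus = ["utilize,use,employ,apply", "automobile,car,motorcar,vehicle"]
m = build_shortest_map(thesaurus)
print(compress("utilize the automobile", m))  # -> "use the car"
```

Running it twice on the output would then compound the substitutions, which is where the generational loss comes from.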
The thesaurus format used is that of the Moby thesaurus, found here: http://icon.shef.ac.uk/Moby/mthes.html
A companion decompressor script works similarly but uses the longest synonym in place of the shortest.
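The decompressor's inverse step is the same construction with `min` swapped for `max`; a minimal sketch, assuming the same comma-separated thesaurus layout as above:

```python
def build_longest_map(thesaurus_lines):
    """Map each root word to its LONGEST synonym (decompressor variant)."""
    longest = {}
    for line in thesaurus_lines:
        words = line.strip().split(",")
        root, syns = words[0], words[1:]
        if syns:
            best = max(syns, key=len)
            if len(best) > len(root):  # only keep substitutions that grow
                longest[root] = best
    return longest

print(build_longest_map(["car,automobile,motorcar"]))  # {'car': 'automobile'}
```

Of course, since many words share synonyms, neither direction is an exact inverse of the other, which is what makes the compression lossy.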
I made a very naive awk script to readably compress huge log files. Not funny, but usable:
https://github.com/tibolpol/logshort