Is there a memory-efficient way to compress/average/combine time series data in a non-linear way?
Simon Merrett wrote 04/15/2020 at 15:51 • 0 points
A mouthful of a .stack title - sorry about that.
I'm recording temperature and I'd like to show the most recent readings in reasonably high resolution but also be able to see readings taken a long time ago in much lower resolution, for trend spotting etc. Perhaps a semi-log graph would suit this.
Does anyone have a good way to calculate the plot values while storing as few variables in an array as possible? The principle being recent readings are shown with high time resolution and as the age of a reading increases, the resolution along the time axis decreases (just like a log scale, but it doesn't have to be a log).
For example, the first pixel would show the current second's reading (t0), the second pixel would show the mean value over the last 5 seconds (t1-t5), the third pixel would show the mean reading of the previous 10 seconds (t6 - t15). The problem I see is that once I calculate means for longer periods, the type of scaling from say 1 second to 12 hours over 64 pixels/points means there isn't a neat way to shunt values into the next column over. So it looks like I need to store the underlying data at a higher resolution (thus more memory) than I would be presenting it at.
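To make the scaling concrete, here is a rough sketch (C++; the 64 pixels, 12 hour span and constant ratio are purely illustrative, not a requirement) of the kind of geometric bucket widths I mean:

```cpp
// Illustrative only: 64 buckets whose widths grow by a constant ratio r,
// so the newest bucket covers 1 second and all buckets together cover 12 hours.
// r is found by bisection on the geometric series sum.
#include <cstdio>

int main() {
    const int pixels = 64;
    const double total_seconds = 12.0 * 3600.0; // 12 hours

    // Sum of the geometric series 1 + r + r^2 + ... + r^(pixels-1)
    auto span = [&](double r) {
        double s = 0.0, w = 1.0;
        for (int i = 0; i < pixels; ++i) { s += w; w *= r; }
        return s;
    };

    // Bisection for the ratio that makes the buckets cover 12 hours
    double lo = 1.0, hi = 2.0;
    for (int i = 0; i < 60; ++i) {
        double mid = 0.5 * (lo + hi);
        if (span(mid) < total_seconds) lo = mid; else hi = mid;
    }
    double r = 0.5 * (lo + hi);

    printf("ratio r = %.4f\n", r);
    double w = 1.0;
    for (int i = 0; i < pixels; ++i) {
        printf("pixel %2d covers ~%.1f s\n", i, w);
        w *= r;
    }
    return 0;
}
```

The awkward part is that those bucket widths are not integer multiples of each other, which is exactly why values don't shunt neatly from one column to the next.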
There must be a time series compression approach that works for this. It could use bit twiddling, it could use a similar approach to an R-2R ladder DAC. But has anyone got any tips or links to similar problems and solution approaches?
Thank you.
Discussions
Hi @Simon Merrett , this is an interesting topic and could open up a research field in itself. But if I try to stay pragmatic and think about the existing solutions that tackle these issues, this feels like the same idea that sparked the creation of the so-called "time series database". Yes, your use case is specific to time series, and yes, an SQL, graph or NoSQL database would be a waste of time.
Time series databases try to be efficient, but they might not be as efficient as something you program yourself in C++ for a very specific use case such as logarithmic temporal resolution or something more exotic.
Yet, I must admit, I was in the same situation as you are and I developed my own C++ server app for sensor data using websockets and the Poco networking library. It took me quite some time to dare to read about InfluxDB and its usage.
I'm not saying that InfluxDB is the best solution, but I do believe that a time series database is, because it abstracts the task from your communication protocol and provides you with usable tools.
Such time series databases have so-called kernels that run on the server; in your query you describe which sort of data you want at which resolution, and you can create rules to store data at different resolutions.
Here is an example of a simple Python script that sends data to InfluxDB; the result can be viewed with Grafana:
https://github.com/HomeSmartMesh/raspi/blob/master/py/influx/influx_client.py
But I understand that this does not answer your question about logarithmic storage of data. In practice, I have not seen examples showing time-distorted data, as it might also be harder for users to comprehend.
Still, I suggest looking at texture compression in graphics and the mip-mapping concept, which is in itself a multi-resolution format; a similar but one-dimensional format could be what you're looking for.
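I'm not claiming any graphics library does it exactly this way, but a one-dimensional version of the mip-map idea could be a sketch like this (the names, and the factor of two between levels, are my own choices):

```cpp
// Sketch of a 1-D "mipmap" for a time series: level 0 holds raw samples,
// and each higher level stores the pairwise means of the level below,
// so level k has 1/2^k of the temporal resolution.
#include <vector>

struct SeriesMipmap {
    std::vector<std::vector<float>> levels; // levels[0] = raw samples

    explicit SeriesMipmap(int numLevels) : levels(numLevels) {} // numLevels >= 1

    void push(float sample) {
        levels[0].push_back(sample);
        // Propagate upwards: every time a level gains two samples that have not
        // been consumed yet, their mean becomes one new sample on the level above.
        for (std::size_t k = 0; k + 1 < levels.size(); ++k) {
            std::size_t have = levels[k].size();
            std::size_t consumed = levels[k + 1].size() * 2;
            if (have - consumed < 2) break;
            float mean = 0.5f * (levels[k][consumed] + levels[k][consumed + 1]);
            levels[k + 1].push_back(mean);
        }
    }
};
```

On a microcontroller you would cap each level to a small ring buffer instead of letting level 0 grow, but it shows the multi-resolution structure.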
@Wassim wow, great stuff - thank you. I will definitely check those out. I think I have found a way to approximate log10 using the round robin ring buffers idea that @Krzysztof pointed me to. Mip-mapping is a new one to me and looks very interesting. And I will definitely take into account the challenge of a UI that needs to help users comprehend the time distortion aspect.
For anyone else reading wanting to know my thoughts on the ring buffers, I'll reply with a link as soon as I'm able to convey it.
If you want to go the time series database way (don't do it for fewer than 10 sensors), I suggest TimescaleDB:
https://blog.timescale.com/blog/what-the-heck-is-time-series-data-and-why-do-i-need-a-time-series-database-dcf3b1b18563/
And covering your case:
https://blog.timescale.com/blog/how-to-proactively-manage-long-term-data-storage-with-downsampling/
I admit, some decisions have to be made to keep your design simple: whether you accept destroying data and only keep a lower resolution for older readings, whether you want to keep all the data but only access older readings at lower resolution, or whether you want to access all the data with a zoom effect, with the centre at full scale and log compression to the left and right.
I reacted with a long answer because I thought about this when I started my sensors database, but then did not follow it up.
Another complication that might hinder you is whether you want to split the database across a server/client architecture, or whether everything runs locally. The latter is obviously much simpler, and even trivial if your data can fit in RAM, which seems a reasonable assumption.
After giving your description another read, there could be a way where you don't need to store more data than you show, or only slightly more (proportional, and not more than double), if you chain the buffers with coefficients:
pix2 <= (pix0 + pix1)/2
pix1 <= pix0
pix0 <= new_val
And then, by using coefficients, you could keep state variables and assign each pixel only an interpolation of the state variables, depending on the pixel's position relative to them.
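As a rough sketch of that chained-buffer idea (the names, the per-stage pixel count and the factor of two between stages are arbitrary choices, not a worked-out design):

```cpp
// Sketch of a chained buffer: each stage keeps a few display values at its
// own resolution, and every time it has accumulated two samples it forwards
// their mean to the next (coarser) stage. Stage spans therefore double:
// 1 s, 2 s, 4 s, 8 s, ...
#include <cstddef>
#include <vector>

struct Stage {
    std::vector<float> pixels;  // most recent values at this resolution, newest first
    float acc = 0.0f;           // accumulator for the next coarser sample
    int count = 0;              // how many samples are in the accumulator
};

struct ChainedBuffer {
    std::vector<Stage> stages;
    std::size_t pixelsPerStage;

    ChainedBuffer(int numStages, std::size_t pixelsPerStage_)
        : stages(numStages), pixelsPerStage(pixelsPerStage_) {}

    void push(float value) {
        for (auto& s : stages) {
            // Newest value goes to the front of this stage's pixel strip.
            s.pixels.insert(s.pixels.begin(), value);
            if (s.pixels.size() > pixelsPerStage) s.pixels.pop_back();
            // Accumulate; once two samples have arrived, pass their mean on.
            s.acc += value;
            if (++s.count < 2) return;   // nothing to forward yet
            value = s.acc / 2.0f;        // becomes the next stage's input
            s.acc = 0.0f;
            s.count = 0;
        }
    }
};
```

Each stage only keeps a handful of pixels, so total storage grows with the number of stages rather than with the time span covered.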
The challenge is that it would be easy to split your window into zones with different resolutions, but that will create a stair-step effect, while the real curve is smooth.
I think a log buffer combined with interpolation of the displayed pixels could work, but be warned, I'm not a mathematician; there may be an existing concept that does exactly this.
An example (but without the divide-by-two pipeline, just the rolling-average one):
https://www.mathworks.com/help/dsp/ug/sliding-window-method-and-exponential-weighting-method.html
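The exponential weighting variant from that page needs only one value of state per series; a minimal sketch (the smoothing factor is arbitrary here):

```cpp
// Minimal exponentially weighted moving average: one float of state per
// series, with older samples decaying geometrically.
struct Ewma {
    float value = 0.0f;
    bool primed = false;
    float alpha = 0.1f;  // higher = reacts faster, forgets faster

    void update(float sample) {
        if (!primed) { value = sample; primed = true; return; }
        value += alpha * (sample - value);
    }
};
```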
"Logarithmic timeline" is another keyword popular in history of the earth and stock analysis
size - how many values (pixels on the graph) you want to store in the buffer
window (or step) - how long in time one pixel encompasses
size*window - how long in time the whole buffer covers
so, you want to create buffers:
A - size:60, window 1 (last minute, pixel every second)
B - size:60, window 60 (last hour, pixel every minute)
C - size:24, window 3600 (last day, pixel every hour)
D - size:30, window 86400 (last month, pixel every day)
Those buffers can be longer, they may overlap if you need, they are independent.
Then you insert your current value into each of those buffers, but each one updates its window (or current sample) independently. For more, please read https://oss.oetiker.ch/rrdtool/doc/rrdupdate.en.html and https://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html , I don't know any more than this, I've only seen this tool used and have knowledge of the general concepts.
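A minimal sketch of one such buffer, assuming one raw sample per second and the mean as the consolidation function (the names are mine, not RRDtool's):

```cpp
// Each buffer keeps `size` consolidated values (one per pixel) and averages
// `window` raw samples into each of them, overwriting the oldest pixel.
#include <cstddef>
#include <vector>

class WindowedRing {
public:
    WindowedRing(std::size_t size, unsigned window)
        : values_(size, 0.0f), window_(window) {}

    void insert(float sample) {
        acc_ += sample;
        if (++count_ < window_) return;        // window not full yet
        values_[head_] = acc_ / window_;       // consolidate: store the mean
        head_ = (head_ + 1) % values_.size();  // advance the ring
        acc_ = 0.0f;
        count_ = 0;
    }

    const std::vector<float>& values() const { return values_; }

private:
    std::vector<float> values_;
    unsigned window_;
    std::size_t head_ = 0;
    float acc_ = 0.0f;
    unsigned count_ = 0;
};

// The four buffers from above, each fed with every raw sample:
// WindowedRing minute(60, 1), hour(60, 60), day(24, 3600), month(30, 86400);
// on each new reading r: minute.insert(r); hour.insert(r); day.insert(r); month.insert(r);
```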
Thanks so much - I'll go look at those links.
https://en.wikipedia.org/wiki/RRDtool - that database is used in Munin. It's a series of circular buffers which are averaged (or consolidated with other functions), and each buffer consolidates data at a different frequency. So you can have hourly, daily, monthly and yearly buffers, each consolidating data into packets with different timesteps; for example, the yearly buffer averages all data for a whole day and stores only the last 365 days.
@Krzysztof thank you - that's definitely an implementation I could borrow from. If I'm correct, to do e.g. log10 you'd need a circular buffer of size = 10 for each order of magnitude you plan to represent, and then just map the number of pixels to some representative width - i.e. 1 to 2 takes up more screen space than 8 to 9. But within 10 to 20 you just get the same value across all pixels, rather than a compressed plot of the previous 10 hours, without yet another circular buffer?
No, the size of a buffer is the number of pixels. If you need log10, you make buffers with bigger windows (each next one with a 10x bigger window). When you insert data, you insert it into each buffer, but each one accumulates and averages that data with its own window size. One window = one pixel.
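So for a log10-style display it would look something like this, reusing the WindowedRing sketch above (sizes and windows are only illustrative):

```cpp
// Every buffer has size = pixel count; each next buffer's window is 10x longer.
WindowedRing fast(60, 1);     // 60 pixels, 1 s per pixel   -> last minute
WindowedRing medium(60, 10);  // 60 pixels, 10 s per pixel  -> last 10 minutes
WindowedRing slow(60, 100);   // 60 pixels, 100 s per pixel -> last ~1.7 hours

// Every raw sample goes into all of them:
// fast.insert(x); medium.insert(x); slow.insert(x);
```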
@Krzysztof thanks for the clarification. I'm a bit confused though. If I have a 60 pixel wide plot, I make a buffer A with size = 60. But how do I make windows of say 1, 10, 100 on that buffer? 1 and 10 are fine but 100 breaks it. Or do I need to cascade them and have buffer B with 0-9 (size 10), buffer C with 10-99 (size 10), buffer D with 100-999 (size 10), buffer E with 1000-9999 (size 10). Then buffers B-E feed buffer A, interpolated where required?