Techniques Review
We are going to consider only time series analysis in this section. There are reviews of "complexity measures" (Feldman and Crutchfield 1998; Daw et al. 2002). Kuusela et al. (2002) compare the results of ten different complexity measures on heart rate and blood pressure time series.
- Information theory estimates of complexity
In general the entropy of a discrete probability distribution is given by
H = sum_i( -p_i * log(p_i) )
where i indicates one of the discrete states and sum_i means to sum over all the discrete states. The entropy is larger if each discrete state has about the same probability of occurrence. If you use a base-2 log, then the entropy represents the number of bits of information which can be sent by a signal consisting of one state. As an example, if you have 4 possible states of a signal, all with equal probability, then the entropy is H = -4*(1/4*log2(1/4)) = -4*(1/4*(-2)) = 2 bits. So sending one of the states (perhaps one of the letters A-D) would transmit 2 bits of information. Sleigh et al. compare five measures of entropy used to classify EEG.
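As a quick illustration of the formula above, here is a minimal Python sketch (the function name and the use of NumPy are our own choices, not taken from any of the cited papers):

```python
import numpy as np

def shannon_entropy(probs, base=2):
    """Entropy H = -sum_i p_i * log(p_i); in bits when base=2."""
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]                 # 0 * log(0) is taken as 0
    return -np.sum(probs * np.log(probs)) / np.log(base)

# Four equally likely states carry 2 bits per symbol, as in the example above.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))   # -> 2.0
```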
- Approximate Entropy
Pincus (1991) introduced Approximate Entropy as a complexity measure. Given groups of N points in a series, the approximate entropy is related to the probability that two sequences which are similar for N points remain similar at the next point. He used this technique to characterize three chaotic systems.
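The calculation runs roughly as follows; this is a sketch of the standard ApEn recipe, with the conventional parameters m (run length) and r (similarity tolerance), which are not specified in the text above:

```python
import numpy as np

def approximate_entropy(x, m=2, r=0.2):
    """Approximate entropy ApEn(m, r): how often runs of m similar points
    stay similar at point m+1. r is a fraction of the series SD."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    tol = r * x.std()

    def phi(length):
        # All overlapping templates of the given length.
        templates = np.array([x[i:i + length] for i in range(n - length + 1)])
        # Chebyshev distance between every pair of templates (self-matches included).
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        c = np.mean(dist <= tol, axis=1)   # fraction of templates close to each one
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)
```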
- EEG
Rezek and Roberts (1998) compare Fourier entropy (see below) and approximate entropy and conclude that they both can distinguish anesthesia depth (as judged by agent concentration). Deeper anesthesia means lower complexity. Approximate entropy may have been slightly more sensitive to changes than Fourier entropy. Bhattacharya (2000) used approximate entropy to characterize the EEG of pathological groups compared with healthy groups. The degree of linear complexity was significantly reduced in the seizure group for most of the electrodes, whereas a distinct discrimination between the manic and healthy groups based on these nonlinear measures was not evident.
- Respiration
Burioka et al. (2003) used approximate entropy to measure the complexity of EEG and respiratory motion during different stages of sleep. They found that the entropy of the two signals was related and both decreased during deep sleep.
- Sample entropy
A modification of the Approximate Entropy introduced by Richman and Moorman (2000). They claim that approximate entropy as defined by Pincus is biased by including self-matches. They also simplify the calculation somewhat. There is a MATLAB function available to compute the sample entropy. Authors are DK Lake (dlake@virginia.edu), JR Moorman and Cao Hanqing.
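For comparison with the ApEn sketch above, a minimal Python version of the sample entropy calculation (again a sketch under the usual conventions; self-matches are excluded, which is the point of the modification):

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy SampEn(m, r) = -ln(A/B), where B counts pairs of
    length-m templates within tolerance r (Chebyshev distance, self-matches
    excluded) and A counts pairs that remain within r at length m+1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    tol = r * x.std()

    def count_pairs(length):
        # Use the same N - m template start points for both lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Distance from template i to every later template (no self-matches).
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.sum(dist <= tol)
        return count

    b = count_pairs(m)
    a = count_pairs(m + 1)
    return -np.log(a / b)
```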
- Neonatal heart
Lake et al. (2002) use sample entropy to predict neonatal sepsis from heart rate data. They suggest ways to evaluate the optimal settings for the parameters (length of sequence, error tolerance) used in sample entropy. They found that sepsis could be predicted by the sample entropy, but that much of the difference was related to spike-like outliers in the data.
- Fourier entropy
To compute the Fourier entropy, a power spectral density of a time series (or portion thereof) is computed. The PSD is normalized to produce a probability-like distribution, and the entropy is calculated as shown above. Fourier entropy seems to be the baseline against which other methods are shown to be better, but quite often it is the easiest to compute if the data sets are large.
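A minimal sketch of that recipe, assuming a Welch estimate of the PSD (the cited papers do not specify which estimator they used):

```python
import numpy as np
from scipy.signal import welch

def fourier_entropy(x, fs=1.0):
    """Spectral (Fourier) entropy: Shannon entropy of the normalized PSD."""
    freqs, psd = welch(x, fs=fs)       # power spectral density estimate
    p = psd / psd.sum()                # normalize to a probability-like distribution
    p = p[p > 0]
    return -np.sum(p * np.log2(p))     # bits
```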
- Wavelet entropy
Rosso et al. (2003) use wavelet transforms (as well as other measures) to characterize seizure states from EEG recordings. Jones et al. (2002) used a 2D wavelet transform of 2D recordings of head surface potentials. The complexity measure was related to the number of wavelet components needed to fit the signal.
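One common form of wavelet entropy (an assumption on our part, following the relative-wavelet-energy definition, not necessarily exactly what either cited paper computed) applies the Shannon formula to the fraction of signal energy in each decomposition level:

```python
import numpy as np
import pywt

def wavelet_entropy(x, wavelet="db4", level=None):
    """Shannon entropy of the relative wavelet energy per decomposition level."""
    coeffs = pywt.wavedec(x, wavelet, level=level)   # multilevel DWT (max depth if level is None)
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    p = energies / energies.sum()                    # relative energy per level
    p = p[p > 0]
    return -np.sum(p * np.log2(p))
```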
- Renyi Entropy (Gonzalez et al. 2000)
This scheme computes the...
The problems with these techniques likely come from attempting to determine complexity from data that is a mixture of signals. For this to work, you need to first separate the data into
component signals, then identify and measure complexity aspects of each separately.
For clarity of explanation, consider a simplistic view of EKG as a wavelet (the beat) periodically
added to white noise. Define "span" as the difference between the maximum and minimum value in the data.
The entropy of the data is the base-2 log of the span, which is the average number of bits needed to
store each data point. (The number of bits needed to store the data in exponential form is twice the number of bits of the average (expressed as a number), plus twice the number of bits of the span (expressed as a number), plus twice the encoding kernel length (expressed as a number), plus the log of the span times the number of samples. In the limit as the number of samples approaches infinity, the bits from the constant terms become negligible in comparison to the number of data bits, so the total approaches the number of bits in the exponential encoding: the log of the span times the number of samples. The average is that number divided by the number of samples, which is the log of the span.)
If you write an algorithm that subtracts the wavelet at points in the data stream and inserts a
token which has the value of (reduced span plus 1) as an indicator of "a heartbeat starts here", you decrease the span by a lot and increase it by 1 to represent the token. You also add tokens to the data as if they are extra samples.
The new encoding has extra constant bits added at the beginning (the wavelet-subtracting algorithm itself), and this overhead does not depend on the number of samples, so it will be negligible as the number of samples goes to infinity.
If the new encoding uses fewer bits than the original encoding, then the number of bits saved tells you how likely it is that the original data was actually composed of wavelets plus noise. The probability that the original signal was *not* so composed is 1/2^(bits saved), so if the algorithm saves 6 bits the probability that this occurred by random chance is 1-in-64, and so on. It's not uncommon to save several tens of bits, making the new encoding astronomically likely to be the correct encoding.
Suppose the example data stream is 100 samples per second, and an AGC circuit automatically
amplifies the signal to 10 bits of span (supposing, for example, a 12-bit A/D converter with feedback to amplify the beat to span 10 bits of signal). The base data storage estimate is then 10
bits per sample for 100 samples per second, or 1000 bits for each second of data.
Now suppose an algorithm subtracts beat wavelets from the data and inserts an extra data point whose value is "reduced span +1" where it finds a beat. If this reduces the span of the data from 1024 (10 bits) to 1/4 of that value, the new span is 256 and the data can be stored as 8 bits per sample, and every 100 samples (once a second-ish) an extra token is added to indicate the beat. In this case the data storage is slightly more than 8 bits per sample (the reduction to 1/4 of the signal, plus 1 to indicate the token) times 100 samples, plus just over 8 bits to indicate the token. (8.0056 * 100 samples + 8.0056 ⇒ about 808 bits for 1 second). The algorithm saves almost 200 bits of storage per second of data, which is a very good indicator that the signal is real.
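As a back-of-the-envelope check of that arithmetic (the numbers come from the example above; the variable names are only for illustration):

```python
import numpy as np

fs = 100                        # samples per second
baseline_bits = 10 * fs         # 10 bits of span -> 1000 bits per second of raw storage

reduced_span = 1024 // 4        # span after the beat wavelet is subtracted
symbols = reduced_span + 1      # residual values plus the "a heartbeat starts here" token
bits_per_sample = np.log2(symbols)          # ~8.0056 bits, as in the text
beats_per_second = 1                        # one extra token sample per beat
encoded_bits = bits_per_sample * (fs + beats_per_second)   # ~808 bits per second

saved_bits = baseline_bits - encoded_bits   # ~192 bits saved per second of data
chance_probability = 2.0 ** -saved_bits     # odds the saving happened by chance
print(round(encoded_bits), round(saved_bits), chance_probability)
```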
Once you have separated the data into component signals (noise and wavelet heartbeat in the simple example), you can take complexity measurements of the individual signals separately.
To continue the example, if the remaining data contains no other signals, then the entropy of the
noise is the log of the span (Shannon entropy). You can measure differences in timing between
individual beat tokens and might discover that the "jitter" between heartbeats follows a normal distribution, which has an average and a variance. These numbers (average and variance) are
parameters of the entropy formula for a normal distribution, and are the results you want.
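For the jitter, the relevant formula is the differential entropy of a normal distribution, 0.5*log2(2*pi*e*variance). A small illustrative sketch (the beat times here are simulated, since no real token stream is given):

```python
import numpy as np

def gaussian_entropy_bits(variance):
    """Differential entropy of a normal distribution: 0.5 * log2(2*pi*e*variance)."""
    return 0.5 * np.log2(2 * np.pi * np.e * variance)

# Simulated beat times: roughly one beat per second with a little timing jitter.
beat_times = np.cumsum(1.0 + 0.05 * np.random.randn(200))
jitter = np.diff(beat_times)                    # beat-to-beat intervals
mean, variance = jitter.mean(), jitter.var()    # the parameters you want
print(mean, variance, gaussian_entropy_bits(variance))
```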
The complexity of your original data is therefore a mash-up of complexities derived from the
component signals in different dimensions of measurement. In the example given, the noise has
complexity in value (amplitude), while the jitter has complexity in time.
For your application, you want to first separate the data into component signals, then determine
the complexity type and parameters for each.