Detect

Drift

indsl.detect.drift(data: Series, long_interval: str = '3d', short_interval: str = '4h', std_threshold: float = 3, detect: str = 'both')

Drift

Detects data drift (deviation) by comparing two rolling averages of the signal: one computed over a short interval and one over a long interval. The deviation between the short-term and long-term averages is considered significant if it exceeds a given threshold multiplied by the rolling standard deviation of the long-term average.
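The comparison above can be sketched with pandas time-based rolling windows. This is a minimal illustration, not indsl's implementation. The helper name drift_sketch is hypothetical. The sketch assumes that the "rolling standard deviation" is the standard deviation of the raw data over the long window, and that the input has a DatetimeIndex.

```python
import pandas as pd


def drift_sketch(
    data: pd.Series,
    long_interval: str = "3d",
    short_interval: str = "4h",
    std_threshold: float = 3.0,
    detect: str = "both",
) -> pd.Series:
    """Hypothetical helper: 1 where the short-term mean drifts from the long-term mean."""
    # Time-based windows ("3d", "4h") require a DatetimeIndex.
    long_mean = data.rolling(long_interval).mean()
    short_mean = data.rolling(short_interval).mean()
    # Assumption: spread = rolling std of the raw data over the long window.
    long_std = data.rolling(long_interval).std()

    deviation = short_mean - long_mean
    limit = std_threshold * long_std
    if detect == "increase":
        flagged = deviation > limit
    elif detect == "decrease":
        flagged = -deviation > limit
    elif detect == "both":
        flagged = deviation.abs() > limit
    else:
        raise ValueError("detect must be 'increase', 'decrease' or 'both'")
    # Comparisons with NaN (e.g. the std of a single point) are False, i.e. 0.
    return flagged.astype(int)


# Example: an hourly signal alternating 0/1 for 7 days, then jumping to 10.
index = pd.date_range("2024-01-01", periods=192, freq="60min")
series = pd.Series([i % 2 for i in range(168)] + [10] * 24, index=index, dtype=float)
result = drift_sketch(series)
```

In this example, the alternating segment stays at 0 because its short-term and long-term means coincide. A few hours after the jump, the short-term mean exceeds the long-term mean by more than three long-window standard deviations.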

Parameters
  • data – Time series.

  • long_interval – Long length. Length of the long-term time interval.

  • short_interval – Short length. Length of the short-term time interval.

  • std_threshold – Threshold. Determines how large a change must be for the signal to be considered drifting. The threshold is multiplied by the long-term rolling standard deviation to take the recent condition of the signal into account.

  • detect – Type. Determines whether the model detects significant decreases, increases or both. Options are: “decrease”, “increase” or “both”. Defaults to “both”.

Returns

Boolean time series. Drift = 1, No drift = 0.

Return type

pandas.Series

Oscillations

indsl.detect.oscillation_detector.oscillation_detector(data: Series, order: int = 4, threshold: float = 0.2) Dict

Oscillations

Identifies whether a signal contains one or more oscillatory components. Based on the paper by Sharma et al. [1].

The method uses Linear Predictive Coding (LPC) and is implemented as a three-step process:

  1. Estimate the LPC coefficients of the prediction polynomial. These are used to compute a fit to the data.

  2. Estimate the roots of the prediction polynomial defined by the LPC coefficients.

  3. Estimate the distance of each root to the unit circle in the complex plane.

If any root lies close to the unit circle (distance less than the threshold, 0.2 by default), the signal is considered to have an oscillatory component.
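The three steps can be sketched with NumPy. This is a minimal illustration that rests on several assumptions. The LPC coefficients are estimated here with the autocorrelation (Yule-Walker) method, which may differ from the estimator indsl uses. The function names lpc_coefficients and oscillation_sketch are hypothetical. The PSD, fit and peak outputs are omitted.

```python
import numpy as np


def lpc_coefficients(x, order: int) -> np.ndarray:
    """Prediction polynomial [1, -a1, ..., -a_p] via the autocorrelation method."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    # Biased autocorrelation estimates r[0..order].
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Symmetric Toeplitz matrix of the Yule-Walker equations.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    return np.concatenate(([1.0], -a))


def oscillation_sketch(x, order: int = 4, threshold: float = 0.2) -> dict:
    """Hypothetical helper: steps 1-3 plus the distance test."""
    poly = lpc_coefficients(x, order)           # step 1
    roots = np.roots(poly)                      # step 2
    distances = np.abs(1.0 - np.abs(roots))     # step 3: distance to |z| = 1
    return {
        "roots": roots,
        "distances": distances,
        "oscillations": bool(np.any(distances < threshold)),
    }


rng = np.random.default_rng(0)
n = np.arange(1000)
sine = np.sin(2 * np.pi * 0.05 * n) + 0.01 * rng.standard_normal(n.size)
noise = rng.standard_normal(5000)
sine_result = oscillation_sketch(sine)
noise_result = oscillation_sketch(noise, order=2)
```

For a nearly pure sinusoid, two roots sit almost on the unit circle. For white noise, the fitted coefficients are close to zero, so the roots stay near the origin.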

Parameters
  • data – Time series.

  • order – Polynomial order. Order of the prediction polynomial. Defaults to 4.

  • threshold – Threshold. Maximum distance of a root to the unit circle for which the signal is considered to have an oscillatory component. Defaults to 0.2.

Returns

Dictionary with the following keys and values:

{
    "roots": np.ndarray,
    "distances": np.ndarray,
    "PSD": dict:  {"f": np.ndarray, "Pxx": np.ndarray},
    "fit": dict: {"time": np.ndarray, "data": np.ndarray},
    "oscillations": bool,
    "peaks": dict: {"f": np.ndarray, "amplitude": np.ndarray,}
}

Return dictionary elaboration:

  • roots -> roots of the LPC prediction polynomial

  • distances -> distance of each root to the unit circle

  • PSD -> power spectral density: frequency vector (f) and power vector (Pxx)

  • fit -> data fitted using the LPC prediction polynomial

  • oscillations -> True (1) if an oscillatory component is detected, False (0) otherwise

  • peaks -> peak frequencies and corresponding amplitudes in the original data

Return type

dict

Warning

Large variations in sampling time may degrade the performance of the algorithm. The algorithm works best on time series with a uniform sampling frequency. For non-uniformly sampled data, we suggest resampling the series and filling the missing values first.

Raises

RuntimeError – Length of interpolated data does not match predicted data.

References

[1] Sharma et al. “Automatic signal detection and quantification in process control loops using linear predictive coding.” Engineering Science and Technology, an International Journal, 2020.

Change Point detector: ED-PELT

indsl.detect.cpd_ed_pelt(data: Series, min_distance: int = 1) Series

Change Point Detection

Detect change points in a time series. The time series is split into “statistically homogeneous” segments using the ED Pelt change point detection algorithm, subject to the minimum segment length given by min_distance.
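indsl's ED-PELT uses a nonparametric cost based on the empirical distribution, which is too long to reproduce here. The sketch below substitutes plain PELT with a squared-error (mean-shift) cost. It shows how the search prunes candidate segment starts and how a minimum segment length (analogous to min_distance) constrains the result. The function pelt_mean_shift and its penalty argument are hypothetical and are not part of indsl.

```python
import numpy as np


def pelt_mean_shift(x, penalty: float, min_size: int = 1) -> list:
    """Hypothetical helper: PELT with a squared-error cost (not ED-PELT)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    csum = np.concatenate(([0.0], np.cumsum(x)))
    csum2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def cost(s: int, t: int) -> float:
        # Sum of squared deviations from the mean of x[s:t].
        total = csum[t] - csum[s]
        return (csum2[t] - csum2[s]) - total * total / (t - s)

    # best[t] = optimal penalised cost of x[:t]; starting at -penalty means
    # the total charges one penalty per change point.
    best = np.full(n + 1, np.inf)
    best[0] = -penalty
    last = np.zeros(n + 1, dtype=int)  # start of the final segment of x[:t]
    candidates = [0]
    for t in range(min_size, n + 1):
        ready = [s for s in candidates if t - s >= min_size]
        totals = [best[s] + cost(s, t) + penalty for s in ready]
        i = int(np.argmin(totals))
        best[t], last[t] = totals[i], ready[i]
        # Pruning: a start s that is already worse than best[t] before paying
        # the next penalty can never become optimal later.
        kept = [s for s, v in zip(ready, totals) if v - penalty <= best[t]]
        waiting = [s for s in candidates if t - s < min_size]
        candidates = kept + waiting + [t]

    change_points = []
    t = n
    while t > 0:
        t = int(last[t])
        if t > 0:
            change_points.append(t)
    return sorted(change_points)


step = [0.0] * 50 + [5.0] * 50
print(pelt_mean_shift(step, penalty=10.0))               # [50]
print(pelt_mean_shift(step, penalty=10.0, min_size=60))  # []
```

With min_size=60, no split can leave both segments at least 60 points long, so no change point is reported.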

Parameters
  • data – Time series.

  • min_distance – Minimum distance. Specifies the minimum point-wise distance for each segment that will be considered in the Change Point Detection algorithm.

Returns

Binary time series.

Return type

pandas.Series

Change Point detector: CUSUM

indsl.detect.cusum(data: Series, threshold: Optional[float] = None, drift: Optional[float] = None, detect: Literal['both', 'increase', 'decrease'] = 'both', predict_ending: bool = True, alpha: float = 0.05, return_series_type: Literal['cusum_binary_result', 'mean_data', 'positive_cumulative_sum', 'negative_cumulative_sum'] = 'cusum_binary_result') Series

Cumulative sum (CUSUM)

This technique calculates the cumulative sums of positive and negative changes ($g_t^+$ and $g_t^-$) in the data ($x$) and compares them to a threshold. When the threshold is exceeded, a change is detected ($t_{alarm}$) and the cumulative sum restarts from zero. The algorithm also uses a drift parameter for drift correction. This keeps it from reporting a change when no actual change has occurred, or when the signal only drifts slowly. For better results, remove extreme standalone outliers before applying this technique.
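The recursion described above can be sketched as follows. It is modeled on the two-sided CUSUM in the referenced detecta notebook. This is a hedged illustration with several simplifications: the function name cusum_alarms is hypothetical, the alpha smoothing step and the end-point prediction (predict_ending) are omitted, and threshold and drift must be passed explicitly.

```python
import numpy as np


def cusum_alarms(x, threshold: float, drift: float, detect: str = "both"):
    """Hypothetical helper: two-sided CUSUM on successive differences."""
    x = np.asarray(x, dtype=float)
    g_pos = np.zeros(x.size)  # g_t^+: cumulative sum of positive changes
    g_neg = np.zeros(x.size)  # g_t^-: cumulative sum of negative changes
    alarms = []               # t_alarm indices
    for t in range(1, x.size):
        step = x[t] - x[t - 1]
        # The drift term is subtracted at every step; sums are floored at zero.
        g_pos[t] = max(g_pos[t - 1] + step - drift, 0.0)
        g_neg[t] = max(g_neg[t - 1] - step - drift, 0.0)
        up = detect in ("both", "increase") and g_pos[t] > threshold
        down = detect in ("both", "decrease") and g_neg[t] > threshold
        if up or down:
            alarms.append(t)
            g_pos[t] = g_neg[t] = 0.0  # restart both sums after an alarm
    return alarms, g_pos, g_neg


rising = [0.0] * 10 + [5.0] * 10
falling = [5.0] * 10 + [0.0] * 10
print(cusum_alarms(rising, threshold=3.0, drift=0.5)[0])                       # [10]
print(cusum_alarms(falling, threshold=3.0, drift=0.5, detect="increase")[0])  # []
```

Raising drift makes each step contribute less to the sums, so small fluctuations die out before they reach the threshold.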

Typical procedure to use this function:

  1. Set return_series_type to “mean_data” to visualize the smoothed data. Leave the other parameters at their default values.

  2. Adjust the alpha parameter to get the desired smoothing of the data.

  3. Set return_series_type to “positive_cumulative_sum” or “negative_cumulative_sum” to visualize the cumulative sum of the positive or negative changes.

  4. Adjust threshold and drift until you obtain the desired number of change points.

  5. Set return_series_type to “cusum_binary_result” to visualize the detected changes.

Parameters
  • data – Time series.

  • threshold – Amplitude threshold. Cumulative changes are compared to this threshold. When it is exceeded, a change is detected and the cumulative sum restarts from 0. Defaults to None, in which case it is set to 5 * standard_deviation of the data.

  • drift – Drift term. Prevents changes from being detected in the absence of an actual change. If fewer false alarms are wanted, try increasing drift. Defaults to None, in which case it is set to (2 * data_standard_deviation - data_mean) / 2.

  • detect

    Type of changes to detect. Options are:

    • “both” for detecting both increasing and decreasing changes in the data (default)

    • “increase” for detecting increasing changes in the data

    • “decrease” for detecting decreasing changes in the data

  • predict_ending – Predict end point. Prolongs the change until the predicted end point. Defaults to True. If False, only single change points are detected.

  • alpha – Smoothing factor. Must satisfy 0 < alpha <= 1. Defaults to 0.05.

  • return_series_type

    Type of series to return. Defaults to “cusum_binary_result”. This option allows the user to visualize the intermediate steps of the algorithm. Options are:

    • “cusum_binary_result” returns the CUSUM result as a binary time series. Change detected = 1, No change detected = 0.

    • “mean_data” returns the smoothed data.

    • “positive_cumulative_sum” returns the positive cumulative sum.

    • “negative_cumulative_sum” returns the negative cumulative sum.

Returns

Time series of the type specified by the return_series_type parameter.

Return type

pd.Series

Raises
  • UserTypeError – If a time series with the wrong index is provided.

  • UserValueError – If an empty time series is passed into the function.

References

https://nbviewer.org/github/demotu/detecta/blob/master/docs/detect_cusum.ipynb

Steady State detector: change point

indsl.detect.ssd_cpd(data: Series, min_distance: int = 15, var_threshold: float = 2.0, slope_threshold: float = -3.0) Series

Steady State Detection (CPD)

Detect steady state periods in a time series based on a change point detection algorithm. The time series is split into “statistically homogeneous” segments using the ED Pelt change point detection algorithm. Each segment is then tested with regard to its normalized standard deviation and the slope of its line of best fit. These tests determine whether the segment is a steady or a transient region.

Parameters
  • data – Time series.

  • min_distance – Minimum distance. Specifies the minimum point-wise distance for each segment that will be considered in the Change Point Detection algorithm.

  • var_threshold – Variance threshold. If the normalized variance calculated for a given segment is greater than this threshold, the segment is labeled as transient (value = 0).

  • slope_threshold – Slope threshold. If the slope of a line fitted to the data of a given segment is greater than 10 to the power of this value (10^-3 with the default of -3.0), the segment is labeled as transient (value = 0).

Returns

Binary time series. Steady state = 1, Transient = 0.

Return type

pandas.Series

Steady State detector: variance filter

indsl.detect.ssid(data: Series, ratio_lim: float = 2.5, alpha1: float = 0.2, alpha2: float = 0.1, alpha3: float = 0.1)

Steady state (variance)

Steady state detector based on the ratio of two variances estimated from the same signal [2]. The algorithm first filters the data using the factor “Alpha 1”. It then calculates two variances (long and short term) based on the parameters “Alpha 2” and “Alpha 3”. The first variance is an exponentially weighted moving variance based on the difference between the data and the filtered average. The second is also an exponentially weighted moving “variance”, but based on the differences between sequential data points.

Larger alpha values mean that fewer data points are involved in the analysis. This reduces the time the identifier needs to detect a process change (average run length, ARL). However, it also increases the variability of the results, which broadens the distribution and confounds interpretation. Lower alpha values undesirably increase the average run length to detection. In exchange, they increase precision (minimizing Type-I and Type-II statistical errors) by reducing the variability of the distributions, which raises the signal-to-noise ratio when the process moves from a transient state (TS) to a steady state (SS).
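The two filtered variances can be sketched with the filter recursions commonly published for this ratio (R-statistic) identifier. This sketch carries several assumptions. The function name ssid_sketch is hypothetical. The (2 - alpha1) scaling of the ratio comes from the literature formulation, not from indsl's code. The warm-up handling is a simple guess. The output follows the convention used by ssid (transient = 1).

```python
import numpy as np


def ssid_sketch(x, ratio_lim=2.5, alpha1=0.2, alpha2=0.1, alpha3=0.1):
    """Hypothetical helper: variance-ratio steady state identifier (1 = transient)."""
    x = np.asarray(x, dtype=float)
    x_filt = x[0]     # exponentially filtered mean (Alpha 1)
    var_dev = 0.0     # EW variance of data minus previous filtered mean (Alpha 2)
    var_diff = 0.0    # EW "variance" of sequential differences (Alpha 3)
    flags = np.zeros(x.size, dtype=int)
    for i in range(1, x.size):
        var_dev = alpha2 * (x[i] - x_filt) ** 2 + (1 - alpha2) * var_dev
        x_filt = alpha1 * x[i] + (1 - alpha1) * x_filt
        var_diff = alpha3 * (x[i] - x[i - 1]) ** 2 + (1 - alpha3) * var_diff
        if var_diff > 0:
            # Assumed scaling: (2 - alpha1) makes the ratio about 1 for white noise.
            ratio = (2 - alpha1) * var_dev / var_diff
            flags[i] = int(ratio > ratio_lim)
    return flags


rng = np.random.default_rng(0)
steady = rng.standard_normal(2000)   # noise around a constant level
ramp = 0.5 * np.arange(200.0)        # steadily rising signal
steady_flags = ssid_sketch(steady)
ramp_flags = ssid_sketch(ramp)
```

On a ramp, the filtered mean lags the data, which inflates the first variance relative to the second. On stationary noise, both variances estimate the same quantity and the ratio stays near 1.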

Parameters
  • data – Time series.

  • ratio_lim – Threshold. Variance ratio threshold used to decide whether the signal is in steady state. A variance ratio greater than the threshold labels the state as transient.

  • alpha1 – Alpha 1. Filter factor for the mean. Value should be between 0 and 1. Recommended value is 0.2. Defaults to 0.2.

  • alpha2 – Alpha 2. Filter factor for variance 1. Value should be between 0 and 1. Recommended value is 0.1. Defaults to 0.1.

  • alpha3 – Alpha 3. Filter factor for variance 2. Value should be between 0 and 1. Recommended value is 0.1. Defaults to 0.1.

Returns

Binary time series. Steady state = 0, transient = 1.

Return type

pandas.Series

References

[2] Rhinehart, R. Russell (2013). Automated steady and transient state identification in noisy processes. Proceedings of the American Control Conference, 4477-4493. doi:10.1109/ACC.2013.6580530

Steady State detector: variable moving average

indsl.detect.vma(series: Series, window_length: int = 10) Series

Steady state (vma)

This moving average is designed to become flat (constant value) when the data within the lookup window do not vary significantly, so it can also be used as a steady state detector. The calculation is based on the variability of the signal in the lookup window.
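The flattening behavior can be illustrated with a volatility-weighted exponential average in the style of Chande's VIDYA. In this approach, the smoothing weight is scaled by the absolute Chande momentum oscillator over the window. This is a substitute technique chosen for illustration; indsl's exact VMA formula may differ. The name vma_sketch is hypothetical. When up and down moves in the window balance out, the weight drops to zero and the average stays flat.

```python
import numpy as np


def vma_sketch(x, window_length: int = 10) -> np.ndarray:
    """Hypothetical helper: VIDYA-style variable moving average."""
    x = np.asarray(x, dtype=float)
    alpha = 2.0 / (window_length + 1)
    diff = np.diff(x, prepend=x[0])      # diff[0] == 0
    up = np.clip(diff, 0.0, None)        # upward moves
    down = np.clip(-diff, 0.0, None)     # downward moves
    out = np.empty_like(x)
    out[0] = x[0]
    for i in range(1, x.size):
        lo = max(1, i - window_length + 1)
        su, sd = up[lo : i + 1].sum(), down[lo : i + 1].sum()
        # |CMO| in [0, 1]: 1 for a steady trend, 0 when moves cancel out.
        k = abs(su - sd) / (su + sd) if su + sd > 0 else 0.0
        out[i] = alpha * k * x[i] + (1 - alpha * k) * out[i - 1]
    return out


flat = vma_sketch(np.tile([0.0, 1.0], 15))   # oscillates without a trend
trend = vma_sketch(np.arange(30.0))          # steady upward trend
```

Once the window holds equally many up and down moves, each new value of flat equals the previous one, which is the "same value as the previous result" condition used for steady state.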

Parameters
  • series – Time series.

  • window_length – Lookup window. Window length in data points used to estimate the variability of the signal.

Returns

Moving average. If the result has the same value as the previous moving average result, the signal can be considered to be in steady state.

Return type

pandas.Series

Unchanged signal identification

indsl.detect.unchanged_signal_detector.unchanged_signal_detector(data: Series, duration=Timedelta('0 days 01:00:00'), min_nr_data_points: int = 3) Series

Unchanged signal detection

Detect periods of time when the data stay at a constant value for longer than a given time window (the duration threshold).
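Here is a minimal pandas sketch of this check. Consecutive equal values are grouped into runs, and a run is flagged when it spans more than duration and contains at least min_nr_data_points samples. The function name unchanged_sketch and this interpretation of min_nr_data_points are assumptions, not indsl's implementation.

```python
import pandas as pd


def unchanged_sketch(
    data: pd.Series,
    duration: pd.Timedelta = pd.Timedelta(hours=1),
    min_nr_data_points: int = 3,
) -> pd.Series:
    """Hypothetical helper: 1 inside runs of identical values lasting longer than duration."""
    # A new run starts wherever the value differs from the previous sample.
    run_id = (data != data.shift()).cumsum()
    flags = pd.Series(0, index=data.index)
    for _, run in data.groupby(run_id):
        span = run.index[-1] - run.index[0]
        # Assumption: the run must also contain enough samples to rule out gaps.
        if span > duration and len(run) >= min_nr_data_points:
            flags.loc[run.index] = 1
    return flags


index = pd.date_range("2024-01-01", periods=13, freq="10min")
signal = pd.Series([1, 2, 3] + [5] * 8 + [6, 7], index=index, dtype=float)
flags = unchanged_sketch(signal)
```

The run of eight 5s spans 70 minutes, so only those samples are flagged. With a two-hour duration, nothing is flagged.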

Parameters
  • data – Time series.

  • duration – Time window. Length of the time period to check for unchanged time series values. Defaults to 60 minutes.

  • min_nr_data_points – Data points. Minimum number of data points required, to avoid alerting on missing data. Defaults to 3.

Returns

Time series.

The returned time series is an indicator function that is 1 where the time series value has remained unchanged, 0 if it has changed.

Return type

pd.Series

Raises
  • UserTypeError – data is not a time series

  • UserValueError – data is empty