Detect

Drift

indsl.detect.drift(data: Series, long_interval: Timedelta = Timedelta('3 days 00:00:00'), short_interval: Timedelta = Timedelta('0 days 04:00:00'), std_threshold: float = 3, detect: Literal['decrease', 'increase', 'both'] = 'both') Series

Drift.

This function detects data drift (deviation) by comparing two rolling averages of the signal, one computed over a short interval and one over a long interval. The deviation between the short and long term averages is considered significant if it exceeds the given threshold multiplied by the rolling standard deviation of the long term average.
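The comparison described above can be illustrated with a minimal plain-Python sketch. This is not the indsl implementation: the helper names (trailing_window, drift_flags_sketch) are hypothetical, windows are assumed to be trailing and closed on the right, and the standard deviation is assumed to be the population standard deviation of the samples in the long window.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def trailing_window(times, values, i, interval):
    """Values with timestamps in the trailing window (times[i] - interval, times[i]]."""
    start = times[i] - interval
    return [v for t, v in zip(times[: i + 1], values[: i + 1]) if t > start]


def drift_flags_sketch(times, values, long_interval, short_interval,
                       std_threshold=3.0, detect="both"):
    """Hypothetical helper: 1 where the short average deviates significantly, else 0."""
    flags = []
    for i in range(len(values)):
        long_win = trailing_window(times, values, i, long_interval)
        short_win = trailing_window(times, values, i, short_interval)
        deviation = mean(short_win) - mean(long_win)
        # Assumption: the limit scales with the spread of the long window.
        limit = std_threshold * pstdev(long_win)
        increase = deviation > limit
        decrease = deviation < -limit
        if detect == "increase":
            flag = increase
        elif detect == "decrease":
            flag = decrease
        else:
            flag = increase or decrease
        flags.append(int(flag))
    return flags


# Hourly samples: flat at 10 for 100 hours, then a step up to 20.
start = datetime(2024, 1, 1)
times = [start + timedelta(hours=h) for h in range(120)]
values = [10.0] * 100 + [20.0] * 20

flags = drift_flags_sketch(times, values, timedelta(days=3), timedelta(hours=4))
only_decreases = drift_flags_sketch(times, values, timedelta(days=3),
                                    timedelta(hours=4), detect="decrease")
```

In this sketch the flag only rises a few samples after the step, once the short window is dominated by the new level, and it drops again as the long window absorbs the change.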

Parameters:
  • data – Time series.

  • long_interval – Long length. Length of long term time interval.

  • short_interval – Short length. Length of short term time interval.

  • std_threshold – Threshold. Parameter that determines if the signal has changed significantly enough to be considered drift. The threshold is multiplied by the long term rolling standard deviation to take into account the recent condition of the signal.

  • detect – Type. Parameter to determine if the model should detect significant decreases, increases or both. Options are: “decrease”, “increase”, or “both”. Defaults to “both”.

Returns:

Boolean time series.

Drift = 1, No drift = 0.

Return type:

pandas.Series

Oscillations

indsl.detect.oscillation_detector.oscillation_detector(data: Series, order: int = 4, threshold: float = 0.2) Series

Oscillations.

This function identifies if a signal contains one or more oscillatory components. It is based on the paper by Sharma et al. [1]. The method uses Linear Predictive Coding (LPC) and is implemented as a 3 step process:

  1. Estimate the LPC coefficients from the prediction polynomial. These are used to estimate a fit to the data.

  2. Estimate the roots of the LPC coefficients.

  3. Estimate the distance of each root to the unit circle in the complex plane.

If any root lies close to the unit circle (distance less than the threshold, 0.2 by default), the signal is considered to have an oscillatory component.
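The three steps can be sketched in plain Python for the simplest case. This is an illustration only, not the indsl implementation: it assumes a prediction polynomial of order 2 (instead of the default 4) so that the roots have a closed form, it fits the coefficients by ordinary least squares, and it takes the distance to the unit circle to be |1 - |z||. The function names are hypothetical.

```python
import cmath
import math
import random


def lpc_order2(x):
    """Step 1: least-squares fit of x[n] ~ a1*x[n-1] + a2*x[n-2]."""
    rows = range(2, len(x))
    r11 = sum(x[n - 1] * x[n - 1] for n in rows)
    r12 = sum(x[n - 1] * x[n - 2] for n in rows)
    r22 = sum(x[n - 2] * x[n - 2] for n in rows)
    b1 = sum(x[n] * x[n - 1] for n in rows)
    b2 = sum(x[n] * x[n - 2] for n in rows)
    det = r11 * r22 - r12 * r12
    if det == 0:
        raise ValueError("Normal equations are singular for this signal.")
    # Cramer's rule for the 2x2 normal equations.
    a1 = (b1 * r22 - b2 * r12) / det
    a2 = (r11 * b2 - r12 * b1) / det
    return a1, a2


def root_distances(a1, a2):
    """Steps 2 and 3: roots of z^2 - a1*z - a2 and their distance to the unit circle."""
    disc = cmath.sqrt(a1 * a1 + 4 * a2)
    roots = [(a1 + disc) / 2, (a1 - disc) / 2]
    return [abs(1 - abs(z)) for z in roots]


def has_oscillation_sketch(x, threshold=0.2):
    """Hypothetical helper: True if any root is closer to the unit circle than threshold."""
    return min(root_distances(*lpc_order2(x))) < threshold


sine = [math.sin(0.3 * n) for n in range(200)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]

sine_is_oscillating = has_oscillation_sketch(sine)
noise_is_oscillating = has_oscillation_sketch(noise)
```

A pure sine satisfies an exact order-2 recurrence, so its roots sit on the unit circle, while the roots fitted to white noise stay well inside it.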

Parameters:
  • data – Time series.

  • order – Polynomial order. Order of the prediction polynomial. Defaults to 4.

  • threshold – Threshold. Maximum distance of a root to the unit circle for which the signal is considered to have an oscillatory component. Defaults to 0.2.

Returns:

Oscillation region.

Regions where oscillations were detected. Oscillations detected = 1, no detection = 0.

Return type:

pd.Series

Warning

Large variations in sampling time may degrade the performance of the algorithm. The algorithm works best on time series with a uniform sampling frequency. If the data is non-uniformly sampled, you can use a resampling method to fill the missing data.

Raises:

RuntimeError – Length of interpolated data does not match predicted data.

References

Change Point detector: ED-PELT

indsl.detect.cpd_ed_pelt(data: Series, min_distance: int = 1) Series

Change Point Detection.

This function detects change points in a time series. The time series is split into “statistically homogeneous” segments using the ED Pelt change point detection algorithm while observing the minimum distance argument.

Parameters:
  • data – Time series.

  • min_distance – Minimum distance. Specifies the minimum point-wise distance for each segment that will be considered in the Change Point Detection algorithm.

Returns:

Time series. Binary time series.

Return type:

pandas.Series

Change Point detector: CUSUM

indsl.detect.cusum(data: Series, threshold: float | None = None, drift: float | None = None, detect: Literal['both', 'increase', 'decrease'] = 'both', predict_ending: bool = True, alpha: float = 0.05, return_series_type: Literal['cusum_binary_result', 'mean_data', 'positive_cumulative_sum', 'negative_cumulative_sum'] = 'cusum_binary_result') Series

Cumulative sum (CUSUM).

This technique calculates the cumulative sums of positive and negative changes ($g_t^+$ and $g_t^-$) in the data ($x$) and compares them to a threshold. When this threshold is exceeded, a change is detected ($t_{talarm}$), and the cumulative sum restarts from zero. To avoid detecting a change when there is no actual change or only a slow drift, this algorithm also depends on a drift parameter for drift correction. For better results, remove extreme standalone outliers before using this technique.
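The core recursion can be sketched in plain Python as follows. This is a simplified illustration in the style of the detecta notebook referenced for this function, not the indsl implementation: it omits the smoothing controlled by alpha and the ending prediction, and the function name cusum_alarms_sketch is hypothetical.

```python
def cusum_alarms_sketch(x, threshold, drift):
    """Hypothetical helper: indices at which a cumulative sum exceeds the threshold."""
    g_pos = 0.0  # cumulative sum of positive changes
    g_neg = 0.0  # cumulative sum of negative changes
    alarms = []
    for i in range(1, len(x)):
        step = x[i] - x[i - 1]
        # The drift term is subtracted at every sample, so slow changes
        # decay back to zero instead of accumulating.
        g_pos = max(0.0, g_pos + step - drift)
        g_neg = max(0.0, g_neg - step - drift)
        if g_pos > threshold or g_neg > threshold:
            alarms.append(i)
            g_pos, g_neg = 0.0, 0.0  # restart from zero after an alarm
    return alarms


step_up = [0.0] * 10 + [3.0] * 10
step_down = [3.0] * 10 + [0.0] * 10
ramp = [0.0] * 20 + [0.5 * k for k in range(1, 11)] + [5.0] * 20

alarms_up = cusum_alarms_sketch(step_up, threshold=1.0, drift=0.25)
alarms_down = cusum_alarms_sketch(step_down, threshold=1.0, drift=0.25)
alarms_ramp = cusum_alarms_sketch(ramp, threshold=1.0, drift=0.25)
```

On the ramp, each sample adds 0.5 - 0.25 to the positive sum, so the threshold is crossed every fifth sample. A larger drift value would suppress these alarms entirely.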

Typical uses of this function:

  1. Set the type of series to return to “mean_data” to visualize the smoothed data. Leave the rest of the parameters to their default values.

  2. Adjust the alpha parameter to get the desired smoothing for the data.

  3. Set the type of series to return to “positive_cumulative_sum” or “negative_cumulative_sum” to visualize the cumulative sum of the positive or negative changes.

  4. Adjust the threshold and drift accordingly to get the desired number of change points.

  5. Set the type of series to return to “cusum_binary_result” to visualize the detected changes.

Parameters:
  • data – Time series.

  • threshold – Amplitude threshold. Cumulative changes are compared to this threshold. Defaults to None. When this is exceeded a change is detected and the cumulative sum restarts from 0. If the threshold is not provided, it is assigned to 5 * standard_deviation of the data.

  • drift – Drift term. Prevents a change from being detected in the absence of an actual change. Defaults to None. If fewer false alarms are wanted, try increasing drift. If the drift is not provided, it is assigned to (2 * data_standard_deviation - data_mean) / 2.

  • detect

    Type of changes to detect. Options are:

    • “both” for detecting both increasing and decreasing changes in the data (default).

    • “increase” for detecting increasing changes in the data.

    • “decrease” for detecting decreasing changes in the data.

  • predict_ending – Predict end point. Prolongs the change until the predicted end point. Defaults to True. If false, single change points are detected.

  • alpha – Smoothing factor. Value between 0 < alpha <= 1. Defaults to 0.05.

  • return_series_type

    Type of series to return. Defaults to “cusum_binary_result”. This option allows the user to visualize the intermediate steps of the algorithm. Options are:

    • “cusum_binary_result” returns the cusum results as a binary time series. Change detected = 1, No change detected = 0.

    • “mean_data” returns the smoothed data.

    • “positive_cumulative_sum” returns the positive cumulative sum.

    • “negative_cumulative_sum” returns the negative cumulative sum.

Returns:

Time series.

Specified in the return_series_type parameter.

Return type:

pd.Series

Raises:
  • UserTypeError – If a time series with the wrong index is provided.

  • UserValueError – If an empty time series is passed into the function.

References

https://nbviewer.org/github/demotu/detecta/blob/master/docs/detect_cusum.ipynb

Steady State detector: change point

indsl.detect.ssd_cpd(data: Series, min_distance: int = 15, var_threshold: float = 2.0, slope_threshold: float = -3.0) Series

Steady State Detection (CPD).

Detect steady state periods in a time series based on a change point detection algorithm. The time series is split into “statistically homogeneous” segments using the ED Pelt change point detection algorithm. Then each segment is tested with regard to a normalized standard deviation and the slope of the line of best fit to determine if the segment can be considered a steady or transient region.

Parameters:
  • data – Time series.

  • min_distance – Minimum distance. Specifies the minimum point-wise distance for each segment that will be considered in the Change Point Detection algorithm.

  • var_threshold – Variance threshold. Specifies the variance threshold. If the normalized variance calculated for a given segment is greater than the threshold, the segment will be labeled as transient (value = 0).

  • slope_threshold – Slope threshold. Specifies the slope threshold. If the slope of a line fitted to the data of a given segment is greater than 10 to the power of the threshold value, the segment will be labeled as transient (value = 0).

Returns:

Binary time series. Steady state = 1, Transient = 0.

Return type:

pandas.Series

Steady State detector: variance filter

indsl.detect.ssid(data: Series, ratio_lim: float = 2.5, alpha1: float = 0.2, alpha2: float = 0.1, alpha3: float = 0.1) Series

Steady state (variance).

The steady state detector is based on the ratio of two variances estimated from the same signal [2]. The algorithm first filters the data using the factor “Alpha 1” and calculates two variances (long and short term) based on the parameters “Alpha 2” and “Alpha 3”. The first variance is an exponentially weighted moving variance based on the difference between the data and the average. The second is also an exponentially weighted moving “variance”, but based on sequential data differences.

Larger Alpha values imply that fewer data points are involved in the analysis. This reduces the time the identifier needs to detect a process change (average run length, ARL), but it increases the variability of the results, broadening the distribution and confounding interpretation. Lower Alpha values undesirably increase the average run length to detection, but they increase precision (minimizing Type-I and Type-II statistical errors) by reducing the variability of the distributions and increasing the signal-to-noise ratio when going from a transient state (TS) to a steady state (SS).
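The variance ratio can be sketched in plain Python as below. This follows the commonly published filter form of the method (filtered mean, variance from deviations to the previous filtered mean, variance from successive differences, ratio scaled by 2 - Alpha 1). Treat it as an assumption that indsl follows exactly this form; the function name and the handling of a perfectly flat signal are hypothetical.

```python
def variance_ratio_flags_sketch(x, ratio_lim=2.5, alpha1=0.2, alpha2=0.1, alpha3=0.1):
    """Hypothetical helper: 1 where the variance ratio exceeds ratio_lim, else 0.

    Expects a non-empty sequence of numbers.
    """
    filtered = x[0]  # exponentially weighted average of the data
    var_mean = 0.0   # variance from the difference between data and average
    var_diff = 0.0   # "variance" from sequential data differences
    flags = [0]
    for i in range(1, len(x)):
        # Use the previous filtered value so the current sample is not
        # compared against an average that already contains it.
        var_mean = alpha2 * (x[i] - filtered) ** 2 + (1 - alpha2) * var_mean
        filtered = alpha1 * x[i] + (1 - alpha1) * filtered
        var_diff = alpha3 * (x[i] - x[i - 1]) ** 2 + (1 - alpha3) * var_diff
        # Assumption: a perfectly flat signal (no differences yet) counts as steady.
        ratio = (2 - alpha1) * var_mean / var_diff if var_diff > 0 else 1.0
        flags.append(int(ratio > ratio_lim))
    return flags


# 100 samples alternating around 10, then the same pattern on a ramp.
steady = [10.0 + (-1) ** i for i in range(100)]
ramp = [10.0 + (-1) ** i + (i - 99) for i in range(100, 200)]
flags = variance_ratio_flags_sketch(steady + ramp)
```

While the signal only alternates around a constant level, the ratio stays below the limit. On the ramp the filtered average lags the data, the first variance grows much faster than the second, and the ratio rises above the limit.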

Parameters:
  • data – Time series.

  • ratio_lim – Threshold. Variance ratio threshold used to decide whether the signal is in steady state. A variance ratio greater than the threshold labels the state as transient.

  • alpha1 – Alpha 1. Filter factor for the mean. Value should be between 0 and 1. Recommended value is 0.2. Defaults to 0.2.

  • alpha2 – Alpha 2. Filter factor for variance 1. Value should be between 0 and 1. Recommended value is 0.1. Defaults to 0.1.

  • alpha3 – Alpha 3. Filter factor for variance 2. Value should be between 0 and 1. Recommended value is 0.1. Defaults to 0.1.

Returns:

Binary time series. Steady state = 0, transient = 1.

Return type:

pandas.Series

References

Steady State detector: variable moving average

indsl.detect.vma(series: Series, window_length: int = 10) Series

Steady state (vma).

This moving average is designed to become flat (constant value) when the data within the lookup window does not vary significantly. It can also be used as a steady state detector. The calculation is based on the variability of the signal in a lookup window.

Parameters:
  • series – Time series.

  • window_length – Lookup window. Window length in data points used to estimate the variability of the signal.

Returns:

Moving average. If the result has the same value as the previous moving average result, the signal can be considered to be in steady state.

Return type:

pandas.Series

Unchanged signal identification

indsl.detect.unchanged_signal_detector.unchanged_signal_detector(data: Series, duration: Timedelta = Timedelta('0 days 01:00:00'), min_nr_data_points: int = 3) Series

Unchanged signal detection.

Detect periods of time when the data stays at a constant value for longer than a given time window.
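One way to picture this check is the plain-Python sketch below. It is an assumption-laden illustration, not the indsl implementation: it flags a whole run of identical consecutive values when the run spans more than duration and contains at least min_nr_data_points samples. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def unchanged_flags_sketch(times, values, duration=timedelta(hours=1),
                           min_nr_data_points=3):
    """Hypothetical helper: 1 for samples inside a long enough constant run, else 0."""
    flags = [0] * len(values)
    run_start = 0
    for i in range(1, len(values) + 1):
        run_ended = i == len(values) or values[i] != values[run_start]
        if run_ended:
            long_enough = times[i - 1] - times[run_start] > duration
            enough_points = (i - run_start) >= min_nr_data_points
            if long_enough and enough_points:
                for j in range(run_start, i):
                    flags[j] = 1
            run_start = i
    return flags


# Samples every 10 minutes; the value 2.0 is held for 70 minutes.
start = datetime(2024, 1, 1)
times = [start + timedelta(minutes=10 * k) for k in range(10)]
values = [1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0]

flags_one_hour = unchanged_flags_sketch(times, values)
flags_two_hours = unchanged_flags_sketch(times, values, duration=timedelta(hours=2))
```

With the default one hour window the 70 minute run is flagged. With a two hour window nothing is flagged, because no run lasts that long.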

Parameters:
  • data – Time series.

  • duration – Time window. Length of the time period to check for unchanged time series values. Defaults to ‘minutes=60’.

  • min_nr_data_points – Data points. The least number of data points to avoid alerting on missing data. Defaults to 3.

Returns:

Time series.

The returned time series is an indicator function that is 1 where the time series value has remained unchanged, 0 if it has changed.

Return type:

pd.Series

Raises: