Signals

Module for Generating Industrial Grade Synthetic Signals

Industrial time series (i.e., sensor data from facilities) are commonly sampled at irregular intervals (non-uniform time stamps) and contain data gaps, noise of different characteristics, and many other data quality flaws. The objective of this module is to offer multiple types of synthetic signals, together with methods to introduce data quality features similar to those observed in real industrial time series.

Line time series

indsl.signals.generator.line(start_date: Timestamp | None = None, end_date: Timestamp | None = None, sample_freq: Timedelta = Timedelta('0 days 00:01:00'), slope: float = 0, intercept: float = 0) → Series

Line.

Generate a synthetic time series using the line equation. If the start date, the end date, or both are missing, the signal duration defaults to 1 day. If neither date is provided, the end date is set to the current date and time.

Parameters:

start_date – Start date. The start date of the time series entered as a string, for example: “1975-05-09 20:09:10”, or “1975-05-09”.
end_date – End date. The end date of the time series entered as a string, for example: “1975-05-09 20:09:10”, or “1975-05-09”.
sample_freq –
Frequency. Sampling frequency as a time delta, value and time unit. Defaults to ‘1 minute’. Valid time units are:
- ‘W’, ‘D’, ‘T’, ‘S’, ‘L’, ‘U’, or ‘N’
- ‘days’ or ‘day’
- ‘hours’, ‘hour’, ‘hr’, or ‘h’
- ‘minutes’, ‘minute’, ‘min’, or ‘m’
- ‘seconds’, ‘second’, or ‘sec’
- ‘milliseconds’, ‘millisecond’, ‘millis’, or ‘milli’
- ‘microseconds’, ‘microsecond’, ‘micros’, or ‘micro’
- ‘nanoseconds’, ‘nanosecond’, ‘nanos’, ‘nano’, or ‘ns’.
slope – Slope. Line slope. Defaults to 0 (horizontal line).
intercept – Intercept. Y-intercept. Defaults to 0.

Returns:

Time series: Synthetic time series for a line

Return type:

pandas.Series
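
The line equation behind this generator can be sketched with the Python standard library alone, without calling indsl. This is a minimal illustration, not the library's implementation: the helper name line_points is hypothetical, and it assumes the independent variable is the time elapsed since the first sample, measured in seconds. The documentation above does not confirm that unit.

```python
from datetime import datetime, timedelta


def line_points(start, end, sample_freq, slope=0.0, intercept=0.0):
    """Return (timestamp, value) pairs for y = slope * t + intercept.

    Hypothetical helper: t is ASSUMED to be seconds elapsed since `start`.
    """
    points = []
    current = start
    while current <= end:
        elapsed = (current - start).total_seconds()
        points.append((current, slope * elapsed + intercept))
        current += sample_freq
    return points


# Five minutes of data sampled once per minute (six samples, both ends included).
start = datetime(1975, 5, 9, 20, 0, 0)
end = start + timedelta(minutes=5)
series = line_points(start, end, timedelta(minutes=1), slope=0.5, intercept=2.0)
```

With the default slope of 0 the same helper yields a horizontal line at the intercept.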

Constant value time series

indsl.signals.generator.const_value(value: float = 0, timedelta: Timedelta = Timedelta('7 days 00:00:00')) → Series

Constant value.

This function generates a horizontal line. The assumptions when generating the horizontal line are that the start date is set as “1970-01-01”, the end date is set as “now”, and the sampling is “1 week”. If the number of data points generated exceeds 100000, then the start date is moved forward, such that the number of data points generated is not greater than 100000 with “1 week” sampling resolution.

Parameters:

value – Value. Value of the constant time series. Defaults to 0.
timedelta –
Granularity. Sampling frequency as a time delta, value, and time unit. Defaults to one week (‘1 W’). Valid time units are:
- ‘W’, ‘D’, ‘T’, ‘S’, ‘L’, ‘U’, or ‘N’
- ‘days’ or ‘day’
- ‘hours’, ‘hour’, ‘hr’, or ‘h’
- ‘minutes’, ‘minute’, ‘min’, or ‘m’
- ‘seconds’, ‘second’, or ‘sec’
- ‘milliseconds’, ‘millisecond’, ‘millis’, or ‘milli’
- ‘microseconds’, ‘microsecond’, ‘micros’, or ‘micro’
- ‘nanoseconds’, ‘nanosecond’, ‘nanos’, ‘nano’, or ‘ns’.

Returns:

Time series: Synthetic time series for a line

Return type:

pandas.Series
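
The 100000-point cap described above can be sketched as follows, using only the standard library. The helper const_value_points and its now argument are hypothetical and exist only to make the example reproducible. How indsl aligns the shifted start date is not stated in the documentation, so moving the start forward by a whole number of sampling steps is an assumption.

```python
from datetime import datetime, timedelta

MAX_POINTS = 100_000  # cap stated in the description above


def const_value_points(value=0.0, step=timedelta(weeks=1), now=None):
    """Hypothetical helper: constant series from 1970-01-01 up to `now`."""
    now = now or datetime.now()
    start = datetime(1970, 1, 1)
    n_points = (now - start) // step + 1
    if n_points > MAX_POINTS:
        # ASSUMPTION: move the start forward by whole steps so that
        # exactly MAX_POINTS samples remain and the last sample is unchanged.
        start += (n_points - MAX_POINTS) * step
        n_points = MAX_POINTS
    return [(start + i * step, value) for i in range(n_points)]


fixed_now = datetime(2024, 1, 1)
weekly = const_value_points(5.0, now=fixed_now)
secondly = const_value_points(5.0, step=timedelta(seconds=1), now=fixed_now)
```

Weekly sampling stays far below the cap, so the series still starts in 1970. One-second sampling exceeds it, so only the most recent 100000 samples are kept.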

Sine wave

indsl.signals.generator.sine_wave(start_date: Timestamp | None = None, end_date: Timestamp | None = None, sample_freq: Timedelta = Timedelta('0 days 00:00:01'), wave_period: Timedelta = Timedelta('0 days 01:00:00'), wave_mean: float = 0, wave_amplitude: float = 1, wave_phase: float = 0) → Series

Sine wave.

Generate a time series for a sine wave with a given wave period, amplitude, phase, and mean value. If the start date, the end date, or both are missing, the signal duration defaults to 1 day. If neither date is provided, the end date is set to the current date and time.

Parameters:

start_date – Start date. Date-time string when the time series starts. The date must be a string, for example: “1975-05-09 20:09:10”.
end_date – End date. Date-time string when the time series ends. The date must be a string, for example: “1975-05-09 20:09:10”.
sample_freq –
Frequency. Sampling frequency as a time delta, value, and time unit. Defaults to ‘1 second’. Valid time units are:
- ‘W’, ‘D’, ‘T’, ‘S’, ‘L’, ‘U’, or ‘N’
- ‘days’ or ‘day’
- ‘hours’, ‘hour’, ‘hr’, or ‘h’
- ‘minutes’, ‘minute’, ‘min’, or ‘m’
- ‘seconds’, ‘second’, or ‘sec’
- ‘milliseconds’, ‘millisecond’, ‘millis’, or ‘milli’
- ‘microseconds’, ‘microsecond’, ‘micros’, or ‘micro’
- ‘nanoseconds’, ‘nanosecond’, ‘nanos’, ‘nano’, or ‘ns’.
wave_period –
Wave period. The time it takes for two successive crests (one wavelength) to pass a specified point. For example, a wave period of 10 minutes generates one full wave every 10 minutes. The period cannot be 0. Defaults to 1 hour. Valid time units are:
- ‘W’, ‘D’, ‘T’, ‘S’, ‘L’, ‘U’, or ‘N’
- ‘days’ or ‘day’
- ‘hours’, ‘hour’, ‘hr’, or ‘h’
- ‘minutes’, ‘minute’, ‘min’, or ‘m’
- ‘seconds’, ‘second’, or ‘sec’
- ‘milliseconds’, ‘millisecond’, ‘millis’, or ‘milli’
- ‘microseconds’, ‘microsecond’, ‘micros’, or ‘micro’
- ‘nanoseconds’, ‘nanosecond’, ‘nanos’, ‘nano’, or ‘ns’.
wave_mean – Mean. The wave’s mean value. Defaults to 0.
wave_amplitude – Peak amplitude. Maximum absolute deviation from the mean. Defaults to 1.
wave_phase – Phase. Specifies (in radians) where in its cycle the oscillation is at time = 0. When the phase is non-zero, the wave is shifted in time. A negative value represents a delay, and a positive value represents an advance. Defaults to 0.

Returns:

Sine wave

Return type:

pandas.Series
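
How the period, mean, amplitude, and phase combine can be sketched with the standard sine formula y = mean + amplitude * sin(2πt / period + phase). The helper sine_value is hypothetical and evaluates a single point; it is an illustration under that assumed formula, not the indsl implementation.

```python
import math
from datetime import timedelta


def sine_value(elapsed, wave_period, wave_mean=0.0, wave_amplitude=1.0, wave_phase=0.0):
    """Hypothetical helper: wave value `elapsed` time after time = 0."""
    if wave_period.total_seconds() == 0:
        raise ValueError("wave_period cannot be 0")
    # Number of completed cycles (possibly fractional) at this instant.
    cycles = elapsed.total_seconds() / wave_period.total_seconds()
    return wave_mean + wave_amplitude * math.sin(2 * math.pi * cycles + wave_phase)


period = timedelta(minutes=10)
# A quarter of a period after time = 0 the wave is at its crest: mean + amplitude.
quarter = sine_value(timedelta(minutes=2.5), period, wave_mean=5.0, wave_amplitude=2.0)
# After one full period the wave is back at its mean value.
full = sine_value(timedelta(minutes=10), period, wave_mean=5.0, wave_amplitude=2.0)
# A positive phase of pi/2 advances the wave, so the crest is already reached at time = 0.
advanced = sine_value(timedelta(0), period, wave_mean=5.0, wave_amplitude=2.0, wave_phase=math.pi / 2)
```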

Perturb the index of a time series

indsl.signals.generator.perturb_timestamp(data: Series, magnitude: float = 1) → Series

Perturb timestamp.

Perturb the date-time index (timestamp) of the original time series using a normal (Gaussian) distribution with a mean of zero and a given standard deviation (magnitude) in seconds.

Parameters:

data – Time series
magnitude – Magnitude. Time delta perturbation magnitude in seconds. Has to be larger than 0. Defaults to 1.

Returns:

Time series: Original signal with a non-uniform time stamp.

Return type:

pandas.Series

Raises:

UserTypeError – Only time series with a DatetimeIndex are supported
UserTypeError – If “magnitude” is not a float
UserValueError – If “magnitude” is not larger than 0
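
The perturbation described above can be sketched with the standard library: each timestamp is shifted by a random offset drawn from a normal distribution with mean zero and a standard deviation of magnitude seconds. The helper perturb_timestamps is hypothetical. Its seed argument and the final sort are additions made for this illustration and are not part of the documented function.

```python
import random
from datetime import datetime, timedelta


def perturb_timestamps(timestamps, magnitude=1.0, seed=None):
    """Hypothetical helper: shift each timestamp by N(0, magnitude) seconds."""
    if magnitude <= 0:
        raise ValueError("magnitude must be larger than 0")
    rng = random.Random(seed)  # seed is an ASSUMED extra, for reproducibility
    shifted = [ts + timedelta(seconds=rng.gauss(0.0, magnitude)) for ts in timestamps]
    # ASSUMPTION: keep the index in chronological order after the shift.
    return sorted(shifted)


start = datetime(2024, 1, 1)
uniform = [start + timedelta(minutes=i) for i in range(10)]
jittered = perturb_timestamps(uniform, magnitude=2.0, seed=42)
```

The result has the same number of samples as the input, but the spacing between consecutive timestamps is no longer exactly one minute.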

Create data gaps in a time series

indsl.signals.generator.insert_data_gaps(data: Series, fraction: float = 0.25, num_gaps: int | None = None, data_buffer: int = 5, method: Literal['Random', 'Single', 'Multiple'] = 'Random') → Series

Insert data gaps.

Method to synthetically remove data, i.e., generate data gaps in a time series. The number of data points removed is defined by the given ‘fraction’ relative to the original time series.

Parameters:

data – Time series
fraction – Remove fraction. Fraction of data points to remove relative to the original time series. Must be a number higher than 0 and lower than 1 (0 < fraction < 1). Defaults to 0.25.
num_gaps – Number of gaps. Number of gaps to generate. Only needs to be provided when using the “Multiple” gaps method.
data_buffer – Buffer. Minimum number of data points to keep between data gaps and at the start and end of the time series. If the buffer of data points is higher than 1% of the number of data points in the time series, the end and start buffer is set to 1% of the total available data points.
method –
Method. This function offers multiple methods to generate data gaps:
- Random: Removes data points at random locations so that the output time series size is a given fraction (‘Remove fraction’) of the original time series. The first and last data points are never deleted. No buffer is set between gaps, only for the start and end of the time series. If the buffer of data points is higher than 1% of the number of data points in the time series, the end and start buffer is set to 1% of the total available data points.
- Single: Remove consecutive data points at a single location. Buffer data points at the start and end of the time series are kept to prevent removing the start and end of the time series. The buffer is set to the maximum value between 5 data points or 1% of the data points in the signal.
- Multiple: Insert multiple non-overlapping data gaps at random dates and of random sizes such that the given fraction of data is removed. If the number of gaps is not defined or is less than 2, the function defaults to 2 gaps. To avoid gap overlapping, a minimum of 5 data points is imposed at the signal’s start and end and between gaps.

Returns:

Output: Original time series with synthetically generated data gap(s).

Return type:

pandas.Series

Raises:

UserTypeError – data is not a time series
UserTypeError – fraction is not a number
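
As an illustration of the “Single” method described above, the sketch below removes one run of consecutive points while protecting a buffer at both ends. It uses a plain Python list instead of a pandas.Series. The helper insert_single_gap and its seed argument are hypothetical, and the buffer is taken directly from data_buffer rather than from the 5-points-or-1% rule, which is a simplification.

```python
import random


def insert_single_gap(values, fraction=0.25, data_buffer=5, seed=None):
    """Hypothetical helper: drop one run of consecutive points."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must satisfy 0 < fraction < 1")
    n = len(values)
    gap_size = int(n * fraction)
    # Latest index at which the gap can start without touching the end buffer.
    latest_start = n - data_buffer - gap_size
    if latest_start < data_buffer:
        raise ValueError("series too short for this fraction and buffer")
    rng = random.Random(seed)
    gap_start = rng.randint(data_buffer, latest_start)  # both bounds inclusive
    return values[:gap_start] + values[gap_start + gap_size:]


signal = list(range(100))
gapped = insert_single_gap(signal, fraction=0.25, data_buffer=5, seed=0)
```

For a 100-point signal and a fraction of 0.25, 75 points remain, and the first and last five points are always untouched.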

Noise Generators

White noise

indsl.signals.noise.white_noise(data: Series, snr_db: float = 30, seed: int | None = None) → Series

Add white noise.

Adds white noise to the original data using a given signal-to-noise ratio (SNR).

Parameters:

data – Time series
snr_db – SNR. Signal-to-noise ratio (SNR) in decibels. SNR compares the level of a signal to the level of background noise and is defined as the ratio of signal power to noise power. A ratio higher than 1:1 (greater than 0 dB) indicates more signal than noise. Defaults to 30.
seed – Seed. A seed (integer number) to initialize the random number generator. If left empty, then a fresh, unpredictable value will be generated. If a value is entered, the same random noise will be generated on every call, provided the time series data and date range are not changed.

Returns:

Output: Original data plus white noise.

Return type:

pandas.Series
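
The relation between the SNR in decibels and the noise level can be sketched as follows: the noise power equals the signal power divided by 10^(snr_db / 10). The helper add_white_noise is hypothetical and works on a plain list. It assumes the signal power is the mean of the squared samples and that the noise is Gaussian; the documentation above does not state how indsl computes either.

```python
import math
import random


def add_white_noise(values, snr_db=30.0, seed=None):
    """Hypothetical helper: add zero-mean Gaussian noise for a target SNR."""
    # ASSUMPTION: signal power is the mean of the squared samples.
    signal_power = sum(v * v for v in values) / len(values)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_std) for v in values]


clean = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
noisy_a = add_white_noise(clean, snr_db=30, seed=7)
noisy_b = add_white_noise(clean, snr_db=30, seed=7)  # same seed, same noise
```

At 30 dB the noise power is one thousandth of the signal power, so the noisy series stays visually close to the clean one.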

Brownian noise

indsl.signals.generator.wave_with_brownian_noise(duration: int = 14400, resolution: float = 0.5, percentage: float = 100, amplitude: float = 10, mean: float = 200, frequency: float = 0.04, noise: List[int] = [1, 1])

Wave with Brownian noise.

Sinusoidal signal with Brownian noise. By default, the signal has a duration of 4 hours (14400 seconds), a resolution of 0.5, an amplitude of 10, a mean of 200, and a frequency of 0.04 Hz.

Parameters:

duration – Duration. Duration of the time series in seconds. Defaults to 14400.
resolution – Resolution. Frequency resolution. Defaults to 0.5.
percentage – Percentage. Percentage of the time series to keep. Defaults to 100.
amplitude – Amplitude. Amplitude of the wave. Defaults to 10.
mean – Mean. Mean of the wave. Defaults to 200.
frequency – Frequency. Frequency of the wave. Defaults to 0.04 Hz.
noise – Noise. Noise of the wave. Defaults to [1, 1].

Returns:

Sine wave with Brownian noise.

Return type:

pd.Series

Polynomial Generators

Univariate Polynomial

indsl.signals.polynomial.univariate_polynomial(signal: Series, coefficients: List[float] = [0.0, 1.0]) → Series

Univariate polynomial.

Creates a univariate polynomial \(y\), of degree \(n\), from the time series \(x\), and a list of coefficients \(a_{n}\):

\[y = a_0 + a_1x + a_2x^2 + a_3x^3 + ... + a_nx^n\]

Parameters:

signal (pandas.Series) – Time series
coefficients (List[float]) – Coefficients. List of coefficients, entered as numbers separated by commas (e.g., 0, 1). The default is \(0.0, 1.0\), which returns the original time series.

Returns:

Output

Return type:

pd.Series
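
The polynomial above can be evaluated sample by sample, as in the sketch below. It uses a plain list in place of a pandas.Series and Horner's scheme for the evaluation. The helper shares its name with the documented function only for readability; it is a standalone illustration, not the indsl code.

```python
def univariate_polynomial(values, coefficients=(0.0, 1.0)):
    """Illustrative helper: y = a_0 + a_1*x + ... + a_n*x^n for each sample x."""
    result = []
    for x in values:
        y = 0.0
        # Horner's scheme: start from the highest-order coefficient a_n.
        for a in reversed(coefficients):
            y = y * x + a
        result.append(y)
    return result


x = [0.0, 1.0, 2.0, 3.0]
identity = univariate_polynomial(x)                     # default [0, 1] gives y = x
quadratic = univariate_polynomial(x, [1.0, 0.0, 2.0])   # y = 1 + 2x^2
```

The coefficients are listed from the constant term upward, so [1.0, 0.0, 2.0] corresponds to a_0 = 1, a_1 = 0, and a_2 = 2.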

Sequence interpolation

1D interpolation of a sequence

indsl.signals.sequence_interpolation.sequence_interpolation_1d(signal: Series, x_values: List[float] = [0.0, 1.0], y_values: List[float] = [0.0, 1.0]) → Series

1D interpolation of a sequence.

The input time series is interpolated against the input sequence to create the returned time series. The x_values represent the input time series and the y_values represent the output time series. The interpolation routine is a simple linear interpolation. If the input series is outside the interpolation range, the return value is extrapolated.

Parameters:

signal (pandas.Series) – Time series
x_values (List[float]) – The x-values. List of values, entered as numbers separated by commas (e.g., 0, 1). The default is \(0.0, 1.0\), which returns the original time series. The number of values must match the y_values.
y_values (List[float]) – The y-values. List of values, entered as numbers separated by commas (e.g., 0, 1). The default is \(0.0, 1.0\), which returns the original time series. The number of values must match the x_values.

Returns:

Output.

Return type:

pd.Series
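
The mapping described above can be sketched as a piecewise-linear lookup. The helper interpolate_1d is hypothetical and operates on a plain list. It assumes x_values are sorted in ascending order with no repeated values, and that values outside the range are extrapolated along the nearest segment; the documentation says only that the value is extrapolated, so the linear form is an assumption.

```python
from bisect import bisect_right


def interpolate_1d(values, x_values=(0.0, 1.0), y_values=(0.0, 1.0)):
    """Hypothetical helper: piecewise-linear map with linear extrapolation."""
    if len(x_values) != len(y_values) or len(x_values) < 2:
        raise ValueError("x_values and y_values need equal length >= 2")
    result = []
    for x in values:
        # Pick the segment containing x; clamp to the first or last segment
        # so that out-of-range inputs are extrapolated along it.
        i = min(max(bisect_right(x_values, x) - 1, 0), len(x_values) - 2)
        x0, x1 = x_values[i], x_values[i + 1]
        y0, y1 = y_values[i], y_values[i + 1]
        result.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return result


mapped = interpolate_1d(
    [-1.0, 0.5, 1.5, 3.0],
    x_values=[0.0, 1.0, 2.0],
    y_values=[0.0, 10.0, 30.0],
)
unchanged = interpolate_1d([0.25, 2.0])  # defaults map every value to itself
```

Here -1.0 and 3.0 fall outside the range [0, 2] and are extrapolated, while 0.5 and 1.5 are interpolated within the first and second segments.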

2D interpolation of a sequence

indsl.signals.sequence_interpolation.sequence_interpolation_2d(signal_x: Series, signal_y: Series, interp_x: List[float] = [0.0, 1.0], interp_y: List[float] = [0.0, 1.0], interp_z: List[float] = [0.0, 1.0], align_timesteps: bool = False) → Series

2D interpolation of a sequence.

The input time series are interpolated against the input sequence to create the returned time series. The x-values and y-values represent the input time series and the z-values represent the output time series. The interpolation routine is a simple linear interpolation. If the input point is outside the convex hull of the interpolation region, the nearest point is returned.

Parameters:

signal_x (pandas.Series) – Time series x-value
signal_y (pandas.Series) – Time series y-value
interp_x (List[float]) – The x-values. List of values, entered as numbers separated by commas (e.g., 0, 1). The default is \(0.0, 1.0\), which returns the original time series. The number of values must match the y- and z-values.
interp_y (List[float]) – The y-values. List of values, entered as numbers separated by commas (e.g., 0, 1). The default is \(0.0, 1.0\), which returns the original time series. The number of values must match the x- and z-values.
interp_z (List[float]) – The z-values. List of values, entered as numbers separated by commas (e.g., 0, 1). The default is \(0.0, 1.0\), which returns the original time series. The number of values must match the x- and y-values.
align_timesteps – Auto-align. Automatically align the time stamps of the input time series. Defaults to False.

Returns:

Output.

Return type:

pd.Series