Signals

Module for Generating Industrial Grade Synthetic Signals

Industrial time series (i.e., sensor data from facilities) are commonly sampled at irregular intervals (non-uniform timestamps) and contain data gaps, noise with varying characteristics, and many other data quality flaws. This module offers multiple types of synthetic signals, along with methods to introduce data quality features similar to those observed in real industrial time series.

Line time series

indsl.signals.generator.line(start_date: Optional[Timestamp] = None, end_date: Optional[Timestamp] = None, sample_freq: Timedelta = Timedelta('0 days 00:01:00'), slope: float = 0, intercept: float = 0) → Series

Line

Generate a synthetic time series using the line equation. If the start date, the end date, or both are omitted, the signal duration defaults to 1 day. If neither date is provided, the end date is set to the current date and time.

Parameters
  • start_date – Start date. The start date of the time series entered as a string, for example: “1975-05-09 20:09:10”, or “1975-05-09”.

  • end_date – End date. The end date of the time series entered as a string, for example: “1975-05-09 20:09:10”, or “1975-05-09”.

  • sample_freq

    Frequency. Sampling frequency as a time delta, given as a value and a time unit. Defaults to ‘1 minute’. Valid time units are:

    • ‘W’, ‘D’, ‘T’, ‘S’, ‘L’, ‘U’, or ‘N’

    • ‘days’ or ‘day’

    • ‘hours’, ‘hour’, ‘hr’, or ‘h’

    • ‘minutes’, ‘minute’, ‘min’, or ‘m’

    • ‘seconds’, ‘second’, or ‘sec’

    • ‘milliseconds’, ‘millisecond’, ‘millis’, or ‘milli’

    • ‘microseconds’, ‘microsecond’, ‘micros’, or ‘micro’

    • ‘nanoseconds’, ‘nanosecond’, ‘nanos’, ‘nano’, or ‘ns’.

  • slope – Slope. Line slope. Defaults to 0 (horizontal line).

  • intercept – Intercept. Y-intercept. Defaults to 0.

Returns

Time series

Synthetic time series for a line

Return type

pandas.Series
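
The idea can be illustrated with a minimal sketch that uses only the Python standard library rather than `indsl` itself. The helper `line_points` is hypothetical, and it assumes that the independent variable of the line equation is the time elapsed since the start date, in seconds; the library may define it differently.

```python
from datetime import datetime, timedelta


def line_points(start, end, sample_freq=timedelta(minutes=1), slope=0.0, intercept=0.0):
    """Return (timestamp, value) pairs for y = slope * x + intercept.

    Assumption: x is the number of seconds elapsed since `start`.
    """
    if sample_freq <= timedelta(0):
        raise ValueError("sample_freq must be positive")
    points = []
    t = start
    while t <= end:
        x = (t - start).total_seconds()
        points.append((t, slope * x + intercept))
        t += sample_freq
    return points


start = datetime(1975, 5, 9)
points = line_points(start, start + timedelta(minutes=3), slope=0.5, intercept=2.0)
for timestamp, value in points:
    print(timestamp, value)
```

With a slope of 0 the sketch reduces to a horizontal line at the intercept, which matches the default described above.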

Constant value time series

indsl.signals.generator.const_value(value: float = 0) → Series

Constant value

This function generates a horizontal line. It assumes that the start date is “1970-01-01”, the end date is “now”, and the sampling interval is “1 week”. If the number of data points generated exceeds 100000, the start date is moved forward so that no more than 100000 data points are generated at the “1 week” sampling resolution.

Parameters

value – Value. The constant value of the time series. Defaults to 0.

Returns

Time series.

Synthetic time series for a line

Return type

pandas.Series
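
The arithmetic behind the 100000-point cap can be sketched with the standard library. The helper `weekly_range` is hypothetical and not part of `indsl`; it only illustrates how moving the start date forward bounds the number of weekly samples.

```python
from datetime import datetime, timedelta

MAX_POINTS = 100_000             # cap stated in the description above
SAMPLING = timedelta(weeks=1)    # "1 week" sampling


def weekly_range(end, start=datetime(1970, 1, 1)):
    """Return (start, n_points) for a weekly grid from start to end, capped at MAX_POINTS."""
    n_points = (end - start) // SAMPLING + 1
    if n_points > MAX_POINTS:
        # Move the start forward so that exactly MAX_POINTS samples remain.
        start = end - (MAX_POINTS - 1) * SAMPLING
        n_points = MAX_POINTS
    return start, n_points


start, n_points = weekly_range(datetime(2024, 1, 1))
print(start, n_points)
```

With weekly sampling, the span from 1970 to a present-day end date holds only a few thousand points, so in this sketch the cap only takes effect for end dates far in the future.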

Sine wave

indsl.signals.generator.sine_wave(start_date: Optional[Timestamp] = None, end_date: Optional[Timestamp] = None, sample_freq: Timedelta = Timedelta('0 days 00:00:01'), wave_period: Timedelta = Timedelta('0 days 01:00:00'), wave_mean: float = 0, wave_amplitude: float = 1, wave_phase: float = 0) → Series

Sine wave

Generate a time series for a sine wave with a given wave period, amplitude, phase, and mean value. If the start date, the end date, or both are omitted, the signal duration defaults to 1 day. If neither date is provided, the end date is set to the current date and time.

Parameters
  • start_date – Start date. Date-time string when the time series starts. The date must be a string, for example: “1975-05-09 20:09:10”.

  • end_date – End date. Date-time string when the time series ends. The date must be a string, for example: “1975-05-09 20:09:10”.

  • sample_freq

    Frequency. Sampling frequency as a time delta, given as a value and a time unit. Defaults to ‘1 second’. Valid time units are:

    • ‘W’, ‘D’, ‘T’, ‘S’, ‘L’, ‘U’, or ‘N’

    • ‘days’ or ‘day’

    • ‘hours’, ‘hour’, ‘hr’, or ‘h’

    • ‘minutes’, ‘minute’, ‘min’, or ‘m’

    • ‘seconds’, ‘second’, or ‘sec’

    • ‘milliseconds’, ‘millisecond’, ‘millis’, or ‘milli’

    • ‘microseconds’, ‘microsecond’, ‘micros’, or ‘micro’

    • ‘nanoseconds’, ‘nanosecond’, ‘nanos’, ‘nano’, or ‘ns’.

  • wave_period

    Wave period. The time it takes for two successive crests (one wavelength) to pass a specified point. For example, a wave period of 10 minutes generates one full wave every 10 minutes. The period cannot be 0. If no value is provided, it defaults to 1 hour. Valid time units are:

    • ‘W’, ‘D’, ‘T’, ‘S’, ‘L’, ‘U’, or ‘N’

    • ‘days’ or ‘day’

    • ‘hours’, ‘hour’, ‘hr’, or ‘h’

    • ‘minutes’, ‘minute’, ‘min’, or ‘m’

    • ‘seconds’, ‘second’, or ‘sec’

    • ‘milliseconds’, ‘millisecond’, ‘millis’, or ‘milli’

    • ‘microseconds’, ‘microsecond’, ‘micros’, or ‘micro’

    • ‘nanoseconds’, ‘nanosecond’, ‘nanos’, ‘nano’, or ‘ns’.

  • wave_mean – Mean. The wave’s mean value. Defaults to 0.

  • wave_amplitude – Peak amplitude. Maximum absolute deviation from the mean. Defaults to 1.

  • wave_phase – Phase. Specifies (in radians) where in its cycle the oscillation is at time = 0. When the phase is non-zero, the wave is shifted in time. A negative value represents a delay, and a positive value represents an advance. Defaults to 0.

Returns

Sine wave

Return type

pandas.Series
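
The parameters above map onto the usual sine wave expression, mean + amplitude · sin(2πt/period + phase). The following is a minimal sketch of that formula using the standard library; the helper `sine_value` is hypothetical, and the assumption that t is measured from the start of the series is not confirmed by the text above.

```python
import math
from datetime import timedelta


def sine_value(elapsed, wave_period=timedelta(hours=1), wave_mean=0.0,
               wave_amplitude=1.0, wave_phase=0.0):
    """Evaluate mean + amplitude * sin(2*pi*t/period + phase) at elapsed time t."""
    if wave_period == timedelta(0):
        raise ValueError("The wave period cannot be 0")
    cycles = elapsed / wave_period  # timedelta / timedelta gives a float
    return wave_mean + wave_amplitude * math.sin(2 * math.pi * cycles + wave_phase)


period = timedelta(minutes=10)
# A quarter of a period after t = 0 the wave is at its crest: mean + amplitude.
quarter = sine_value(timedelta(minutes=2.5), wave_period=period,
                     wave_mean=5.0, wave_amplitude=2.0)
# After one full period the wave is back at the mean (up to rounding error).
full = sine_value(timedelta(minutes=10), wave_period=period,
                  wave_mean=5.0, wave_amplitude=2.0)
print(quarter, full)
```

In this form a positive `wave_phase` moves the crest earlier in time (an advance) and a negative one moves it later (a delay), in line with the parameter description.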

Perturb the index of a time series

indsl.signals.generator.perturb_timestamp(data: Series, magnitude: float = 1) → Series

Perturb timestamp

Perturb the date-time index (timestamp) of the original time series using a normal (Gaussian) distribution with a mean of zero and a given standard deviation (magnitude) in seconds.

Parameters
  • data – Time series

  • magnitude – Magnitude. Time delta perturbation magnitude in seconds. If none is given, it is set to the inferred average sampling rate in seconds of the original signal.

Returns

Time series

Original signal with a non-uniform time stamp.

Return type

pandas.Series
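
A minimal sketch of the perturbation described above, using the standard library instead of `indsl`. The helper `perturb` and its `seed` argument are hypothetical; the `seed` is added here only to make the illustration repeatable.

```python
import random
from datetime import datetime, timedelta


def perturb(timestamps, magnitude=1.0, seed=None):
    """Shift each timestamp by Gaussian noise (mean 0, std `magnitude` seconds)."""
    rng = random.Random(seed)
    return [t + timedelta(seconds=rng.gauss(0.0, magnitude)) for t in timestamps]


start = datetime(1975, 5, 9, 20, 9, 10)
uniform = [start + timedelta(minutes=i) for i in range(5)]
perturbed = perturb(uniform, magnitude=1.0, seed=42)

# Offsets in seconds between the perturbed and the original timestamps.
offsets = [(p - u).total_seconds() for p, u in zip(perturbed, uniform)]
print(offsets)
```

Note that when the magnitude is large relative to the sampling interval, neighbouring timestamps can swap order in a sketch like this one, so the result may need to be sorted.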

Raises
  • UserTypeError – Only time series with a DatetimeIndex are supported

  • UserTypeError – If “magnitude” is not a float

Create data gaps in a time series

indsl.signals.generator.insert_data_gaps(data: Series, fraction: float = 0.25, num_gaps: Optional[int] = None, data_buffer: int = 5, method: Literal['Random', 'Single', 'Multiple'] = 'Random') → Series

Insert data gaps

Method to synthetically remove data, i.e. generate data gaps in a time series. The number of data points removed is defined by the given ‘fraction’ relative to the original time series.

Parameters
  • data – Time series

  • fraction – Remove fraction. Fraction of data points to remove relative to the original time series. Must be a number higher than 0 and lower than 1 (0 < fraction < 1). Defaults to 0.25.

  • num_gaps – Number of gaps. Number of gaps to generate. Only needs to be provided when using the “Multiple” gaps method.

  • data_buffer – Buffer. Minimum number of data points to keep between data gaps and at the start and end of the time series. If the buffer is larger than 1% of the number of data points in the time series, the start and end buffer is set to 1% of the total available data points.

  • method

    Method. This function offers multiple methods to generate data gaps:

    • Random: Removes data points at random locations so that the given fraction (‘Remove fraction’) of the original time series is removed. The first and last data points are never deleted. No buffer is kept between gaps, only at the start and end of the time series. If the buffer is larger than 1% of the number of data points in the time series, the start and end buffer is set to 1% of the total available data points.

    • Single: Removes consecutive data points at a single location. Buffer data points are kept at the start and end of the time series so that neither end is removed. The buffer is set to the larger of 5 data points and 1% of the data points in the signal.

    • Multiple: Inserts multiple non-overlapping data gaps at random dates and of random sizes such that the given fraction of data is removed. If the number of gaps is not defined or is less than 2, the function defaults to 2 gaps. To avoid overlapping gaps, a minimum of 5 data points is kept at the start and end of the signal and between gaps.

Returns

Output

Original time series with synthetically generated data gap(s).

Return type

pandas.Series
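
The “Random” method can be illustrated with a simplified standard-library sketch. The helper `remove_random` is hypothetical: it works on a plain list, protects only the first and last points, and ignores the `data_buffer` handling described above, so it should not be read as the library's implementation.

```python
import random


def remove_random(values, fraction=0.25, seed=None):
    """Drop round(fraction * len(values)) points at random, keeping both end points."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1 (exclusive)")
    rng = random.Random(seed)
    interior = range(1, len(values) - 1)  # first and last are never candidates
    n_remove = min(round(fraction * len(values)), len(interior))
    dropped = set(rng.sample(interior, n_remove))
    return [v for i, v in enumerate(values) if i not in dropped]


signal = list(range(100))
with_gaps = remove_random(signal, fraction=0.25, seed=0)
print(len(with_gaps), with_gaps[0], with_gaps[-1])  # prints: 75 0 99
```

For a time series, the same selection would be applied to (timestamp, value) pairs, so that the removed points leave gaps in the index.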

Raises
  • UserTypeError – data is not a time series

  • UserTypeError – fraction is not a number

Noise Generators

White noise

indsl.signals.noise.white_noise(data: Series, snr_db: float = 30, seed: Optional[int] = None) → Series

Add white noise

Adds white noise to the original data using a given signal-to-noise ratio (SNR).

Parameters
  • data – Time series

  • snr_db – SNR. Signal-to-noise ratio (SNR) in decibels. SNR compares the level of a signal to the level of background noise and is defined as the ratio of signal power to noise power. A ratio higher than 1 (above 0 dB) indicates more signal than noise. Defaults to 30.

  • seed – Seed. A seed (integer number) to initialize the random number generator. If left empty, fresh, unpredictable values are generated. If a value is entered, the same random noise is generated each time, provided the time series data and date range do not change.

Returns

Output

Original data plus white noise.

Return type

pandas.Series
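
The relation between the SNR in decibels and the noise level can be sketched as follows. The helper `add_white_noise` is hypothetical and uses only the standard library; it assumes Gaussian noise and takes the signal power to be the mean of the squared values, which may differ from how the library estimates it.

```python
import math
import random


def add_white_noise(values, snr_db=30.0, seed=None):
    """Add zero-mean Gaussian noise whose power gives the requested SNR in dB."""
    rng = random.Random(seed)
    signal_power = sum(v * v for v in values) / len(values)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / 10 ** (snr_db / 10)
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in values]


clean = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
noisy = add_white_noise(clean, snr_db=30.0, seed=1)

# Estimate the SNR that was actually achieved; it should be close to 30 dB.
noise = [n - c for n, c in zip(noisy, clean)]
measured_snr_db = 10 * math.log10(
    sum(c * c for c in clean) / sum(e * e for e in noise)
)
print(round(measured_snr_db, 1))
```

Calling the sketch twice with the same seed and the same input returns identical output, which mirrors the reproducibility described for the `seed` parameter.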

Polynomial Generators

Univariate Polynomial

indsl.signals.polynomial.univariate_polynomial(signal: Series, coefficients: List[float] = [0.0, 1.0]) → Series

Univariate polynomial

Creates a univariate polynomial \(y\) of degree \(n\) from the time series \(x\) and a list of coefficients \(a_0, a_1, \dots, a_n\):

\[y = a_0 + a_1x + a_2x^2 + a_3x^3 + \dots + a_nx^n\]

Parameters
  • signal – Time series

  • coefficients – Coefficients. List of coefficients, entered as numbers separated by semicolons (e.g. 0; 1). The default is \(0.0; 1.0\), which returns the original time series.

Returns

Output

Return type

pandas.Series
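
The polynomial above can be evaluated point by point, as in this minimal standard-library sketch. The helper `evaluate_polynomial` is hypothetical and uses Horner's method, which is one common way to evaluate a polynomial and not necessarily what the library does internally.

```python
def evaluate_polynomial(x_values, coefficients=(0.0, 1.0)):
    """Evaluate y = a_0 + a_1*x + ... + a_n*x^n for each x using Horner's method."""
    result = []
    for x in x_values:
        y = 0.0
        for a in reversed(coefficients):  # start from a_n and work down to a_0
            y = y * x + a
        result.append(y)
    return result


x = [0.0, 1.0, 2.0, 3.0]
identity = evaluate_polynomial(x)                     # default coefficients return x unchanged
quadratic = evaluate_polynomial(x, [1.0, 0.0, 2.0])   # y = 1 + 2x^2
print(identity)   # [0.0, 1.0, 2.0, 3.0]
print(quadratic)  # [1.0, 3.0, 9.0, 19.0]
```

The coefficient list is ordered from \(a_0\) to \(a_n\), matching the equation, so the default of 0.0 and 1.0 corresponds to \(y = x\).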
