.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/data_quality/plot_extreme_outlier.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_data_quality_plot_extreme_outlier.py:

========================
Extreme Outliers Removal
========================

This example removes point outliers using polynomial regression and Studentized residuals. We generate a toy
data set from an underlying polynomial signal, then add Gaussian noise and large point outliers to it.
The figure below shows that the point outliers are filtered out of the raw data. The filtered data can then be
passed to a smoother to refine the underlying signal, if desired.

.. GENERATED FROM PYTHON SOURCE LINES 13-49

.. image-sg:: /auto_examples/data_quality/images/sphx_glr_plot_extreme_outlier_001.png
   :alt: plot extreme outlier
   :srcset: /auto_examples/data_quality/images/sphx_glr_plot_extreme_outlier_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /Users/neringaaltanaite/Project/indsl/indsl/equipment/volume_vessel.py:32: IndslUserWarning: Couldn't import fluids.numba_vectorized: No module named 'fluids.vectorized'. Default to import fluids.vectorized.
      warnings.warn(


|

.. code-block:: Python


    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    from indsl.data_quality import extreme


    # Seeded generator so that the example is reproducible
    rng = np.random.default_rng(12345)

    plt.rcParams.update({"font.size": 18})

    # Create a clean toy dataset: quadratic signal sampled once per minute
    nx = 1000
    index = pd.date_range(start="1970", periods=nx, freq="1min")
    x = np.linspace(0, 10, nx)
    signal = 2 * x**2 - 10 * x + 2
    # Draw the noise from the seeded generator (np.random.normal would ignore the seed)
    noise = rng.normal(loc=100, scale=2, size=nx)
    y = noise + signal

    # Add anomalies: overwrite 20 randomly chosen samples with random values in [0, 200)
    anom_num = rng.integers(low=0, high=200, size=20)
    anom_ids = rng.integers(low=0, high=nx, size=20)
    y[anom_ids] = anom_num
    # Boolean mask marking the injected anomalies (not used in the plot below)
    is_anom = [item in anom_ids for item in range(nx)]

    raw_data = pd.Series(y, index=index)

    # Find anomalies and plot results
    res = extreme(raw_data)

    plt.figure(1, figsize=[15, 5])
    raw_data.plot()
    res.plot()
    _ = plt.legend(["Raw Data", "Filtered with Anomaly Detector"])


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 2.671 seconds)


.. _sphx_glr_download_auto_examples_data_quality_plot_extreme_outlier.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_extreme_outlier.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_extreme_outlier.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_extreme_outlier.zip `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_