Contributing

This project is a community effort and contributions are welcome. Currently, it is privately hosted on GitHub. It is publicly available but only open for internal contributions at the moment. It is also open for contributions from our customers via the Early Adopter program on Cognite Hub, our community site. If you are not yet a member of Cognite Hub, please sign up following the steps in this guide.

The main objective of the InDSL is to provide industrial domain experts and data scientists with a rich library of algorithms to speed up their work. Therefore, we highly encourage data scientists with industrial domain knowledge to contribute algorithms and models within their niche expertise. Nevertheless, we are industry and scientific domain agnostic: we accept any type of algorithm that improves the industrial data science experience and development.

Given the above, we are picky when it comes to adding new algorithms and how we document them. We want to speed up our users' tasks with algorithms that minimize their exploratory and analytic work. We strive to include methods that save them time and to provide comprehensive documentation for each algorithm. Keep this in mind when developing a new algorithm.

There are multiple ways to contribute; the most common ones are described below.

We encourage contributing algorithms that are compliant with the Cognite Charts calculations engine. Therefore, this guide focuses on the requirements to comply with it. Nevertheless, we accept other algorithms (not exposed through Cognite Charts) that can be used by installing the Python package in your preferred development environment.

Although industrial algorithms are the core of this project, improving our documentation and making our library more robust over time is also of paramount importance. Please don't hesitate to submit a GitHub pull request for something as small as a typo.

Contributing a new CHARTS compliant algorithm

For an algorithm to play well with the CHARTS front end (user interface) and the calculations back end, it has to adhere to some function I/O requirements, a documentation (docstrings) format, and a few other requirements to expose the algorithm to the front and back end. The first few basic requirements to keep in mind before developing an algorithm are:

  1. It must belong to a particular toolbox. All the toolboxes are listed under the indsl/ folder.

  2. It must be a Python function, i.e. defined with def.

  3. Input data is passed to each algorithm as one or more pd.Series (one for each time series) with a datetime index.

  4. The output must be a pd.Series with a datetime index for it to be displayed on the UI.

  5. The allowed function parameter types are:

    • Time series: pd.Series

    • Time series or float: Union[pd.Series, float]

    • Integer: int

    • Float: float

    • Enumerations: Enum

    • String: str

    • Timestamp: pd.Timestamp

    • Timedelta: pd.Timedelta

    • String option: Literal

    • List of integers: List[int]

    • List of floats: List[float]

    • Optional type: Optional[float]

Note

We currently support Python functions with pd.Series as data input and output (I/O). This restriction is in place to simplify how the CHARTS infrastructure fetches and displays data. However, expanding support to other input types is on our roadmap.
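To make these requirements concrete, here is a minimal sketch of what a compliant function could look like. The function name and logic are purely illustrative and not part of InDSL; only the typed pd.Series input and output and the scalar keyword parameter reflect the requirements listed above.

import pandas as pd


def scale(data: pd.Series, factor: float = 1.0) -> pd.Series:
    """Scale time series."""
    # Takes a time series as a pd.Series with a datetime index and returns a
    # pd.Series with the same datetime index, so the result can be shown in the UI.
    return data * factor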

Preliminaries and setup

Note

Avoid duplicating code. Before starting a new algorithm, check for similar ones in the following places:

This project uses Poetry for dependency management. Install it before starting:

pip install poetry
  1. Clone the InDSL main repository on GitHub to your local environment.

git clone git@github.com:cognitedata/indsl.git
cd indsl
  2. Install the project dependencies.

poetry install
  3. Synchronize your local master branch with the remote master branch.

git checkout master
git pull origin master

Develop your algorithm

  1. Create a feature branch to work on your new algorithm. Never work on the master or documentation branches.

    git checkout -b my_new_algorithm
    
  2. Install pre-commit to run code style checks before each commit.

    poetry run pre-commit install  # Only needed if not installed
    poetry run pre-commit run --all-files
    
  3. If you need any additional module not in the installed dependencies, install it using the add command. If you need the new module for development, use the --dev option:

    poetry add new_module
    
    poetry add new_module --dev
    
  4. Develop the new algorithm on your local branch. Use the exception classes defined in indsl/exceptions.py when raising errors that are caused by invalid or erroneous user input. InDSL provides the @check_types decorator (from typeguard) for run-time type checking, which should be used instead of checking each input type explicitly. When you finish or reach an important milestone, use git add and git commit to record your changes:

    git add .
    git commit -m "Short but concise commit message with your changes"
    

    If your function is not valid for certain input values, an error must be raised. For example:

    from indsl.exceptions import UserValueError


    def area(length: float) -> float:
        if length < 0:
            raise UserValueError("Length cannot be negative.")
        return length**2
    
  5. As you develop the algorithm, it is good practice to add tests for it. All tests are stored in the root folder tests/, using the same folder structure as the indsl/ folder. We run pytest to verify pull requests before merging them with the master version. Before sending your pull request for review, make sure you have written tests for the algorithm and run them locally to verify that they pass.

Note

New algorithms without proper tests will not be merged - help us keep the code coverage at a high level!
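As an illustration of such a test, the area example above could be exercised as follows. The file location and test names are assumptions, and in practice the function would be imported from its toolbox module rather than redefined in the test file.

# file: tests/<toolbox>/test_area.py (hypothetical location)
import pytest

from indsl.exceptions import UserValueError


# Inlined here only to keep the sketch self-contained.
def area(length: float) -> float:
    if length < 0:
        raise UserValueError("Length cannot be negative.")
    return length**2


def test_area_returns_square_of_length():
    assert area(2.0) == 4.0


def test_area_raises_on_negative_length():
    with pytest.raises(UserValueError):
        area(-1.0)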

Document your algorithm

CHARTS compliant algorithms must follow a few simple docstring formatting requirements so that the information can be parsed and properly displayed on the user interface and included in the technical documentation.

  1. Use r"""raw triple double quotes""" docstrings to document your algorithm. This allows using backslashes in the documentation, so LaTeX formulas are properly parsed and rendered. The documentation targets both data science developers and CHARTS users, and the r""" docstring allows us to properly render formulas in the CHARTS UI and in the InDSL documentation. If you are not sure how to document your algorithm, refer to any algorithm in the indsl/ folder for inspiration.

  2. Follow the Google Style unless otherwise stated in this guide.

  3. Function name: after the first r""", write a short (1-5 words) descriptive name for your function with no punctuation at the end. This will be the function name displayed on the CHARTS user interface.

  4. Add an empty line (line break) after the title.

  5. Write a comprehensive description of your function. Take care to use full words to describe input arguments. For example, in code you might use poly_order as an argument but in the description use polynomial order instead.

  6. Parameter names and descriptions: define all the function arguments after Args:, listing all arguments and using tabs to differentiate each one and its respective description. Adhere as closely as possible to the following formatting rules for each parameter name and description:

    • A parameter name must have 30 characters or less, excluding units defined within square brackets [] (more on this below). Square brackets are only allowed for specifying units in a parameter name; using brackets within a parameter name for anything other than units might generate an error in the pre-commit tests.

    • Must end with a period (.).

    • Use LaTeX language for typing formulas, if any, as follows:

      • Use the command :math:`LaTeX formula` for inline formulas

      • Use the command .. math:: for full line equations

    • If a parameter requires specific units, these must be typed as follows:

      • Enclosed in square brackets []

      • In Roman (not italic) font

      • If using LaTeX language, use the :math: inline formula command, and the command \mathrm{} to render the units in Roman font.

      • Placed at the end of the string

      For example:

r"""
...
Args:

    ...

    pump_hydraulic_power: Pump hydraulic power [W].
    pump_liquid_flowrate: Pump liquid flowrate [:math:`\mathrm{\frac{m^3}{h}}`].

    ...

This is a basic example of how to document a function:

r"""
...

Args:
    data: Time series.
    window_length: Window.
        Point-wise length of the filter window (i.e. number of data points). A large window results in a stronger
        smoothing effect and vice-versa. If the filter window length is not defined by the user, a
        length of about 1/5 of the length of time series is set.
    polyorder: Polynomial order.
        Order of the polynomial used to fit the samples. Must be less than the filter window length.
        Hint: A small polynomial order (e.g. 1) results in a stronger data smoothing effect.
        Defaults to 1, which typically results in a smoothed time series representing the dominating data trend
        and attenuates fluctuations.

Returns:
    pd.Series: Time series
    If you want, it is possible to add more text here to describe the output.

...
"""
  7. Define the function output after Returns: as shown above.

  8. The above are the minimal requirements to expose the documentation on the user interface and technical docs, but feel free to add more of the supported sections.

  9. Go to the docs-source/source/ folder and find the appropriate toolbox rst file (e.g. smooth.rst).

  10. Add a new entry with the name of your function as a subtitle, underlined with the symbol ^.

  11. Add the sphinx directive .. autofunction:: followed by the path to your new algorithm (see the example below). This will autogenerate the documentation from the code docstrings.

.. autofunction:: indsl.smooth.sg
  12. If you have coded an example, add the sphinx directive .. topic:: Examples: and, below it, the sphinx reference to the autogenerated material (see the example below). The construct is as follows: sphx_glr_auto_examples_{toolbox_folder}_{example_code}.py

.. topic:: Examples:

    * :ref:`sphx_glr_auto_examples_smooth_plot_sg_smooth.py`
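Putting the last three steps together, the entry in the toolbox rst file could look something like this (reusing the smooth toolbox example from above; the subtitle text is illustrative):

Savitzky-Golay smoother
^^^^^^^^^^^^^^^^^^^^^^^

.. autofunction:: indsl.smooth.sg

.. topic:: Examples:

    * :ref:`sphx_glr_auto_examples_smooth_plot_sg_smooth.py`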

Front and back end compliance

For the algorithm to be picked up by the front and back end, and to display relevant information to the user, take the following steps.

  1. Add human readable names to each input parameter (not the input data) in your algorithm. These will be displayed on the UI, hence avoid using long names or special characters.

  2. Add a technical but human readable description of your algorithm, the inputs required, what it does, and the expected result. This will be displayed on the UI and targets our users (i.e. domain experts).

  3. Add the @check_types decorator to the functions that contain Python type annotations. This makes sure that the function is always called with inputs of the same type as specified in the function signature.

  4. Add your function to the __init__.py file of the toolbox module your algorithm belongs to. For example, the Savitzky-Golay smoother (indsl.smooth.sg()) belongs to the smooth toolbox; therefore, we add sg to the list __all__ in the file indsl/smooth/__init__.py, as sketched below.
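    A minimal sketch of that registration (the module name in the import is hypothetical, and the existing contents of __all__ are omitted):

    # file: indsl/smooth/__init__.py (illustrative excerpt)
    from .sg_smooth import sg  # hypothetical module holding the implementation

    __all__ = ["sg"]  # in practice, append "sg" to the already existing list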

This would be a good time to push your changes to the remote repository.
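For example, assuming your feature branch from the steps above is called my_new_algorithm:

git push -u origin my_new_algorithm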

Verify documentation build

It is highly recommended to check that the documentation for your new function is built and displayed correctly. Note that you will need the following Sphinx Python libraries to successfully build the documentation (these packages can be installed with pip):

  • sphinx
  • sphinx-gallery
  • sphinx-prompt
  • sphinx-rtd-theme
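For example, they can all be installed in one go with pip:

pip install sphinx sphinx-gallery sphinx-prompt sphinx-rtd-theme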

While testing the build, some files that should not be committed to the remote repository will be autogenerated in the folder docs-source/source/auto_examples/. If these are committed, nothing serious will happen, but the PR will probably be longer than expected and could confuse reviewers who are not aware of this. To avoid it, there are two options:

  1. Don’t stage the files inside the folder docs-source/source/auto_examples/, or

  2. add the folder docs-source/source/auto_examples/ to the file .git/info/exclude to locally exclude the folder from any commit. You can use your IDE git integration to locally exclude files (e.g. PyCharm).

Once you have taken care of the above, do the following:

  1. In your terminal, go to the folder docs-source/

  2. Clean the previous build (if any) using

make clean
  3. Build the documentation with

make html
  4. If there were errors during the build, address them and repeat steps 2-3.

  5. If the build was successful, open the html file located in build/html/index.html and review it by navigating to the section(s) relevant to your new function.

    For Mac users, the file can be opened with the following command:

open build/html/index.html
  6. Once satisfied with the documentation, commit and push the changes.

Version your algorithm

Note

This section is only relevant if you are changing an existing function in InDSL.

For industrial applications, consistency and reproducibility of calculation results are of critical importance. For this reason, InDSL keeps a version history of its functions that developers and users can choose from. Older versions can be marked as deprecated to notify users that a new version is available. The example Function versioning demonstrates in more detail how function versioning works in InDSL.

Do I need to version my algorithm?

You need to version your algorithm if:

  1. You are changing an existing InDSL function, and one of the following conditions holds:

    • The signature of the new function is incompatible with the old function, for instance if a parameter was renamed or a new parameter was added without a default value.

    • The modifications change the function output for any given input.

  2. You are changing a helper function that is used by other InDSL functions. In that case you need to version the helper function and all affected InDSL functions.

Note

In order to avoid code duplication, one should first explore whether the modifications can be implemented in a backwards-compatible manner (for instance, through a new parameter with a default value), as sketched below.
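For example, a sketch of such a backwards-compatible change, using a hypothetical smoothing function where the default value of the new parameter reproduces the old behaviour (so no new function version is needed):

import pandas as pd


def rolling_smoother(data: pd.Series, use_median: bool = False) -> pd.Series:
    # Hypothetical example: the default use_median=False keeps the original
    # behaviour, so existing calculations keep returning the same results.
    if use_median:
        return data.rolling(window=5, min_periods=1).median()
    return data.rolling(window=5, min_periods=1).mean()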

How do I version my function?

As an example, we consider a function myfunc in mymod.py. A new function version is released through the following steps.

  1. Move the function from mymod.py to mymod_.py. Create the file if it does not yet exist.

  2. If not already present, add the versioning.register() decorator to the function. Specifically,

    # file: mymod_.py
    def myfunc(...):
        # old implementation
    

    becomes:

    # file: mymod_.py
    from indsl import versioning

    @versioning.register(version="1.0", deprecated=True)
    def myfunc(...):
        # old implementation
    

    Note: The first version of any function must be 1.0! Also note the deprecated=True argument: InDSL allows at most one non-deprecated version. For functions already in CHARTS, deprecating all versions will remove the function from the front-end.

    If there is more than one deprecated version, the different versions can be given different Python function names in order to avoid name conflicts. This can be achieved by setting the parameter name:

    # file: mymod_.py
    from indsl import versioning

    @versioning.register(version="1.0", deprecated=True, name="myfunc")
    def myfunc_v1(...):
        # first implementation

    @versioning.register(version="2.0", deprecated=True)
    def myfunc(...):
        # second implementation
    
  3. Add the new implementation to mymod.py and import mymod_ from it. The modified mymod.py file will look like this:

    # file: mymod.py
    from indsl import versioning
    from . import mymod_  # noqa

    @versioning.register(version="3.0", changelog="Describe here how the function changed compared to the previous version")
    def myfunc(...):
        # new implementation
    

    Make sure to increment the major version number of the new implementation. Optionally, non-breaking changes can also be versioned; in that case, follow the semantic versioning guidelines.

  4. Make sure that all versions of the function myfunc are tested. If the tests of the most recent version are in test_mymod.py, tests for older versions can be placed in test_mymod_.py.

Create a pull request

Before a PR is merged, it needs to be approved by one of our internal developers. If you expect to keep working on your algorithm and are not ready to start the review process, please label the PR as a draft.

To make the review process a better experience, we encourage complying with the following guidelines:

  1. Give your pull request a helpful title. If it is part of a JIRA task in our development backlog, please add the task reference so it can be tracked by our team. If you are fixing a bug or improving documentation, using “BUG <ISSUE TITLE>” or “DOC <DESCRIPTION>” is enough.

  2. Make sure your code passes all the tests. You could run pytest globally, but this is not recommended, as it will take a long time as our library grows. Typically, running only a few tests on your new algorithm is enough. For example, if you created a new_algorithm in the smooth toolbox and added the tests in test_new_algorithm.py:

    • pytest tests/smooth/test_new_algorithm.py to run the tests specific to your algorithm

    • pytest tests/smooth to run all the tests for the smooth toolbox module

  3. Make sure your code is properly commented and documented. We cannot stress enough how important documenting your algorithm is for the success of this product.

  4. Make sure the documentation renders properly. For details on how to build the documentation, check our documentation guidelines (WIP). The official documentation will be built and deployed by our CI/CD workflows.

  5. Add tests for all new algorithms and for improvements to existing algorithms. These tests add robustness to our code base and ensure that future modifications comply with the desired behavior of the algorithm.

  6. Run black to auto-format your code contributions. Our pre-commit hooks will run black for the entire project once you are ready to commit and push to the remote branch, but this can take some time as our code base grows. Therefore, it is good practice to periodically run black only on your new code:

black {source_file_or_directory}

This is not an exhaustive list of requirements or guidelines. If you have suggestions, don’t hesitate to submit an issue or a PR with enhancements to this document.

Finally, once you have completed your new contribution, sync with the remote master branch one last time in case there have been any recent changes to the code base:

git checkout master
git pull
git checkout {my_branch_name}
git merge master

Then use git add, git commit, and git push to record your new algorithm and send it to the remote repository:

git add .
git commit -m "Explicit commit message"
git push

Go to the InDSL repository PR page, start a New pull request and let the review process begin.

Coding Style

To ensure consistency throughout the code, we recommend using the following style conventions when contributing to the library:
  • Call the time series parameter of your function data unless a more specific name can be given, like pressure or temperature.

  • Use abbreviations when defining the types of function arguments. For example, pd. instead of pandas.
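For instance, a signature following both conventions might look like this (the function itself is hypothetical):

import pandas as pd


def remove_extremes(data: pd.Series, threshold: float = 3.0) -> pd.Series:
    # "data" is the generic name for the time series input, and the type hints
    # use the pd. abbreviation rather than the full pandas. prefix.
    return data[(data - data.mean()).abs() <= threshold * data.std()]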

Reviewer guidelines

Any InDSL function that is exposed in the CHARTS application (i.e. any function that is listed in the __init__.py files) must be reviewed by a member of the CHARTS development team.