Contributing

This project is a community effort and contributions are welcomed. InDSL is publicly available and open for contributions here. Engage on our community site, Cognite Hub, for discussion, suggestions and questions about InDSL.

The main objective of InDSL is to provide industrial domain experts and data scientists with a rich library of algorithms to speed up their work. Therefore, we highly encourage data scientists with industrial domain knowledge to contribute algorithms and models within their area of expertise. We are industry and scientific domain agnostic. We accept any type of algorithm that improves the industrial data science experience and development.

Given the above, we are picky when it comes to adding new algorithms and how we document them. We want to speed up our users' tasks with algorithms that minimize their exploratory and analytic work. We strive to include methods that will save them time and provide comprehensive documentation for each algorithm. Keep this in mind when developing a new algorithm.

There are multiple ways to contribute. The most common ones are:

  • New algorithm

  • Documentation

  • Examples

  • Bug reports

We encourage contribution of algorithms that are compliant with the Cognite Charts calculations engine. Therefore, this guide focuses on the requirements to comply with it. Nevertheless, we accept any other algorithms (not exposed through Cognite Charts) to be used by installing the python package in your preferred development environment.

Although the core of this project is the industrial algorithms, improving our documentation and making our library more robust over time are of paramount importance. Please don’t hesitate to submit a GitHub pull request for something as small as a typo.

Open source contributions

Thank you for considering contributing to InDSL! We welcome all contributions as listed above. We encourage you to read this document to understand how to contribute to the project. Also, we are happy to help you get started, and we welcome your efforts to improve InDSL as long as everyone involved is treated with respect. Cordiality is highly appreciated. Please read our Code of Conduct before contributing.

A good PR should be concise, clear, and easy to understand. In order to contribute, follow these steps:

1. Fork the repository: Fork the repository to your own GitHub account.

2. Run the tests: Confirm that the tests pass on your local machine. We use pytest for testing. If they fail and you are unable to fix the issue, please reach out to us.

3. Make your changes: Make your changes to the code base. Make sure to follow the coding style and documentation guidelines. Pre-commit checks will run automatically when you push your changes. You can also run the pre-commit checks manually on all files in the repository by running poetry run pre-commit run --all-files. We follow the Google Python Style Guide for docstrings.

4. Write tests: If you are adding a new feature or fixing a bug, write tests using the pytest framework to cover the new code. Make sure that they pass.

5. Make a pull request: Once you are satisfied with your changes and all of the tests pass, make a pull request in the base repository using the conventional commit message format.

Code Review Process

Contributions will only be merged after a code review. You are expected to address and incorporate feedback from the review unless there are compelling reasons not to. If you disagree with the feedback, present your objections clearly and respectfully. If the feedback is still deemed applicable after further discussion, you must either implement the suggested changes or choose to withdraw your contribution.

Documentation Contributions

Improvements to our documentation are much appreciated! The documentation source files are located in the docs-source/source directory of our codebase. They are formatted in reStructuredText and compiled with Sphinx to produce comprehensive documentation.

Contributing a new Cognite Charts compliant algorithm

For an algorithm to play well with the Charts front-end (user interface) and the calculations back-end, it has to adhere to some function input and output requirements, a documentation (docstrings) format, and a few other requirements that expose the algorithm to the front and back end. The first few basic requirements to keep in mind before developing an algorithm are:

  1. It must belong to a particular toolbox. All the toolboxes are listed under the indsl/ folder.

  2. It must be a python function: def():

  3. Input data is passed to each algorithm as one or more pd.Series (one for each time series) with a datetime index.

  4. The output must be a pd.Series with a datetime index for it to be displayed on the UI.

  5. Function parameters types allowed are:

    • Time series: pd.Series

    • Time series or float: Union[pd.Series, float]

    • Integer: int

    • Float: float

    • Enumerations: Enum

    • String: str

    • Timestamp: pd.Timestamp

    • Timedelta: pd.Timedelta

    • String option: Literal

    • List of integers: List[int]

    • List of floats: List[float]

    • Optional type: Optional[float]

Note

We currently support python functions with pd.Series as the type of data input and outputs. This restriction is in place to simplify how the Charts infrastructure fetches and displays data.
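As a rough sketch of the requirements above, the function below takes a pd.Series with a datetime index, uses only parameter types from the allowed list, and returns a pd.Series. The function name clip_and_scale and all of its parameters are hypothetical and chosen for illustration only; this is not an InDSL function.

```python
from typing import Literal, Optional

import pandas as pd


def clip_and_scale(
    data: pd.Series,
    factor: float = 1.0,
    lower_limit: Optional[float] = None,
    window: pd.Timedelta = pd.Timedelta("1h"),
    aggregate: Literal["mean", "median"] = "mean",
) -> pd.Series:
    """Hypothetical example, not an InDSL function."""
    # Time-based rolling windows require a datetime index on the input
    rolling = data.rolling(window)
    smoothed = rolling.mean() if aggregate == "mean" else rolling.median()
    # Optionally clip values below the given limit
    if lower_limit is not None:
        smoothed = smoothed.clip(lower=lower_limit)
    # The output keeps the datetime index of the input
    return smoothed * factor


# Example usage with a small, regularly sampled time series
index = pd.date_range("2023-01-01", periods=4, freq="30min")
data = pd.Series([1.0, 3.0, 5.0, 7.0], index=index)
result = clip_and_scale(data, factor=2.0)
```

The input data comes first and every tunable parameter has a default, so the function can be called with only a time series.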

Preliminaries and setup

Note

Avoid duplicating code. Before starting a new algorithm, check for similar ones in the following places:

This project uses Poetry for dependency management. Install it before starting:

pip install poetry

  1. For open source contributions, fork the InDSL main repository on GitHub to your local environment. If the contribution is internal, you may clone the repository directly.

git clone git@github.com:cognitedata/indsl.git
cd indsl

  2. Install the project dependencies.

poetry install --all-extras

  3. Synchronize your local main branch with the remote main branch.

git checkout main
git pull origin main

Develop your algorithm

  1. Create a feature branch to work on your new algorithm. Never work on the main or documentation branches.

    git checkout -b my_new_algorithm
    
  2. Install pre-commit to run code style checks before each commit.

    poetry run pre-commit install  # Only needed if not installed
    poetry run pre-commit run --all-files
    
  3. If you need any additional module not in the installed dependencies, install it using the add command. If you need the new module for development, use the --dev option:

    poetry add new_module
    
    poetry add new_module --dev
    
  4. Develop the new algorithm on your local branch. Use the exception classes defined in indsl/exceptions.py when raising errors that are caused by invalid or erroneous user input. InDSL provides the @check_types decorator (from typeguard) for run-time type checking, which should be used instead of checking each input type explicitly. When you finish or reach an important milestone, use git add and git commit to record it:

    git add .
    git commit -m "Short but concise commit message with your changes"
    

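    If your function is not valid for certain input values, an error must be raised. For example,

    from indsl.exceptions import UserValueError


    def area(length: float) -> float:
        # Reject input values for which the calculation is not defined
        if length < 0:
            raise UserValueError("Length cannot be negative.")
        return length**2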
    
  5. As you develop the algorithm, it is good practice to add tests for it. All tests are stored in the root folder tests/ using the same folder structure as the indsl/ folder. We run pytest to verify pull requests before merging with the main version. Before sending your pull request for review, make sure you have written tests for the algorithm and run them locally to verify they pass.

Note

New algorithms without proper tests will not be merged - help us keep the code coverage at a high level!
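A minimal sketch of what such a test file could look like for the area example above. To keep the block self-contained, UserValueError is defined here as a local stand-in and area is repeated; in a real test both would be imported from the package instead. The test names are only illustrative.

```python
import pytest


class UserValueError(ValueError):
    """Local stand-in for the exception class in indsl/exceptions.py (assumption for this sketch)."""


def area(length: float) -> float:
    if length < 0:
        raise UserValueError("Length cannot be negative.")
    return length**2


def test_area_returns_square():
    # Valid input: the result is the square of the length
    assert area(3.0) == 9.0


def test_area_rejects_negative_length():
    # Invalid input: the user-facing exception is raised with a helpful message
    with pytest.raises(UserValueError, match="cannot be negative"):
        area(-1.0)
```

Covering both the valid path and the error path helps keep the code coverage high.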

Core or Extras

InDSL is divided into two main categories: core and extras. The core algorithms are the ones that only require numpy, scipy and pandas as dependencies. The extras are algorithms that require additional dependencies.

If your algorithm requires additional dependencies, add them to the pyproject.toml file as optional dependencies and also add them under the tool.poetry.extras section in an appropriate category. The dependencies will also need to be lazy loaded to avoid loading them when the core part of the library is imported. To do this you need to import the dependencies in the function itself, and not at the top of the file.
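The lazy loading described above can be sketched as follows. To keep the example runnable without extra packages, the standard library module statistics stands in for an optional dependency; the function name median_level is hypothetical.

```python
def median_level(values: list[float]) -> float:
    """Hypothetical extras algorithm; `statistics` stands in for an optional dependency."""
    # Imported inside the function, not at the top of the file, so that
    # importing the module does not load the dependency until it is needed.
    import statistics

    return statistics.median(values)


# The dependency is only loaded at this point, when the function is called
level = median_level([1.0, 5.0, 3.0])
```

With this pattern, users who only install the core part of the library can still import the module that contains the function.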

Document your algorithm

Charts compliant algorithms must follow a few simple docstrings formatting requirements for the information to be parsed and properly displayed on the user interface and included in the technical documentation.

  1. Use r"""raw triple double quotes""" docstrings to document your algorithm. This allows using backslashes in the documentation, so LaTeX formulas are properly parsed and rendered. The documentation targets both data science developers and Charts users, and the r""" prefix allows us to properly render formulas in the Charts UI and in the InDSL documentation. If you are not sure how to document, refer to any algorithm in the indsl/ folder for inspiration.

  2. Follow Google Style unless otherwise stated in this guide.

  3. Function name: after the first r”””, write a short (1-5 words) descriptive name for your function with no punctuation at the end. This will be the function name displayed on the Charts user interface.

  4. Add an empty line after the title.

  5. Write a comprehensive description of your function. Take care to use full words to describe input arguments. For example, in code you might use poly_order as an argument but in the description use polynomial order instead.

  6. Parameter names and descriptions: define all the function arguments after Args: by listing all arguments, using tabs to differentiate each one and their respective description. Adhere as close as possible to the following formatting rules for each parameter name and description:

    • A parameter name must have 30 characters or less, excluding units defined within square brackets [] (more on this below). Square brackets are only allowed to input units in a parameter name. Using brackets within a parameter name for something different to units might generate an error in the pre-commit tests.

    • A parameter name must end with a period (.). The period after the parameter name will not be shown in the Charts user interface, but must be included in the docstrings.

    • Use LaTeX language for typing formulas, if any, as follows:

      • Use the command :math:`LaTeX formula` for inline formulas

      • Use the command .. math:: for full line equations

    • If a parameter requires specific units, these must be typed as follows:

      • They must be enclosed in square brackets [] to clearly distinguish them from the variable name. Failure to do so may lead to incorrect calculations or unit mismatches.

      • They should be typed in Roman (not italic) font

      • If using LaTeX language, use the :math: inline formula command, and the command \mathrm{} to render the units in Roman font.

      • Place the unit at the end of the string, following the argument’s descriptive name.

      For example:

r"""
...
Args:

    ...

    pump_hydraulic_power: Pump hydraulic power [W].
    pump_liquid_flowrate: Pump liquid flowrate [:math:`\mathrm{\frac{m^3}{h}}`].

    ...

This is a basic example of how to document a function:

r"""
...

Args:
    data: Time series.
    window_length: Window.
        Point-wise length of the filter window (i.e. number of data points). A large window results in a stronger
        smoothing effect and vice-versa. If the filter window length is not defined by the user, a
        length of about 1/5 of the length of time series is set.
    polyorder: Polynomial order.
        Order of the polynomial used to fit the samples. Must be less than the filter window length.
        Hint: A small polynomial order (e.g. 1) results in a stronger data smoothing effect.
        Defaults to 1, which typically results in a smoothed time series representing the dominating data trend
        and attenuates fluctuations.

Returns:
    pd.Series: Time series
    If you want, it is possible to add more text here to describe the output.

...
"""
  7. Define the function output after Returns: as shown above.

  8. The above are the minimal requirements to expose the documentation on the user interface and technical docs. But feel free to add more supported sections.

  9. Go to the docs-source/source/ folder and find the appropriate toolbox rst file (e.g. smooth.rst).

  10. Add a new entry with the name of your function as a subtitle, underlined with the symbol ^.

  11. Add the sphinx directive .. autofunction:: followed by the path to your new algorithm (see the example below). This will autogenerate the documentation from the code docstrings.

.. autofunction:: indsl.smooth.sg

  12. If you have coded an example, add the sphinx directive .. topic:: Examples: and below it the sphinx reference to find the autogenerated material (see example below). The construct is as follows, sphx_glr_auto_examples_{toolbox_folder}_{example_code}.py

.. topic:: Examples:

    * :ref:`sphx_glr_auto_examples_smooth_plot_sg_smooth.py`
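Putting the docstring rules together, a complete function could look roughly like the sketch below. The function hydraulic_power and its parameters are hypothetical and only illustrate the format: a raw docstring, a short title without final punctuation, an empty line, a description, and parameter names that end with a period and carry their units in square brackets.

```python
import pandas as pd


def hydraulic_power(pump_liquid_flowrate: pd.Series, pressure_difference: float = 1.0e5) -> pd.Series:
    r"""Pump hydraulic power

    Hypothetical example. The hydraulic power is computed as the product of
    the volumetric flowrate and the pressure difference across the pump,
    :math:`P = Q \, \Delta p`.

    Args:
        pump_liquid_flowrate: Liquid flowrate [:math:`\mathrm{\frac{m^3}{s}}`].
        pressure_difference: Pressure difference [Pa].
            Pressure rise across the pump. Defaults to 1.0e5.

    Returns:
        pd.Series: Pump hydraulic power [W]
    """
    # Flowrate in m^3/s times pressure in Pa gives power in W
    return pump_liquid_flowrate * pressure_difference


# Example usage
flowrate = pd.Series([0.5, 0.25], index=pd.date_range("2023-01-01", periods=2, freq="1h"))
power = hydraulic_power(flowrate)
```

The raw string prefix keeps the backslashes in the LaTeX commands intact, so the formula and the units render as intended.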

Front and back end compliance

For the algorithm to be picked up by the front and back end, and display user relevant information, take the following steps.

  1. Add human readable names to each input parameter (not the input data) in your algorithm. These will be displayed on the UI, hence avoid using long names or special characters.

  2. Add a technical but human readable description of your algorithm, the inputs required, what it does, and the expected result. This will be displayed on the UI and targets our users (i.e. domain experts).

  3. Add the @check_types decorator to the functions that contain Python type annotations. This makes sure that the function is always called with inputs of the same type as specified in the function signature.

  4. Add your function to the attribute __cognite__ in the __init__.py file of the toolbox module your algorithm belongs to. For example, the Savitzky-Golay smoother (indsl.smooth.sg()) belongs to the smooth toolbox. Therefore, we add sg to the list __cognite__ in the file indsl/smooth/__init__.py.

This would be a good time to push your changes to the remote repository.

Verify documentation build

It is highly recommended to check that the documentation for your new function is built and displayed correctly. Note that you will need all of the following Sphinx Python libraries to successfully build the documentation (these packages can be installed with pip):

  • sphinx-gallery

  • sphinx

  • sphinx-prompt

  • sphinx-rtd-theme

While testing the build, some files that should not be committed to the remote repository will be autogenerated in the folder docs-source/source/auto_examples/. Committing them does no real harm, but it makes the PR longer than expected and can confuse reviewers who are not aware of this. To avoid it, there are two options:

  1. Don’t stage the files inside the folder docs-source/source/auto_examples/, or

  2. add the folder docs-source/source/auto_examples/ to the file .git/info/exclude to locally exclude the folder from any commit. You can use your IDE git integration to locally exclude files (e.g. PyCharm).

Once you have taken care of the above, do the following:

  1. Install the dependencies needed to build the documentation:

poetry install --with docs

  2. In your terminal, go to the folder docs-source/

  3. Clean the previous build (if any) using

make clean

  4. Build the documentation with

poetry run make html

  5. If there were errors during the build, address them and repeat steps 3-4.

  6. If the build was successful, open the html file located in build/html/index.html and review it, navigating to the section(s) relevant to your new function.

    For macOS users the file can be opened with the following command:

open build/html/index.html

  7. Once satisfied with the documentation, commit and push the changes.

Version your algorithm

Note

This section is only relevant if you are changing an existing function in InDSL.

For industrial applications, consistency and reproducibility of calculation results are of critical importance. For this reason, InDSL keeps a version history of InDSL functions that developers and users can choose from. Older versions can be marked as deprecated to notify users that a new version is available. The example Function versioning demonstrates in more detail how the function versioning works in InDSL.

Do I need to version my algorithm?

You need to version your algorithm if:

  1. You are changing an existing InDSL function, and one of the following conditions holds:

    • The signature of the new function is incompatible with the old function. For instance if a parameter was renamed or a new parameter was added without a default value.

    • The modifications change the function output for any given input.

  2. You are changing a helper function that is used by other InDSL functions. In that case you need to version the helper function and all affected InDSL functions.

Note

In order to avoid code duplication, one should explore if the modifications can be implemented in a backwards-compatible manner (for instance through a new parameter with a default value).
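As a small illustration of a backwards-compatible change, the hypothetical function below gains a new parameter with a default value. Existing calls keep working and return the same output as before, so no new version would be needed under the conditions listed above. The function scale and its parameters are assumptions for illustration.

```python
import pandas as pd


# Before the change, the signature was: scale(data: pd.Series, factor: float)
def scale(data: pd.Series, factor: float, offset: float = 0.0) -> pd.Series:
    """Hypothetical function; `offset` is new and defaults to the old behaviour."""
    return data * factor + offset


data = pd.Series([1.0, 2.0], index=pd.date_range("2023-01-01", periods=2, freq="1h"))

# An existing call without the new parameter gives the same result as before
unchanged = scale(data, 2.0)

# New callers can opt in to the new behaviour
shifted = scale(data, 2.0, offset=1.0)
```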

How do I version my function?

As an example, we consider a function myfunc in mymod.py. A new function version is released through the following steps.

  1. Move the function from mymod.py to mymod_vX.py, where X denotes the current function version. If the function is not versioned yet, create the file mymod_v1.py.

  2. If not already present, add the versioning.register() decorator to the function. Specifically,

    # file: mymod_v1.py
    @check_types
    def myfunc(...):
       # old implementation
    

    becomes:

    # file: mymod_v1.py
    from indsl import versioning
    
    @versioning.register(version="1.0", deprecated=True)
    @check_types
    def myfunc(...):
       # old implementation
    

    Note: The first version of any function must be 1.0! Also note that deprecated=True: InDSL allows at most one non-deprecated version. For functions already in Charts, deprecating all versions will remove the functions from the front-end.

    Note: The check_types decorator must be applied before the versioning.register decorator, i.e. placed below it and directly above the function definition.

  3. Add the new implementation to mymod.py and import mymod_v1.py. The modified mymod.py file will look like:

    # file: mymod.py
    from indsl import versioning
    from . import mymod_v1  # noqa
    
    @versioning.register(version="2.0", changelog="Describe here how the function changed compared to the previous version")
    def myfunc(...):
       # new implementation
    

    Make sure to increment the version number (a single positive integer) of the new implementation. Optionally, non-breaking changes can be versioned. In that case follow the semantic versioning guidelines.

  4. Make sure that all versions of the function myfunc are tested. If the tests of the most recent version are in test_mymod.py, tests for the deprecated function can be placed in test_mymod_v1.py.

Create a pull request

Before a PR is merged, it needs to be approved by one of our internal developers. If you expect to keep on working on your algorithm and are not ready to start the review process, please label the PR as a draft.

To make the review process a better experience, we encourage complying with the following guidelines:

  1. Give your pull request a helpful title. If it is part of a JIRA task in our development backlog, please add the task reference so it can be tracked by our team. If you are fixing a bug or improving documentation, using “BUG <ISSUE TITLE>” and “DOC <DESCRIPTION>” is enough.

  2. Make sure your code passes all the tests. You could run pytest globally, but this is not recommended as it will take a long time as our library grows. Typically, running a few tests only on your new algorithm is enough. For example, if you created a new_algorithm in the smooth toolbox and added the tests test_new_algorithm.py:

    • pytest tests/smooth/test_new_algorithm.py to run the tests specific to your algorithm

    • pytest tests/smooth to run the whole tests for the smooth toolbox module

  3. Make sure your code is properly commented and documented. We cannot stress enough how important documenting your algorithm is for the success of this product.

  4. Make sure the documentation renders properly. For details on how to build the documentation, check our documentation guidelines (WIP). The official documentation will be built and deployed by our CI/CD workflows.

  5. Make sure the function renders properly in the UI. To preview the function node, access the Storybook build results URL, which can be found in the PR comments. In Chromatic, scroll down and inspect the stories for the function.

  6. Add tests to all new algorithms or improvements to algorithms. These tests add robustness to our code base and ensure that future modifications comply with the desired behavior of the algorithm.

  7. Run black to auto-format your code contributions. Our pre-commit will run black for the entire project once you are ready to commit and push to the remote branch, but this can take some time as our code base grows. Therefore, it is good practice to periodically run black only on your new code.

black {source_file_or_directory}

This is not an exhaustive list of requirements or guidelines. If you have suggestions, don’t hesitate to submit an issue or a PR with enhancements to this document.

Finally, once you have completed your new contribution, sync with the remote main branch one last time in case there have been any recent changes to the code base:

git checkout main
git pull
git checkout {my_branch_name}
git merge main

Then use git add, git commit, and git push to record your new algorithm and send it to the remote repository:

git add .
git commit -m "Explicit commit message"
git push

Go to the InDSL repository PR page, start a New pull request and let the review process begin.

Contributing a free form algorithm

It is possible to contribute to InDSL without the algorithm being exposed in the Charts application. In this case, the algorithm will only be available to users who install the InDSL Python package. It should not be included in the __cognite__ attribute of the toolbox __init__.py file. Although the algorithm doesn’t need to meet the requirements mentioned in the previous section, it is still important to document it properly, add all necessary tests and potentially an example to the documentation.

Coding Style

To ensure consistency throughout the code, we recommend using the following style conventions when contributing to the library:

  • Call the time series parameter of your function data unless a more specific name can be given, like pressure or temperature.

  • Use abbreviations when defining the types of function arguments. For example pd. instead of pandas.

Reviewer guidelines

Any InDSL function that is exposed in the Charts application (i.e. any function that is listed in __cognite__ in the __init__.py files), must be reviewed by a member of the Charts development team.