Contributing
This project is a community effort and contributions are welcome. Currently, it is hosted on GitHub; the repository is publicly available but only open for internal contributions at the moment. It is also open for contributions from our customers via the Early Adopter program on Cognite Hub, our community site. If you’re not yet a member of Cognite Hub, please sign up by following the steps in this guide.
The main objective of the InDSL is to provide industrial domain experts and data scientists with a rich library of algorithms to speed up their work. Therefore, we highly encourage data scientists with industrial domain knowledge to contribute algorithms and models within their niche expertise. Nevertheless, we are industry and scientific domain agnostic: we accept any type of algorithm that improves the industrial data science experience and development.
Given the above, we are picky when it comes to adding new algorithms and how we document them. We want to speed up our users’ tasks with algorithms that minimize their exploratory and analytic work. We strive to include methods that save them time and to provide comprehensive documentation for each algorithm. Keep this in mind when developing a new algorithm.
There are multiple ways to contribute, the most common ones are:
New algorithm
Documentation
Examples: Gallery of Charts
Bug reports
We encourage contributing algorithms that are compliant with the Cognite Charts calculations engine; therefore, this guide focuses on the requirements to comply with it. Nevertheless, we also accept other algorithms (not exposed through Cognite Charts), which can be used by installing the Python package in your preferred development environment.
Although the core of this project is the industrial algorithms, improving our documentation and making our library more robust over time is of paramount importance. Please don’t hesitate to submit a GitHub pull request for something as small as a typo.
Contributing a new CHARTS compliant algorithm
For an algorithm to play well with the CHARTS front end (user interface) and the calculations back end, it has to adhere to some function I/O requirements, a documentation (docstrings) format, and a few other requirements to expose the algorithm to the front and back end. The first few basic requirements to keep in mind before developing an algorithm are:
It must belong to a particular toolbox. All the toolboxes are listed under the indsl/ folder.
It must be a Python function (defined with def).
Input data is passed to each algorithm as one or more pd.Series (one for each time series) with a datetime index.
The output must be a pd.Series with a datetime index for it to be displayed on the UI.
The function parameter types allowed are:
Time series: pd.Series
Time series or float: Union[pd.Series, float]
Integer: int
Float: float
Enumeration: Enum
String: str
Timestamp: pd.Timestamp
Timedelta: pd.Timedelta
String option: Literal
List of integers: List[int]
List of floats: List[float]
Optional type: Optional[float]
Note
We currently support Python functions with pd.Series as data inputs and outputs (I/O). This restriction is in place to simplify how the CHARTS infrastructure fetches and displays data. However, expanding support to other input types is on our roadmap.
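To make these requirements concrete, here is a minimal sketch of a compliant function; the name, parameters, and resampling logic are hypothetical and only illustrate the pd.Series-in, pd.Series-out pattern with a datetime index.

import pandas as pd


def my_resample(data: pd.Series, window: pd.Timedelta = pd.Timedelta("1 h")) -> pd.Series:
    """Example resampler

    Takes a time series as a pd.Series with a datetime index and returns a
    pd.Series with a datetime index, as required by the CHARTS back end.
    """
    # Aggregate the input over fixed windows; the result keeps a datetime index.
    return data.resample(window).mean()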
Preliminaries and setup
Note
This project uses Poetry for dependency management. Install it before starting
pip install poetry
Clone the InDSL main repository on GitHub to your local environment.
git clone git@github.com:cognitedata/indsl.git
cd indsl
Install the project dependencies.
poetry install
Synchronize your local master branch with the remote master branch.
git checkout master
git pull origin master
Develop your algorithm
Create a feature branch to work on your new algorithm. Never work on the master or documentation branches.
git checkout -b my_new_algorithm
Install pre-commit to run code style checks before each commit.
poetry run pre-commit install  # Only needed if not installed
poetry run pre-commit run --all-files
If you need any additional module not in the installed dependencies, install it using the add command. If you need the new module for development, use the --dev option:
poetry add new_module
poetry add new_module --dev
Develop the new algorithm on your local branch. Use the exception classes defined in indsl/exceptions.py when raising errors that are caused by invalid or erroneous user input. InDSL provides the @check_types decorator (from typeguard) for run-time type checking, which should be used instead of checking each input type explicitly. When you finish or reach an important milestone, use git add and git commit to record your changes:
git add .
git commit -m "Short but concise commit message with your changes"
If your function is not valid for certain input values, an error must be raised. For example,
def area(length: float) -> float:
    if length < 0:
        raise UserValueError("Length cannot be negative.")
    return length**2
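The @check_types decorator mentioned above can be combined with these user-facing exceptions. The sketch below assumes the decorator is importable from indsl.type_check; verify the exact import path against an existing algorithm in the indsl/ folder.

from indsl.exceptions import UserValueError
from indsl.type_check import check_types  # assumed import path


@check_types
def area(length: float) -> float:
    # typeguard validates at run time that length is a float,
    # so no manual isinstance checks are needed here.
    if length < 0:
        raise UserValueError("Length cannot be negative.")
    return length**2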
As you develop the algorithm, it is good practice to add tests to it. All tests are stored in the root folder tests/, using the same folder structure as the indsl/ folder. We run pytest to verify pull requests before merging them with the master version. Before sending your pull request for review, make sure you have written tests for the algorithm and run them locally to verify they pass.
Note
New algorithms without proper tests will not be merged - help us keep the code coverage at a high level!
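As an illustration, a minimal test for the hypothetical area function above could look like the sketch below; the import path and file location (e.g. tests/my_toolbox/test_area.py) are placeholders.

import pytest

from indsl.exceptions import UserValueError
from indsl.my_toolbox.area import area  # hypothetical import path


def test_area_returns_square():
    assert area(3.0) == 9.0


def test_area_rejects_negative_length():
    # Invalid user input must raise the user-facing exception from indsl/exceptions.py.
    with pytest.raises(UserValueError):
        area(-1.0)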
Document your algorithm
CHARTS compliant algorithms must follow a few simple docstrings formatting requirements for the information to be parsed and properly displayed on the user interface and included in the technical documentation.
Use r"""raw triple double quotes""" docstrings to document your algorithm. This allows using backslashes in the documentation, so LaTeX formulas are properly parsed and rendered. The documentation targets both data science developers and CHARTS users, and the r""" allows us to properly render formulas in the CHARTS UI and in the InDSL documentation. If you are not sure how to document your algorithm, refer to any algorithm in the indsl/ folder for inspiration.
Follow Google Style unless otherwise stated in this guide.
Function name: after the first r""", write a short (1-5 words) descriptive name for your function with no punctuation at the end. This will be the function name displayed on the CHARTS user interface.
Add an empty line break after the title.
Write a comprehensive description of your function. Take care to use full words to describe input arguments. For example, in code you might use poly_order as an argument, but in the description use polynomial order instead.
Parameter names and descriptions: define all the function arguments after Args: by listing all arguments, using tabs to differentiate each one and their respective description. Adhere as closely as possible to the following formatting rules for each parameter name and description:
A parameter name must have 30 characters or less, excluding units defined within square brackets [] (more on this below). Square brackets are only allowed for inputting units in a parameter name. Using brackets within a parameter name for something other than units might generate an error in the pre-commit tests.
It must end with a period punctuation mark (.).
Use LaTeX language for typing formulas, if any, as follows:
Use the command :math:`LaTeX formula` for inline formulas.
Use the command .. math:: for full-line equations.
If a parameter requires specific units, these must be typed as follows:
Enclosed in square brackets []
In Roman (not italic) font. If using LaTeX language, use the :math: inline formula command and the command \mathrm{} to render the units in Roman font.
Placed at the end of the string
For example:
r"""
...
Args:
...
pump_hydraulic_power: Pump hydraulic power [W].
pump_liquid_flowrate: Pump liquid flowrate [:math:`\mathrm{\frac{m^3}{h}}`].
...
This is a basic example of how to document a function:
r"""
...
Args:
data: Time series.
window_length: Window.
Point-wise length of the filter window (i.e. number of data points). A large window results in a stronger
smoothing effect and vice-versa. If the filter window length is not defined by the user, a
length of about 1/5 of the length of time series is set.
polyorder: Polynomial order.
Order of the polynomial used to fit the samples. Must be less than the filter window length.
Hint: A small polynomial order (e.g. 1) results in a stronger data smoothing effect.
Defaults to 1, which typically results in a smoothed time series representing the dominating data trend
and attenuates fluctuations.
Returns:
pd.Series: Time series
If you want, it is possible to add more text here to describe the output.
...
"""
Define the function output after Returns: as shown above.
The above are the minimal requirements to expose the documentation on the user interface and the technical docs, but feel free to add more supported sections.
Go to the docs-source/source/ folder and find the appropriate toolbox rst file (e.g. smooth.rst).
Add a new entry with the name of your function as a subtitle, underlined with the symbol ^.
Add the sphinx directive .. autofunction:: followed by the path to your new algorithm (see the example below). This will autogenerate the documentation from the code docstrings.
.. autofunction:: indsl.smooth.sg
If you have coded an example, add the sphinx directive .. topic:: Examples: and, below it, the sphinx reference to find the autogenerated material (see the example below). The construct is as follows: sphx_glr_auto_examples_{toolbox_folder}_{example_code}.py
.. topic:: Examples:
* :ref:`sphx_glr_auto_examples_smooth_plot_sg_smooth.py`
Front and back end compliance
For the algorithm to be picked up by the front and back end, and to display user-relevant information, take the following steps.
Add human-readable names to each input parameter (not the input data) in your algorithm. These will be displayed on the UI, so avoid using long names or special characters.
Add a technical but human-readable description of your algorithm, the inputs required, what it does, and the expected result. This will be displayed on the UI and targets our users (i.e. domain experts).
Add the @check_types decorator to the functions that contain Python type annotations. This makes sure that the function is always called with inputs of the same type as specified in the function signature.
Add your function to the __init__.py file of the toolbox module your algorithm belongs to. For example, the Savitzky-Golay smoother (indsl.smooth.sg()) belongs to the smooth toolbox. Therefore, we add sg to the list __all__ in the file indsl/smooth/__init__.py.
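As a sketch, assuming a hypothetical algorithm my_algo defined in indsl/smooth/my_module.py, the toolbox __init__.py would be extended along these lines (mirror the existing imports in the real file):

# file: indsl/smooth/__init__.py (sketch)
from .my_module import my_algo  # hypothetical module and function

__all__ = [
    # ... existing algorithm names, e.g. "sg" ...
    "my_algo",
]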
This would be a good time to push your changes to the remote repository.
Add an example to the Gallery of Charts
The Gallery of Charts is an auto-generated collection of examples of our industrial data science algorithms. Following the steps below, your example will be automatically added to the gallery. We take care of auto-generating the figures, adding the code to the gallery, and linking to downloadable Python and notebook versions of your code for other data scientists to use or get inspired by (sharing is caring!). We use Sphinx-Gallery for this purpose; check its documentation if you want to find out more about what you can do when generating your example.
We want to offer our users and developers as much information as possible about our industrial algorithms. Therefore, we strongly encourage all data scientists and developers to include one or more examples (license to go crazy here) to show off all the amazing features and functionalities of your new algorithm and how it can be used.
Clone the InDSL repo and create your own local branch.
Go to the toolbox folder in examples/ where your algorithm belongs (e.g. smooth).
Create a new Python file with the prefix plot_, for example plot_my_new_algo_feature.py.
At the top of the file, add a triple-quoted docstring that starts with the title of your example enclosed by top and bottom equal symbols (as shown below), followed by a description of your example. For inspiration, check the Gallery of Charts or one of the examples in the repository (e.g. examples/smooth/plot_sg_smooth.py).
"""
=============
Example title
=============
Description of the example and what feature of the algorithm I'm showing off.
"""
import pandas as pd
...
Once you are done developing the example, record your changes using git add <path_to_file>, git commit -m <commit_message>, and git push -u origin <your_branch_name>.
You can test the Sphinx build of your PR by following the steps in the section below.
Verify documentation build
It is highly recommended to check that the documentation for your new function is built and displayed correctly. Note that you will need all of the following Sphinx Python libraries to successfully build the documentation (these packages can be installed with pip):
* sphinx-gallery
* sphinx
* sphinx-prompt
* sphinx-rtd-theme
While testing the build, some files that should not be committed to the remote repository will be autogenerated in the folder docs-source/source/auto_examples/. If these are committed, nothing will really happen, except that the PR will probably be longer than expected and could confuse the reviewers if they are not aware of this. To avoid it, there are two options:
Don’t stage the files inside the folder docs-source/source/auto_examples/, or
add the folder docs-source/source/auto_examples/ to the file .git/info/exclude to locally exclude the folder from any commit. You can use your IDE’s git integration to locally exclude files (e.g. PyCharm).
Once you have taken care of the above, do the following:
In your terminal, go to the folder docs-source/
Clean the previous build (if any) using
make clean
Build the documentation with
make html
If there were errors during the build, address them and repeat steps 2-3.
If the build was successful, open the html file located in build/html/index.html and review it, navigating to the section(s) relevant to your new function.
For Mac users, the file can be opened with the following command:
open build/html/index.html
Once satisfied with the documentation, commit and push the changes.
Version your algorithm
Note
This section is only relevant if you are changing an existing function in InDSL.
For industrial applications, consistency and reproducibility of calculation results are of critical importance. For this reason, InDSL keeps a version history of its functions that developers and users can choose from. Older versions can be marked as deprecated to notify users that a new version is available. The example Function versioning demonstrates in more detail how function versioning works in InDSL.
Do I need to version my algorithm?
You need to version your algorithm if:
You are changing an existing InDSL function, and one of the following conditions holds:
The signature of the new function is incompatible with the old function, for instance if a parameter was renamed or a new parameter was added without a default value.
The modifications change the function output for any given input.
You are changing a helper function that is used by other InDSL functions. In that case you need to version the helper function and all affected InDSL functions.
Note
In order to avoid code duplication, one should explore if the modifications can be implemented in a backwards-compatible manner (for instance through a new parameter with a default value).
How do I version my function?
As an example, we consider a function myfunc in mymod.py. A new function version is released through the following steps.
Move the function from mymod.py to mymod_.py. Create the file if it does not yet exist.
If not already present, add the versioning.register() decorator to the function. Specifically,
# file: mymod_.py

def myfunc(...)
    # old implementation
becomes:
# file: mymod_.py
from indsl import versioning


@versioning.register(version="1", deprecated=True)
def myfunc(...)
    # old implementation
Note: The first version of any function must be 1.0! Also note that deprecated=True: InDSL allows at most one non-deprecated version. For functions already in CHARTS, deprecating all versions will remove the functions from the front end.
If there is more than one deprecated version, the different versions can be given different names in order to avoid name conflicts. This can be achieved by setting the parameter name:
# file: mymod_.py
from indsl import versioning


@versioning.register(version="1", deprecated=True, name="myfunc")
def myfunc_v1(...)
    # first implementation


@versioning.register(version="2", deprecated=True)
def myfunc(...)
    # second implementation
Add the new implementation to mymod.py and import mymod_.py. The modified mymod.py file will look like:
# file: mymod.py
from indsl import versioning

from . import mymod_  # noqa


@versioning.register(version="3", changelog="Describe here how the function changed compared to the previous version")
def myfunc(...)
    # new implementation
Make sure to increment the version number (a single positive integer) of the new implementation. Optionally, non-breaking changes can be versioned. In that case follow the semantic versioning guidelines.
Make sure all versions of the function myfunc are tested. If the tests of the most recent version are in test_mymod.py, tests for older versions can be placed in test_mymod_.py.
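A minimal sketch of such a test file is shown below; it assumes the deprecated implementation can still be imported directly from the versioned module mymod_.py, and all names (including the call signature) are placeholders from the example above.

# file: test_mymod_.py
import pandas as pd

from indsl.mymod_ import myfunc as myfunc_v1  # hypothetical import of the old implementation


def test_myfunc_v1_output_unchanged():
    data = pd.Series(
        [1.0, 2.0, 3.0],
        index=pd.date_range("2022-01-01", periods=3, freq="1h"),
    )
    # The deprecated version must keep producing the same output for the same input.
    result = myfunc_v1(data)
    assert isinstance(result, pd.Series)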
Create a pull request
Before a PR is merged, it needs to be approved by our internal developers. If you expect to keep working on your algorithm and are not ready to start the review process, please label the PR as a draft.
To make the review process a better experience, we encourage complying with the following guidelines:
Give your pull request a helpful title. If it is part of a JIRA task in our development backlog, please add the task reference so it can be tracked by our team. If you are fixing a bug or improving documentation, using “BUG <ISSUE TITLE>” and “DOC <DESCRIPTION>” is enough.
Make sure your code passes all the tests. You could run pytest globally, but this is not recommended as it will take a long time as our library grows. Typically, running only a few tests on your new algorithm is enough. For example, if you created a new_algorithm in the smooth toolbox and added the tests in test_new_algorithm.py, run:
pytest tests/smooth/test_new_algorithm.py
to run the tests specific to your algorithm, or
pytest tests/smooth
to run all the tests for the smooth toolbox module.
Make sure your code is properly commented and documented. We cannot highlight enough how important documenting your algorithm is for the success of this product.
Make sure the documentation renders properly. For details on how to build the documentation, check our documentation guidelines (WIP). The official documentation will be built and deployed by our CI/CD workflows.
Add tests to all new algorithms or improvements to algorithms. These tests add robustness to our code base and ensure that future modifications comply with the desired behavior of the algorithm.
Run black to auto-format your code contributions. Our pre-commit hook will run black for the entire project once you are ready to commit and push to the remote branch, but this can take some time as our code base grows. Therefore, it is good practice to periodically run black only on your new code:
black {source_file_or_directory}
This is not an exhaustive list of requirements or guidelines. If you have suggestions, don’t hesitate to submit an issue or a PR with enhancements to this document.
Finally, once you have completed your new contribution, sync with the remote master branch one last time in case there have been any recent changes to the code base:
git checkout master
git pull
git checkout {my_branch_name}
git merge master
Then use git add, git commit, and git push to record your new algorithm and send it to the remote repository:
git add .
git commit -m "Explicit commit message"
git push
Go to the InDSL repository PR page, start a new pull request, and let the review process begin.
Coding Style
To ensure consistency throughout the code, we recommend using the following style conventions when contributing to the library:
Call the time series parameter of your function data, unless a more specific name can be given, like pressure or temperature.
Use abbreviations when defining the types of function arguments. For example, pd. instead of pandas.
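For instance, a hypothetical function signature following these conventions could look like:

import pandas as pd


def my_smoother(data: pd.Series, window_length: int = 20) -> pd.Series:
    # "data" is the generic time series input; "pd." abbreviates "pandas" in the type hints.
    return data.rolling(window_length, min_periods=1).mean()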
Reviewer guidelines
Any InDSL function that is exposed in the CHARTS application (i.e. any function that is listed in the __init__.py files) must be reviewed by a member of the CHARTS development team.