Contributing¶
Contributions are welcome, and greatly appreciated! You can contribute in many ways.
Report Bugs¶
Report bugs at https://github.com/scikit-mine/scikit-mine/issues.
Please use the issue templates when submitting new issues.
Write Notebooks¶
scikit-mine could always use more showcase notebooks. We often concentrate on implementation details and lack material showing how useful our algorithms can be in real-life situations. Don’t hesitate to bring a little more storytelling to scikit-mine!
Fix Bugs¶
Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.
Implement Features¶
Look through the GitHub projects. Anything listed in the projects is a feature to be implemented.
You can also look through GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.
For more details, check the “Inclusion criteria” section below.
Inclusion criteria¶
Scikit-mine is a library for descriptive analysis, and implements pattern mining algorithms. Only algorithms belonging to this family of algorithms will be accepted.
Inclusion of new algorithms into scikit-mine follows a number of rules, listed here from most to least important:
200+ citations for the main algorithms.
The number of patterns used to describe a set of data should be low. For this, we promote algorithms based on MDL.
A low number of parameters (usually one or two). This is to encourage reproducible experiments.
A technique that provides a clear improvement (enhanced data structures, etc.) on a widely used method will also be considered for inclusion.
Development process¶
If you are a first-time contributor:
Go to https://github.com/scikit-mine/scikit-mine and click the “fork” button to create your own copy of the project.
Clone your fork to your local computer (replace your-username with your GitHub username):
git clone https://github.com/your-username/scikit-mine
Change the directory:
cd scikit-mine
Add the official repository:
git remote add official https://github.com/scikit-mine/scikit-mine
Now, you have remote repositories named:
official, which refers to the scikit-mine repository
origin, which refers to your personal fork
Set up the developer tools (a minimal command sequence is sketched just below):
create a local environment, using pip or conda
run pip install -r requirements.txt && pip install -r dev_requirements.txt
make sure the tests are passing by running make coverage
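For example, assuming you use conda, the whole setup could look like this (the environment name and Python version are only illustrative):
conda create -n skmine-dev python=3.8
conda activate skmine-dev
pip install -r requirements.txt && pip install -r dev_requirements.txt
make coverage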
Develop your contribution
Pull the latest changes from official:
git checkout master
git pull official master
Create a branch for the feature you want to work on. Since the branch name will appear in the merge message, use a sensible name such as ‘periodic-patterns-MDL-v0’:
git checkout -b periodic-patterns-MDL-v0
Don’t forget to update the documentation: edit the .rst files inside docs, run make docs, and open docs/_build/html/index.html with your favourite browser to check the result.
Commit locally as you progress (git add and git commit). We trigger black automatically before any commit (see .pre-commit-config.yaml).
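If black does not run when you commit, the git hook may simply not be installed yet. Assuming the standard pre-commit tool is used (as .pre-commit-config.yaml suggests), installing the hook once is enough:
pip install pre-commit
pre-commit install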
To submit your contribution:
Push your changes back to your fork on GitHub: git push origin periodic-patterns-MDL-v0
Go to GitHub. The new branch will show up with a green Pull Request button - click it.
Explain your changes and ask for a review.
Test coverage¶
To measure the test coverage, install pytest-cov (using pip install pytest-cov) and then run:
$ make coverage
This will print a report with one line for each file in skmine, detailing the test coverage:
Name Stmts Miss Branch BrPart Cover Missing
-----------------------------------------------------------------------------------------
skmine/__init__.py 4 0 0 0 100%
skmine/base.py 46 4 16 2 90% 154, 176, 202->207, 203->206, 206-207
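If you only want the coverage of the module you are working on, you can also call pytest-cov directly; a minimal sketch, with illustrative paths:
pytest --cov=skmine/itemsets --cov-report=term-missing skmine/itemsets/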
Writing a benchmark¶
While not mandatory for most pull requests, we ask that performance-related PRs include a benchmark in order to clearly depict the use case being optimized for. This section mainly refers to the airspeed velocity (asv) documentation.
In this section we will review how to set up the benchmarks, and three commands: asv dev, asv run and asv continuous.
You should have installed asv when running pip install -r dev_requirements.txt.
First of all you should run the command:
asv machine
To write a benchmark, add a Python file in the asv_bench directory which contains a class with one setup method and at least one method prefixed with time_.
Note
In scikit-mine we use asv in a broad manner, i.e. not only to measure time and memory consumption. asv lets us track custom indicators, which we use for MDL-based methods to follow compression ratios and make sure we don’t hurt the quality of our compression schemes from one version to the next.
Take for example the slim benchmark:
from skmine.itemsets import SLIM
from skmine.datasets import make_transactions
from skmine.preprocessing import TransactionEncoder
class SLIMBench:
    # parameter grid: every combination of n_transactions and density is benchmarked
    params = ([20, 1000], [0.3, 0.7])
    param_names = ["n_transactions", "density"]
    # timeout = 20 # timeout for a single run, in seconds
    repeat = (1, 3, 20.0)
    processes = 1

    def setup(self, n_transactions, density):
        # generate a synthetic transactional dataset and encode it before each benchmark
        transactions = make_transactions(
            n_transactions=n_transactions,
            density=density,
            random_state=7,
        )
        self.transactions = TransactionEncoder().fit_transform(transactions)
        self.slim = SLIM()

    def time_fit(self, *args):
        # timed benchmark: fit SLIM on the encoded transactions
        self.slim.fit(self.transactions)

    def track_data_size(self, *args):
        # custom indicator: size of the encoded data, i.e. the quality of the compression
        return self.slim.data_size_
Testing the benchmarks locally¶
Prior to running the true benchmark, it is often worthwhile to test that the code is free of typos. To do so, you may use the command:
asv dev -b slim
where the slim benchmark above is run once in your current environment, to check that everything is in order.
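The asv run command is not detailed here: it runs the benchmarks in the environments managed by asv and records the results for later comparison. A minimal invocation, again restricted to the slim benchmarks, could be:
asv run -b slim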
Comparing results to master¶
Often, the goal of a PR is to compare the speed of the modified code to a snapshot of the code in the master branch of the scikit-mine repository. The command asv continuous is of help here:
$ asv continuous master -b slim
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.6-pandas1.0.3
·· Building f353431e <v0.0.2> for conda-py3.6-pandas1.0.3.
·· Installing f353431e <v0.0.2> into conda-py3.6-pandas1.0.3
· Running 6 total benchmarks (2 commits * 1 environments * 3 benchmarks)
[ 0.00%] · For scikit-mine commit f353431e <v0.0.2> (round 1/1):
[ 0.00%] ·· Benchmarking conda-py3.6-pandas1.0.3
[ 16.67%] ··· slim.SLIMBench.time_fit ok
[ 16.67%] ··· ================ ============= ============ ============= ============
-- density / pruning
---------------- -----------------------------------------------------
n_transactions 0.4 / False 0.4 / True 0.6 / False 0.6 / True
================ ============= ============ ============= ============
20 329±0.03ms 619±3ms 371±2ms 1.21±0s
================ ============= ============ ============= ============