Emerging¶
Module dedicated to emerging patterns
The EPM problem was defined by Dong & Li as the task of finding patterns whose support increases significantly from one dataset to another
MBD-LLBorder¶
-
class
skmine.emerging.
MBDLLBorder
(min_growth_rate=2, min_supp=0.1, n_jobs=1)[source]¶ MBD-LLBorder aims at discovering patterns characterizing the difference between two collections of data.
It first discovers
two sets of maximal itemsets
, one for each collection. It then looks for borders of these sets, and characterizes the difference by only manipulating theses borders.This results in the algorithm only keeping a
concise description (borders)
as an internal state.Last but not least, it enumerates the set of emerging patterns from the borders.
- Parameters
min_growth_rate (int or float, default=2) – A pattern is considered as emerging iff its support in the first collection is at least min_growth_rate times its support in the second collection.
min_supp (int or float, default=.1) – Minimum support in each of the collection Must be a relative support between 0 and 1 Default to 0.1 (10%)
n_jobs (int, default=1) – The number of jobs to use for the computation.
- Variables
borders_ (list of tuple[list[set], set]) – List of pairs representing left and right borders For every pair, emerging patterns will be uncovered by enumerating itemsets from the right side, while checking for non membership in the left side
References
- .. [1]
Guozhu Dong, Jinyan Li “Efficient Mining of Emerging Patterns : Discovering Trends and Differences”, 1999
-
fit
(D, y)[source]¶ fit MBD-LLBorder on D, splitted on y
- This is done in two steps
Discover maximal itemsets for the two disinct collections contained in D
Mine borders from these maximal itemsets
- Parameters
D (pd.Series) – The input transactional database Where every entry contains singular items Items must be both hashable and comparable
y (array-like of shape (n_samples,)) – Targets on which to split D Must contain only two disctinct values, i.e len(np.unique(y)) == 2
-
discover
(min_size=3)[source]¶ Enumerate emerging patterns from borders Subsets are drawn from the right borders, and are accepted iff they do not belong to the corresponding left border.
This implementation is parallel, we consider every couple of right/left border in a separate worker and run the computation
- Parameters
min_size (int) – minimum size for an itemset to be valid