Linear time Closed itemset Miner (LCM)

LCM looks for closed itemsets with respect to an input minimum support.
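An itemset is frequent when its support (the number of transactions containing all of its items) reaches the minimum support. It is closed when no proper superset has the same support. The brute-force sketch below makes both definitions concrete on a tiny hand-made dataset (hypothetical, not from skmine). Its helper names `support` and `closed_itemsets` are illustrative only and are not part of skmine. The sketch tests every candidate, so its cost is exponential in the number of items. LCM avoids this by enumerating closed itemsets directly.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def closed_itemsets(transactions, min_supp):
    """Brute-force miner (toy data only): return {frozenset: support} of closed frequent itemsets."""
    items = sorted(set().union(*transactions))
    frequent = {}
    # Enumerate every non-empty candidate itemset and keep the frequent ones.
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(frozenset(combo), transactions)
            if s >= min_supp:
                frequent[frozenset(combo)] = s
    # Closed: no proper superset has the same support. Checking only frequent
    # supersets is enough, since an equal-support superset is itself frequent.
    return {
        iset: s
        for iset, s in frequent.items()
        if not any(iset < other and s == s_other
                   for other, s_other in frequent.items())
    }

# Hypothetical toy transactions.
transactions = [frozenset(t) for t in ([1, 2, 3], [1, 2], [2, 3], [1, 2, 3])]

result = closed_itemsets(transactions, min_supp=2)
for iset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(iset), s)
# [2] 4
# [1, 2] 3
# [2, 3] 3
# [1, 2, 3] 2
```

Here {1} is frequent (support 3) but not closed, because its superset {1, 2} also has support 3.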

[1]:
import skmine

print("This tutorial was tested with the following version of skmine :", skmine.__version__)
This tutorial was tested with the following version of skmine : 1.0.0

Load the chess dataset.

[2]:
from skmine.datasets.fimi import fetch_chess
chess = fetch_chess()
chess.head()
[2]:
0    [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25...
1    [1, 3, 5, 7, 9, 12, 13, 15, 17, 19, 21, 23, 25...
2    [1, 3, 5, 7, 9, 12, 13, 16, 17, 19, 21, 23, 25...
3    [1, 3, 5, 7, 9, 11, 13, 15, 17, 20, 21, 23, 25...
4    [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25...
Name: chess, dtype: object
[3]:
chess.shape
[3]:
(3196,)

fit_transform()

fit_transform makes pattern discovery more user-friendly by outputting pretty-formatted patterns instead of the traditional tabular format used in the scikit-learn community. The result is a DataFrame with one row per pattern and two columns, itemset and support.

[4]:
from skmine.itemsets import LCM
lcm = LCM(min_supp=2000, n_jobs=4)
# minimum support of 2000, running on 4 processes
%time patterns = lcm.fit_transform(chess)
CPU times: user 74.8 ms, sys: 30.1 ms, total: 105 ms
Wall time: 1.75 s
[5]:
patterns.shape
[5]:
(68967, 2)

This format makes post hoc analysis easier.

Here we keep only the patterns whose length is strictly greater than 3.

[6]:
patterns[patterns.itemset.map(len) > 3]
[6]:
itemset support
14 [29, 40, 52, 58] 3143
22 [29, 52, 58, 60] 3124
26 [40, 52, 58, 60] 3112
28 [29, 40, 58, 60] 3110
29 [29, 40, 52, 60] 3100
... ... ...
68960 [15, 52, 58, 60] 2003
68962 [15, 29, 58, 60] 2002
68964 [29, 40, 58, 70] 2006
68965 [29, 40, 52, 70] 2001
68966 [29, 40, 52, 58, 70] 2000

67716 rows × 2 columns
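Because `patterns` is a pandas DataFrame, other post hoc queries are also boolean masks or derived columns. The sketch below uses a small hand-made DataFrame with the same two columns (`itemset`, `support`). The rows and the transaction count are hypothetical and are not taken from the chess results.

```python
import pandas as pd

# Hand-made stand-in for `patterns`: same columns, hypothetical rows.
toy = pd.DataFrame({
    "itemset": [[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5, 6]],
    "support": [9, 8, 6, 5, 4],
})
n_transactions = 10  # hypothetical number of transactions

# Patterns with strictly more than 3 items (same filter as above).
long_patterns = toy[toy.itemset.map(len) > 3]

# Patterns containing a given item, here item 3.
with_3 = toy[toy.itemset.map(lambda items: 3 in items)]

# Relative support: fraction of transactions containing the pattern.
toy = toy.assign(rel_support=toy.support / n_transactions)

print(long_patterns)
print(with_3.itemset.tolist())  # [[1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5, 6]]
```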

Note

Even with a very high minimum support threshold (2000, i.e. about 63% of the transactions), we discovered more than 60K patterns from only 3196 original transactions. This is a good illustration of the so-called pattern explosion problem.
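One reason for the explosion is anti-monotonicity: every subset of a frequent itemset is at least as frequent. A single frequent itemset of length k therefore implies 2^k - 1 frequent non-empty subsets, although not all of them are closed. The sketch below computes the relative threshold used above and counts the non-empty subsets of the longest pattern shown, [29, 40, 52, 58, 70]. The helper `n_nonempty_subsets` is hypothetical and only for illustration.

```python
from itertools import combinations

n_transactions = 3196  # rows in the chess dataset
min_supp = 2000        # absolute threshold passed to LCM above
print(f"relative minimum support: {min_supp / n_transactions:.1%}")  # 62.6%

def n_nonempty_subsets(itemset):
    """Count non-empty subsets by explicit enumeration (equals 2**len - 1)."""
    return sum(1 for k in range(1, len(itemset) + 1)
               for _ in combinations(itemset, k))

longest = [29, 40, 52, 58, 70]  # support 2000 >= min_supp, so all its subsets are frequent
print(n_nonempty_subsets(longest), 2 ** len(longest) - 1)  # 31 31
```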


We can also get the top-k patterns in terms of support with a single line of code.

[7]:
patterns.nlargest(10, columns=['support'])  # top 10 patterns
[7]:
itemset support
0 [58] 3195
1 [52] 3185
2 [52, 58] 3184
3 [29] 3181
4 [29, 58] 3180
5 [29, 52] 3170
7 [40] 3170
6 [29, 52, 58] 3169
8 [40, 58] 3169
9 [40, 52] 3159
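Supports often tie. In the output above, [29, 52] and [40] both have support 3170. `DataFrame.nlargest` defaults to `keep="first"`, which breaks ties at the cut-off by order of appearance. pandas also accepts `keep="all"`, which keeps every row tied at the cut-off and may return more than n rows. The following minimal sketch uses a hand-made DataFrame with hypothetical values:

```python
import pandas as pd

# Hypothetical patterns with tied supports (not chess results).
toy = pd.DataFrame({
    "itemset": [[1], [2], [1, 2], [3], [1, 3]],
    "support": [9, 8, 8, 7, 7],
})

top2_first = toy.nlargest(2, columns="support")            # default keep="first"
top2_all = toy.nlargest(2, columns="support", keep="all")  # keeps all rows tied at the cut-off

print(len(top2_first), len(top2_all))  # 2 3
```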