SLIM for market basket analysis

In this example, we train a SLIM model on a transactional database.

SLIM uses the Minimum Description Length (MDL) principle to select patterns: the itemsets it keeps are the ones that give a compact, lossless compression of the original data.

You end up with less data to consider, which makes the analysis easier.
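To make the compression idea concrete, the sketch below compares the encoded size of a toy database under two code tables: one with single items only, and one that also contains the pattern {bananas, milk}. It is a simplified illustration and not skmine's implementation: the helpers `cover` and `encoded_size` are hypothetical names, the cover is a plain greedy pass, and the cost of storing the code table itself is ignored.

```python
from collections import Counter
from math import log2

# Toy transactions, used for illustration only.
transactions = [
    {"bananas", "milk"},
    {"milk", "bananas", "cookies"},
    {"cookies", "butter", "tea"},
    {"tea"},
    {"milk", "bananas", "tea"},
]


def cover(transaction, code_table):
    """Greedily cover a transaction with disjoint itemsets, in code table order."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    return used


def encoded_size(transactions, code_table):
    """Bits needed to encode the data when each itemset gets a code of
    length -log2(usage / total usage). The code table cost is ignored."""
    usage = Counter()
    for transaction in transactions:
        usage.update(cover(transaction, code_table))
    total = sum(usage.values())
    return sum(-count * log2(count / total) for count in usage.values())


items = sorted(set().union(*transactions))
singletons = [frozenset([item]) for item in items]
with_pattern = [frozenset({"bananas", "milk"})] + singletons

size_singletons = encoded_size(transactions, singletons)
size_with_pattern = encoded_size(transactions, with_pattern)
print(f"singletons only: {size_singletons:.2f} bits")
print(f"with pattern:    {size_with_pattern:.2f} bits")
```

Adding the pattern shortens the encoding because three transactions can be described with one code instead of two. Under MDL, that is the signal that the pattern is worth keeping.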

[2]:
import skmine

print("This tutorial was tested with the following version of skmine :", skmine.__version__)
This tutorial was tested with the following version of skmine : 1.0.0
[2]:
import pandas as pd
from skmine.itemsets import SLIM

SLIM can be used to perform market basket analysis.

Here we define a set of transactions, each containing the items bought together in a store.

[3]:
D = [
    ['bananas', 'milk'],
    ['milk', 'bananas', 'cookies'],
    ['cookies', 'butter', 'tea'],
    ['tea'],
    ['milk', 'bananas', 'tea'],
]
D
[3]:
[['bananas', 'milk'],
 ['milk', 'bananas', 'cookies'],
 ['cookies', 'butter', 'tea'],
 ['tea'],
 ['milk', 'bananas', 'tea']]
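Before mining, it can help to check how often each item occurs. A minimal sketch using only the standard library, where `D` repeats the transactions defined above and `item_counts` is a name chosen for illustration:

```python
from collections import Counter

D = [
    ['bananas', 'milk'],
    ['milk', 'bananas', 'cookies'],
    ['cookies', 'butter', 'tea'],
    ['tea'],
    ['milk', 'bananas', 'tea'],
]

# Count the number of transactions each item appears in
# (set() ensures an item is counted once per transaction).
item_counts = Counter(item for transaction in D for item in set(transaction))
print(item_counts.most_common())
```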
[4]:
slim = SLIM(pruning=True)
slim.fit_transform(D)
[4]:
           itemset  usage
0  [bananas, milk]      3
1            [tea]      3
2        [cookies]      2
3         [butter]      1
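One way to read the `usage` column: it counts how many times each itemset is used when every transaction is covered by disjoint itemsets from the summary. The sketch below reproduces the usages shown above with a simple greedy cover, then checks that the covers rebuild the original transactions exactly. The `cover` helper and the fixed itemset order are assumptions made for illustration, not skmine's actual cover routine.

```python
from collections import Counter

D = [
    ['bananas', 'milk'],
    ['milk', 'bananas', 'cookies'],
    ['cookies', 'butter', 'tea'],
    ['tea'],
    ['milk', 'bananas', 'tea'],
]

# Itemsets copied by hand from the summary printed above.
code_table = [
    frozenset({'bananas', 'milk'}),
    frozenset({'tea'}),
    frozenset({'cookies'}),
    frozenset({'butter'}),
]


def cover(transaction, code_table):
    """Greedily cover a transaction with disjoint itemsets, in code table order."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    return used


covers = [cover(transaction, code_table) for transaction in D]
usage = Counter(itemset for c in covers for itemset in c)
for itemset in code_table:
    print(sorted(itemset), usage[itemset])

# Lossless: the union of each cover gives back the original transaction.
rebuilt = [set().union(*c) for c in covers]
print(rebuilt == [set(transaction) for transaction in D])
```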

What if a new customer comes to the store and buys some items? We add their shopping cart to the data, like so:

[5]:
D.append(['jelly', 'bananas', 'cookies'])
D
[5]:
[['bananas', 'milk'],
 ['milk', 'bananas', 'cookies'],
 ['cookies', 'butter', 'tea'],
 ['tea'],
 ['milk', 'bananas', 'tea'],
 ['jelly', 'bananas', 'cookies']]

Simply retraining SLIM on the extended data gives us an updated summary of our market baskets.

[6]:
SLIM().fit_transform(D)
[6]:
            itemset  usage
0   [bananas, milk]      3
1  [bananas, jelly]      1
2         [cookies]      3
3             [tea]      3
4          [butter]      1
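To see what the new basket changed, the two summaries can be compared directly. In this sketch the usages are copied by hand from the two outputs above into plain dictionaries. The names `before`, `after`, `new_itemsets` and `changed` are illustrative only.

```python
# Usages copied from the summary obtained before the new basket was added.
before = {
    ("bananas", "milk"): 3,
    ("tea",): 3,
    ("cookies",): 2,
    ("butter",): 1,
}
# Usages copied from the summary obtained after retraining.
after = {
    ("bananas", "milk"): 3,
    ("bananas", "jelly"): 1,
    ("cookies",): 3,
    ("tea",): 3,
    ("butter",): 1,
}

# Itemsets that only appear in the retrained summary.
new_itemsets = sorted(set(after) - set(before))
# Itemsets present in both summaries whose usage differs: (old, new).
changed = {
    itemset: (before[itemset], after[itemset])
    for itemset in before
    if itemset in after and before[itemset] != after[itemset]
}
print("new itemsets:", new_itemsets)
print("changed usage:", changed)
```

Both differences are consistent with the new basket `['jelly', 'bananas', 'cookies']` being covered by `[bananas, jelly]` and `[cookies]`.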