SLIM for market basket analysis

In this example, we train a SLIM model on a transactional database.

SLIM uses the Minimum Description Length (MDL) principle to make pattern mining easier: the patterns it returns form a lossless compression of the original data.

You end up with a small set of patterns to inspect instead of the full data, which makes the analysis easier.
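For readers who want the MDL idea in symbols, the two-part description length used in the Krimp/SLIM literature can be written as below. This is a reference sketch, not something taken from skmine's code: the symbols $\mathcal{D}$ (the database), $CT$ (the code table of itemsets) and $\mathrm{usage}$ (how often an itemset is used when covering the data) are notation introduced here.

```latex
L(\mathcal{D}, CT) = L(CT \mid \mathcal{D}) + L(\mathcal{D} \mid CT),
\qquad
L(X \mid CT) = -\log_2 \frac{\mathrm{usage}(X)}{\sum_{Y \in CT} \mathrm{usage}(Y)}
```

The second expression is the code length assigned to an itemset $X$: the more often an itemset is used, the shorter its code, so a good set of patterns is one that makes the total length small.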

[2]:

import skmine

print("This tutorial was tested with the following version of skmine :", skmine.__version__)

This tutorial was tested with the following version of skmine : 1.0.0

[2]:

import pandas as pd
from skmine.itemsets import SLIM

SLIM can be used to perform market basket analysis.

Here we define a set of transactions, each listing the items bought together in a store.

[3]:

D = [
    ['bananas', 'milk'],
    ['milk', 'bananas', 'cookies'],
    ['cookies', 'butter', 'tea'],
    ['tea'],
    ['milk', 'bananas', 'tea'],
]
D

[3]:

[['bananas', 'milk'],
 ['milk', 'bananas', 'cookies'],
 ['cookies', 'butter', 'tea'],
 ['tea'],
 ['milk', 'bananas', 'tea']]
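Before mining patterns, it can help to see how often each single item occurs. The sketch below uses only the standard library and does not rely on skmine; the name `item_counts` is introduced here for illustration.

```python
from collections import Counter

# Same transactions as in the tutorial
D = [
    ['bananas', 'milk'],
    ['milk', 'bananas', 'cookies'],
    ['cookies', 'butter', 'tea'],
    ['tea'],
    ['milk', 'bananas', 'tea'],
]

# Count in how many transactions each item appears
# (set() guards against an item listed twice in one basket)
item_counts = Counter(item for transaction in D for item in set(transaction))

for item, count in item_counts.most_common():
    print(item, count)
```

Items that frequently appear in the same baskets, such as bananas and milk here, are natural candidates for a combined pattern.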

[4]:

slim = SLIM(pruning=True)
slim.fit_transform(D)

[4]:

	itemset	usage
0	[bananas, milk]	3
1	[tea]	3
2	[cookies]	2
3	[butter]	1
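To see what "lossless" means for this table, the sketch below covers every transaction with the itemsets above and recounts how often each one is used. It is a simplified illustration, not skmine's implementation: the helper `cover` is hypothetical, and it assumes that itemsets are tried in the order in which the table lists them.

```python
from collections import Counter

D = [
    ['bananas', 'milk'],
    ['milk', 'bananas', 'cookies'],
    ['cookies', 'butter', 'tea'],
    ['tea'],
    ['milk', 'bananas', 'tea'],
]

# Itemsets copied from the table above, in the same order (assumed cover order)
code_table = [
    frozenset({'bananas', 'milk'}),
    frozenset({'tea'}),
    frozenset({'cookies'}),
    frozenset({'butter'}),
]

def cover(transaction, code_table):
    """Greedily cover a transaction with non-overlapping itemsets."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:  # itemset fits entirely in what is left
            used.append(itemset)
            remaining -= itemset
    return used, remaining

usage = Counter()
for transaction in D:
    used, remaining = cover(transaction, code_table)
    # Lossless: every item of the transaction is accounted for
    assert not remaining
    usage.update(used)

for itemset in code_table:
    print(sorted(itemset), usage[itemset])
```

Under these assumptions, the recomputed counts agree with the `usage` column of the table.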

What if a new customer comes to the store and buys some items? We append their shopping cart to the data, like so:

[5]:

D.append(['jelly', 'bananas', 'cookies'])
D

[5]:

[['bananas', 'milk'],
 ['milk', 'bananas', 'cookies'],
 ['cookies', 'butter', 'tea'],
 ['tea'],
 ['milk', 'bananas', 'tea'],
 ['jelly', 'bananas', 'cookies']]

Retraining SLIM on the extended data gives us an updated summary of our market baskets.

[6]:

SLIM().fit_transform(D)

[6]:

	itemset	usage
0	[bananas, milk]	3
1	[bananas, jelly]	1
2	[cookies]	3
3	[tea]	3
4	[butter]	1
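As a rough check that these patterns do compress the data, the sketch below compares the number of bits needed to encode the six transactions with the itemsets above against a baseline that uses one code per single item. It assumes Shannon-optimal code lengths derived from usage counts, as in the Krimp/SLIM literature, and it deliberately ignores the cost of storing the code table itself. The function `encoded_size` and the variable names are hypothetical and not part of skmine.

```python
from math import log2

def encoded_size(usages):
    """Bits needed to encode the data when each code is used `usage` times."""
    total = sum(usages.values())
    return sum(-u * log2(u / total) for u in usages.values())

# Usage column copied from the table above
pattern_usage = {
    ('bananas', 'milk'): 3,
    ('bananas', 'jelly'): 1,
    ('cookies',): 3,
    ('tea',): 3,
    ('butter',): 1,
}

# Baseline: one code per item, used once per transaction containing that item
singleton_usage = {
    'bananas': 4,
    'milk': 3,
    'cookies': 3,
    'tea': 3,
    'butter': 1,
    'jelly': 1,
}

pattern_bits = encoded_size(pattern_usage)
singleton_bits = encoded_size(singleton_usage)

print(f"with patterns:   {pattern_bits:.2f} bits")
print(f"singletons only: {singleton_bits:.2f} bits")
```

In this simplified accounting, the encoding based on the mined itemsets is shorter than the singleton baseline, which is the sense in which the summary compresses the baskets.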