Callback API demo

In this demo, we use the callback API to track useful information while training a SLIM miner.

We define custom Python functions and pass them to skmine.callbacks.CallBacks to create an instance.

[2]:
import skmine

print("This tutorial was tested with the following version of skmine :", skmine.__version__)
This tutorial was tested with the following version of skmine : 1.0.0
[2]:
import pandas as pd
import numpy as np

from skmine.callbacks import CallBacks
from skmine.itemsets import SLIM
from skmine.datasets.fimi import fetch_mushroom
[3]:
mushroom = fetch_mushroom()
mushroom.head()
[3]:
0    [1, 3, 9, 13, 23, 25, 34, 36, 38, 40, 52, 54, ...
1    [2, 3, 9, 14, 23, 26, 34, 36, 39, 40, 52, 55, ...
2    [2, 4, 9, 15, 23, 27, 34, 36, 39, 41, 52, 55, ...
3    [1, 3, 10, 15, 23, 25, 34, 36, 38, 41, 52, 54,...
4    [2, 3, 9, 16, 24, 28, 34, 37, 39, 40, 53, 54, ...
Name: mushroom, dtype: object
[4]:
# cap training at 30 seconds (max_time) so that compressing the dataset does not take too long
slim = SLIM(max_time=30)

Define your own callbacks

We define custom functions that take the return value of the method they target as input, and store those values for later use.
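The idea of a post-call callback can be sketched in plain Python. This is not skmine's implementation: `with_post_callback`, `toy_evaluate` and `record_sizes` are hypothetical names. The sketch also assumes that a tuple result is unpacked into positional arguments before it reaches the callback, which is what the `*args` in `post_evaluate` below suggests.

```python
from functools import wraps


def with_post_callback(func, callback):
    """Hypothetical helper: run func, hand its result to callback, return the result unchanged."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # Assumption: tuple results are unpacked into positional arguments
        if isinstance(result, tuple):
            callback(*result)
        else:
            callback(result)
        return result  # the caller still receives the original return value
    return wrapper


recorded = []


def toy_evaluate(x):
    # Hypothetical evaluate step returning (data_size, model_size, extra)
    return 100.0 - x, 10.0 + x, "extra"


def record_sizes(data_size, model_size, *args):
    # Keep the two sizes and ignore any remaining values
    recorded.append((data_size, model_size))


evaluate = with_post_callback(toy_evaluate, record_sizes)
out = evaluate(3.0)
```

The wrapped function behaves exactly like the original one from the caller's point of view. The callback only observes the result.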

Here we define two methods:

  1. post_evaluate is executed after SLIM.evaluate. It records the sizes of both the data and the model.

  2. post_gen is executed after SLIM.generate_candidates and records the size of the current batch of candidates.

[8]:
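# Lists filled by the callbacks during training
sizes = list()
candidate_sizes = list()

def post_evaluate(data_size, model_size, *args):
    # Receives the result of SLIM.evaluate; keep only the data and model sizes
    sizes.append((data_size, model_size))

def post_gen(candidates):
    # Receives the candidates returned by SLIM.generate_candidates
    candidate_sizes.append(len(candidates))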

A skmine.callbacks.CallBacks is a collection of callbacks.

It maps method names to their callbacks. When a CallBacks instance is called on an object (with the () operator), it looks up the methods with those names on the object and tries to attach the corresponding callbacks to them.
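This attachment step can be sketched with a toy name-to-callback mapping. `ToyCallBacks` and `ToyMiner` are hypothetical stand-ins, not skmine classes. The sketch assumes that names which do not resolve to a callable method are skipped. Each matching method is replaced on the instance by a wrapper that forwards the method's result to the callback.

```python
from functools import wraps


class ToyCallBacks(dict):
    """Hypothetical stand-in for a mapping from method names to callbacks."""

    def __call__(self, obj):
        for name, callback in self.items():
            method = getattr(obj, name, None)
            if not callable(method):
                # Names that do not match a method on obj are skipped
                continue
            # Replace the bound method on this instance with a wrapped version
            setattr(obj, name, self._wrap(method, callback))

    @staticmethod
    def _wrap(method, callback):
        @wraps(method)
        def wrapper(*args, **kwargs):
            result = method(*args, **kwargs)
            if isinstance(result, tuple):
                callback(*result)
            else:
                callback(result)
            return result
        return wrapper


class ToyMiner:
    def generate_candidates(self):
        return [(1, 2), (3,), (4, 5)]

    def fit(self):
        # Two rounds of candidate generation
        for _ in range(2):
            self.generate_candidates()
        return self


batch_sizes = []
toy_callbacks = ToyCallBacks(
    generate_candidates=lambda cands: batch_sizes.append(len(cands)),
    missing_method=print,  # no such method on ToyMiner, so it is ignored
)
toy_miner = ToyMiner()
toy_callbacks(toy_miner)
toy_miner.fit()
```

Because the wrapper is set on the instance, `self.generate_candidates()` inside `fit` picks it up without any change to the miner's code.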

[9]:
callbacks = CallBacks(evaluate=post_evaluate, generate_candidates=post_gen)

callbacks(slim)
warning : `f_name`='set_output' return an error for `callable(getattr(miner, f_name)`
[10]:
%time slim.fit_transform(mushroom)
CPU times: user 30.2 s, sys: 266 ms, total: 30.4 s
Wall time: 30.1 s
[10]:
     itemset                                              usage
0    [2, 23, 28, 34, 36, 39, 53, 56, 59, 63, 85, 86...    864
1    [2, 23, 28, 34, 36, 39, 53, 56, 59, 63, 85, 86...    864
2    [1, 24, 29, 34, 36, 39, 52, 56, 61, 66, 85, 86...    648
3    [1, 24, 29, 34, 36, 39, 52, 56, 61, 66, 85, 86...    648
4    [1, 24, 34, 36, 38, 48, 53, 58, 59, 63, 85, 86...    432
...  ...                                                  ...
155  [22]                                                 16
156  [75]                                                 8
157  [89]                                                 8
158  [8]                                                  4
159  [12]                                                 4

160 rows × 2 columns

Inner view of MDL learning

The plots below show how SLIM compresses the data.

The blue curve represents the size of the data after each call to SLIM.evaluate.

We can distinguish the beginning of a batch of candidates, where the curve drops steeply, from its end, where the curve reaches a plateau.

[11]:
sizes
[11]:
[(1064327.8204536438, 2049.1517753601074),
 (1064327.8204536438, 2049.1517753601074),
 (1019045.3720026016, 2051.0723099708557),
 (1019045.3720026016, 2051.0723099708557),
 (980036.730214119, 2075.213014602661),
 (980036.730214119, 2075.213014602661),
 (948613.6116323471, 2102.6085658073425),
 (948613.6116323471, 2102.6085658073425),
 (927717.8376092911, 2118.527575492859),
 (927717.8376092911, 2118.527575492859),
 (907849.3784189224, 2138.470380783081),
 (907849.3784189224, 2138.470380783081),
 (889519.3478527069, 2157.9796571731567),
 (889519.3478527069, 2157.9796571731567),
 (871974.2353172302, 2192.7303881645203),
 (871974.2353172302, 2192.7303881645203),
 (854607.3957920074, 2208.895321369171),
 (854607.3957920074, 2208.895321369171),
 (838443.3827610016, 2225.1477360725403),
 (838443.3827610016, 2225.1477360725403),
 (823428.2101373672, 2264.49365234375),
 (823428.2101373672, 2264.49365234375),
 (808966.1101493835, 2287.104096889496),
 (808966.1101493835, 2287.104096889496),
 (792759.5926294327, 2293.201919078827),
 (792759.5926294327, 2293.201919078827),
 (780306.3465356827, 2344.104003429413),
 (780306.3465356827, 2344.104003429413),
 (768081.5152606964, 2392.268000602722),
 (768081.5152606964, 2392.268000602722),
 (756362.2376337051, 2409.8285970687866),
 (756362.2376337051, 2409.8285970687866),
 (745031.9830083847, 2421.723343372345),
 (745031.9830083847, 2421.723343372345),
 (734200.0674972534, 2446.078966140747),
 (734200.0674972534, 2446.078966140747),
 (723434.2704172134, 2456.3236446380615),
 (723434.2704172134, 2456.3236446380615),
 (712732.7913103104, 2461.307161808014),
 (712732.7913103104, 2461.307161808014),
 (702638.7939929962, 2530.0034675598145),
 (702638.7939929962, 2530.0034675598145),
 (692757.5342140198, 2553.6208057403564),
 (692757.5342140198, 2553.6208057403564),
 (683180.6969127655, 2565.5511989593506),
 (683180.6969127655, 2565.5511989593506),
 (673874.3940858841, 2619.1964559555054),
 (673874.3940858841, 2619.1964559555054),
 (664669.040602684, 2694.46364402771),
 (664669.040602684, 2694.46364402771),
 (655349.6174497604, 2703.4007120132446),
 (655349.6174497604, 2703.4007120132446),
 (645934.4946928024, 2729.645245552063),
 (645934.4946928024, 2729.645245552063),
 (636988.622294426, 2742.826548576355),
 (636988.622294426, 2742.826548576355),
 (628349.489894867, 2745.7087755203247),
 (628349.489894867, 2745.7087755203247),
 (619778.3561525345, 2791.6579875946045),
 (619778.3561525345, 2791.6579875946045),
 (612687.4520568848, 2765.7485609054565),
 (612687.4520568848, 2765.7485609054565),
 (604839.8522281647, 2794.495764732361),
 (604839.8522281647, 2794.495764732361),
 (597740.2815599442, 2801.642425060272),
 (597740.2815599442, 2801.642425060272),
 (591211.9014139175, 2805.2541251182556),
 (591211.9014139175, 2805.2541251182556),
 (585517.1406850815, 2851.7055792808533),
 (585517.1406850815, 2851.7055792808533),
 (579929.8582677841, 2877.48024892807),
 (579929.8582677841, 2877.48024892807),
 (574430.0130376816, 2916.439881801605),
 (574430.0130376816, 2916.439881801605),
 (569052.748840332, 2923.763541698456),
 (569052.748840332, 2923.763541698456),
 (563976.3026885986, 2941.1488165855408),
 (563976.3026885986, 2941.1488165855408),
 (558968.347108841, 2923.6829719543457),
 (558968.347108841, 2923.6829719543457),
 (554284.5226974487, 3014.9049005508423),
 (554284.5226974487, 3014.9049005508423),
 (549905.7714662552, 3019.4470205307007),
 (549905.7714662552, 3019.4470205307007),
 (544997.0604438782, 3024.4853563308716),
 (544997.0604438782, 3024.4853563308716),
 (540950.6344032288, 3050.9515557289124),
 (540950.6344032288, 3050.9515557289124),
 (537252.2906827927, 3066.4759969711304),
 (537252.2906827927, 3066.4759969711304),
 (532131.9468197823, 3083.5683188438416),
 (532131.9468197823, 3083.5683188438416),
 (527885.6507277489, 3100.84166765213),
 (527885.6507277489, 3100.84166765213),
 (522150.7439804077, 3105.057946205139),
 (522150.7439804077, 3105.057946205139),
 (516739.6784582138, 3125.0889949798584),
 (516739.6784582138, 3125.0889949798584),
 (511636.4079031944, 3152.4172925949097),
 (511636.4079031944, 3152.4172925949097),
 (508022.9669327736, 3199.241506099701),
 (508022.9669327736, 3199.241506099701),
 (504501.93383073807, 3269.2803320884705),
 (504501.93383073807, 3269.2803320884705),
 (500812.76487112045, 3274.8100786209106),
 (500812.76487112045, 3274.8100786209106),
 (497417.9516406059, 3365.587016582489),
 (497417.9516406059, 3365.587016582489),
 (492801.83930683136, 3369.3371634483337),
 (492801.83930683136, 3369.3371634483337),
 (489391.1609711647, 3467.619252681732),
 (489391.1609711647, 3467.619252681732),
 (485460.7992911339, 3472.116714000702),
 (485460.7992911339, 3472.116714000702),
 (482427.1781849861, 3477.5840849876404),
 (482427.1781849861, 3477.5840849876404),
 (479693.1807589531, 3497.0940837860107),
 (479693.1807589531, 3497.0940837860107),
 (476574.6060566902, 3516.7616815567017),
 (476574.6060566902, 3516.7616815567017),
 (473364.8910665512, 3536.486557483673),
 (473364.8910665512, 3536.486557483673),
 (470712.14954042435, 3547.9589014053345),
 (470712.14954042435, 3547.9589014053345),
 (468034.41232585907, 3630.8001165390015),
 (468034.41232585907, 3630.8001165390015),
 (465488.4886255264, 3650.3392448425293),
 (465488.4886255264, 3650.3392448425293),
 (462855.69601774216, 3748.0396962165833),
 (462855.69601774216, 3748.0396962165833),
 (460318.39687108994, 3846.327386379242),
 (460318.39687108994, 3846.327386379242),
 (457533.85771226883, 3850.8916816711426),
 (457533.85771226883, 3850.8916816711426),
 (455091.3696861267, 3860.352683544159),
 (455091.3696861267, 3860.352683544159),
 (451439.63231420517, 3890.251036167145),
 (451439.63231420517, 3890.251036167145),
 (449009.84144067764, 3881.3497910499573),
 (449009.84144067764, 3881.3497910499573),
 (446528.81369543076, 3898.210277557373),
 (446528.81369543076, 3898.210277557373),
 (444225.6063990593, 3904.679087638855),
 (444225.6063990593, 3904.679087638855),
 (441868.0961327553, 3924.2356181144714),
 (441868.0961327553, 3924.2356181144714),
 (439658.52277326584, 3959.026750564575),
 (439658.52277326584, 3959.026750564575),
 (437459.57193517685, 3981.3535590171814),
 (437459.57193517685, 3981.3535590171814),
 (435022.1490917206, 4005.7886300086975),
 (435022.1490917206, 4005.7886300086975),
 (432903.7034549713, 4031.1638164520264),
 (432903.7034549713, 4031.1638164520264),
 (430781.06565475464, 4047.012207508087),
 (430781.06565475464, 4047.012207508087),
 (428734.2130870819, 4052.7625794410706),
 (428734.2130870819, 4052.7625794410706),
 (426716.5517811775, 4061.1834139823914),
 (426716.5517811775, 4061.1834139823914),
 (424729.18721199036, 4069.7247982025146),
 (424729.18721199036, 4069.7247982025146),
 (422736.1689748764, 4099.728016376495),
 (422736.1689748764, 4099.728016376495),
 (420768.4673900604, 4107.899528026581),
 (420768.4673900604, 4107.899528026581),
 (418418.19539165497, 4113.049150466919),
 (418418.19539165497, 4113.049150466919),
 (416493.6862053871, 4118.254427909851),
 (416493.6862053871, 4118.254427909851),
 (414565.99742126465, 4124.003650665283),
 (414565.99742126465, 4124.003650665283),
 (412686.0111503601, 4114.948963165283),
 (412686.0111503601, 4114.948963165283),
 (410796.34900665283, 4121.841695308685),
 (410796.34900665283, 4121.841695308685),
 (408918.20883369446, 4112.796407222748),
 (408918.20883369446, 4112.796407222748),
 (407037.3795146942, 4129.733689785004),
 (407037.3795146942, 4129.733689785004),
 (405187.4784345627, 4138.643919467926),
 (405187.4784345627, 4138.643919467926),
 (403360.82039642334, 4147.217342376709),
 (403360.82039642334, 4147.217342376709),
 (401540.30622291565, 4152.1692061424255),
 (401540.30622291565, 4152.1692061424255),
 (399756.1656303406, 4169.316841125488),
 (399756.1656303406, 4169.316841125488),
 (398016.8108062744, 4188.394077777863),
 (398016.8108062744, 4188.394077777863),
 (396290.0697169304, 4233.978398323059),
 (396290.0697169304, 4233.978398323059),
 (394593.36030578613, 4247.3708510398865),
 (394593.36030578613, 4247.3708510398865)]
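In the list above, every (data_size, model_size) pair appears twice in a row. The reason depends on how SLIM calls evaluate internally, and this sketch makes no assumption about it. The sketch collapses consecutive repeats with the standard library and computes how much the data size shrinks at each step. Large gains correspond to the steep parts of the curve and small gains to the plateaus. Only the first few values are copied here. In the notebook, you would pass the full `sizes` list instead of the hypothetical `sample_sizes`.

```python
from itertools import groupby

# First few (data_size, model_size) pairs copied from the output above
sample_sizes = [
    (1064327.8204536438, 2049.1517753601074),
    (1064327.8204536438, 2049.1517753601074),
    (1019045.3720026016, 2051.0723099708557),
    (1019045.3720026016, 2051.0723099708557),
    (980036.730214119, 2075.213014602661),
    (980036.730214119, 2075.213014602661),
    (948613.6116323471, 2102.6085658073425),
    (948613.6116323471, 2102.6085658073425),
]

# Collapse consecutive identical pairs into a single entry
unique_sizes = [key for key, _ in groupby(sample_sizes)]

# Decrease of the data size between consecutive distinct steps
data_gains = [prev[0] - curr[0] for prev, curr in zip(unique_sizes, unique_sizes[1:])]
```

On these first steps, each gain is smaller than the previous one (about 45,282, then 39,009, then 31,423). This is the flattening that eventually turns into a plateau.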
[12]:
df = pd.DataFrame(sizes, columns=['data_size', 'model_size'])
ax = df.data_size.plot()
[Plot: data_size after each call to SLIM.evaluate]

The model size, in contrast, tends to grow as SLIM adds itemsets to its code table.

[13]:
df.model_size.plot()
[Plot: model_size after each call to SLIM.evaluate]

Finally, here is the evolution of the total size (model size plus data size), which is the quantity SLIM minimizes following the MDL principle.
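In MDL terms, this total is the two-part description length of the data D given a model M, which for SLIM is the code table. L(M) corresponds to `model_size` and L(D | M) to `data_size`, so `total_size` computed below corresponds to:

```latex
L(D, M) = L(M) + L(D \mid M)
```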

[14]:
df['total_size'] = df['model_size']+df['data_size']
df.total_size.plot()
[14]:
df['total_size'] = df['model_size'] + df['data_size']
df.total_size.plot()
[Plot: total_size after each call to SLIM.evaluate]

From the sizes saved in df, we can compute the compression ratio of the dataset: the final total size as a percentage of the initial one.
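As a worked check, the same ratio can be computed by hand from the first and last entries of `sizes`. The total size goes from about 1,066,377 to about 398,841, that is, roughly 37.4% of the original. The name `manual_percentage` is only for illustration.

```python
# First and last (data_size, model_size) pairs, copied from the `sizes` output above
first_data, first_model = 1064327.8204536438, 2049.1517753601074
last_data, last_model = 394593.36030578613, 4247.3708510398865

# Total size = model size + data size
initial_total = first_model + first_data
final_total = last_model + last_data

# Final total size as a percentage of the initial total size
manual_percentage = final_total / initial_total * 100
```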

[15]:
compression_percentage = df['total_size'].iloc[-1]/df['total_size'].iloc[0]*100
compression_percentage
[15]:
37.401476358134936