{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## SLIM for market basket analysis\n", "In this example, we are going to train a SLIM model on a transactional database.\n", "\n", "SLIM uses the [Minimum Description Length](https://en.wikipedia.org/wiki/Minimum_description_length) principle\n", "to look for a set of itemsets that compresses the data well. The selected patterns, together with the way they cover each transaction, form a **lossless compression of the original data**.\n", "\n", "As a result, you end up with a compact set of patterns to inspect instead of the full data, which makes the analysis much easier." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This tutorial was tested with the following version of skmine: 1.0.0\n" ] } ], "source": [ "import skmine\n", "\n", "print(\"This tutorial was tested with the following version of skmine:\", skmine.__version__)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from skmine.itemsets import SLIM" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SLIM can be used to perform market basket analysis.\n", "\n", "Here we define a set of transactions, each listing the items a customer bought together in a store." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[['bananas', 'milk'],\n", " ['milk', 'bananas', 'cookies'],\n", " ['cookies', 'butter', 'tea'],\n", " ['tea'],\n", " ['milk', 'bananas', 'tea']]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "D = [\n", " ['bananas', 'milk'],\n", " ['milk', 'bananas', 'cookies'],\n", " ['cookies', 'butter', 'tea'],\n", " ['tea'],\n", " ['milk', 'bananas', 'tea'],\n", "]\n", "D" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
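<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>itemset</th>\n",
"      <th>usage</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>[bananas, milk]</td>\n",
"      <td>3</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>[tea]</td>\n",
"      <td>3</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>[cookies]</td>\n",
"      <td>2</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>[butter]</td>\n",
"      <td>1</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>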
" ], "text/plain": [ " itemset usage\n", "0 [bananas, milk] 3\n", "1 [tea] 3\n", "2 [cookies] 2\n", "3 [butter] 1" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "slim = SLIM(pruning=True)\n", "slim.fit_transform(D)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SLIM summarizes our five transactions with four itemsets. The `usage` column counts how many transactions use each itemset in their encoding: for instance, `[bananas, milk]` covers the first, second and last transactions.\n", "\n", "----------\n", "What if a new customer comes to the store and buys some items?\n", "We add their shopping cart to the data, like so:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[['bananas', 'milk'],\n", " ['milk', 'bananas', 'cookies'],\n", " ['cookies', 'butter', 'tea'],\n", " ['tea'],\n", " ['milk', 'bananas', 'tea'],\n", " ['jelly', 'bananas', 'cookies']]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "D.append(['jelly', 'bananas', 'cookies'])\n", "D" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Simply retraining SLIM on the updated data gives us a fresh summary of our market baskets." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
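<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>itemset</th>\n",
"      <th>usage</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>[bananas, milk]</td>\n",
"      <td>3</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>[bananas, jelly]</td>\n",
"      <td>1</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>[cookies]</td>\n",
"      <td>3</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>[tea]</td>\n",
"      <td>3</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>[butter]</td>\n",
"      <td>1</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>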
" ], "text/plain": [ " itemset usage\n", "0 [bananas, milk] 3\n", "1 [bananas, jelly] 1\n", "2 [cookies] 3\n", "3 [tea] 3\n", "4 [butter] 1" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SLIM().fit_transform(D)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "165px" }, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }