Periodic

Models and functions to mine periodic patterns

PeriodicPatternMiner

class skmine.periodic.cycles.PeriodicPatternMiner(complex=True, auto_time_scale=True)

Mining periodic cycles with an MDL criterion

PeriodicPatternMiner mines periodic cycles from event logs, using a Minimum Description Length (MDL) criterion to evaluate candidate cycles. The goal is to extract a set of cycles that characterizes the periodic structure present in the data.

A cycle is defined as a 5-tuple of the form
\[(\alpha, r, p, \tau, E)\]

where

  • \(\alpha\) is the repeating event

  • \(r\) is the number of repetitions of the event, called the cycle length

  • \(p\) is the inter-occurrence distance, called the cycle period

  • \(\tau\) is the time (index value) of the first occurrence, called the cycle starting point

  • \(E\) is a list of \(r - 1\) signed integer offsets, i.e. the cycle shift corrections
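The sketch below shows how the occurrence times of a cycle can be recovered from \(\tau\), \(p\) and \(E\). It assumes the corrections are chained: each occurrence is expected one period after the previous actual occurrence, and each offset in \(E\) is the deviation from that expectation. The list E = [0, 0, -1, 1] is not printed by the doctest; it is one list of corrections that, under this assumption, reproduces the doctest row (t0=20, r=5, p=20, sum_E=2) from the sample timestamps. The function name `cycle_occurrences` is hypothetical and not part of skmine.

```python
def cycle_occurrences(tau, p, E):
    """Hypothetical helper: return the r = len(E) + 1 timestamps of a cycle.

    Assumes chained shift corrections: occurrence i is expected at the
    previous actual occurrence plus p, and E[i - 1] is the offset from it.
    """
    times = [tau]
    for e in E:
        times.append(times[-1] + p + e)
    return times


# Values consistent with the doctest row (ring_a_bell)[r=5 p=20], t0=20
tau, p, E = 20, 20, [0, 0, -1, 1]
occurrences = cycle_occurrences(tau, p, E)
r = len(occurrences)              # cycle length
sum_E = sum(abs(e) for e in E)    # sum of absolute shift corrections
print(occurrences, r, sum_E)      # [20, 40, 60, 79, 100] 5 2
```

Under this reading, all five covered timestamps belong to the sample Series, while 10, 32 and 240 are left as residuals.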

Parameters
  • complex (bool, default=True) – True: mine complex patterns built from horizontal and vertical combinations of cycles. False: mine only simple cycles.

  • auto_time_scale (bool, default=True) – True: preprocess the nanosecond time index, automatically computing the time scale used for mining by removing trailing zeros from the timestamps and, where possible, converting the unit from seconds to a larger one. False: no preprocessing of the nanosecond time index.
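The zero-removal step of auto_time_scale can be illustrated with a minimal sketch. This is not skmine's implementation: the helper `strip_trailing_zeros` is hypothetical, and it covers only the removal of trailing zeros, not the change of unit. It takes integer nanosecond timestamps (for a pd.DatetimeIndex these are available through its `asi8` attribute) and divides them by the largest power of ten that divides all of them.

```python
def strip_trailing_zeros(timestamps_ns):
    """Hypothetical helper, not skmine's implementation.

    Divide integer nanosecond timestamps by the largest power of ten that
    divides all of them; return the rescaled values and the factor used.
    """
    values = [int(t) for t in timestamps_ns]
    factor = 1
    if any(values):  # an all-zero input has no meaningful scale; leave it as is
        while all(v % (factor * 10) == 0 for v in values):
            factor *= 10
    return [v // factor for v in values], factor


# Three events 1, 3 and 7 seconds after the epoch, expressed in nanoseconds
scaled, factor = strip_trailing_zeros([1_000_000_000, 3_000_000_000, 7_000_000_000])
print(scaled, factor)  # [1, 3, 7] 1000000000
```

Working on Python integers rather than int64 arrays avoids overflow when the candidate factor grows beyond the int64 range.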

Examples

>>> from skmine.periodic import PeriodicPatternMiner
>>> import pandas as pd
>>> S = pd.Series("ring_a_bell", [10, 20, 32, 40, 60, 79, 100, 240])
>>> pcm = PeriodicPatternMiner().fit(S)
>>> pcm.transform(S)
   t0                  pattern  repetition_major  period_major  sum_E
0  20  (ring_a_bell)[r=5 p=20]                 5            20      2

References

[1] Galbrun, E., Cellier, P., Tatti, N., Termier, A., & Crémilleux, B. “Mining Periodic Patterns with a MDL Criterion”.

fit(S, y=None)

Fit PeriodicPatternMiner on event logs.

This generates candidate cycles and evaluates them. Residual occurrences (events not covered by any selected cycle) are stored as an internal attribute for later reconstruction, since the MDL encoding is lossless.

Parameters

S (pd.Series) – event logs, represented as a pandas Series whose values are the events and whose index holds their timestamps. The index must be a pd.DatetimeIndex, pd.RangeIndex or pd.Int64Index.
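A raw log of (timestamp, event) records can be turned into the expected Series as sketched below. The records are made up for illustration. The sketch uses a DatetimeIndex, which also sidesteps pd.Int64Index, a class removed in pandas 2.0. Sorting the index is a precaution; the documentation above does not state that chronological order is required.

```python
import pandas as pd

# Made-up raw log of (timestamp, event) records
records = [
    ("2023-01-01 08:00", "coffee"),
    ("2023-01-01 12:00", "lunch"),
    ("2023-01-02 08:05", "coffee"),
    ("2023-01-02 12:10", "lunch"),
    ("2023-01-03 07:55", "coffee"),
]
timestamps, events = zip(*records)

# Event names become the values, timestamps become a DatetimeIndex
S = pd.Series(list(events), index=pd.to_datetime(list(timestamps)))
S = S.sort_index()  # precaution, not a documented requirement

print(type(S.index).__name__, len(S))  # DatetimeIndex 5
```

The resulting S can then be passed to PeriodicPatternMiner().fit(S) as in the examples above.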

transform(S, dE_sum=True, chronological_order=True)

Return the mined cycles as a pandas DataFrame, with one row per pattern.

Parameters
  • dE_sum (bool, default=True) – True: return a column “sum_E” with the sum of the absolute shift corrections. False: return a column “E” with the full list of shift corrections.

  • chronological_order (bool, default=True) – Whether to sort the occurrences by ascending time.

Returns

DataFrame with the following columns

  • start – when the cycle starts

  • length – number of occurrences in the cycle

  • period – inter-occurrence delay

  • sum_E – sum of the absolute shift corrections

  • E – shift corrections (if dE_sum=False)

  • cost – MDL cost

Return type

pd.DataFrame

Examples

>>> from skmine.periodic import PeriodicPatternMiner
>>> import pandas as pd
>>> S = pd.Series("ring_a_bell", [10, 20, 32, 40, 60, 79, 100, 240])
>>> pcm = PeriodicPatternMiner().fit(S)
>>> pcm.transform(S)
   t0                  pattern  repetition_major  period_major  sum_E
0  20  (ring_a_bell)[r=5 p=20]                 5            20      2

export_patterns(file='patterns.json')

Export the mined patterns to a JSON file.

Parameters

file (str, default='patterns.json') – name of the JSON file

import_patterns(file='patterns.json')

Import patterns from a JSON file.

Parameters

file (str, default='patterns.json') – name of the JSON file

reconstruct(*patterns_id, sort='time', drop_duplicates=None)

Reconstruct all occurrences covered by the patterns (no argument), or only the occurrences of the selected patterns (when pattern ids are passed as arguments).

Parameters
  • patterns_id (None or list) – None (when reconstruct() is called without arguments): reconstruct the occurrences of all patterns. List of pattern ids: reconstruct only the occurrences of those patterns.

  • sort (string) – “time” (default): sort by occurrence time. “event”: sort by event name. “construction_order”: keep the order in which the patterns are reconstructed.

  • drop_duplicates (bool, default=True) – An occurrence can belong to several patterns and thus appear several times in the reconstruction. Set drop_duplicates to True to remove these duplicates, or to False to keep them. With sort=“construction_order”, setting drop_duplicates to False makes the output easier to follow.

Returns

The reconstructed dataset

Return type

pd.DataFrame
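The effect of drop_duplicates can be pictured with plain pandas. In the sketch below, the column names time, event and pattern_id are hypothetical and need not match the DataFrame that reconstruct() actually returns; the point is only that an occurrence shared by two patterns appears twice until it is deduplicated on the occurrence itself.

```python
import pandas as pd

# Hypothetical reconstruction: the occurrence at time 40 is covered by
# patterns 0 and 1, so it appears twice
reconstruction = pd.DataFrame({
    "time": [20, 40, 40, 60],
    "event": ["ring_a_bell"] * 4,
    "pattern_id": [0, 0, 1, 0],
})

# Deduplicate on the occurrence (time, event), ignoring which pattern produced it;
# drop_duplicates keeps the first row of each duplicate group by default
deduplicated = reconstruction.drop_duplicates(subset=["time", "event"]).reset_index(drop=True)

print(len(reconstruction), len(deduplicated))  # 4 3
```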

get_residuals(*patterns_id, sort='time')

Get all residual occurrences, i.e. events not covered by any pattern (no argument), or the complement of the occurrences of the selected patterns (when pattern ids are passed as arguments).

Parameters
  • patterns_id (None or list) – None (when get_residuals() is called without arguments): complement of the occurrences of all patterns. List of pattern ids: complement of the occurrences of those patterns.

  • sort (string) – “time” (default): sort by occurrence time. “event”: sort by event name. Any other value: order by pattern reconstruction.

Returns

residual events

Return type

pd.DataFrame
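Conceptually, residuals are the complement of the covered occurrences within the log, which in pandas is an anti-join. The sketch below is an illustration, not skmine's code: the column names are hypothetical, and the covered timestamps 20, 40, 60, 79 and 100 are an assumption inferred from the doctest row (t0=20, r=5, p=20), not printed output.

```python
import pandas as pd

# Full event log from the doctest, as a DataFrame with hypothetical column names
log = pd.DataFrame({"time": [10, 20, 32, 40, 60, 79, 100, 240],
                    "event": ["ring_a_bell"] * 8})

# Assumed occurrences covered by the cycle (ring_a_bell)[r=5 p=20]
covered = pd.DataFrame({"time": [20, 40, 60, 79, 100],
                        "event": ["ring_a_bell"] * 5})

# Anti-join: keep the log rows that have no match among the covered occurrences
merged = log.merge(covered.drop_duplicates(), on=["time", "event"],
                   how="left", indicator=True)
residuals = merged.loc[merged["_merge"] == "left_only", ["time", "event"]].reset_index(drop=True)

print(residuals["time"].tolist())  # [10, 32, 240]
```

Deduplicating covered before the merge keeps the left join from multiplying rows when an occurrence belongs to several patterns.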

draw_pattern(pattern_id, directory=None)

Draw a pattern, identified by its id in the output of transform.

Parameters
  • pattern_id (int) – The ID of the pattern to draw, as shown in the output of transform.

  • directory (str, default=None) – Directory where the generated image and the DOT file are stored

Returns

The generated tree. In a Python script, call .view() on the returned object to display it.

Return type

Digraph