Periodic
Models and functions to mine periodic patterns
PeriodicPatternMiner
class skmine.periodic.cycles.PeriodicPatternMiner(complex=True, auto_time_scale=True)
Mining periodic cycles with an MDL criterion
PeriodicPatternMiner mines periodic cycles from event logs, relying on a Minimum Description Length (MDL) criterion to evaluate candidate cycles. The goal is to extract a set of cycles that characterizes the periodic structure present in the data.
- A cycle is defined as a 5-tuple of the form
- \[\alpha, r, p, \tau, E\]
Where
\(\alpha\) is the repeating event
\(r\) is the number of repetitions of the event, called the cycle length
\(p\) is the inter-occurrence distance, called the cycle period
\(\tau\) is the index of the first occurrence, called the cycle starting point
\(E\) is a list of \(r - 1\) signed integer offsets, i.e. the cycle shift corrections
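To make the 5-tuple concrete, here is a minimal sketch of how the occurrences of a cycle could be rebuilt from \(\tau\), \(p\) and \(E\). It assumes that each shift correction applies to the gap between two consecutive occurrences, so that corrections accumulate; this convention is an assumption, chosen because it is consistent with `sum_E = 2` in the `ring_a_bell` example. The function name `cycle_occurrences` is hypothetical and is not part of skmine.

```python
from itertools import accumulate


def cycle_occurrences(tau, period, shifts):
    """Rebuild the occurrence times of one cycle.

    Assumed convention: the k-th gap equals period + shifts[k],
    so each occurrence is computed from the previous one.
    """
    gaps = [period + e for e in shifts]
    # accumulate(..., initial=tau) yields tau, tau + gaps[0], ...
    return list(accumulate(gaps, initial=tau))


# r = 5 repetitions, p = 20, tau = 20, E has r - 1 = 4 entries
occurrences = cycle_occurrences(tau=20, period=20, shifts=[0, 0, -1, 1])
print(occurrences)  # [20, 40, 60, 79, 100]
```

The cycle length \(r\) does not need to be passed explicitly here, since it equals `len(shifts) + 1`.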
- Parameters
complex (boolean) – True: compute complex patterns, with horizontal and vertical combinations. False: compute only simple cycles.
auto_time_scale (boolean) – True: preprocess the time index, which is expressed in nanoseconds. The time scale used for mining cycles is computed automatically by removing extra zeros from the time index and possibly changing the unit from seconds to a larger one. False: no preprocessing of the time index.
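As a rough illustration of what "removing extra zeros" from a nanosecond index can mean, the sketch below divides integer timestamps by the largest power of ten they all share. This is only an assumption about the idea, not the library's actual preprocessing, and it does not cover the change of unit; `strip_common_zeros` is a hypothetical helper.

```python
def strip_common_zeros(timestamps_ns):
    """Divide integer timestamps by the largest power of ten common to all.

    Returns the rescaled timestamps and the scale that was removed.
    """
    scale = 1
    # any(...) guards against an all-zero (or empty) input, which would loop forever
    while any(timestamps_ns) and all(t % (scale * 10) == 0 for t in timestamps_ns):
        scale *= 10
    return [t // scale for t in timestamps_ns], scale


# One event per minute, expressed in nanoseconds
rescaled, scale = strip_common_zeros([0, 60_000_000_000, 120_000_000_000])
print(rescaled, scale)  # [0, 6, 12] 10000000000
```

Working on the smaller integers keeps periods and shift corrections small, which is presumably the point of the rescaling.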
Examples
>>> from skmine.periodic import PeriodicPatternMiner
>>> import pandas as pd
>>> S = pd.Series("ring_a_bell", [10, 20, 32, 40, 60, 79, 100, 240])
>>> pcm = PeriodicPatternMiner().fit(S)
>>> pcm.transform(S)
   t0                  pattern  repetition_major  period_major  sum_E
0  20  (ring_a_bell)[r=5 p=20]                 5            20      2
References
- [1] Galbrun, E., Cellier, P., Tatti, N., Termier, A., & Crémilleux, B. “Mining Periodic Patterns with a MDL Criterion”
fit(S, y=None)
Fit PeriodicPatternMiner on data logs.
This generates new candidate cycles and evaluates them. Residual occurrences are stored as an internal attribute for later reconstruction (MDL is lossless).
- Parameters
S (pd.Series) – logs, represented as a pandas Series. The Series must have an index of type pd.DatetimeIndex, pd.RangeIndex or pd.Int64Index.
transform(S, dE_sum=True, chronological_order=True)
Return cycles as a pandas DataFrame with 3 columns and a 2-level multi-index: the first level maps events, and the second level is positional.
- Parameters
dE_sum (boolean) – True: return a column “dE” with the sum of the errors. False: return a column “dE” with the full list of errors.
chronological_order (boolean, default=True) – Whether to sort the occurrences by ascending date
- Returns
- DataFrame with the following columns
start
when the cycle starts
length
number of occurrences of the event in the cycle
period
inter-occurrence delay
sum_E
absolute sum of errors
E
shift corrections (if dE_sum=False)
cost
MDL cost
- Return type
pd.DataFrame
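The relationship between the shift corrections `E` and `sum_E` can be sketched in plain Python. The snippet assumes that each correction is the deviation of one inter-occurrence gap from the period, which is an assumption rather than a documented fact; it does reproduce `sum_E = 2` for the occurrences 20, 40, 60, 79, 100 with period 20. The helper `shift_corrections` is hypothetical.

```python
def shift_corrections(occurrences, period):
    """Deviation of each gap between consecutive occurrences from the period."""
    return [later - earlier - period
            for earlier, later in zip(occurrences, occurrences[1:])]


occurrences = [20, 40, 60, 79, 100]
E = shift_corrections(occurrences, period=20)
sum_E = sum(abs(e) for e in E)  # absolute sum of errors

print(E)      # [0, 0, -1, 1]
print(sum_E)  # 2
```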
Examples
>>> from skmine.periodic import PeriodicPatternMiner
>>> import pandas as pd
>>> S = pd.Series("ring_a_bell", [10, 20, 32, 40, 60, 79, 100, 240])
>>> pcm = PeriodicPatternMiner().fit(S)
>>> pcm.transform(S)
   t0                  pattern  repetition_major  period_major  sum_E
0  20  (ring_a_bell)[r=5 p=20]                 5            20      2
export_patterns(file='patterns.json')
Export patterns to a JSON file.
- Parameters
file (string) – name of the JSON file
import_patterns(file='patterns.json')
Import patterns from a JSON file.
- Parameters
file (string) – name of the JSON file
reconstruct(*patterns_id, sort='time', drop_duplicates=None)
Reconstruct all the occurrences from the patterns (no argument), or only the occurrences of selected patterns (with a list of pattern ids as argument).
- Parameters
patterns_id (None or List) – None (when reconstruct() is called without arguments): reconstruct the occurrences of all patterns. List of pattern ids: reconstruct the occurrences of these patterns only.
sort (string) – “time” (default): sort by occurrence time. “event”: sort by event name. “construction_order”: sort by pattern reconstruction order.
drop_duplicates (bool, default=True) – An occurrence can belong to several patterns and thus appear several times in the reconstruction. Set drop_duplicates to True to remove duplicates, or to False to keep them. When sorting in the natural order of pattern construction, setting drop_duplicates to False makes the output easier to understand.
- Returns
The reconstructed dataset
- Return type
pd.DataFrame
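The interaction between `sort` and `drop_duplicates` can be pictured with a small stand-alone sketch. It works on plain lists of `(time, event)` pairs instead of the miner's internal patterns, and the function `merge_occurrences` as well as the `open_door` event are hypothetical, introduced only for illustration.

```python
def merge_occurrences(patterns, sort="time", drop_duplicates=True):
    """Merge the occurrences of several patterns into one list of (time, event) pairs."""
    # Concatenate in construction order: pattern by pattern
    rows = [row for occurrences in patterns for row in occurrences]
    if drop_duplicates:
        # dict.fromkeys keeps the first copy of each pair and preserves order
        rows = list(dict.fromkeys(rows))
    if sort == "time":
        rows.sort(key=lambda row: row[0])
    elif sort == "event":
        rows.sort(key=lambda row: row[1])
    # any other value keeps the construction order
    return rows


patterns = [
    [(20, "ring_a_bell"), (40, "ring_a_bell"), (60, "ring_a_bell")],
    [(40, "ring_a_bell"), (45, "open_door")],
]

by_time = merge_occurrences(patterns)
raw = merge_occurrences(patterns, sort="construction_order", drop_duplicates=False)
print(by_time)  # [(20, 'ring_a_bell'), (40, 'ring_a_bell'), (45, 'open_door'), (60, 'ring_a_bell')]
print(raw)      # the occurrence (40, 'ring_a_bell') appears twice, once per pattern
```

Keeping duplicates in construction order shows which pattern each occurrence came from, which is why that combination is suggested for reading the output.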
get_residuals(*patterns_id, sort='time')
Get all residual occurrences, i.e. events not covered by any pattern (no argument), or get the occurrences complementary to the selected patterns (with a list of pattern ids as argument).
- Parameters
patterns_id (None or list) – None (when get_residuals() is called without arguments): complement of the occurrences of all patterns. List of pattern ids: complement of the occurrences of these patterns.
sort (string) – “time” (default): sort by occurrence time. “event”: sort by event name. Anything else: sort by pattern reconstruction order.
- Returns
residual events
- Return type
pd.DataFrame
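Residuals are simply the complement of the covered occurrences, which is what makes the encoding lossless. The sketch below illustrates this on the timestamps of the `ring_a_bell` example, assuming that pattern 0 covers the occurrences 20, 40, 60, 79 and 100; the mapping `covered_by` and the function `residuals` are hypothetical and do not reflect the miner's internals.

```python
timestamps = [10, 20, 32, 40, 60, 79, 100, 240]
# pattern id -> occurrences covered by that pattern (assumed for illustration)
covered_by = {0: [20, 40, 60, 79, 100]}


def residuals(all_times, covered_by, *pattern_ids):
    """Occurrences not covered by the selected patterns (all patterns if none given)."""
    ids = pattern_ids or tuple(covered_by)
    covered = {t for pattern_id in ids for t in covered_by[pattern_id]}
    return [t for t in all_times if t not in covered]


left_over = residuals(timestamps, covered_by)
print(left_over)  # [10, 32, 240]

# Lossless check: covered occurrences plus residuals give back the input
assert sorted(covered_by[0] + left_over) == timestamps
```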
draw_pattern(pattern_id, directory=None)
Visually display a pattern, identified by its id in the output of transform.
- Parameters
pattern_id (int) – ID of the pattern to display, as found in the output of transform.
directory (str, default=None) – Directory where the generated image and the DOT file are stored
- Returns
The generated tree. To display it from a Python script, call .view() on the returned object.
- Return type
Digraph