Preprocessing
MDLPDiscretizer
class skmine.preprocessing.MDLPDiscretizer(random_state=None, n_jobs=1)
Implementation of “Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning”.
Given class labels y, MDLPDiscretizer discretizes continuous variables from X by minimizing the entropy in each interval.
- Parameters
random_state (int, RandomState instance, default=None) – random state to use to shuffle the data. Can affect the outcome, leading to slightly different cut points if a variable contains samples with the same value but different labels.
- Variables
cut_points_ (dict) – A mapping between columns and their respective cut points. If fitted on a pandas DataFrame, keys will be the DataFrame column names.
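As an illustration of how such a mapping might be consumed, the sketch below turns one sample into interval indices using only the standard library. The cut points are copied from the iris example on this page; the helper names `to_interval_index` and `discretize_row` are hypothetical, and the tie-breaking rule (a value equal to a threshold goes to the upper interval) is an assumption, not necessarily the convention used by the library.

```python
from bisect import bisect_right

# Cut points copied from the iris example on this page
# (column index -> sorted thresholds), written as plain lists.
cut_points = {0: [5.5, 6.2], 1: [2.9, 3.3], 2: [2.45, 4.9], 3: [0.8, 1.7]}


def to_interval_index(value, thresholds):
    """Return the index of the interval that `value` falls into.

    Assumption: a value equal to a threshold is placed in the upper
    interval. The library itself may use a different convention.
    """
    return bisect_right(thresholds, value)


def discretize_row(row, cut_points):
    """Map each feature of one sample to its interval index."""
    return [to_interval_index(v, cut_points[col]) for col, v in enumerate(row)]


sample = [5.1, 3.5, 1.4, 0.2]
codes = discretize_row(sample, cut_points)
```

With two cut points per column, every feature is mapped to one of three intervals, indexed 0 to 2.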
References
Usama M. Fayyad, Keki B. Irani “Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning”, 1993
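The core step of the method in the paper cited above can be sketched in plain Python: pick the boundary that minimizes the weighted class entropy of the two resulting intervals, then accept it only if the information gain passes the MDL-based threshold. This is a minimal, single-split sketch written from the paper's criterion, not the code used by skmine; the function names `entropy` and `best_cut` are hypothetical, and a full discretizer would apply the step recursively to each accepted interval.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Class entropy (in bits) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Find the best binary cut for one variable.

    Returns (cut_point, gain, accepted), or None if no cut is possible.
    `accepted` applies the MDL stopping criterion:
        gain > (log2(n - 1) + delta) / n
    with delta = log2(3**k - 2) - (k*Ent(S) - k1*Ent(S1) - k2*Ent(S2)).
    """
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [lab for _, lab in pairs]
    n = len(ys)
    ent_s = entropy(ys)

    best = None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # no boundary between identical values
        left, right = ys[:i], ys[i:]
        ent_split = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        gain = ent_s - ent_split
        if best is None or gain > best[1]:
            best = ((xs[i - 1] + xs[i]) / 2, gain, left, right)

    if best is None:
        return None

    cut, gain, left, right = best
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = log2(3 ** k - 2) - (k * ent_s - k1 * entropy(left) - k2 * entropy(right))
    threshold = (log2(n - 1) + delta) / n
    return cut, gain, gain > threshold


separable = best_cut([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0, 0, 0, 1, 1, 1])
noisy = best_cut([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1])
```

In the first call the classes are cleanly separated, so the midpoint between 3.0 and 10.0 is accepted as a cut; in the second, the labels alternate and the criterion rejects the best available split.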
Examples
>>> from skmine.preprocessing import MDLPDiscretizer
>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
>>> X, y = iris.data, iris.target
>>> disc = MDLPDiscretizer()
>>> disc.fit(X, y)
>>> disc.cut_points_
{0: array([5.5, 6.2]), 1: array([2.9, 3.3]), 2: array([2.45, 4.9 ]), 3: array([0.8, 1.7])}
fit(X, y)
Fit the MDLP discretizer on an input matrix X, given a label vector y.
- Parameters
X (np.ndarray or pd.DataFrame of shape (n_samples, n_features)) – The input matrix containing features. A set of cut points will be assigned to each feature.
y (np.ndarray or pd.Series of shape (n_samples,)) – The label vector used to discretize X.