Logistic regression
logistic_regression_train(X, y, validation_method='split', metrics=['accuracy'], split_size=0.2, cv_folds=5, penalty='l2', max_iter=100, solver='lbfgs', verbose=0, random_state=None, **kwargs)
Train and optionally validate a Logistic Regression classifier model using Sklearn.
Several options for model performance evaluation are available: no validation, a split into training and validation parts, or cross-validation. If validation is performed, the metric(s) to calculate can be defined and the validation process configured (cross-validation method, number of folds, size of the split). Depending on the validation configuration, the output metrics dictionary can be empty, one-dimensional or nested.
The choice of the algorithm depends on the penalty chosen. Supported penalties by solver:

- 'lbfgs': ['l2', None]
- 'liblinear': ['l1', 'l2']
- 'newton-cg': ['l2', None]
- 'newton-cholesky': ['l2', None]
- 'sag': ['l2', None]
- 'saga': ['elasticnet', 'l1', 'l2', None]
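The compatibility rules above can be expressed as a simple lookup. The helper below is a hypothetical illustration, not part of eis_toolkit:

```python
# Hypothetical helper illustrating the solver/penalty compatibility
# table above; not part of eis_toolkit or scikit-learn.
SUPPORTED_PENALTIES = {
    "lbfgs": {"l2", None},
    "liblinear": {"l1", "l2"},
    "newton-cg": {"l2", None},
    "newton-cholesky": {"l2", None},
    "sag": {"l2", None},
    "saga": {"elasticnet", "l1", "l2", None},
}

def check_solver_penalty(solver: str, penalty) -> bool:
    """Return True if the given solver supports the given penalty."""
    return penalty in SUPPORTED_PENALTIES.get(solver, set())

print(check_solver_penalty("lbfgs", "l2"))         # True
print(check_solver_penalty("lbfgs", "l1"))         # False: lbfgs has no L1 support
print(check_solver_penalty("saga", "elasticnet"))  # True
```

Checking this pairing up front is useful because an unsupported solver/penalty combination only surfaces as an error once training starts.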
For more information about Sklearn Logistic Regression, read the documentation here: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html.
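As a rough sketch, the 'split' validation path with `metrics=['accuracy']` behaves like the following direct scikit-learn calls. This is a simplified approximation for illustration, not the actual eis_toolkit implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Tiny synthetic binary-classification data for illustration only.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# validation_method='split' with split_size=0.2:
# hold out 20% of the data for validation.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Defaults from the signature above: penalty='l2', max_iter=100, solver='lbfgs'.
model = LogisticRegression(penalty="l2", max_iter=100, solver="lbfgs")
model.fit(X_train, y_train)

# metrics=['accuracy'] yields a one-dimensional metrics dictionary.
metrics = {"accuracy": accuracy_score(y_valid, model.predict(X_valid))}
print(metrics)
```

With `validation_method='none'` there would be no split and the returned metrics dictionary would be empty; the cross-validation methods would instead produce one score per fold.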
Parameters:

Name | Type | Description | Default
---|---|---|---
X | Union[ndarray, DataFrame] | Training data. | required
y | Union[ndarray, Series] | Target labels. | required
validation_method | Literal['split', 'kfold_cv', 'skfold_cv', 'loo_cv', 'none'] | Validation method to use. "split" divides data into two parts, "kfold_cv" performs k-fold cross-validation, "skfold_cv" performs stratified k-fold cross-validation, "loo_cv" performs leave-one-out cross-validation and "none" does not validate the model at all (in this case, all of X and y are used solely for training). | 'split'
metrics | Sequence[Literal['accuracy', 'precision', 'recall', 'f1', 'auc']] | Metrics to use for scoring the model. Defaults to "accuracy". | ['accuracy']
split_size | float | Fraction of the dataset to be used as validation data (the rest is used for training). Used only when validation_method is "split". Defaults to 0.2. | 0.2
cv_folds | int | Number of folds used in cross-validation. Used only when validation_method is "kfold_cv" or "skfold_cv". Defaults to 5. | 5
penalty | Literal['l1', 'l2', 'elasticnet', None] | Specifies the norm of the penalty. Defaults to 'l2'. | 'l2'
max_iter | int | Maximum number of iterations taken for the solvers to converge. Defaults to 100. | 100
solver | Literal['lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'] | Algorithm to use in the optimization problem. Defaults to 'lbfgs'. | 'lbfgs'
verbose | int | Specifies whether modeling progress and performance should be printed. 0 prints nothing; values of 1 or above produce prints. | 0
random_state | Optional[int] | Seed for random number generation. Defaults to None. | None
**kwargs | | Additional parameters for Sklearn's LogisticRegression. | {}
Returns:

Type | Description
---|---
Tuple[LogisticRegression, dict] | The trained Logistic Regression classifier and metric scores as a dictionary.
Raises:

Type | Description
---|---
InvalidParameterValueException | If some of the numeric parameters are given invalid input values.
NonMatchingParameterLengthsException | If X and y have mismatching sizes.
Source code in eis_toolkit/prediction/logistic_regression.py