Logistic regression
```
logistic_regression_train(X, y, validation_method='split', metrics=['accuracy'], split_size=0.2, cv_folds=5, penalty='l2', max_iter=100, solver='lbfgs', verbose=0, random_state=None, **kwargs)
```
  Train and optionally validate a Logistic Regression classifier model using Sklearn.
Various options for model performance evaluation are available: no validation, a split into training and validation parts, or cross-validation. If validation is performed, the metric(s) to calculate can be defined and the validation process configured (cross-validation method, number of folds, split size). Depending on the validation configuration, the output metrics dictionary can be empty, one-dimensional or nested.
The choice of the algorithm depends on the penalty chosen. Penalties supported by each solver:

- 'lbfgs': ['l2', None]
- 'liblinear': ['l1', 'l2']
- 'newton-cg': ['l2', None]
- 'newton-cholesky': ['l2', None]
- 'sag': ['l2', None]
- 'saga': ['elasticnet', 'l1', 'l2', None]
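The solver/penalty compatibility above can be expressed as a simple lookup. This is an illustrative sketch, not part of eis_toolkit's API; the mapping mirrors the list in this documentation and the helper name is hypothetical.

```python
# Solver -> supported penalties, as listed in the documentation above.
SUPPORTED_PENALTIES = {
    "lbfgs": ["l2", None],
    "liblinear": ["l1", "l2"],
    "newton-cg": ["l2", None],
    "newton-cholesky": ["l2", None],
    "sag": ["l2", None],
    "saga": ["elasticnet", "l1", "l2", None],
}


def penalty_is_supported(solver: str, penalty) -> bool:
    """Return True if the given penalty is valid for the given solver.

    Hypothetical helper for illustration; eis_toolkit performs its own
    validation internally.
    """
    return penalty in SUPPORTED_PENALTIES.get(solver, [])
```

For example, `penalty_is_supported("lbfgs", "l1")` is `False`, which is why an 'l1' penalty requires switching to the 'liblinear' or 'saga' solver.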
For more information about Sklearn Logistic Regression, read the documentation here: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html.
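Under the default settings (`validation_method='split'`, `metrics=['accuracy']`), the function is roughly equivalent to the following scikit-learn sketch. This illustrates the split-validation path only; eis_toolkit's actual implementation may differ in details, and the synthetic data here is purely for demonstration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for real training inputs X, y.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# validation_method='split' with split_size=0.2: hold out 20% for validation.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Defaults mirrored from the signature: penalty='l2', solver='lbfgs', max_iter=100.
model = LogisticRegression(penalty="l2", solver="lbfgs", max_iter=100)
model.fit(X_train, y_train)

# With a single split and one metric, the metrics dictionary is one-dimensional.
metrics = {"accuracy": accuracy_score(y_valid, model.predict(X_valid))}
```

With `validation_method='none'`, the split and scoring steps are skipped, all of `X` and `y` are used for fitting, and the returned metrics dictionary is empty.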
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| X | Union[ndarray, DataFrame] | Training data. | required | 
| y | Union[ndarray, Series] | Target labels. | required | 
| validation_method | Literal[split, kfold_cv, skfold_cv, loo_cv, none] | Validation method to use. "split" divides data into two parts, "kfold_cv" performs k-fold cross-validation, "skfold_cv" performs stratified k-fold cross-validation, "loo_cv" performs leave-one-out cross-validation and "none" will not validate model at all (in this case, all X and y will be used solely for training). | 'split' | 
| metrics | Sequence[Literal[accuracy, precision, recall, f1, auc]] | Metrics to use for scoring the model. Defaults to "accuracy". | ['accuracy'] | 
| split_size | float | Fraction of the dataset to be used as validation data (rest is used for training). Used only when validation_method is "split". Defaults to 0.2. | 0.2 | 
| cv_folds | int | Number of folds used in cross-validation. Used only when validation_method is "kfold_cv" or "skfold_cv". Defaults to 5. | 5 | 
| penalty | Literal[l1, l2, elasticnet, None] | Specifies the norm of the penalty. Defaults to 'l2'. | 'l2' | 
| max_iter | int | Maximum number of iterations taken for the solvers to converge. Defaults to 100. | 100 | 
| solver | Literal[lbfgs, liblinear, newton-cg, newton-cholesky, sag, saga] | Algorithm to use in the optimization problem. Defaults to 'lbfgs'. | 'lbfgs' | 
| verbose | int | Specifies if modeling progress and performance should be printed. 0 prints nothing; values of 1 or above produce prints. | 0 | 
| random_state | Optional[int] | Seed for random number generation. Defaults to None. | None | 
| **kwargs |  | Additional parameters for Sklearn's LogisticRegression. | {} | 
Returns:
| Type | Description | 
|---|---|
| Tuple[LogisticRegression, dict] | The trained Logistic Regression classifier and metric scores as a dictionary. | 
Raises:
| Type | Description | 
|---|---|
| InvalidParameterValueException | If some of the numeric parameters are given invalid input values. | 
| NonMatchingParameterLengthsException | If X and y have mismatching sizes. | 
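For the cross-validation paths ('kfold_cv', 'skfold_cv'), each metric is scored once per fold, which is where the nested output shape mentioned above comes from. The sketch below illustrates the 'skfold_cv' case with scikit-learn's StratifiedKFold; the variable names and the exact shape of the returned dictionary are illustrative assumptions, not eis_toolkit internals.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

# Synthetic data standing in for real training inputs X, y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)

# validation_method='skfold_cv' with cv_folds=5: stratified 5-fold CV.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, valid_idx in skf.split(X, y):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    fold_scores.append(accuracy_score(y[valid_idx], model.predict(X[valid_idx])))

# One score per fold per metric: a nested metrics dictionary.
metrics = {"accuracy": fold_scores}
```

'kfold_cv' is the same loop with `KFold` in place of `StratifiedKFold`, and 'loo_cv' is the limiting case where the number of folds equals the number of samples.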
Source code in eis_toolkit/prediction/logistic_regression.py