Class balancing
balance_SMOTETomek(X, y, sampling_strategy='auto', random_state=None)
Balances the classes of the input dataset using the SMOTETomek resampling method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | Union[DataFrame, ndarray] | The feature matrix (input data as a DataFrame or NumPy array). | required |
y | Union[Series, ndarray] | The target labels corresponding to the feature matrix. | required |
sampling_strategy | Union[float, str, dict] | Controls how the resampling is performed. If float, the ratio of minority-class samples to majority-class samples; if str, the classes to be resampled ("minority", "not minority", "not majority", "all", "auto"); if dict, the keys are the targeted classes and the values are the desired number of samples for each class. Defaults to "auto", which resamples all classes except the majority class. | 'auto' |
random_state | Optional[int] | Controls the randomization of the algorithm. Can be given an integer seed. Defaults to None, which randomizes the seed. | None |
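
The three accepted forms of `sampling_strategy` can be easier to compare side by side. The sketch below is only an illustration of the semantics described in the table, not the toolkit's implementation: `target_counts` is a hypothetical helper, and it assumes that targeted classes are oversampled up to the majority-class count and that a float ratio is truncated to an integer. The final class counts produced by SMOTETomek may also end up somewhat lower, since the Tomek-link cleaning step removes samples after oversampling.

```python
from collections import Counter


def target_counts(y, sampling_strategy="auto"):
    """Hypothetical helper: desired per-class counts for the oversampling step."""
    counts = Counter(y)
    majority = max(counts, key=counts.get)
    minority = min(counts, key=counts.get)
    n_major = counts[majority]

    # dict: keys are targeted classes, values the desired sample counts.
    if isinstance(sampling_strategy, dict):
        return dict(sampling_strategy)

    # float: ratio of minority samples to majority samples (two classes only).
    if isinstance(sampling_strategy, float):
        if len(counts) != 2:
            raise ValueError("A float ratio is only meaningful for two classes.")
        return {minority: int(n_major * sampling_strategy)}

    # str: selects which classes are resampled (assumed: up to the majority count).
    selected = {
        "minority": [minority],
        "not minority": [c for c in counts if c != minority],
        "not majority": [c for c in counts if c != majority],
        "all": list(counts),
        "auto": [c for c in counts if c != majority],
    }[sampling_strategy]
    return {c: n_major for c in selected}


labels = [0] * 90 + [1] * 10            # 90 majority samples, 10 minority samples
print(target_counts(labels))            # {1: 90}
print(target_counts(labels, 0.5))       # {1: 45}
print(target_counts(labels, {1: 30}))   # {1: 30}
```

With the default "auto", only class 1 is targeted here, because class 0 is the majority class.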
Returns:
Type | Description |
---|---|
tuple[Union[DataFrame, ndarray], Union[Series, ndarray]] | Resampled feature matrix and target labels. |
Raises:
Type | Description |
---|---|
NonMatchingParameterLengthsException | If X and y have different lengths. |
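
The length requirement can be checked before calling the function. The following is a minimal sketch of such a guard, assuming the check simply compares `len(X)` with `len(y)`; the exception class defined here is a local stand-in with the same name as the toolkit's exception, and `check_lengths` is a hypothetical helper rather than part of the library.

```python
class NonMatchingParameterLengthsException(Exception):
    """Local stand-in for the toolkit's exception of the same name."""


def check_lengths(X, y):
    """Hypothetical guard: raise if X and y do not have the same number of rows."""
    if len(X) != len(y):
        raise NonMatchingParameterLengthsException(
            f"X has {len(X)} rows but y has {len(y)} labels."
        )


features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # three samples
labels = [0, 1]                                   # only two labels

try:
    check_lengths(features, labels)
except NonMatchingParameterLengthsException as error:
    print(f"Cannot resample: {error}")
```

Each row of the feature matrix needs exactly one label, so a mismatch usually points to rows having been dropped from only one of the two inputs.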
Source code in eis_toolkit/training_data_tools/class_balancing.py