Class balancing

`balance_SMOTETomek(X, y, sampling_strategy='auto', random_state=None)`

Balances the classes of the input dataset using the SMOTETomek resampling method.

Parameters:

Name	Type	Description	Default
`X`	`Union[DataFrame, ndarray]`	The feature matrix (input data as a DataFrame or NumPy array).	required
`y`	`Union[Series, ndarray]`	The target labels corresponding to the feature matrix.	required
`sampling_strategy`	`Union[float, str, dict]`	Controls how the resampling is performed. If float, the desired ratio of minority-class samples to majority-class samples after resampling (binary classification only). If str, the classes to resample ("minority", "not minority", "not majority", "all", "auto"). If dict, keys are the targeted classes and values the desired number of samples for each class. Defaults to "auto", which resamples all classes except the majority class.	`'auto'`
`random_state`	`Optional[int]`	Seed controlling the randomization of the algorithm. Pass an integer for reproducible results. Defaults to None, in which case no fixed seed is used.	`None`
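The three forms of `sampling_strategy` are easier to compare with concrete numbers. The sketch below uses a hypothetical helper, `smote_target_counts` (not part of eis_toolkit or imbalanced-learn), to compute the per-class sample counts that the over-sampling step would aim for. It assumes the float form applies only to two-class problems and covers only the "auto", "not majority" and "minority" string options. Final counts returned by `balance_SMOTETomek` can be slightly lower than these targets, since the Tomek-link cleaning step removes some samples afterwards.

```python
from collections import Counter


def smote_target_counts(y, sampling_strategy="auto"):
    """Hypothetical helper: per-class counts targeted by the over-sampling step."""
    counts = Counter(y)
    majority = max(counts, key=counts.get)
    minority = min(counts, key=counts.get)
    n_majority = counts[majority]
    targets = dict(counts)

    if isinstance(sampling_strategy, float):
        # Ratio = minority count / majority count after resampling (binary only).
        if len(counts) != 2:
            raise ValueError("A float sampling_strategy requires exactly two classes.")
        targets[minority] = int(sampling_strategy * n_majority)
    elif isinstance(sampling_strategy, dict):
        # Keys are class labels, values are the desired sample counts.
        targets.update(sampling_strategy)
    elif sampling_strategy == "minority":
        targets[minority] = n_majority
    elif sampling_strategy in ("auto", "not majority"):
        # Every class except the majority is raised to the majority count.
        for label in targets:
            if label != majority:
                targets[label] = n_majority
    else:
        raise ValueError(f"Strategy not covered by this sketch: {sampling_strategy!r}")
    return targets


labels = [0] * 90 + [1] * 10
print(smote_target_counts(labels))           # {0: 90, 1: 90}
print(smote_target_counts(labels, 0.5))      # {0: 90, 1: 45}
print(smote_target_counts(labels, {1: 30}))  # {0: 90, 1: 30}
```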

Returns:

Type	Description
`tuple[Union[DataFrame, ndarray], Union[Series, ndarray]]`	Resampled feature matrix and target labels.

Raises:

Type	Description
`NonMatchingParameterLengthsException`	If X and y have different lengths.

Source code in eis_toolkit/training_data_tools/class_balancing.py

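from typing import Optional, Union

import numpy as np
import pandas as pd
from beartype import beartype
from imblearn.combine import SMOTETomek

from eis_toolkit.exceptions import NonMatchingParameterLengthsException


@beartype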
def balance_SMOTETomek(
    X: Union[pd.DataFrame, np.ndarray],
    y: Union[pd.Series, np.ndarray],
    sampling_strategy: Union[float, str, dict] = "auto",
    random_state: Optional[int] = None,
) -> tuple[Union[pd.DataFrame, np.ndarray], Union[pd.Series, np.ndarray]]:
    """Balances the classes of the input dataset using the SMOTETomek resampling method.

    Args:
        X: The feature matrix (input data as a DataFrame or NumPy array).
        y: The target labels corresponding to the feature matrix.
        sampling_strategy: Controls how the resampling is performed.
            If float, the desired ratio of minority-class samples to majority-class samples
            after resampling (binary classification only).
            If str, the classes to resample ("minority", "not minority", "not majority", "all", "auto").
            If dict, keys are the targeted classes and values the desired number of samples for each class.
            Defaults to "auto", which resamples all classes except the majority class.
        random_state: Seed controlling the randomization of the algorithm. Pass an integer for
            reproducible results. Defaults to None, in which case no fixed seed is used.

    Returns:
        Resampled feature matrix and target labels.

    Raises:
        NonMatchingParameterLengthsException: If X and y have different lengths.
    """

    if len(X) != len(y):
        raise NonMatchingParameterLengthsException("Feature matrix X and target labels y must have the same length.")

    X_res, y_res = SMOTETomek(sampling_strategy=sampling_strategy, random_state=random_state).fit_resample(X, y)
    return X_res, y_res