
Feature importance

evaluate_feature_importance(model, x_test, y_test, feature_names, n_repeats=50, random_state=None)

Evaluate the feature importance of a sklearn classifier or regressor.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | BaseEstimator | A fitted Sklearn model. | required |
| x_test | ndarray | Testing feature data (X data need to be normalized / standardized). | required |
| y_test | ndarray | Testing label data. | required |
| feature_names | Sequence[str] | Names of the feature columns. | required |
| n_repeats | int | Number of iterations used when calculating feature importance. Defaults to 50. | 50 |
| random_state | Optional[int] | Random state for reproducibility of results. Optional parameter. | None |

Returns:

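| Type | Description |
| --- | --- |
| DataFrame | A dataframe listing each feature and its importance (the mean permutation importance multiplied by 100), sorted from most to least important. |
| dict | A dictionary-like permutation importance result containing the per-feature importance mean, the per-feature importance standard deviation, and the raw importances from every repeat. |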
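
To illustrate how the two return values relate, the sketch below rebuilds them in plain Python from a small matrix of per-repeat importances. The feature names and numbers are invented for illustration, and the plain dictionary and list of tuples stand in for the dictionary-like result and the DataFrame; only the scaling by 100 and the descending sort follow the function's source.

```python
from statistics import mean, pstdev

# Hypothetical per-repeat importances, one row per feature (values invented).
feature_names = ["feature_a", "feature_b", "feature_c"]
importances = [
    [0.10, 0.12, 0.08],
    [0.30, 0.28, 0.32],
    [0.00, 0.01, -0.01],
]

# Stand-in for the dictionary-like result: mean, std, and raw per-repeat values.
result = {
    "importances_mean": [mean(row) for row in importances],
    "importances_std": [pstdev(row) for row in importances],
    "importances": importances,
}

# Stand-in for the DataFrame: mean importance scaled by 100, largest first.
table = sorted(
    ((name, m * 100) for name, m in zip(feature_names, result["importances_mean"])),
    key=lambda item: item[1],
    reverse=True,
)

for name, importance in table:
    print(f"{name}: {importance:.2f}")
```

A negative value, as in some repeats of the last row, means that shuffling the feature happened to improve the score in that repeat, which usually indicates the feature carries little signal.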

Raises:

| Type | Description |
| --- | --- |
| InvalidDatasetException | Either 'x_test' or 'y_test' is empty. |
| InvalidParameterValueException | Value for 'n_repeats' is less than one. |
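
The function delegates the actual computation to scikit-learn's permutation_importance. As a rough illustration of that technique, and not of scikit-learn's implementation, the following plain-Python sketch shuffles one feature column at a time and records how much the score drops. The names permutation_importance_sketch, accuracy and toy_predict, the toy data, and the use of accuracy as the score are all assumptions made for this example.

```python
import random
from statistics import mean, pstdev


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance_sketch(predict, x, y, n_repeats=50, random_state=None):
    """Return per-feature (means, stds) of the score drop after shuffling a column."""
    # Same kinds of input checks as the documented function, with built-in exceptions.
    if not x or not y:
        raise ValueError("Input data is empty.")
    if n_repeats < 1:
        raise ValueError("Value for 'n_repeats' is less than one.")

    rng = random.Random(random_state)
    baseline = accuracy(y, [predict(row) for row in x])
    means, stds = [], []
    for col in range(len(x[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only this column; all other columns stay untouched.
            shuffled = [row[col] for row in x]
            rng.shuffle(shuffled)
            permuted = [row[:col] + [v] + row[col + 1:] for row, v in zip(x, shuffled)]
            drops.append(baseline - accuracy(y, [predict(row) for row in permuted]))
        means.append(mean(drops))
        stds.append(pstdev(drops))
    return means, stds


def toy_predict(row):
    """Hypothetical model: uses only the first feature and ignores the second."""
    return int(row[0] > 0)


x_test = [[-2.0, 0.3], [-1.0, -0.7], [1.0, 0.9], [2.0, -0.1], [-1.5, 0.4], [1.5, -0.8]]
y_test = [0, 0, 1, 1, 0, 1]

means, stds = permutation_importance_sketch(
    toy_predict, x_test, y_test, n_repeats=20, random_state=0
)
print(means, stds)
```

Because the toy model never reads the second feature, shuffling it cannot change any prediction, so its importance is exactly zero, while shuffling the first feature lowers the score.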

Source code in eis_toolkit/exploratory_analyses/feature_importance.py
@beartype
def evaluate_feature_importance(
    model: sklearn.base.BaseEstimator,
    x_test: np.ndarray,
    y_test: np.ndarray,
    feature_names: Sequence[str],
    n_repeats: int = 50,
    random_state: Optional[int] = None,
) -> tuple[pd.DataFrame, dict]:
    """
    Evaluate the feature importance of a sklearn classifier or regressor.

    Args:
        model: A fitted Sklearn model.
        x_test: Testing feature data (X data need to be normalized / standardized).
        y_test: Testing label data.
        feature_names: Names of the feature columns.
        n_repeats: Number of iterations used when calculating feature importance. Defaults to 50.
        random_state: Random state for reproducibility of results. Optional parameter.

    Returns:
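        A dataframe containing features and their importance (mean importance multiplied by 100),
            sorted from most to least important.
        A dictionary-like result containing importance mean, importance std, and the raw
            importances from every repeat.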

    Raises:
        InvalidDatasetException: Either 'x_test' or 'y_test' is empty.
        InvalidParameterValueException: Value for 'n_repeats' is less than one.
    """

    if x_test.size == 0:
        raise InvalidDatasetException("Array 'x_test' is empty.")

    if y_test.size == 0:
        raise InvalidDatasetException("Array 'y_test' is empty.")

    if n_repeats < 1:
        raise InvalidParameterValueException("Value for 'n_repeats' is less than one.")

    result = permutation_importance(model, x_test, y_test.ravel(), n_repeats=n_repeats, random_state=random_state)

    feature_importance = pd.DataFrame({"Feature": feature_names, "Importance": result.importances_mean * 100})

    feature_importance = feature_importance.sort_values(by="Importance", ascending=False)

    return feature_importance, result