Covariance matrix

covariance_matrix(data, columns=None, min_periods=None, delta_degrees_of_freedom=1)

Compute covariance matrix on the input data.

It is assumed that the data is numeric, i.e. integers or floats. NaN values are excluded from the calculations.

Parameters:

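| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Dataframe containing the input data. | required |
| columns | Optional[Sequence[str]] | Columns to include in the covariance matrix. If None, all numeric columns are used. | None |
| min_periods | Optional[int] | Minimum number of observations required per pair of columns to have a valid result. Optional. | None |
| delta_degrees_of_freedom | int | Delta degrees of freedom used for computing the covariance matrix; the divisor is N - delta_degrees_of_freedom, where N is the number of observations. According to the pandas documentation for DataFrame.cov, which performs the computation, this argument applies only when the data contain no NaN values. Defaults to 1. | 1 |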
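
To make the roles of delta_degrees_of_freedom and min_periods concrete, here is a minimal pure-Python sketch of the textbook covariance of two columns. The helper `pairwise_covariance` is hypothetical and is not part of the toolkit or of pandas; it assumes that observations with a NaN in either column are dropped pairwise and that the divisor is always N - ddof, which may differ from what pandas does when NaN values are present.

```python
import math
from typing import Optional, Sequence


def pairwise_covariance(
    x: Sequence[float],
    y: Sequence[float],
    ddof: int = 1,
    min_periods: Optional[int] = None,
) -> float:
    """Textbook covariance of two columns (hypothetical helper, for illustration only)."""
    # Keep only the observations where both values are present.
    pairs = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)

    # Too few valid observations, or a non-positive divisor: no valid result.
    if (min_periods is not None and n < min_periods) or n - ddof <= 0:
        return math.nan

    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    return cross / (n - ddof)


a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]
sample_cov = pairwise_covariance(a, b, ddof=1)      # divisor N - 1 = 3
population_cov = pairwise_covariance(a, b, ddof=0)  # divisor N = 4

# One missing value leaves three complete pairs, fewer than min_periods=4.
b_with_gap = [2.0, math.nan, 6.0, 8.0]
too_few = pairwise_covariance(a, b_with_gap, min_periods=4)
```

With delta_degrees_of_freedom=1 the result is the unbiased sample covariance; with 0 it is the population covariance. When a pair of columns has fewer complete observations than min_periods, the sketch yields NaN for that pair.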

Returns:

| Type | Description |
|---|---|
| DataFrame | Dataframe containing the covariance matrix; each entry is the covariance between the corresponding pair of columns. |

Raises:

| Type | Description |
|---|---|
| EmptyDataFrameException | The input Dataframe is empty. |
| InvalidParameterValueException | Provided value for delta_degrees_of_freedom or min_periods is negative, or columns contains a name that is not found in the input data. |
| NonNumericDataException | The input data contain non-numeric data. |

Source code in eis_toolkit/exploratory_analyses/covariance_matrix.py
@beartype
def covariance_matrix(
    data: pd.DataFrame,
    columns: Optional[Sequence[str]] = None,
    min_periods: Optional[int] = None,
    delta_degrees_of_freedom: int = 1,
) -> pd.DataFrame:
    """Compute covariance matrix on the input data.

    It is assumed that the data is numeric, i.e. integers or floats. NaN values are excluded from the calculations.

    Args:
        data: Dataframe containing the input data.
        columns: Columns to include in the covariance matrix. If None, all numeric columns are used.
        min_periods: Minimum number of observations required per pair of columns to have a valid result. Optional.
        delta_degrees_of_freedom: Delta degrees of freedom used for computing covariance matrix. Defaults to 1.

    Returns:
        Dataframe containing matrix representing the covariance between the corresponding pair of variables.

    Raises:
        EmptyDataFrameException: The input Dataframe is empty.
        InvalidParameterValueException: Provided value for delta_degrees_of_freedom or min_periods is negative,
            or columns contains a name that is not found in the input data.
        NonNumericDataException: The input data contain non-numeric data.
    """
    if check_empty_dataframe(data):
        raise EmptyDataFrameException("The input Dataframe is empty.")

    if columns:
        invalid_columns = [column for column in columns if column not in data.columns]
        if invalid_columns:
            raise InvalidParameterValueException(f"Invalid columns: {invalid_columns}")
        data_subset = data[columns]
    else:
        data_subset = data.select_dtypes(include=np.number)

    if not all(data_subset.dtypes.apply(lambda x: np.issubdtype(x, np.number))):
        raise NonNumericDataException("The input data contain non-numeric data.")

    if delta_degrees_of_freedom < 0:
        raise InvalidParameterValueException("Delta degrees of freedom must be non-negative.")

    if min_periods is not None and min_periods < 0:
        raise InvalidParameterValueException("Min periods must be non-negative.")

    matrix = data_subset.cov(min_periods=min_periods, ddof=delta_degrees_of_freedom)

    return matrix
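
The last step of the function delegates to pandas' DataFrame.cov. The sketch below calls pandas directly rather than the toolkit function, so it runs without eis_toolkit installed; the sample data and variable names are made up for illustration. It mirrors the default column selection (numeric columns only) and shows the effect of ddof and min_periods on data without NaN values.

```python
import numpy as np
import pandas as pd

# Illustrative data: two numeric columns and one non-numeric column.
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0],
        "b": [2.0, 4.0, 6.0, 8.0],
        "label": ["w", "x", "y", "z"],
    }
)

# Same selection the function applies when columns is None.
numeric = df.select_dtypes(include=np.number)

# Sample covariance (divisor N - 1), the function's default.
sample_cov = numeric.cov(min_periods=None, ddof=1)

# Population covariance (divisor N).
population_cov = numeric.cov(ddof=0)

# Requiring more observations than there are rows yields NaN entries.
all_nan = numeric.cov(min_periods=5)
```

The result is a square DataFrame indexed by the selected column names on both axes, so individual entries can be read with, for example, `sample_cov.loc["a", "b"]`.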