sihnpy.spatial_extent
Module Contents
Functions
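gmm_estimation(data_to_estimate, fix=False) | Function estimating a 1- and a 2-cluster solution Gaussian Mixture Model.
_gmm_avg_sd(gm_obj) | Quick function extracting and returning the average and SD values of the two components from the GMM estimation.
gmm_measures(cleaned_data, gm_objects, fix=False) | For all data kept after GMM estimation, this function computes the averages and SDs for both components.
gmm_probs(final_data, final_gm_estimations, fix=False) | Function extracting the probability to be in the "second" component (high abnormal values).
_gmm_density_histogram(regional_data, regional_gmm_measures, col, dist_2=True) | Histogram of the value DENSITIES with overlaid density function for each GMM cluster.
_gmm_raw_histogram(regional_data, col) | Generates a simple histogram of the values in a given region.
gmm_histograms(final_data, gmm_measures, probs_df, dist_2=True, type='density') | Optional function plotting histograms from the raw data, with overlaid density functions for both clusters.
gmm_threshold_deriv(final_data, probs_df, prob_threshs, improb=None) | Function deriving the actual thresholds based on the probabilities of belonging to the "abnormal" distribution.
export_histograms(hist_dict_fig, output_path, name) | Exporting the histograms to file, if requested by user.
export_threshs(final_data, probs_data, thresh_df, output_path, name) | Wrapper function exporting the final data used and the probability data to files.
apply_clean(data_to_apply, thresh_data, index_name=None) | Function doing basic cleaning on the spatial extent and thresholds.
apply_masks(data_to_apply_clean, thresh_data_clean) | Function applying the thresholds to the data, resulting in binary masks.
apply_index(data_to_apply_clean, dict_masks) | Create the spatial extent index, which is the sum of regions that are above the threshold.
apply_ind_mask(data_to_apply_clean, dict_masks) | Another way to leverage the spatial extent is by creating individualized spatial extent masks.
export_spex_metrics(spex_metrics, output_path, name) | Function to export the spatial extent metrics.
export_spex_bin_masks(dict_masks, output_path, name) | Function to export the binary masks.
export_spex_ind_masks(spex_ind_masks, output_path, name) | Function to export the individualized spatial extent masks.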
- sihnpy.spatial_extent.gmm_estimation(data_to_estimate, fix=False)[source]
Function estimating a 1- and a 2-cluster solution Gaussian Mixture Model (GMM). The Bayesian Information Criterion (BIC) of each model is output and compared between the two models.
- Parameters
data_to_estimate (pandas.DataFrame) – Data where each column needs to be fed to the GMM. Any column for which a GMM should NOT be estimated should be removed beforehand.
fix (bool, optional) – Whether sihnpy should remove regions where a 1-component model fits the data better than a 2-component model (i.e., has the smaller Bayesian Information Criterion), by default False
- Returns
Returns a dictionary with the GMM objects from scikit-learn and a pandas.DataFrame where columns were removed if fix is applied.
- Return type
dict, pandas.DataFrame
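The BIC comparison described above can be sketched directly with scikit-learn. This is a minimal illustration on synthetic data, not sihnpy's actual implementation; the region names, the `bic_df` layout and the `two_fits_better` column are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): "region_a" is clearly bimodal, "region_b" is unimodal.
data = pd.DataFrame({
    "region_a": np.concatenate([rng.normal(1.0, 0.1, 300), rng.normal(2.5, 0.3, 100)]),
    "region_b": rng.normal(1.0, 0.1, 400),
})

bic = {}
for region in data.columns:
    values = data[[region]].to_numpy()  # scikit-learn expects shape (n_samples, n_features)
    one = GaussianMixture(n_components=1, random_state=0).fit(values)
    two = GaussianMixture(n_components=2, random_state=0).fit(values)
    bic[region] = {"bic_1": one.bic(values), "bic_2": two.bic(values)}

# One row per region; a lower BIC indicates the preferred model.
bic_df = pd.DataFrame(bic).T
bic_df["two_fits_better"] = bic_df["bic_2"] < bic_df["bic_1"]

# Equivalent in spirit to fix=True: keep only regions where two components fit better.
kept = data.loc[:, bic_df["two_fits_better"].to_numpy()]
```

With `fix=False`, the documented behaviour is to keep every column, so the last line would simply be skipped.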
- sihnpy.spatial_extent._gmm_avg_sd(gm_obj)[source]
Quick function extracting and returning the average and SD values of the two components from the GMM estimation.
- Parameters
gm_obj (sklearn.mixture.GaussianMixture) – Takes a GMM object as input
- Returns
Returns a dictionary for each GMM object, with the mean and SDs of each component.
- Return type
dict
- sihnpy.spatial_extent.gmm_measures(cleaned_data, gm_objects, fix=False)[source]
For all data kept after GMM estimation, this function computes the averages and SDs for both components. We then check that the order of the clusters is right. The measures are also used for the histograms in the spex.gmm_histograms function.
- Parameters
cleaned_data (pandas.DataFrame) – Dataframe output from spex.gmm_estimation.
gm_objects (dict) – Dictionary of the sklearn.mixture.GaussianMixture objects to extract measures from.
fix (bool, optional) – If the mean of component 2 is lower than the mean of component 1, it suggests that the components are inverted. If fix is True, we remove the region from further calculations, by default False
- Returns
Returns a Dataframe with the clean data (columns are removed if fix is applied), a dictionary of the remaining sklearn.mixture.GaussianMixture objects (estimations are removed if fix is applied), and a dictionary with the averages/SDs of the two components for the regions kept.
- Return type
pandas.DataFrame, dict, dict
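As a rough sketch of what extracting the component measures and checking their order can look like, the snippet below reads the means and SDs from a fitted scikit-learn model. The dictionary keys are hypothetical and the check is only one plausible reading of the description above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic single-region data (hypothetical): a narrow low cluster and a wider high cluster.
values = np.concatenate(
    [rng.normal(1.0, 0.1, 300), rng.normal(2.5, 0.3, 100)]
).reshape(-1, 1)
gm = GaussianMixture(n_components=2, random_state=0).fit(values)

# With the default covariance_type="full" and one feature, each covariance is a 1x1 matrix,
# so flattening gives one variance per component.
means = gm.means_.ravel()
sds = np.sqrt(gm.covariances_.ravel())

# Key names are illustrative only.
measures = {
    "mean_comp1": means[0], "sd_comp1": sds[0],
    "mean_comp2": means[1], "sd_comp2": sds[1],
}

# The components are "inverted" when the second one has the lower mean.
inverted = bool(measures["mean_comp2"] < measures["mean_comp1"])
```

scikit-learn does not guarantee any ordering of the fitted components, which is why a check of this kind is needed before treating component 2 as the "abnormal" one.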
- sihnpy.spatial_extent.gmm_probs(final_data, final_gm_estimations, fix=False)[source]
Function extracting the probability of belonging to the “second” component (high abnormal values).
- Parameters
final_data (pandas.DataFrame) – Cleaned dataframe output by spex.gmm_measures
final_gm_estimations (dict) – Cleaned dictionary of sklearn.mixture.GaussianMixture objects output by spex.gmm_measures
fix (bool, optional) – If inverted distributions were not removed in spex.gmm_measures, they can be manually inverted here by setting fix to True, by default False
- Returns
Dataframe with the same shape, index and columns as final_data. Contains the probabilities of belonging to the “abnormal” distribution for each participant, for each region.
- Return type
pandas.DataFrame
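The probabilities described here correspond to the posterior membership probabilities of a fitted mixture. The sketch below obtains them with scikit-learn's `predict_proba` and always picks the component with the higher mean, which sidesteps the inversion issue; the data and participant labels are made up, and sihnpy's own handling of `fix` may differ.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic data for one region (hypothetical participant labels).
final_data = pd.DataFrame(
    {"region_a": np.concatenate([rng.normal(1.0, 0.1, 300), rng.normal(2.5, 0.3, 100)])},
    index=[f"sub-{i:03d}" for i in range(400)],
)

probs = {}
for region in final_data.columns:
    values = final_data[[region]].to_numpy()
    gm = GaussianMixture(n_components=2, random_state=0).fit(values)
    # Column index of the component with the higher mean (the "abnormal" one).
    high = int(np.argmax(gm.means_.ravel()))
    # predict_proba returns one column per component; each row sums to 1.
    probs[region] = gm.predict_proba(values)[:, high]

probs_df = pd.DataFrame(probs, index=final_data.index)
```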
- sihnpy.spatial_extent._gmm_density_histogram(regional_data, regional_gmm_measures, col, dist_2=True)[source]
Histogram of the value DENSITIES with overlaid density function for each GMM cluster.
Density is the count of each bin, divided by the total number of counts and the bin width (Ref: Matplotlib documentation). This option is necessary to see the density curves.
- Parameters
regional_data (pandas.Series) – Single column from the final_data object representing the data in one region.
regional_gmm_measures (dict) – Dictionary containing the mean and SD of each component.
col (str) – String containing the name of the region. Used mostly for labels on the graphs.
dist_2 (bool, optional) – Whether we want to plot one or two density functions (True == two), by default True
- Returns
Returns matplotlib figure
- Return type
matplotlib.pyplot.figure
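The density definition quoted above (bin count divided by the total count and the bin width) can be checked numerically. The short sketch below uses NumPy rather than any sihnpy function and compares the manual computation with NumPy's built-in `density=True` option; the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(1.0, 0.1, 500)  # synthetic regional values

# Raw counts per bin and the bin edges.
counts, edges = np.histogram(values, bins=20)
widths = np.diff(edges)

# Density as defined above: count / (total count * bin width).
density = counts / (counts.sum() * widths)

# NumPy's own density normalisation, for comparison.
reference, _ = np.histogram(values, bins=20, density=True)
```

Because the bars then integrate to 1, they share a scale with a probability density curve, which is why this normalisation is needed to overlay the component densities.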
- sihnpy.spatial_extent._gmm_raw_histogram(regional_data, col)[source]
Generates a simple histogram of the values in a given region. Can plot both the probabilities and the raw values, as needed.
- Parameters
regional_data (pandas.Series) – Single column of data for a single region (data or probabilities)
col (str) – Name of the region of interest
- Returns
Returns matplotlib figure
- Return type
matplotlib.pyplot.figure
- sihnpy.spatial_extent.gmm_histograms(final_data, gmm_measures, probs_df, dist_2=True, type='density')[source]
Optional function plotting histograms from the raw data, with overlaid density functions for both clusters.
- Parameters
final_data (pandas.DataFrame) – Dataframe from spex.gmm_measures with final columns to plot.
gmm_measures (dict) – Nested dictionary containing the mean and SDs of each component, for each region.
probs_df (pandas.DataFrame) – Dataframe of the probabilities of belonging to the “abnormal” distribution, from the spex.gmm_probs function.
dist_2 (bool, optional) – Whether we want to plot one or two density functions (True == two) if we plot density, by default True
type (str, optional) – Type of histogram to plot (“density”, “raw”, “probs”, “all”), by default “density”.
- Returns
Returns a dictionary of matplotlib figures.
- Return type
dict
- sihnpy.spatial_extent.gmm_threshold_deriv(final_data, probs_df, prob_threshs, improb=None)[source]
Function deriving the actual thresholds based on the probabilities of belonging to the “abnormal” distribution.
Depending on the threshold value used, the probability of belonging to a given component can be inverted (e.g., the 50% probability threshold may have a higher value than the 90% threshold). This usually happens when the second component is very spread out and overlaps with the first component. If that is the case, the use of the improb argument is recommended.
Also note that, to give more flexibility to the user, sihnpy accepts a list of probabilities so that multiple thresholds can be derived at once. However, sihnpy doesn’t check whether the order of the thresholds makes sense (e.g., that 50% comes before 90%) and assumes the user put them in the right order. It is up to the user to check this once the thresholds are derived.
- Parameters
final_data (pandas.DataFrame) – Final data derived from spex.gmm_measures.
probs_df (pandas.DataFrame) – Dataframe containing the probabilities of belonging to the “abnormal” distribution, from the spex.gmm_probs function.
prob_threshs (list of float) – List of thresholds to apply to the data. Thresholds have to range between 0 and 1.
improb (float, optional) – Value below which an “abnormal” value is improbable or impossible. Useful in the case that the GMM is very spread out, by default None
- Returns
Dataframe where rows are the regions and columns are the thresholds derived from the probabilities.
- Return type
pandas.DataFrame
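To make the role of improb concrete, here is a small sketch of one plausible way to derive thresholds: for each region, take the smallest observed value whose probability reaches the requested level, ignoring values below improb. The helper `derive_thresholds`, the column naming and the toy numbers are all assumptions for illustration; sihnpy's actual rule may differ.

```python
import pandas as pd

# Toy data (hypothetical). In region_b the lowest value has a high "abnormal"
# probability, mimicking a wide second component overlapping the first one.
final_data = pd.DataFrame({
    "region_a": [0.9, 1.1, 1.4, 1.8, 2.6],
    "region_b": [1.0, 1.2, 1.3, 2.0, 2.2],
})
probs_df = pd.DataFrame({
    "region_a": [0.01, 0.05, 0.55, 0.93, 0.99],
    "region_b": [0.60, 0.10, 0.52, 0.91, 0.97],
})


def derive_thresholds(values, probs, prob_threshs, improb=None):
    """Hypothetical helper: smallest value per region reaching each probability."""
    rows = {}
    for region in values.columns:
        candidates = values[region]
        if improb is not None:
            # Ignore values considered too low to be abnormal.
            candidates = candidates[candidates >= improb]
        region_probs = probs.loc[candidates.index, region]
        rows[region] = {
            f"thresh_{p}": candidates[region_probs >= p].min() for p in prob_threshs
        }
    # Rows are regions, columns are thresholds.
    return pd.DataFrame.from_dict(rows, orient="index")


naive = derive_thresholds(final_data, probs_df, [0.5, 0.9])
thresh_df = derive_thresholds(final_data, probs_df, [0.5, 0.9], improb=1.1)
```

In this toy case, the 50% threshold for region_b without improb lands on the lowest value in the data (1.0), while setting improb to 1.1 moves it to 1.3.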
- sihnpy.spatial_extent.export_histograms(hist_dict_fig, output_path, name)[source]
Exporting the histograms to file, if requested by user. Will export ALL histograms saved to the dictionary.
- Parameters
hist_dict_fig (dict) – Dictionary of histogram figures from spex.gmm_histograms
output_path (str) – String of the path to where the output should go
name (str) – Name that should be tacked at the end of the file name, depending on the user’s conventions.
- sihnpy.spatial_extent.export_threshs(final_data, probs_data, thresh_df, output_path, name)[source]
Wrapper function exporting the final data used and the probability data to files.
- Parameters
final_data (pandas.DataFrame) – Final data derived from spex.gmm_measures.
probs_data (pandas.DataFrame) – Dataframe containing the probabilities of belonging to the “abnormal” distribution, from the spex.gmm_probs function.
thresh_df (pandas.DataFrame) – Dataframe containing the thresholds we just derived
output_path (str) – String of the path to where the output should go
name (str) – Name that should be tacked at the end of the file name, depending on the user’s conventions.
- sihnpy.spatial_extent.apply_clean(data_to_apply, thresh_data, index_name=None)[source]
Function doing basic cleaning on the spatial extent and thresholds; it just sorts the rows and makes sure they match between the thresholds and the data to apply.
- Parameters
data_to_apply (pandas.DataFrame) – Data on which we want to apply thresholds. Columns should match rows of thresh_data.
thresh_data (pandas.DataFrame) – Thresholds to be applied to the data. Rows should match columns of data_to_apply.
index_name (str, optional) – Name of the column that should be used as the pandas.DataFrame index. If None, the index is assumed to be already set; by default None
- Returns
Returns a pandas.DataFrame of the data, where the columns of the data share the same order as the rows of the thresholds.
- Return type
pandas.DataFrame
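The alignment described above (columns of the data in the same order as the rows of the thresholds) can be sketched with plain pandas. The frames and the `participant_id` column are hypothetical, and the steps are only an approximation of what the function is documented to do.

```python
import pandas as pd

# Hypothetical inputs: the participant ID is still a regular column, and the
# region order differs from the sorted order of the thresholds.
data_to_apply = pd.DataFrame({
    "participant_id": ["s2", "s1"],
    "region_b": [2.1, 1.0],
    "region_a": [1.5, 0.9],
})
thresh_data = pd.DataFrame({"thresh_0.5": [1.3, 1.4]}, index=["region_b", "region_a"])

# Equivalent of passing index_name="participant_id".
data_clean = data_to_apply.set_index("participant_id")

# Sort the threshold rows, then reorder the data columns to match them.
thresh_clean = thresh_data.sort_index()
data_clean = data_clean[thresh_clean.index]
```

Selecting columns with the threshold index raises a `KeyError` if a region is missing from the data, which is a convenient way to catch mismatches early.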
- sihnpy.spatial_extent.apply_masks(data_to_apply_clean, thresh_data_clean)[source]
Function applying the thresholds to the data, resulting in binary masks. The binary masks have the same shape as the original data (rows are participants, columns are regions). The number of masks depends on the number of thresholds (columns) in thresh_data_clean.
- Parameters
data_to_apply_clean (pandas.DataFrame) – Data to which we want to apply the spatial extent, where columns are regions and rows are participants. From spex.apply_clean.
thresh_data_clean (pandas.DataFrame) – Dataframe containing the threshold data, where rows are regions and columns are thresholds. From spex.apply_clean
- Returns
Returns a dictionary of pandas.DataFrame objects, where each DataFrame contains binary values for each region, for each participant.
- Return type
dict
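A minimal sketch of the thresholding step, assuming the data and thresholds are already aligned: each threshold column yields one binary DataFrame. The use of an inclusive comparison (`>=`) and the dictionary keys are assumptions; the toy values are invented.

```python
import pandas as pd

# Rows are participants, columns are regions (hypothetical values).
data_clean = pd.DataFrame(
    {"region_a": [0.9, 1.5, 2.0], "region_b": [1.0, 2.1, 1.2]},
    index=["s1", "s2", "s3"],
)
# Rows are regions, columns are thresholds.
thresh_clean = pd.DataFrame(
    {"thresh_0.5": [1.4, 1.3], "thresh_0.9": [1.8, 2.0]},
    index=["region_a", "region_b"],
)

# One binary mask per threshold; the Series of thresholds is matched to the
# data columns by label (axis="columns").
masks = {
    name: data_clean.ge(thresh_clean[name], axis="columns").astype(int)
    for name in thresh_clean.columns
}
```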
- sihnpy.spatial_extent.apply_index(data_to_apply_clean, dict_masks)[source]
Create the spatial extent index, which is the number of regions that are above the threshold. When multiple thresholds are available, we output the sum for each threshold individually, as well as the total sum across all thresholds.
- Parameters
data_to_apply_clean (pandas.DataFrame) – Original dataframe cleaned with spex.apply_clean. Only used to get the index to ensure the spatial extent is the same order.
dict_masks (dict) – Dictionary containing all the binary masks from spex.apply_masks
- Returns
Dataframe containing the spatial extent index for each threshold.
- Return type
pandas.DataFrame
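The index itself reduces to row sums over the binary masks. The sketch below assumes masks shaped like those described for spex.apply_masks; the column names (`spex_...`, `spex_total`) are made up for the example.

```python
import pandas as pd

participants = ["s1", "s2", "s3"]

# Hypothetical binary masks, one per threshold (rows: participants, columns: regions).
masks = {
    "thresh_0.5": pd.DataFrame({"region_a": [0, 1, 1], "region_b": [0, 1, 0]}, index=participants),
    "thresh_0.9": pd.DataFrame({"region_a": [0, 0, 1], "region_b": [0, 1, 0]}, index=participants),
}

# Number of supra-threshold regions per participant, for each threshold.
spex = pd.DataFrame({f"spex_{name}": mask.sum(axis=1) for name, mask in masks.items()})

# Total across all thresholds.
spex["spex_total"] = spex.sum(axis=1)
```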
- sihnpy.spatial_extent.apply_ind_mask(data_to_apply_clean, dict_masks)[source]
Another way to leverage the spatial extent is by creating individualized spatial extent masks. The idea is simply to add weights to the original data, based on the probability of being abnormal in a given region.
For instance, if a participant has a 90% probability of being positive, vs a 50% probability of being positive, we give more weight to the 90% probability value by multiplying it by a different constant.
- Parameters
data_to_apply_clean (pandas.DataFrame) – Original dataframe cleaned with spex.apply_clean.
dict_masks (dict) – Dictionary containing all the binary masks from spex.apply_masks
- Returns
Dictionary of individualized spatial extent masks.
- Return type
dict
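The weighting constants are not specified in this reference, so the sketch below takes the simplest possible case as an assumption: the binary mask itself is used as the weight (0 or 1), which zeroes out regions below the threshold and keeps the original values elsewhere. It illustrates the general idea only and should not be read as sihnpy's weighting scheme.

```python
import pandas as pd

participants = ["s1", "s2", "s3"]

# Hypothetical cleaned data and binary masks (rows: participants, columns: regions).
data_clean = pd.DataFrame(
    {"region_a": [0.9, 1.5, 2.0], "region_b": [1.0, 2.1, 1.2]}, index=participants
)
masks = {
    "thresh_0.5": pd.DataFrame({"region_a": [0, 1, 1], "region_b": [0, 1, 0]}, index=participants),
    "thresh_0.9": pd.DataFrame({"region_a": [0, 0, 1], "region_b": [0, 1, 0]}, index=participants),
}

# Assumption: weight = binary mask. Element-wise product keeps supra-threshold values only.
ind_masks = {name: data_clean * mask for name, mask in masks.items()}
```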
- sihnpy.spatial_extent.export_spex_metrics(spex_metrics, output_path, name)[source]
Function to export the spatial extent metrics.
- Parameters
spex_metrics (pandas.DataFrame) – Dataframe containing the spatial extent indices.
output_path (str) – Path where the dataframe should be output.
name (str) – String that should be tacked at the end of the file name based on user convention.
- sihnpy.spatial_extent.export_spex_bin_masks(dict_masks, output_path, name)[source]
Function to export the binary masks.
- Parameters
dict_masks (dict) – Dictionary of binary masks where the thresholds were applied.
output_path (str) – Path where the dataframe should be output.
name (str) – String that should be tacked at the end of the file name based on user convention.
- sihnpy.spatial_extent.export_spex_ind_masks(spex_ind_masks, output_path, name)[source]
Function to export the individualized spatial extent masks.
- Parameters
spex_ind_masks (dict) – Dictionary of individualized spatial extent masks
output_path (str) – Path where the dataframe should be output.
name (str) – String that should be tacked at the end of the file name based on user convention.