sihnpy.fingerprinting
Fingerprinting module
See SIHNPY documentation for more information on the functions of the script.
Module Contents
Classes
FingerprintMats(id_ls, path_m1, path_m2) – Class object used to store information for the fingerprinting and to output the results of the fingerprinting analysis.
Functions
import_fingerprint_ids(id_list) – Function importing the list of IDs to analyze.
_slice_matrix(matrix_file, nodes_index_within, nodes_index_between=None) – Internal function slicing matrices and returning "flattened" vectors.
_norm_data(array_to_norm, norm=True) – Internal function normalizing (if necessary) the arrays before fingerprinting.
import_fingerprint_data(data, var) – Function importing the data used for fingerprinting.
fingerprint_tabs(data1, data2, pref) – Main function computing fingerprinting for tabular data.
tab_metrics_calc(data, similar_matrix, name) – Function computing the different fingerprint metrics and storing them in a dataframe for export.
tab_export(outpath, data1, data2, similar_matrix, fp_metrics, name) – Simple wrapper function exporting the data for both visits of participants fingerprinted, the similarity matrix and the fingerprint metrics.
_fia_calculator(similar_matrix) – Internal function computing the fingerprint identification accuracy (number of correct identifications).
_si_calculator(similar_matrix) – Internal function computing the self-identifiability (within-individual correlation).
_oi_calculator(similar_matrix) – Internal function computing the others-identifiability (between-individual correlation).
_identif_calculator(si_coef, oi_coef) – Internal function computing the differential identifiability metric from Amico and Goni (2018).
- sihnpy.fingerprinting.import_fingerprint_ids(id_list)[source]
Function importing the list of IDs to analyze. We assume that the list of IDs is stored in either a .csv or .tsv file, or a text file with 1 ID per line.
- Parameters
id_list (str) – Path on the local computer to the file where the IDs are stored.
- Returns
Returns a list where each element is a participant ID.
- Return type
list
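The accepted file layouts can be illustrated with a minimal sketch. The helper name `read_id_list` is hypothetical and not part of sihnpy, and the sketch assumes that in delimited files the IDs sit in the first column with no header row; the library's actual parsing may differ.

```python
import csv
from pathlib import Path


def read_id_list(id_path):
    """Hypothetical helper: read participant IDs from a .csv, .tsv or text file."""
    path = Path(id_path)
    # Pick a delimiter from the extension; anything else is treated as plain text.
    delimiter = {".csv": ",", ".tsv": "\t"}.get(path.suffix.lower())
    with path.open(newline="") as handle:
        if delimiter is None:
            # Plain text: one ID per line, blank lines skipped.
            return [line.strip() for line in handle if line.strip()]
        # Delimited file: assumption that the IDs are in the first column.
        return [row[0].strip() for row in csv.reader(handle, delimiter=delimiter) if row]
```

IDs are returned as strings, which keeps leading zeros intact and makes later matching against file names straightforward.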
- sihnpy.fingerprinting._slice_matrix(matrix_file, nodes_index_within, nodes_index_between=None)[source]
Internal function slicing matrices and returning “flattened” vectors.
- Parameters
matrix_file (numpy.array) – Array for a given participant comprising all the functional connectivity nodes.
nodes_index_between (list) – List of nodes to include in the fingerprinting calculation. If not interested in looking at between-network, the function defaults to calculating within-network. Defaults to None.
nodes_index_within (list) – List of nodes to include in the fingerprinting calculation.
- Returns
Returns a flattened array of the functional connectivity data.
- Return type
numpy.array
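As a rough illustration of slicing and flattening, the sketch below uses a hypothetical `slice_and_flatten` helper. It assumes that the within-network case keeps only the upper triangle of the symmetric block (diagonal excluded) and that the between-network case keeps the full rectangular block; the library's internal choice is not stated here and may differ.

```python
import numpy as np


def slice_and_flatten(matrix, nodes_within, nodes_between=None):
    """Hypothetical helper: extract a sub-matrix and return it as a 1-D vector."""
    if nodes_between is None:
        # Within-network: square, symmetric block over the same nodes.
        block = matrix[np.ix_(nodes_within, nodes_within)]
        # Assumption: keep each connection once by taking the upper triangle.
        rows, cols = np.triu_indices(len(nodes_within), k=1)
        return block[rows, cols]
    # Between-network: rectangular block, every cell is a distinct connection.
    return matrix[np.ix_(nodes_within, nodes_between)].flatten()
```

Taking the upper triangle avoids counting each symmetric connection twice and drops the self-connections on the diagonal.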
- sihnpy.fingerprinting._norm_data(array_to_norm, norm=True)[source]
Internal function normalizing (if necessary) the arrays before fingerprinting. If normalizing, we change the cells that are calculated as Infinity to be missing.
- Parameters
array_to_norm (numpy.array) – Raw sliced array to normalize.
norm (bool, optional) – Whether or not the normalization should be applied, by default True
- Returns
Array with the chosen normalization applied. Returns a copy of the array if no normalization is applied.
- Return type
numpy.array
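A minimal sketch of this step, assuming the normalization is the Fisher r-to-z transform (as the norm option of fingerprint_mats suggests). The helper name `fisher_normalize` is hypothetical.

```python
import numpy as np


def fisher_normalize(values, norm=True):
    """Hypothetical helper: Fisher z-transform with infinite cells set to missing."""
    values = np.asarray(values, dtype=float)
    if not norm:
        # No normalization requested: hand back an independent copy.
        return values.copy()
    # arctanh(+/-1) is +/-inf; silence the divide warning and handle it below.
    with np.errstate(divide="ignore"):
        z = np.arctanh(values)
    # Cells that became infinite are marked as missing (NaN).
    z[np.isinf(z)] = np.nan
    return z
```

Correlations of exactly 1 or -1 are the cells that map to infinity under this transform, which is why they end up missing.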
- class sihnpy.fingerprinting.FingerprintMats(id_ls, path_m1, path_m2)[source]
Class object used to store information for the fingerprinting and to output the results of the fingerprinting analysis. This object is to be used when the input data is folders with 1 matrix per subject.
- fetch_matrix_file_names()[source]
Simple function importing the matrices as input for the fingerprinting computation. Does not require any argument (will use the path variables from the FingerprintMats objects).
- Raises
OSError – Raised if the path does not exist or the files cannot be imported.
- subject_selection(files_m1, files_m2, verbose=True)[source]
Select participant files that are present in both modalities (i.e., intersection). The function assumes that the ID in the ID list will match in some way the file name in the folder (e.g., ID 6745 would match a matrix file named 6745.txt or part6745_rest.txt or 6745, but it will not match 674.txt).
- Parameters
files_m1 (list of str) – List of files for the first modality
files_m2 (list of str) – List of files for the second modality
verbose (bool, optional) – Whether or not we want an explicit description of participants included, by default True
- Returns
Returns three lists: the participant IDs retained, and the lists of their file names in the first and second modality.
- Return type
list
- Raises
SystemExit – If no subject ID is matched to any files, exit.
SystemExit – If files are duplicated after matching with subject list, exit.
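The matching rule can be sketched as a substring test between each ID and the file names. The helper `match_ids_to_files` is hypothetical; unlike the method described here, it silently skips IDs with zero or several matches rather than exiting.

```python
def match_ids_to_files(ids, files_m1, files_m2):
    """Hypothetical helper: keep IDs that match exactly one file in each modality."""
    kept_ids, kept_m1, kept_m2 = [], [], []
    for subject_id in ids:
        # An ID matches a file when it appears anywhere in the file name.
        hits_m1 = [name for name in files_m1 if str(subject_id) in name]
        hits_m2 = [name for name in files_m2 if str(subject_id) in name]
        # Assumption of this sketch: ambiguous or missing matches are skipped.
        if len(hits_m1) == 1 and len(hits_m2) == 1:
            kept_ids.append(subject_id)
            kept_m1.append(hits_m1[0])
            kept_m2.append(hits_m2[0])
    return kept_ids, kept_m1, kept_m2
```

With plain substring matching, a short ID such as 674 would also match 6745.txt, which is one way duplicated matches can arise.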
- _import_matrix(mod, i)[source]
Internal function importing the matrices of interest from the local computer during the fingerprinting operation.
- Parameters
mod (int) – Integer (1 or 2) indicating which folder to fetch the matrices from
i (int) – Integer given by the loop in the fingerprint function. It identifies which list element we should import.
- Returns
Returns a numpy array containing the matrix of interest
- Return type
numpy.array
- fingerprint_mats(nodes_index_within, nodes_index_between=None, norm=True, corr_type='Pearson', verbose=True)[source]
Core fingerprinting function. Takes every pair of matrices from modality 1 and 2 and applies the fingerprint methodology between them.
- Parameters
nodes_index_within (list of int) – List of integers giving the indices of the nodes to select. If nodes_index_between is not given, we assume we want to extract a symmetric sub-matrix (i.e., within-network).
nodes_index_between (list of int, optional) – If requested, the matrix fed to the fingerprint can be asymmetric, which is the case when wanting to do between-network fingerprinting, by default None
norm (bool, optional) – Whether or not to Fisher normalize the data before fingerprinting, by default True
corr_type (str, optional) – Which correlation measure to use for generating fingerprinting, by default “Pearson”. Options include: [“Pearson”]
verbose (bool, optional) – Whether or not to print a message of which participants we are doing, by default True
- Returns
Returns a similarity matrix of the correlations within and between participants.
- Return type
numpy.array
- Raises
SystemExit – If the FingerprintMats step was skipped, we fail this function.
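Conceptually, the similarity matrix holds the Pearson correlation between every participant's flattened vector in modality 1 and every participant's vector in modality 2. The sketch below uses a hypothetical `similarity_matrix` helper and assumes the vectors have already been sliced and normalized.

```python
import numpy as np


def similarity_matrix(vectors_m1, vectors_m2):
    """Hypothetical helper: Pearson correlation between all pairs of vectors."""
    sim = np.empty((len(vectors_m1), len(vectors_m2)))
    for i, v1 in enumerate(vectors_m1):
        for j, v2 in enumerate(vectors_m2):
            # Row i: participant i in modality 1; column j: participant j in modality 2.
            sim[i, j] = np.corrcoef(v1, v2)[0, 1]
    return sim
```

When both lists are ordered by the same participants, the diagonal holds the within-individual correlations and the off-diagonal cells hold the between-individual correlations.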
- _fia_calculator(similar_matrix)[source]
Internal function computing the fingerprint identification accuracy (number of correct identifications).
- Parameters
similar_matrix (numpy.array) – Similarity matrix from fingerprint_mats function
- Returns
Binary array for every participant included: a 1 indicates correct identification within the cohort and a 0 indicates incorrect identification.
- Return type
numpy.array
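One common way to obtain such a binary array is to check, for each row, whether the highest correlation falls on the diagonal. The sketch below assumes this row-wise rule; the helper name `identification_accuracy` is hypothetical and the library may apply a different rule.

```python
import numpy as np


def identification_accuracy(sim):
    """Hypothetical helper: 1 where a participant's best match is themselves."""
    # Index of the most similar modality-2 vector for each modality-1 participant.
    best_match = np.argmax(sim, axis=1)
    # Correct identification when the best match sits on the diagonal.
    return (best_match == np.arange(sim.shape[0])).astype(int)
```

The mean of the returned array gives the proportion of participants correctly identified in the cohort.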
- _si_calculator(similar_matrix)[source]
Internal function computing the self-identifiability (within-individual correlation). This is defined as the diagonal (within-individual correlations) of the similarity matrix.
- Parameters
similar_matrix (numpy.array) – Similarity matrix from fingerprint_mats function
- Returns
Returns an array containing the self-identifiability.
- Return type
numpy.array
- _oi_calculator(similar_matrix)[source]
Internal function computing the others-identifiability (between-individual correlation). This is defined as the average of the off-diagonal elements (row-wise) of the similarity matrix.
- Parameters
similar_matrix (numpy.array) – Similarity matrix from fingerprint_mats function
- Returns
Returns an array containing the others-identifiability.
- Return type
numpy.array
- _identif_calculator(si_coef, oi_coef)[source]
Internal function computing the differential identifiability metric from Amico and Goni (2018). This is simply the difference between the diagonal elements (self-identifiability) and the average off-diagonal elements (others-identifiability) of the similarity matrix.
- Parameters
si_coef (numpy.array) – Array containing the self-identifiability.
oi_coef (numpy.array) – Array containing the others-identifiability.
- Returns
Returns an array containing the differential identifiability.
- Return type
numpy.array
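The three correlation-based metrics follow directly from their definitions. The sketch below is a minimal illustration with hypothetical helper names; it assumes a square similarity matrix without missing values and applies no scaling to the difference.

```python
import numpy as np


def self_identifiability(sim):
    """Hypothetical helper: diagonal of the similarity matrix."""
    return np.diag(sim).copy()


def others_identifiability(sim):
    """Hypothetical helper: row-wise mean of the off-diagonal elements."""
    n = sim.shape[0]
    # Row sum minus the diagonal cell, averaged over the n - 1 other participants.
    return (sim.sum(axis=1) - np.diag(sim)) / (n - 1)


def differential_identifiability(si_coef, oi_coef):
    """Hypothetical helper: self-identifiability minus others-identifiability."""
    return np.asarray(si_coef) - np.asarray(oi_coef)
```

A larger difference means a participant resembles themselves across modalities more than they resemble the rest of the cohort.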
- fp_metrics_calc(similar_matrix, name)[source]
Method computing the different fingerprint metrics and storing them in a dataframe for export. Each metric is computed and stored in a numpy.array, and these arrays are then used to populate the dataframe.
- Parameters
similar_matrix (numpy.array) – Similarity matrix from fingerprint_mats function
name (str) – String to add to the variables. This is so the user can differentiate the different runs of the fingerprinting if multiple are used.
- Returns
Returns a pandas.DataFrame containing 5 columns: the ID and each of the four metrics.
- Return type
pandas.DataFrame
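The shape of the resulting table can be sketched as follows. The helper `metrics_table` and the column names (id, fia_, si_, oi_, di_ followed by the run name) are hypothetical and chosen only for illustration; the metrics are computed with the same assumptions as a row-wise best-match rule and an unscaled difference.

```python
import numpy as np
import pandas as pd


def metrics_table(ids, sim, name):
    """Hypothetical helper: one row per participant, ID plus four metrics."""
    n = sim.shape[0]
    si = np.diag(sim)                              # self-identifiability
    oi = (sim.sum(axis=1) - si) / (n - 1)          # others-identifiability
    fia = (np.argmax(sim, axis=1) == np.arange(n)).astype(int)
    return pd.DataFrame({
        "id": ids,
        f"fia_{name}": fia,
        f"si_{name}": si,
        f"oi_{name}": oi,
        f"di_{name}": si - oi,                     # differential identifiability
    })
```

Suffixing each column with the run name lets tables from several fingerprinting runs be merged on the ID without column clashes.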
- fp_mat_export(output_path, coef_data, similar_matrix, name, out_full=True, dir_struct=True)[source]
Export the fingerprinting output to file. What is output, and how, depends on the user's choices. By default, exports the similarity matrix, the subject list and the computed fingerprint metrics, and creates separate directories for the similarity matrix and the subject list.
- Parameters
output_path (str) – Path where all the fingerprinting output should go.
coef_data (pandas.DataFrame) – Dataframe containing the fingerprinting coefficients calculated before.
similar_matrix (numpy.array) – Similarity matrix containing the fingerprinting coefficients
name (str) – String to add to the file names
out_full (bool, optional) – Whether we want the similarity matrix and subject list to be outputted, by default True
dir_struct (bool, optional) – Whether we want similarity matrix and subject list to have their own directory, by default True
- sihnpy.fingerprinting.import_fingerprint_data(data, var)[source]
Function importing the data used for fingerprinting. This function assumes two important things: 1) the dataframe you are feeding it has an index that comprises the IDs of the participants, and 2) the dataframe is in long form (i.e., one participant has more than one visit). Specifically, there should be a variable in the dataframe specifying the visit, given by the var argument.
Note that by default, sihnpy will grab the first and last visit of a participant if there are more than two visits. If you are interested in fingerprinting specific visits
sihnpy will also remove participants with only 1 visit as they can’t be fingerprinted.
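The visit-selection behaviour described above can be sketched with pandas. The helper `first_and_last_visits` is hypothetical; it assumes the visit column can be sorted (numbers or dates) and that the participant IDs are the dataframe index.

```python
import pandas as pd


def first_and_last_visits(data, visit_col):
    """Hypothetical helper: split long-form data into first and last visits."""
    # Count visits per participant (the index holds the participant IDs).
    counts = data.groupby(level=0)[visit_col].transform("count")
    # Participants with a single visit cannot be fingerprinted, so drop them.
    multi = data[(counts > 1).to_numpy()]
    # Order visits so that head/tail pick the earliest and latest one.
    ordered = multi.sort_values(visit_col, kind="stable")
    first = ordered.groupby(level=0).head(1).sort_index()
    last = ordered.groupby(level=0).tail(1).sort_index()
    return first, last
```

Sorting both outputs by index keeps the participants in the same order in the two dataframes, so row i refers to the same person in each.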
- sihnpy.fingerprinting.fingerprint_tabs(data1, data2, pref)[source]
Main function computing fingerprinting for tabular data. It assumes that the variables to use for fingerprinting share a common naming prefix, given by pref (e.g., “ctx”).
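For tabular data, the same idea can be sketched by selecting the columns that start with the prefix and correlating participants' rows across the two visits. The helper `tabular_similarity` is hypothetical and assumes both dataframes list the same participants in the same order, with Pearson correlation as the similarity measure.

```python
import numpy as np
import pandas as pd


def tabular_similarity(data1, data2, prefix):
    """Hypothetical helper: participant-by-participant correlation across visits."""
    # Keep only the variables following the naming convention (e.g. "ctx").
    cols = [col for col in data1.columns if col.startswith(prefix)]
    x1 = data1[cols].to_numpy(dtype=float)
    x2 = data2[cols].to_numpy(dtype=float)
    n = len(x1)
    # np.corrcoef stacks the rows of both arrays; the top-right block holds
    # the correlations between visit-1 rows and visit-2 rows.
    return np.corrcoef(x1, x2)[:n, n:]
```

Columns that do not start with the prefix, such as demographic variables, are ignored when building the similarity matrix.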
- sihnpy.fingerprinting.tab_metrics_calc(data, similar_matrix, name)[source]
Function computing the different fingerprint metrics and storing them in a dataframe for export. Each metric is computed and stored in a numpy.array, and these arrays are then used to populate the dataframe.
- Parameters
data (pandas.DataFrame) – Either first or last visit of fingerprinting used. This is only used to grab the IDs of the participants and set them as index.
similar_matrix (numpy.array) – Similarity matrix from fingerprint_tabs function
name (str) – String to add to the variables. This is so the user can differentiate the different runs of the fingerprinting if multiple are used.
- Returns
Returns a pandas.DataFrame containing 5 columns: the ID and each of the four metrics.
- Return type
pandas.DataFrame
- sihnpy.fingerprinting.tab_export(outpath, data1, data2, similar_matrix, fp_metrics, name)[source]
Simple wrapper function exporting the data for both visits of participants fingerprinted, the similarity matrix, the fingerprint metrics and the name given by the user.
- sihnpy.fingerprinting._fia_calculator(similar_matrix)[source]
Internal function computing the fingerprint identification accuracy (number of correct identifications).
- Parameters
similar_matrix (numpy.array) – Similarity matrix
- Returns
Binary array for every participant included: a 1 indicates correct identification within the cohort and a 0 indicates incorrect identification.
- Return type
numpy.array
- sihnpy.fingerprinting._si_calculator(similar_matrix)[source]
Internal function computing the self-identifiability (within-individual correlation). This is defined as the diagonal (within-individual correlations) of the similarity matrix.
- Parameters
similar_matrix (numpy.array) – Similarity matrix
- Returns
Returns an array containing the self-identifiability.
- Return type
numpy.array
- sihnpy.fingerprinting._oi_calculator(similar_matrix)[source]
Internal function computing the others-identifiability (between-individual correlation). This is defined as the average of the off-diagonal elements (row-wise) of the similarity matrix.
- Parameters
similar_matrix (numpy.array) – Similarity matrix
- Returns
Returns an array containing the others-identifiability.
- Return type
numpy.array
- sihnpy.fingerprinting._identif_calculator(si_coef, oi_coef)[source]
Internal function computing the differential identifiability metric from Amico and Goni (2018). This is simply the difference between the diagonal elements (self-identifiability) and the average off-diagonal elements (others-identifiability) of the similarity matrix.
- Parameters
si_coef (numpy.array) – Array containing the self-identifiability.
oi_coef (numpy.array) – Array containing the others-identifiability.
- Returns
Returns an array containing the differential identifiability.
- Return type
numpy.array