sihnpy.fingerprinting

Fingerprinting module

See SIHNPY documentation for more information on the functions of the script.

Module Contents

Classes

FingerprintMats

Class object used to store information for the fingerprinting and to output the results of the fingerprinting analysis.

Functions

import_fingerprint_ids(id_list)

Function importing the list of IDs to analyze. We assume that the list of IDs is stored in either a .csv or .tsv file, or a text file with 1 ID per line.

_slice_matrix(matrix_file, nodes_index_within[, ...])

Internal function slicing matrices and returning "flattened" vectors.

_norm_data(array_to_norm[, norm])

Internal function normalizing (if necessary) the arrays before fingerprinting. If normalizing, cells that are calculated as Infinity are set to missing.

import_fingerprint_data(data, var)

Function importing the data used for fingerprinting.

fingerprint_tabs(data1, data2, pref)

Main function computing fingerprinting for tabular data.

tab_metrics_calc(data, similar_matrix, name)

Function computing the different fingerprint metrics and storing them in a dataframe for export.

tab_export(outpath, data1, data2, similar_matrix, ...)

Simple wrapper function exporting the data for both visits of participants fingerprinted, the similarity matrix, the fingerprint metrics and the name given by the user.

_fia_calculator(similar_matrix)

Internal function computing the fingerprint identification accuracy (number of correct identifications).

_si_calculator(similar_matrix)

Internal function computing the self-identifiability (within-individual correlation).

_oi_calculator(similar_matrix)

Internal function computing the others-identifiability (between-individual correlation).

_identif_calculator(si_coef, oi_coef)

Internal function computing the differential identifiability metric from Amico and Goni (2018).

sihnpy.fingerprinting.import_fingerprint_ids(id_list)[source]

Function importing the list of IDs to analyze. We assume that the list of IDs is stored in either a .csv or .tsv file, or a text file with 1 ID per line.

Parameters

id_list (str) – Path on the local computer to the file where the IDs are stored.

Returns

Returns a list where each element is a participant ID.

Return type

list
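As an illustration of the expected input, the sketch below reads one ID per line from a text, .csv or .tsv file. The helper name read_ids is hypothetical, and the choices to keep only the first field of each row and to assume no header row are assumptions, not the documented behaviour of import_fingerprint_ids.

```python
import csv
import tempfile
from pathlib import Path


def read_ids(id_list):
    """Hypothetical helper: return the first field of every non-empty row."""
    path = Path(id_list)
    # Assumption: .tsv files are tab-delimited, everything else comma-delimited
    delimiter = "\t" if path.suffix == ".tsv" else ","
    with path.open(newline="") as handle:
        rows = csv.reader(handle, delimiter=delimiter)
        return [row[0].strip() for row in rows if row and row[0].strip()]


# Small demonstration with a temporary text file (one ID per line)
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "ids.txt"
    demo_file.write_text("6745\n1234\n\n5678\n")
    demo_ids = read_ids(demo_file)
```

IDs are kept as strings here, which makes later matching against file names straightforward.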

sihnpy.fingerprinting._slice_matrix(matrix_file, nodes_index_within, nodes_index_between=None)[source]

Internal function slicing matrices and returning “flattened” vectors.

Parameters
  • matrix_file (numpy.array) – Array for a given participant comprising all the functional connectivity nodes.

  • nodes_index_within (list) – List of nodes to include in the fingerprinting calculation.

  • nodes_index_between (list, optional) – Second list of nodes to include when doing between-network fingerprinting. If None, the function computes within-network fingerprinting using nodes_index_within only. Defaults to None.

Returns

Returns a flattened array of the functional connectivity data.

Return type

numpy.array
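A minimal sketch of this kind of slicing, assuming the nodes are given as integer indices and that the selected block is flattened as-is. The function name slice_and_flatten is hypothetical; whether the actual implementation keeps the full symmetric block or only part of it is not stated here.

```python
import numpy as np


def slice_and_flatten(matrix, nodes_within, nodes_between=None):
    """Hypothetical stand-in: select a sub-matrix and flatten it to 1-D."""
    if nodes_between is None:
        # Within-network: same nodes on rows and columns (symmetric block)
        sub_matrix = matrix[np.ix_(nodes_within, nodes_within)]
    else:
        # Between-network: rows from one set of nodes, columns from the other
        sub_matrix = matrix[np.ix_(nodes_within, nodes_between)]
    return sub_matrix.flatten()


demo_matrix = np.arange(16).reshape(4, 4)
within_vector = slice_and_flatten(demo_matrix, [0, 2])
between_vector = slice_and_flatten(demo_matrix, [0, 2], [1, 3])
```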

sihnpy.fingerprinting._norm_data(array_to_norm, norm=True)[source]

Internal function normalizing (if necessary) the arrays before fingerprinting. If normalizing, cells that are calculated as Infinity are set to missing.

Parameters
  • array_to_norm (numpy.array) – Raw sliced array to normalize.

  • norm (bool, optional) – Whether or not the normalization should be applied, by default True

Returns

Array with the normalization applied. Returns a copy of the array if no normalization is applied.

Return type

numpy.array
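The behaviour described above can be sketched as follows, assuming the normalization is the Fisher z-transform (the inverse hyperbolic tangent), which yields Infinity for correlations of exactly 1 or -1. The function name fisher_normalize is hypothetical and this is not necessarily how _norm_data is written.

```python
import numpy as np


def fisher_normalize(array_to_norm, norm=True):
    """Hypothetical sketch: Fisher z-transform with infinite values set to NaN."""
    if not norm:
        # No normalization requested: hand back a copy, not the original
        return array_to_norm.copy()
    with np.errstate(divide="ignore"):
        normed = np.arctanh(array_to_norm)
    # arctanh(1) and arctanh(-1) are infinite; mark those cells as missing
    normed[np.isinf(normed)] = np.nan
    return normed


raw_vector = np.array([0.0, 0.5, 1.0])
normed_vector = fisher_normalize(raw_vector)
copied_vector = fisher_normalize(raw_vector, norm=False)
```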

class sihnpy.fingerprinting.FingerprintMats(id_ls, path_m1, path_m2)[source]

Class object used to store information for the fingerprinting and to output the results of the fingerprinting analysis. This object is to be used when the input data are stored in folders containing 1 matrix per subject.

fetch_matrix_file_names()[source]

Simple function importing the matrices as input for the fingerprinting computation. Does not require any argument (will use the path variables from the FingerprintMats object).

Raises

OSError – Raised if the path does not exist or the files cannot be imported.

subject_selection(files_m1, files_m2, verbose=True)[source]

Select participant files that are present in both modalities (i.e., intersection). The function assumes that the ID in the ID list will match in some way the file name in the folder (e.g., ID 6745 would match a matrix file named 6745.txt or part6745_rest.txt or 6745, but it will not match 674.txt).

Parameters
  • files_m1 (list of str) – List of files for the first modality

  • files_m2 (list of str) – List of files for the second modality

  • verbose (bool, optional) – Whether or not we want an explicit description of participants included, by default True

Returns

Returns three lists: the participant IDs retained in the end, and the lists of their file names in each modality.

Return type

list

Raises
  • SystemExit – Raised if no subject ID is matched to any file.

  • SystemExit – Raised if files are duplicated after matching with the subject list.
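The matching rule in the description (ID 6745 matches part6745_rest.txt but not 674.txt) is consistent with a simple substring test. The sketch below assumes exactly that; the function name match_ids_to_files is hypothetical, and it raises ValueError instead of exiting, which differs from the SystemExit behaviour documented above.

```python
def match_ids_to_files(ids, files_m1, files_m2):
    """Hypothetical sketch: keep IDs with exactly one file in each modality."""
    kept_ids, kept_m1, kept_m2 = [], [], []
    for sub_id in ids:
        # Assumption: an ID matches a file when it appears in the file name
        hits_m1 = [name for name in files_m1 if str(sub_id) in name]
        hits_m2 = [name for name in files_m2 if str(sub_id) in name]
        if len(hits_m1) > 1 or len(hits_m2) > 1:
            raise ValueError(f"ID {sub_id} matches more than one file")
        if hits_m1 and hits_m2:
            kept_ids.append(sub_id)
            kept_m1.append(hits_m1[0])
            kept_m2.append(hits_m2[0])
    if not kept_ids:
        raise ValueError("No ID matched a file in both modalities")
    return kept_ids, kept_m1, kept_m2


demo_ids, demo_m1, demo_m2 = match_ids_to_files(
    ["6745", "1111"],
    ["part6745_rest.txt", "674.txt"],
    ["6745.txt", "1111.txt"],
)
```

In this demonstration, ID 1111 is dropped because it has no file in the first modality.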

_import_matrix(mod, i)[source]

Internal function importing the matrices of interest from the local computer during the fingerprinting operation.

Parameters
  • mod (int) – Integer (1 or 2) indicating which folder to fetch the matrices from

  • i (int) – Integer given by the loop in the fingerprint function. It identifies which list element we should import.

Returns

Returns a numpy array containing the matrix of interest

Return type

numpy.array

fingerprint_mats(nodes_index_within, nodes_index_between=None, norm=True, corr_type='Pearson', verbose=True)[source]

Core fingerprinting function. Takes every pair of matrices from modality 1 and 2 and applies the fingerprint methodology between them.

Parameters
  • nodes_index_within (list of int) – List of integers giving the indices of the nodes to select. If nodes_index_between is not given, we assume we want to extract a symmetric sub-matrix (i.e., within-network).

  • nodes_index_between (list of int, optional) – If requested, the matrix fed to the fingerprint can be asymmetric, which is the case when wanting to do between-network fingerprinting, by default None

  • norm (bool, optional) – Whether or not to Fisher normalize the data before fingerprinting, by default True

  • corr_type (str, optional) – Which correlation measure to use for generating fingerprinting, by default “Pearson”. Options include: [“Pearson”]

  • verbose (bool, optional) – Whether or not to print a message indicating which participant is being processed, by default True

Returns

Returns a similarity matrix of the correlations within and between participants.

Return type

numpy.array

Raises

SystemExit – Raised if the FingerprintMats step was skipped.
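To make the idea of the similarity matrix concrete, the sketch below correlates every flattened vector from modality 1 with every flattened vector from modality 2 using Pearson correlation. The function name similarity_matrix, the choice of rows for modality 1 and columns for modality 2, and the simulated data are all assumptions. The sketch also ignores missing values, which the real function may handle differently.

```python
import numpy as np


def similarity_matrix(vectors_m1, vectors_m2):
    """Hypothetical sketch: Pearson correlation for every pair of subjects."""
    n_subjects = len(vectors_m1)
    similar_matrix = np.zeros((n_subjects, n_subjects))
    for i in range(n_subjects):
        for j in range(n_subjects):
            # Entry (i, j): subject i in modality 1 vs subject j in modality 2
            similar_matrix[i, j] = np.corrcoef(vectors_m1[i], vectors_m2[j])[0, 1]
    return similar_matrix


# Simulated data: each subject has a stable pattern plus a little noise
rng = np.random.default_rng(0)
patterns = rng.normal(size=(3, 50))
vectors_m1 = patterns + 0.1 * rng.normal(size=(3, 50))
vectors_m2 = patterns + 0.1 * rng.normal(size=(3, 50))
demo_similarity = similarity_matrix(vectors_m1, vectors_m2)
```

With these conventions, the diagonal holds within-individual correlations and the off-diagonal entries hold between-individual correlations.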

_fia_calculator(similar_matrix)[source]

Internal function computing the fingerprint identification accuracy (number of correct identifications).

Parameters

similar_matrix (numpy.array) – Similarity matrix from fingerprint_mats function

Returns

Binary array for every participant included: a 1 indicates correct identification within the cohort and a 0 indicates incorrect identification.

Return type

numpy.array
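One common way to obtain such a binary array is to check, for each row, whether the largest correlation falls on the diagonal. The sketch below assumes this row-wise rule; the function name identification_accuracy is hypothetical and the actual method may compare along a different axis.

```python
import numpy as np


def identification_accuracy(similar_matrix):
    """Hypothetical sketch: 1 where the row maximum sits on the diagonal."""
    best_match = np.argmax(similar_matrix, axis=1)
    own_index = np.arange(similar_matrix.shape[0])
    return (best_match == own_index).astype(int)


demo_similarity = np.array([
    [0.9, 0.2, 0.1],
    [0.3, 0.4, 0.8],
    [0.1, 0.2, 0.7],
])
demo_hits = identification_accuracy(demo_similarity)
```

Here the second participant is misidentified because their highest correlation (0.8) is with the third participant.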

_si_calculator(similar_matrix)[source]

Internal function computing the self-identifiability (within-individual correlation). This is defined as the diagonal (within-individual correlations) of the similarity matrix.

Parameters

similar_matrix (numpy.array) – Similarity matrix from fingerprint_mats function

Returns

Returns an array containing the self-identifiability.

Return type

numpy.array

_oi_calculator(similar_matrix)[source]

Internal function computing the others-identifiability (between-individual correlation). This is defined as the average of the off-diagonal elements (row-wise) of the similarity matrix.

Parameters

similar_matrix (numpy.array) – Similarity matrix from fingerprint_mats function

Returns

Returns an array containing the others-identifiability.

Return type

numpy.array
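A minimal sketch of a row-wise off-diagonal average, assuming a square similarity matrix without missing values. The function name others_identifiability is hypothetical.

```python
import numpy as np


def others_identifiability(similar_matrix):
    """Hypothetical sketch: mean of each row, excluding the diagonal entry."""
    n_subjects = similar_matrix.shape[0]
    off_diagonal = ~np.eye(n_subjects, dtype=bool)
    # Zero out the diagonal, then average over the n - 1 remaining entries
    return (similar_matrix * off_diagonal).sum(axis=1) / (n_subjects - 1)


demo_similarity = np.array([
    [0.9, 0.2, 0.1],
    [0.3, 0.4, 0.8],
    [0.1, 0.2, 0.7],
])
demo_oi = others_identifiability(demo_similarity)
```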

_identif_calculator(si_coef, oi_coef)[source]

Internal function computing the differential identifiability metric from Amico and Goni (2018). This is simply the difference between the diagonal and the average off-diagonal elements of the similarity matrix.

Parameters
  • si_coef (numpy.array) – Array containing the self-identifiability (fingerprinting coefficient).

  • oi_coef (numpy.array) – Array containing the others-identifiability (alikeness coefficient).

Returns

Returns an array containing the differential identifiability.

Return type

numpy.array
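As a small worked example, assuming the metric is the element-wise difference described above, with illustrative values only:

```python
import numpy as np

# Illustrative values: one entry per participant
si_coef = np.array([0.9, 0.4, 0.7])     # diagonal of the similarity matrix
oi_coef = np.array([0.15, 0.55, 0.15])  # row-wise off-diagonal averages

# Differential identifiability: self- minus others-identifiability
identif = si_coef - oi_coef
```

A negative value, as for the second participant here, means that participant resembles others more than themselves on average.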

fp_metrics_calc(similar_matrix, name)[source]

Method computing the different fingerprint metrics and storing them in a dataframe for export. Each metric is computed and stored in a numpy.array; these arrays are then used to populate the dataframe.

Parameters
  • similar_matrix (numpy.array) – Similarity matrix from fingerprint_mats function

  • name (str) – String to add to the variables. This is so the user can differentiate the different runs of the fingerprinting if multiple are used.

Returns

Returns a pandas.DataFrame containing 5 columns: the ID and each of the four metrics.

Return type

pandas.DataFrame
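The layout of such a dataframe might look like the sketch below. The function name metrics_table and the column names (fia_, si_, oi_ and identif_ followed by the name string) are hypothetical and chosen only to show how the name argument can distinguish runs.

```python
import numpy as np
import pandas as pd


def metrics_table(ids, fia, si, oi, identif, name):
    """Hypothetical sketch: one ID column plus four metric columns."""
    return pd.DataFrame({
        "ID": ids,
        f"fia_{name}": fia,
        f"si_{name}": si,
        f"oi_{name}": oi,
        f"identif_{name}": identif,
    })


demo_table = metrics_table(
    ["s1", "s2"],
    np.array([1, 0]),
    np.array([0.9, 0.4]),
    np.array([0.15, 0.55]),
    np.array([0.75, -0.15]),
    "rest",
)
```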

fp_mat_export(output_path, coef_data, similar_matrix, name, out_full=True, dir_struct=True)[source]

Export the fingerprinting output to file. What is exported, and how, depends on the user's choices. By default, exports the similarity matrix, the subject list and the computed fingerprint metrics, and creates separate directories for the similarity matrix and the subject list.

Parameters
  • output_path (str) – Path where all the fingerprinting output should go.

  • coef_data (pandas.DataFrame) – Dataframe containing the fingerprinting coefficients calculated before.

  • similar_matrix (numpy.array) – Similarity matrix containing the fingerprinting coefficients

  • name (str) – String to add to the file names

  • out_full (bool, optional) – Whether we want the similarity matrix and subject list to be outputted, by default True

  • dir_struct (bool, optional) – Whether we want similarity matrix and subject list to have their own directory, by default True

sihnpy.fingerprinting.import_fingerprint_data(data, var)[source]

Function importing the data used for fingerprinting. This function assumes two important things: 1) the dataframe you are feeding it has an index that comprises the IDs of the participants and 2) the dataframe is in long form (i.e., one participant has more than one visit). Specifically, the dataframe should contain a variable specifying the visit, whose name is passed as the var argument.

Note that by default, sihnpy will grab the first and last visit of a participant if there are more than two visits. If you are interested in fingerprinting specific visits

sihnpy will also remove participants with only 1 visit as they can’t be fingerprinted.
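The first/last visit selection can be sketched with pandas as below. The function name split_first_last is hypothetical, and returning two dataframes (one per visit) is an assumption. The sketch also assumes the visit variable sorts in chronological order.

```python
import pandas as pd


def split_first_last(data, var):
    """Hypothetical sketch: first and last visit of participants with >1 visit."""
    # Count visits per participant (IDs are stored in the index)
    n_visits = data.groupby(level=0)[var].transform("count")
    multi_visit = data[(n_visits > 1).to_numpy()].sort_values(var, kind="stable")
    # After sorting by visit, head/tail give the first/last row per participant
    first_visit = multi_visit.groupby(level=0).head(1).sort_index()
    last_visit = multi_visit.groupby(level=0).tail(1).sort_index()
    return first_visit, last_visit


demo_data = pd.DataFrame(
    {"visit": [1, 2, 3, 1, 1, 2], "ctx_a": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]},
    index=["s1", "s1", "s1", "s2", "s3", "s3"],
)
demo_first, demo_last = split_first_last(demo_data, "visit")
```

Participant s2 has a single visit and is therefore absent from both outputs.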

sihnpy.fingerprinting.fingerprint_tabs(data1, data2, pref)[source]

Main function computing fingerprinting for tabular data. It assumes that the variables to use for fingerprinting start with a common naming convention given by pref (e.g., “ctx”).
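A minimal sketch of the tabular case, assuming both dataframes hold the same participants in the same row order and that the variables are selected by prefix and compared with Pearson correlation. The function name tabular_similarity is hypothetical.

```python
import numpy as np
import pandas as pd


def tabular_similarity(data1, data2, pref):
    """Hypothetical sketch: correlate participants across two visits."""
    columns = [col for col in data1.columns if col.startswith(pref)]
    visit1 = data1[columns].to_numpy(dtype=float)
    visit2 = data2[columns].to_numpy(dtype=float)
    n_subjects = visit1.shape[0]
    # np.corrcoef stacks the rows of both inputs; the top-right block holds
    # the visit-1 by visit-2 correlations
    return np.corrcoef(visit1, visit2)[:n_subjects, n_subjects:]


demo_visit1 = pd.DataFrame(
    {"ctx_a": [1, 3], "ctx_b": [2, 1], "ctx_c": [3, 2], "age": [60, 70]},
    index=["s1", "s2"],
)
demo_visit2 = pd.DataFrame(
    {"ctx_a": [2, 6], "ctx_b": [4, 2], "ctx_c": [6, 4], "age": [62, 72]},
    index=["s1", "s2"],
)
demo_similarity = tabular_similarity(demo_visit1, demo_visit2, "ctx")
```

The age column is ignored because it does not start with the prefix.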

sihnpy.fingerprinting.tab_metrics_calc(data, similar_matrix, name)[source]

Function computing the different fingerprint metrics and storing them in a dataframe for export. Each metric is computed and stored in a numpy.array; these arrays are then used to populate the dataframe.

Parameters
  • data (pandas.DataFrame) – Either first or last visit of fingerprinting used. This is only used to grab the IDs of the participants and set them as index.

  • similar_matrix (numpy.array) – Similarity matrix from fingerprint_tabs function

  • name (str) – String to add to the variables. This is so the user can differentiate the different runs of the fingerprinting if multiple are used.

Returns

Returns a pandas.DataFrame containing 5 columns: the ID and each of the four metrics.

Return type

pandas.DataFrame

sihnpy.fingerprinting.tab_export(outpath, data1, data2, similar_matrix, fp_metrics, name)[source]

Simple wrapper function exporting the data for both visits of participants fingerprinted, the similarity matrix, the fingerprint metrics and the name given by the user.

sihnpy.fingerprinting._fia_calculator(similar_matrix)[source]

Internal function computing the fingerprint identification accuracy (number of correct identifications).

Parameters

similar_matrix (numpy.array) – Similarity matrix

Returns

Binary array for every participant included: a 1 indicates correct identification within the cohort and a 0 indicates incorrect identification.

Return type

numpy.array

sihnpy.fingerprinting._si_calculator(similar_matrix)[source]

Internal function computing the self-identifiability (within-individual correlation). This is defined as the diagonal (within-individual correlations) of the similarity matrix.

Parameters

similar_matrix (numpy.array) – Similarity matrix

Returns

Returns an array containing the self-identifiability.

Return type

numpy.array

sihnpy.fingerprinting._oi_calculator(similar_matrix)[source]

Internal function computing the others-identifiability (between-individual correlation). This is defined as the average of the off-diagonal elements (row-wise) of the similarity matrix.

Parameters

similar_matrix (numpy.array) – Similarity matrix

Returns

Returns an array containing the others-identifiability.

Return type

numpy.array

sihnpy.fingerprinting._identif_calculator(si_coef, oi_coef)[source]

Internal function computing the differential identifiability metric from Amico and Goni (2018). This is simply the difference between the diagonal and the average off-diagonal elements of the similarity matrix.

Parameters
  • si_coef (numpy.array) – Array containing the self-identifiability.

  • oi_coef (numpy.array) – Array containing the others-identifiability.

Returns

Returns an array containing the differential identifiability.

Return type

numpy.array