
Partitioning subjects based on high-dimensional fMRI data

Goal

My first ‘first-author’ paper has been accepted for publication (Yaaaay!). In this research, my PhD supervisor Tom Wilderjans and I proposed a relatively simple ‘two-step’ procedure for clustering subjects based on (big) functional Magnetic Resonance Imaging (fMRI) data.

In this two-step procedure we:

  • (step 1) first apply a data reduction technique known as Independent Component Analysis (ICA) to each subject’s measured fMRI data, in order to estimate the brain connectivity associated with that person’s functioning. Besides estimating relevant brain connectivity, this step greatly reduces the size of each subject’s dataset, which makes it computationally manageable.

  • (step 2) then compute a similarity measure between the estimated sets of brain networks for every pair of subjects. This quantifies to what extent the set of brain networks of subject 1 is similar (or dissimilar) to that of subject 2. After computing the similarities for all subject pairs, we apply a clustering procedure to the resulting similarity data. We investigated the performance of several well-known clustering procedures; one example is given below.
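To make the two steps more concrete, here is a minimal Python sketch of the pipeline on toy data. It is not the R code that accompanies the paper. It assumes scikit-learn’s `FastICA` for spatial ICA in step 1 and uses the RV coefficient as the similarity measure in step 2; the helper names (`reduce_subject`, `rv_coefficient`, `similarity_matrix`) and the toy data sizes are hypothetical. The RV coefficient is a convenient choice for this illustration because it compares the spaces spanned by two sets of component maps, so it ignores the arbitrary order and sign of ICA components.

```python
import numpy as np
from sklearn.decomposition import FastICA


def reduce_subject(data, n_components, seed=0):
    """Step 1 (sketch): spatial ICA on one subject's data.

    `data` has shape (n_timepoints, n_voxels). Voxels are treated as samples,
    so the returned sources are spatial maps of shape (n_voxels, n_components).
    """
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    return ica.fit_transform(data.T)


def rv_coefficient(a, b):
    """RV coefficient between two (n_voxels, k) sets of component maps."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # tr(AA'BB') equals ||A'B||_F^2, so no voxel-by-voxel matrix is needed
    num = np.linalg.norm(a.T @ b, "fro") ** 2
    den = np.linalg.norm(a.T @ a, "fro") * np.linalg.norm(b.T @ b, "fro")
    return num / den


def similarity_matrix(component_sets):
    """Step 2 (sketch): RV similarity for every pair of subjects."""
    n = len(component_sets)
    sim = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = rv_coefficient(component_sets[i], component_sets[j])
    return sim


# Hypothetical toy data: 4 subjects; the first two share one set of spatial
# maps, the last two share another, and every subject has its own time courses.
rng = np.random.default_rng(0)
n_time, n_vox, k = 60, 500, 3
group_maps = [rng.laplace(size=(n_vox, k)) for _ in range(2)]
subjects = []
for g in (0, 0, 1, 1):
    time_courses = rng.standard_normal((n_time, k))
    noise = 0.5 * rng.standard_normal((n_time, n_vox))
    subjects.append(time_courses @ group_maps[g].T + noise)

maps = [reduce_subject(x, n_components=k) for x in subjects]
sim = similarity_matrix(maps)
print(np.round(sim, 2))
```

The printed matrix should show high similarity within the two toy pairs and low similarity across them; a matrix like this is what the clustering procedure then works on.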

By applying this two-step approach, researchers can automatically allocate subjects to groups. The procedure is entirely data-driven, so we ‘let the data speak for themselves’. Members of group A, for example, share more similarities in brain functioning with their fellow group members than with the subjects allocated to group B. By exploring these clusters, researchers may discover valuable information about brain dysfunctions.

Example

Below you can play around with an interactive dendrogram that displays the result of applying our two-step procedure to either (1) an easy simulated dataset or (2) a difficult simulated dataset. For both datasets we generated artificial fMRI data for a total of 60 subjects and made sure that 4 clusters are present in the data. Remember that subjects who belong to the same cluster are more similar to each other in brain functioning than to subjects who belong to another cluster.

To make the cluster memberships easy to spot, I added symbols to the graph that denote the original (true) membership: 5 subjects are indicated with a square, 10 with a circle, 15 with a triangle and 25 with a diamond.
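The clustering behind a dendrogram like this one can be sketched with SciPy’s hierarchical clustering. This is an illustration under assumptions, not the code used for the figure: the similarity matrix is synthetic (60 hypothetical subjects in four equally sized clusters, not the group sizes above), similarities are turned into dissimilarities as `1 - similarity`, and the helper `ward_clusters` is hypothetical. Ward’s method is applied to the condensed dissimilarity matrix, and the tree is cut into four clusters with `fcluster`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def ward_clusters(sim, n_clusters):
    """Ward clustering of subjects from a similarity matrix with values in [0, 1]."""
    dist = 1.0 - (sim + sim.T) / 2.0   # symmetric dissimilarities
    np.fill_diagonal(dist, 0.0)        # squareform expects a zero diagonal
    tree = linkage(squareform(dist), method="ward")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return tree, labels


# Synthetic similarities for 60 hypothetical subjects in 4 clusters of 15:
# high within a cluster, low between clusters, plus a little noise.
rng = np.random.default_rng(1)
true_labels = np.repeat(np.arange(1, 5), 15)
same = true_labels[:, None] == true_labels[None, :]
sim = np.where(same, 0.8, 0.3) + 0.05 * rng.standard_normal((60, 60))
sim = np.clip((sim + sim.T) / 2.0, 0.0, 1.0)
np.fill_diagonal(sim, 1.0)

tree, labels = ward_clusters(sim, n_clusters=4)
print(labels)
```

If matplotlib is installed, `scipy.cluster.hierarchy.dendrogram(tree)` draws the corresponding tree. Strictly speaking, SciPy documents Ward linkage as correctly defined only for Euclidean distances, so applying it to `1 - similarity` is a pragmatic choice.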

Results

For the easy dataset, you can clearly see that our procedure correctly recovers the simulated clustering: the subjects of each of the four symbols that denote the true clustering are grouped together on a separate branch of the dendrogram (easy to differentiate by colour).

For the difficult dataset, the method has more trouble recovering the true cluster structure. For example, the dendrogram of the difficult dataset using Ward’s method shows that subject number 9 (originally belonging to the circle group) is allocated to a cluster together with 24 subjects from the diamond group. Also note that the far-right cluster, indicated by the purple branch, is a rather mixed group containing circles, triangles and one diamond subject. This is to be expected, since we generated the data in such a way that it is very difficult to discriminate between subjects.

However, when this difficult dataset is clustered without our two-step procedure (select the Without two-step procedure dataset option), the cluster allocation is worse, and no sensible interpretation can be given to the results because no brain connectivity is estimated with ICA (i.e., the first step of our approach). In other words, our two-step procedure picks up as much signal (or, conversely, filters out as much noise) from the data as possible in order to achieve a good clustering result.
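How well an estimated clustering matches the simulated one can also be summarised in a single number. The comparison above is visual; one common option, assumed here, is the adjusted Rand index (ARI), which is 1 for identical partitions and close to 0 for chance-level agreement. The pure-Python sketch below computes it from the contingency table of two labelings. The function name and the toy labels are hypothetical; the labels mimic the difficult-dataset situation of a single circle subject landing in the diamond cluster.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two partitions of the same subjects."""
    n = len(true_labels)
    pairs = Counter(zip(true_labels, pred_labels))   # contingency table cells
    rows = Counter(true_labels)                      # true cluster sizes
    cols = Counter(pred_labels)                      # estimated cluster sizes
    index = sum(comb(c, 2) for c in pairs.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:   # degenerate case, e.g. both partitions trivial
        return 1.0
    return (index - expected) / (max_index - expected)


truth = ["square"] * 3 + ["circle"] * 3 + ["diamond"] * 4
perfect = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
one_off = [1, 1, 1, 2, 2, 3, 3, 3, 3, 3]   # one circle lands with the diamonds

print(adjusted_rand_index(truth, perfect))            # 1.0
print(round(adjusted_rand_index(truth, one_off), 3))  # 0.676
```

Because the ARI only looks at which subjects end up together, it does not matter that the estimated cluster numbers are arbitrary.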

For more details and an empirical example of our two-step procedure applied to patients with Alzheimer’s disease and frontotemporal dementia, please read my paper :). The article is open access and therefore free for everyone to read. Easy-to-use R code for the two-step procedure can be found on my GitHub page, and a tutorial pdf on my website (see Code and software tutorials).