Optimize this line of code:
B = sum(gradient(corrcoef(A)).^2);
for a matrix A with size(A,2)>>size(A,1)
Description:
Analyses of large datasets often require some level of optimization for speed and/or memory usage. This is an example problem where both the initial data A and final measure of interest B fit perfectly well in memory, but the intermediate variables (in this case an impossibly large correlation matrix) required to compute the final measure of interest do not.
We have a large 2-dimensional matrix A (with dimensions 100 x 100,000).
We need to compute the following row vector B (with dimensions 1 x 100,000):
B = sum(gradient(corrcoef(A)).^2);
This computes first the matrix of correlation coefficients for each pair of columns in A:
a = corrcoef(A)
(a 100,000 by 100,000 matrix), then computes the spatial derivative of the resulting correlation matrix along the second dimension:
b = gradient(a)
(another 100,000 by 100,000 matrix), and finally computes the squared norm of each column in the resulting matrix:
B = sum(b.^2,1)
(a 100,000 element vector)
This straight-forward "vectorized" approach, nevertheless, fails because it requires too much memory (enough to store a 100,000 x 100,000 correlation matrix, around 80Gb).
We clearly need some form of simplification/optimization. Can you compute the measure B within the time-limit of a Cody solution? (typically about 30 seconds)
Solutions will be scored based on computation time (score equal to total time in seconds).
Context: (not relevant to solving this problem)
This problem arises in the analyses of fMRI datasets. A typical result from a fMRI scanner session is a 4-dimensional matrix A(x,y,z,t), where the first three dimensions are spatial dimensions (a scanner of the subject's head/brain) and the fourth dimension is temporal (sequential scans obtained during a typical fMRI session). Think of these as time-varying three-dimensional pictures of your brain activation. A lot of research in the past few years has focused on functional connectivity, a measure of the temporal correlation between the "activation" of any pair of brain areas. Several recent papers have investigated the possibility to obtain entirely data-driven parcellations of the brain (partitioning the brain into functionally-homogeneous areas) based on these spatial patterns of functional connectivity. The measure B above represents one of the measures that have been suggested as a way to drive these data-driven parcellations (borders between two adjacent but functionally distinct brain areas are expected to show higher spatial gradients in functional connectivity profiles). For simplicity I have collapsed the three spatial dimensions into one for this problem, but the computational complexity of the original computation is approximately the same (a typical scanner session results in something of the order of several hundred thousands "voxels" -three dimensional "pixels"- within the brain, and a few hundred time-points; this makes computing the entire "voxel-to-voxel" correlation matrix, or measures derived from it, rather challenging).
HI Alfonso! There seems to be a problem in the test code. Says something like
"Error using evalin Undefined function or variable 'time_count'. Error in TestPoint4 (line 9) assignin('caller','time_count',[evalin('caller','time_count') t1]);"
from the 4th test on. Probably you have to store the time_count variable somewhere else...
Hi Alfonso! Tests 4, 5 and 6 in the test suite seem to be malfunctioning because of a problem with function 'evalin' and/or undefined variable 'time_count'. Could you please fix this?
sorry about this (the error was a side effect of Cody changing to using a more general unit testing framework some time back). I have fixed the testsuite now
671 Solvers
226 Solvers
Compute a dot product of two vectors x and y
601 Solvers
Generate a string like abbcccddddeeeee
149 Solvers
103 Solvers