I have a huge fileList
array where many elements of array start from sub-**
pattern. I want to group these files belonging to same sub-**
(for e.g.: files which start with sub-01
) into one subject specific array. Thus after creating all these sub specific array they can be pushed to a large array nifti_fileList
.
I have written following code in JavaScript with very crude logic:
subFilelist = [];
var previous_sub = null;
var capture = null;
var file_no = 0
async.eachOfLimit(fileList, 200, function (file, key, cb) {
testfile = file.relativePath;
sub_capture_re = /sub-[a-zA-Z0-9]*/gi;
var current_sub = testfile.match(sub_capture_re)
if (current_sub) {
if((file_no === 0)){
subFilelist.push(testfile)
file_no = file_no + 1;
previous_sub = current_sub[0];
}
if ((current_sub[0] === previous_sub) && (file_no > 0)){
previous_sub = current_sub[0];
subFilelist.push(file.relativePath)
} else {
nifti_fileList.push(subFilelist)
subFilelist = [];
file_no =0
}
}
process.nextTick(cb);
});
console.log(nifti_fileList);
Sample input:
fileList = [/sub-09/func/sub-09_task-genInstrAv_run-05_physio.tsv.gz,
/sub-10/anat/sub-10_T1w.json,
/sub-10/anat/sub-10_T1w.nii.gz,
/sub-10/fmap/sub-10_magnitude1.nii.gz,
/sub-10/fmap/sub-10_magnitude2.nii.gz,
/sub-10/fmap/sub-10_phasediff.json,
/sub-10/fmap/sub-10_phasediff.nii.gz,
/sub-11/anat/sub-11_T1w.nii.gz]
nifti_fileList = [[/sub-09/func/sub-09_task-genInstrAv_run-05_physio.tsv.gz],
[/sub-10/anat/sub-10_T1w.json,
/sub-10/anat/sub-10_T1w.nii.gz,
/sub-10/fmap/sub-10_magnitude1.nii.gz,
/sub-10/fmap/sub-10_magnitude2.nii.gz,
/sub-10/fmap/sub-10_phasediff.json,
/sub-10/fmap/sub-10_phasediff.nii.gz],
[/sub-11/anat/sub-11_T1w.nii.gz]]
Can this be made more efficient?
1 Answer 1
Observations and Remarks
- Your code is pretty complex and hard to read, as it involves a
callback with a complex outside state encoded within the global
revious_sub
,capture
andfile_no
variables. Each time your callback is executed, it modifies some of those state variables = there are many side-effects, which makes things confusing. - The variable
capture
is unused. - Not all variables are declared and thus become global.
- Your sample input seems to be a list of strings, but within your
callback you access the path via
testfile = file.relativePath;
indicating that you deal with objects. - It doesn't seem necessary to me to perform a basic array operation without IO delays asynchronously. So I recommend a simpler synchronous function.
- Your resulting grouped file list is not very useful, as its groups
are unnamed. If we want to know the 'sub-*' prefix of a group, we
would have to manually inspect a random group element. An explicit
mapping from prefix or name to file would allow us to query e.g.
groups.get('sub-03')
in constant time. - Your current implementation requires a sorted input file list. By
using a
Map
, you can guarantee the same runtime complexity without having to rely on sorted input.
Suggested Implementation
Write a general purpose groupBy
helper function or use a library which already includes one. Such a function would group array elements by their label or name. The user supplies a callback which returns the label or name for each array element:
function groupBy(array, cp) {
return array.reduce((groups, next) => {
const name = cp(next);
if (name !== undefined) {
const group = groups.get(name);
if (group) group.push(next);
else groups.set(name, [next]);
}
return groups;
}, new Map());
}
For a given list of file paths, you could then simply write:
const files = [
'/sub-09/func/sub-09_task-genInstrAv_run-05_physio.tsv.gz',
'/sub-10/anat/sub-10_T1w.json',
'/sub-10/anat/sub-10_T1w.nii.gz',
'/sub-10/fmap/sub-10_magnitude1.nii.gz',
'/sub-10/fmap/sub-10_magnitude2.nii.gz',
'/sub-10/fmap/sub-10_phasediff.json',
'/sub-10/fmap/sub-10_phasediff.nii.gz',
'/sub-11/anat/sub-11_T1w.nii.gz'
];
const groups = groupBy(files, path => {
const match = path.match(/^\/(sub-\w+)\//);
if (match) return match[1];
});
If your file list is made up of objects instead of strings, you could write:
const groups = groupBy(files, file => {
const match = file.relativePath.match(/^\/(sub-\w+)\//);
if (match) return match[1];
});
The result is a Map { "sub-09" → [...], "sub-10" → [...], "sub-11" → [...] }
which we can query via e.g. groups.get('sub-09')
or transform into your array of arrays via e.g. [...groups.values()]
.
-
\$\begingroup\$ How should i access groups elements then..? \$\endgroup\$learnningprogramming– learnningprogramming2017年12月08日 20:08:03 +00:00Commented Dec 8, 2017 at 20:08
-
\$\begingroup\$ @learnningprogramming You can access the first file in a group by e.g.
groups.get('sub-10')[0]
. To print all files, use e.g.console.log(...groups.get('sub-10'))
. To iterate through all files in a group, use e.g.for (const file of groups.get('sub-10')) { ... }
. To iterate all groups, use e.g.for (const [name, files] of groups.entries()) { ... }
. \$\endgroup\$le_m– le_m2017年12月08日 20:22:02 +00:00Commented Dec 8, 2017 at 20:22 -
\$\begingroup\$ the solution you provided works well however, as per ES6
return array.reduce((groups, next) => { const name = cp(next);
gives lint error as follows:Parsing error: Parenthesized pattern
\$\endgroup\$learnningprogramming– learnningprogramming2017年12月08日 23:41:01 +00:00Commented Dec 8, 2017 at 23:41 -
\$\begingroup\$ @learnningprogramming debug, update or change your linter then? \$\endgroup\$le_m– le_m2017年12月08日 23:49:22 +00:00Commented Dec 8, 2017 at 23:49
-
\$\begingroup\$ I am using npm linter>> this code is for another huge chunk of software already written. dont want to break it. \$\endgroup\$learnningprogramming– learnningprogramming2017年12月09日 00:00:29 +00:00Commented Dec 9, 2017 at 0:00
Explore related questions
See similar questions with these tags.