What am I trying to do
I'm trying to scan a disk and save (cache) all the paths of files and directories into an array so that I could perform a quick file/directory search later on.
Implementation
I'm using a Node.js module called walkdir.
cacheDisk () {
  let searchDir = 'C:/'
  let cache = []
  let options = {
    "max_depth": 5, // only recurse down to max_depth. if you need more than no_recurse
    "track_inodes": true // should be used with max_depth to prevent infinite loop. On windows or with hardlinks some files are not emitted due to inode collision.
  }
  let paths = walkdir(searchDir, options, (path) => {
    cache.push(path)
  })
  .on('end', () => {
    console.log(cache)
    this.findItem(cache)
  })
},
findItem (cache) {
  let filename = 'file_1.txt'
  const matches = cache.filter(element => element.includes(filename))
  console.log(matches)
}
Questions / Problems:
FYI: I only need it to work in the latest Chrome (it's an Electron app)
- Is using cache.push(path) a good way (performance-wise) to save paths to the array or is the module already doing it for me and I'm just repeating this action on every iteration? It takes about 30 seconds and 500-800 MB of RAM on a core i7 to finish that caching operation for my ssd with 1 million paths.
- Is using cache.filter(element => element.includes(filename)) a good / safe way to find an element in an array that can potentially be ~10,000,000 paths in length?
- When I'm trying to scan disk's root (e.g. C:/) I get 2 slashes after the disk letter in every path (e.g. C:\\Program Files\test). But when I'm scanning something like C:/test/ everything is fine. I guess that's Node's doing? How do I avoid that? (do I just replace all \\ with \ in all the paths afterwards?)
1 Answer
I have to ask why you would want to find a file anywhere on the filesystem. That seems like a pretty odd requirement. Most apps (take node module loading for example) check a limited number of folders which can be done very quickly.
To answer your questions:
- The OS does not do this kind of caching.
- No, I wouldn't consider this safe, because many other names could match, for example video_file_1.txt, file_1.txt.old or some_path/file_1.txt/cat.jpg. I would check against the basename instead, something like:
  function basename(path) { return path.match(/[^\\\/]+$/)[0]; }
  const matches = cache.filter(element => basename(element) === filename)
- Someone already answered this in the comments. It is really just an artifact of attempting to handle Windows and Unix paths consistently.
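As a rough illustration of the last two points, here is a minimal sketch that uses Node's built-in path module instead of a hand-written regex. The sample paths are made up, and path.win32 is used explicitly only so the result does not depend on the OS running the snippet. Whether the doubled separator produced by walkdir looks exactly like this is an assumption.

```javascript
const path = require('path');

// Made-up sample paths; the first one has a doubled separator after the drive letter.
const cache = [
  'C:\\\\docs\\file_1.txt',
  'C:\\docs\\video_file_1.txt',
  'C:\\docs\\file_1.txt.old',
  'C:\\docs\\file_1.txt\\cat.jpg',
];

// normalize() collapses repeated separators into a single one.
const normalized = cache.map((p) => path.win32.normalize(p));

// Substring matching accepts every sample entry.
const loose = normalized.filter((p) => p.includes('file_1.txt'));

// Comparing only the last path segment keeps the exact file name.
const exact = normalized.filter((p) => path.win32.basename(p) === 'file_1.txt');

console.log(loose.length, exact);
```

Normalizing each path once while caching is likely cheaper than a replace pass over the whole array afterwards.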
Also, if retrieval efficiency is important, I would store the data as a hash rather than an array, i.e.:
let cache = {};
let paths = walkdir(searchDir, options, (path) => {
  // group full paths under their file name
  const name = basename(path);
  if (!cache[name])
    cache[name] = [];
  cache[name].push(path)
});
Then finding an element is a simple matter of:
findItem (cache) {
  let filename = 'file_1.txt';
  const matches = cache[filename];
  console.log(matches);
}
Note that, since you are on Windows, you should also consider case-insensitivity of filenames.
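One possible way to combine the lookup table with case-insensitive matching is sketched below. buildIndex and findItem are hypothetical helpers, the sample paths are invented, and lower-casing the key is only an approximation of how Windows compares file names.

```javascript
const path = require('path');

// Hypothetical helper: index full paths by lower-cased file name.
function buildIndex(paths) {
  const index = new Map();
  for (const p of paths) {
    const key = path.win32.basename(p).toLowerCase();
    const bucket = index.get(key);
    if (bucket) bucket.push(p);
    else index.set(key, [p]);
  }
  return index;
}

// Hypothetical helper: returns an empty array when nothing matches.
function findItem(index, filename) {
  return index.get(filename.toLowerCase()) || [];
}

const index = buildIndex([
  'C:\\docs\\File_1.TXT',
  'C:\\backup\\file_1.txt',
  'C:\\docs\\constructor',
]);

console.log(findItem(index, 'file_1.txt'));
console.log(findItem(index, 'missing.txt'));
```

A Map is used here rather than a plain object because a file named constructor would collide with the property that {} inherits from Object.prototype.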
- Thank you for the detailed answer. I'm trying to implement global search in my app and I need it to cache all the mounted disks entirely so that any user file on any disk could be found instantly. Good idea to use an object instead of an array. As for the exact filename matching problem, I think I'm going to have to implement some sort of "fuzzy" search instead of using filter. – Un1, May 25, 2018 at 21:59
- If that is what you want I would look into integrating with Windows search service. – Marc Rohloff, May 26, 2018 at 4:50
- I need a cross platform search, plus Windows search never finds anything. I think I'm gonna stick to my custom caching + fuzzy search, it only takes 30 seconds to cache a 256GB ssd anyway. – Un1, May 26, 2018 at 10:37
- Then I would certainly look into using promises, workers or another asynchronous technique so that your users don't wait 30 seconds for interactivity. – Marc Rohloff, May 26, 2018 at 18:08
- Sure thing, thankfully node.js provides both sync and async functions for pretty much everything. – Un1, May 26, 2018 at 18:32
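For the promise-based approach mentioned in the comments, a minimal sketch might look like the following. It does not use walkdir; it relies only on Node's built-in fs.promises API, and walk and demo are hypothetical names. Each directory read is awaited, so other events can be handled between reads, although the remaining work still runs on the main thread.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: collect paths under `dir`, at most `maxDepth` levels deep.
async function walk(dir, maxDepth, out = []) {
  let entries;
  try {
    entries = await fs.promises.readdir(dir, { withFileTypes: true });
  } catch (err) {
    return out; // unreadable directory (e.g. permission denied): skip it
  }
  for (const entry of entries) {
    const full = path.join(dir, entry.name);
    out.push(full);
    // Symbolic links are not reported as directories here, so they are not followed.
    if (entry.isDirectory() && maxDepth > 1) {
      await walk(full, maxDepth - 1, out);
    }
  }
  return out;
}

// Usage: scan a small temporary tree instead of a whole disk.
async function demo() {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), 'walk-'));
  fs.mkdirSync(path.join(root, 'a'));
  fs.writeFileSync(path.join(root, 'a', 'file_1.txt'), '');
  const paths = await walk(root, 5);
  console.log(paths);
  return paths;
}

const done = demo();
```

Moving the scan into a worker thread would be a further step if building the cache itself turns out to make the UI stutter.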
C:/test it adds 1 slash. But when I search disk root directory it adds 2 slashes after the disk letter, even if I don't specify slash, e.g. C:. (yes, node.js automatically uses needed slash)