In order to store attachments, a /path/to/atts/
directory will have numerous child-directories (product IDs) created (from 1 to ~10,000 or maybe more in the future), and in each of this subdir, 1 to ~10 attachment files will be created.
In /path/to/atts/
1
├── file1.1
├── file1.2
└── file1.3
2
└── file2.1
...
10000
├── file10000.1
├── file10000.2
├── file10000.3
├── file10000.4
└── file10000.5
(actually 1 .. 10000 was chosen for the sake of a simpler explanation - IDs will be int32 numbers)
I'm wondering, on the ext4 file system, what is the cd
(actually path resolution) complexity, when reaching /path/to/atts/54321/...
for instance:
Does the path resolution checks all inode / names one by one in the
atts
dir until54321
is reached? Meaning on average n/2 inodes are checked (O(n))Or is there some tree structure within a dir that reduces the search (e.g. a trie tree, alphabetical order...), that would reduce dramatically the number of inodes checked, like log(n) instead of n/2?
If it is the former, I'll change the way the products tree structure is implemented.
Just to be clear: the question is not about a find
search of a file in a file system tree (that's O(n)). It's actually a path resolution (done by the FS), crossing a directory where thousands of file names reside (the product IDs).
1 Answer 1
You can read about the hash tree index used for directories here.
A linear array of directory entries isn't great for performance, so a new feature was added to ext3 to provide a faster (but peculiar) balanced tree keyed off a hash of the directory entry name.