# fast file-processing
a simple test project for trying to push the limits of fast reading & processing of large files in c++. [it's mainly about optimisation]
## how to compile
this project uses the meson build system. commands for compilation (in the root directory of the project):

```shell
meson setup <build_directory_name> --buildtype=release -Dcpp_args='/GL'
cd <build_directory_name> && meson compile
./fast_file_processing
```
## data-set
the program's purpose was to be run on a file containing 1 billion datasets with a total size of ~16GB. however, such a file is too large to upload to the git repository, so there's only a small example file with ~44'000 datasets/lines.
## performance
on my laptop, bottlenecked by the intel i7-9750 processor, the program finishes the large data-set within ~6 minutes on average. on the small data-set, it finishes within 58ms.
the most time-consuming operation by far is the `add_to_map` function, which adds and updates data in the `std::map`.
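the README only names `add_to_map` as the hot spot, so the following is a hypothetical sketch of what such an aggregation step might look like; the `Stats` struct, the key/value types, and the exact signature are assumptions, not the project's actual code.

```cpp
#include <map>
#include <string>
#include <string_view>

// Hypothetical per-key aggregate; the real project's value type is unknown.
struct Stats {
    long long count = 0;
    double sum = 0.0;
};

// Sketch of an add_to_map-style update. The transparent comparator
// (std::less<>) lets find() accept a string_view directly, so no
// temporary std::string is built for keys that already exist -- only
// a first-time insert pays for the allocation.
void add_to_map(std::map<std::string, Stats, std::less<>>& m,
                std::string_view key, double value) {
    auto it = m.find(key);
    if (it == m.end())
        it = m.emplace(std::string(key), Stats{}).first;
    it->second.count += 1;
    it->second.sum += value;
}
```

per update this still costs a tree walk (O(log n) node hops with poor cache locality), which is one plausible reason the map updates dominate the runtime.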
simply parsing the large data-set, without calling the `add_to_map` function, takes ~49 seconds. in this case, the program is bottlenecked by my ssd's read speed, sitting at ~330MB/s.
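a common way to get parsing down to ssd speed is to scan the raw buffer with `memchr` and convert numbers in place with `std::from_chars`, avoiding per-line string allocations. this is a sketch of that general technique, not the project's parser; the `name;value` field layout is an assumption.

```cpp
#include <charconv>  // std::from_chars (the double overload needs GCC 11+/MSVC)
#include <cstring>   // memchr
#include <string_view>

// Counts well-formed "name;value\n" records in a buffer without
// allocating: memchr locates delimiters, from_chars parses the number
// directly out of the mapped/loaded bytes.
int parse_buffer(std::string_view buf) {
    int records = 0;
    const char* p = buf.data();
    const char* end = p + buf.size();
    while (p < end) {
        const char* nl = static_cast<const char*>(memchr(p, '\n', end - p));
        if (!nl) nl = end;  // last line may lack a trailing newline
        const char* sep = static_cast<const char*>(memchr(p, ';', nl - p));
        if (sep) {
            double value = 0.0;
            auto [ptr, ec] = std::from_chars(sep + 1, nl, value);
            if (ec == std::errc{})
                ++records;
        }
        p = nl + 1;
    }
    return records;
}
```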
## notes on the implementation
this project is realised using memory-mapping functions provided by the Windows API. it isn't multithreaded and simply served as a playground for trying out different methods of file reading.
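the project's mapping presumably goes through the Windows-API pair `CreateFileMapping`/`MapViewOfFile`; as a portable illustration of the same idea, here is the equivalent using POSIX `mmap`. the line-counting workload is just an example, not taken from the project.

```cpp
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Maps a file read-only into the address space and counts newlines by
// scanning the mapped bytes directly -- no read() calls, no copy into a
// user buffer. Returns -1 on error.
long long count_lines_mapped(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }
    void* base = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (base == MAP_FAILED) return -1;
    const char* p = static_cast<const char*>(base);
    long long lines = 0;
    for (off_t i = 0; i < st.st_size; ++i)
        if (p[i] == '\n') ++lines;
    munmap(base, st.st_size);
    return lines;
}
```

the appeal of mapping for this workload is that the os pages data in on demand and the parser sees the file as one contiguous byte range, which pairs naturally with the `memchr`-style scanning above.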
since the primary goal was speed, some parts of the codebase are deliberately written to not be extensible or pretty, for the sake of performance.
## license
this project is released under the GNU Affero General Public License.