tools/fitdata.cpp File Reference
Gather statistics from dataset for MPLSH tuning.
#include <cstdlib>
#include <gsl/gsl_multifit.h>
#include <boost/program_options.hpp>
#include <boost/progress.hpp>
#include <lshkit.h>
Namespaces
namespace tr1
Functions
bool is_good_value (double v)
int main (int argc, char *argv[])
Detailed Description
This program gathers statistical data from a small sample dataset for automatic MPLSH parameter tuning. It carries out the following steps:
- Sample N points from the dataset. Only those N points will be used for future computation.
- Sample P pairs of points from the sample and calculate the distance for each pair.
- Sample Q points from the sample as query points.
- Divide the sample into F folds.
- For i = 1 to F, take i folds and run K-NN search, so the query points are searched against sample datasets of N/F, 2N/F, ..., N points.
The statistical data is printed to standard output after the progress display.
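The sampling steps can be sketched in plain C++ as follows. This is a minimal illustration, not the code in fitdata.cpp: the names `Point`, `FitSample`, `l2` and `gather` are hypothetical, the search is brute force, and several details are assumptions (Euclidean distance, N = 0 meaning "use all points", folds taken as contiguous slices of the shuffled sample, and each query point excluded from its own result list).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Point = std::vector<float>;  // hypothetical point type

// Euclidean distance (assumption: the real tool may use another metric).
double l2(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = double(a[i]) - double(b[i]);
        sum += d * d;
    }
    return std::sqrt(sum);
}

struct FitSample {                    // hypothetical result container
    std::vector<double> pair_dist;    // distances of the P sampled pairs
    std::vector<std::size_t> sizes;   // points searched at each step
    // knn[i][q]: ascending K-NN distances of query q within the first i+1 folds
    std::vector<std::vector<std::vector<double>>> knn;
};

FitSample gather(std::vector<Point> data, std::size_t N, std::size_t P,
                 std::size_t Q, std::size_t K, std::size_t F,
                 unsigned seed = 0) {
    assert(F > 0);
    std::mt19937 rng(seed);

    // Step 1: keep N random points (assumption: N == 0 keeps everything).
    std::shuffle(data.begin(), data.end(), rng);
    if (N > 0 && N < data.size()) data.resize(N);
    N = data.size();
    assert(N >= 2);

    FitSample out;
    std::uniform_int_distribution<std::size_t> pick(0, N - 1);

    // Step 2: P random pairs of distinct points and their distances.
    while (out.pair_dist.size() < P) {
        const std::size_t a = pick(rng), b = pick(rng);
        if (a != b) out.pair_dist.push_back(l2(data[a], data[b]));
    }

    // Step 3: Q query points, stored as indices into the sample.
    std::vector<std::size_t> queries(Q);
    for (auto& q : queries) q = pick(rng);

    // Steps 4 and 5: search the first i folds, i.e. the first N*i/F points.
    for (std::size_t i = 1; i <= F; ++i) {
        const std::size_t n = N * i / F;
        out.sizes.push_back(n);
        std::vector<std::vector<double>> per_query;
        for (std::size_t q : queries) {
            std::vector<double> d;
            for (std::size_t j = 0; j < n; ++j)
                if (j != q) d.push_back(l2(data[q], data[j]));
            const std::size_t k = std::min(K, d.size());
            std::partial_sort(d.begin(),
                              d.begin() + static_cast<std::ptrdiff_t>(k),
                              d.end());
            d.resize(k);  // keep only the k smallest distances
            per_query.push_back(std::move(d));
        }
        out.knn.push_back(std::move(per_query));
    }
    return out;
}
```

Calling `gather(data, 50, 20, 5, 3, 5)` on a dataset of at least 50 points would, under these assumptions, search the queries against 10, 20, 30, 40 and 50 points in turn. Comparing the K-NN distances across those sizes shows how they shrink as the dataset grows, which is the kind of trend a tuning step would need.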
Allowed options:
-h [ --help ] produce help message.
-N [ -- ] arg (=0) number of points to use
-P [ -- ] arg (=50000) number of pairs to sample
-Q [ -- ] arg (=1000) number of queries to sample
-K [ -- ] arg (=100) search for K nearest neighbors
-F [ -- ] arg (=10) divide the sample to F folds
-D [ --data ] arg data file