This package can retrieve, transform and process text documents.
It can take one or more text documents, optionally read from files, and retrieve their contents for several types of processing that transform the documents. Currently it can:
- Extract text features
- Normalize text
- Evaluate text similarity
- Perform statistical calculations
- Split the text into tokens
- Generate stems from the words in the text
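As a rough illustration of what tokenizing, normalizing, stemming, and similarity scoring involve, here is a minimal, self-contained sketch in Python. It is not Basset's API: the function names (`tokenize`, `crude_stem`, `term_frequencies`, `cosine_similarity`) are hypothetical, and the suffix-stripping rule is only a toy stand-in for a real stemmer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def crude_stem(token):
    """Strip a few common English suffixes (toy rule, not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def term_frequencies(text):
    """Map each stemmed token to the number of times it occurs."""
    return Counter(crude_stem(tok) for tok in tokenize(text))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Calling `cosine_similarity(term_frequencies(doc_a), term_frequencies(doc_b))` gives a score between 0 and 1 for two raw strings. Documents that share no stems score 0.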
Basset is a full-text PHP Information Retrieval library. It is a collection of developments from the field of IR, ported to PHP for research purposes.
Basset provides different ways of searching through documents in a collection (ad-hoc retrieval) by applying advanced and experimental IR algorithms and techniques gathered from various research studies and conferences, most notably:
You can read about it here
The Cranfield Collection was the pioneering test collection in information retrieval for validating a system's effectiveness.
I've included the 1,400-abstract Cranfield Collection as an XML file that you can parse into separate files.
The test file at tests/sample.php can be executed right away to do the parsing and run a search for a single test query. Customize it as needed.
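To give a sense of what a search for a single query involves, the following is a minimal Okapi BM25 ranking sketch in Python. It is independent of Basset's PHP classes and of tests/sample.php. The function name `bm25_rank` is hypothetical, `k1=1.2` and `b=0.75` are conventional defaults assumed here rather than values taken from the package, and the IDF uses the non-negative `log(1 + ...)` variant.

```python
import math
from collections import Counter


def bm25_rank(query_tokens, docs, k1=1.2, b=0.75):
    """Rank tokenized documents for a tokenized query with Okapi BM25.

    docs maps a document id to its list of tokens. Returns (doc_id, score)
    pairs sorted by descending score.
    """
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(toks) for toks in docs.values()) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for toks in docs.values():
        df.update(set(toks))

    scores = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue
            # The "1 +" inside the log keeps the IDF non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Documents that contain none of the query terms keep a score of 0 and sort to the end of the list.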
See Cranfield/cranfield-collection/cranqrel for Glasgow's qrels (relevance judgments).
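A qrels file is typically used to score a ranked result list. The sketch below, in Python and independent of Basset, assumes each line holds whitespace-separated fields beginning with a query id and a document id. Check cranqrel.readme for the actual layout and the meaning of the relevance codes. As a simplification, it treats every listed pair as relevant and ignores any graded relevance value. The names `parse_qrels` and `precision_at_k` are hypothetical.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Group judged document ids by query id.

    Assumes each non-empty line holds at least two whitespace-separated
    fields: query id, then document id. Any further fields are ignored.
    """
    relevant = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        relevant[fields[0]].add(fields[1])
    return relevant


def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are judged relevant."""
    if k <= 0:
        return 0.0
    top_k = ranked_doc_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k
```

With a file on disk, `parse_qrels(open(path))` works directly, since a file object iterates over its lines.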
I've also included the SMART system's stopword list for standardization (see stopwords/stopwords.txt).
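Stopword removal itself is a simple set lookup. The Python sketch below is not part of Basset and assumes the list is stored one word per line, which should be checked against the actual file. `load_stopwords` and `remove_stopwords` are hypothetical names.

```python
def load_stopwords(lines):
    """Build a stopword set from an iterable of lines, one word per line."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stopword set."""
    return [tok for tok in tokens if tok.lower() not in stopwords]
```

Passing an open file handle for stopwords/stopwords.txt to `load_stopwords` builds the set once, and it can then be reused for every document and query.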
File in package root | Role | Description
---|---|---
config/ | Folder | 1 file
Cranfield/ | Folder | 1 file, 1 directory
src/ | Folder | 1 directory
stopwords/ | Folder | 1 file
tests/ | Folder | 3 files, 1 directory
.travis.yml | Data | Auxiliary data
autoload.php | Aux. | Auxiliary script
composer.json | Data | Auxiliary data
LICENSE | Lic. | License text
README.markdown | Doc. | Documentation
File in config/ | Role | Description
---|---|---
config.ini | Data | Auxiliary data
File in Cranfield/ | Role | Description
---|---|---
cranfield-collection/ | Folder | 4 files
cranfield_parser.php | Class | Class source

File in Cranfield/cranfield-collection/ | Role | Description
---|---|---
cran.all.1400.xml-format.xml | Data | Auxiliary data
cran.qry.xml-format.xml | Data | Auxiliary data
cranqrel | Data | Auxiliary data
cranqrel.readme | Doc. | Documentation
File in src/ | Role | Description
---|---|---
Basset/ | Folder | 16 directories

File in src/Basset/ | Role | Description
---|---|---
Collections/ | Folder | 2 files
Documents/ | Folder | 3 files
Expansion/ | Folder | 14 files
Feature/ | Folder | 3 files
Index/ | Folder | 6 files
Math/ | Folder | 1 file
MetaData/ | Folder | 1 file
Metric/ | Folder | 31 files
Models/ | Folder | 34 files, 7 directories
Normalizers/ | Folder | 3 files
Results/ | Folder | 2 files
Search/ | Folder | 1 file
Statistics/ | Folder | 3 files
Stemmers/ | Folder | 3 files
Tokenizers/ | Folder | 3 files
Utils/ | Folder | 4 files
File in src/Basset/Collections/ | Role | Description
---|---|---
CollectionInterface.php | Class | Class source
CollectionSet.php | Class | Class source

File in src/Basset/Documents/ | Role | Description
---|---|---
Document.php | Class | Class source
DocumentInterface.php | Class | Class source
TokensDocument.php | Class | Class source

File in src/Basset/Expansion/ | Role | Description
---|---|---
CauchyDE.php | Class | Class source
CauchyDE.php | Class | Class source
DifferentialEvolution.php | Class | Class source
Feedback.php | Class | Class source
GeneticAlgorithm.php | Class | Class source
IdeDecHi.php | Class | Class source
IdeRegular.php | Class | Class source
PRFEAVSMInterface.php | Class | Class source
PRFInterface.php | Class | Class source
PRFVSMInterface.php | Class | Class source
RelevanceModel.php | Class | Class source
Rocchio.php | Class | Class source
SelfAdaptiveDE.php | Class | Class source
SelfAdaptiveDE.php | Class | Class source

File in src/Basset/Feature/ | Role | Description
---|---|---
FeatureExtraction.php | Class | Class source
FeatureInterface.php | Class | Class source
FeatureVector.php | Class | Class source

File in src/Basset/Index/ | Role | Description
---|---|---
Index.php | Class | Class source
IndexEntry.php | Class | Class source
IndexInterface.php | Class | Class source
IndexManager.php | Class | Class source
IndexReader.php | Class | Class source
IndexWriter.php | Class | Class source

File in src/Basset/Math/ | Role | Description
---|---|---
Math.php | Class | Class source

File in src/Basset/MetaData/ | Role | Description
---|---|---
MetaData.php | Class | Class source
File in src/Basset/Metric/ | Role | Description
---|---|---
BrayCurtisDistance.php | Class | Class source
CanberraDistance.php | Class | Class source
ChebyshevDistance.php | Class | Class source
ChiSquareDistance.php | Class | Class source
CosineSimilarity.php | Class | Class source
CzekanowskiSimilarity.php | Class | Class source
DiceSimilarity.php | Class | Class source
DistanceInterface.php | Class | Class source
EuclideanDistance.php | Class | Class source
HellingerDistance.php | Class | Class source
JaccardIndex.php | Class | Class source
JSDivergence.php | Class | Class source
KLDivergence.php | Class | Class source
KulczynskiDistance.php | Class | Class source
LorentzianDistance.php | Class | Class source
MatusitaDistance.php | Class | Class source
Metric.php | Class | Class source
MetricInterface.php | Class | Class source
MotykaSimilarity.php | Class | Class source
OverlapCoefficient.php | Class | Class source
RenyiDivergence.php | Class | Class source
RuzickaSimilarity.php | Class | Class source
SimilarityInterface.php | Class | Class source
SoergleDistance.php | Class | Class source
SqrtCosineSimilarity.php | Class | Class source
StamatatosDistance.php | Class | Class source
test.php | Class | Class source
TriangleSectorSimilarity.php | Class | Class source
TverskyIndex.php | Class | Class source
VectorSimilarity.php | Class | Class source
VSMInterface.php | Class | Class source
File in src/Basset/Models/ | Role | Description
---|---|---
Contracts/ | Folder | 5 files
DFIModels/ | Folder | 5 files
DFRAfterEffect/ | Folder | 4 files
DFRModels/ | Folder | 8 files
IBDistribution/ | Folder | 3 files
IBLambda/ | Folder | 4 files
Normalization/ | Folder | 11 files
AbsoluteDiscountingLM.php | Class | Class source
AtireBM25.php | Class | Class source
BaseIdf.php | Class | Class source
BM25.php | Class | Class source
BM25L.php | Class | Class source
BM25Plus.php | Class | Class source
BSDS.php | Class | Class source
DFIModel.php | Class | Class source
DFRModel.php | Class | Class source
DirichletLM.php | Class | Class source
DirichletSPUD.php | Class | Class source
HiemstraLM.php | Class | Class source
IBModel.php | Class | Class source
Idf.php | Class | Class source
IdfDFR.php | Class | Class source
IdfOkapi.php | Class | Class source
IdfSparckRobertson.php | Class | Class source
IRRA12.php | Class | Class source
JelinekMercerLM.php | Class | Class source
JelinekMercerSPUD.php | Class | Class source
LemurTfIdf.php | Class | Class source
ModBM25.php | Class | Class source
PivotedConcaveTF.php | Class | Class source
PivotedConcaveTFIDF.php | Class | Class source
PivotedTfIdf.php | Class | Class source
TermCount.php | Class | Class source
TermFrequency.php | Class | Class source
TfConcaveK.php | Class | Class source
TfConcaveLog.php | Class | Class source
TfIdf.php | Class | Class source
TfRobertson.php | Class | Class source
TwoStageLM.php | Class | Class source
WeightedModel.php | Class | Class source
XSqrAM.php | Class | Class source
File in src/Basset/Models/Contracts/ | Role | Description
---|---|---
IDFInterface.php | Class | Class source
LanguageModelInterface.php | Class | Class source
ProbabilisticModelInterface.php | Class | Class source
TFInterface.php | Class | Class source
WeightedModelInterface.php | Class | Class source

File in src/Basset/Models/DFIModels/ | Role | Description
---|---|---
ChiSquared.php | Class | Class source
DFIInterface.php | Class | Class source
DFIModel.php | Class | Class source
Saturated.php | Class | Class source
Standardized.php | Class | Class source

File in src/Basset/Models/DFRAfterEffect/ | Role | Description
---|---|---
AfterEffect.php | Class | Class source
AfterEffectInterface.php | Class | Class source
B.php | Class | Class source
L.php | Class | Class source

File in src/Basset/Models/DFRModels/ | Role | Description
---|---|---
BasicModel.php | Class | Class source
BasicModelInterface.php | Class | Class source
BE.php | Class | Class source
G.php | Class | Class source
In.php | Class | Class source
InExp.php | Class | Class source
InFreq.php | Class | Class source
P.php | Class | Class source

File in src/Basset/Models/IBDistribution/ | Role | Description
---|---|---
IBDistributionInterface.php | Class | Class source
LLDistribution.php | Class | Class source
SPLDistribution.php | Class | Class source

File in src/Basset/Models/IBLambda/ | Role | Description
---|---|---
IBLambdaInterface.php | Class | Class source
Lambda.php | Class | Class source
LambdaDF.php | Class | Class source
LambdaTTF.php | Class | Class source

File in src/Basset/Models/Normalization/ | Role | Description
---|---|---
Normalization.php | Class | Class source
NormalizationBM25.php | Class | Class source
NormalizationDP.php | Class | Class source
NormalizationF.php | Class | Class source
NormalizationH1.php | Class | Class source
NormalizationH2.php | Class | Class source
NormalizationH2E.php | Class | Class source
NormalizationInterface.php | Class | Class source
NormalizationJMDF.php | Class | Class source
NormalizationJMTF.php | Class | Class source
NormalizationP.php | Class | Class source
File in src/Basset/Normalizers/ | Role | Description
---|---|---
English.php | Class | Class source
Normalizer.php | Class | Class source
NormalizerInterface.php | Class | Class source

File in src/Basset/Results/ | Role | Description
---|---|---
ResultEntry.php | Class | Class source
ResultSet.php | Class | Class source

File in src/Basset/Search/ | Role | Description
---|---|---
Search.php | Class | Class source

File in src/Basset/Statistics/ | Role | Description
---|---|---
CollectionStatistics.php | Class | Class source
EntryStatistics.php | Class | Class source
PostingStatistics.php | Class | Class source

File in src/Basset/Stemmers/ | Role | Description
---|---|---
RegexStemmer.php | Class | Class source
Stemmer.php | Class | Class source
StemmerInterface.php | Class | Class source

File in src/Basset/Tokenizers/ | Role | Description
---|---|---
TokenizerInterface.php | Class | Class source
WhitespaceAndPunctuationTokenizer.php | Class | Class source
WhitespaceTokenizer.php | Class | Class source

File in src/Basset/Utils/ | Role | Description
---|---|---
Serializer.php | Class | Class source
StopWords.php | Class | Class source
TransformationInterface.php | Class | Class source
TransformationSet.php | Class | Class source
File in stopwords/ | Role | Description
---|---|---
stopwords.txt | Doc. | Documentation
File in tests/ | Role | Description
---|---|---
Basset/ | Folder | 3 directories
bootstrap.php | Aux. | Auxiliary script
phpunit.xml | Data | Auxiliary data
sample.php | Class | Class source

File in tests/Basset/ | Role | Description
---|---|---
Documents/ | Folder | 3 files
Metric/ | Folder | 25 files
Tokenizers/ | Folder | 3 files
File in tests/Basset/Documents/ | Role | Description
---|---|---
BaseDocuments.php | Class | Class source
DocumentsTest.php | Class | Class source
TokensDocumentTest.php | Class | Class source

File in tests/Basset/Metric/ | Role | Description
---|---|---
BrayCurtisDistanceTest.php | Class | Class source
CanberraDistanceTest.php | Class | Class source
ChebyshevDistanceTest.php | Class | Class source
ChiSquareDistanceTest.php | Class | Class source
CosineSimilarityTest.php | Class | Class source
CzekanowskiSimilarityTest.php | Class | Class source
DiceSimilarityTest.php | Class | Class source
EuclideanDistanceTest.php | Class | Class source
HellingerDistanceTest.php | Class | Class source
JaccardIndexTest.php | Class | Class source
JSDivergenceTest.php | Class | Class source
KLDivergenceTest.php | Class | Class source
KulczynskiDistanceTest.php | Class | Class source
LorentzianDistanceTest.php | Class | Class source
MatusitaDistanceTest.php | Class | Class source
MotykaSimilarityTest.php | Class | Class source
OverlapCoefficientTest.php | Class | Class source
RenyiDivergenceTest.php | Class | Class source
RuzickaSimilarityTest.php | Class | Class source
SoergleDistanceTest.php | Class | Class source
SqrtCosineSimilarityTest.php | Class | Class source
StamatatosDistanceTest.php | Class | Class source
TriangleSectorSimilarityTest.php | Class | Class source
TverskyIndexTest.php | Class | Class source
VectorSimilarityTest.php | Class | Class source

File in tests/Basset/Tokenizers/ | Role | Description
---|---|---
BaseTokenizers.php | Class | Class source
WhitespaceAndPunct...onTokenizerTest.php | Class | Class source
WhitespaceTokenizerTest.php | Class | Class source