I have to design and implement an algorithm for my university project that searches a given set of documents based on the keywords/query given. Assume that each document contain few sentences and these documents can be stored in a suitable data structure. When a query is made I have to display the documents that contain the keywords. A query can contain simple logical operators such as "AND" and "OR".
For example assume that there are 3 documents named Doc1, Doc2, Doc3 with this content:
- Doc1: This is my University.
- Doc2: My University is situated at Delhi.
- Doc3: I like My University.
Here are the answers to some queries:
- "University": Doc1, Doc2, Doc3
- "my AND University": Doc1, Doc2
- "like OR Delhi": Doc2, Doc3
Currently what I have developed reads each file and put its contents into separate binary trees, and I've developed a function for searching one word from the binary trees. How can I extend my search algorithm for search with logical operators?
-
Your are not looking for third party tool? You need to build this from the ground up? Lucene is Open Source.paparazzo– paparazzo2016年10月11日 15:43:12 +00:00Commented Oct 11, 2016 at 15:43
-
yeah. I don't want to use any 3rd party tool and request to provide any possible approach to implement this.SHdinesh Madushanka– SHdinesh Madushanka2016年10月11日 15:45:27 +00:00Commented Oct 11, 2016 at 15:45
-
If I understand correctly, the answers to some queries are theoretical expected results, not actual results you're already getting, is that right?doppelgreener– doppelgreener2016年10月11日 15:47:36 +00:00Commented Oct 11, 2016 at 15:47
-
2Why does query 2 "my AND University" not return all three documents? Or, if this is case-sensitive, why does it return Doc2?jscs– jscs2016年10月11日 16:56:57 +00:00Commented Oct 11, 2016 at 16:56
-
if you know to find one word then break the string into tokens and ineach step check two: the word + operator. if or and found word true, if "and" and not found... false else move to the next two query words.Asaf– Asaf2016年10月11日 18:36:24 +00:00Commented Oct 11, 2016 at 18:36
2 Answers 2
Hash the words to a key value pair of (word, set of documents). When you do the search, insert the sets found into the hashtable. Then do a union on the sets for OR and intersections for AND
-
thanks for your answer and can you provide an example?SHdinesh Madushanka– SHdinesh Madushanka2016年10月11日 15:55:02 +00:00Commented Oct 11, 2016 at 15:55
You could go with .
Dictionary<string, HashSet<Int>> se
word docID
But I thought you had to build from scratch
I don't know java
It may be called a HashTable in java
var docs = se["my"].Interset(se["university"]);
var docs = se["my"].Union(se["Delhi"]);
Explore related questions
See similar questions with these tags.