4

I have to design and implement an algorithm for my university project that searches a given set of documents based on the keywords/query given. Assume that each document contain few sentences and these documents can be stored in a suitable data structure. When a query is made I have to display the documents that contain the keywords. A query can contain simple logical operators such as "AND" and "OR".

For example assume that there are 3 documents named Doc1, Doc2, Doc3 with this content:

  • Doc1: This is my University.
  • Doc2: My University is situated at Delhi.
  • Doc3: I like My University.

Here are the answers to some queries:

  1. "University": Doc1, Doc2, Doc3
  2. "my AND University": Doc1, Doc2
  3. "like OR Delhi": Doc2, Doc3

Currently what I have developed reads each file and put its contents into separate binary trees, and I've developed a function for searching one word from the binary trees. How can I extend my search algorithm for search with logical operators?

doppelgreener
1,30417 silver badges26 bronze badges
asked Oct 11, 2016 at 15:33
5
  • Your are not looking for third party tool? You need to build this from the ground up? Lucene is Open Source. Commented Oct 11, 2016 at 15:43
  • yeah. I don't want to use any 3rd party tool and request to provide any possible approach to implement this. Commented Oct 11, 2016 at 15:45
  • If I understand correctly, the answers to some queries are theoretical expected results, not actual results you're already getting, is that right? Commented Oct 11, 2016 at 15:47
  • 2
    Why does query 2 "my AND University" not return all three documents? Or, if this is case-sensitive, why does it return Doc2? Commented Oct 11, 2016 at 16:56
  • if you know to find one word then break the string into tokens and ineach step check two: the word + operator. if or and found word true, if "and" and not found... false else move to the next two query words. Commented Oct 11, 2016 at 18:36

2 Answers 2

3

Hash the words to a key value pair of (word, set of documents). When you do the search, insert the sets found into the hashtable. Then do a union on the sets for OR and intersections for AND

answered Oct 11, 2016 at 15:46
1
  • thanks for your answer and can you provide an example? Commented Oct 11, 2016 at 15:55
1

You could go with .

Dictionary<string, HashSet<Int>> se
 word docID

But I thought you had to build from scratch

I don't know java
It may be called a HashTable in java

var docs = se["my"].Interset(se["university"]);
var docs = se["my"].Union(se["Delhi"]);
answered Oct 11, 2016 at 16:38

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.