2

I am building a small search engine to search a collection of PDFs. From each PDF I extract a set of tokens and store them in a database. I do not want to store duplicate tokens in the database; instead, I want to store the count of each token. Does Python have a special data structure that does not store duplicates but keeps a count for each token?

asked May 17, 2011 at 9:38

4 Answers

5

Python 2.7 and later has collections.Counter.
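As a minimal sketch of how Counter could be used for this, assuming the tokens from each PDF are already available as lists of strings (the token lists below are made up for illustration):

```python
from collections import Counter

# Hypothetical token lists, standing in for tokens extracted from two PDFs.
tokens_doc1 = ["search", "engine", "pdf", "search"]
tokens_doc2 = ["pdf", "index", "search"]

counts = Counter(tokens_doc1)   # count the tokens of the first document
counts.update(tokens_doc2)      # add the counts from the second document

print(counts["search"])         # 3
print(counts["missing"])        # 0: a key that was never seen counts as zero
print(counts.most_common(1))    # [('search', 3)]
```

Each distinct token is stored once as a key, and its value is the running count, so no duplicates are kept.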

answered May 17, 2011 at 9:42

3

I'd suggest using a simple dictionary to store the counts, like this:

storage = {}  # initialize
# ...
if token not in storage:
    storage[token] = 1   # first time this token is seen
else:
    storage[token] += 1  # token already present: increment its count

EDIT

That said, if you're using Python 2.7 or later (including Python 3), I'd follow Space_C0wb0y's suggestion to use the Counter class ...

answered May 17, 2011 at 9:43

4 Comments

if not storage.has_key(token)
I'd use a collections.defaultdict and eliminate the if statement entirely.
@nikhil: Why did you accept this solution? It is quite inefficient. I think the only reason to do it this way is if you have a really old Python version.
You should test for a key using if token not in storage.
3

The collections module has defaultdict, which can be used as a key-value store with a counter:

>>> from collections import defaultdict
>>> s = 'mississippi'
>>> d = defaultdict(int)  # missing keys start at int() == 0
>>> for k in s:
...     d[k] += 1
...
>>> d.items()
[('i', 4), ('p', 2), ('s', 4), ('m', 1)]

Just note: this is not a database, it's purely in-memory storage. You would have to save this data somehow!
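One possible way to save the counts, sketched here with the standard-library sqlite3 module. The table name token_counts and its columns are assumptions made for this example, and an in-memory database is used only so the snippet runs on its own; a real setup would pass a file path instead.

```python
import sqlite3
from collections import defaultdict

# Build the counts exactly as in the example above.
counts = defaultdict(int)
for ch in "mississippi":
    counts[ch] += 1

# ":memory:" keeps the sketch self-contained; use a file path to persist.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE token_counts (token TEXT PRIMARY KEY, count INTEGER NOT NULL)"
)
# One row per distinct token; the PRIMARY KEY rules out duplicate rows.
conn.executemany(
    "INSERT INTO token_counts (token, count) VALUES (?, ?)",
    counts.items(),
)
conn.commit()

rows = conn.execute(
    "SELECT token, count FROM token_counts ORDER BY token"
).fetchall()
print(rows)  # [('i', 4), ('m', 1), ('p', 2), ('s', 4)]
```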

answered May 17, 2011 at 9:45


0

You could always implement an object for every file and give it methods such as open and display. You could then define __hash__ and __eq__ on the object, which would let you store the items in a set, so that duplicates collapse into a single instance inside the set.

This is just another way of doing it; by no means is it the best method.
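A rough sketch of the __hash__/__eq__ idea, using a hypothetical Token class that is not part of any library. Note that a plain set only removes duplicates and does not keep counts, so the sketch pairs the hashable objects with collections.Counter to show one way counts could still be obtained:

```python
from collections import Counter


class Token:
    """Hypothetical wrapper; two tokens are equal when their text matches."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return isinstance(other, Token) and self.text == other.text

    def __hash__(self):
        # Must be consistent with __eq__: equal objects hash equally.
        return hash(self.text)


tokens = [Token(word) for word in ["pdf", "search", "pdf"]]

unique = set(tokens)          # equal tokens collapse into one set member
print(len(unique))            # 2

counted = Counter(tokens)     # hashable objects can also be Counter keys
print(counted[Token("pdf")])  # 2
```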

answered May 17, 2011 at 9:57
