Creating an inverted index in Python

\$\begingroup\$

Here is the code I have written to create an inverted index dictionary for a set of documents:

inv_indx = {i: [] for i in corpus_dict}
for word in corpus_dict:
    for i in range(len(docs)):
        if word in docs[i]:
            inv_indx[word].append(i)

docs is a list of sets of the words in various documents:

[{'once','upon','a','time',...},{'lorum','ipsum','time',...},...]

corpus_dict is a set of all the words that appear in any of the documents:

{'once','upon','a','time','lorum','ipsum',...}

inv_indx becomes a dictionary that maps each word in corpus_dict to a list of the ids of the documents that contain that word:

{'once':[0],'time':[0,1],...}
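For concreteness, here is a minimal sketch of how the three structures relate on a tiny corpus. The raw strings and the whitespace split are assumptions made for illustration; the question does not say how docs is actually built.

```python
# Hypothetical raw documents; the question does not show how docs is produced.
raw_docs = ["once upon a time", "lorum ipsum time"]

# docs: one set of words per document (naive whitespace split, an assumption).
docs = [set(text.split()) for text in raw_docs]

# corpus_dict: every word that appears in any document.
corpus_dict = set().union(*docs)

# The loop from the question: test every word against every document.
inv_indx = {word: [] for word in corpus_dict}
for word in corpus_dict:
    for i in range(len(docs)):
        if word in docs[i]:
            inv_indx[word].append(i)

print(inv_indx["once"])  # [0]
print(inv_indx["time"])  # [0, 1]
```

The document id is simply the position of the document in docs, which is why the lists come out in ascending order.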

The problem is this becomes very slow if the number of documents gets too big. How can I make this code more efficient?

edited Mar 6, 2018 at 5:35 by Jamal

asked Mar 6, 2018 at 0:17 by Joe

\$\endgroup\$

1 Answer

\$\begingroup\$

Suggestions

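You test every word in the corpus against every document. Iterate over docs instead and visit only the words each document actually contains; this removes the unnecessary membership checks.
Instead of pre-filling inv_indx = {i:[] for i in corpus_dict} with empty lists, you can use collections.defaultdict, which creates each list the first time a word is seen.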

Code

from collections import defaultdict

inv_indx = defaultdict(list)
for idx, text in enumerate(docs):
    for word in text:
        inv_indx[word].append(idx)
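The difference in speed comes down to how often the inner loop body runs: the version in the question performs len(corpus_dict) × len(docs) membership tests, while the version above performs one append per word actually present in a document. The sketch below checks that both produce the same index. The function names and the sample data are made up for illustration.

```python
from collections import defaultdict


def build_by_word(docs, corpus_dict):
    """Approach from the question: one membership test per (word, document) pair."""
    inv_indx = {word: [] for word in corpus_dict}
    for word in corpus_dict:
        for i, doc in enumerate(docs):
            if word in doc:
                inv_indx[word].append(i)
    return inv_indx


def build_by_doc(docs):
    """Approach from the answer: one append per word present in a document."""
    inv_indx = defaultdict(list)
    for idx, text in enumerate(docs):
        for word in text:
            inv_indx[word].append(idx)
    return inv_indx


# Made-up sample data for illustration only.
docs = [{"once", "upon", "a", "time"}, {"lorum", "ipsum", "time"}]
corpus_dict = set().union(*docs)

slow = build_by_word(docs, corpus_dict)
fast = build_by_doc(docs)

# Both indexes hold the same postings; ids are ascending in each list
# because both versions visit the documents in order.
assert dict(fast) == slow
```

One thing to keep in mind with defaultdict: looking up a word that is not in the index with inv_indx[word] inserts an empty list for it. For read-only queries, inv_indx.get(word, []) avoids that.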

answered Mar 6, 2018 at 1:11 by vaeta

\$\endgroup\$

Wow, that was so much faster. The defaultdict doesn't seem to have much impact on the performance, though, so I'll leave that out.
– Joe, commented Mar 6, 2018 at 14:57
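The comment does not say how the loop looks once defaultdict is left out. One plausible way, shown here only as a sketch on made-up data, is to keep the per-document iteration and use the built-in dict.setdefault to create each list on demand.

```python
# Made-up sample data for illustration only.
docs = [{"once", "upon", "a", "time"}, {"lorum", "ipsum", "time"}]

inv_indx = {}
for idx, text in enumerate(docs):
    for word in text:
        # setdefault returns the existing list, or stores and returns a new one.
        inv_indx.setdefault(word, []).append(idx)

print(inv_indx["time"])  # [0, 1]
```

Pre-filling the dictionary from corpus_dict, as in the question, would work just as well here; the speed-up comes from iterating per document, not from how the empty lists are created.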
