I want to manipulate data of this form:

{red -> 1,5,6,7,5,11,...}
{green -> 2,3,4,10,11,12,...}
{blue -> 2,3,5,6,7,8,9,10,...}

where colors are keys, and numbers are, let's say, some locations (non-key integer values).

I'll have a lot of colors, and a lot of associated numbers.

I want to perform operations such as counting the total number of colors, finding the top 5 colors with the most numbers, etc.

Which Python data structures would you suggest for this (something that stores a key together with its associated non-key entries)?

I know this is a broad question. I'm trying to solve this problem, if that helps.

PS. I'm following an online course, and this is not homework. Even if it were homework, my question is not asking for a solution, I guess.

EDIT

The data collection contains a lot of small txt files with some text in them. Eventually, I want the data structure to store the unique words from all those txt files, along with pointers to the document IDs where those words appear.

Ex:

1.txt
"The weather today is good"
2.txt
"It is going to rain today"
The data structure should be (numbers are document IDs):
{
The->1
weather->1
today->1,2
is->1,2
good->1
it->2
going->2
to->2
rain->2
}
asked Sep 9, 2018 at 14:28
  • Doesn't a dict d = {"red":[1,5,...],"green":[2,3,], etc..} (where the values can also be sets if you don't need duplicates) fit your use case? Commented Sep 9, 2018 at 14:31
  • This is too broad and unfocused of a question (what do you mean by "etc."?). Using pandas would be a reasonable approach. Commented Sep 9, 2018 at 14:32
  • I don't know, I'm new to computer science, that's why I ask. If you think a dictionary is a good idea, please post it as an answer, along with alternatives, if any. Thanks. Commented Sep 9, 2018 at 14:33
  • @JohnColeman, thanks, actually I'm going to read from a lot of files first. pandas seems reasonable too. Commented Sep 9, 2018 at 14:35
  • Which question in that link are you solving? Commented Sep 9, 2018 at 14:40

1 Answer

What you want is almost certainly a dictionary of lists.

data = {
    "red": [1, 5, 6, 7, 5, 11],
    "green": [2, 3, 4, 10, 11, 12],
    "blue": [2, 3, 5, 6, 7, 8, 9, 10],
}

To get the total number of colours:

number = len(data)

To get the colours sorted by how many values each one has, largest first (this returns a list of keys, not a new dictionary):

sorted_colours = sorted(data, key=lambda x: len(data[x]), reverse=True)
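
For the "top 5" operation from the question, one option is to slice the sorted list above (sorted_colours[:5]). Another is heapq.nlargest, which picks the largest few keys without sorting everything. The sketch below is only an illustration: the helper name top_colours is made up here, and the data is the same small sample as above.

```python
import heapq

data = {
    "red": [1, 5, 6, 7, 5, 11],
    "green": [2, 3, 4, 10, 11, 12],
    "blue": [2, 3, 5, 6, 7, 8, 9, 10],
}


def top_colours(mapping, n=5):
    """Return up to n keys with the most associated values, largest first.

    Hypothetical helper, not part of any library.
    """
    return heapq.nlargest(n, mapping, key=lambda colour: len(mapping[colour]))


# "blue" has the longest list in the sample, so it comes first.
top = top_colours(data, n=2)
print(top)
```

With only three colours in the sample, asking for the top 5 simply returns all three.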

But you should probably check out defaultdict, OrderedDict, and Counter from the collections module.
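
As a rough sketch of how defaultdict might fit the word-to-document example in the question's edit: a defaultdict(set) creates an empty set the first time a word is seen, so there is no need to check whether the key exists. The documents dict below is a hypothetical in-memory stand-in for reading the txt files, and lowercasing the words is an assumption (the question's example keeps "The" capitalised).

```python
from collections import defaultdict

# Hypothetical stand-in for the txt files: document id -> file contents.
documents = {
    1: "The weather today is good",
    2: "It is going to rain today",
}

# word -> set of document ids; a set avoids duplicate ids per word.
index = defaultdict(set)
for doc_id, text in documents.items():
    # Assumption: words are split on whitespace and lowercased.
    for word in text.lower().split():
        index[word].add(doc_id)

# Total number of unique words.
total_words = len(index)

# Up to five words that appear in the most documents.
most_common = sorted(index, key=lambda w: len(index[w]), reverse=True)[:5]

print(total_words)
print(sorted(index["today"]))
```

If duplicates matter (for example, to count how often a word occurs in each document), a list or a Counter per word would be a closer fit than a set.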

answered Sep 9, 2018 at 14:46