proportionmap accepts iterators #855


Open
ArunS-tack wants to merge 3 commits into JuliaStats:master from ArunS-tack:itr

@ArunS-tack ArunS-tack commented Mar 4, 2023

closes #842.

src/counts.jl Outdated
than the proportion of raw counts.
"""
proportionmap(x::AbstractArray) = _normalize_countmap(countmap(x), length(x))
proportionmap(x) = _normalize_countmap(countmap(x), length(collect(x)))
@devmotion devmotion (Member) Mar 4, 2023

Could we just count the total number of elements when building the countmap? It seems inefficient to materialize x only to obtain its length if we already iterate through it anyway.

@ArunS-tack ArunS-tack (Author) Mar 4, 2023 (edited)

Something like sum(values(countmap(x)))? But I think that's memory-inefficient, even though it doesn't iterate again.

@devmotion devmotion (Member) Mar 4, 2023

No, I was thinking of counting directly inside countmap. But sum(values(countmap(x))) would probably still be more efficient than using collect(x) if x is an iterator with a large number of elements.

ArunS-tack reacted with thumbs up emoji
@ArunS-tack ArunS-tack (Author) Mar 5, 2023

julia> @btime proportionmap(skipmissing(a))
  8.625 μs (27 allocations: 146.67 KiB)
Dict{Int64, Float64} with 4 entries:
  4 => 0.25
  2 => 0.25
  3 => 0.25
  1 => 0.25

julia> @btime proportionmap(skipmissing(a))
  316.667 ns (9 allocations: 1.08 KiB)
Dict{Int64, Float64} with 4 entries:
  4 => 0.25
  2 => 0.25
  3 => 0.25
  1 => 0.25

Looks like a significant improvement 🧐

Comment on lines +454 to +459
countm = Dict{eltype(x), Int}()
n = 0
for y in x
    countm[y] = get(countm, y, 0) + 1
    n += 1
end
@nalimilan nalimilan (Member) Mar 28, 2023

This reinvents countmap. Better make countmap allow iterators instead, so that both functions benefit.

@ArunS-tack ArunS-tack (Author) Mar 28, 2023

countmap already accepts iterators; I did that to keep a count of n while iterating.

@nalimilan nalimilan (Member) Apr 24, 2023

OK. The problem is that countmap uses different algorithms under the hood for performance. By using a Dict here, you lose the benefit of the fast radix sort and count sort algorithms.

I see two solutions:

  • do n = Base.IteratorSize(x) isa Union{Base.HasLength, Base.HasShape} ? length(x) : sum(values(countm))
  • adjust all _addcounts! methods to return the number of elements (this should be cheap so not a big deal if it's not used by addcounts)
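
To make the second option concrete, here is a minimal sketch of a counting helper that also returns the number of elements it consumed. The name addcounts_with_total! is hypothetical and this is not StatsBase's _addcounts!; it only illustrates the shape of the idea, using a plain Dict rather than the faster specialised algorithms.

```julia
# Hypothetical helper (NOT StatsBase's _addcounts!): updates `cm` in place
# and returns how many elements were consumed from `x`.
function addcounts_with_total!(cm::Dict{T, Int}, x) where {T}
    n = 0
    for y in x
        cm[y] = get(cm, y, 0) + 1  # increment the count for this value
        n += 1                     # track the total in the same pass
    end
    return n
end

cm = Dict{Int, Int}()
n = addcounts_with_total!(cm, skipmissing([1, 2, missing, 2]))
```

A caller such as proportionmap could then divide each count by the returned n without iterating a second time or materialising the iterator.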

@tylerjthomas9 tylerjthomas9 Dec 29, 2023

I am looking to help get this across the line. Is this your first proposed solution?

function proportionmap(x)
    countm = countmap(x)
    n = Base.IteratorSize(x) isa Union{Base.HasLength, Base.HasShape} ? length(x) : sum(values(countm))
    _normalize_countmap(countm, n)
end
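
For anyone who wants to try the trait check without StatsBase, the following is a self-contained sketch. toy_countmap and toy_proportionmap are hypothetical stand-ins for countmap and proportionmap, not the package's implementations, and the normalisation step is an assumption about what _normalize_countmap does.

```julia
# Hypothetical stand-in for StatsBase.countmap so the sketch has no dependencies.
function toy_countmap(x)
    counts = Dict{eltype(x), Int}()
    for y in x
        counts[y] = get(counts, y, 0) + 1
    end
    return counts
end

# Use length(x) when the iterator advertises one; otherwise sum the counts.
function toy_proportionmap(x)
    counts = toy_countmap(x)
    if Base.IteratorSize(x) isa Union{Base.HasLength, Base.HasShape}
        n = length(x)
    else
        n = sum(values(counts))
    end
    # Assumed normalisation: divide every count by the total.
    return Dict(k => v / n for (k, v) in counts)
end

a = [1, 2, missing, 2]
size_vec  = Base.IteratorSize(a)               # a Vector reports Base.HasShape{1}()
size_skip = Base.IteratorSize(skipmissing(a))  # skipmissing reports Base.SizeUnknown()
props = toy_proportionmap(skipmissing(a))
```

Here the skipmissing wrapper has no known length, so the fallback branch sums the counts, while a plain Vector takes the length(x) branch.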


Development

Successfully merging this pull request may close these issues.

proportionmap should accept any iterator
