|
|
|
Created:
14 years, 9 months ago by crawshaw2 Modified:
12 years, 5 months ago Reviewers:
CC:
golang-dev Visibility:
Public. |
path/filepath: Use a channel to return matches from Glob() instead of a slice.
For when a glob can match millions of files.
Patch Set 1 #Patch Set 2 : diff -r 65a786a78417 https://go.googlecode.com/hg/ #Patch Set 3 : diff -r 65a786a78417 https://go.googlecode.com/hg/ #
Total comments: 4
Patch Set 4 : diff -r 65a786a78417 https://go.googlecode.com/hg/ #
Total comments: 5
Patch Set 5 : diff -r 65a786a78417 https://go.googlecode.com/hg/ #Patch Set 6 : diff -r f782663275a7 https://go.googlecode.com/hg/ #Total messages: 18
|
crawshaw2
Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/
|
14 years, 9 months ago (2011年03月25日 17:08:36 UTC) #1 |
Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/
On 2011年03月25日 17:08:36, david.crawshaw wrote: > Hello mailto:golang-dev@googlegroups.com (cc: mailto:golang-dev@googlegroups.com), > > I'd like you to review this change to > https://go.googlecode.com/hg/ This is almost a question phrased as a CL. Does this make sense? If so, another question: does a chan string copy the contents of the string? It's not clear to me. If so, I should probably be using chan *string. P.s. I was unable to sign in with crawshaw@google.com, so you get my personal email address.
I'm not sure whether this should be the default. On the one hand most of the time it is overkill. On the other hand, when it's not, it's in code that you didn't write, so this would help make that code just work. In this form, Glob should return a <-chan string and do the closing itself. That will make the usage a little less awkward. Russ
Hello golang-dev@googlegroups.com, rsc (cc: golang-dev@googlegroups.com), Please take another look.
P.s. with generics I could write (predicating over T):
func collect(ch chan <-T) []T {
sl := make([]T)
for elem := range(ch) {
sl = append(sl, elem)
}
return sl
}
And then you could recover the old API via:
matches := collect(filepath.Glob("*/*/*"))
On Mar 25, 2011, at 10:52 AM, david.crawshaw@zentus.com wrote: > P.s. with generics I could write (predicating over T): > > func collect(ch chan <-T) []T { > sl := make([]T) > for elem := range(ch) { > sl = append(sl, elem) > } > return sl > } > > And then you could recover the old API via: > > matches := collect(filepath.Glob("*/*/*")) > > http://codereview.appspot.com/4276078/ True, but if f() returns a slice and then is changed to return a channel, the only change to the clients is that the the for statement's range clause goes from for _, x := range to for x := range -rob
http://codereview.appspot.com/4276078/diff/6001/src/pkg/path/filepath/match.go File src/pkg/path/filepath/match.go (right): http://codereview.appspot.com/4276078/diff/6001/src/pkg/path/filepath/match.g... src/pkg/path/filepath/match.go:210: // Glob returns the names of all files matching pattern or nil update comment http://codereview.appspot.com/4276078/diff/6001/src/pkg/path/filepath/match_t... File src/pkg/path/filepath/match_test.go (right): http://codereview.appspot.com/4276078/diff/6001/src/pkg/path/filepath/match_t... src/pkg/path/filepath/match_test.go:86: func contains(vector []string, s string) bool { this function is no longer used... http://codereview.appspot.com/4276078/diff/6001/src/pkg/path/filepath/match_t... src/pkg/path/filepath/match_test.go:97: func containsChan(ch <-chan string, s string) bool { ...so this could just be "contains".... http://codereview.appspot.com/4276078/diff/6001/src/pkg/path/filepath/match_t... src/pkg/path/filepath/match_test.go:122: if !containsChan(matches, tt.result) { ... but then it's hardly worth it. you could put the range here, but it's fine if you want a separate function
Hello golang-dev@googlegroups.com, rsc, r2, r (cc: golang-dev@googlegroups.com), Please take another look.
On 2011年03月25日 17:55:13, r2 wrote: > True, but if f() returns a slice and then is changed to return a channel, the > only change to the clients is that the the for statement's range clause goes > from > for _, x := range > to > for x := range > > -rob > I'm sure this must have been answered elsewhere, but again my search has failed: Why not have range be consistent over slices and channels, and not return an index? Then there could be an rangeIndex (or maybe, 'ranged') that gives you an index on both slices and channels.
channels are the inconsistency, not slices
http://codereview.appspot.com/4276078/diff/10002/src/pkg/path/filepath/match_... File src/pkg/path/filepath/match_test.go (right): http://codereview.appspot.com/4276078/diff/10002/src/pkg/path/filepath/match_... src/pkg/path/filepath/match_test.go:98: func collect(ch <-chan string) []string { i thought you were just going to rename containsChan to contains. that's a preferable design - no need to copy all the values just to search for them
http://codereview.appspot.com/4276078/diff/10002/src/pkg/path/filepath/match.go File src/pkg/path/filepath/match.go (right): http://codereview.appspot.com/4276078/diff/10002/src/pkg/path/filepath/match.... src/pkg/path/filepath/match.go:216: matches := make(chan string) Should this be a buffered channel? I often use unbuffered channels and channels of size 1, but unless I'm using a channel for rate control ("n tasks running at a time"), I'm never sure how large to make my "optimized-for-throughput" buffered channels. I end up picking some arbitrary number like 64. I'm curious what others do in cases like this.
http://codereview.appspot.com/4276078/diff/10002/src/pkg/path/filepath/match.go File src/pkg/path/filepath/match.go (right): http://codereview.appspot.com/4276078/diff/10002/src/pkg/path/filepath/match.... src/pkg/path/filepath/match.go:216: matches := make(chan string) Some buffering would be fine, since it allows the two halves to run more in parallel. There's no need for fine synchronization. An arbitrary number is probably fine. Any smallish value will get whatever benefit there is.
Cheers. (I just found hg upload.) http://codereview.appspot.com/4276078/diff/10002/src/pkg/path/filepath/match.go File src/pkg/path/filepath/match.go (right): http://codereview.appspot.com/4276078/diff/10002/src/pkg/path/filepath/match.... src/pkg/path/filepath/match.go:216: matches := make(chan string) On 2011年03月25日 20:17:14, r wrote: > Some buffering would be fine, since it allows the two halves to run more in > parallel. There's no need for fine synchronization. > > An arbitrary number is probably fine. Any smallish value will get whatever > benefit there is. Done. Though it would be nice if I could pull the arbitrary number from a constant in some package like os or sync. http://codereview.appspot.com/4276078/diff/10002/src/pkg/path/filepath/match_... File src/pkg/path/filepath/match_test.go (right): http://codereview.appspot.com/4276078/diff/10002/src/pkg/path/filepath/match_... src/pkg/path/filepath/match_test.go:98: func collect(ch <-chan string) []string { On 2011年03月25日 19:55:21, r wrote: > i thought you were just going to rename containsChan to contains. that's a > preferable design - no need to copy all the values just to search for them Then I can't use the output of the channel in the error message. I could call Glob() again for the error message, but that could hide an indeterminism bug.
Ping. Any interest in this? On 2011年03月25日 20:27:32, david.crawshaw wrote: > Cheers. (I just found hg upload.) > > http://codereview.appspot.com/4276078/diff/10002/src/pkg/path/filepath/match.go > File src/pkg/path/filepath/match.go (right): > > http://codereview.appspot.com/4276078/diff/10002/src/pkg/path/filepath/match.... > src/pkg/path/filepath/match.go:216: matches := make(chan string) > On 2011年03月25日 20:17:14, r wrote: > > Some buffering would be fine, since it allows the two halves to run more in > > parallel. There's no need for fine synchronization. > > > > An arbitrary number is probably fine. Any smallish value will get whatever > > benefit there is. > > Done. Though it would be nice if I could pull the arbitrary number from a > constant in some package like os or sync. > > http://codereview.appspot.com/4276078/diff/10002/src/pkg/path/filepath/match_... > File src/pkg/path/filepath/match_test.go (right): > > http://codereview.appspot.com/4276078/diff/10002/src/pkg/path/filepath/match_... > src/pkg/path/filepath/match_test.go:98: func collect(ch <-chan string) []string > { > On 2011年03月25日 19:55:21, r wrote: > > i thought you were just going to rename containsChan to contains. that's a > > preferable design - no need to copy all the values just to search for them > > Then I can't use the output of the channel in the error message. I could call > Glob() again for the error message, but that could hide an indeterminism bug.
apparently not