
I am new to Go, and for a course I have to give a presentation about concurrency in Go. I think the Go Tour webcrawler exercise is a nice example to talk about. Before I do, it would be nice if anybody could verify that this solution fits. I assume it is correct, but perhaps I have missed something, or one of you has a better alternative.

package main

import (
	"fmt"
	"strconv"
	"sync"
	"time"
)

/*
 * Data and Types
 * ===================================================================================
 */

var fetched map[string]bool // Map of fetched URLs -> true: fetched
var lock sync.Mutex         // locks write access to the fetched map
var urlChan chan string     // Channel on which fetched URLs are reported

type Fetcher interface {
	// Fetch returns the body of URL and
	// a slice of URLs found on that page.
	Fetch(url string) (body string, urls []string, err error)
}

// fakeFetcher is a Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
	body string
	urls []string
}

// fetcher is a populated fakeFetcher.
var fetcher = fakeFetcher{
	"http://golang.org/": &fakeResult{
		"The Go Programming Language",
		[]string{
			"http://golang.org/pkg/",
			"http://golang.org/cmd/",
		},
	},
	"http://golang.org/pkg/": &fakeResult{
		"Packages",
		[]string{
			"http://golang.org/",
			"http://golang.org/cmd/",
			"http://golang.org/pkg/fmt/",
			"http://golang.org/pkg/os/",
		},
	},
	"http://golang.org/pkg/fmt/": &fakeResult{
		"Package fmt",
		[]string{
			"http://golang.org/",
			"http://golang.org/pkg/",
		},
	},
	"http://golang.org/pkg/os/": &fakeResult{
		"Package os",
		[]string{
			"http://golang.org/",
			"http://golang.org/pkg/",
		},
	},
}

/*
 * End Data and Types
 * ===================================================================================
 */

/*
 * Webcrawler implementation
 * ===================================================================================
 */

// waitUntilDone blocks until d URLs have been received on urlChan,
// then prints a timestamped list of the fetched URLs.
func waitUntilDone(d int) {
	fMap := make(map[string]string)
	for i := 0; i < d; i++ {
		fMap[<-urlChan] = strconv.Itoa(time.Now().Nanosecond())
	}
	time.Sleep(time.Millisecond * 100)
	fmt.Println()
	fmt.Println("Fetch stats")
	fmt.Println("==================================================================")
	for k, v := range fMap {
		fmt.Println("Fetched: " + k + " after: " + v + " ns")
	}
	fmt.Println("==================================================================")
	fmt.Println()
}

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
	var str string
	var strArr []string
	var err error
	if fetched[url] {
		// already fetched?
		str, strArr, err = "", nil, fmt.Errorf("already fetched: %s this will be ignored", url)
	} else if res, ok := f[url]; ok {
		str, strArr, err = res.body, res.urls, nil
		urlChan <- url
	} else {
		str, strArr, err = "", nil, fmt.Errorf("not found: %s", url)
	}
	return str, strArr, err
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher, goRoutNum int) {
	if depth <= 0 {
		return
	}
	// Start fetching url concurrently
	fmt.Println("Goroutine " + strconv.Itoa(goRoutNum) + " is fetching: " + url)
	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Lock map
	lock.Lock()
	fetched[url] = true
	// Unlock
	lock.Unlock()
	fmt.Printf("found: %s %q\n", url, body)
	for i, u := range urls {
		go func(url string, goRoutNumber int) {
			Crawl(url, depth-1, fetcher, goRoutNumber)
		}(u, i+1)
	}
	return
}

func StartCrawling(url string, depth int, fetcher Fetcher) {
	fmt.Println()
	fmt.Println("Start crawling ...")
	fmt.Println("==================================================================")
	go func(u string, i int, f Fetcher) {
		Crawl(u, i, f, 0)
	}(url, depth, fetcher)
}

/*
 * End Webcrawler implementation
 * ===================================================================================
 */

/*
 * Main
 * ====================================================================
 */

func main() {
	depth := len(fetcher)
	fetched = make(map[string]bool)
	url := "http://golang.org/"
	urlChan = make(chan string, len(fetcher))
	go StartCrawling(url, depth, fetcher)
	waitUntilDone(depth)
}

/*
 * End Main
 * =====================================================================
 */

Playground

Exercise link

asked Apr 7, 2016 at 16:49
  • Welcome to Code Review! I have rolled back the last edit. Please see what you may and may not do after receiving answers. Commented Apr 8, 2016 at 8:46
  • Hello there - like I said before, please do not update your question with "the new, improved code". If you just want to share your improvements, answer your own question and describe what steps you took to make it better. If you want to have your newly improved code reviewed, make a follow-up question and link back to this one. If you don't feel like writing a whole answer to share your newly improved code, just write a comment with the link to the new playground. Commented Apr 8, 2016 at 8:57

3 Answers


Combine the mutex with the thing it is locking into a single struct. Often the mutex element of such a struct is unnamed (embedded), since you only need to lock and unlock the struct as a whole:

type SafeMap struct {
	sync.Mutex
	URLs map[string]bool
}

which you can then use like so:

url := "http://some_example.com"
fetched := SafeMap{URLs: map[string]bool{}}
fetched.Lock()
fetched.URLs[url] = true
fetched.Unlock()
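
Because the mutex is embedded, you can also attach helper methods to the struct. As a sketch (the Visit method below is my own illustration, not part of the exercise), a single check-and-set helper keeps the read and the write of the map under one lock:

func (m *SafeMap) Visit(url string) bool {
	m.Lock()
	defer m.Unlock()
	if m.URLs[url] {
		return false // already seen, skip it
	}
	m.URLs[url] = true // mark as visited while still holding the lock
	return true        // first visit
}

Crawl can then make a single fetched.Visit(url) call instead of locking separately around the read and the write, which closes the window in which two goroutines could both decide to fetch the same URL.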
answered Apr 7, 2016 at 17:25

UPDATE: I have refactored some code and incorporated the suggestion above.

I changed:

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
	var str string
	var strArr []string
	var err error
	if fetched[url] {
		// already fetched?
		str, strArr, err = "", nil, fmt.Errorf("already fetched: %s this will be ignored", url)
	} else if res, ok := f[url]; ok {
		str, strArr, err = res.body, res.urls, nil
		urlChan <- url
	} else {
		str, strArr, err = "", nil, fmt.Errorf("not found: %s", url)
	}
	return str, strArr, err
}

To (the already-fetched check moves out of Fetch and into Crawl, where it can run under the lock):

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
	var str string
	var strArr []string
	var err error
	if res, ok := f[url]; ok {
		str, strArr, err = res.body, res.urls, nil
		urlChan <- url
	} else {
		str, strArr, err = "", nil, fmt.Errorf("not found: %s", url)
	}
	return str, strArr, err
}

And:

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher, goRoutNum int) {
	if depth <= 0 {
		return
	}
	// Start fetching url concurrently
	fmt.Println("Goroutine " + strconv.Itoa(goRoutNum) + " is fetching: " + url)
	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Lock map
	lock.Lock()
	fetched[url] = true
	// Unlock
	lock.Unlock()
	fmt.Printf("found: %s %q\n", url, body)
	for i, u := range urls {
		go func(url string, goRoutNumber int) {
			Crawl(url, depth-1, fetcher, goRoutNumber)
		}(u, i+1)
	}
	return
}

To (the URL is now checked and marked as visited under a single lock before fetching):

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher, goRoutNum int) {
	if depth <= 0 {
		return
	}
	var body string
	var urls []string
	var err error
	// Check and mark the URL under a single lock
	safeMap.Lock()
	if safeMap.URLs[url] { // already fetched?
		body, urls, err =
			"", nil, fmt.Errorf("already fetched: %s this will be ignored", url)
		fmt.Println(err)
		safeMap.Unlock()
		return
	}
	safeMap.URLs[url] = true
	safeMap.Unlock()
	// Start fetching url concurrently
	fmt.Println("Goroutine " + strconv.Itoa(goRoutNum) + " is fetching: " + url)
	body, urls, err = fetcher.Fetch(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Mark the URL as fetched
	safeMap.Lock()
	safeMap.URLs[url] = true
	safeMap.Unlock()
	fmt.Printf("found: %s %q\n", url, body)
	for i, u := range urls {
		go func(url string, goRoutNumber int) {
			Crawl(url, depth-1, fetcher, goRoutNumber)
		}(u, i+1)
	}
	return
}

The whole updated code:

package main

import (
	"fmt"
	"strconv"
	"sync"
	"time"
)

/*
 * Data and Types
 * ===================================================================================
 */

type SafeMap struct {
	sync.Mutex           // locks write access to the URLs map
	URLs map[string]bool // Map of fetched URLs -> true: fetched
}

var urlChan chan string // Channel on which fetched URLs are reported
var safeMap SafeMap

type Fetcher interface {
	// Fetch returns the body of URL and
	// a slice of URLs found on that page.
	Fetch(url string) (body string, urls []string, err error)
}

// fakeFetcher is a Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
	body string
	urls []string
}

// fetcher is a populated fakeFetcher.
var fetcher = fakeFetcher{
	"http://golang.org/": &fakeResult{
		"The Go Programming Language",
		[]string{
			"http://golang.org/pkg/",
			"http://golang.org/cmd/",
		},
	},
	"http://golang.org/pkg/": &fakeResult{
		"Packages",
		[]string{
			"http://golang.org/",
			"http://golang.org/cmd/",
			"http://golang.org/pkg/fmt/",
			"http://golang.org/pkg/os/",
		},
	},
	"http://golang.org/pkg/fmt/": &fakeResult{
		"Package fmt",
		[]string{
			"http://golang.org/",
			"http://golang.org/pkg/",
		},
	},
	"http://golang.org/pkg/os/": &fakeResult{
		"Package os",
		[]string{
			"http://golang.org/",
			"http://golang.org/pkg/",
		},
	},
}

/*
 * End Data and Types
 * ===================================================================================
 */

/*
 * Webcrawler implementation
 * ===================================================================================
 */

// waitUntilDone blocks until d URLs have been received on urlChan,
// then prints a timestamped list of the fetched URLs.
func waitUntilDone(d int) {
	fMap := make(map[string]string)
	for i := 0; i < d; i++ {
		fMap[<-urlChan] = strconv.Itoa(time.Now().Nanosecond())
	}
	time.Sleep(time.Millisecond * 100)
	fmt.Println()
	fmt.Println("Fetch stats")
	fmt.Println("==================================================================")
	for k, v := range fMap {
		fmt.Println("Fetched: " + k + " after: " + v + " ns")
	}
	fmt.Println("==================================================================")
	fmt.Println()
}

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
	var str string
	var strArr []string
	var err error
	if res, ok := f[url]; ok {
		str, strArr, err = res.body, res.urls, nil
		urlChan <- url
	} else {
		str, strArr, err = "", nil, fmt.Errorf("not found: %s", url)
	}
	return str, strArr, err
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher, goRoutNum int) {
	if depth <= 0 {
		return
	}
	var body string
	var urls []string
	var err error
	// Check and mark the URL under a single lock
	safeMap.Lock()
	if safeMap.URLs[url] { // already fetched?
		body, urls, err =
			"", nil, fmt.Errorf("already fetched: %s this will be ignored", url)
		fmt.Println(err)
		safeMap.Unlock()
		return
	}
	safeMap.URLs[url] = true
	safeMap.Unlock()
	// Start fetching url concurrently
	fmt.Println("Goroutine " + strconv.Itoa(goRoutNum) + " is fetching: " + url)
	body, urls, err = fetcher.Fetch(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Mark the URL as fetched
	safeMap.Lock()
	safeMap.URLs[url] = true
	safeMap.Unlock()
	fmt.Printf("found: %s %q\n", url, body)
	for i, u := range urls {
		go func(url string, goRoutNumber int) {
			Crawl(url, depth-1, fetcher, goRoutNumber)
		}(u, i+1)
	}
	return
}

func StartCrawling(url string, depth int, fetcher Fetcher) {
	fmt.Println()
	fmt.Println("Start crawling ...")
	fmt.Println("==================================================================")
	go func(u string, i int, f Fetcher) {
		Crawl(u, i, f, 0)
	}(url, depth, fetcher)
}

/*
 * End Webcrawler implementation
 * ===================================================================================
 */

/*
 * Main
 * ====================================================================
 */

func main() {
	depth := len(fetcher)
	safeMap = SafeMap{URLs: map[string]bool{}}
	url := "http://golang.org/"
	urlChan = make(chan string, len(fetcher))
	go StartCrawling(url, depth, fetcher)
	waitUntilDone(depth)
}

/*
 * End Main
 * =====================================================================
 */

Playground: https://play.golang.org/p/2dPgUKshm6
Exercise: https://tour.golang.org/concurrency/10

answered Apr 8, 2016 at 9:18

It seems odd to me to use an algorithm that relies on knowing exactly how many URLs will need to be fetched.

You use len(fetcher) to determine your starting depth as well as the buffer size for the urlChan. So basically you're using what you know about the mock data to write code that would need to be deployed on the unknowable internet.
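
A shape that does not need the URL count up front is to track outstanding goroutines with a sync.WaitGroup. Here is a minimal, self-contained sketch of that idea (the pages map and the depth of 4 are illustrative stand-ins for the exercise's fake fetcher; this is not the solution linked below):

package main

import (
	"fmt"
	"sync"
)

// pages is a tiny stand-in for the canned fakeFetcher from the question.
var pages = map[string][]string{
	"http://golang.org/":     {"http://golang.org/pkg/", "http://golang.org/cmd/"},
	"http://golang.org/pkg/": {"http://golang.org/"},
}

// visited guards the set of URLs already claimed by some goroutine.
var visited = struct {
	sync.Mutex
	urls map[string]bool
}{urls: make(map[string]bool)}

var wg sync.WaitGroup

func crawl(url string, depth int) {
	defer wg.Done() // signal completion even on early return
	if depth <= 0 {
		return
	}
	// Check and mark under one lock so two goroutines can't both claim the URL.
	visited.Lock()
	seen := visited.urls[url]
	visited.urls[url] = true
	visited.Unlock()
	if seen {
		return
	}
	urls, ok := pages[url]
	if !ok {
		fmt.Println("not found:", url)
		return
	}
	fmt.Println("found:", url)
	for _, u := range urls {
		wg.Add(1) // register the child before starting it
		go crawl(u, depth-1)
	}
}

func main() {
	wg.Add(1)
	go crawl("http://golang.org/", 4)
	wg.Wait() // blocks until every goroutine has called Done
}

wg.Wait() returns only after every spawned goroutine has finished, so the program exits exactly when the crawl does, however many URLs that turns out to be, and the channel buffer sized to len(fetcher) disappears entirely.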

Thanks for sharing your solution. If you have any feedback about mine (written before I saw this), I'd be happy to hear it: My solution

answered Jan 7, 2021 at 3:23
