
I am new to Go, and for a course I have to give a presentation about concurrency in Go. I think the Go Tour webcrawler exercise is a nice example to talk about. Before I do, it would be nice if anybody could verify that this solution fits. I assume it is correct, but perhaps I have missed something, or one of you has a better alternative.

package main

import (
	"fmt"
	"strconv"
	"sync"
	"time"
)

/*
 * Data and Types
 * ===================================================================================
 */

var fetched map[string]bool // Map of fetched URLs -> true: fetched
var lock sync.Mutex         // locks write access to the fetched map
var urlChan chan string     // Channel on which fetched URLs are reported

type Fetcher interface {
	// Fetch returns the body of URL and
	// a slice of URLs found on that page.
	Fetch(url string) (body string, urls []string, err error)
}

// fakeFetcher is a Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
	body string
	urls []string
}

// fetcher is a populated fakeFetcher.
var fetcher = fakeFetcher{
	"http://golang.org/": &fakeResult{
		"The Go Programming Language",
		[]string{
			"http://golang.org/pkg/",
			"http://golang.org/cmd/",
		},
	},
	"http://golang.org/pkg/": &fakeResult{
		"Packages",
		[]string{
			"http://golang.org/",
			"http://golang.org/cmd/",
			"http://golang.org/pkg/fmt/",
			"http://golang.org/pkg/os/",
		},
	},
	"http://golang.org/pkg/fmt/": &fakeResult{
		"Package fmt",
		[]string{
			"http://golang.org/",
			"http://golang.org/pkg/",
		},
	},
	"http://golang.org/pkg/os/": &fakeResult{
		"Package os",
		[]string{
			"http://golang.org/",
			"http://golang.org/pkg/",
		},
	},
}

/*
 * End Data and Types
 * ===================================================================================
 */

/*
 * Webcrawler implementation
 * ===================================================================================
 */

// waitUntilDone blocks until d URLs have been received on urlChan,
// then prints a timestamped list of the fetched URLs.
func waitUntilDone(d int) {
	fMap := make(map[string]string)
	for i := 0; i < d; i++ {
		fMap[<-urlChan] = strconv.Itoa(time.Now().Nanosecond())
	}
	time.Sleep(time.Millisecond * 100)
	fmt.Println()
	fmt.Println("Fetch stats")
	fmt.Println("==================================================================")
	for k, v := range fMap {
		fmt.Println("Fetched: " + k + " after: " + v + " ns")
	}
	fmt.Println("==================================================================")
	fmt.Println()
}

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
	var str string
	var strArr []string
	var err error
	if fetched[url] {
		// already fetched?
		str, strArr, err = "", nil, fmt.Errorf("already fetched: %s this will be ignored", url)
	} else if res, ok := f[url]; ok {
		str, strArr, err = res.body, res.urls, nil
		urlChan <- url
	} else {
		str, strArr, err = "", nil, fmt.Errorf("not found: %s", url)
	}
	return str, strArr, err
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher, goRoutNum int) {
	if depth <= 0 {
		return
	}
	// Start fetching url concurrently
	fmt.Println("Goroutine " + strconv.Itoa(goRoutNum) + " is fetching: " + url)
	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Lock map
	lock.Lock()
	fetched[url] = true
	// Unlock
	lock.Unlock()
	fmt.Printf("found: %s %q\n", url, body)
	for i, u := range urls {
		go func(url string, goRoutNumber int) {
			Crawl(url, depth-1, fetcher, goRoutNumber)
		}(u, i+1)
	}
	return
}

func StartCrawling(url string, depth int, fetcher Fetcher) {
	fmt.Println()
	fmt.Println("Start crawling ...")
	fmt.Println("==================================================================")
	go func(u string, i int, f Fetcher) {
		Crawl(u, i, f, 0)
	}(url, depth, fetcher)
}

/*
 * End Webcrawler implementation
 * ===================================================================================
 */

/*
 * Main
 * ====================================================================
 */

func main() {
	depth := len(fetcher)
	fetched = make(map[string]bool)
	url := "http://golang.org/"
	urlChan = make(chan string, len(fetcher))
	go StartCrawling(url, depth, fetcher)
	waitUntilDone(depth)
}

/*
 * End Main
 * =====================================================================
 */

Playground

Exercise link

asked Apr 7, 2016 at 16:49
  • Welcome to Code Review! I have rolled back the last edit. Please see what you may and may not do after receiving answers. Commented Apr 8, 2016 at 8:46
  • Hello there - like I said before, please do not update your question with "the new, improved code". If you just want to share your improvements, answer your own question and describe what steps you took to make it better. If you want to have your newly improved code reviewed, make a follow-up question and link back to this one. If you don't feel like writing a whole answer to share your newly improved code, just write a comment with the link to the new playground. Commented Apr 8, 2016 at 8:57

3 Answers


Combine the mutex with the thing it is locking into a single struct. Often the mutex element of such a struct is unnamed (embedded), since you only need to lock and unlock the struct as a whole:

type SafeMap struct {
	sync.Mutex
	URLs map[string]bool
}

which you can then use like so:

url := "http://some_example.com"
fetched := SafeMap{URLs: map[string]bool{}}
fetched.Lock()
fetched.URLs[url] = true
fetched.Unlock()
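
Because the mutex is embedded, you can also attach helper methods to the struct. As a sketch (the Visit method below is my own illustration, not part of the exercise), a single check-and-set helper keeps the read and the write of the map under one lock:

func (m *SafeMap) Visit(url string) bool {
	m.Lock()
	defer m.Unlock()
	if m.URLs[url] {
		return false // already seen, skip it
	}
	m.URLs[url] = true // mark as visited while still holding the lock
	return true        // first visit
}

Crawl can then make a single fetched.Visit(url) call instead of locking separately around the read and the write, which closes the window in which two goroutines could both decide to fetch the same URL.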
answered Apr 7, 2016 at 17:25

UPDATE: I have refactored some code and incorporated the suggestion above.

I changed:

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
	var str string
	var strArr []string
	var err error
	if fetched[url] {
		// already fetched?
		str, strArr, err = "", nil, fmt.Errorf("already fetched: %s this will be ignored", url)
	} else if res, ok := f[url]; ok {
		str, strArr, err = res.body, res.urls, nil
		urlChan <- url
	} else {
		str, strArr, err = "", nil, fmt.Errorf("not found: %s", url)
	}
	return str, strArr, err
}

To (the already-fetched check moves out of Fetch and into Crawl, where it can run under the lock):

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
	var str string
	var strArr []string
	var err error
	if res, ok := f[url]; ok {
		str, strArr, err = res.body, res.urls, nil
		urlChan <- url
	} else {
		str, strArr, err = "", nil, fmt.Errorf("not found: %s", url)
	}
	return str, strArr, err
}

And:

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher, goRoutNum int) {
	if depth <= 0 {
		return
	}
	// Start fetching url concurrently
	fmt.Println("Goroutine " + strconv.Itoa(goRoutNum) + " is fetching: " + url)
	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Lock map
	lock.Lock()
	fetched[url] = true
	// Unlock
	lock.Unlock()
	fmt.Printf("found: %s %q\n", url, body)
	for i, u := range urls {
		go func(url string, goRoutNumber int) {
			Crawl(url, depth-1, fetcher, goRoutNumber)
		}(u, i+1)
	}
	return
}

To (the URL is now checked and marked as visited under a single lock before fetching):

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher, goRoutNum int) {
	if depth <= 0 {
		return
	}
	var body string
	var urls []string
	var err error
	// Check and mark the URL under a single lock
	safeMap.Lock()
	if safeMap.URLs[url] { // already fetched?
		body, urls, err =
			"", nil, fmt.Errorf("already fetched: %s this will be ignored", url)
		fmt.Println(err)
		safeMap.Unlock()
		return
	}
	safeMap.URLs[url] = true
	safeMap.Unlock()
	// Start fetching url concurrently
	fmt.Println("Goroutine " + strconv.Itoa(goRoutNum) + " is fetching: " + url)
	body, urls, err = fetcher.Fetch(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Mark the URL as fetched
	safeMap.Lock()
	safeMap.URLs[url] = true
	safeMap.Unlock()
	fmt.Printf("found: %s %q\n", url, body)
	for i, u := range urls {
		go func(url string, goRoutNumber int) {
			Crawl(url, depth-1, fetcher, goRoutNumber)
		}(u, i+1)
	}
	return
}

The whole updated code:

package main

import (
	"fmt"
	"strconv"
	"sync"
	"time"
)

/*
 * Data and Types
 * ===================================================================================
 */

type SafeMap struct {
	sync.Mutex           // locks write access to the URLs map
	URLs map[string]bool // Map of fetched URLs -> true: fetched
}

var urlChan chan string // Channel on which fetched URLs are reported
var safeMap SafeMap

type Fetcher interface {
	// Fetch returns the body of URL and
	// a slice of URLs found on that page.
	Fetch(url string) (body string, urls []string, err error)
}

// fakeFetcher is a Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
	body string
	urls []string
}

// fetcher is a populated fakeFetcher.
var fetcher = fakeFetcher{
	"http://golang.org/": &fakeResult{
		"The Go Programming Language",
		[]string{
			"http://golang.org/pkg/",
			"http://golang.org/cmd/",
		},
	},
	"http://golang.org/pkg/": &fakeResult{
		"Packages",
		[]string{
			"http://golang.org/",
			"http://golang.org/cmd/",
			"http://golang.org/pkg/fmt/",
			"http://golang.org/pkg/os/",
		},
	},
	"http://golang.org/pkg/fmt/": &fakeResult{
		"Package fmt",
		[]string{
			"http://golang.org/",
			"http://golang.org/pkg/",
		},
	},
	"http://golang.org/pkg/os/": &fakeResult{
		"Package os",
		[]string{
			"http://golang.org/",
			"http://golang.org/pkg/",
		},
	},
}

/*
 * End Data and Types
 * ===================================================================================
 */

/*
 * Webcrawler implementation
 * ===================================================================================
 */

// waitUntilDone blocks until d URLs have been received on urlChan,
// then prints a timestamped list of the fetched URLs.
func waitUntilDone(d int) {
	fMap := make(map[string]string)
	for i := 0; i < d; i++ {
		fMap[<-urlChan] = strconv.Itoa(time.Now().Nanosecond())
	}
	time.Sleep(time.Millisecond * 100)
	fmt.Println()
	fmt.Println("Fetch stats")
	fmt.Println("==================================================================")
	for k, v := range fMap {
		fmt.Println("Fetched: " + k + " after: " + v + " ns")
	}
	fmt.Println("==================================================================")
	fmt.Println()
}

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
	var str string
	var strArr []string
	var err error
	if res, ok := f[url]; ok {
		str, strArr, err = res.body, res.urls, nil
		urlChan <- url
	} else {
		str, strArr, err = "", nil, fmt.Errorf("not found: %s", url)
	}
	return str, strArr, err
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher, goRoutNum int) {
	if depth <= 0 {
		return
	}
	var body string
	var urls []string
	var err error
	// Check and mark the URL under a single lock
	safeMap.Lock()
	if safeMap.URLs[url] { // already fetched?
		body, urls, err =
			"", nil, fmt.Errorf("already fetched: %s this will be ignored", url)
		fmt.Println(err)
		safeMap.Unlock()
		return
	}
	safeMap.URLs[url] = true
	safeMap.Unlock()
	// Start fetching url concurrently
	fmt.Println("Goroutine " + strconv.Itoa(goRoutNum) + " is fetching: " + url)
	body, urls, err = fetcher.Fetch(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Mark the URL as fetched
	safeMap.Lock()
	safeMap.URLs[url] = true
	safeMap.Unlock()
	fmt.Printf("found: %s %q\n", url, body)
	for i, u := range urls {
		go func(url string, goRoutNumber int) {
			Crawl(url, depth-1, fetcher, goRoutNumber)
		}(u, i+1)
	}
	return
}

func StartCrawling(url string, depth int, fetcher Fetcher) {
	fmt.Println()
	fmt.Println("Start crawling ...")
	fmt.Println("==================================================================")
	go func(u string, i int, f Fetcher) {
		Crawl(u, i, f, 0)
	}(url, depth, fetcher)
}

/*
 * End Webcrawler implementation
 * ===================================================================================
 */

/*
 * Main
 * ====================================================================
 */

func main() {
	depth := len(fetcher)
	safeMap = SafeMap{URLs: map[string]bool{}}
	url := "http://golang.org/"
	urlChan = make(chan string, len(fetcher))
	go StartCrawling(url, depth, fetcher)
	waitUntilDone(depth)
}

/*
 * End Main
 * =====================================================================
 */

Playground: https://play.golang.org/p/2dPgUKshm6
Exercise: https://tour.golang.org/concurrency/10

answered Apr 8, 2016 at 9:18

It seems odd to me to use an algorithm that relies on knowing exactly how many URLs will need to be fetched.

You use len(fetcher) to determine your starting depth as well as the buffer size for the urlChan. So basically you're using what you know about the mock data to write code that would need to be deployed on the unknowable internet.
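
A shape that does not need the URL count up front is to track outstanding goroutines with a sync.WaitGroup. Here is a minimal, self-contained sketch of that idea (the pages map and the depth of 4 are illustrative stand-ins for the exercise's fake fetcher; this is not the solution linked below):

package main

import (
	"fmt"
	"sync"
)

// pages is a tiny stand-in for the canned fakeFetcher from the question.
var pages = map[string][]string{
	"http://golang.org/":     {"http://golang.org/pkg/", "http://golang.org/cmd/"},
	"http://golang.org/pkg/": {"http://golang.org/"},
}

// visited guards the set of URLs already claimed by some goroutine.
var visited = struct {
	sync.Mutex
	urls map[string]bool
}{urls: make(map[string]bool)}

var wg sync.WaitGroup

func crawl(url string, depth int) {
	defer wg.Done() // signal completion even on early return
	if depth <= 0 {
		return
	}
	// Check and mark under one lock so two goroutines can't both claim the URL.
	visited.Lock()
	seen := visited.urls[url]
	visited.urls[url] = true
	visited.Unlock()
	if seen {
		return
	}
	urls, ok := pages[url]
	if !ok {
		fmt.Println("not found:", url)
		return
	}
	fmt.Println("found:", url)
	for _, u := range urls {
		wg.Add(1) // register the child before starting it
		go crawl(u, depth-1)
	}
}

func main() {
	wg.Add(1)
	go crawl("http://golang.org/", 4)
	wg.Wait() // blocks until every goroutine has called Done
}

wg.Wait() returns only after every spawned goroutine has finished, so the program exits exactly when the crawl does, however many URLs that turns out to be, and the channel buffer sized to len(fetcher) disappears entirely.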

Thanks for sharing your solution. If you have any feedback about mine (written before I saw this), I'd be happy to hear it: My solution

answered Jan 7, 2021 at 3:23
