Joe Kraus points out that the costs of doing a tech startup have dropped in recent years. Entrepreneurs are often able to build a product that would have required $3M of investment a few years ago using just $100k of seed capital. Jeff Clavier adds some additional thoughts.
A couple months ago, Paul Graham wrote an interesting essay, "Hiring is obsolete", about small, self-funded startups. It is also well worth reading.
Update: Don't miss Mark Fletcher's post on how he funded and built Bloglines.
Thursday, June 30, 2005
Challenges at MSNBC
Mark Glaser at OJR interviews MSNBC GM Charlie Tillinghast. Some excerpts on the "challenges" at MSNBC:
Microsoft and NBC Universal have been trying to get out of their TV joint venture for more than a year; the MSN portal's traffic growth and vision have lapsed; and four key people -- including the president and editor-in-chief of MSNBC.com -- have exited.
Still, the site has a lot to smile about ... Despite traffic falling off at MSN.com over the past year, MSNBC.com has boosted its traffic by 12 percent to lead CNN.com for the past three months. And MSNBC.com's new redesign adds a unique recommendation engine that highlights similar stories depending on what articles you've viewed before.
"There's no doubt they're still in the game despite Yahoo's advances," said former founding editor of MSNBC.com Merrill Brown.
And, some quotes from Charlie on MSNBC's experiments with personalized news:
You'll see a box in the middle that has either Editor's Choice or Recommended Stories. After you click on seven or so stories, that will switch over to Recommended Stories. The Recommended Stories will be based on what you've been looking at ... This is an effort-free use of personalization.
If you click on "What's This" link next to the Recommended Stories on the front page, this page will tell you what was recommended and why they're recommended.
It's not that different from the experience on commerce sites now, like Amazon, where you're shopping for products and they say 'Here are recommended products for you,' or 'People who bought this book also bought these books.' The concept isn't new, but it's new to news.
As MSN experiments with personalized news, it will be interesting to see if Yahoo News and Google News follow suit.
See also my previous post, "MSNBC Recommended Stories".
Wednesday, June 29, 2005
A9 Maps
The announcements are coming fast and furious these days. A9.com just launched A9.com Maps.
I am a little embarrassed to admit that I started looking at this with a sigh. Another map product, I thought? Could it really be that interesting?
But it is very much worth checking out. The folks at A9 did some amazing work integrating their local search photos into their maps.
When you bring up an address with A9 Maps, check the little box at the top that says "Mark streets with block view". A little blue line will be added to the map that shows the streets where A9 drove down the street wildly taking pictures out of both sides of the truck. Click somewhere on the blue line. The pictures will be shown in a nifty AJAX interface that allows you to virtually "drive" down the street, seeing the homes and storefronts on both sides.
It's a neat experience and a nice interface. Certainly a lot of fun.
Whether it is practical or not, I'm not so sure. I could imagine someone who is buying a house might use it to view the surrounding neighborhood, but I was disappointed to find that A9 Maps had no coverage of the neighborhoods around the two examples I tried. I thought maybe it'd help me visualize how to get to a building in downtown Seattle that I need to get to tomorrow, but I found the images didn't give me enough of a view of the large downtown buildings to be helpful. Until the coverage improves, I fear this may be more of a toy than a tool.
Nevertheless, you have to hand it to A9. At a time when I thought Google Maps was far ahead, blazing the trail on innovation, A9 steps in and shows another path. Very clever work, folks. Congrats to the A9 team.
[spotted via O'Reilly Radar]
An API to Google Maps
Google just announced a new API to Google Maps that, with just a few lines of JavaScript, allows web developers to embed maps on any website.
It even allows markers to be added to the maps, making it much easier to build mashups like housingmaps.com. HousingMaps shows Craigslist housing listings on top of Google Maps. If you haven't seen it yet, definitely check it out.
Very cool of Google to work to promote further innovation on top of Google Maps.
Update: In response, Yahoo launched the Yahoo Maps API today. At first glance, there appear to be some differences. For example, Yahoo's API issues XML, while Google's is all JavaScript. Yahoo doesn't allow commercial use of their API, while Google's terms contain no such restrictions. Well, anyway, it's good to have options. More APIs can only be good news for developers.
Tuesday, June 28, 2005
Yahoo gets social with MyWeb
A big day of announcements for Google, and Yahoo appears to have some announcements of their own.
Yahoo just announced the launch of MyWeb 2.0, "search with a little help from your friends." The idea appears to be that you tag a bunch of web pages and search results, share them with all your friends, and everyone in your social network gets better results.
A lovely idea in theory, but I think it has some problems.
First, this is a hell of a lot of work. Not only do I have to list my entire social network at Yahoo, but also I have to manually tag vast numbers of web pages. Who has that kind of time? The benefits would need to be absolutely extraordinary to convince people to devote this much effort to seeing improved search results.
Second, as Chris Anderson said, social networks don't work well for personalization because "the assumption that there's a correlation between the people I like and the products I like is a flawed one." Personalized search should find like-minded people from the entire community who can help you find what you need.
Third, as John Dvorak said, any mainstream tagging system is "doomed to failure" because it will succumb to "vandalism and spam." I've already seen people talking about manipulating del.icio.us to drive traffic to their site. This problem will become much worse if tagging systems become popular.
I admire Yahoo's innovative work and their attempt to build on the early success of services like del.icio.us, but I think this one is going to be a hard slog for them. Yahoo MyWeb 2.0 might win some converts in the early adopter crowd, but it isn't a system built for the mainstream. Grandma won't be coming to Yahoo MyWeb.
See also reviews from Chris Sherman and Michael Bazeley.
Update: Good comments on MyWeb 2.0 from John Battelle.
Update: Danny Sullivan follows up with two long articles, "Yahoo My Web: An eBay For Knowledge" and "Yahoo My Web Tagging & Why (So Far) It Sucks".
Update: Flickr co-founder Caterina Fake rebuts my post on the Yahoo MyWeb 2.0 Blog. Caterina certainly knows what she's talking about, so make sure to give her post a read.
Update: Five months later, I ask, "How is Yahoo My Web 2.0 doing?"
More on Google personalized search
Sep Kamvar from the Personalized Search Team at Google has the post on the Google Blog about the launch of Google's personalized search.
Sep Kamvar has posted a list of his publications, including a short paper with an overview of some techniques for personalized search called "An Analytical Comparison of Approaches to Personalizing PageRank". Note that one of the co-authors of this paper is Glen Jeh. Glen Jeh was lead author on the "Scaling Personalized Web Search" paper that describes the technology behind Kaltix, the personalized search company acquired by Google about two years ago. It seems all roads lead back to Kaltix.
After seeing Sep's post, I have to say that I am in awe of the boldness of this rollout. Google isn't just sticking their personalized search in a corner of Google Labs. No, as long as you enable Google's My Search History feature (which is off by default), every search you do at google.com is personalized. Wow, very cool, and surprisingly aggressive.
As for the feature itself, it's a little hard to tell. First of all, not all searches are personalized. I tried a search for "news", which was not personalized. I tried another search for "personalized search", and that was personalized. How could I tell? The only indication is a subtle link in the upper right corner of the page that says, "Turn off Personalized Search for these results". Clicking that link yields a page with what appears to be the same search results in a very slightly different order, with items moved up or down by just one or two positions.
It's fine to be subtle, but this might be a bit too subtle. Unlike Findory, there is no indication of which search results are personalized. Unlike Findory, there is no explanation of why a search result was reranked. It seems confusing. It basically says, "Don't look at what we're doing. Just trust us. We'll make your search better. Don't worry your pretty little head about it."
The problem with this is that, as good as personalization is, it isn't perfect. When you make a mistake -- and you will make mistakes -- it's important to explain to users why you did what you did and give them a way to fix it. Both Amazon and Findory explain why they made a recommendation and give users an opportunity to change the personalization.
It is possible that Google doesn't explain their recommendations because they can't. Only some techniques for personalization are able to easily provide clear explanations. If Google is using the subject-based Kaltix techniques, for example, it would be difficult to provide explanations, since they would be using completely different relevance ranks from the generic search.
Regardless, I am amazed and impressed by Google's aggressive move into personalized search. Personalized search is the future, and Google just took one giant leap forward. Yahoo, MSN, Ask, and AOL suddenly look to be far, far behind.
See also my post from earlier today, "A real personalized search from Google".
A real personalized search from Google
Danny Sullivan reports that Google is about to launch a real personalized search, web search that shows different search results to different people based on their past behavior.
[The new] Google Personalized Search uses My Search History data to refine your results based on your searching habits.
Danny also quotes from a page at Google that apparently says:
Personalized Search is an improvement to Google search that orders your search results based on what you've searched for before. Learning from your history of searches and search results you've clicked on, Personalized Search brings certain results closer to the top when it's clear they're most relevant to you.
It is excellent to see Google doing a real personalized search. Until now, tiny little Findory has been the only commercial search engine doing real web personalized search, changing search results based on your past behavior.
I suspect Google's personalized search is layered on top of their old personalized search which itself was layered on top of technology from Kaltix. So, if I were to guess, it probably works by building a high-level subject profile of your interests (e.g. sports, computers) from your history and biasing the search results toward those interests. That would be similar to the old personalized search where you had to explicitly specify that profile, but now the profile is generated implicitly using your search history.
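To make that guess concrete, here is a minimal sketch of biasing results toward a subject profile built from history. Everything in it is an assumption for illustration: the function names, the topic labels, the example URLs, and the simple additive boost are made up, and none of it is known to match what Google actually does.

```python
from collections import Counter

def build_profile(history_topics):
    """Turn a list of topics from past searches into interest weights."""
    counts = Counter(history_topics)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}

def personalize(results, profile, boost=0.5):
    """Re-sort results by base score plus a boost for favored topics.

    results: list of (url, base_score, topic) tuples (hypothetical format).
    """
    def adjusted(result):
        _, base_score, topic = result
        return base_score + boost * profile.get(topic, 0.0)
    return sorted(results, key=adjusted, reverse=True)

# Hypothetical history: mostly sports searches, some computer searches.
profile = build_profile(["sports", "sports", "computers"])
results = [
    ("jaguar-cars.example", 1.0, "autos"),
    ("jaguars-football.example", 0.9, "sports"),
]
reranked = personalize(results, profile)
```

With this made-up history, the sports page moves ahead of the car page even though its generic score is slightly lower. A coarse profile like this can only shift results by broad subject, which is the limitation the fine-grained approach tries to avoid.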
In contrast, Findory's personalized web search is fine-grained, using information about individual pages you have viewed instead of high level subject interests. Our approach should allow the personalization to focus in on much more detailed interests and make more useful and relevant adjustments to the search results.
I think we will see the other search engines also move toward personalized web search. Many of the search engines -- Google, A9, Ask, Yahoo -- have a search history feature that helps you keep track of the searches you've made and search results you've seen. Personalized search is a natural extension of search history. As I said when Google Search History launched:
Keeping search and clickthrough history is a first step toward personalized search. The next big step is to use this data to reorder search results, making the results more relevant to your particular interests and needs.
Danny Sullivan has long predicted this move toward personalized search:
This is where search is eventually headed. Everything will be personalized to make you feel like you have a more personal relationship with the Web site.
And I have as well:
Personalized search is inevitable. With only one general relevance rank, it is increasingly difficult to improve search quality because not everyone agrees on how relevant a particular page is to a particular search. At some point, to get further improvements, relevance rank will have to be customized to each person's definition of relevance. When that happens, you have personalized search.
See also quotes from Google Director Marissa Mayer and Google CEO Eric Schmidt on personalized search at Google.
See also my previous posts, "Why do personalized search?" and "A real personalized search from Google?".
Get your free Google Earth
Nathan Weinberg reports that Google Earth has launched and is free.
The program lets you do smooth sailing flybyes of the entire Earth. You can easily fly to any spot on the globe, by entering any associated data, like street addresses, place names or lat/long coordinates.
Nathan posted some amazing screenshots of the 3D buildings, fly through directions, and tilted satellite image views.
It will be interesting to see how much of this gets integrated into Google Maps.
Update: Bill Kilday has the post about Google Earth on the official Google weblog.
Update: I have installed it and... wow. Wow, wow, wow. This is like Google Maps on steroids. The flybys, rotations, and tilts combined create a jaw-dropping experience. Go download it now. It really is remarkable.
Update: Rob Pegoraro at the Washington Post has a detailed review of Google Earth. [Found on Findory]
Monday, June 27, 2005
Ads should not be annoying
Gregory Lamb at the Christian Science Monitor writes about Google, "the world's most intriguing company." A light article, but interesting here and there.
Here's a good excerpt about targeted advertising:
Google believes it can target ads so specifically to each user that the ads will be seen as valuable content, not annoyances. After all, people like to read catalogs -- collections of ads -- points out Google cofounder Sergey Brin. The more knowledge Google has about each user, the more it can make the online experience convenient and productive.
See also my previous post, "Make advertising useful".
[CSM article via Brad Hill]
Saturday, June 25, 2005
MSNBC Recommended Stories
MSNBC just launched "Recommended Stories", a personalized list of stories selected using your reading history on the MSNBC site. From their about page:
After you read seven or more stories, the ["Editor's Choices"] box automatically changes to "Recommended Stories" and contains headlines selected according to the types of stories you read most often.
On a help page on the site, MSNBC explains that they do the recommendations using text analysis.
The feature sounds somewhat similar to what was launched in MSN Newsbot about a year ago.
[via CyberJournalist]
MSN Search advanced queries
Erik Selberg (creator of Metacrawler, one of the first metasearch engines) posts on the MSN Search blog about the new advanced operators (e.g. InURL, filetype, etc.) for MSN Search.
Thursday, June 23, 2005
Yahoo testing ads targeted to behavior
Brian Morrissey at AdWeek reports that:
Yahoo has begun testing a program to show text listings on Web pages based on user behavior ... in a pilot program with Revenue Science.
Omar Tawakol, Revenue Science's SVP of Marketing, said prior site behavior often yields better results than page content ... "[Many] sites are better served by focusing on the user, not what's on the page."
I don't quite agree with Omar on this one. I think sites, especially heavily dynamic sites, are best served by focusing on both the user and the content. Both have a story to tell. Both are useful when you are trying to find relevant, interesting advertising for the reader.
If you're curious about some of the details, Revenue Science has an overview of their technology on their website.
By the way, a month ago, Findory launched our personalized advertising. It is relevant, useful, targeted advertising selected by paying attention to what you have read at Findory. Unlike other efforts out there, our advertising is fine-grained and completely automated. It surfaces ads from a large pool of advertising content that are most likely to be of interest to the reader.
[via PaidContent]
Recommending websites
Jon Udell says he is prototyping a simple website recommendation system built on top of del.icio.us.
If you want to play around with some website recommender systems, Spurl and StumbleUpon are both fun. A9 also tries to do website recommendations in their "Discover" feature.
Wednesday, June 22, 2005
Louis Monier going to Google?
Louis Monier -- founder and former CTO of AltaVista, current Director of Advanced Technologies at eBay, and all around search guru -- is rumored to be leaving eBay and going to Google.
For a peek at Louis Monier's philosophy toward search, see what he said at last year's Web 2.0 conference:
Under one percent of the public use any of the advanced features that many search engines offer. Louis Monier, director of eBay's Advanced Technology Group, said that enhancements to search cannot depend on training users to do more. Instead, he suggested, the metaphor is that you bring them the dish that they want but you also bring other dishes that they may be interested in.
Search should be simple. Search should be easy. Search should be helpful.
[via Matt Marshall and Danny Sullivan]
Update: John Battelle interviews Louis, confirms he is going to Google, and has some great details on why. Some excerpts:
I'm very tempted to play with radically new stuff: satellites images, machine translation, ways to extract knowledge from giant bodies of data ... who knows what else? And frankly, I'm dying to peek under the hood and see the infrastructure [Google has] created. For someone like me, it's the ultimate Christmas toy.
I find the most interesting problem in search is to think of it as a dialog rather than a one-shot thing.
I'm fascinated by the many ways we can extract real knowledge from billions of tidbits, whether they'd be Web pages, queries, links, reviews, social networks... We have a few tools today, mostly statistics to isolate repeating data from the noise, but I think we will eventually go much further. What we need are generic pattern recognition engines.
The potential of the data, creating knowledge from noise, that is what excites Louis about Google.
Tuesday, June 21, 2005
MSN Search and Learning to Rank
In a post on MSN Search's Weblog, Ken Moss (GM, MSN Search) says:
While applying machine learning techniques to relevance rank for web search is common, using neural networks is not. I am surprised to see neural networks used as part of the relevance rank in a system of this size and scope.
On an interesting side note, one of the co-authors on the paper, Tal Shaked, is now at Google. Tal appears to have been a PhD student of Oren Etzioni.
Update: A couple people ([1] [2]) have asked for someone to take this rather cryptic paper and translate it into something resembling English. I'm a geek, so I speak Geekish, but I'll do my best to translate this into English.
At a high level, the idea is to learn what search result documents are relevant to specific search queries. We're doing this so we can reorder the search results and put the most interesting ones up at the top.
This paper is using a neural network for the learning. Neural networks are pretty simple, no magic here. We take a bunch of data, "propagate" it through the network (basically, take a bunch of weighted sums of the inputs and munch them together), and get values out of the network.
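For the curious, "propagating" data through a network can be sketched in a few lines. This is a toy under stated assumptions: the two-layer shape, the tanh squashing function, and every weight below are made up for illustration and are not taken from the paper.

```python
import math

def forward(features, hidden_weights, output_weights):
    """Propagate one input vector through a tiny two-layer network.

    Each hidden unit takes a weighted sum of the inputs and squashes it
    with tanh; the output is a weighted sum of the hidden units.
    All weights here are illustrative, not taken from the paper.
    """
    hidden = [
        math.tanh(sum(w * x for w, x in zip(unit_weights, features)))
        for unit_weights in hidden_weights
    ]
    return sum(w * h for w, h in zip(output_weights, hidden))

# Hypothetical example: 3 input features, 2 hidden units, 1 output score.
hidden_weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
output_weights = [1.0, -1.0]
score = forward([1.0, 0.0, 2.0], hidden_weights, output_weights)
```

The single number that comes out is what would be used as the relevance score for a document. All of the interesting behavior lives in the weights, which is why training matters so much.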
This is supervised learning, so we start with a bunch of data that says things like, "For a search for 'personalized news', the most relevant search result is Findory.com." We then take that and a bunch of other data, run it through our network, and try to teach our system to do the right thing, to always say "Findory.com" when we ask what the most relevant search result is for a query for "personalized news".
The tricky part of this is "training" the network, which means learning what all the weights should be on all the links in the network. The cryptic parts of the paper (sections 3-5) are discussing the details of how they train the network. It's not really important for our purposes. What we care about is whether they were successful in finding weights that allow them to predict how relevant a document is to a given search query.
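To give a flavor of what "training" means, here is a minimal sketch of gradient descent on pairs of documents, where the goal is for the document labeled more relevant to score higher than the other. It assumes a plain linear scorer in place of the neural network to stay short, and the features, learning rate, and training pairs are all hypothetical.

```python
import math

def train_pairwise(pairs, num_features, learning_rate=0.1, epochs=200):
    """Learn weights so the preferred document in each pair scores higher.

    pairs: list of (better_features, worse_features) tuples.
    Uses a linear scorer instead of a neural net to keep the sketch short.
    """
    weights = [0.0] * num_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - w for b, w in zip(better, worse)]
            margin = sum(wt * d for wt, d in zip(weights, diff))
            # Modeled probability that `better` outranks `worse`.
            prob = 1.0 / (1.0 + math.exp(-margin))
            # Gradient descent step on the cost -log(prob).
            step = learning_rate * (1.0 - prob)
            weights = [wt + step * d for wt, d in zip(weights, diff)]
    return weights

def score(weights, features):
    """Relevance score of one document under the learned weights."""
    return sum(w * x for w, x in zip(weights, features))

# Hypothetical training pairs over two made-up features.
pairs = [([1.0, 0.2], [0.1, 0.9]), ([0.8, 0.1], [0.3, 0.7])]
weights = train_pairwise(pairs, num_features=2)
```

Each step nudges the weights in the direction that makes the preferred document win by more, with smaller nudges once the model already gets the pair right.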
Some of the details of what they did are pretty interesting. For example, in Section 6.1, they say they "use 569 features" of documents as part of the input to their network. What that means is that they summarize each document by a list of 569 generalized properties of the document and then predict the relevance of the document using those properties. At least one of Chris Burges' other papers is on dimensionality reduction, so I assume these 569 features are automatically summarized from the documents using a preprocessing step, not simple features like the size of the document.
In plain English, they're not trying to predict how relevant each individual document is to each search query. They're trying to predict how certain features of documents determine the relevance of those documents to various search queries.
In Section 6.1, they describe their training set, just 17k queries with 1..5 labels for the relevance of just some of the documents. That's not a lot, especially because they say (in Section 6.2) that "only approximately 1%" of the documents are labeled. However, in a production system, you might be able to supplement this data using data from the logs. For example, if a lot of people click on "Geeking with Greg" on a search for "geeking", that document probably is relevant.
Also in Section 6.1, they say, "We chose to compute the NDCG at rank 15, a little beyond the set of documents initially viewed by most users." I think what this means is that the system implemented for this paper is a post-processing step over the normal search results. So, if you do a query at search.msn.com for "geeking", you get back 32981 results. Before you get to see the first page of results, this system would look at the first 15 of them and rerank them, possibly moving a more relevant result up to the #1 slot. Looking at only the first 15 results helps explain how the system is scalable, but it does limit the power of the system to surface relevant documents. However, it might be possible to integrate this neural network into the build of the search indexes, allowing it to look at many more documents.
In Section 6.2, they have a couple tables showing the predictive accuracy of the system, which appears to be under 50%. That accuracy seems pretty mediocre to me, but it's hard to tell without understanding the accuracy of other machine learning approaches. It is unfortunate that the paper doesn't spend more time on this. Offhand, it is not clear to me that a neural network is the best tool for this task, and the paper does little to address that question.
In collaboration with Chris Burges and other friends from Microsoft Research, we now have a brand new ranker. The new ranker has improved our relevance and perhaps most importantly gives us a platform we think we can move forward on quicker than before. This new ranker also is based on technology with an awesome name -- it's a "Neural Net."
Microsoft Researcher Chris Burges was the lead author on a 2005 paper called "Learning to Rank using Gradient Descent" that does appear to try to use neural networks for relevance rank.
While applying machine learning techniques to relevance rank for web search is common, using neural networks is not. I am surprised to see neural networks used as part of the relevance rank in a system of this size and scope.
On an interesting side note, one of the co-authors on the paper, Tal Shaked, is now at Google. Tal appears to have been a PhD student of Oren Etzioni.
Update: A couple people ([1] [2]) have asked for someone to take this rather cryptic paper and translate it into something resembling English. I'm a geek, so I speak Geekish, but I'll do my best to translate this into English.
At a high level, the idea is to learn what search result documents are relevant to specific search queries. We're doing this so we can reorder the search results and put the most interesting ones up at the top.
This paper is using a neural network for the learning. Neural networks are pretty simple, no magic here. We take a bunch of data, "propagate" it through the network (basically, take a bunch of weighted sums of the inputs and munch them together), and get values out of the network.
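To make "weighted sums munched together" concrete, here is a minimal sketch of propagating one input through a tiny network. The layer sizes, the weights, and the tanh squashing function are all made up for illustration; this is not the network from the paper.

```python
import math

def forward(features, hidden_weights, output_weights):
    """Propagate one input vector through a tiny one-hidden-layer network."""
    hidden = []
    for weights in hidden_weights:
        # Each hidden unit takes a weighted sum of the inputs...
        total = sum(w * x for w, x in zip(weights, features))
        # ...and squashes it (tanh is an assumed choice here).
        hidden.append(math.tanh(total))
    # The output is one more weighted sum, this time over the hidden units.
    return sum(w * h for w, h in zip(output_weights, hidden))

# Hypothetical weights and a two-feature input, just to show the mechanics.
score = forward([1.0, 0.5], [[0.2, -0.4], [0.7, 0.1]], [1.0, -1.0])
print(score)
```

The only thing learning changes is the numbers in hidden_weights and output_weights; the propagation step itself stays this simple.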
This is supervised learning, so we start with a bunch of data that says things like, "For a search for 'personalized news', the most relevant search result is Findory.com." We then take that and a bunch of other data, run it through our network, and try to teach our system to do the right thing, to always say "Findory.com" when we ask what the most relevant search result is for a query for "personalized news".
The tricky part of this is "training" the network, which means learning what all the weights should be on all the links in the network. The cryptic parts of the paper (sections 3-5) are discussing the details of how they train the network. It's not really important for our purposes. What we care about is whether they were successful in finding weights that allow them to predict how relevant a document is to a given search query.
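For a feel of what "learning the weights" looks like, here is a toy sketch of training by gradient descent on pairs of documents, where one document in each pair is labeled as more relevant than the other. It uses a plain linear scorer and a logistic loss on the score difference. That is an assumption made to keep the example short, not the paper's actual network or cost function, and the function name and data are hypothetical.

```python
import math

def train_pairwise(pairs, num_features, learning_rate=0.1, epochs=200):
    """Learn weights so the better document in each pair scores higher.

    pairs is a list of (better_features, worse_features) tuples.
    """
    weights = [0.0] * num_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - w for b, w in zip(better, worse)]
            # margin = score(better) - score(worse) for a linear scorer
            margin = sum(wt * d for wt, d in zip(weights, diff))
            # loss = log(1 + exp(-margin)); its derivative in margin is -1 / (1 + exp(margin))
            grad_scale = -1.0 / (1.0 + math.exp(margin))
            # Step each weight against the gradient.
            weights = [wt - learning_rate * grad_scale * d
                       for wt, d in zip(weights, diff)]
    return weights

def linear_score(weights, features):
    return sum(w * x for w, x in zip(weights, features))

# Made-up training data: the first feature signals relevance, the second does not.
pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.9, 0.2], [0.1, 0.8])]
weights = train_pairwise(pairs, num_features=2)
print(weights)
```

After training, the better document in each pair gets the higher score. A real neural network does the same kind of nudging, just with the gradient pushed back through every layer.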
Some of the details of what they did are pretty interesting. For example, in Section 6.1, they say they "use 569 features" of documents as part of the input to their network. What that means is that they summarize each document by a list of 569 generalized properties of the document and then predict the relevance of the document using those properties. At least one of Chris Burges' other papers is on dimensionality reduction, so I assume these 569 features are automatically summarized from the documents using a preprocessing step, not simple features like the size of the document.
In plain English, they're not trying to predict how relevant each individual document is to each search query. They're trying to predict how certain features of documents determine the relevance of those documents to various search queries.
In Section 6.1, they describe their training set, just 17k queries with 1..5 labels for the relevance of just some of the documents. That's not a lot, especially because they say (in Section 6.2) that "only approximately 1%" of the documents are labeled. However, in a production system, you might be able to supplement this data using data from the logs. For example, if a lot of people click on "Geeking with Greg" on a search for "geeking", that document probably is relevant.
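As a rough sketch of the log idea, here is one way click data could be turned into weak relevance labels. The log format, the min_clicks threshold, and the use of click share as the label are all my assumptions. Clicks are a noisy signal (people click the top result partly because it is on top), so treat this as an illustration rather than a recipe.

```python
from collections import Counter

def clicks_to_labels(click_log, min_clicks=2):
    """Turn (query, document) click events into weak relevance labels.

    The label is the share of a query's clicks that went to the document.
    Documents with fewer than min_clicks clicks are left unlabeled.
    """
    counts = Counter(click_log)  # (query, document) -> number of clicks
    totals = Counter()
    for (query, _document), n in counts.items():
        totals[query] += n
    labels = {}
    for (query, document), n in counts.items():
        if n >= min_clicks:
            labels[(query, document)] = n / totals[query]
    return labels

# Hypothetical log: three clicks on one result, one click on another.
log = [("geeking", "Geeking with Greg")] * 3 + [("geeking", "some other page")]
labels = clicks_to_labels(log)
print(labels)
```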
Also in Section 6.1, they say, "We chose to compute the NDCG at rank 15, a little beyond the set of documents initially viewed by most users." I think what this means is that the system implemented for this paper is a post-processing step over the normal search results. So, if you do a query at search.msn.com for "geeking", you get back 32981 results. Before you get to see the first page of results, this system would look at the first 15 of them and rerank them, possibly moving a more relevant result up to the #1 slot. Looking at only the first 15 results helps explain how the system is scalable, but it does limit the power of the system to surface relevant documents. However, it might be possible to integrate this neural network into the build of the search indexes, allowing it to look at many more documents.
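Here is a small sketch of both pieces: computing NDCG over the top 15 results, and reranking only the top 15 while leaving the rest untouched. The gain and discount below are one common definition of NDCG, and the exact form the paper uses should be checked against it. The rerank_top helper is my guess at what a post-processing step might look like, not something the paper describes.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: relevant results count for more near the top."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=15):
    """DCG divided by the DCG of the best possible ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def rerank_top(results, score, k=15):
    """Reorder only the first k results by score; keep the tail as it was."""
    head = sorted(results[:k], key=score, reverse=True)
    return head + results[k:]

# Relevance labels listed in the order the results were shown.
print(ndcg_at_k([3, 2, 1]))  # already in the best order
print(ndcg_at_k([1, 2, 3]))  # best result buried in third place

# Hypothetical scores standing in for the output of a trained network.
scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 1.0}
print(rerank_top(["a", "b", "c", "d"], scores.get, k=3))
```

Note that "d" has the highest score in the last example but stays in fourth place because it sits outside the reranked window. That is the limitation mentioned above: a reranker that only sees the top k cannot surface anything below it.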
In Section 6.2, they have a couple tables showing the predictive accuracy of the system, which appears to be under 50%. That accuracy seems pretty mediocre to me, but it's hard to tell without understanding the accuracy of other machine learning approaches. It is unfortunate that the paper doesn't spend more time on this. Offhand, it is not clear to me that a neural network is the best tool for this task, and the paper does little to address that question.
My AOL and MSN's start.com converge
It's interesting to compare the screenshot of what the new My AOL will look like with MSN's (IE only) start.com/3 prototype.
Both are web-based feed readers and appear to be remarkably similar in UI and apparent functionality. Both appear to focus on useful defaults to provide a good experience for people who don't bother doing any customization.
Update: More details on AOL's plans at BetaNews and Slashdot.
Monday, June 20, 2005
MSN Local in beta
Gary Price reports that MSN just launched MSN Local.
Not as feature-rich as offerings from Yahoo and Google, but I was surprised to see that the maps included the nifty interactive click-and-drag feature that Google Maps has. For example, try an MSN Local search for "Victrola Seattle, WA" and then click and drag on the map.
Still missing are the very cool detailed information and reviews that you can find for the same search on Google Local or Yahoo Local.
Saturday, June 18, 2005
Launch, learn, and repeat
In his post, "Stealth startups suck", Mark Fletcher (CEO of Bloglines) gives some good reasons why startups should move quickly:
- First mover advantage is important.
It forces you to focus on the key functionality of the site.
The sooner you get something out there, the sooner you'll start getting feedback from users.
Launching early and often is particularly important when you are exploring a new space. You don't really know what works or what doesn't. No one does. How can you find out? Launch something, test it, learn, and iterate. Keep working and improving.
For one of many examples of this at Findory, we launched personalized advertising about two weeks ago. Since then, we have quietly tested three different variations on our advertising engine. We watch the reaction from our readers. We pore over the data. We learn what works and what doesn't. And it just keeps getting better.
Launch, learn, and repeat. It's the cycle of innovation.
Update: Good comments on Mark's post from Chris DiBona and Om Malik.
Friday, June 17, 2005
Google Wallet
Gary Price quotes a Wall Street Journal article that reports that Google is working on a PayPal-like payment system called Google Wallet.
If you can't get to the subscription-only WSJ article, there's also an article on Reuters.
A curious move by Google but, no matter how you look at it, it can't be good news for eBay's PayPal service.
See also my previous posts, Google vs. eBay and Google and eBay pursue sellers.
Update: Some good thoughts on Google Wallet over at TechDirt.
Update: There now is coverage of the story in several newspapers, but none of the articles expand much on the Reuters article.
Update: John Battelle quotes analyst Safa Rashtchy as saying that Google not only will be launching a payment service, but also will be launching a sophisticated "listing product" that is "similar to Craigslist but much more powerful."
Update: Charlene Li speculates that Google is going after micropayments.
A new MSN Shopping
Deborah Rothberg at Microsoft Watch reports that MSN will be launching a new shopping site soon that "will go head-to-head with offerings from Google, Yahoo, eBay and Amazon.com."
I have a few quotes in the article, mostly about getting authoritative product information and reviews. More on that in some of my previous posts ([1] [2] [3]).
[Found on Findory]
Wednesday, June 15, 2005
Going to Gnomedex
I'll be at Gnomedex here in Seattle June 23-25.
It promises to be a good time. Speakers and attendees include John Battelle, David Sifry, Bob Wyman, Scott Rafer, Scott Johnson, Dan Gillmor, Mark Fletcher, Robert Scoble, JD Lasica, Steve Rubel, Steve Gillmor, Scott Gatz, Greg Reinacker, Brad Feld, and many others.
Looking forward to it!
Update: It was a fun, lighthearted conference. It was great finally meeting John Battelle, David Sifry (Technorati), Bob Wyman (PubSub), Scott Rafer (Feedster), Scott Johnson (Feedster), Dan Gillmor, Mark Fletcher (Bloglines), Robert Scoble, Steve Rubel, Dave Winer, and Steve Gillmor in person. Microsoft and Yahoo were here in strength, but I was surprised to see almost no one from Google.
The resurgence of AOL
David Card (VP, Jupiter Research) writes about AOL.com's upcoming re-launch and the future of AOL:
- Objective and Strategy: Continue winding down the dial-up access business profitably (it throws off tons of cash) while turning into the Number Two portal after Yahoo.
One of the key challenges is getting non-members to use AOL.com as a hub.
AOL is positioned to be the leader in RSS among the big portals, search engines, and Internet media companies ... If AOL does it right, it could teach a lot of mainstream users to use [RSS].
Tuesday, June 14, 2005
What is Findory?
When I talk to people who haven't seen Findory before, I describe Findory as a personalized newspaper, a newspaper that learns what you like. "It is as if the newspaper on your front porch were different from your neighbor's," I say, "each individualized copy emphasizing the news that is important to you."
But this is merely a description of Findory's current website. Where is Findory going? What are we building? What is Findory?
Findory is personalizing information. You are flooded with information in your daily life. There are hundreds of messages, thousands of news sources, millions of products, billions of web pages, all screaming for your attention. Personalized information provides focus. It surfaces the information you need.
You might ask, "Why can't I search for what I need?" Sometimes you can, sometimes you can't. Search works well when you already know exactly what you want. It works poorly when you don't know or can't say exactly what you want. For example, you can't search for "news that is important to me" or "weblogs that I will find interesting." Personalization learns from your behavior and helps you discover what you want.
At its core, Findory matches content to interested audiences. All information is content. News, weblogs, and advertising are a first step. Every information stream can and will be personalized.
What is Findory? Personalized information. That is Findory.
Friday, June 10, 2005
What the long tail is not
Don't miss Chris Anderson's post about what the long tail is and is not.
Technorati Beta and surfacing data
The Technorati public beta has an attractive new design, but the most interesting part of it for me is the prominent placement of "most popular" news, books, and movies.
The clever feature shows books, movies, and news stories that are getting a lot of discussion in weblogs with links out to the weblog posts. Cool, fun, and useful.
In an earlier post, I wrote that, with the imminent entry of the search giants, blog search soon will become crowded. Technorati needs to differentiate itself. How can it do that?
- [Technorati is] swimming in an ocean of information. If they can surface the right data to the right people, bubble up the news people need, then they'd have something truly unusual.
[via David Sifry]
Disintermediation of eBay
On eBay's recent acquisition of Shopping.com, David Beisel writes:
- Consumers are heading from Google or comparison engines straight to merchants’ sites that have “graduated” from the eBay platform ... eBay is cut out of the process entirely.
eBay’s acquisition of Shopping.com was partially a defensive power play against Google, as a way to capture searching shoppers before the traffic is sent elsewhere.
Update: Two years later, a BusinessWeek article reports the eBay "magic is gone ... Shoppers are simply not buying all the inventory anymore. Some items languish without a single bidder. Many shoppers opt for other sites including Amazon.com, use sophisticated search engines such as Google and Yahoo!, or head to store sites directly."
Wednesday, June 08, 2005
3D Google Maps
Tom Foremski reports that:
- Google plans to use trucks equipped with lasers and digital photographic equipment to create a realistic 3D online version of San Francisco, and eventually other major US cities.
The move would trump Amazon's A9 service, which offers two-dimensional photos of buildings on US city streets.
Update: Brad Hill has a post with links to details of how all this works, including pictures and a technical paper (PDF).
Update: Danny Sullivan points out that Google's Keyhole software already does 3D city maps.
Monday, June 06, 2005
Yahoo Auctions is now free
Yahoo Auctions is free as of today, no listing or closing fees.
Both Amazon and Yahoo should make their auctions free or near free. Auctions are not that much different than newspaper classifieds. As Craigslist has shown, the classified advertising market is ripe for disruption. Amazon and Yahoo have a similar opportunity in online auctions.
The only quibble I have with Yahoo's move is with eliminating the listing fee. I think you want the pricing structure to encourage sellers to list quality goods at reasonable prices. If the listing fee is zero, that might encourage people to list huge piles of crap at absurd prices. If the listing fee is small but not zero (and, perhaps, refunded if the item sells), it discourages sellers from listing items that won't sell.
[via TechDirt]
Update: About two years later, Yahoo gives up and shuts down Yahoo Auctions.
Unfortunate. I still think eBay could have been defeated, but just making the auctions free isn't enough. All that does is guarantee that the listings will be filled with crap.
I suspect a more successful strategy might have been closer to what I suggested in my post "Kill eBay, Vol. 1". Focus on dominating specific verticals like music or electronics. Make deals with liquidators to ensure that there always are good deals on the site.
See also my April 2006 post, "Early Amazon: Auctions".
Saturday, June 04, 2005
MSN's start.com/3 is out
MSN's next experiment with a customizable home page, start.com/3, is out.
This latest version looks like a combination of My Google and My Yahoo. It has the simplicity and drag-and-drop of My Google (in IE, at least) and some of the additional functionality of My Yahoo (such as including any RSS feed).
It's a nice effort, on the same level as the new My Google or the venerable My Yahoo. But, as Mark Fletcher said, imitating My Yahoo might not be the best strategy given the problems My Yahoo has with information overload.
Bizarrely, when you first go to start.com/3, they throw up an annoying list of questions you have to answer before you get access. What were they thinking? "Gee, this product is too easy to use. What can we do to make it more painful?"
Update: Steve Rider at MSN talks a little about Start.com and where they want to go with it. [Found on Findory]
Thursday, June 02, 2005
Make advertising useful
Jack Schofield at The Guardian writes about Google and gives us a perfect description of why Google AdWords is so successful:
- Instead of selling mass-market ad banners that were boring and slowed pages, it created AdWords. These small text ads were targeted to the search each user was making, and could be as useful as the search results. Instead of reaching thousands of people who were not interested, AdWords reached the handful who were.
- Unlike the earlier Internet advertising efforts, we didn't just show any ad along with the search. We ... take a search term and figure out which ads were most likely to be relevant. Whereas people tend to ignore untargeted ads, we found that people actually like these ads because they provide additional, relevant information ... That's really the secret of why the model has worked so well for us. We found a way to make advertising useful, not annoying.
See also my earlier posts, "Behavioral targeted advertising" and "Bringing sense to web advertising".
Wednesday, June 01, 2005
Research papers @ Google
Gary Price points to a new list at Google Labs of research papers on Google's technology.
The list is new, but most of the papers have been out for a while. The Google File System, Google Cluster Architecture, and MapReduce papers are particularly interesting. If you're a geek interested in large scale search, those papers are required reading.
The content should find you
David Beisel groks Findory:
- I not only want to listen/read/view media that I know I want, but also want [to] have media served up to me that I don't even know that I want.
Findory ... is pushing forward with a vision of delivering content that is both personalized and predictive ... For both news and blogs, the company's service recommends content based on what I've read in the past ... Findory allows the right article to find me, as opposed to me looking for the article.
Interestingly, earlier this week Findory launched its personalized advertising engine. So not only is the company serving up content that's personalized and predictive, but it's attempting to do the same with advertisements as well ... When advertising becomes both personalized and predictive, it actually becomes content -- advertorial content.
Five years ago ... we were making only the first steps towards a vision of personalized predictive advertising. Findory, however, is now making much longer strides towards both personalized predictive content and advertising. And I believe that is the future.