27 August 2008
flavors of English on Google
i was just looking through the site statistics for this here blog. one of the most interesting and useful bits of information that statcounter provides me is the list of search terms that people use. i would say that 99% of these searches are done on Google — we really have drunk the pagerank kool-aid. a lot of searches are pretty lengthy and specific (e.g. "kobe bryant interview in italian" or "who is the girl in the benny lava video?"). one recent search stuck out to me, though. somebody searched for just the word "whomever", and wound up at my previous post "The Office on whomever". i thought that was pretty remarkable. i clicked through on the link that statcounter provided me and saw that the search was made on google.co.uk, and that descriptively adequate was on the front page of results, at position number 6.
then, for whatever reason, i decided to re-run the search using google.com. my post was nowhere to be found on the first page. the results were entirely different. descriptively adequate finally showed up at #14 on the list of results. what's going on? certainly google hasn't written different versions of pagerank to deal with different localizations of English? as far as cataloguing search results goes, the fact that a bunch of Americans in California wrote the algorithm shouldn't adversely affect Brits and the like.
i couldn't stop there. i ran the search on all of the English Google localizations that i could think of, and got even more varied results. i've also noted the total number of results that Google estimates, which also (oddly) varies by localization.
| localization | rank of my post | total hits |
|---|---|---|
| google.com | 14 | 7,480,000 |
| google.co.uk | 6 | 8,200,000 |
| google.ca | 7 | 8,180,000 |
| google.com.au | 10 | 8,190,000 |
| google.com.nz | 7 | 8,460,000 |
as i was compiling this table i remembered that Google mucks with your search results if you're signed in (which i of course had to be in order to access blogger, without which i couldn't be writing this post). i signed out, and on google.com the DA link rose to #4. i guess i should just be happy i'm on the front page on all of these searches. but there are still lingering, bizarre questions.
why does Google report different numbers of hits for different localizations?
no clue. (comments are open!)
what is causing the rank fluctuations even when i'm not logged in?
some clue. on all of the non-US localizations there is a feature "search pages from [country name]". perhaps i've got fewer australian sites linking to my blog, so my rank is slightly lower in australia than in the US or great britain.
why the hell is my personalized, signed-in Google ranking biased against my own damn blog?!
i mean throw me a bone here, guys.
and the baffler...
why do i get this on google.ca?
i mean, you're kidding, right? i'm sure that the frequency of whatever is much higher than that of whomever, but 8 million hits on a word that's in the dictionary should be enough data for google to not question my intent. and why only canadians, eh? this, of course, isn't the first time that i've seen weird spelling suggestions on Google. so perhaps they really do think they know something about English varieties that i don't?
Posted by Ed Cormany at 9:46 PM
30 October 2007
foreign grass
this past weekend i watched some of the NFL's first regular-season game held overseas (yes, when i'm not actively being a linguist i'm frequently watching sports). the game was at the new Wembley Stadium in London. during the pregame, Fox's announcing crew was getting disproportionately excited about the whole event, as i'm sure their producers and the NFL instructed them to do. the strangest part of their reporting came when Tony Siragusa, the sideline reporter, gave an update on the playing conditions.
now i don't have an audio or video recording of exactly what he said, so i hope i'm not embellishing too much, but he said something along the lines of:
"now this isn't a field, it's what they call a pitch..."
something in the way he said it just made it sound like "see this rectangular expanse of grass? well this is no American grass. this is crazy British grass, which is so different that they have to call it something else." to be fair, he did go on and explain the difference in the types of grass seed used and the cut and drainage of the field, and these differences did play a factor, as the Giants and Dolphins churned up the ground into little more than mud during the course of the game. it was just the way that a simple lexical choice was played up to create such drama, throwing the arbitrariness of the sign out the window. i could imagine him standing next to the back end of a car in London reporting "now this car doesn't have a trunk, it's got what they call a boot! what will they think of next?"
in any event, it bugged me. i'm sure most American viewers barely noticed, and i'm not even sure if the game was broadcast in England. if it was, they were all probably watching Liverpool - Arsenal anyway.
Posted by Ed Cormany at 2:17 PM
11 May 2007
more lolguistics
i don't know whether i should promise that this will be my last post on lolcats, because there has been a flurry of new content about them recently. a surprisingly large group of people have taken an interest in actual linguistic analysis of the lolcat idiom. the latest comes from David McRaney at Zero Sum Mind. the fact that his article concludes with this chart indicates the seriousness of his study of lolcats and related memes:
The great thing about all of this is how we can see new languages forming out of a new medium, and since the pace is abnormally fast, we can watch it evolve over weeks instead of decades.
these are, of course, constructed dialects, not actually languages. terminology aside, this is a fascinating opportunity for those who are interested in dialect and language change.
It also demonstrates how the Internet changes the way we connect and communicate. These words and macros depend on the users manipulating not only the information being passed back and forth, but the format of the codes we agree on to represent the information. Strunk and White would probably be appalled...the linguists hope for nothing less.
Posted by Ed Cormany at 9:40 PM
tags: dialects
10 May 2007
literecy cat ≠ linguist cat
the lolcats phenomenon is ridiculous, if not a bit amusing. (i think the real reason i enjoy it is because it pokes fun at those people who take photos of their cats and post them to flickr ad nauseam.) recently the phraseology of lolcats has been noticed by linguists, and it's been asked whether there is an actual rule-based lolcat dialect, and therefore, sentences which are ungrammatical in lolcat. i think this is true, even if the rules were historically derived from snowclones. there are morphological changes, such as simplification of the paradigm of have (has is used for all persons and numbers). there are syntactic changes, including modifications to the ways that modals combine with predicates (i can has X, i is be Y, etc.). so i was all ready to conclude that yes, cats can has grammar, when this appeared on I CAN HAS CHEEZBURGER, the canonical lolcats site:
dammit! here i am, ready to call lolcat a real, if contrived, dialect of English, and this guy has to go fly in the face of it. things that are wrong with it:
it doesn't know what grammar is. the sentence "Literacy Cat is amazed at your perfect grammar" is itself a perfectly grammatical sentence of standard English. deliberate misspellings have been conflated with ungrammaticality.
it enforces the concept of a standard. even though it doesn't say so explicitly, it implies that lolcat is "not perfect" and therefore stigmatizes it. (please, though, don't take this as an argument that, should there be people out there actually speaking lolcat, we shouldn't stigmatize them. there are no native speakers of this dialect, and deliberately invented dialects shouldn't have equal status with native dialects. it's just the principle of the matter.)
literacy is not language. everyone in the world, with a handful of exceptions, can and does speak a language. a large share of the world cannot read or write. i guess literacy cat really isn't qualified to be making linguistic generalizations. it's just that no one else realized that. and that is be my problem with stuff like this.
Posted by Ed Cormany at 2:26 PM
tags: dialects, orthography