Some notes on iTap / T9 / "predictive text"


Legal disclaimer

Both iTap and T9 use the mapping described below, and I wasn't careful to distinguish between the two products when I wrote this page, because I wasn't familiar with the difference between them. I have since been informed that iTap and T9 are two different products that compete with each other. T9 was created by Tegic, a company in Seattle, and is used on most Nokia phones. iTap is a product of a division of Motorola in Silicon Valley. For a list of differences between iTap and T9, see the bottom of this page.


The "iTap/T9" code for English encodes the characters [A-Z] into the characters [2-9] using this mapping:

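sub encode {
 my ($word) = @_;    # take the word as an argument instead of relying on $_
 $word = uc $word;   # the mapping is defined on upper-case letters
 $word =~ s/[ABC]/2/g;
 $word =~ s/[DEF]/3/g;
 $word =~ s/[GHI]/4/g;
 $word =~ s/[JKL]/5/g;
 $word =~ s/[MNO]/6/g;
 $word =~ s/[PQRS]/7/g;
 $word =~ s/[TUV]/8/g;
 $word =~ s/[WXYZ]/9/g;
 return $word;       # the digit string, e.g. "5477" for "KISS"
}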
[This mapping is the one that has been used on telephones for decades.]

The patented T9 system uses a dictionary (does it also use a general language model?) to decode the digit string back into English characters. In the event of any ambiguity, the user uses scroll commands (embodied on one or more other keys) to select the desired word from a list offered by the machine.
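
To make the dictionary step concrete, here is a minimal sketch of a decoder: it indexes a word list by code and returns every word that shares a given digit string. The names build_index and decode are mine, and the four-word list is only for illustration; this is not how T9 itself is implemented, just the simplest lookup consistent with the description above.

```perl
use strict;
use warnings;

# Map a word to its keypad code (the same letter-to-digit mapping as above, via tr///).
sub encode {
    my ($word) = @_;
    $word = uc $word;
    $word =~ tr/A-Z/22233344455566677778889999/;
    return $word;
}

# Build an index from each code to the list of dictionary words sharing it.
sub build_index {
    my (@words) = @_;
    my %index;
    push @{ $index{ encode($_) } }, uc($_) for @words;
    return \%index;
}

# Return the candidate words for a digit string (empty list if the code is unknown).
sub decode {
    my ($index, $code) = @_;
    return @{ $index->{$code} // [] };
}

# Illustrative word list only; a real system would load a full dictionary.
my $index = build_index(qw(kiss lips hello how));
print join(', ', decode($index, '5477')), "\n";   # KISS, LIPS
```

When decode returns more than one word, the user would scroll through that list; when it returns none, the word is not in the dictionary.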

I was curious to find out how good this code is for English. I could see it being bad in two possible ways.

  1. Assuming that the user is purely typing words from a dictionary, how often will the user have to use the disambiguator (for example, KISS = LIPS = 5477, and PUFFS = QUEER = 78337!), and how often will this disambiguation be tiresome to perform? I could imagine there being clumps of words that have the same code; the user might have to do a lot of scrolling to pick the right one!
  2. What happens if you've typed in a long word that the machine does not know? If the machine does not know lewinsky, say, what happens after you have typed in "53946759"? Is the user confronted with an enormous list of possible strings, none of which is in the dictionary? I would guess the system is virtually useless for sending long strings that are not in the dictionary. (The answer, in T9, is that in such a case the user is invited to Spell the word, which involves rewriting the whole word using unambiguous multitap. Slightly annoying, but worth it because such words are then added to the T9 dictionary.)
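
To put a number on the "enormous list" in the second question, this small sketch counts how many letter strings a digit code could stand for, by multiplying the number of letters on each key pressed. The function name candidate_strings is mine, and it assumes the code contains only the digits 2 to 9.

```perl
use strict;
use warnings;

# Number of letters on each digit key of the standard keypad.
my %letters_on_key = (2 => 3, 3 => 3, 4 => 3, 5 => 3, 6 => 3, 7 => 4, 8 => 3, 9 => 4);

# Count every letter string (word or not) that a digit code could stand for.
# Assumes the code consists only of the digits 2-9.
sub candidate_strings {
    my ($code) = @_;
    my $count = 1;
    $count *= $letters_on_key{$_} for split //, $code;
    return $count;
}

print candidate_strings('53946759'), "\n";   # 15552
```

So an eight-key code like this one has 3^5 x 4^3 = 15552 possible spellings, which is why a plain list of strings would not be usable.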
Below I give the answer to the first question.

I used a Perl program and the Linux dictionary /usr/dict/words. I found that 45373 distinct words mapped to 41439 distinct codes, so 3934 clashes occurred. So if the user picks words at random from the dictionary, and the system knows all those words and no others, then the disambiguator will have to be used for fewer than 10% of words (3934/45373, about 8.7%). In fact it will be needed even less often than that if the first guess made by T9 is correct, as is often the case. [Dale Grover told me that in practice T9 gets it right more than 97% of the time.]
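
The counting itself can be sketched as follows. This is a reconstruction, not the original program: the function name clash_stats is mine, and folding case and skipping entries that contain non-letters are my assumptions, so running it on a real dictionary file may not reproduce the figures above exactly.

```perl
use strict;
use warnings;

# Map a word to its keypad code.
sub encode {
    my ($word) = @_;
    $word = uc $word;
    $word =~ tr/A-Z/22233344455566677778889999/;
    return $word;
}

# Return (distinct words, distinct codes, clashes) for a list of words.
# A "clash" is counted as one word beyond the first for each code.
sub clash_stats {
    my (@words) = @_;
    my (%seen_word, %seen_code);
    for my $word (@words) {
        next unless $word =~ /^[A-Za-z]+$/;   # assumption: skip entries with non-letters
        my $upper = uc $word;                 # assumption: fold case
        next if $seen_word{$upper}++;         # count each word only once
        $seen_code{ encode($upper) }++;
    }
    my $n_words = keys %seen_word;
    my $n_codes = keys %seen_code;
    return ($n_words, $n_codes, $n_words - $n_codes);
}

# Illustrative input; in practice the list would be read from the dictionary file.
my ($n_words, $n_codes, $n_clashes) = clash_stats(qw(kiss lips lisp hello how Hello));
print "$n_words words, $n_codes codes, $n_clashes clashes\n";   # 5 words, 3 codes, 2 clashes
```

To run it on a dictionary, read the file one word per line, chomp each line, and pass the resulting list to clash_stats.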

Here are some of the worst clashes (i.e. the clashes involving the most words). I note with a smile that the two examples chosen by Motorola (HELLO) and Tegic (HOW) do not suffer clashes with any dictionary words.
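
Lists of clashes like the ones on this page can be produced by grouping words by code and sorting the groups by size. The sketch below does that for a toy word list; clash_groups is a name I made up, and the output format only roughly imitates the listings here.

```perl
use strict;
use warnings;

# Map a word to its keypad code.
sub encode {
    my ($word) = @_;
    $word = uc $word;
    $word =~ tr/A-Z/22233344455566677778889999/;
    return $word;
}

# Group words by code; return the groups with at least $min words, largest first.
# Each returned row is an array reference: [ code, word, word, ... ].
sub clash_groups {
    my ($min, @words) = @_;
    my %group;
    $group{ encode($_) }{ uc($_) } = 1 for @words;   # hash of hashes drops duplicates
    my @rows;
    for my $code (keys %group) {
        my @members = sort keys %{ $group{$code} };
        push @rows, [ $code, @members ] if @members >= $min;
    }
    # Sort by group size (descending), then by code so the order is repeatable.
    return sort { @$b <=> @$a or $a->[0] cmp $b->[0] } @rows;
}

for my $row (clash_groups(2, qw(kiss lips lisp puffs queer hello how))) {
    my ($code, @members) = @$row;
    print " $code : ", join(', ', @members), "\n";
}
```

With this toy input it prints the 5477 group (KISS, LIPS, LISP) followed by the 78337 group (PUFFS, QUEER); HELLO and HOW are left out because nothing else shares their codes.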


Conclusion: The worst dictionary clash is the eleven words:
ACRES, BARDS, BARER, BARES, BASER, BASES, CAPER, CAPES, CARDS, CARES, CASES
The worst-case number of keypresses that the user has to make, given a dictionary word, is thus ten scroll-commands on top of the original word. That does not seem excessive, though I can imagine that people who send messages about acres, bases, cards, cares and cases, all of which are fairly frequent five-letter words, might get a little bored of having to do this. But it is certainly better than the multi-tap code!
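
For comparison, here is a rough sketch of what the multi-tap code costs. In multi-tap, each letter is entered by pressing its key as many times as its position on that key (A is one press of 2, B two, C three). The function name multitap_presses is mine, and the count ignores the pause or extra key needed between two consecutive letters on the same key.

```perl
use strict;
use warnings;

# Letters printed on keys 2..9 of the standard keypad.
my @key_letters = qw(ABC DEF GHI JKL MNO PQRS TUV WXYZ);

# Multi-tap cost of each letter: its position on its key (A=1, B=2, C=3, ...).
my %taps;
for my $letters (@key_letters) {
    my @on_key = split //, $letters;
    $taps{ $on_key[$_] } = $_ + 1 for 0 .. $#on_key;
}

# Key presses needed to multi-tap a word (pauses between letters are ignored).
# Assumes the word contains only the letters A-Z.
sub multitap_presses {
    my ($word) = @_;
    my $total = 0;
    $total += $taps{$_} for split //, uc($word);
    return $total;
}

print multitap_presses('HELLO'), "\n";   # 13
```

HELLO has no clashes in the dictionary used here, so its five-key code 43556 needs no scrolling, against 13 presses in multi-tap.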

A few of the more amusing (though not if your name is Amy) confusable sets are:

 269 : AMY, ANY, BOW, BOX, BOY, COW 
 7467 : PIMP, PINS, RIMS, SHOP, SIMS, SINS 
 25663 : ALONE, ALOOF, BLOND, BLOOD, CLONE
 74687 : PINTS, PIOTR, PIOUS, RIOTS, SHOTS, SINUS
 5477 : KISS, LIPS, LISP, LISS

Eight or more words with same code

 729 : 	 PAW, PAY, PAZ, RAW, RAY, SAW, SAX, SAY 
 76737 : 	 PORES, POSER, POSES, ROPER, ROPES, ROSES, SORER, SORES 
 46637 : 	 GONER, GOODS, GOOFS, HOMER, HOMES, HONER, HONES, HOODS, HOOFS, INNER 
 22737 : 	 ACRES, BARDS, BARER, BARES, BASER, BASES, CAPER, CAPES, CARDS, CARES, CASES 
 7283 : 	 PATE, PAVE, RATE, RAVE, SATE, SAUD, SAVE, SCUD 
 2273 : 	 ACRE, BARD, BARE, BASE, CAPE, CARD, CARE, CASE 

Seven words with same code

 4663 : 	 GONE, GOOD, GOOF, HOME, HONE, HOOD, HOOF 
 726 : 	 PAM, PAN, RAM, RAN, SAM, SAN, SAO 
 72837 : 	 PAVES, RATER, RATES, RAVES, SATES, SAVER, SAVES 
 227837 : 	 BARTER, BASTES, CARTER, CARVER, CARVES, CASTER, CASTES 
 2277 : 	 BARR, BARS, BASS, CAPS, CARP, CARR, CARS 
 752837 : 	 PLATES, SKATER, SKATES, SLATER, SLATES, SLAVER, SLAVES 
 7867 : 	 PUMP, PUNS, RUMP, RUNS, STOP, SUMS, SUNS 

Six words with same code

 2877 : 	 BURP, BURR, BUSS, CUPS, CURS, CUSP 
 742737 : 	 PHASER, PHASES, SHAPER, SHAPES, SHARER, SHARES 
 786 : 	 PUN, QUO, RUM, RUN, SUM, SUN 
 34637 : 	 DIMES, DINER, DINES, FINDS, FINER, FINES 
 26637 : 	 BONDS, BONER, BONES, COMER, COMES, CONES 
 787433 : 	 PURGED, PUSHED, RUSHED, STRIDE, STRIFE, SURGED 
 787437 : 	 PURGES, PUSHER, PUSHES, RUSHER, RUSHES, SURGES 
 7243 : 	 PAGE, PAID, RAGE, RAID, SAGE, SAID 
 74687 : 	 PINTS, PIOTR, PIOUS, RIOTS, SHOTS, SINUS 
 7277 : 	 PARR, PARS, PASS, RAPS, RASP, SAPS 
 7627 : 	 ROAR, ROBS, SNAP, SOAP, SOAR, SOBS 
 7673 : 	 POPE, PORE, POSE, ROPE, ROSE, SORE 
 27437 : 	 ARIES, ASHER, ASHES, BRIER, CRIER, CRIES 
 26737 : 	 BORER, BORES, COPES, CORDS, CORER, CORES 
 2253 : 	 ABLE, BAKE, BALD, BALE, CAKE, CALF 
 2263 : 	 ACME, ACNE, BAND, BANE, CAME, CANE 
 74337 : 	 RIDER, RIDES, SHEDS, SHEEP, SHEER, SIDES 
 2666 : 	 AMMO, ANON, BONN, BOOM, BOON, COON 
 529 : 	 JAW, JAY, KAY, LAW, LAX, LAY 
 7327 : 	 PEAR, PEAS, REAP, REAR, SEAR, SEAS 
 7337 : 	 PEEP, PEER, REDS, SEEP, SEER, SEES 
 782537 : 	 PUCKER, QUAKER, QUAKES, RUBLES, STAKES, SUCKER 
 269 : 	 AMY, ANY, BOW, BOX, BOY, COW 
 22537 : 	 ABLER, BAKER, BAKES, BALER, BALES, CAKES 
 7463 : 	 PINE, RIME, RIND, SHOD, SHOE, SINE 
 7467 : 	 PIMP, PINS, RIMS, SHOP, SIMS, SINS 

Five words with same code

 3937 : 	 DYER, DYES, EWES, EYER, EYES 
 763 : 	 POD, POE, ROD, ROE, SOD 
 769 : 	 POX, ROW, ROY, SOW, SOY 
 32837 : 	 DATER, DATES, EATER, EAVES, FATES 
 72437 : 	 PAGER, PAGES, RAGES, RAIDS, SAGES 
 54637 : 	 KHMER, KINDS, LIMES, LINER, LINES 
 272837 : 	 BRAVER, BRAVES, CRATER, CRATES, CRAVES 
 72833 : 	 PAVED, RATED, RAVED, SATED, SAVED 
 486 : 	 GUM, GUN, HUM, HUN, ITO 
 7263 : 	 PANE, RAND, SAME, SAND, SANE 
 7297 : 	 PAWS, PAYS, RAYS, SAWS, SAYS 
 7653 : 	 POKE, POLE, ROLE, SOLD, SOLE 
 7687 : 	 POTS, POUR, ROTS, SOUP, SOUR 
 25663 : 	 ALONE, ALOOF, BLOND, BLOOD, CLONE 
 42779 : 	 GARRY, GASSY, HAPPY, HARPY, HARRY 
 73257 : 	 PEAKS, PEALS, PECKS, REALS, SEALS 
 72537 : 	 PALER, PALES, RAKES, SAKES, SALES 
 372737 : 	 DRAPER, DRAPES, ERASER, ERASES, FRASER 
 2663 : 	 ANNE, BOND, BONE, COME, CONE 
 2673 : 	 BORE, BOSE, COPE, CORD, CORE 
 7282437 : 	 PATCHES, RAVAGER, RAVAGES, SAVAGER, SAVAGES 
 74737 : 	 PIPER, PIPES, RISER, RISES, SIRES 
 76537 : 	 POKER, POKES, POLES, ROLES, SOLES 
 7325 : 	 PEAK, PEAL, PECK, REAL, SEAL 
 6277 : 	 MAPS, MARS, MASS, NAPS, OARS 
 4867 : 	 GUMS, GUNS, HUMP, HUMS, HUNS 
 72237 : 	 PACER, PACES, RACER, RACES, SABER 
 2337 : 	 ADDS, BEDS, BEEP, BEER, BEES 
 5263 : 	 JANE, KANE, LAME, LAND, LANE 
 5277 : 	 JARS, KARP, LAPS, LARS, LASS 
 728464 : 	 PAVING, RATING, RAVING, SATING, SAVING 
 42937 : 	 GAYER, GAZER, GAZES, HAYES, HAZES 
 2427 : 	 AGAR, BIAS, BIBS, CHAP, CHAR 
 26937 : 	 BOWER, BOWES, BOXER, BOXES, COWER 
 2433 : 	 AGED, AGEE, AIDE, BIDE, CHEF 
 2437 : 	 AGER, AGES, AIDS, BIDS, BIER 
 72737 : 	 PAPER, PARES, RAPER, RAPES, RARER 
 5337 : 	 JEEP, JEER, KEEP, LEER, LEES 

Four words with same code

 2867 : 	 ATOP, BUMP, BUMS, BUNS 
 4653 : 	 GOLD, GOLF, HOLD, HOLE 
 6453 : 	 MIKE, MILD, MILE, NILE 
 739 : 	 PEW, REX, SEW, SEX 
 746 : 	 PIN, RHO, RIM, RIO 
 747 : 	 PIP, RIP, SIP, SIR 
 726737 : 	 PAMPER, SCOPES, SCORER, SCORES 
 9327 : 	 WEAR, WEBS, YEAR, YEAS 
 782 : 	 PUB, QUA, RUB, SUB 
 762733 : 	 ROARED, SNARED, SOAPED, SOARED 
 2877464 : 	 BURPING, BUSSING, CUPPING, CURSING 
 2527 : 	 AJAR, ALAR, ALAS, CLAP 
 36837 : 	 DOTES, DOVER, DOVES, ENTER 
 46639 : 	 GOMEZ, GOODY, GOOFY, HONEY 
 74273 : 	 PHASE, SHAPE, SHARD, SHARE 
 767837 : 	 PORTER, POSTER, ROSTER, SORTER 
 3637 : 	 DOER, DOES, ENDS, FOES 
 4367 : 	 GEMS, HEMP, HEMS, HENS 
 426 : 	 HAM, HAN, IAN, IBN 
 427 : 	 GAP, GAS, HAP, HAS 
 74637 : 	 PINES, RINDS, SHOES, SINES 
 3262437 : 	 DAMAGER, DAMAGES, FANCIER, FANCIES 
 3663 : 	 DOME, DONE, FOND, FOOD 
 3673 : 	 DOPE, DOSE, FORD, FORE 
 472837 : 	 GRATER, GRATES, GRAVER, GRAVES 
 5463 : 	 KIND, LIME, LIND, LINE 
 7253 : 	 PALE, RAKE, SAKE, SALE 
 4747 : 	 GRIP, GRIS, IRIS, ISIS 
 5477 : 	 KISS, LIPS, LISP, LISS 
 78253 : 	 QUAKE, RUBLE, STAKE, STALE 
 92837 : 	 WATER, WAVER, WAVES, YATES 
 22733 : 	 BARED, BASED, CARED, CASED 
 6833537 : 	 MUDDLER, MUDDLES, MUFFLER, MUFFLES 
 7688464 : 	 POTTING, POUTING, ROTTING, ROUTING 
 287733 : 	 BURPED, BUSSED, CUPPED, CURSED 
 26337 : 	 ANDES, BODES, CODER, CODES 
 227437 : 	 BARGES, BASHES, CASHER, CASHES 
 7663 : 	 POND, ROME, ROOF, SOME 
 7667 : 	 POMP, POOR, ROMP, SONS 
 7627464 : 	 ROARING, SNARING, SOAPING, SOARING 
 227464 : 	 BARING, BASING, CARING, CASING 
 27433 : 	 ASIDE, BRIDE, BRIEF, CRIED 
 44537 : 	 GILDS, GILES, HIKER, HIKES 
 2267 : 	 ABOS, BANS, CAMP, CANS 
 2275 : 	 BARK, BASK, CARL, CASK 
 722537 : 	 PACKER, SABLES, SACKER, SCALES 
 73277 : 	 PEARS, REAPS, REARS, SEARS 
 2639 : 	 ANDY, ANEW, BODY, CODY 
 2647 : 	 BOGS, BOHR, BOIS, COGS 
 2653 : 	 BOLD, COKE, COLD, COLE 
 367243 : 	 DOSAGE, ENRAGE, FORAGE, FORBID 
 75433 : 	 PLIED, SKIED, SKIFF, SLIDE 
 2662 : 	 ANNA, BOMB, BOOB, COMB 
 2665 : 	 AMOK, BOOK, COOK, COOL 
 2667 : 	 AMOS, BOOR, BOOS, COOP 
 7338237 : 	 REDUCER, REDUCES, SEDUCER, SEDUCES 
 742537 : 	 PICKER, SHAKER, SHAKES, SICKER 
 22437 : 	 ACHES, ACIDS, CAGER, CAGES 
 546 : 	 JIM, KIM, KIN, LIN 
 66737 : 	 MORES, MOSER, MOSES, NOSES 
 7335 : 	 PEEK, PEEL, REEL, SEEK 
 78337 : 	 PUFFS, QUEER, STEEP, STEER 
 7363 : 	 PEND, REND, RENE, SEND 
 762533 : 	 ROCKED, SNAKED, SOAKED, SOCKED 
 87437 : 	 TRIER, TRIES, URGES, USHER 
 7378 : 	 PERU, PEST, REST, SEPT 
 22837 : 	 BATES, BAUER, CATER, CAVES 
 94737 : 	 WIPER, WIPES, WIRES, WISER 
 75867 : 	 PLUMP, PLUMS, SLUMP, SLUMS 
 8437 : 	 TIER, TIES, VIER, VIES 
 3678437 : 	 EMPTIER, EMPTIES, FORTIER, FORTIES 
 966 : 	 WON, WOO, YON, ZOO 
 729464 : 	 PAWING, PAYING, SAWING, SAYING 
 6333537 : 	 MEDDLER, MEDDLES, NEEDLER, NEEDLES 
 52637 : 	 JAMES, LAMES, LANDS, LANES 
 42837 : 	 GATES, HATER, HATES, HAVES 
 72257 : 	 PACKS, RACKS, SACKS, SCALP 
 2278464 : 	 BASTING, CARTING, CARVING, CASTING 
 7225464 : 	 PACKING, RACKING, SACKING, SCALING 
 73337 : 	 REEDS, REEFS, REFER, SEEDS 
 2682437 : 	 BOTCHER, BOTCHES, BOUCHER, COUCHES 
 73357 : 	 PEEKS, PEELS, REELS, SEEKS 
 73377 : 	 PEEPS, PEERS, SEEPS, SEERS 
 226 : 	 ABO, BAN, CAM, CAN 
 228 : 	 ABU, ACT, BAT, CAT 
 64637 : 	 MINDS, MINER, MINES, NINES 
 54837 : 	 KITES, LITER, LIVER, LIVES 
 3463 : 	 DIME, DINE, FIND, FINE 
 7335464 : 	 PEEKING, PEELING, REELING, SEEKING 
 768733 : 	 POURED, ROUSED, SOUPED, SOURED 
 72687 : 	 PANTS, RANTS, SCOTS, SCOUR 
 266 : 	 ANN, BOO, CON, COO 
 76277 : 	 ROARS, SNAPS, SOAPS, SOARS 
 627537 : 	 MAPLES, MARKER, MASKER, NAPLES 
 7874464 : 	 PURGING, PUSHING, RUSHING, SURGING 
 633 : 	 NED, ODD, ODE, OFF 
 24337 : 	 AIDES, CHEER, CHEFS, CIDER 
 7426 : 	 RICO, SHAM, SIAM, SIAN 
 5646 : 	 JOHN, JOIN, LOGO, LOIN 
 7866464 : 	 RUNNING, STONING, SUMMING, SUNNING 
 7455 : 	 PILL, RILL, SILK, SILL 
 5673 : 	 JOSE, LORD, LORE, LOSE 
 7473 : 	 PIPE, RIPE, RISE, SIRE 
 7477 : 	 PISS, RIPS, SIPS, SIRS 
 727737 : 	 PARSER, PARSES, PASSER, PASSES 
 32733 : 	 DARED, EARED, EASED, FARED 
 32737 : 	 DARER, DARES, EASES, FARES 
 96637 : 	 WOODS, WOOER, WOOFS, ZONES 
 7827 : 	 PUBS, RUBS, STAR, SUBS 
 25837 : 	 ALTER, BLUER, BLUES, CLUES 
 7877 : 	 PUPS, PURR, PUSS, RUSS 
 262937 : 	 AMAZER, AMAZES, COAXER, COAXES 
 46537 : 	 GOLDS, HOLDS, HOLES, INKER 
 36737 : 	 DOPER, DOPES, DOSES, FORDS 
 347437 : 	 DIRGES, DISHES, FISHER, FISHES 
 82537 : 	 TAKER, TAKES, TALES, VALES 
 732533 : 	 PEAKED, PEALED, PECKED, SEALED 
 327 : 	 DAR, EAR, FAQ, FAR 
 786633 : 	 RUNOFF, STONED, SUMMED, SUNNED 
 786637 : 	 RUNNER, STONES, SUMMER, SUMNER 
 4283 : 	 GATE, GAVE, HATE, HAVE 
 346 : 	 DIM, DIN, EGO, FIN 
 75283 : 	 PLATE, SKATE, SLATE, SLAVE 
 2833 : 	 BUDD, BUFF, CUED, CUFF 

Differences between iTap (from Lexicus, Motorola) and T9 (Tegic)

  1. T9 is used on Nokia and on many other brands of phone. iTap on Motorola only?
  2. According to two researchers from Motorola `iTAP is better'
  3. iTap offers word-completions. (In my opinion, this feature, while nice for long words, makes iTap harder to explain, since a novice user, having heard that iTap does word completions, is likely to be demoralized and confused by the bad predictions that are made when only half the word is written. When I explain T9 to people I tell them to ignore the display until they have finished the word.)
  4. iTap's predictions are context-dependent. This means it can predict whole sentences, which is nice if you are a predictable writer. But T9 advocates would emphasize that the advantage of T9's NOT being context-dependent is that you can memorize a particular key sequence for a particular word: for example, to write "HOME", you always press "4663**" (or some such), independent of context. This is good for usability, as it means the experienced user can go fast and doesn't need to look at the display.
  5. From this Motorola review: `iTap has its faults. For one, pressing the 1 button defaults to putting the number 1 in the word instead of putting a period. If you enter a space after the word, the 1 key will default to a period, but not if you are at the end of a word. This is real annoying, as you either have to waste a character at the end of each sentence, or you need to waste a keystroke to select the period instead of the 1. What were they thinking?'
  6. You can correct iTap as you write a word, and `lock in' your corrections, by using the arrow buttons. (This option is not available in T9 - and perhaps for good reason, since it is often not necessary to make corrections.) The recommended way of adding a word to iTap's dictionary is to use this `lock in corrections as needed' approach, rather than the simple `multitap' (abc) approach chosen in T9. This means that in iTap, you have to keep switching buttons (from 1-9 to the arrow buttons).
  7. In T9, '0' is used to insert a space (and implicitly to confirm that the displayed word is fine). In iTap, 'Select' is used to terminate words AND to insert a space. Pressing Select twice, in iTap, will send the text message.
  8. You can enter symbols and numbers in iTap without switching mode. (Actually, you can enter numbers in T9 too, by holding down the corresponding key.)
  9. My take on the difference between iTap and T9: T9 is very simple to explain; iTap has more features, which make it harder to explain, and perhaps it demands a little more attention from the user too. iTap makes the user make decisions of the form `shall I stop writing the word now, and try to find it in the word completion mode, or shall I continue writing the word?'
    A user faced by such choices may find he regrets his decisions. T9 doesn't bother the user with such choices. You just keep going, and you'll be writing at close to one character per key press, which is fine. I never regret using T9.
    Users may also misunderstand the choices they are offered in iTap: they may think that, since they are offered the chance to correct the word on the fly as they write it, they should do so; but doing so leads to slower writing.
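
The point in item 4 about memorizable key sequences can be sketched like this: if the candidate list for a code is always offered in the same order, then every word has one fixed key sequence, namely its code followed by one scroll press per position in the list. The function key_sequence, the use of '*' as the scroll key, and the candidate ordering below are all assumptions for illustration; I do not know the order a real phone uses.

```perl
use strict;
use warnings;

# Map a word to its keypad code.
sub encode {
    my ($word) = @_;
    $word = uc $word;
    $word =~ tr/A-Z/22233344455566677778889999/;
    return $word;
}

# Fixed key sequence for a word: its code plus one '*' (scroll) per position
# in the candidate list. Returns nothing if the word is not among the candidates.
sub key_sequence {
    my ($word, @candidates) = @_;
    my $target = uc $word;
    for my $i (0 .. $#candidates) {
        return encode($target) . ('*' x $i) if uc($candidates[$i]) eq $target;
    }
    return;
}

# Hypothetical fixed ordering for code 4663; a real phone's ordering may differ.
my @order_4663 = qw(GOOD GONE HOME HONE HOOD HOOF GOOF);
print key_sequence('HOME', @order_4663), "\n";   # 4663**
```

Because the ordering does not depend on the preceding words, the sequence for HOME is the same every time; with context-dependent predictions the position of HOME in the list, and so the number of scroll presses, could change from one sentence to the next.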
Further reading on iTap, and a nice Shockwave iTap demo.


David MacKay / mackay@mrao.cam.ac.uk - home page.
