2.17.2011

Watson making Information Management (even more) cool

Editor's note: This is a guest post by IBM's Director of Strategy and Marketing for Database Software and Systems Bernie Spang.

Like many of my colleagues in the IT business, I am often disappointed by the lack of interest I am able to generate among my family and friends when discussing my work. But thanks to Watson, the computing system that can play the quiz show Jeopardy! at a champion level, I have experienced a few precious moments where both my kids and parents showed interest in my work.

My 15-year-old son and 76-year-old father both had the same reaction after watching the Watson-Jeopardy! Challenge commercial during the NFL playoffs: “That's cool, but why did IBM build a computer to play games?”

Surprisingly, they both gave me five minutes of their attention – just long enough to sneak in an explanation of how Watson is connected to the IBM Information Management software portfolio.

What is Information Management?

Watson incorporates open technology such as UIMA (Unstructured Information Management Architecture), Eclipse, and Apache Hadoop, the first two of which IBM contributed to the open source community. To build on the medical reference IBM Research Senior Vice President Dr. John Kelly made at the Watson-Jeopardy! press conference in January: a healthcare provider can use this software to analyze patient and treatment information – including doctors’ notes and clinical reports – to pinpoint illness trends and successful treatments.



More about what’s inside Watson

Craig Rhinehart connects Watson and Content Analytics in his blog: 10 Things You Need to Know About the Technology Behind Watson.

The Apache Hadoop capabilities that Watson uses to analyze massive amounts of information are also used in IBM’s InfoSphere BigInsights software. And IBM InfoSphere Warehouse uses the UIMA technology for text analytics – just as Watson does.

A new addition to the InfoSphere portfolio, called Streams, analyzes information flowing through systems that may never be stored. Streams can analyze thousands of pieces of vital sign telemetry per second to help save the lives of premature babies (another compelling example that kept my son and father intrigued). While Watson does not use a form of Streams, the two have shared heritage as IBM Research projects.

If you are anything like my dad and son, you are about at your limit of examples to absorb. But hopefully you, too, already get the point. While Watson is an amazing feat of question-answering technology, my son and dad think the future possibilities for Watson are pretty cool, too.

2.13.2011

Watson’s wagering strategies

Editor’s note: This guest post from IBM Researcher Dr. Gerald Tesauro is the third article in a three-part series about how Watson plays America’s favorite quiz show®.

Daily Doubles and Final Jeopardy! are often the most critical junctures of a Jeopardy! game; the amount wagered can make a big difference in a player’s overall chances to win. How does Watson decide on the amount?



Daily Double wagering

In principle, to compute the best Daily Double (DD) bet, a player must answer two basic questions:

(1) How likely am I to answer the DD clue correctly?

(2) How much will a given bet increase or decrease my winning chances when I get the DD right or wrong?


Match Play

The Watson-Jeopardy! Challenge is spread over two games, with combined totals determining the winner. This style of play requires different strategies than a typical game. Final Jeopardy! of game one is analogous to “half time,” so it calls for different strategies from all competitors than game two, which is the last chance to win.

Humans are at best only able to make crude estimates of these quantities. By contrast, Watson uses advanced mathematical models that can answer both questions with far greater precision than humans can achieve.

To address the first question, Watson uses an “in-category DD confidence” model. Based on thousands of tests on historical Jeopardy! categories containing DDs, the model estimates Watson’s DD accuracy, given the number of previously seen clues in the category that Watson got right and wrong.
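The post does not say how the in-category DD confidence model is built, so the following is only a rough sketch of the idea: start from a prior accuracy and shift it as Watson gets clues in the category right or wrong. The function name, the prior accuracy of 0.75, and the prior weight are all assumptions made for illustration, and the Beta-style smoothing is a stand-in, not Watson's actual model.

```python
def in_category_accuracy(num_right, num_wrong, prior_accuracy=0.75, prior_weight=4.0):
    """Smoothed estimate of accuracy on the next clue in a category.

    prior_accuracy and prior_weight are hypothetical values; a real model
    would fit them on historical categories.
    """
    if num_right < 0 or num_wrong < 0:
        raise ValueError("clue counts must be non-negative")
    # Treat the prior as `prior_weight` imaginary clues already seen.
    pseudo_right = prior_accuracy * prior_weight + num_right
    pseudo_wrong = (1.0 - prior_accuracy) * prior_weight + num_wrong
    return pseudo_right / (pseudo_right + pseudo_wrong)
```

With no clues seen, the estimate is just the prior; three correct responses push it up, and three misses pull it down.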

Watson tackles the second question by using a Game State Evaluator (GSE), a complex regression model that estimates Watson’s winning chances at any stage of the game, given the information set that describes the current game state (for example, the scores of the three players, the number of remaining clues, the value of remaining clues, and the number of remaining DDs).

The GSE was trained over the course of millions of simulated Jeopardy! contests pitting Watson vs. two simulated human opponents. The human opponent models in these simulations capture important statistical profiles of human contestants, such as how often contestants attempt to buzz in, how often they are right when they win the buzz, and their accuracy on DDs and Final Jeopardy!

Optimal wagering

By combining the GSE with the in-category DD confidence, Watson can compute an overall expected chance to win the game for any given DD bet. This analysis runs for every legal betting amount – from the $5 DD minimum, to its entire bankroll for a True Daily Double – to come up with an optimal amount. The calculation also uses risk analytics to trade off expected winning chances against the risk of a particular bet.
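To make the calculation concrete, here is a minimal sketch of the search over bets. The real GSE is a regression model trained on simulations; `toy_game_state_evaluator` below is a hypothetical stand-in (a logistic curve in Watson's lead over its top opponent), and the volatility penalty is just one assumed way to express a risk trade-off. None of these names or formulas come from Watson itself.

```python
import math


def toy_game_state_evaluator(my_score, top_opponent_score, scale=8000.0):
    """Hypothetical stand-in for the GSE: win chance as a logistic curve in the lead."""
    return 1.0 / (1.0 + math.exp(-(my_score - top_opponent_score) / scale))


def bet_value(bet, score, top_opponent_score, p_correct, risk_aversion=0.0):
    """Expected win chance for one bet, minus an assumed penalty for volatility."""
    win_if_right = toy_game_state_evaluator(score + bet, top_opponent_score)
    win_if_wrong = toy_game_state_evaluator(score - bet, top_opponent_score)
    expected = p_correct * win_if_right + (1.0 - p_correct) * win_if_wrong
    # Standard deviation of the two-outcome win chance; this penalty form is an assumption.
    volatility = math.sqrt(p_correct * (1.0 - p_correct)) * abs(win_if_right - win_if_wrong)
    return expected - risk_aversion * volatility


def best_dd_bet(score, top_opponent_score, p_correct, max_bet=None, risk_aversion=0.0):
    """Try every whole-dollar bet from $5 up to max_bet (default: the whole bankroll)."""
    upper = score if max_bet is None else max_bet
    bets = range(5, max(upper, 5) + 1)
    return max(bets, key=lambda b: bet_value(b, score, top_opponent_score, p_correct, risk_aversion))
```

With these toy numbers, a player trailing $5,000 to $20,000 with 80 percent confidence bets everything, while a leader at $20,000 to $5,000 with 30 percent confidence bets the $5 minimum. Raising `risk_aversion` pulls the trailing player's bet back down.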

Watson’s resulting bet might seem unusual, in that it frequently may be far more aggressive, or far more conservative, than typical human bets. The amount may also take on non-round values (i.e., not an exact multiple of $100). Such values may make the arithmetic a little more challenging for the humans when computing their bets.

Final Jeopardy! wagering

In calculating a Final Jeopardy! (FJ) wager, Watson first needs to know if it is playing a single game or a two-game match [see Call out box: Match Play]. In the latter case, Watson will use very different strategies for game one and game two. The analysis for game one is similar to Daily Double analysis: Watson uses a statistical model of likely human bets, human FJ accuracy, and Watson’s FJ accuracy to calculate its expected winning chances for every legal bet. It then selects the bet giving the best risk-adjusted chance to win the match.

While there are no previously revealed clues in the FJ round, Watson does obtain evidence of its likely FJ accuracy from the category title. Given the title, Watson first computes several salient features via Natural Language Processing analysis. It then consults a “FJ prior accuracy” regression model, based on Watson’s performance on thousands of historical FJ categories, to predict Watson’s accuracy given the category features.


Wagering in game two of a match is similar to FJ in ordinary games. The predominant consideration is score positioning (first, second or third place). In some cases, the contestants may need to use strategic reasoning as in games like Rock-Paper-Scissors: each predicts the opponents’ bets while taking into account that the opponents are trying to predict its own bet in turn.

Watson has been programmed with a library of known FJ strategy rules, such as Two-Thirds Betting and Shore’s Conjecture. The research team also added novel rules for some special situations that we discovered.[1]

Depending on the situation, Watson will either bet according to a suitable strategy rule, or it will run a real-time simulation to calculate the best bet, among all legal bets. For the match with Ken and Brad, Watson will also take into account the prize values for second place ($300,000) and third place ($200,000), leading to a different objective than simply trying to win the match.
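One way to picture an objective that accounts for the runner-up prizes is expected prize money rather than raw win probability. This is a sketch under assumptions: the post does not state Watson's actual objective, and the first-place prize is not given here, so the $1,000,000 default below is supplied purely for illustration.

```python
def expected_prize(place_probabilities, prizes=(1_000_000, 300_000, 200_000)):
    """Expected payout given chances of finishing first, second, and third.

    The first-place figure is an assumption; second and third come from the post.
    """
    if abs(sum(place_probabilities) - 1.0) > 1e-9:
        raise ValueError("place probabilities must sum to 1")
    return sum(p * prize for p, prize in zip(place_probabilities, prizes))
```

For instance, a hypothetical bet with finishing chances of (0.48, 0.40, 0.12) has a higher expected prize than one with (0.50, 0.10, 0.40), even though the second bet wins the match slightly more often.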


[1] One such rule in ordinary FJ applies when the leader’s score exactly equals the sum of the other two players’ scores, for example, if Watson has $20,000 and the two humans have $13,000 and $7,000. Watson would normally bet $6,001, to win by $1 when the second place player doubles her score. However, in this case Watson will bet $6,000 to tie for first place. The reason is that if Watson bets $6,001 and is wrong, it gives the third place player a chance to win by $1 ($14,000 to $13,999) if the second place player is wrong.
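The arithmetic in this footnote can be checked by enumerating every right/wrong combination. The sketch below assumes both human players wager their entire scores, which the footnote implies but does not state outright; the function name is made up for this example.

```python
from itertools import product


def outcome_counts(watson_bet, scores=(20000, 13000, 7000), human_bets=(13000, 7000)):
    """Count Watson's wins, ties and losses over all eight right/wrong combinations."""
    bets = (watson_bet,) + tuple(human_bets)
    counts = {"win": 0, "tie": 0, "lose": 0}
    for correct in product((True, False), repeat=3):
        finals = [s + b if ok else s - b for s, b, ok in zip(scores, bets, correct)]
        watson, best_human = finals[0], max(finals[1:])
        if watson > best_human:
            counts["win"] += 1
        elif watson == best_human:
            counts["tie"] += 1
        else:
            counts["lose"] += 1
    return counts
```

Under that assumption, the $6,001 bet loses in three of the eight combinations, while the $6,000 bet loses in only two, trading some outright wins for ties at the top.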

2.03.2011

Knowing what it knows: selected nuances of Watson's strategy

Editor’s note: This guest post from IBM Researcher Dr. Jon Lenchner is the second article in a three-part series about how Watson plays America’s favorite quiz show®.

Watson learns by gathering information, but instead of neural connections, it uses algorithms to understand the natural language that information is written in. These algorithms give it a confidence in a Jeopardy! category and clue that maps to a probabilistic estimate that the response is correct.

Watson honed this self-assessment of what it does and does not know by training on thousands of historical questions (Watson’s equivalent of taking a few hundred “practice tests”).

The algorithms dealing with natural language are not perfect, so there’s always some degree of uncertainty. Watson calculates its uncertainty and learns which algorithms to trust under which circumstances, such as different Jeopardy! categories.

IBM Researcher Dr. David Gondek, who developed machine learning algorithms and infrastructure that Watson uses to rank and estimate confidence in possible answers, uses the words “introducing” and “manufacturing” to show how language has many ways to refer to the same relation and can be highly contextual:

The clue was: It was introduced by the Coca-Cola Company in 1963. Watson can find a passage stating that ‘Coca-Cola first manufactured Tab (the correct response) in 1963’, so in order to answer the question, Watson needed to understand that introducing and manufacturing can be equivalent – if a company is introducing a product. But that is highly dependent on context: if you introduce your uncle, it doesn't mean you manufactured him.

Watson also exhibits dynamic learning within categories. Watson observes the correct answers to clues to verify it is interpreting the category correctly. The sparring matches offer good examples of Watson making these in-game adjustments. Not only does Watson get better at answering as clues in a category are revealed, but its understanding of its own in-category ability is also refined.

Note that because Watson cannot hear, it does not know how Jennings or Rutter answers a clue. So, Watson cannot use their responses in its accuracy assessment or to change a response it may be considering.



The Confidence to buzz in

Those who watched the practice round could see a graphic of Watson’s confidence level in its top three possible responses, and a line that established a threshold it must reach to buzz in.

Watson’s default threshold is typically 50 percent. In other words, if its confidence estimation determines a 50 percent or higher chance of correctly responding to a clue, it will try to buzz in.

What is a Daily Double?

When a contestant selects a Daily Double, he or she can wager between $5 and the larger of two amounts: his or her current score (a True Daily Double) or the highest clue value on the board ($1,000 for Single Jeopardy! and $2,000 for Double Jeopardy!).

The first round of Jeopardy! has one Daily Double; the second round has two.
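The wagering limits in this box boil down to a one-line rule, sketched here with a made-up function name:

```python
def daily_double_wager_range(score, round_top_value):
    """Return (minimum, maximum) legal Daily Double wagers.

    round_top_value is 1000 for Single Jeopardy! and 2000 for Double Jeopardy!.
    """
    return 5, max(score, round_top_value)
```

So a contestant with $400 in the first round can still wager up to $1,000, while a contestant with $7,200 in the second round can wager it all.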

But the buzz threshold is game-state dependent.

The threshold can change substantially towards the end of a game. For example, Watson will lower the threshold if doing so gives it a higher chance to win or helps it avoid a statistical lockout. Conversely, if Watson is leading and its only chance of losing a game is to buzz in and respond incorrectly, it will not buzz in, no matter how confident it is.
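As a rough sketch of the buzz-in rule described here (the function and parameter names are invented for illustration, and how Watson actually computes a game-state-dependent threshold is not covered in this post):

```python
def should_buzz(confidence, threshold=0.5, wrong_answer_is_only_way_to_lose=False):
    """Decide whether to attempt a buzz.

    threshold defaults to the 50 percent level mentioned above; callers would
    pass a lower or higher value as the game state demands.
    """
    if wrong_answer_is_only_way_to_lose:
        # A safe lead is never risked, however confident the top response is.
        return False
    return confidence >= threshold
```

A call such as `should_buzz(0.35, threshold=0.3)` models a late-game situation in which the threshold has been lowered.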

Clue selection

If Watson gets to choose a category and clue, its first priority is finding any of the game's three Daily Doubles that remain. These clues allow a contestant to wager a specific dollar amount on the clue without worry of the other two contestants buzzing in. Jennings, Rutter and Watson have a high chance to answer these correctly, so Daily Doubles provide three opportunities for a critical score boost.

The Watson Research team studied the historical distribution of Daily Doubles and found they appear most frequently in the three bottom rows, with the fourth being the most common. Daily Doubles also most frequently appear in the first column. Watson also makes use of even more statistics to dynamically predict their location based on what has been exposed so far in a game.

Once the Daily Doubles are off the board, Watson looks for the lowest-value clue in a category that still has a significant number of high-value clues. Lower-value clues help it get the gist of a category with less risk, so that it has a better shot at the high-value clues to come.
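Here is a minimal sketch of that last selection rule. The post does not define "high value" or "significant number," so the cutoff of $1,200 and the minimum of two clues are assumptions, as are the tie-breaking order and the fallback when no category qualifies.

```python
def pick_clue(board, high_value_cutoff=1200, min_high_value_clues=2):
    """Pick (category, value) once the Daily Doubles are gone.

    board maps each category name to a list of remaining clue values.
    """
    candidates = []
    for category, values in board.items():
        if not values:
            continue
        high_count = sum(1 for v in values if v >= high_value_cutoff)
        if high_count >= min_high_value_clues:
            # Prefer more high-value clues left, then the cheapest clue (assumed order).
            candidates.append((-high_count, min(values), category))
    if candidates:
        _, value, category = min(candidates)
        return category, value
    # Assumed fallback: the cheapest clue anywhere on the board.
    remaining = [(min(values), category) for category, values in board.items() if values]
    if not remaining:
        return None
    value, category = min(remaining)
    return category, value
```

For a board such as `{"OPERA": [400, 1600, 2000], "RIVERS": [1600, 2000], "POTPOURRI": [400, 800]}`, this picks the $400 clue in OPERA, which costs little and leaves two high-value clues to benefit from what was learned.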