Testing for Significant Ability in Identifying Wines

Richard E. Quandt
Princeton University
The Andrew W. Mellon Foundation

1. Introduction

One of the more interesting tests one can perform during blind wine tastings is to attempt to identify the wines. Of course, the tasters would normally receive some guidance from the organizer of the tasting, which would narrow the universe of possible wines to the motif for the day; thus, for example, the tasting might be devoted to 1990 Bordeaux reds. Normally, the tasters are even told what the particular wines are that they will be tasting; they are just not told which bottle contains which of the wines. Thus, they would be told, for example, that the tasting includes Chateau Margaux, Chateau Lafite, Chateau Latour, etc.

The relative success that a person has in identifying the wines is in itself an interesting measure of that person's ability; but if the number of wines is large, it may be too daunting a task to associate with each (covered) bottle the particular name of the wine that is in it. A slightly easier task, but no less interesting, is the task of identifying the "type" of the wine. By type I mean any general characteristic of the wines that sets one group of wines in the tasting apart from another group. Thus, one could have two different vintages of four different chateaux in the tasting: for example, a tasting with 8 wines could include the 1994 and 1995 vintages of Ch. Margaux, Ch. Latour, Ch. Lafite and Ch. Haut Brion. In such a situation, there are two questions that can be asked:

    1. How well do the tasters identify the vintages; i.e., to what extent do they recognize the 1994s as 1994s and similarly for the 1995s?
    2. How well do the tasters identify the chateaux; i.e., to what extent can they pick out which wines are from the same chateau?

There are numerous other characteristics the tasters could be asked to identify. One might have a mix of French, American and South African pinot noir wines in a single tasting, in which the objective would be to identify which wine is from which region. But note that there are two potential stages of this game. First, one would want to identify which wines "belong together," and a person might successfully complete that task, but still not know which group is the South African or the French or the American. A second stage would then be to associate with each grouping determined by the taster the actual region.

This latter step, however, is of potential interest only in the unusual case in which the groupings are established or guessed flawlessly. Imagine that we have a tasting in which there are 4 French, 3 American and 3 South African wines. Denote them by the letters F, A and S. Suppose that a taster identifies which wines "belong together" fairly well: his first group consists of wines F, F, F, A, his second group of wines A, A, S, and his third group S, S, F. The taster has identified seven wines correctly, but none of his groupings is correct as a whole, and so it is not possible for this taster to be right in picking the grouping which is the American and which the South African. We shall therefore concentrate in what follows on simply how many wines are picked correctly. The purpose of this paper is to provide more detailed tables for readers.
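The count of seven in the example above can be checked mechanically. The following is a minimal sketch, assuming that each of the taster's groups is given a single region label and that we try every way of attaching the three labels to the three groups; the variable names are illustrative only.

```python
from itertools import permutations

# The taster's three groups, written with the true region of each wine.
groups = [list("FFFA"), list("AAS"), list("SSF")]

# Attach one region label to each group in every possible way and count
# how many wines carry a label that agrees with their true region.
scores = {
    labels: sum(group.count(label) for group, label in zip(groups, labels))
    for labels in permutations("FAS")
}
best = max(scores.values())
print(scores)
print("largest number of wines picked correctly:", best)
```

Under these assumptions the most favorable labelling (first group French, second American, third South African) gives 3 + 2 + 2 = 7 correct wines, although no group is correct as a whole.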

2. Statistical Considerations

The underlying statistical theory of testing confronts the choices made by a person with what could have happened if choices had been made at random. Imagine that we have to identify six objects of which two are "X-objects" and four are "Y-objects," and assume that in reality, when the bottles are lined up from left to right, they are as follows:

X    X    Y    Y    Y    Y

and the choices we make are denoted by lower case letters, as in

x    y    x    y    y    y

It is obvious that two errors were made: the second X was identified as a y and the first Y was identified as an x. But if we were to assign x-s and y-s randomly (which might happen if the taster had lost his or her sense of taste), some wines would be correctly identified by accident. In fact, by a random assignment of x-s and y-s we might (rarely) even have all the wines identified correctly! It is possible to calculate what the probability is under a random assignment system to have no wine identified correctly, exactly one wine identified correctly, two wines identified correctly, and so on. (It should be obvious to the reader that if there are n wines, it is impossible to have exactly n-1 wines identified correctly: a label misapplied to one wine is necessarily missing from another.) The probabilities of the possible numbers of correct identifications, for some particular value of n, are called the probability distribution under the hypothesis of random assignments; an illustrative distribution for 12 wines of three types, with 4 wines in each type, is given in Table 1 below:


Table 1 Probability Distribution

      Number of wines            Probability
        identified

           0                       0.00999
           1                       0.05264
           2                       0.13091
           3                       0.20595
           4                       0.22935
           5                       0.18286
           6                       0.11359
           7                       0.04987
           8                       0.01974
           9                       0.00369
          10                       0.00139
          11                       0.00000
          12                       0.00003
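
A distribution of this kind can be computed exactly by counting. The sketch below is one possible way of doing so and is not necessarily how the table was produced: it assumes that every distinct arrangement of the labels over the bottles is equally likely, and it fills the bottles one at a time while keeping track of how many labels of each type remain. The function name match_distribution is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

def match_distribution(group_sizes):
    """Return [P(0 correct), ..., P(n correct)] under random assignment."""
    n = sum(group_sizes)
    # True type of each bottle, e.g. (2, 4) -> [0, 0, 1, 1, 1, 1].
    truth = [t for t, size in enumerate(group_sizes) for _ in range(size)]

    @lru_cache(maxsize=None)
    def count(pos, remaining):
        # ways[k] = number of ways to label bottles pos..n-1 with the
        # remaining labels so that exactly k of them are correct.
        if pos == n:
            return (1,)
        ways = [0] * (n - pos + 1)
        for label, left in enumerate(remaining):
            if left == 0:
                continue
            rest = remaining[:label] + (left - 1,) + remaining[label + 1:]
            hit = 1 if label == truth[pos] else 0
            for k, w in enumerate(count(pos + 1, rest)):
                ways[k + hit] += w
        return tuple(ways)

    counts = count(0, tuple(group_sizes))
    total = sum(counts)  # number of distinct label arrangements
    return [Fraction(c, total) for c in counts]

dist = match_distribution((4, 4, 4))
for k, p in enumerate(dist):
    print(f"{k:2d}   {float(p):.5f}")
```

For three groups of four the total number of arrangements is 12!/(4!4!4!) = 34,650, of which exactly one identifies all 12 wines and none identifies exactly 11.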
		

Thus, with a random identification of 12 wines (with groups of 4, 4, and 4) there is a chance of about 23% that 4 wines will be correctly identified, but there is only a little more than a one in a thousand chance that as many as 10 wines will be correctly identified. Statisticians then say that if an actual outcome is one which would be extremely unlikely with random identification, we can declare it to be "statistically significant"; that is to say, the departure of the actual result produced by a person is sufficiently unlikely on the assumption of randomness that we have to conclude that that person has not acted randomly, i.e., that the person really knows what he or she is doing. But what is the measure of "sufficiently unlikely"? Here statisticians employ a convention and use either 5% or 10% as the measure of unlikeliness; accordingly we would speak of a statistical result as being significant at the 0.1 level or the 0.05 level. Thus, we would declare the number of wines correctly identified by a person as significant if the probability of having that number or a greater number correctly identified is less than or equal to 0.1 or 0.05 respectively. In the above Table, the probability of having 8 or 9 or 10 or 11 or 12 wines correctly identified is 0.02485 (which is the sum 0.01974+0.00369+0.00139+0.00003), but the probability of having 7 or 8 or 9 or 10 or 11 or 12 wines correctly identified is 0.07472 (0.04987+0.01974+0.00369+0.00139+0.00003); hence, to be significantly good at identifying wines in the above situation at the 0.05 level, one must be able to identify at least 8 wines, and to be statistically significant at the 0.10 level one must be able to identify at least 7 of the wines. These cut-off values are called the "critical values", and it is immediately obvious that the lower the significance level, the higher the critical values, i.e., the more stringent the test is for declaring somebody a "good identifier."
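The tail sums and cut-offs in the preceding paragraph can be reproduced with a few lines of code. This is a minimal sketch that simply takes the probabilities as they are summed in the text (with 0.01974 for 8 wines); the names tail and critical_value are illustrative.

```python
# Probabilities of 0, 1, ..., 12 correct identifications for the case of
# 12 wines in groups of 4, 4 and 4, as used in the sums in the text.
probs = [0.00999, 0.05264, 0.13091, 0.20595, 0.22935, 0.18286,
         0.11359, 0.04987, 0.01974, 0.00369, 0.00139, 0.0, 0.00003]

def tail(k):
    """Probability of identifying k or more wines correctly."""
    return sum(probs[k:])

def critical_value(alpha):
    """Smallest k whose tail probability does not exceed alpha."""
    for k in range(len(probs)):
        if tail(k) <= alpha + 1e-12:  # small tolerance for rounding
            return k
    return None  # no outcome is significant at this level

print(f"P(8 or more) = {tail(8):.5f}")
print(f"P(7 or more) = {tail(7):.5f}")
print("critical value at 0.05:", critical_value(0.05))
print("critical value at 0.10:", critical_value(0.10))
```

The function returns None when even a perfect score has a probability above the chosen level, in which case no critical value exists.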

There are many situations in which the critical values are not terribly meaningful. We discuss only one example in detail. In this example, there are 10 wines which fall into two groups: one group has a single wine in it and the other group has 9. Unbeknownst to our taster, the wines are arrayed in a line as follows:


X     Y    Y    Y    Y    Y    Y    Y    Y    Y

There are only two possible outcomes: the person either identifies all ten wines correctly, or he identifies exactly eight wines correctly; and those are exactly the outcomes that can occur if identifications are made randomly. In how many ways can a lower case x and 9 lower case y-s be assigned to the above "slots" so that exactly 10 wines are identified correctly? The lone x must be assigned to the first slot (which is an X-slot) and there is no choice in that matter; the 9 y-s can be assigned to the remaining 9 Y-slots in 9! (9 factorial=9×8×7×6×5×4×3×2=362,880) ways. But the total number of ways in which 10 objects can be assigned to 10 slots is 10! (=3,628,800); hence the probability of all ten objects being correctly identified is 0.10. Since the only other possible outcome is that 8 objects are correctly identified, the probability of that outcome under random assignments must be 0.9. Such relatively lopsided results will occur if there are only two groups and one of the groups has very few items in it.
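The arithmetic of this example can be verified directly. The sketch below assumes, as in the text, that every placement of the single x label is equally likely; it tallies the number of correct identifications for each of the ten placements and also evaluates 9!/10!.

```python
from collections import Counter
from math import factorial

truth = "X" + "Y" * 9  # the true line-up: one X-wine followed by nine Y-wines

# Put the single x label on each of the 10 bottles in turn. Every placement
# corresponds to the same number (9!) of orderings of the y labels.
outcomes = Counter()
for slot in range(10):
    guess = ["Y"] * 10
    guess[slot] = "X"
    correct = sum(g == t for g, t in zip(guess, truth))
    outcomes[correct] += 1

p_all_correct = factorial(9) / factorial(10)
print(dict(outcomes))
print("P(10 correct) =", p_all_correct)
print("P(8 correct)  =", 1 - p_all_correct)
```

One placement gives 10 correct identifications and the other nine give 8, which is the 0.1 versus 0.9 split described above.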

Table 2 provides the 0.05 level and 0.1 level critical values for numerous cases. The left-hand column describes the case using the following notation. An entry such as 10;3,3,4 indicates that there are 10 wines with three subgroups containing 3 and 3 and 4 items respectively; 8;2,6 denotes a case with 8 wines, two groups with 2 and 6 items respectively.


Table 2 Critical Values

            Case                 Critical Values
                          0.05 level    0.10 level
			
           6;3,3                 6             6
           6;4,2                 -             6
	   6;2,2,2               6             6
	   7;3,4                 7             7
	   7;5,2                 7             7
           7;2,2,3               7             5
	   7;1,2,4               7             7
           8;4,4                 8             8
	   8;3,5                 8             8
	   8;2,6                 8             8
	   8;2,3,3               6             6
	   8;2,2,4               6             6
           8;2,2,2,2             5             5
	   9;4,5                 9             9
	   9;3,6                 9             9
	   9;3,3,3               6             6
           9;2,3,4               7             6
	   9;2,2,5               7             6
	   10;5,5               10            10
	   10;4,6               10            10
	   10;3,7               10            10
	   10;3,3,4              7             6
	   10;2,3,5              7             7
	   10;2,4,4              7             7
	   10;2,2,6              8             7
	   10;2,2,2,2,2          5             5
	   11;5,6               11             9
	   11;4,7               11             9
	   11;3,8               11            11
	   11;3,4,4              8             7
	   11;3,3,5              8             7
	   11;2,3,6              8             7
	   11;2,2,7              8             8
	   11;2,3,3,3            6             6
	   11;2,2,3,4            7             6
	   12;6,6               10            10
	   12;5,7               10            10
	   12;4,4,4              8             7
	   12;3,4,5              8             7
	   12;3,3,6              8             8
	   12;2,3,7              9             8
	   12;2,4,6              8             7
	   12;3,3,3,3            7             6
	   12;2,3,3,4            7             6
	   12;2,2,2,2,2,2        5             4
	   
	 

It is immediately evident that if the total number of wines is less than 11, and if there are only two categories, statistically significant identification requires that every wine be correctly identified. With two categories, a critical value short of 100% correct identification appears only if there are 11 or more wines. If the number of categories is greater than 2, it is harder to identify wines correctly by chance, and hence a smaller number of successful identifications serves as the critical value. But even for as many as 12 wines with three categories one needs to identify (at the 0.05 level) 8 or 9 wines in order for the result to be significant. It is particularly interesting to compare (12;4,4,4) with (12;3,3,3,3). These correspond to the case in which we have four different chateaux, each with the same three vintages: grouping by vintage gives three groups of four wines, and grouping by chateau gives four groups of three. The critical values are smaller in the latter case; hence, on purely probabilistic grounds it is easier to significantly match up the four sets of chateaux than the three sets of vintages. Finally, a similar comparison can be made between (12;6,6) and (12;2,2,2,2,2,2): if we have six sets of two wines (e.g., two specific vintages for six chateaux), it is much easier to reject the hypothesis of randomness by trying to identify which chateaux belong together rather than which vintages.
