Inspired by Glenford Myers's The Art of Software Testing, I did an experiment in July 2002 of having people supply implementations and/or test suites for the following small problem:
Given a function int f(int a, int b, int c) which takes 3 inputs that represent the lengths of the sides of a triangle, return an integer result, where:
- 1 means it is an equilateral triangle (all sides equal length)
- 2 means it is an isosceles triangle (exactly 2 equal sides)
- 3 means it is a scalene triangle (no equal sides)
- 4 means it is an illegal triangle
Participants wisely sought clarification on legality: define it as all sides must be positive length and the sum of any 2 sides must be greater than the third side. The goal of this little project was to see what sorts of mistakes people might make in implementing a small but not totally trivial function, and what sorts of test suites people might come up with.
Thanks to cow-orker Greg who supplied a test driver I further hacked, as well as several intentionally wrong-but-plausible implementations to help test the test suites. I built a few intentionally wrong-but-plausible implementations as well. One of them (the one using unsigned ints) turned out to be very subtle indeed as we shall see (and I owe cow-orker Moloch thanks for proving that it was indeed a buggy implementation as I intended it to be). The other participants were all RussCon gaming buddies, plus Moloch's friend kaanchan.
Click here for a zip file with the cpp source file (containing test driver and all the submitted functions) and all the supplied test suites.
Each row is an implementation (names ending in ERROR are the buggy ones, whether intentionally so or as submitted), each column is a test suite, and each cell shows how many of that suite's tests the implementation fails:

| Implementation | russtest.txt (33 tests) | gregtests (46 tests) | stevetestcases (17 tests) | stevetestcases2 (17 tests) | pjtests (343 tests) | claytests (57 tests) | dudleytests (14 tests) | dudleytests2 (14 tests) | jimpartialtests.txt (4 tests) | seabiscuitpartialtests.txt (5 tests) | jptests.txt (30 tests) | moloch_tests.txt (46 tests) | kaanchan_tests.txt (5010 tests) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| russ | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| russ ac ERROR | 3 | 1 | 1 | 1 | 14 | 3 | 1 | 0 | 1 | 1 | 2 | 1 | 466 |
| russ ab ERROR | 3 | 6 | 1 | 1 | 14 | 3 | 2 | 2 | 0 | 1 | 3 | 1 | 426 |
| russ bc ERROR | 3 | 1 | 1 | 1 | 14 | 3 | 1 | 0 | 1 | 1 | 2 | 1 | 466 |
| russ typo lessthan ERROR | 3 | 6 | 0 | 1 | 30 | 6 | 4 | 3 | 0 | 0 | 3 | 6 | 1 |
| russ unsigned ERROR | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 3 | 0 |
| greg | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| greg isoceles ERROR | 3 | 8 | 2 | 2 | 18 | 6 | 1 | 1 | 1 | 0 | 3 | 3 | 0 |
| greg ignores ERROR | 1 | 2 | 0 | 0 | 1 | 1 | 3 | 2 | 0 | 0 | 0 | 1 | 0 |
| greg overflow ERROR | 12 | 2 | 0 | 2 | 0 | 12 | 1 | 0 | 2 | 3 | 2 | 0 | 0 |
| steve ERROR | 3 | 5 | 1 | 1 | 8 | 3 | 2 | 1 | 1 | 1 | 1 | 2 | 1233 |
| pj ERROR | 13 | 3 | 0 | 2 | 0 | 13 | 1 | 0 | 3 | 4 | 2 | 1 | 0 |
| sea biscuit ERROR | 3 | 8 | 2 | 2 | 18 | 6 | 1 | 1 | 1 | 0 | 3 | 3 | 0 |
| jim 1 ERROR | 13 | 2 | 0 | 1 | 0 | 10 | 1 | 0 | 3 | 4 | 2 | 1 | 0 |
| jim 2 (fixed bug) ERROR | 13 | 2 | 0 | 1 | 0 | 10 | 1 | 0 | 3 | 4 | 2 | 1 | 0 |
| dudley ERROR | 13 | 4 | 0 | 1 | 0 | 10 | 1 | 0 | 3 | 4 | 2 | 2 | 0 |
| dudley 2 ERROR | 13 | 5 | 0 | 2 | 0 | 13 | 1 | 0 | 3 | 4 | 2 | 2 | 0 |
| jeff f ERROR | 13 | 3 | 0 | 2 | 0 | 13 | 1 | 0 | 3 | 4 | 2 | 1 | 0 |
| jp | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| moloch ERROR | 3 | 3 | 0 | 1 | 0 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| kaanchan ERROR | 13 | 3 | 0 | 2 | 0 | 13 | 1 | 0 | 3 | 4 | 2 | 1 | 0 |
| dave | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Only 4 people submitted correct implementations (myself, Greg, JP, and Dave). [Note that the first version of this report accidentally omitted Dave; his email to me with an implementation had been misplaced, alas. Sorry Dave!] (Actually I'm not 100% sure about JP's since it does wacky stuff with absolute values which makes it harder to reason about, but the various submitted test suites haven't found obviously wrong stuff with it... Our Most Assiduous Reader is encouraged to prove the correctness of JP's.)
By far the most common implementation error was forgetting about integer overflow. Most of the incorrect implementations included this as a bug. Solutions typically looked something like:
    int f(int a, int b, int c)
    {
        if (a <= 0 || b <= 0 || c <= 0) { return 4; }
        if (! (a + b > c && a + c > b && b + c > a)) { return 4; }
        if (a == b && b == c) { return 1; }
        if (a == b || b == c || a == c) { return 2; }
        return 3;
    }

and the calculation of (e.g.) a + b > c was subject to overflow. There is a simple solution: notice that you can subtract b from both sides and preserve the truth of the expression, so rewrite the expression to the mathematically equivalent a > c - b, which is guaranteed not to overflow (since the 3 inputs have already been verified to be positive by this point in the function).
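For concreteness, here is a minimal sketch (mine, not any particular participant's submission) of the common solution with the legality test rewritten in that overflow-safe subtraction form; the name f_overflow_safe is just for illustration:

    int f_overflow_safe(int a, int b, int c)
    {
        if (a <= 0 || b <= 0 || c <= 0) { return 4; }
        // Overflow-safe rewrites of the sum tests: a > c - b  <=>  a + b > c,
        // a > b - c  <=>  a + c > b, and b > a - c  <=>  b + c > a.
        // The subtractions cannot overflow because all sides are positive here.
        if (a <= c - b || a <= b - c || b <= a - c) { return 4; }
        if (a == b && b == c) { return 1; }
        if (a == b || b == c || a == c) { return 2; }
        return 3;
    }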
Several folks took unnecessarily complex approaches to dealing with overflow, e.g. JP's:
    if ( a <= abs(b-c) || b <= abs(a-c) || c <= abs(a-b) ) { return 4; }

(which was thus one of the few needing to #include <math.h>) [Update: emails with JP and JeffS reveal that indeed it is harder to rigorously prove that the abs method works, but we were eventually able to prove it; there is an intuitive geometric argument that supports it easily. There was an interesting discussion of intuition and obviousness and elegance. Since the non-absolute-value method is more concise, faster to execute, easier to prove correct, etc., I was befuddled that JP and JeffS found the abs method more natural. Eventually I grokked where they were coming from: the visualization of subtracting one side from another and seeing if the remaining side is long enough to cover that remaining amount. Working from the definition of legality given (that the sum of 2 sides is greater than the third) was not making it obvious at all to me why one would be doing absolute values on differences.]
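Not a proof, but a quick brute-force sanity check (my own illustration, not part of any submission) can at least confirm that the absolute-value legality test agrees with the sum-based definition over a small range of sides where overflow cannot occur:

    #include <cassert>
    #include <cstdlib>

    // Legal per the sum-based definition (safe here: sides are small).
    static bool legal_sum(int a, int b, int c)
    {
        return a + b > c && a + c > b && b + c > a;
    }

    // Legal per the absolute-value formulation.
    static bool legal_abs(int a, int b, int c)
    {
        return !(a <= std::abs(b - c) || b <= std::abs(a - c) || c <= std::abs(a - b));
    }

    int main()
    {
        for (int a = 1; a <= 50; ++a)
            for (int b = 1; b <= 50; ++b)
                for (int c = 1; c <= 50; ++c)
                    assert(legal_sum(a, b, c) == legal_abs(a, b, c));
        return 0; // no assertion fires: the two predicates agree on this range
    }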
Sea Biscuit's looks tragically like it correctly (though in a needlessly complicated way) handles overflow in testing for legality of side lengths, but sadly does the test too late, after the isosceles test, so that it mistakenly says 1 1 2 is isosceles instead of illegal:
    if ((a == b) || (a == c) || (b == c))
        return 2;
    if (((a<c)&&(b<c)&&(-c+a+b<=0)) ||
        ((b<a)&&(c<a)&&(-a+b+c<=0)) ||
        ((a<b)&&(c<b)&&(-b+a+c<=0)))
        return 4;
Then there was Moloch's resort to floating point and trigonometry:
    int f_moloch( int side_a, int side_b, int side_c )
    {
        double a = (double) side_a;
        double b = (double) side_b;
        double c = (double) side_c;
        double cos_gamma = ( pow(c,2) - pow(a,2) - pow(b,2) ) / (-2 * a * b);

        if ( side_a < 1 || side_b < 1 || side_c < 1 )
            return 4;
        else if ( cos_gamma >= 1 || cos_gamma <= -1 )
            return 4;
This solution fails on test cases such as my 2147483647 2147483647 1 2 (claiming this is an illegal triangle). I'll guess this is due to floating point error: with a = b = 2147483647 and c = 1, cos_gamma is mathematically (2a^2 - 1)/(2a^2), just barely below 1; but 2a^2 is far beyond the 2^53 range in which doubles represent integers exactly, so the subtraction of 1 is absorbed by rounding, cos_gamma evaluates as not less than 1, and the cos_gamma >= 1 check wrongly reports an illegal triangle. Either way, it demonstrates the wisdom of staying with discrete math when possible.
Another over-engineered solution was Dudley's, which bubble-sorted the 3 inputs into descending order before further analysis, leading to a much longer function than most. He later got the aha insight and submitted a second, simpler implementation. Jim also sorted the inputs before further analysis.
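For what it's worth, sorting does buy something: once the sides are ordered, legality reduces to a single overflow-safe comparison against the largest side. A minimal sketch of that idea (mine, not Dudley's or Jim's actual code):

    #include <algorithm>

    int f_sorted_sketch(int a, int b, int c)
    {
        if (a <= 0 || b <= 0 || c <= 0) { return 4; }
        int s[3] = { a, b, c };
        std::sort(s, s + 3);                    // ascending, so s[2] is the largest side
        if (s[0] <= s[2] - s[1]) { return 4; }  // i.e. s[0] + s[1] <= s[2], overflow-safe
        if (s[0] == s[2]) { return 1; }         // smallest equals largest: equilateral
        if (s[0] == s[1] || s[1] == s[2]) { return 2; }
        return 3;
    }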
Note there is a tendency to want to forget one of the necessary disjuncts in the isosceles tests (by analogy to only needing 2 conjuncts in the equilateral test), e.g.:
    if (a == b && b == c) { return 1; }
    if (a == b || b == c) { return 2; }

which I know several of us felt, but caught ourselves.
Another interesting mistake was Steve's if (a == b == c) return 1; instead of if (a == b && b == c) return 1; ... this is of course a classic C/C++ gotcha (a == b evaluates to 0 or 1, and that result is then compared to c). Too bad, since he dealt with overflow correctly (though with twice as many tests as needed, essentially analogous to JP's abs method):
    if (a <= (c-b)) return 4;
    if (b <= (c-a)) return 4;
    if (c <= (b-a)) return 4;
    if (a <= (b-c)) return 4;
    if (b <= (a-c)) return 4;
    if (c <= (a-b)) return 4;
Jim's code actually was uncompilable as submitted (a missing semicolon, which I corrected). But I'll give him credit for making a template function, so one could use numeric types other than int if desired.
I'd say Dave's correct code to classify (once legality was determined) is less clear than it could be:
    if(a == b)
    {
        if(a == c)
            return EQUILATERAL_TRIANGLE;
        else
            return ISOSCELES_TRIANGLE;
    }
    else
    {
        if(a == b || a == c || b == c)
            return ISOSCELES_TRIANGLE;
        else
            return SCALENE_TRIANGLE;
    }

E.g. note that the second a==b test can be omitted since it will never be true (since it's in the else clause of the main if(a==b) test). This is related to the confusing special treatment of an a==b isosceles triangle (as opposed to a==c or b==c). The more common variations on the following idea
    if (a == b && b == c) { return 1; }
    if (a == b || b == c || a == c) { return 2; }
    return 3;

seem more natural and readable to me. Anyone disagree? I've seen code style guidelines that suggest avoiding if statements inside both substatements of if statements, recommending instead doing a chain of tests rather than a branching tree of tests, and I find such advice to be useful. I think branching trees of tests often result from an attempt to optimize/minimize the number of conditions that get executed, at the expense of clarity. E.g. if Dave had omitted his redundant 2nd a==b test, then his solution would indeed be guaranteed never to test the same 2 sides for equality twice, unlike the more common answer. If the 2nd a==b was left in intentionally for clarity, then I submit the whole thing should just have been structured more like the common answer which seems much clearer still. I am glad Dave reminded me he'd sent in an implementation, since it coincidentally happened to affirm my claim that using unsigned ints to solve the overflow problem was a plausible approach...
Another example of redundant/cluttering code was in kaanchan's:
    if ( len1 == len2 || len2 == len3 || len1 == len3 )
        return 2;
    else // Scalene -> all 3 lengths are unequal
        if ( len1 != len2 && len2 != len3 && len1 != len3 )
            return 3;

Indeed a compiler might complain there is a code path that doesn't return a value in this solution. (It's an infeasible path: the scalene test will always succeed, so if we reach it, 3 will be returned.)
And speaking of unsigned ints, here is that intentionally strange unsigned implementation of mine:

    int f_russ_unsigned_error(int a, int b, int c)
    {
        // intentionally strange attempt to fix arithmetic overflow problem --
        // is this code correct or not? a puzzle for the reader! :)
        // it seems to work, yet only Moloch's test suite found it to be buggy.
        unsigned int au = a, bu = b, cu = c;
        if (au <= 0 || bu <= 0 || cu <= 0) { return 4; }
        if (! (au + bu > cu && au + cu > bu && bu + cu > au)) { return 4; }
        if (au == bu && bu == cu) { return 1; }
        if (au == bu || au == cu || bu == cu) { return 2; }
        return 3;
    }
Why didn't my, Greg's, or Clay's suites detect the bugs in my unsigned implementation? I myself was sloppy and used -INT_MAX instead of INT_MIN in my input file! Amazingly, this laziness on my part (putting -2147483647 instead of -2147483648) caused my suite to miss the unsigned implementation bug, because -INT_MAX INT_MAX INT_MAX does indeed produce 4 as expected from the buggy unsigned function, whereas INT_MIN INT_MAX INT_MAX erroneously produces 2. Perhaps even more amazingly, Greg made the same lazy error in his test suite!
To see why this is so, pretend we're using 4-bit instead of 32-bit integers. Then INT_MAX=0111 and INT_MIN=1000 and -INT_MAX=1001. Then -INT_MAX+INT_MAX=0000, so au+bu is not > cu=0111, and 4 is returned as expected. But INT_MIN+INT_MAX=1111 which is > cu=0111, so 4 is not returned, and the erroneous function goes on to decide the input is isosceles. Cool indeed! Who would have expected that such drastically different behavior could result from using -INT_MAX instead of INT_MIN? I am delighted to see it unexpectedly shown so clearly that testing the minimum value in a domain is crucial, even if this does show my own test suite was sloppy! :)
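To see it concretely, here is a quick check of my own (assuming the f_russ_unsigned_error function shown above is linked in):

    #include <climits>
    #include <cstdio>

    int f_russ_unsigned_error(int a, int b, int c); // the buggy unsigned function above

    int main()
    {
        // -INT_MAX converts to 2147483649u, so au + bu wraps around to 0, the
        // legality test catches it, and the expected 4 is returned.
        printf("%d\n", f_russ_unsigned_error(-INT_MAX, INT_MAX, INT_MAX)); // prints 4
        // INT_MIN converts to 2147483648u, none of the unsigned sums wrap past cu,
        // and the function wrongly reports isosceles.
        printf("%d\n", f_russ_unsigned_error(INT_MIN, INT_MAX, INT_MAX));  // prints 2
        return 0;
    }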
Had Greg & I used INT_MIN instead of -INT_MAX, we would have had correct test suites. Clay's test suite mysteriously used values that were a few integers away from INT_MIN and INT_MAX, e.g. his original text had 2 MAX_INT-4 MAX_INT-5 3 and MAX_INT-5 2 MAX_INT-4 3 -- more evidence that one should actually use the min and max legal values, rather than just values close to them.
Of course many suites did not detect the integer overflow bug. This confirms the folk wisdom that someone writing tests for their own code is likely to have the same blind spots when writing tests as when writing code. It also suggests the usefulness of having a set of formal heuristics to guide test generation rather than relying on random test cases or intuition.
A very illuminating lesson was that quality rather than quantity is what matters in a test suite. The 2 largest suites (343 and 5010 cases) did not perform very well. The best suites were mine (33), Greg's (46), Clay's (57), Steve's (17), and Moloch's (46). I suspect 30 or so tests are needed to have good confidence in an implementation of this function.
PJ and kaanchan both committed a fascinating tautological fallacy: they used their implementations to automatically generate their test suites. Therefore their own implementations vacuously pass their own test suites! E.g. PJ systematically generated all combinations of input values from -1 to 5, making 7^3 = 343 test cases. I'm not sure what kaanchan did, but it looks like random generation of values, and he had 5010 tests! These are way larger suites than necessary, and what's worse, the suites are erroneous (both generating some false negatives, though interestingly neither generates a false positive because they happen not to have tried input cases that would cause overflow -- had they generated such input, their expected results would be wrong and cause some false positives as well). Note they failed to detect some buggy implementations, and totally "over-detected" some others (e.g. kaanchan's generating hundreds of failures on some of the buggy implementations, where of course only 1 failure is sufficient to point out that a bug exists).
The lesson there is that tests cannot be created from the implementation. They need to be crafted independently of the implementation, else they'll have the same bugs as the implementation.
Further, I believe strongly that random testing is not nearly as useful as carefully crafted test cases. (I think random test generation can be useful in some circumstances, e.g. just to verify that an implementation doesn't crash. But to verify correct results being returned, you need a separate mechanism for determining the correct expected results, and you need more wisely chosen input values.) E.g. domain analysis should be done and test cases selected to cover the endpoints of domains. In particular, this problem has interesting partitions of the positive (legal) inputs and the nonpositive (illegal) input values for a given parameter, so there should be at least one test case covering each of the 4 interesting boundary conditions a==INT_MIN, a==0, a==1, a==INT_MAX, for example (a lesson I repeat here with egg on my face, given the lesson of the buggy unsigned function!)
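For example, a handful of boundary-value cases in the same whitespace-separated "a b c expected" format quoted elsewhere in this write-up (my illustration, with INT_MIN and INT_MAX written out):

    -2147483648 5 5 4
    0 5 5 4
    1 1 1 1
    1 1 2 4
    2147483647 2147483647 1 2
    2147483647 2147483647 2147483647 1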
There was one false positive test: Dudley sent a suite, then minutes later sent a corrected suite, since his first had the bogus test case 7 7 15 2 (7 7 15 is illegal, since 7 + 7 is not greater than 15), which he changed to 7 7 13 2. That bogus case is presumably the single dudleytests failure charged to each of the correct implementations in the table above.
I highly recommend Brian Marick's The Craft Of Software Testing for further discussion of test case design.
Thanks to all who participated! This was a fun little experiment, and I hope you found it interesting.