Triangle Classification Problem

[Updated 2002/08/29 by adding Dave's lost implementation to the test suite and report, as well as further discussion of various implementations.]

Inspired by Glenford Myers's The Art of Software Testing, I did an experiment in July 2002 of having people supply implementations and/or test suites for the following small problem:

Given a function int f(int a, int b, int c) which takes 3 inputs representing the lengths of the sides of a triangle, return an integer where the result is 1 if the triangle is equilateral, 2 if it is isosceles, 3 if it is scalene, and 4 if the inputs do not form a legal triangle.

Participants wisely sought clarification on legality: we defined a legal triangle as one whose sides all have positive length and where the sum of any 2 sides is greater than the third side. The goal of this little project was to see what sorts of mistakes people might make in implementing a small but not totally trivial function, and what sorts of test suites people might come up with.

Thanks to cow-orker Greg who supplied a test driver I further hacked, as well as several intentionally wrong-but-plausible implementations to help test the test suites. I built a few intentionally wrong-but-plausible implementations as well. One of them (the one using unsigned ints) turned out to be very subtle indeed as we shall see (and I owe cow-orker Moloch thanks for proving that it was indeed a buggy implementation as I intended it to be). The other participants were all RussCon gaming buddies, plus Moloch's friend kaanchan.

Click here for a zip file with the cpp source file (containing test driver and all the submitted functions) and all the supplied test suites.

Results

This table shows the number of triggered test failures for each test suite against each implementation. The implementations marked ERROR are buggy. A good test suite should therefore show a positive number against every buggy implementation and a 0 against every correct one. A zero against a buggy implementation is a false negative: the suite failed to detect a buggy implementation. A positive number against a correct implementation is a false positive: the suite claims there is a bug when there is none.
Test suites (columns, left to right, with number of tests): russtest.txt (33), gregtests (46), stevetestcases (17), stevetestcases2 (17), pjtests (343), claytests (57), dudleytests (14), dudleytests2 (14), jimpartialtests.txt (4), seabiscuitpartialtests.txt (5), jptests.txt (30), moloch_tests.txt (46), kaanchan_tests.txt (5010).
russ: 0 0 0 0 0 0 1 0 0 0 0 0 0
russ ac ERROR: 3111143101121466
russ ab ERROR: 3611143220131426
russ bc ERROR: 3111143101121466
russ typo lessthan ERROR: 36013064300361
russ unsigned ERROR: 0 0 0 0 0 0 1 0 0 0 0 3 0
greg: 0 0 0 0 0 0 1 0 0 0 0 0 0
greg isoceles ERROR: 38221861110330
greg ignores ERROR: 1 2 0 0 1 1 3 2 0 0 0 1 0
greg overflow ERROR: 122020121023200
steve ERROR: 3511832111121233
pj ERROR: 133020131034210
sea biscuit ERROR: 38221861110330
jim 1 ERROR: 132010101034210
jim 2 (fixed bug) ERROR: 132010101034210
dudley ERROR: 134010101034220
dudley 2 ERROR: 135020131034220
jeff f ERROR: 133020131034210
jp: 0 0 0 0 0 0 1 0 0 0 0 0 0
moloch ERROR: 3 3 0 1 0 2 1 0 0 0 1 0 0
kaanchan ERROR: 133020131034210
dave: 0 0 0 0 0 0 1 0 0 0 0 0 0

Implementation issues

Only 4 people submitted correct implementations (myself, Greg, JP, and Dave). [Note that the first version of this report accidentally omitted Dave; his email to me with an implementation had been misplaced, alas. Sorry Dave!] (Actually I'm not 100% sure about JP's, since it does wacky stuff with absolute values that makes it harder to reason about, but the various submitted test suites haven't found anything obviously wrong with it... Our Most Assiduous Reader is encouraged to prove the correctness of JP's.)


By far the most common implementation error was forgetting about integer overflow; most of the incorrect implementations exhibited it. Solutions typically looked something like:

int f(int a, int b, int c)
{
    // reject nonpositive side lengths
    if (a <= 0 || b <= 0 || c <= 0) {
        return 4;
    }
    // triangle inequality -- but a + b (etc.) can overflow for large ints!
    if (! (a + b > c && a + c > b && b + c > a)) {
        return 4;
    }
    // all three sides equal: equilateral
    if (a == b && b == c) {
        return 1;
    }
    // exactly two sides equal: isosceles
    if (a == b || b == c || a == c) {
        return 2;
    }
    return 3;   // scalene
}
and the calculation of (e.g.) a + b > c was subject to overflow. There is a simple solution: notice that you can subtract b from both sides and preserve the truth of the expression, so rewrite the expression to the mathematically equivalent a > c - b which is guaranteed not to overflow (since the 3 inputs have already been verified to be positive by this point in the function).
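Concretely, the overflow-prone legality check above might be rewritten along these lines (a sketch, not any particular participant's code):

    // Overflow-safe triangle-inequality test: a + b > c is rewritten as
    // a > c - b, etc.  Safe because a, b, c are already known to be positive.
    if (! (a > c - b && a > b - c && b > a - c)) {
        return 4;
    }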

Several folks took unnecessarily complex approaches to dealing with overflow, e.g. JP's

	if ( a <= abs(b-c) || b <= abs(a-c) || c <= abs(a-b) )
	{
		return 4;
	}
(which was thus one of the few submissions needing to #include <math.h>) [Update: emails with JP and JeffS reveal that it is indeed harder to rigorously prove that the abs method works, but we were eventually able to prove it; there is an intuitive geometric argument that supports it easily. (In short, a > |b-c| unpacks to a + b > c and a + c > b, so requiring the analogous condition for each side recovers all three triangle inequalities.) There was an interesting discussion of intuition, obviousness, and elegance. Since the non-absolute-value method is more concise, faster to execute, and easier to prove correct, I was befuddled that JP and JeffS found the abs method more natural. Eventually I grokked where they were coming from: the visualization of subtracting one side from another and seeing whether the remaining side is long enough to cover that difference. Working from the definition of legality given (that the sum of any 2 sides is greater than the third), it was not at all obvious to me why one would be taking absolute values of differences.]

Sea Biscuit's submission correctly (though needlessly complicatedly) handles overflow in its test for legality of side lengths, but tragically does the test too late, after the isosceles test, so it mistakenly says 1 1 2 is isosceles instead of illegal:

   if ((a == b) || (a == c) || (b == c))
      return 2;
   if (((a<c)&&(b<c)&&(-c+a+b<=0)) ||
       ((b<a)&&(c<a)&&(-a+b+c<=0)) ||
       ((a<b)&&(c<b)&&(-b+a+c<=0)))
      return 4;

Then there was Moloch's resort to floating point and trigonometry:

int f_moloch( int side_a, int side_b, int side_c )
{
	double a = (double) side_a;
	double b = (double) side_b;
	double c = (double) side_c;

	double cos_gamma = ( pow(c,2) - pow(a,2) - pow(b,2) ) / (-2 * a * b);

	if ( side_a < 1 || side_b < 1 || side_c < 1 )
		return 4;

	else if ( cos_gamma >= 1 || cos_gamma <= -1 )
		return 4;

This solution fails on test cases such as my 2147483647 2147483647 1 (expected result 2, isosceles), claiming this is an illegal triangle. I'm not sure, but I'll guess this is due to floating point rounding error, which would demonstrate the wisdom of staying with discrete math when possible.
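A quick standalone check (my own sketch, not part of Moloch's submission) supports the rounding explanation: for a = b = 2147483647 and c = 1 the true cosine is 1 - 1/(2a^2), which is far closer to 1 than double precision can distinguish, so cos_gamma rounds to exactly 1.0 and the >= 1 test fires:

    #include <stdio.h>

    int main(void)
    {
        // same computation as cos_gamma above, for a = b = 2147483647, c = 1
        double a = 2147483647.0, b = 2147483647.0, c = 1.0;
        double cos_gamma = (c*c - a*a - b*b) / (-2.0 * a * b);
        printf("%.17g\n", cos_gamma);        // prints 1 with IEEE doubles
        printf("%d\n", cos_gamma >= 1.0);    // prints 1: the "illegal" branch fires
        return 0;
    }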


Another over-engineered solution was Dudley's, which bubble-sorted the 3 inputs into descending order before further analysis, leading to a much longer function than most. He later had the "aha" insight and submitted a second, simpler implementation. Jim also sorted the inputs before further analysis.
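To be fair, sorting does buy something: once the sides are ordered, only one triangle inequality needs to be checked. A sketch of that style (my own illustration, not Dudley's or Jim's actual code):

    int f_sorted_style(int a, int b, int c)
    {
        int t;
        // order the sides so that a <= b <= c
        if (a > b) { t = a; a = b; b = t; }
        if (b > c) { t = b; b = c; c = t; }
        if (a > b) { t = a; a = b; b = t; }

        if (a <= 0)           return 4;   // smallest side nonpositive: illegal
        if (a <= c - b)       return 4;   // a + b <= c, written overflow-safely
        if (a == c)           return 1;   // smallest == largest: equilateral
        if (a == b || b == c) return 2;   // exactly two sides equal: isosceles
        return 3;                         // scalene
    }

Whether the swaps are worth the saved comparisons is debatable, but it is at least a coherent reason to sort.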


Note there is a tendency to forget one of the necessary disjuncts in the isosceles test (by analogy with needing only 2 conjuncts in the equilateral test), e.g.:

    if (a == b && b == c) {
        return 1;
    }
    if (a == b || b == c) {
        return 2;
    }
which I know several of us felt tempted by, but caught ourselves; note that with this code an input like 3 4 3 would wrongly be classified as scalene.

Another interesting mistake was Steve's if (a == b == c) return 1; instead of if (a == b && b == c) return 1; ... this is of course a classic C/C++ gotcha: a == b == c parses as (a == b) == c, so the 0-or-1 result of the first comparison gets compared with c (which is why it can appear to work for 1 1 1 yet fail for 5 5 5). Too bad, since he dealt with overflow correctly (though with twice as many tests as needed, essentially analogous to JP's abs method):

  if (a <= (c-b)) return 4; 
  if (b <= (c-a)) return 4;
  if (c <= (b-a)) return 4;
  if (a <= (b-c)) return 4; 
  if (b <= (a-c)) return 4;
  if (c <= (a-b)) return 4;
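For readers who haven't been bitten by that gotcha, a tiny standalone demonstration (mine, not Steve's code):

    #include <stdio.h>

    int main(void)
    {
        int a = 5, b = 5, c = 5;
        // a == b == c parses as (a == b) == c, so the 0-or-1 result of the
        // first comparison is then compared with c
        printf("%d\n", a == b == c);       // prints 0: (5 == 5) == 5 -> 1 == 5 -> false
        printf("%d\n", a == b && b == c);  // prints 1: the intended equilateral test
        return 0;
    }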

Jim's code actually was uncompilable as submitted (it was missing a semicolon, which I corrected). But I'll give him credit for making a template function so one could use numeric types other than int if desired.


I'd say Dave's correct code to classify (once legality was determined) is less clear than it could be:

	if(a == b)
	{
		if(a == c)
			return EQUILATERAL_TRIANGLE;
		else
			return ISOSCELES_TRIANGLE;
	}
	else
	{
		if(a == b || a == c || b == c)
			return ISOSCELES_TRIANGLE;
		else
			return SCALENE_TRIANGLE;
	}
E.g. note that the second a==b test can be omitted, since it can never be true (it sits inside the else clause of the main if(a==b) test). This is related to the confusing special treatment of an a==b isosceles triangle (as opposed to a==c or b==c). The more common variations on the following idea
    if (a == b && b == c) {
        return 1;
    }
    if (a == b || b == c || a == c) {
        return 2;
    }
    return 3;
seem more natural and readable to me. Anyone disagree? I've seen code style guidelines that suggest avoiding if statements nested inside both branches of an if statement, recommending a chain of tests rather than a branching tree of tests, and I find such advice useful. Branching trees of tests often result from an attempt to minimize the number of conditions that get executed, at the expense of clarity. E.g. if Dave had omitted his redundant 2nd a==b test, his solution would indeed be guaranteed never to test the same 2 sides for equality twice, unlike the more common answer. If the 2nd a==b test was left in intentionally for clarity, then I submit the whole thing should simply have been structured like the common answer, which seems clearer still. I am glad Dave reminded me he'd sent in an implementation, since it coincidentally happened to affirm my claim that using unsigned ints to address the overflow problem was a plausible approach...

Another example of redundant/cluttering code was in kaanchan's:

        if ( len1 == len2 || len2 == len3 || len1 == len3 )
            return 2;

        else    // Scalene -> all 3 lengths are unequal
        if ( len1 != len2 && len2 != len3 && len1 != len3 )
            return 3;
Indeed a compiler might complain that there is a code path in this solution that doesn't return a value. (It's an infeasible path: if the isosceles test fails, all 3 lengths must be unequal, so the scalene test will always succeed and 3 will be returned.)
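Dropping the redundant test (which also quiets the compiler) would leave simply this (a trivial simplification, not kaanchan's actual code):

        if ( len1 == len2 || len2 == len3 || len1 == len3 )
            return 2;

        return 3;   // anything reaching this point is scalene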

Test suite issues

Perhaps surprisingly, nobody submitted an entirely correct test suite (i.e. one which detected the erroneous implementations and didn't reject correct ones). Ignoring my interesting unsigned implementation, 3 suites were otherwise correct: mine (33 tests), Greg's (46 tests), and Clay's (57 tests). I had begun to think I'd accidentally created a correct implementation with my unsigned function! However, my unsigned implementation incorrectly classifies the input INT_MIN INT_MAX INT_MAX as isosceles instead of as illegal. Only Moloch's test suite detected this! Sadly, Moloch's suite accepts other buggy implementations. This goes to show the surprising difficulty of testing even a simple function like the triangle classification function.

I reproduce this interesting implementation here. My reasoning in writing it was that one can add 2 nonnegative signed ints represented as unsigned ints and know that there will be no overflow, so one might write the function with this strategy and forget that the tests for negative values are then broken if done after casting to unsigned. Thus I think this is a "fair", plausible buggy implementation:
int f_russ_unsigned_error(int a, int b, int c)
{
    // intentionally strange attempt to fix arithmetic overflow problem -- is this code correct or not?
    // a puzzle for the reader! :)
    // it seems to work, yet only Moloch's test suite found it to be buggy.
    unsigned int au = a, bu = b, cu = c;
    if (au <= 0 || bu <= 0 || cu <= 0) {
        return 4;
    }
    if (! (au + bu > cu && au + cu > bu && bu + cu > au)) {
        return 4;
    }
    if (au == bu && bu == cu) {
        return 1;
    }
    if (au == bu || au == cu || bu == cu) {
        return 2;
    }
    return 3;
}

Why didn't my suite, Greg's, or Clay's detect the bug in my unsigned implementation? I myself was sloppy and used -INT_MAX instead of INT_MIN in my input file! Amazingly, this laziness on my part (putting -2147483647 instead of -2147483648) caused my suite to miss the bug, because -INT_MAX INT_MAX INT_MAX does indeed produce 4 as expected from the buggy unsigned function, whereas INT_MIN INT_MAX INT_MAX erroneously produces 2. Perhaps even more amazingly, Greg made the same lazy error in his suite!

To see why this is so, pretend we're using 4-bit instead of 32-bit integers. Then INT_MAX=0111, INT_MIN=1000, and -INT_MAX=1001. So -INT_MAX+INT_MAX=0000 in unsigned arithmetic, meaning au+bu is not > cu=0111, and 4 is returned as expected. But INT_MIN+INT_MAX=1111, which is > cu=0111, so 4 is not returned and the erroneous function goes on to decide the input is isosceles. Cool indeed! Who would have expected such drastically different behavior from using -INT_MAX instead of INT_MIN? I am delighted to see it demonstrated so clearly (if unexpectedly) that testing the minimum value in a domain is crucial, even if this does show that my own test suite was sloppy! :)
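The same effect can be seen at 32 bits with a couple of lines of C (my sketch, assuming 32-bit ints and the usual modulo-2^32 unsigned arithmetic):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int b = INT_MAX, c = INT_MAX;
        unsigned int a1 = -INT_MAX;   // the value my suite (lazily) tested
        unsigned int a2 = INT_MIN;    // the value it should have tested

        printf("%u\n", a1 + b);       // 0: not > c, so the buggy f returns 4 anyway
        printf("%u\n", a2 + b);       // 4294967295: > c, so the legality test passes
        printf("%d\n", (a2 + b) > c); // 1: and the buggy f goes on to say isosceles
        return 0;
    }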

Had Greg and I used INT_MIN instead of -INT_MAX, we would have had correct test suites. Clay's test suite mysteriously used values that were a few integers away from INT_MIN and INT_MAX, e.g. his original text had 2 MAX_INT-4 MAX_INT-5 3 and MAX_INT-5 2 MAX_INT-4 3 -- more evidence that one should actually use the minimum and maximum legal values, rather than just values close to them.


Of course many suites did not detect the integer overflow bug. This confirms the folk wisdom that someone writing tests for their own code is likely to have the same blind spots when writing tests as when writing code. It also suggests the usefulness of having a set of formal heuristics to guide test generation, rather than writing random test cases or going by intuition.


A very illuminating lesson was that quality matters more than quantity in a test suite. The 2 largest suites (343 and 5010 cases) did not perform very well. The best suites were mine (33 tests), Greg's (46), Clay's (57), Steve's (17), and Moloch's (46). I suspect 30 or so well-chosen tests are enough to have good confidence in an implementation of this function.

PJ and kaanchan both committed a fascinating tautological fallacy: they used their own implementations to automatically generate their test suites, so their implementations vacuously pass their own suites! E.g. PJ systematically generated all combinations of input values from -1 to 5, making 7^3 = 343 test cases. I'm not sure what kaanchan did, but it looks like random generation of values, and he had 5010 tests! These suites are far larger than necessary, and what's worse, they are erroneous (both produce some false negatives, though interestingly neither produces a false positive, because they happen not to have tried inputs that would cause overflow -- had they generated such inputs, their expected results would have been wrong and caused some false positives as well). Note they failed to detect some buggy implementations, and totally "over-detected" others (e.g. kaanchan's suite generates hundreds of failures on some of the buggy implementations, where of course a single failure is sufficient to show that a bug exists).

The lesson there is that tests cannot be created from the implementation. They need to be crafted independently of the implementation, else they'll have the same bugs as the implementation.

Further, I believe strongly that random testing is not nearly as useful as carefully crafted test cases. (I think random test generation can be useful in some circumstances, e.g. just to verify that an implementation doesn't crash. But to verify that correct results are returned, you need a separate mechanism for determining the expected results, and you need more wisely chosen input values.) E.g. domain analysis should be done and test cases selected to cover the endpoints of each domain. In particular, this problem partitions each parameter into positive (legal) and nonpositive (illegal) values, so there should be at least one test case covering each of the 4 interesting boundary conditions a==INT_MIN, a==0, a==1, and a==INT_MAX, for example (a lesson I repeat here with egg on my face, given the lesson of the buggy unsigned function!).
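For example, a handful of boundary cases for this problem might look like the following (an illustrative sketch written as direct calls against f, not any participant's actual suite; expected values follow the 1/2/3/4 convention above):

    #include <assert.h>
    #include <limits.h>

    int f(int a, int b, int c);   // the function under test

    void boundary_cases(void)
    {
        assert(f(INT_MIN, INT_MAX, INT_MAX) == 4);   // negative side: illegal
        assert(f(0, 1, 1) == 4);                     // zero side: illegal
        assert(f(1, 1, 1) == 1);                     // smallest equilateral
        assert(f(INT_MAX, INT_MAX, INT_MAX) == 1);   // largest equilateral
        assert(f(INT_MAX, INT_MAX, 1) == 2);         // overflow-prone isosceles
        assert(f(1, 2, INT_MAX) == 4);               // 1 + 2 is not > INT_MAX: illegal
    }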


There was one false positive test: Dudley sent a suite, then minutes later sent a corrected suite, since his first had the bogus test case 7 7 15 2 (7 7 15 is illegal, since 7 + 7 is not greater than 15), which he changed to 7 7 13 2.


I highly recommend Brian Marick's The Craft Of Software Testing for further discussion of test case design.

Thanks to all who participated! This was a fun little experiment, and I hope you found it interesting.
