Test-Driven Hypocrisy? Who tests the test?
secretGeek .:dot Nuts about dot Net:.

An oft-heard mantra in Test-Driven Development is "if it's not tested, it's broken", and I have to admit that this slogan makes me cringe -- and leads to some of my own hidden objections to TDD.

"If it's not tested it's broken" -- okay it's a blatant exaggeration -- yet this seems to be lost on a lot of people. What it really means is something more like:

"If it's not tested with unit tests then it's unlikely to be tested elsewhere and hence we'd be making a fairly safe bet, to assume that it contains bugs."

But somehow this has less punch than the exaggerated mantra.

"If tested it ain't, broken it is."

I like exaggeration -- hell, I probably like it a thousand times more than you do -- but I get annoyed when people take exaggerations literally.

For kicks, let's apply the principle literally and see where it gets us:

Say you write some code. Oops. You should've written tests first. Your code's broke.

So now you write code to test your code.

Oops, the code you wrote to test your code is broke. Fool! You didn't write any tests to test the tests that test the code that broke.

Stack overflow. Goodnight.

Discuss.

(First -- thanks to Punky for raising this conundrum in the comments on the previous post.)

Okay -- wait a moment -- I think I see what went wrong. Let's investigate.

In the following dialogue there are two characters:

Agl: An Agilista.
Dvl: The Devil's Advocate.

Agl: We don't write tests to test our unit tests -- so the 'stack overflow' doesn't occur.

Dvl: But that means your tests are broken -- by your own definition.

Agl: No, it only means there's a chance that they're broken. So we work to make this chance as small as possible. Firstly, we write very simple unit tests -- the smallest things that can possibly break...

Dvl: Yeh yeh, small unit tests are less likely to contain bugs -- or likely to contain fewer bugs -- but you need lots of tests, thus lots of code and at least a few bugs. I say again, your unit tests are broke and the whole approach is bunk.

Agl: No, we start off with small unit tests that fail. On purpose -- we write the unit tests before we write the code. The tests fail at first. Then we write the simplest code we can to pass the unit tests. We run it again and now the tests pass. By doing that, we've gotten a hidden benefit -- we've tested our unit tests. And it's at that stage that we sometimes find our unit tests were broken.
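The red-then-green sequence the Agilista describes can be sketched with Python's standard `unittest` module. This is a minimal illustration, not the author's code: `add` is a hypothetical function, and defining it twice simply stands in for "write the test first, then the simplest code that makes it pass".

```python
import io
import unittest


# Step 1 (red): the test exists, but the code it targets is only a stub.
# `add` is a hypothetical function invented for this sketch.
def add(a, b):
    raise NotImplementedError


class AddTest(unittest.TestCase):
    def test_adds_two_numbers(self):
        self.assertEqual(add(2, 3), 5)


def run_tests():
    """Run AddTest quietly and return True only if every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddTest)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite).wasSuccessful()


red = run_tests()  # the stub raises, so the test fails


# Step 2 (green): the simplest code that makes the same test pass.
def add(a, b):
    return a + b


green = run_tests()  # the test is unchanged; only the code changed
print(f"first run passed: {red}, second run passed: {green}")
```

Seeing the first run fail is what shows the test can detect the missing behaviour at all; the second run then checks the code against a test that has already been seen to go red.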

Dvl: Nonsense -- just because the tests failed at first and then passed doesn't mean the tests were correct. It might be that there's a bug in both the test and the code being tested. I can easily write up a situation like that.

Agl: True, you can, and it's definitely possible. But it's all about probabilities. We wager that it's unlikely that a pair of bugs together will display that behaviour. And even if they do, we've reduced the likelihood of bugs by an order of magnitude. Certain combinations of bugs will slip through -- but that's a much better situation than the situation we had before, where all bugs slipped through.
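Here is the kind of situation the Devil's Advocate has in mind, as a small hypothetical example: the code and its test share the same mistaken belief that every year divisible by 4 is a leap year, so the test passes (and would have gone red-then-green as well). The function and test names are invented for illustration; the standard library's `calendar.monthrange` serves as an independent check.

```python
import calendar


def days_in_february(year):
    # Bug: forgets the century rule (1900 was not a leap year; 2000 was).
    return 29 if year % 4 == 0 else 28


def test_days_in_february():
    # The test author shares the same mistaken belief, so this expectation is wrong too.
    assert days_in_february(1900) == 29
    assert days_in_february(2023) == 28


test_days_in_february()  # no AssertionError: two matching bugs give a false green

# An independent oracle disagrees with both the code and the test.
# calendar.monthrange returns (weekday of the 1st, number of days in the month).
print(calendar.monthrange(1900, 2)[1])  # 28
```

The false pass needs both bugs to line up on exactly the same wrong answer, which is the combination the Agilista is betting is rare.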

Dvl: Weeeel, I'm unconvinced.

Agl: Look, just try it. Plenty of people have tried it and found that their projects are more successful, with fewer bugs because of it.

Dvl: It's impossible to know that for certain unless they've performed a repeatable, placebo-controlled, double-blind experiment. And the economics of the situation dictate that no-one would ever pay for that kind of experimentation. If they have, show me the journal and I'll show you the experimental flaws.

Agl: You're just being an a**hat now. Talk to people who've used it and they'll tell you how effective it is.

Dvl: Oh I see, it's like magnets and crystals for healing. Anecdotal evidence a-plenty.

Agl: Then try it for yourself! People who try it seldom turn back.

Dvl: I've heard the same thing about heroin and nicotine. Maybe it creates a psychological dependency. Seems feasible. I wonder if being 'test-infected' would respond to OCD medication?

Agl (closing eyes, covering ears and shouting): SHUT UP! SHUT UP! SHUT UP!

lb: Okay... anyone got a better response?





'Haacked' on Mon, 12 Mar 2007 22:38:40 GMT, sez:

Hey Leon, you ever heard of the term "triangulation"? Why is it that sailors carried three sextants with them (no, it's not for three times the fun)?

If you only have one, you have no idea if the measurement is faulty. If you have two, and they agree, chances are very good they are both working and not wrong for the same reason.

If you have three, then the two that match are almost certainly correct.

Unit tests are similar.

If a test fails, either the test, the code, or both are wrong. But at least you know something is wrong.

However, if a decent test passes, chances are slim that both the code and test are faulty in just the right way to cause the test to pass (though it's possible).

Stack overflow avoided!
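Haacked's argument can be put into rough numbers. The probabilities below are made-up illustrative assumptions, not measurements, and the model assumes bugs occur independently and that a correct test always catches a bug in the code it exercises.

```python
def false_pass_probability(p_code_wrong, p_test_wrong, p_same_wrong_answer):
    """Chance that a bug in the code slips past its test.

    Simplifying assumptions for this sketch: bugs in the code and in the
    test occur independently, and a correct test always catches a bug in
    the code it exercises. A buggy unit therefore slips through only if the
    test is also wrong AND its wrong expectation matches the code's wrong
    answer exactly.
    """
    return p_code_wrong * p_test_wrong * p_same_wrong_answer


# Illustrative numbers only.
p_code_wrong = 0.10         # the code has a bug
p_test_wrong = 0.05         # the test has a bug
p_same_wrong_answer = 0.01  # the two bugs agree on the exact same wrong result

without_test = p_code_wrong  # with no test, every bug slips through
with_test = false_pass_probability(p_code_wrong, p_test_wrong, p_same_wrong_answer)
print(f"without a test: {without_test:.2%}, with a test: {with_test:.4%}")
```

The multiplication is the whole point: a false green needs three things to go wrong at once, so even sloppy tests shrink the escape rate considerably under these assumptions.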



'Haacked' on Mon, 12 Mar 2007 22:41:20 GMT, sez:

p.s. another way to look at this: if you have one atomic clock, you should test it, no? But with what? Another atomic clock. Now if they both match, do you really need to test the second atomic clock with a third?

Only if they are different do you worry.



'lb' on Mon, 12 Mar 2007 22:56:38 GMT, sez:

Hi Phil -- i came up with an idea a couple of years back called 'Duplicate Driven Programming' that relies on this same principle --

http://secretgeek.net/duplicate_op.asp



'lb' on Mon, 12 Mar 2007 23:16:03 GMT, sez:

This also reminds me -- something I've been tinkering with in my spare time at work is refactoring a massive stored procedure used by a client of ours.

This thing is several thousand lines long, and terribly hard to understand, but a lot of money passes through it, and it works.

So I'm just quietly reworking it to be more maintainable, more agile, etc.

To test that I haven't changed its behaviour, I have a particular database I run it against that puts it through its paces. This is real data taken from a real site that I've verified gives full coverage of the stored procedure.

I then compare the output of the 'refactored' stored procedure to the output from the original stored procedure, using WinMerge (a diff tool).

Any better suggestions for refactoring (and testing changes to) a stored procedure?
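The comparison lb describes (run both versions against the same data, then diff the results) can also be scripted so that it runs on every change instead of being checked by eye. A minimal sketch, assuming each procedure's result set has already been fetched and rendered as lines of text; `original_proc` and `refactored_proc` are hypothetical Python stand-ins for the two stored procedures, and their running-total logic is invented for illustration.

```python
import difflib


# Hypothetical stand-ins for the original and refactored stored procedures.
# In the real setting each would run against the test database and its
# result set would be dumped to text, one row per line.
def original_proc(rows):
    total = 0
    out = []
    for r in rows:
        total += r["amount"]
        out.append(f'{r["id"]},{r["amount"]},{total}')
    return out


def refactored_proc(rows):
    out = []
    running = 0
    for r in rows:
        running = running + r["amount"]
        out.append(f'{r["id"]},{r["amount"]},{running}')
    return out


def compare_outputs(rows):
    """Return a unified diff of the two outputs; an empty list means they match."""
    before = original_proc(rows)
    after = refactored_proc(rows)
    return list(difflib.unified_diff(before, after, "original", "refactored", lineterm=""))


sample = [{"id": 1, "amount": 10}, {"id": 2, "amount": -3}, {"id": 3, "amount": 5}]
diff = compare_outputs(sample)
print("identical" if not diff else "\n".join(diff))
```

When the outputs differ, the unified diff pinpoints the changed rows much as WinMerge would, but in a form that can fail a build.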



'Haacked' on Mon, 12 Mar 2007 23:18:26 GMT, sez:

Heh heh. Duplicate Driven Development suffers from a key disadvantage compared to TDD. It doesn't take advantage of the law of large numbers.

http://en.wikipedia.org/wiki/Law_of_large_numbers

Suppose you and I are trying to estimate the acceleration of a car by taking 10 measurements each. You sum up your 10 measurements of the car's position over time. I do the same with mine. We then average our two totals.

Or, after each measurement, we average that particular measurement. Then we sum up our averages.

So method 1 = average the sums.
Method 2 = sum the averages.

Which is going to be more accurate?



'Haacked' on Mon, 12 Mar 2007 23:22:02 GMT, sez:

Regarding your Stored Proc, sounds like you're on the right track with good test data and code coverage.

The only thing I would recommend is to think of the next guy who has to maintain that SP. Are there any small pieces of functionality you can pull into a UDF or smaller stored proc?

If you can refactor that SP so it's a series of calls to smaller logical units (basic functional decomposition) and then write tests for each of those smaller procs, you'll be helping the next person out a lot.

The problem with having one gargantuan test is that if something goes wrong, sometimes all you know is something went wrong, and tracking it down is a pain.

If you have the code well factored, then one of the "constituent" tests might fail as well as the big test, thus narrowing your focus.

Make sense?



'lb' on Mon, 12 Mar 2007 23:26:21 GMT, sez:

>a UDF or smaller stored proc

Yep, the main thing I'm doing is breaking it down into UDFs and smaller sprocs.

but i've still got just this one monster test for the whole thing. maybe i can refactor my test into a set of tests for each independent part (but keep the monster test as well).
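lb's plan (small tests for each extracted piece, with the monster test kept as a safety net) might look like this in outline. Python stands in for SQL here, and `apply_discount`, `add_tax` and `process_order` are hypothetical pieces invented for the sketch; amounts are whole cents so the arithmetic stays exact.

```python
import io
import unittest


def apply_discount(cents, percent):
    """Extracted unit 1: take a whole-percent discount off an amount in cents."""
    return cents * (100 - percent) // 100


def add_tax(cents, percent):
    """Extracted unit 2: add whole-percent tax to an amount in cents."""
    return cents * (100 + percent) // 100


def process_order(cents, discount_percent, tax_percent):
    """The former monster, now just a sequence of calls to smaller units."""
    return add_tax(apply_discount(cents, discount_percent), tax_percent)


class ConstituentTests(unittest.TestCase):
    # A failure here points straight at the piece that broke.
    def test_apply_discount(self):
        self.assertEqual(apply_discount(10000, 10), 9000)

    def test_add_tax(self):
        self.assertEqual(add_tax(9000, 10), 9900)


class EndToEndTest(unittest.TestCase):
    # The original "monster" test, kept to check the whole flow.
    def test_process_order(self):
        self.assertEqual(process_order(10000, 10, 10), 9900)


def run(*cases):
    """Run the given TestCase classes quietly; True if all tests passed."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite(loader.loadTestsFromTestCase(c) for c in cases)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite).wasSuccessful()


print(run(ConstituentTests, EndToEndTest))
```

If the end-to-end test fails alongside one constituent test, that constituent is the first place to look, which is the narrowing effect Haacked mentions.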



'Shog9' on Mon, 12 Mar 2007 23:35:04 GMT, sez:

We need bee watcher watcher watchers, damnit! Think of the Children!



'lb' on Mon, 12 Mar 2007 23:35:11 GMT, sez:

@haacked, re: Which is going to be more accurate?

Well, I'd reject your data out of principle and outsource a second opinion.

nah, i'm not sure what you're getting at here. i know that if you choose to average an average you have to look at the weighting of each average in the set. (you don't just sum em up and divide by the number of items)
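lb's point about weighting can be checked with a few lines of arithmetic. The two groups below are made-up numbers of different sizes: the plain mean of the group means differs from the overall mean, while the mean weighted by group size matches it.

```python
def mean(xs):
    return sum(xs) / len(xs)


# Two made-up groups of measurements with different sizes.
group_a = [10, 12]
group_b = [20, 22, 24, 26]

means = [mean(group_a), mean(group_b)]   # group means: 11.0 and 23.0
naive = mean(means)                      # ignores group sizes
weights = [len(group_a), len(group_b)]
weighted = sum(m * w for m, w in zip(means, weights)) / sum(weights)
overall = mean(group_a + group_b)        # mean of all six values

print(naive, weighted, overall)
```

Weighting each group mean by its group size recovers the overall mean; dividing by the number of groups only works when the groups are the same size.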



'Haacked' on Mon, 12 Mar 2007 23:43:55 GMT, sez:

I would definitely keep the large test around.

My point is that if you and I are pair programming, we'll be correcting our "measurements" often.

Thus, in the end, our product is of higher quality than if you and I went off, wrote it individually, then took the better project.

That's what I'm getting at by invoking the principle of large numbers. If we work independently, it's like we're each taking one large measurement and then comparing, rather than comparing a lot of smaller measurements and thus arriving at a more accurate result.



'Haacked' on Mon, 12 Mar 2007 23:46:06 GMT, sez:

p.s. I should add, I'm not actually a Pair Programming advocate. I'm rather neutral on that. Never tried it. Not really in a position to since I work for a distributed company.

But I think the idea applies to TDD. If you test often during the project, it's better than a big testing push at the end.

Likewise, if we review each other's code often, it's better than if we worked independently and presented our results at the end.



'Kalpesh' on Tue, 13 Mar 2007 00:41:08 GMT, sez:

The quote should say "If it is not tested, it might break soon" - when there is a change in the software.



'PJW' on Tue, 13 Mar 2007 03:50:20 GMT, sez:

Bah,
"If it's not tested with unit tests then it's unlikely to be tested elsewhere and hence we'd be making a fairly safe bet, to assume that it contains bugs."
Since code is likely to change, if you cannot change it easily then your code is broken.
Unit tests allow the code to be changed and its behavior to be verified to remain the same/correct, or nearly so.

The difficulty of production code is an order of magnitude more difficult than unit test code. As a result, the number of bugs per useful unit of code (b/uuc) in unit tests grows nearly as O(n), while the b/uuc in production code can easily grow as O(n^2) or faster (O(n^3), O(2^n), ...). The same argument can be applied to modifying existing code.

Therefore, when n is very large nearly all of the bugs in the production code will be caught.

If the b/uuc of the production code in question is of order O(n), the benefits of unit testing are minimal. This is also true when n is small, causing the b/uuc for the unit tests and the b/uuc for the production code to approach one another.



'Owen' on Tue, 13 Mar 2007 03:53:43 GMT, sez:

Damn Leon, you really are stoking up a discussion. I like it!

As with any mantra, technique or tool, people like to talk it up. TDD doesn't mean that we think the code is broken if it's not tested, although I have heard that phrase uttered a couple of times, and each time it's been questioned! The idea is that if you can't prove it works, then why bother? Wouldn't you rather have a unit test that is repeatable, is pretty certain to catch problems with the current state of the code, and doesn't need much maintaining? Then plug in a CI process and you've got a constant level of quality checking without loading too many people with more work.
The alternative is to write code and have developers check that it works by doing some simple tests ("oh yeah, the screen looks OK"), that is, until they become bored and don't bother any more.

There is usually no need for any real logic in a test (I would worry if there were, as this would point to a highly non-orthogonal system), so the risk of defects within the test is negligible. That said, the whole idea can quickly become corrupted.



'lb' on Tue, 13 Mar 2007 04:56:53 GMT, sez:

@PJW:
>The difficulty of production code is an
>order of magnitude more difficult than
>unit test code

ah! this is true, and very relevant.

Here's an extreme example:

say you have written a facial recognition function. very difficult stuff, right. the unit tests for this would be quite simple.

for example,

send a picture of Mahatma Gandhi to the function,
assert that the function returns "Mahatma Gandhi".

the complexity of the unit test is very small, even though the function under test would be extremely complex.
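The asymmetry lb describes can be sketched as follows. `recognize_face` is hypothetical: a real version would run a recognition model over the image, so a lookup table stands in here purely to make the sketch runnable, and the image path is also invented. The point is that the test stays a single assertion however complex the function behind it becomes.

```python
import io
import unittest


def recognize_face(image_path):
    """Hypothetical, very complex function under test.

    A real implementation would load the image and run a recognition model;
    this lookup table is only a placeholder so the sketch runs.
    """
    known_faces = {"photos/gandhi.jpg": "Mahatma Gandhi"}
    return known_faces.get(image_path, "unknown")


class RecognizeFaceTest(unittest.TestCase):
    def test_recognizes_gandhi(self):
        # The entire test: one input, one expected answer.
        self.assertEqual(recognize_face("photos/gandhi.jpg"), "Mahatma Gandhi")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RecognizeFaceTest)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print("passed" if result.wasSuccessful() else "failed")
```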

great point PJW!

It's only because of this imbalance in difficulty between unit test code and 'production code' that TDD becomes feasible.






'Peter {faa780ce-0f0a-4c28-81d2-3667b71287fd}' on Tue, 13 Mar 2007 05:47:19 GMT, sez:

Leon, somewhere in the post above I think you discovered the secret to time travel. That, or a clean, renewable energy source. Either way, congratulations!



'Mike Woodhouse' on Tue, 13 Mar 2007 07:35:39 GMT, sez:

Your approach to the large-stored-proc problem (and I've been there myself and I sympathise) is, IIRC, summed up in Michael Feathers' book, "Working Effectively With Legacy Code". The thing is, it's a very different matter to introduce test-driven techniques to code that wasn't built that way in the first place. So you start from the outside in: wrap it and run it, capturing the results. Then you can start to refactor with greater confidence that you're not changing the function (since refactoring is only about improving design). You do need to keep checking that your test wrapper has enough data/paths to keep the confidence level high, though.
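A minimal sketch of the "wrap it and run it, capturing the results" step, in the spirit of what Feathers calls characterization tests. The legacy routine, its refactored version and the input range are all hypothetical; in lb's case the inputs would be the test database and the captured snapshot would be the stored procedure's output.

```python
import json


def legacy_calculate(x):
    """Hypothetical legacy routine whose current behaviour must be preserved."""
    if x < 0:
        return 0
    return x * 3 + 1 if x % 2 else x // 2


def capture(func, inputs):
    """'Wrap it and run it': record what the code does today, right or wrong."""
    return json.dumps({str(i): func(i) for i in inputs}, sort_keys=True)


def mismatches(func, snapshot):
    """Return the inputs whose output no longer matches the captured snapshot."""
    recorded = json.loads(snapshot)
    return [key for key, value in recorded.items() if func(int(key)) != value]


def refactored_calculate(x):
    """Restructured for readability; behaviour should be unchanged."""
    if x < 0:
        return 0
    if x % 2 == 0:
        return x // 2
    return 3 * x + 1


snapshot = capture(legacy_calculate, range(-3, 10))
print(mismatches(refactored_calculate, snapshot))  # empty list: no change detected
```

The snapshot would normally be captured once, before touching the code, and stored (for example in a file) so that every later refactoring step is checked against it.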

As regards TDD in general, I find it particularly useful when I'm exploring a requirement without having a clear idea how it's going to want to be built: the tests make me think about the thing in small chunks and then provide a security blanket when I need to refactor. And to be honest, I do find it rather fun, which is no bad thing after almost 30 years of programming.



'punky' on Tue, 13 Mar 2007 10:13:35 GMT, sez:

The discussion goodness continues :-)

Haacked,

While I appreciate the point of mutual checking between test and production code (indeed it's my experience that many bugs are found that way, and early on too), I will side with the man with the horns & hoof, and object to the atomic clock comparison. I don't think any of my code stands up to that particular standard, and hopefully I'm not that much worse than your average coder. In other words (plain and depressing ones at that), both my test code and my production code are much more likely to be wrong than an atomic clock is. And so the chance of unfortunate interplay between bugs leading a faulty test to pass might not be so slim after all. What's worse is that the green bar might prevent us from looking for those kinds of bugs (making them that much more insecticide-resilient), precisely because of the sense of security established by the mutual check.

Having said that, I think you're absolutely right from a pragmatic point of view :-)



'Scott' on Tue, 13 Mar 2007 10:21:02 GMT, sez:

LB, I'm doing a similar refactoring - sad part is I wrote the original stored procedure... under the gun and with much less knowledge than I have now.

But I'm not spending time refactoring just for the sake of refactoring - I spend a bit of time on it each time I am asked to add a feature or fix a bug.

If I had small fast tests available, I know this would be a much easier process. But, like Mike Woodhouse is pointing out, it's hard to shoehorn tests into a project that started without tests.



'Haacked' on Tue, 13 Mar 2007 23:50:46 GMT, sez:

@Punky Unfortunately, sometimes analogies can throw people off the main point.

The main point is not that your code and unit tests are finely tuned like an atomic clock.

Take Leon's earlier example. Suppose you write a method to perform facial recognition. You then write a unit test that feeds the code a pic of Gandhi and asserts that the method returns "Gandhi".

How do you test your unit test? Well, if you get a passing result, then your unit test is probably fine. Why? What are the odds that your unit test is faulty, and that the code returns the exact faulty result your unit test expects?

However, if the test fails, now you have to manually dig in and figure out whether it's the unit test or the code in question that's at fault. There is manual work to be done by us as developers. ;)

This is what I mean by the atomic clock analogy. You don't have to write a test to test the test in this case. You'll have pretty good confidence that the test is ok if the test passes.

That's all I'm trying to say.



'Haacked' on Tue, 13 Mar 2007 23:54:50 GMT, sez:

@Punky... I should mention, I've had a faulty test pass once. But it was so incomprehensibly, mind-bogglingly unlikely, I ran out and bought a lottery ticket. The point is, it can happen, but the chances are amazingly slim if you follow the Red Green Factor.

Red Green Factor. Make sure the test fails first (red). Write the code to make it pass. Make sure it passes (green).

Also, as another commenter mentioned, most unit tests are much easier to write than production code (not all, you should see the hackery I do to simulate HttpContext). If you keep the tests small, you're less likely to have just the type of bug that conspires with your code to give you a false green positive.



'lb' on Wed, 14 Mar 2007 01:34:56 GMT, sez:

>most unit tests are much easier to write
>than production code (not all, you should
>see the hackery I do to simulate
>HttpContext).

okay -- for this kind of thing I think you want to refactor the complexity out of your unit test.

have the unit test very very simple -- but let it rely on your 'httpContextSimulator' class (for instance).

then you can use the 'httpContextSimulator' class over and over in various unit tests (it's a one-off investment).

and you'll want other unit tests to check that 'httpContextSimulator' behaves the way you expect. this might sound like a case of 'testing the tests' -- but it's not: it's a case of testing the unit testing framework.

the unit testing framework ideally would already have all the features you want, fully tested by other people -- and this provides an economy of scale, most of the time.
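lb's suggestion can be sketched as a reusable fake plus its own tests. This is Python rather than ASP.NET, and `FakeHttpContext` (standing in for lb's 'httpContextSimulator') and the `greeting` function are hypothetical: they do not model the real `HttpContext` API, only the shape of the idea.

```python
import io
import unittest


class FakeHttpContext:
    """Hypothetical reusable test helper (lb's 'httpContextSimulator').

    The fiddly setup lives here, written once and reused by many tests.
    """

    def __init__(self, path="/", query=None, headers=None):
        self.path = path
        self.query = dict(query or {})
        # Store header names lower-cased so lookups are case-insensitive.
        self.headers = {k.lower(): v for k, v in (headers or {}).items()}

    def header(self, name, default=None):
        return self.headers.get(name.lower(), default)


def greeting(ctx):
    """Hypothetical code under test that reads from the request context."""
    name = ctx.query.get("name", "stranger")
    return f"Hello, {name}!"


class FakeHttpContextTests(unittest.TestCase):
    # Tests for the helper itself: testing the test framework, not the tests.
    def test_headers_are_case_insensitive(self):
        ctx = FakeHttpContext(headers={"User-Agent": "probe"})
        self.assertEqual(ctx.header("user-agent"), "probe")


class GreetingTests(unittest.TestCase):
    # The unit tests themselves stay trivially simple.
    def test_greets_by_name(self):
        self.assertEqual(greeting(FakeHttpContext(query={"name": "Leon"})), "Hello, Leon!")

    def test_default_name(self):
        self.assertEqual(greeting(FakeHttpContext()), "Hello, stranger!")


def run(*cases):
    """Run the given TestCase classes quietly; True if all tests passed."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite(loader.loadTestsFromTestCase(c) for c in cases)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite).wasSuccessful()


print(run(FakeHttpContextTests, GreetingTests))
```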



'punky' on Wed, 14 Mar 2007 12:08:06 GMT, sez:

Haacked,

Aye, I was never really opposed to the main point, I was just nit-picking about the images of fault-proofness conjured up in my head as a response to the atomic clock analogy - they seemed to rub off on the real subject matter, somehow.

I agree that oftentimes, unit tests will be much simpler than the code they test. For more complex cases, I find myself drifting towards a data-driven approach similar to the Gandhi example, although I sometimes get the impression that this is regarded by purists as non-unit-testy (?). You know, don't rely on anything that requires hitting the file system.

Disregarding that, when e.g. testing operations on graph structures, I find it tedious (and error-prone) to manually construct a graph node by node; I'd rather specify my input data succinctly in a file. On a similar note, sometimes I wish I could assert that the output of my code is equal to some specified file (think diff).
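A minimal sketch of the data-driven style punky describes: the graph is written succinctly as text and the expected output is compared as a string. The edge-list format, the breadth-first traversal and all names are assumptions for illustration; in practice `GRAPH_SPEC` and `EXPECTED` would be read from files kept beside the tests, which is the file-system dependency some purists object to.

```python
from collections import defaultdict, deque

# Inline here to keep the sketch self-contained; normally loaded from files.
# Assumed format: one "source target" edge per line.
GRAPH_SPEC = """
a b
b c
c d
a e
"""
EXPECTED = "a b e c d"  # expected breadth-first order starting from 'a'


def parse_edges(text):
    """Build an adjacency list from the succinct text specification."""
    graph = defaultdict(list)
    for line in text.strip().splitlines():
        src, dst = line.split()
        graph[src].append(dst)
    return graph


def bfs_order(graph, start):
    """Code under test: visit nodes breadth-first, neighbours in edge order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def test_bfs_from_spec():
    actual = " ".join(bfs_order(parse_edges(GRAPH_SPEC), "a"))
    assert actual == EXPECTED, f"expected {EXPECTED!r}, got {actual!r}"


test_bfs_from_spec()
```

Comparing whole rendered outputs this way is the "assert equal to some specified file" idea; on failure, the message shows both strings so they can be diffed.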

Another problem I've been pondering is: how do I verify that e.g. an image manipulation algorithm correctly manipulates the source image? It's a chicken-and-egg kind of situation: I cannot verify that my manipulation works as intended until I obtain, through programming or praying (think Google), a correctly manipulated image.

I apologize for this comment turning into a brain dump, but that's how it goes.



'Eric' on Thu, 15 Mar 2007 19:08:30 GMT, sez:

I think you're missing the coupling between code and tests.

If I have code without tests, I have no automated way to know whether it works (or continues to work).

Under TDD, my goal is to write a test that will test whether the code works. I run the test, and it fails.

That doesn't show that my test is correct, but it shows that it's testing something. Then I write the code to get the test to pass.

At that point, I have a coupled set of code and test that is known to work.

What can happen?

Well, a bug could show up in the main code. That should break a test if I have good test coverage.

Or, a bug could show up in the test code. This is pretty unlikely, because test code doesn't get revised very often and it's simple to start with.

One can posit a case where the verification part of test code works at the beginning but because of some change it now always passes.

But that's a pretty unlikely occurrence.
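The failure mode Eric describes, where a test's verification quietly stops doing anything, might look like this. Everything here is hypothetical: a later edit filters the test cases with a condition that matches nothing, so the loop never runs and the check passes even against deliberately broken code. Repeating the red step (confirming the test still fails against broken code) is one way to notice.

```python
def sum_positive(values):
    """Hypothetical code under test: sums only the positive values."""
    return sum(v for v in values if v > 0)


def broken_sum_positive(values):
    """A deliberately wrong version, used to check that a test can still fail."""
    return 0


CASES = [([1, -2, 3], 4), ([-1, -1], 0)]


def check_original(func):
    for values, expected in CASES:
        assert func(values) == expected


def check_after_change(func):
    # A later edit added a filter that accidentally matches no cases,
    # so the assertion below never executes.
    active_cases = [case for case in CASES if case[1] > 100]
    for values, expected in active_cases:
        assert func(values) == expected


def goes_red(check, func):
    """True if the check fails (raises AssertionError) against func."""
    try:
        check(func)
    except AssertionError:
        return True
    return False


print(goes_red(check_original, broken_sum_positive))      # the original check still bites
print(goes_red(check_after_change, broken_sum_positive))  # the changed check can never fail
```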

Or, to put it another way, you ask "what verifies that the unit tests don't have bugs?"

The answer is "the code that they are testing..."



'J. B. Rainsberger' on Mon, 14 May 2007 15:40:18 GMT, sez:

It's simple: the tests test the code, the code tests the tests and the brain helps us screw the same thing up twice in a row.



'J. B. Rainsberger' on Mon, 14 May 2007 15:41:18 GMT, sez:

Good example: if I'd written a test for that comment, I'd see I flipped a boolean, like I always do.

"...and the brain helps us /avoid/ screwing up the same thing twice in a row."

Yes. I see the irony.



'lb' on Mon, 14 May 2007 20:42:40 GMT, sez:

@J.B. Rainsberger:
Yeh -- i liked the first version best:

>and the brain helps us screw the same thing up twice in a row



'J. B. Rainsberger' on Sun, 01 Jul 2007 03:53:15 GMT, sez:

@lb

Me, too. The first version was both funny and wrong, and therefore doubly funny. There's nothing like looking stupid to undermine one's point.



'portraits art' on Wed, 28 Nov 2007 00:08:33 GMT, sez:

That’s very true. It’s the same thing that I used to tell the people here in our place. It’s really weird that some in-house programmers would test their code or programs using their own code, which isn’t even tested. But just like other scientific procedures, is there a list of standardized code which can be used to test initial code?




home .: about .: sign up .: sitemap .: secretGeek RSS .: © Leon Bambrick 2006 .: privacy
