Why generate test code from source code?
Firstly - isn't that backwards?
Shouldn't the tests drive the development? And so shouldn't we try to generate the source from tests?
No, it is forwards:
Generating tests from source does not necessarily mean "source-driven" development. A test-generator might be used after the source has been developed, with that development driven by the "most significant" tests written by the coder. In this scenario the generator is used simply to increase coverage.
(Besides, lazy programmers like me are in greater need of a test-generator than the extreme programmers are of a source-generator.)
So, why is generating tests better than generating source?
I've tried writing tests, and I'm not so good at it that I would trust source generated from them. I would be constantly jumping between windows to touch up the source. I'd trust the code I write far more. (And I do not think I am unusual in this: most programmers with X years of experience have been practising writing source code for X years, not test code.)
OTOH, I would be (initially) more inclined to trust test code generated from source code, because the process of generating it seems to be deterministic: we can determine the full set of tests for each condition in the source. Generating test code should also be easier than generating source code, because test code uses far fewer clichés than source code (80% of my own unit tests are "merely" lists of data, with driving loops).
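To make the "lists of data, with driving loops" style concrete, here is a minimal sketch in Python. The function under test (`is_leap_year`), the case table and the `run_cases` helper are all hypothetical names chosen for illustration; they are not taken from any particular generator.

```python
def is_leap_year(year):
    """Hypothetical function under test (Gregorian leap-year rule)."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# The "list of data": each row is (input, expected result).
CASES = [
    (1900, False),
    (2000, True),
    (2023, False),
    (2024, True),
]


def run_cases(func, cases):
    """The "driving loop": return every row where func disagrees with the table."""
    failures = []
    for arg, expected in cases:
        actual = func(arg)
        if actual != expected:
            failures.append((arg, expected, actual))
    return failures


if __name__ == "__main__":
    # An empty list means every row passed.
    print(run_cases(is_leap_year, CASES))
```

In this style a generator only has to produce the rows of the table, since the loop is the same for every test.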
However, my trust in generated test code is conditional on solving the problem of combinatorial explosion: every source condition requires more than one test, and the tests for each condition are combined, not sequenced, so the number of cases multiplies rather than adds. Effectively a "complete" test is an instance of the Halting Problem. Hence automated test code should be judged on how it chooses what to test, as well as on the effectiveness of the tests that are generated.
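The growth is easy to see in a small sketch. It assumes, purely for illustration, three independent conditions that each need three test values; the values and the helper name `combined_cases` are made up.

```python
from itertools import product


def combined_cases(per_condition_values):
    """Combine the test values of every condition into full test cases."""
    return list(product(*per_condition_values))


# Hypothetical test values for three independent conditions.
values = [
    [4, 5, 6],
    [-1, 0, 1],
    [9, 10, 11],
]

cases = combined_cases(values)
print(len(cases))  # 27, i.e. 3 * 3 * 3

# With three values per condition, the total is 3 ** n for n conditions.
for n in (1, 5, 10, 20):
    print(n, 3 ** n)
```

Sequenced tests would need only 9 cases here (3 + 3 + 3), while combined tests need 27, and the gap widens with every added condition.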
I wish to distinguish two kinds of testing, "macro-testing" and "micro-testing", which differ in how wide a context the tester uses. Consider testing the following condition: "i == 5". A "micro-test" is one which generates test data of <5, >5 and 5, because 5 is the only immediately relevant datum. A "macro-test" might generate far more test data (or less), because a larger context shows that "i" could have a number of values, each of which may need testing. Or a "macro-test" may generate no test data at all (because "i" always has the value 5, so the condition in the source code is redundant).
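A micro-test generator for a condition like "i == 5" could be sketched as follows. This is only one possible approach, assuming the condition is available as Python source text and using the standard `ast` module to find the constant; the function name `micro_test_values` is hypothetical.

```python
import ast


def micro_test_values(condition_source):
    """Return below/equal/above values for each integer constant in a comparison.

    Only plain non-negative integer literals are handled in this sketch;
    anything else (names, negative literals, floats) is ignored.
    """
    tree = ast.parse(condition_source, mode="eval")
    values = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Compare):
            for operand in [node.left, *node.comparators]:
                is_int = (
                    isinstance(operand, ast.Constant)
                    and isinstance(operand.value, int)
                    and not isinstance(operand.value, bool)
                )
                if is_int:
                    c = operand.value
                    values.extend([c - 1, c, c + 1])
    return values


print(micro_test_values("i == 5"))  # [4, 5, 6]
```

Note that the only context used is the condition itself, which is what makes this the "micro" end of the scale.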
"Macro" and "micro" are gradations on a grey scale, not distinct categories, and the more "micro" a test is, the easier it is to generate. For example: the "<5, >5 and 5" above are easy to find, whereas the context for a "fully macro" test is the full system (including libraries). The more macro-tests generated by the tester, the more confidence I shall have in it.
As we increase the amount of relevant context from micro to macro, we move from a "normal" problem, which might best be handled by an algorithmic/symbolic approach, to another variety of the Halting Problem, better suited to AI solutions. I would suggest that a variety of methods will be needed to handle enough cases.
It would also seem sensible to tackle the easier problem first: once enough context has been "processed" for a macro test, it will still have to go through the same process of generating data for test cases as micro tests (except probably with more data for each test case).
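One way to picture that layering is the sketch below. It assumes some wider analysis (not shown, and the hard part) has already worked out the set of values "i" can take; that set is passed in as the hypothetical `domain` argument, and both function names are made up for illustration.

```python
def micro_values(constant):
    """Micro step: values below, at and above the constant."""
    return [constant - 1, constant, constant + 1]


def macro_values(constant, domain):
    """Macro step for a condition `i == constant`.

    `domain` is the set of values that the wider context says `i` can take
    (an assumed input here). The micro step is reused, then adjusted by context.
    """
    domain = set(domain)
    outcomes = {value == constant for value in domain}
    if len(outcomes) < 2:
        # The condition is always true or always false in this context,
        # so it is redundant and needs no test data.
        return []
    candidates = set(micro_values(constant)) | {min(domain), max(domain)}
    return sorted(candidates & domain)


print(macro_values(5, {5}))            # [] because i is always 5
print(macro_values(5, range(0, 101)))  # [0, 4, 5, 6, 100]
```

The macro step produces no data when the context makes the condition redundant, and more data than the micro step when the context adds values of its own (here, the ends of the range).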
Notes:
Programming is the only profession where laziness is a virtue.
How many cases is "enough"? This is another instance of the Halting Problem, but shortcuts (aka heuristics) to solving it are available, for example: "Enough tests to generate 80% coverage of the source", or "As many tests as it takes to test all the requirements".
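The coverage heuristic can be read as a stopping rule. The sketch below assumes each candidate test comes with the set of branches it covers; how those sets are measured is outside the sketch, and the names (`select_until_covered`, the branch numbers, the test names) are hypothetical.

```python
def select_until_covered(candidates, all_branches, target=0.8):
    """Pick tests in order until the covered fraction reaches the target.

    candidates: list of (test_name, set_of_branches_covered) pairs.
    Returns the chosen test names and the coverage fraction reached.
    """
    covered = set()
    chosen = []
    for name, branches in candidates:
        if len(covered) / len(all_branches) >= target:
            break  # "enough" by the heuristic: stop generating
        if branches - covered:
            # Keep only tests that cover something new.
            chosen.append(name)
            covered |= branches
    return chosen, len(covered) / len(all_branches)


all_branches = {1, 2, 3, 4, 5}
candidates = [
    ("t1", {1, 2}),
    ("t2", {2}),
    ("t3", {3, 4}),
    ("t4", {5}),
]
print(select_until_covered(candidates, all_branches))  # (['t1', 't3'], 0.8)
```

The rule says nothing about whether the uncovered 20% matters, which is why it is a shortcut rather than an answer.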