Testing has been described as an art (The Art of Software Testing by Glenford Myers), a craft (The Craft of Software Testing by Brian Marick), and a process (Effective Methods for Software Testing by William E. Perry). I would like to examine another aspect of testing: the Science of Software Testing.

A Brief Background

I graduated college with a Bachelor of Science degree as a Math major, which was accidental. I started out as an electrical engineering major but changed late in my Junior year when I discovered that EE really wasn't as appealing to me as I had originally thought it would be. In my schooling, I was trained in the traditional scientific method, which affects how I see things.

The Traditional Scientific Method

The traditional scientific method has been the predominant method for people to observe and understand the operation of the world and the universe. In recent years, some scientists have developed methods that are less rigorous, but the traditional method is what I will use as the basis for this article. The steps in the traditional method are:

1. Observe some aspect of the universe.

2. Invent a theory that is consistent with what you have observed.

3. Use the theory to make predictions.

4. Test those predictions by experiments or further observations.

5. Modify the theory in the light of your results.

6. Go to step 3.

There are differing views among scientists today as to what constitutes a theory, a hypothesis and a fact. How someone defines these terms can greatly affect their view of science. To fully expound on these differing views of the scientific method and how the terms are defined is beyond the scope of this article. It is important, however, to understand that there is often a level of bias, since people tend to hold definitions that are consistent with their beliefs about how the world operates. This can become circular reasoning: if I am trying to explain how and why something happens, my explanation will rest to some degree on a framework that I already believe in. Therefore, one of the great challenges of science is to maintain objectivity.

For the purpose of defining the working definitions of this article I will outline the following terms, which I do not propose to be perfect or accepted by everyone. The source is Webster's Revised Unabridged Dictionary, © 1996, 1998 MICRA, Inc.

Observation - "(a) The act of recognizing and noting some fact or occurrence in nature, as an aurora, a corona, or the structure of an animal. (b) Specifically, the act of measuring, with suitable instruments, some magnitude, as the time of an occultation, with a clock; the right ascension of a star, with a transit instrument and clock; the sun's altitude, or the distance of the moon from a star, with a sextant; the temperature, with a thermometer, etc. (c) The information so acquired.

Note: When a phenomenon is scrutinized as it occurs in nature, the act is termed an observation. When the conditions under which the phenomenon occurs are artificial, or arranged beforehand by the observer, the process is called an experiment. Experiment includes observation."

Experiment - "1. A trial or special observation, made to confirm or disprove something doubtful; esp., one under conditions determined by the experimenter; an act or operation undertaken in order to discover some unknown principle or effect, or to test, establish, or illustrate some suggested or known truth; practical test; proof."

Hypothesis - "1. A supposition; a proposition or principle which is supposed or taken for granted, in order to draw a conclusion or inference for proof of the point in question; something not proved, but assumed for the purpose of argument, or to account for a fact or an occurrence; as, the hypothesis that head winds detain an overdue steamer.

2. (Natural Science) A tentative theory or supposition provisionally adopted to explain certain facts, and to guide in the investigation of others; hence, frequently called a working hypothesis."

Assumption - "The thing supposed; a postulate, or proposition assumed; a supposition."

Theory - "1. A doctrine, or scheme of things, which terminates in speculation or contemplation, without a view to practice; hypothesis; speculation.

2. An exposition of the general or abstract principles of any science; as, the theory of music.

3. The science, as distinguished from the art; as, the theory and practice of medicine.

4. The philosophical explanation of phenomena, either physical or moral; as, Lavoisier's theory of combustion; Adam Smith's theory of moral sentiments."

Fact - "2. An effect produced or achieved; anything done or that comes to pass; an act; an event; a circumstance.

3. Reality; actuality; truth; as, he, in fact, excelled all the rest; the fact is, he was beaten.

4. The assertion or statement of a thing done or existing; sometimes, even when false, improperly put, by a transfer of meaning, for the thing done, or supposed to be done; a thing supposed or asserted to be done; as, history abounds with false facts."

Law - "5. In philosophy and physics: A rule of being, operation, or change, so certain and constant that it is conceived of as imposed by the will of God or by some controlling authority; as, the law of gravitation; the laws of motion; the law of heredity; the laws of thought; the laws of cause and effect; law of self-preservation.

6. In mathematics: The rule according to which anything, as the change of value of a variable, or the value of the terms of a series, proceeds; mode or order of sequence.

7. In arts, works, games, etc.: The rules of construction, or of procedure, conforming to the conditions of success; a principle, maxim; or usage; as, the laws of poetry, of architecture, of courtesy, or of whist."

The Science of Software Testing

Some testing methods are performed at a "junk science" level, which is often based on small sample sizes and poorly controlled or documented experiments. In software development, this is usually called the demo and is performed by executing the software with constructed test cases that are known in advance to work.

Rigorous testing, on the other hand, is based on observing the difference between the actual behavior and the expected behavior of the software to be tested (the hypothesis). Testing should be seen as both verification (testing against specifications) and validation (testing against the real world). Both verification and validation are needed because specifications aren't perfect.

Aspects of the Science of Software Testing

Pre-definition of Expected Results

Pre-definition of expected results is similar to a scientist proposing a hypothesis that predicts the outcome of an experiment before it is performed. Predicting the outcome in advance adds a degree of rigor to the findings. If you wait until the experiment is over and then interpret the results in light of your understanding and observation, it is easy to convince yourself and others that what you observed validated your hypothesis after all. When the actual results of the experiment do not match your pre-defined expected results, the discrepancy should lead you to question the experiment, the hypothesis, or both.
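As a minimal sketch of this idea in Python, the expected results below are written down as data before the code under test is executed, and the comparison is purely mechanical. The function shipping_cost and its pricing rule are hypothetical, invented only for illustration.

```python
# Hypothetical function under test: a flat fee plus a per-kilogram charge.
def shipping_cost(weight_kg):
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    return 5.00 + 1.50 * weight_kg


# Expected results (the "hypothesis") are fixed before any test is run.
# Each entry is (input weight in kg, expected cost).
EXPECTED = [
    (1, 6.50),
    (2, 8.00),
    (10, 20.00),
]


def run_predefined_tests():
    """Return (input, expected, actual) for every case that does not match."""
    discrepancies = []
    for weight, expected in EXPECTED:
        actual = shipping_cost(weight)
        if abs(actual - expected) > 1e-9:
            discrepancies.append((weight, expected, actual))
    return discrepancies


discrepancies = run_predefined_tests()
```

Any entry in the returned list is a discrepancy to investigate, and the cause may be in the code, in the expected value, or in both.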

Observation

Without observation, it is impossible to tell the outcome of a test or an experiment. Although this makes sense, it is tempting to design tests and experiments that are difficult if not impossible to observe. We may want to prove or test something, but real-world constraints prevent constructing an accurate experiment. That's why you can't test everything – not everything is testable.

Repeatability

In science, an experiment may be performed thousands of times before a trend can be established. The first time a result is observed the scientist isn't sure if the result was due to an unknown aspect of the experiment or a predictable behavior of the subject. To provide a confirmation of the experiment, it may need to be repeated many times. Likewise, in testing, when a defect is observed, the first test may be seen as the indicator and follow-up tests may be seen as the confirmation. After a defect has been fixed, the test must be repeated exactly as before to ensure the fix works. Although this sounds simple, it may be very difficult in actual practice to get the second test environment set up exactly as the first test environment.
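One small piece of repeatability can be sketched in Python: if a test uses generated data, recording the seed lets the same inputs be reproduced for the confirmation run. This is only an illustration under simple assumptions. The names generate_test_inputs and run_test are hypothetical, and the built-in sorted() stands in for the system under test.

```python
import random


def generate_test_inputs(seed, count=5):
    """Generate pseudo-random inputs that are fully determined by the seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.randint(0, 1000) for _ in range(count)]


def run_test(seed):
    """Run the test and return a record that allows an exact rerun."""
    inputs = generate_test_inputs(seed)
    actual = sorted(inputs)  # stand-in for the real system under test
    passed = all(a <= b for a, b in zip(actual, actual[1:]))
    return {"seed": seed, "inputs": inputs, "passed": passed}


# The first run is the indicator; the rerun with the recorded seed is the confirmation.
first = run_test(seed=42)
rerun = run_test(seed=first["seed"])
```

A seed covers only the test data. The surrounding environment still has to match, which is the harder part in practice.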

Construction of the Experiment

In scientific research, experiments are carefully planned and controlled. The laboratory environment grew out of the need to control conditions during an experiment and to repeat the experiment. In this analogy of testing as science, what many people do is perform experiments in their kitchen, not in a controlled laboratory environment. This is a critical lapse, because the test environment can affect many external and internal factors of the test, which could very well lead to false test results. I would venture to say that no other discipline could get away with the lax methods used in many software tests, especially where environments are concerned.

In testing, carefully constructed and controlled environments are sometimes needed to get the level of test reliability that is appropriate to the risk. Your test is only as good as the test environment!
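A rough sketch of what a small step toward a controlled environment might look like in Python: each test runs in a fresh, empty working directory, and basic facts about the environment are recorded alongside the result. The helper names environment_snapshot, run_in_clean_workspace, and example_test are hypothetical, and a real test environment would need to control far more than this.

```python
import os
import platform
import sys
import tempfile


def environment_snapshot():
    """Record facts about the environment so later runs can be compared."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "cwd": os.getcwd(),
    }


def run_in_clean_workspace(test_fn):
    """Run test_fn inside a fresh, empty directory that is removed afterwards."""
    snapshot = environment_snapshot()
    with tempfile.TemporaryDirectory() as workspace:
        result = test_fn(workspace)
    return {"environment": snapshot, "result": result}


# Hypothetical test: writes one file and checks that nothing else is present.
def example_test(workspace):
    path = os.path.join(workspace, "output.txt")
    with open(path, "w") as handle:
        handle.write("ok")
    return os.listdir(workspace) == ["output.txt"]


report = run_in_clean_workspace(example_test)
```

Keeping the snapshot with the result makes it possible to ask, when two runs disagree, whether the software changed or the environment did.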

Performing the Experiment

The performance of the experiment is an exercise in carefully following the design of the experiment. The research scientist doesn't improvise unless they are deliberately doing work apart from the plan. Granted, some of the great scientific discoveries have occurred because the researcher tried something other than the planned experiment, but these are exceptions rather than the rule. In testing, it is important to stick to your test plan. It's all right to test other cases, but be sure you document what you did so you can repeat the test if you have to.

Having a Control Group

In scientific experiments, control groups are used as a baseline for comparison of results. For example, a researcher might test a trial medication on one group of people while giving a placebo to another group. The people in the experiment do not know if they have been given the real medication or the placebo, and in "double blind" research the researchers administering the treatment do not know either. This helps to counteract subconscious biases. In testing, we also need a baseline of correct system behavior against which to compare results.

Interpreting Results and Drawing Conclusions

One of the great challenges of science is to observe the tests of a hypothesis and make a reasoned interpretation of the results. The challenge of doing this task is maintaining objectivity and having the courage to report what you actually observed as opposed to what someone else expected to see. Gee, that sounds familiar. In testing, you can only speak to what you have observed. It is unrealistic and unwise to predict the results of tests that were not performed.
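The point about speaking only to what was observed can be sketched as a small Python report that keeps "not run" separate from "pass" instead of assuming untested cases work. The test names and outcomes below are hypothetical, made up only to illustrate the idea.

```python
from collections import Counter

# Hypothetical test plan and the outcomes that were actually observed.
PLANNED = ["login", "logout", "password_reset", "export_report"]
OBSERVED = {"login": "pass", "logout": "fail"}


def summarize(planned, observed):
    """Classify every planned test as pass, fail, or not_run."""
    statuses = {name: observed.get(name, "not_run") for name in planned}
    return statuses, Counter(statuses.values())


statuses, counts = summarize(PLANNED, OBSERVED)
```

A planned test with no recorded outcome is reported as not_run, so the summary makes no claim about behavior that nobody observed.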

Modifying the Hypothesis

In testing, the main hypothesis is often that the system should work under given conditions. However, there is another opposing hypothesis that although the system should work, there are defects in it that need to be found. The second hypothesis is the safest one.

When your test results show that the second hypothesis is true, a fundamental shift starts to occur in the minds and attitudes of those who held the first hypothesis. This is when many people, instead of modifying the hypothesis, try to discredit or invalidate the experiment. This often takes the form of blaming the testers for the defects, which is like blaming a research scientist for the results of a correctly performed experiment.

However, let's say for a moment that people reach agreement that the first hypothesis was wrong and that the software does have defects and needs to be fixed. This may imply that the software will be delivered late and other people will be held accountable. Although these consequences may occur, people need to face reality and correct the problems instead of focusing on their own agendas.

Perhaps this aspect of testing is most closely aligned to the science we see practiced today. If the research confirms the hypothesis, we hear about it. If the research supports a contrary hypothesis, especially one that may be politically incorrect, those findings may never be published.

The Longer View

The reason scientific research is performed is to explain the way observable nature behaves. I would also add that a great benefit of that knowledge is to improve things we currently do. It has been said that the thing that distinguished Thomas Edison from other inventors was that he always had a keen sense of how science could help people by improving their lives. Edison also had a good sense of business. He knew when to stop research and build the project.

In testing, the initial goal is to find defects. However, that is a short-sighted view and fails to make the best of the resources that have been expended on creating and fixing the defect. The longer view of testing is to build ways to prevent similar problems in the future by improving the processes used to build the product.

I believe we are far from seeing software testing performed as a scientific process, but it gives us something to think about and relate to, especially when it comes to evaluating the rigor and reliability of test results.

Conclusion

There are many points in common between software testing and traditional science. In fact, software testing may be closer to a science than to anything else we can relate it to. These similarities can be helpful in understanding testing and explaining testing to others. The similarities also provide a benchmark of how rigorous a process we are using in defining and performing testing processes. Although not every test will need to be performed at the rigor of scientific research, some tests need to be performed at that level because of high risk.