I recently received a question from one of my website visitors that read:
"Your site, by far, has explained most of what I needed to communicate about UAT to my business partners. However, I cannot find anywhere ... ANYWHERE ... tangible information such as what percentage of bugs are caught during UAT... tangible information to justify UAT testing. ... This piece of information, actually, will be significant in how my UAT Dept will function as it relates to the business users."
Here was my answer:
"Thanks for writing! What I say here may shock you, but I think it makes an important point about the role of UAT.
When a project goes well (including the earlier phases of testing such as unit, integration, and system testing), UAT ideally shouldn't find many defects. UAT is one of the worst times to find defects because of the cost and risk of fixing them just before system implementation.
UAT is validation, which means it is more concerned with assuring that the system will support business processes. In this context, a defect might be a case where the system can't perform a function because it wasn't designed or built to do so. This typically indicates a problem with requirements, which other forms of review and testing should have caught much earlier.
So... when showing the value of UAT, I focus on 1) the assurance that the "right" system is being built or bought, 2) the involvement of users in the project, especially in the testing process, and 3) the fact that UAT planning can start almost at project inception, so people can begin mapping out business processes at about the same time as requirements (I often see these captured as use cases), which greatly aids the development process.
One of the problems with industry averages, such as defect counts per phase, is that organizations just don't measure things very well, and when they do, they tend to guard that data."
This is one of those things in software testing that can really get derailed depending on your management's understanding of testing. If management is depending on UAT to be the main line of defense in finding defects, then it's very likely this phase of testing will not truly be UAT. It will be some type of system testing/UAT hybrid that is essentially the "big bang" approach to testing.
This is very risky because all the defects (and therefore, the risk as well) are saved until the end of the project.
Think about what is going on in the project at this point:
So, with all this going on, let's say that with about six weeks left before the deadline, you discover a major defect: a real showstopper. For example, you run a performance test that reveals only four people can access the system concurrently, for a department of forty people. You have one option: get it fixed, or the project fails!
What if the only fix is to spend an extra one million dollars on a more powerful CPU? Actually, there is another option: a total system redevelopment effort. However, that would mean missing the deadline by months, maybe years.
This actually happened on a project I was a part of. You can read the case study at http://www.cio.com.au/index.php/id;1141181339.
My point in following this train of thought is that when you save all the risk and defects for the end of the project, the problems you find may not be solvable. UAT was never designed to be the place where major defects are found.
In fact, I would say that if you are finding major defects in UAT, your development and testing processes are broken.
Metrics for the Value of UAT
Defect Detection Percentage (DDP)
One of the best and most commonly mentioned metrics for test effectiveness is Defect Detection Percentage (DDP). It is calculated by dividing the number of defects you find by the number of defects found by everyone (including customers and users) over the life of the release, then multiplying by 100 to express it as a percentage. You can easily measure it by phase of testing, such as system testing or UAT.
Ideally, following the logic expressed earlier in this article, the DDP for UAT should be fairly low. However, a low DDP could also mean that testing wasn't very effective. To know for sure, you would have to measure DDP for every phase throughout the project.
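To make the arithmetic concrete, here is a minimal sketch in Python. The phase names, the defect log, and its counts are all hypothetical, made up only for illustration. `ddp_by_phase` follows the definition above (each phase's defects divided by every defect found over the release). `ddp_at_phase_entry` is a commonly used variant, not something this article prescribes: it divides a phase's defects by those still present when the phase began (that phase plus all later ones), which helps separate a quiet UAT from an ineffective one.

```python
from collections import Counter

# Hypothetical phase labels, in the order the phases occur. "post_release"
# stands for defects reported by customers and users after implementation.
PHASES = ["unit", "integration", "system", "uat", "post_release"]

# Hypothetical defect log for one release: one entry per defect, naming
# the phase in which it was found. The counts are invented for illustration.
DEFECT_LOG = (
    ["unit"] * 40
    + ["integration"] * 25
    + ["system"] * 25
    + ["uat"] * 5
    + ["post_release"] * 5
)


def ddp(found_in_phase: int, defect_total: int) -> float:
    """Defects found by one phase as a percentage of a defect total."""
    if defect_total == 0:
        raise ValueError("no defects recorded")
    # Multiply first so results that are whole numbers come out exact.
    return 100 * found_in_phase / defect_total


def ddp_by_phase(log: list[str]) -> dict[str, float]:
    """DDP as defined above: each phase's defects divided by every defect
    found over the life of the release, including post-release."""
    counts = Counter(log)
    total = len(log)
    return {phase: ddp(counts[phase], total) for phase in PHASES}


def ddp_at_phase_entry(log: list[str], phase: str) -> float:
    """Variant: defects a phase found divided by the defects still present
    when it began (that phase plus every later phase)."""
    counts = Counter(log)
    later_phases = PHASES[PHASES.index(phase):]
    return ddp(counts[phase], sum(counts[p] for p in later_phases))


if __name__ == "__main__":
    print(ddp_by_phase(DEFECT_LOG))
    print(ddp_at_phase_entry(DEFECT_LOG, "uat"))
```

With these invented numbers, UAT's DDP is only 5%, yet UAT caught 5 of the 10 defects still present when it started (50%). Looking at both figures, for every phase, is one way to tell whether a low UAT number reflects good earlier testing or weak UAT.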
Another helpful metric is process coverage: the percentage of business or operational processes that have been validated. The problem with any type of coverage metric is that just because something has been tested doesn't mean it has been tested well.
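As a rough sketch, process coverage is just validated processes over total processes. The process inventory below is hypothetical, and the single True/False flag per process is an assumption; as noted above, it says only that a process was validated, not how thoroughly.

```python
# Hypothetical inventory: each business process mapped to whether UAT
# has validated it. Process names are illustrative only.
PROCESSES = {
    "enter customer order": True,
    "approve credit limit": True,
    "ship order": False,
    "issue invoice": True,
}


def process_coverage(processes: dict[str, bool]) -> float:
    """Percentage of business processes validated in UAT."""
    if not processes:
        raise ValueError("no processes defined")
    validated = sum(1 for ok in processes.values() if ok)
    return 100 * validated / len(processes)


def unvalidated(processes: dict[str, bool]) -> list[str]:
    """Processes not yet validated, so the gaps are visible, not just the number."""
    return sorted(name for name, ok in processes.items() if not ok)


if __name__ == "__main__":
    print(process_coverage(PROCESSES))
    print(unvalidated(PROCESSES))
```

Listing the unvalidated processes alongside the percentage keeps the conversation with business users focused on which processes remain, rather than on the number alone.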
Acceptance Criteria Coverage
If you have defined acceptance criteria, this should be an easily obtainable metric, and it should be close to 100%. It is simply the number of acceptance criteria tested divided by the total number of acceptance criteria, multiplied by 100.
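Here is a minimal sketch of that calculation. The criterion IDs and the "pass"/"fail"/None result field are hypothetical. Note that a failed criterion still counts as tested, so this coverage figure is not the same thing as a pass rate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AcceptanceCriterion:
    criterion_id: str
    # "pass", "fail", or None if the criterion has not been tested yet.
    result: Optional[str] = None


def acceptance_criteria_coverage(criteria: list[AcceptanceCriterion]) -> float:
    """Criteria tested (passed or failed) divided by all criteria, times 100."""
    if not criteria:
        raise ValueError("no acceptance criteria defined")
    tested = sum(1 for c in criteria if c.result is not None)
    return 100 * tested / len(criteria)


# Hypothetical acceptance criteria for illustration.
CRITERIA = [
    AcceptanceCriterion("AC-1", "pass"),
    AcceptanceCriterion("AC-2", "pass"),
    AcceptanceCriterion("AC-3", "fail"),
    AcceptanceCriterion("AC-4", "pass"),
    AcceptanceCriterion("AC-5"),
]

if __name__ == "__main__":
    print(acceptance_criteria_coverage(CRITERIA))
```

Here four of the five criteria have been tested, so coverage is 80%, even though only three of them passed.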
Usability metrics are also helpful in assessing the value of UAT. Surveys and usability test scores can help determine the usability of the software under test. High scores can be a good validation of the design, while low scores may indicate that usability factors aren't being considered. The value of UAT here is to identify software that is hard to use before it goes to the customer. The catch is that usability has to be designed into the software; it can't simply be patched in. However, some usability issues can be fixed with minor changes.
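This article doesn't prescribe a particular survey. As one assumed example, the widely used System Usability Scale (SUS) turns ten 1-to-5 questionnaire answers into a 0-100 score: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5. The sketch below scores hypothetical responses from two UAT participants and averages them.

```python
from statistics import mean


def sus_score(responses: list[int]) -> float:
    """Score one participant's System Usability Scale questionnaire.

    `responses` holds the ten answers in questionnaire order, each on a
    1 (strongly disagree) to 5 (strongly agree) scale.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected ten responses between 1 and 5")
    total = 0
    for item_number, r in enumerate(responses, start=1):
        # Odd-numbered items are positively worded, even-numbered negatively.
        total += (r - 1) if item_number % 2 == 1 else (5 - r)
    return total * 2.5  # scale the 0-40 raw sum to 0-100


# Hypothetical responses from two UAT participants.
PARTICIPANTS = [
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
]

if __name__ == "__main__":
    print(mean(sus_score(p) for p in PARTICIPANTS))
```

The two participants score 50 and 100, for an average of 75. Tracking an average like this across UAT cycles is one way to see whether minor usability fixes are actually helping.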
Process Gaps Found
Finally, I would like to propose my own metric of "process gaps found." In this metric, you measure the number of defects that relate to situations where the system under test does not support the processes performed by the organization. These are truly the things you are looking for in UAT and would constitute a "good find." However, like other defects, each one indicates a breakdown in the development and QC process earlier in the project. In other words, "How did we design and build the system in such a way as to miss this gap?"
Did we fail to talk to the right people during requirements, document the requirements incorrectly, or miss something else?
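One way to track this metric, sketched here under assumptions, is to tag each defect with a category during triage and, for process gaps, note where the gap slipped in. The `Defect` fields, the "process_gap" category, and the origin labels are all hypothetical; use whatever taxonomy your defect tracker supports.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Defect:
    defect_id: str
    phase_found: str  # e.g. "system", "uat"
    category: str     # hypothetical taxonomy; "process_gap" marks unsupported processes
    origin: str = ""  # hypothetical root-cause note recorded during triage


def process_gaps_found(defects: list[Defect], phase: str = "uat") -> list[Defect]:
    """Defects found in `phase` where the system doesn't support a business process."""
    return [d for d in defects if d.phase_found == phase and d.category == "process_gap"]


def gaps_by_origin(gaps: list[Defect]) -> Counter:
    """Tally where each gap slipped in, pointing back at the earlier process."""
    return Counter(d.origin for d in gaps)


# Hypothetical defects for illustration.
DEFECTS = [
    Defect("D-1", "uat", "process_gap", "stakeholder not consulted"),
    Defect("D-2", "uat", "coding"),
    Defect("D-3", "uat", "process_gap", "requirement not documented"),
    Defect("D-4", "system", "process_gap", "stakeholder not consulted"),
    Defect("D-5", "uat", "process_gap", "stakeholder not consulted"),
]

if __name__ == "__main__":
    gaps = process_gaps_found(DEFECTS)
    print(len(gaps))
    print(gaps_by_origin(gaps).most_common())
```

In this made-up data, UAT found three process gaps, two of which trace back to a stakeholder who was never consulted. The tally by origin is what turns the count into the question asked above: where in requirements did we go wrong?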
I hope this article helps as you assess the effectiveness of your user acceptance testing!