I spoke about vulnerability and threat ratings here a couple of weeks ago. What I didn’t give enough attention to was the testing of the rating system itself. In short, once you’ve identified a vulnerability you have to somehow communicate the overall risk. This is typically done by assigning some value to impact and likelihood. You can read more in my previous post. I think most of us forget a critical step in the development of these systems: we forget to test them.

Come on, we’re experts, right? Do we really need to test? What is the value of the test? Well, if we are committed to delivering accurate and realistic information about identified vulnerabilities, we must demonstrate that our system can provide it. A good way to provide this proof is to play test. I know, we all think we’ve created the silver bullet, so testing feels wrong. Trust me though. If you do not test your new-fangled system, you will be in trouble the second it gets into the hands of a developer or infrastructure group (or even your own local security group) and produces different results than the ones you came up with.

One of the two primary properties of such a rating system is reproducibility (the other, of course, is accuracy). A given vulnerability in the hands of a couple of knowledgeable people should give roughly similar results. If it does not, that should set off alarm bells. If you are in fact trying to create a system that is coherent and reproducible, you have to make sure it passes this litmus test. In rating systems it is imperative that you minimize subjectivity and increase reproducibility in the results. The problem of subjectivity is challenging because everyone has their own experiences that affect the process. Add to that the fact that some people are limited by what they can conceptualize, and results become amazingly erratic.

As a good example, a system with a single dimension each for impact and likelihood is nearly impossible to deal with. As I’ve talked about before, this is because there is too much information loaded into a single dimension. It is difficult to justify, for example, a particular value for likelihood, because too many categories of information are folded into that one concept. A simple test of your system will help determine whether your rating elements are suffering from category overload. In the end, if you approach your system with the goal of objectivity and reproducibility, and add in testing at the end, I think you’ll have a system that delivers, as much as possible, realistic and accurate information.
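To make the idea of a play test a little more concrete, here is a minimal sketch in Python of one way to check reproducibility. Everything in it is an assumption for illustration only: the 1 to 5 scale, the rater names, the sample scores, the `disagreements` helper, and the one-point tolerance are all made up, not part of any particular rating system. It has several people rate the same vulnerabilities and flags any dimension where their scores are too far apart.

```python
# Hypothetical play-test data: three raters score the same vulnerabilities
# on an assumed 1-5 scale. Names and numbers are invented for illustration.
ratings = {
    "sql-injection-login": {
        "alice": {"impact": 5, "likelihood": 4},
        "bob": {"impact": 5, "likelihood": 4},
        "carol": {"impact": 4, "likelihood": 4},
    },
    "verbose-error-page": {
        "alice": {"impact": 2, "likelihood": 5},
        "bob": {"impact": 2, "likelihood": 1},
        "carol": {"impact": 3, "likelihood": 3},
    },
}

# Assumed tolerance: raters may differ by at most one point on a dimension.
MAX_SPREAD = 1


def disagreements(ratings, max_spread=MAX_SPREAD):
    """Return (vulnerability, dimension, spread) wherever raters differ too much."""
    flagged = []
    for vuln, by_rater in ratings.items():
        # Take the dimension names from the first rater's score sheet.
        dimensions = next(iter(by_rater.values())).keys()
        for dim in dimensions:
            scores = [sheet[dim] for sheet in by_rater.values()]
            spread = max(scores) - min(scores)
            if spread > max_spread:
                flagged.append((vuln, dim, spread))
    return flagged


for vuln, dim, spread in disagreements(ratings):
    print(f"{vuln}: raters differ by {spread} on {dim}")
```

With this sample data the only flagged item is the likelihood score for the error page, which is the kind of result that might suggest a dimension is carrying too many categories of information. A real play test would likely want more raters and a more careful measure of agreement than a simple max-minus-min spread.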