Friday, April 17, 2009

Some Thoughts About Measuring Software Complexity

I have learned from J.D. that I should try to write down my thoughts as much as I can, or else I will forget them… so here goes.

At the end of CSE P 503 today, we got into a heated discussion about measuring complexity in software design.  I was pretty tired, so I mostly listened.  As Dr. Notkin pointed out, complexity is deduced from a human's ability to understand the model; therefore, complexity in software is not measurable, because there is no way to quantify how well humans understand code based on artificial metrics.

I agree: quantifying complexity is already simplifying it. The interactions among the entities of a model are inherently complex; simplifying them means over-emphasizing some elements while bypassing other vital relationships, and the resulting metric is probably not generalizable.

I disagree: since complexity is an approximation of how difficult it is for a human to understand the model, I think we can estimate complexity from a few essential cognitive characteristics.

That’s similar to Bounded Model Checking – we don’t need to measure the whole complexity, just one or two important factors that affect it.

I suggest two metrics:

1. No. of statements in a method

Humans like sequences.  That’s why we create step-by-step procedures and prefer imperative programming over functional programming.  However, humans can also hold only a small number of things (seven or so, in general) in mind at once.  That’s why we prefer short lists over long lists.

One measurement of complexity is to check the number of statements in a method: in general, we prefer 30 methods that contain 30 statements each over 3 methods with 300 statements each.  The scope of each method should be small, because a small scope is easier to understand.  And having 30 binary relations (method Foo() = doing-this-and-that) is actually a good thing.  We love binary relations; I will explain why next.
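To make this concrete, here is a minimal sketch of metric 1 (the code and all names are my own, not a standard tool).  It uses Python's standard ast module, so it counts statements in Python functions rather than in the C#-style methods above, but the idea carries over:

```python
import ast

def statements_per_function(source):
    # Count every ast.stmt node nested inside each function definition.
    # Subtract 1 so the `def` itself is not counted as a statement.
    counts = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            counts[node.name] = sum(
                isinstance(child, ast.stmt) for child in ast.walk(node)
            ) - 1
    return counts

example = """
def short():
    x = 1
    return x

def longer():
    total = 0
    for i in range(10):
        total += i
    return total
"""

for name, n in statements_per_function(example).items():
    verdict = "fine" if n <= 30 else "consider splitting"
    print(name, n, verdict)  # -> short 2 fine / longer 4 fine
```

A threshold around 30 statements, per the rule of thumb above, would flag only the genuinely long methods.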

2. No. of interactions in a method invocation

Binary relations are the backbone of our interpretation of the physical world.  They are intrinsic to our language! See: we go shopping; you love this movie; I sell fish!  Binary relations are the easiest relations to understand.  However, complexity arises when we try to introduce ternary, or n-ary, relations: I go to a store this morning! Or, in a more familiar software dialect: a = b.Foo(c, d.E, f.bar(g) + i).  It is just a lot harder to figure out the relationships between the entities now.

Therefore, I suggest we measure the number of n-ary relations in the code: the more of its relations that are binary (or at worst ternary) rather than higher-arity, the lower the complexity.
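Here is a matching sketch for metric 2 (again Python, with my own hypothetical names; the snippet above happens to be valid Python syntax, so the ast module can dissect it directly).  A call's arity is approximated as one slot for the receiver plus one per argument; ast.unparse requires Python 3.9+:

```python
import ast

def call_arities(source):
    # Treat each call as a relation: one slot for the receiver/callee,
    # plus one per positional or keyword argument.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            arity = 1 + len(node.args) + len(node.keywords)
            yield ast.unparse(node), arity

snippet = "a = b.Foo(c, d.E, f.bar(g) + i)"
for text, arity in call_arities(snippet):
    label = "binary" if arity <= 2 else "n-ary"
    print(text, arity, label)
# b.Foo(c, d.E, f.bar(g) + i) 4 n-ary
# f.bar(g) 2 binary
```

This deliberately undercounts – the nested expressions d.E and f.bar(g) + i hide further entities – but in the Bounded Model Checking spirit above, measuring just the visible arity of each call may be enough.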

Of course, when you try to quantify complexity, especially by averaging it or putting it on a 1-10 scale, the number basically loses its meaning, because complexity would be judged between a super-simple model (1) and an ultra-complex model (10), neither of which actually exists. But estimating the complexity of a system based on these two criteria is, I argue, a workable solution.
