Wednesday, April 22, 2009

SortedList, SortedDictionary, vs LinQ

I had a situation at work today where I needed to sort a Dictionary with strings as keys.

The keys in the Dictionary are of the form “Title 1”, “Title 2”, “Title 3”, etc.

I cannot use the normal framework SortedDictionary as-is, because if you sort by key as a string, the items in the SortedDictionary will be sorted according to the Unicode values of the characters…

e.g. you will get something like “Title 1”, “Title 10”, “Title 2” etc… totally undesirable…

Since I would need to write my own IComparer to do the custom comparison anyway, I wondered whether using LINQ would provide a better solution.

So I spent probably an hour at work trying to create the query, but I realized again that I am not that familiar with LINQ (mostly with how to use lambda expressions properly with OrderBy). So I dropped the work, but when I came back to the code later at home, it all became clear.

Here is the code:

 

static SortedList<string, double> FrameworkListSort(
    Dictionary<string, double> columnWidthDictionary)
{
    // The SortedList constructor copies the dictionary and sorts it with the custom comparer.
    return new SortedList<string, double>(columnWidthDictionary, new TitleComparer());
}

static SortedDictionary<string, double> FrameworkDictionarySort(
    Dictionary<string, double> columnWidthDictionary)
{
    // Same idea, but backed by a tree instead of sorted arrays.
    return new SortedDictionary<string, double>(columnWidthDictionary, new TitleComparer());
}

static IOrderedEnumerable<KeyValuePair<string, double>> LinqListSort(
    Dictionary<string, double> columnWidthDictionary)
{
    // Order by the numeric suffix of the key ("Title 42" -> 42).
    return columnWidthDictionary.OrderBy(
        column =>
        {
            int rank;
            Int32.TryParse(column.Key.Substring(sTitle.Length).Trim(), out rank);
            return rank;
        });
}

static IOrderedEnumerable<KeyValuePair<string, double>> LinqListSort(
    IEnumerable<KeyValuePair<string, double>> columnWidthDictionary,
    string criteria)
{
    // Filter out the column matching the criteria, then order by the numeric suffix.
    return columnWidthDictionary.Where(targetColumn => targetColumn.Key != criteria).OrderBy(
        column =>
        {
            int rank;
            Int32.TryParse(column.Key.Substring(sTitle.Length).Trim(), out rank);
            return rank;
        });
}
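
The listings above (and the test code below) also rely on an sTitle constant and a custom TitleComparer that aren't shown. A minimal version, which simply mirrors the parsing logic of the lambda used in LinqListSort and lives in the same class as the static methods above, would look roughly like this (together with the using directives the whole sample needs):

// Directives needed by the full sample:
// using System;
// using System.Collections.Generic;
// using System.Diagnostics;
// using System.Linq;

// Shared prefix of the column keys ("Title 1", "Title 2", ...).
const string sTitle = "Title";

// Compares "Title n" keys by their numeric suffix instead of by Unicode value.
class TitleComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        int rankX, rankY;
        Int32.TryParse(x.Substring(sTitle.Length).Trim(), out rankX);
        Int32.TryParse(y.Substring(sTitle.Length).Trim(), out rankY);
        return rankX.CompareTo(rankY);
    }
}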



Although the LINQ code is not as simple as a constructor call, my hunch was that using LINQ would be superior performance-wise compared to SortedDictionary.



However, there’s only one way to find out, so I created a Dictionary with 100,000 items keyed by “Title x” strings, and then compared the time it takes to create a sorted list using the three approaches.  Here is the result:



[Figure: sortedlist-perf]



While it isn’t too surprising that LINQ yields a “faster” solution, I was surprised that element removal using LINQ (since it requires iterating over the whole IEnumerable) is not worse than SortedDictionary.  But SortedList is still faster, probably because it only needs to drop the last element from its internal sorted array, instead of rebalancing the tree as in the SortedDictionary case.



So for sorting: IOrderedEnumerable >> SortedList > SortedDictionary



For element removal: SortedList > IOrderedEnumerable > SortedDictionary.



Here is the test code, so you may try it at home :)



static void Main(string[] args)
{
    const int numItems = 100000;
    Random rand = new Random();
    Dictionary<string, double> columnWidthDictionary = new Dictionary<string, double>();

    for (int index = 1; index <= numItems; index++)
    {
        columnWidthDictionary.Add(String.Format("{0} {1}", sTitle, index), rand.NextDouble());
    }

    Stopwatch watch = new Stopwatch();

    watch.Reset();
    watch.Start();
    SortedList<string, double> sortedColumnWidthDictionary =
        FrameworkListSort(columnWidthDictionary);
    watch.Stop();
    Console.WriteLine(String.Format("Time elapsed (FrameworkListSort) : {0} ms",
        watch.ElapsedMilliseconds));

    watch.Reset();
    watch.Start();
    SortedDictionary<string, double> sortedColumnWidthDictionary2 =
        FrameworkDictionarySort(columnWidthDictionary);
    watch.Stop();
    Console.WriteLine(String.Format("Time elapsed (FrameworkDictionarySort) : {0} ms",
        watch.ElapsedMilliseconds));

    watch.Reset();
    watch.Start();
    IOrderedEnumerable<KeyValuePair<string, double>> sortedEumerableColumns =
        LinqListSort(columnWidthDictionary);
    watch.Stop();
    Console.WriteLine(String.Format("Time elapsed (LinqListSort) : {0} ms",
        watch.ElapsedMilliseconds));

    watch.Reset();
    watch.Start();
    //sortedColumnWidthDictionary.Remove(sortedColumnWidthDictionary.Keys.Last());
    sortedColumnWidthDictionary.RemoveAt(sortedColumnWidthDictionary.Count - 1);
    watch.Stop();
    Console.WriteLine(String.Format("Time elapsed (SortedList + Remove last item) : {0} ms",
        watch.ElapsedMilliseconds));

    watch.Reset();
    watch.Start();
    //sortedColumnWidthDictionary2.Remove(sortedColumnWidthDictionary.Keys.Last());
    // Note: this removes "Title 1", the first key under TitleComparer ordering.
    sortedColumnWidthDictionary2.Remove("Title 1");
    watch.Stop();
    Console.WriteLine(String.Format("Time elapsed (SortedDictionary - Remove last item) : {0} ms",
        watch.ElapsedMilliseconds));

    watch.Reset();
    watch.Start();
    sortedEumerableColumns = LinqListSort(sortedEumerableColumns, "Title 1");
    watch.Stop();
    Console.WriteLine(String.Format("Time elapsed (LinqListSort with criteria for removing last item) : {0} ms",
        watch.ElapsedMilliseconds));
    Console.ReadLine();
}

Friday, April 17, 2009

Some Thoughts About Measuring Software Complexity

I have learned from J.D. that I should try to jot down my thoughts as much as I can, or else I will forget them… so here it goes.

At the end of the CSE P 503 class today, we got into a heated discussion about measuring complexity in software design.  I was pretty tired, so I mostly listened.  As Dr. Notkin pointed out, complexity is judged by the ability of humans to understand the model, and therefore complexity in software is not measurable, because there is no way to quantify how well humans understand the code based on artificial metrics.

I agree: quantifying complexity is already simplifying it. The interactions among the entities of the model are inherently complex; simplifying them would mean over-emphasizing some elements while bypassing other vital relationships, and the resulting metric is probably not generalizable.

I disagree: since complexity is an approximation of how difficult it is for humans to understand the model, I think we can deduce the complexity based on some essential cognitive characteristics.

That’s similar to Bounded Model Checking – we don’t need to measure the whole complexity, but just one or two important factors that affect the complexity.

My suggestions for two metrics:

1. No. of statements in a method

Humans like sequences.  That’s why we create step-by-step procedures, and prefer imperative programming over functional programming.  However, humans also have the limitation of simultaneously remembering only a small number of things (about 7 in general).  That’s why we prefer short lists over long lists.

One measurement of complexity is to check the overall LOC in a method: in general, we prefer 30 methods that contain 30 statements each over 3 methods with 300 statements each.  The scope of each method should be small, because a small scope is easier to understand.  And having 30 binary relations (method Foo() = doing-this-and-that) is actually a good thing.  We love binary relations.  I will explain why next.

2. No. of interactions in a method invocation

Binary relations are the backbone of our interpretation of the physical world.  They are intrinsic in our language! See! We go shopping; you love this movie; I sell fish!  Binary relations are the easiest relations to understand.  However, complexity arises when we try to introduce ternary or n-ary relations: I go to a store this morning! Or, in a more familiar software dialect: a = b.Foo(c, d.E, f.bar(g) + i).  It is just a lot harder to figure out the relationships between the entities involved.

Therefore, I suggest we measure the number of n-ary relations in the code. The more of them that are merely binary or ternary, the lower the complexity.

Of course, when you try to quantify complexity, especially by averaging it or putting it on a 1-10 scale, the number basically loses its meaning, because the complexity is then judged between a super simple model (1) and an ultra-complex model (10), neither of which actually exists. But estimating the complexity of a system based on these two criteria, I argue, is a workable approach.
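
To make the two metrics a bit more concrete, here is a rough, admittedly crude sketch of how they might be computed over a method body: counting semicolon-terminated statements as a proxy for metric 1, and counting the argument count of each call as a proxy for the arity of the relation in metric 2. The class and helper names, and the heuristics themselves (no real parsing, no handling of strings or keywords), are only illustrative.

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class NaiveComplexityMetrics
{
    // Metric 1 (crude): count semicolon-terminated statements in a method body.
    static int CountStatements(string methodBody)
    {
        int count = 0;
        foreach (char c in methodBody)
        {
            if (c == ';') count++;
        }
        return count;
    }

    // Metric 2 (crude): for each call "name(arg, arg, ...)", report its arity,
    // i.e. 1 + the number of top-level commas between the parentheses.
    static IEnumerable<int> CallArities(string methodBody)
    {
        foreach (Match call in Regex.Matches(methodBody, @"\w+\s*\("))
        {
            int depth = 0, commas = 0;
            bool empty = true;
            for (int i = call.Index + call.Length - 1; i < methodBody.Length; i++)
            {
                char c = methodBody[i];
                if (c == '(') depth++;
                else if (c == ')') { if (--depth == 0) break; }
                else if (c == ',' && depth == 1) commas++;
                else if (depth >= 1 && !char.IsWhiteSpace(c)) empty = false;
            }
            yield return empty ? 0 : commas + 1;
        }
    }

    static void Main()
    {
        string body = "a = b.Foo(c, d.E, f.bar(g) + i); Console.WriteLine(a);";
        Console.WriteLine("Statements: {0}", CountStatements(body));
        foreach (int arity in CallArities(body))
        {
            Console.WriteLine("Call arity: {0}", arity);
        }
    }
}

Running it on the a = b.Foo(c, d.E, f.bar(g) + i) example from above reports a three-argument (4-ary, counting the result) relation for Foo, which is exactly the kind of construct I would flag as harder to understand than a chain of binary relations.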

Software disasters: are they inevitable, and will we see more in the future? Part I

I know that I have abandoned the blog for a long time, and yes originally it was supposed to be a blog about code.  But I sort of moved to my Windows Live Space blog and vacated this one.

I’m not sure if there is anyone who reads the blog now, but if you do, I still think it is kind of cool to share my thoughts with you here.

This three-part essay was originally a homework assignment submitted for CSE P 503, taught by Dr. David Notkin.  To be honest, I don’t need the grade anymore, but Dr. Notkin is someone I admire, and I just wanted to be a student in his class so that I could learn from him.

The essay is an analysis of modern software disasters.  I expanded it a little to incorporate my thoughts about how the complexity of disasters has evolved in the past few years.

Enjoy.

We have also arranged things so that almost no one understands science and technology. This is a prescription for disaster. We might get away with it for a while, but sooner or later this combustible mixture of ignorance and power is going to blow up in our faces.  --- Carl Sagan

The first question is so incredibly naïve that I almost feel ashamed of answering it. On average, a program with > 512 KLOC has 4-100 errors per KLOC [1], and no matter how aggressive the testing and how robustly the application behaves, we can never be sure that the program will always run correctly. Incompatible platforms (Windows drivers on Vista), regressions caused by upgrades (the Ariane 5 crash and the AT&T outage in 1990)… the possibilities for failure are innumerable…

The second question, however, is worth more of our consideration. As the software engineering discipline slowly matures, we have devised new methodologies and tools such as test-driven development, code coverage tools, and model checking that reduce the code defects in a system and also the possibility of disasters. We shouldn’t deny that these tools and conventions improve the quality of the code we write, but do they guarantee that we will see fewer software disasters in the future? Any seasoned developer would give a moderate and sensible “probably not.” However, unseasoned and naïve as I am, I gave this some thought, and I came to a horrid conclusion…

We are bound to see a lot more software disasters. A lot more…

What is a software disaster?

Disaster (n.): a sudden calamitous event bringing great damage, loss, or destruction; broadly : a sudden or great misfortune or failure – Webster Dictionary

Generally speaking, we assume that software disasters are calamitous events caused by code defects that result in the loss of financial resources or human lives. Therefore, in the postmortem of these calamities, we focus our efforts on discovering the root cause (the bug) in the code, and seek to provide solutions that prevent and detect these types of defects. The crash of Ariane 5 (caused by a numeric overflow error) and the Patriot missile’s failure to intercept a Scud missile during the first Iraq war [2] (bad floating point arithmetic) are two prominent examples of these types of errors. There are many more of these software disasters, and I’m sure that my fellow classmates, who are smarter and also more experienced, have dissected the cases and offered their own insights.

Indeed, we now have better programming models, hardware, and methodologies to deal with these problems. But “disaster” carries a unique meaning for different people: small and trivial incidents that aren’t noticeable by the majority of us may be disasters for some. For instance, in the AmazonFail event, some items in the Amazon database were incorrectly flagged as “adult”, which caused many gay-themed books to disappear from Amazon's sales rankings and search results. As a result, many publishers and authors lost sales, and Amazon was confronted with a PR disaster, as the action was initially perceived as anti-gay by the public. Earlier in the year, Facebook also introduced a privacy flaw during an upgrade, which allowed users to gain access to portions of their friends’ profiles that they should not have been able to see. The two incidents occurred no more than three months apart, and although most of us may consider them minor, the implications may be disastrous for some.

While most of us are concentrating our analysis on the causes of past software disasters (rightfully so, and excellently done), my intention is to take a step back, compare a few “disasters” from the distant past and the recent past, and give my own thoughts about how software disasters have evolved. So first, let us look at a well-studied case: a software disaster from the distant past.

 

During the Good Old Days of AT&T…

In 1990, I was only 8 years old, and the majority of us still communicated primarily by phone. But on January 15th, 1990, for 9 hours, an estimated 60 thousand people lost their AT&T long distance service. Since the outage also affected 800-number calls, hotels and car rental companies lost sales, and it was estimated that the failure cost AT&T $60 to $70 million in lost revenue [3].

The cause was, of course, a bug in the AT&T phone switch software that had been inadvertently introduced during an upgrade.

Back in those days, it took around 20 seconds for a long distance call to get connected. To improve the latency, AT&T deployed new software to the #4 ESS switching system. It was intended to reduce the signaling overhead between the switches by eliminating a signal that explicitly indicates whether a node is ready to receive traffic, cutting the time between dialing and ringing to 4 seconds. The software was deployed to all 114 computers in the AT&T network.

Then, on January 15th, 1990, one of the switches in New York signaled “out-of-service”, rebooted itself, and resumed service. A second switch received the signal from the first switch and started processing it. However, when another signal from the first switch arrived while the second switch was still busy, the signal was not processed properly, and the second switch also signaled “out-of-service”. One by one, the “out-of-service” signal propagated to the neighboring switches, and eventually to all 114 switches, because they were all running the same software.

The intermittent “out-of-service” signals blocked some of the long distance calls. Eventually, the engineers discovered that a switch would signal “out-of-service” when another signal arrived too soon. By reducing the workload of the network, which increased the latency between signals, the erratic behavior eventually disappeared. But during the blackout period, it was estimated that more than 5 million calls were blocked [4].

But the question remains: what caused a switch to erratically signal “out-of-service”? To answer that, let’s look at a snippet of the C code in the #4 ESS switching system that was used for processing incoming signals:

/* ``C'' Fragment to Illustrate AT&T Defect */
do
{
    switch (expression)
    {
        // some code...
        case value:
            if (logical)
            {
                // sequence of statements
                break;
            }
            else
            {
                // another sequence of statements
            }
            // statements after if...else statement
    }
    // statements after case statement
} while (expression);

The root cause of the disaster was nothing more than a break statement placed in the wrong location. When the break statement is reached, it prematurely breaks out of the enclosing switch statement. However, this sudden exit was not anticipated: the programmer did not realize that the break statement would terminate the enclosing switch statement rather than just the if block. The resulting termination would crash the switching node, causing it to send an “out-of-service” signal to its neighboring nodes (5). The “out-of-service” signal propagated between nodes and eventually crashed the system.
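
To see the control flow concretely, here is a small, self-contained C# analogue of the same mistake (my own illustration, not the actual ESS code; the signal and method names are made up). The break behaves the same way inside an if within a switch in C# as it does in C:

using System;

class BreakPitfall
{
    enum SignalType { Incoming, Other }

    static void Handle(SignalType signal, bool nodeBusy)
    {
        switch (signal)
        {
            case SignalType.Incoming:
                if (nodeBusy)
                {
                    // Intended to skip only the "process" branch, but this
                    // break exits the enclosing switch statement entirely.
                    break;
                }
                else
                {
                    Console.WriteLine("processing signal");
                }
                // Never reached when nodeBusy is true.
                Console.WriteLine("statements after the if...else");
                break;
        }
        Console.WriteLine("statements after the switch");
    }

    static void Main()
    {
        Handle(SignalType.Incoming, true);   // busy node: the post-if...else work is skipped
        Console.WriteLine("---");
        Handle(SignalType.Incoming, false);  // idle node: everything runs as expected
    }
}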

The cause of this disaster is absurdly simple: the programmer simply misunderstood or overlooked the effect of the break statement in the code. In this particular case, some case studies also suggest that a comprehensive code coverage test would have revealed that a portion of the code was never executed because of the misplaced break statement (conveniently ignoring the fact that code coverage as a testing technique only began to gain popularity around 1988 [5]). Therefore, as with most typical software disasters that we study, we can blame:

  • The programmer,

  • Inadequate testing, since the testers should have caught it,

  • The management, who, had they not been so aggressive, could have given the programmers and the IT department enough time to test and deploy the software properly.

For most modern software disasters, any analysis or case study will invariably return to these causes and blame the same people. However, even if we have smart programmers who write code carefully, comprehensive test suites and smoke tests for the application, and management that sets the robustness and security of the software as the most important criteria for shipping, does that mean we can largely avoid software disasters? No, we can’t, and that’s not only because we aren’t perfect…

To be continued…