“The objects of sense exist only when they are perceived; the trees therefore are in the garden [...] no longer than while there is somebody by to perceive them.” – George Berkeley, in A Treatise Concerning the Principles of Human Knowledge, 1734

I’ve often wondered, if Berkeley had been a software engineer instead of a philosopher, how he would have viewed the world differently… Software engineers invariably believe that bugs which haven’t been perceived yet are indeed bugs. Perhaps it’s because of the tacit knowledge and experience software engineers have that, even though bugs may not have surfaced yet, they will – and usually at the worst possible time.

Philosophical musings aside, bugs are omnipresent in the life of all software development managers.  One of the things I like best about software engineering is that everyone plays the game, but no one really wins. 

Bugs, to a large extent, also level the playing field. Some years back I recall a project run by a large financial services company. The budget for the project easily eclipsed the largest funding of any of my A-round companies. Hundreds of developers, legions of QA guys, managers, alpha, beta1, beta2 – all the resources money can buy. They roll the app out, then their biggest customer hits an icon on the landing page and immediately gets a stack trace. The only surprise to me was that anyone was surprised – bugs have no respect for the size of the company or the amount of money spent on a project.

I also enjoy the lexicon developers have created to describe various bugs. Heisenbug is one of my personal favorites.  An obvious takeoff on the Heisenberg Uncertainty Principle,  Heisenbugs refuse to occur the instant a system is instrumented to study them.  This is usually due to the subtle timing changes introduced by a debugger or logging statements.

My favorite class of bug, however, is the kind that lies dormant in a piece of code for months, years, sometimes even decades before picking the right moment to show itself. To my knowledge there is no name for these, but perhaps we should call them tripod bugs after the evil machines in H.G. Wells’ The War of the Worlds. These bugs are usually timing bugs exposed by faster processors, faster storage (such as SSDs), or radical shifts in system load.
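To make the idea concrete, here is a minimal sketch of how a tripod bug can hide in plain sight. It is an invented example, not code from any real system: the ID scheme, the function names (`make_id_generator`, `count_collisions`), and the load figures are all assumptions chosen for illustration. The hypothetical generator builds a transaction ID from the current millisecond, which is perfectly unique as long as transactions arrive slower than one per millisecond.

```python
from typing import Callable, List


def make_id_generator(clock_ms: Callable[[], int]) -> Callable[[], str]:
    """Hypothetical ID scheme: the ID is just the current millisecond."""
    def next_id() -> str:
        return f"txn-{clock_ms()}"
    return next_id


def count_collisions(arrival_times_ms: List[int]) -> int:
    """Replay the given arrival times and count duplicate IDs."""
    times = iter(arrival_times_ms)
    next_id = make_id_generator(lambda: next(times))  # injected fake clock
    ids = [next_id() for _ in arrival_times_ms]
    return len(ids) - len(set(ids))


# Assumed light load: one transaction every 5 ms.
light_load = list(range(0, 50, 5))
# Assumed heavy load: three transactions inside every millisecond.
heavy_load = [t // 3 for t in range(30)]

print(count_collisions(light_load))  # 0
print(count_collisions(heavy_load))  # 20
```

Nothing in the code changed between the two runs; only the arrival rate did. That is the pattern: the defect was there all along, waiting for the load or the hardware to catch up with it.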

The online payment guys are a good case study for tripod bugs. Every year around the end of October their transaction load steps up. Not only does the increased load persist through the holiday retail season, but it also tends to become the new plateau for the following year. With this increase in load comes a new wave of bugs, some of them in code that has been deemed stable for a very long time.

Building a test replica of such a system is practically impossible. The cost would be staggering, and even then the stochastic transaction load would be all but impossible to simulate. Diligence and fast remediation seem to be the only solution.

On the subject of remediation, some time ago a popular file sharing service introduced a bug that allowed everyone to see everyone else’s data. This is not intended as a slam against them. In fact, I felt only empathy for their VP of Engineering on what I’m sure was a very long day. I can also easily envision how such a thing happens: the authentication code was disabled during testing and someone forgot to put it back before the app went live. The commendable part was how fast they responded – the issue was resolved 20 minutes after it was discovered. This underscores the benefit of cloud-resident apps: deploy in one place and the problem is solved for everyone.

The technologist in me believes that ultimately, technology triumphs over all.  Less than 100 years ago airplanes crashed, television sets required repair people, car engines failed, and tires blew out. While the occurrence of these problems never quite goes to zero, enough decades of smart engineers drumming on them makes them go effectively to zero – as an example, my new sports car doesn’t come with a spare tire.

So it seems reasonable to assume that at some point in time apps will roll out with zero bugs. The only open question is: how will we get there? If I had the answer to that question, I probably wouldn’t be writing this article.

jpeebles
Julian P