Wednesday, April 24, 2013

Give Scotty his Due


People are bad at making long-term planning estimates.

People are -really, really- bad at making long-term planning estimates.

And just to be clear, I am definitely included in the set of 'people'.

And when I say 'long-term planning estimate', I am specifically talking about the estimate of how long a complex task will take. And by that, I don't mean something like writing a 2-3 tree or rewriting malloc. Those are complex tasks, sure, but there's lots of help available for them, and we already have metrics for similar things.

I'm talking about the kind of long-term estimate that we are so often asked to make in industry. How long will this big semi-specified project take to finish? Or even how long will this fully specified project take to finish?

How often do you hear about a project releasing on time, on budget?

To be fair, there is a certain amount of confirmation bias going on. We don't hear about the on-time, on-budget projects as much because they're deemed unremarkable and not newsworthy. That's the way things are -supposed- to work, and when things work as designed, there's very little fanfare.

However, any number of us can think of projects that have gone massively over time and over budget. Promised big, delivered late.

I'm looking at you, Half-Life 2 Episode 3. Or whatever it is you're going to be called when you're finally released. And, for software we know isn't vaporware, Duke Nukem Forever, which spent longer in development than the moon program did.

Granted, these are two extreme cases, and in the case of Duke Nukem Forever the developers seemed firmly intent on shooting themselves in the foot, but we all know of dozens of other properly managed software projects that have gone both over time and over budget.

What's going on? Why are we so bad at making estimates?

I think we're consistently answering the wrong question.

To get a correct estimate would require a rather large investment of effort. We'd need to research how other, similar projects have fared. We'd need to look deep into the domain in which we're developing, because software often requires two sets of knowledge: knowledge of programming, which we have, and knowledge of the field we're programming for, which we need to learn on the fly. For example, it'd be bad form for me to try to program even something as simple as a restaurant POS system without having some idea of what the restaurant's needs are and what their use cases look like. So now I need to factor in research and learning time. Then we need to determine how long testing will take, which again should be done not by just sitting and thinking about it briefly, but by comparing how long testing has taken for similar projects.

That's a lot of work. And that's before we get to all-too-common problems like funding hiccoughs, software and hardware issues, or spec changes.

You know what's easier? Determining how confident I feel that I can accomplish this task, and multiplying that by how complex I think the task is. Then making an estimate based on -that-.

I think that's the question we're actually answering when we give a time estimate. Not how long we think it'll take - that question is hard, and very possibly can't be answered correctly. But I absolutely know how confident I feel and how complex this looks.

So what's the fix?

Not making estimates might be a good start. But management hates that, and that's perfectly reasonable; the business world doesn't run on good will and best wishes. Doing all the research necessary to make a good estimate is a nice second. But the company might not want to pay for the effort required, and if you're doing something truly novel, there may not be any projects similar enough to compare against. Also, unquestionably, we all think we can do better than our peers, statistics be damned.

I just give Scotty his due.

Scotty, of course, is the engineer of the USS Enterprise, who rather famously would multiply his estimates of how long something would take to get done. He knew Kirk would always tell him he had half that time to do it. So he'd give estimates -four times- larger than what he actually thought it'd take him. If he got it done on time, well, good, no sweat. However, when the captain ( manager, for us ) cut him back, he still had slush room. And he -still- had slush room after that: the captain chopped by half, he had over-estimated by four, so he still had double the amount of time he thought he'd need. And of course, being a dramatic entity on a hit sci-fi show, he always got things done either just-in-time for dramatic effect, or under time, and could be called a 'miracle worker'.

We're not Scotty, but his methods just might work for us. Take your estimate. Add an order of magnitude. Then multiply that by four. Give Scotty his due. Maybe that estimate will actually shake out.
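Half in jest, here's that rule as a tiny Python sketch. The function name and the choice of hours as the unit are mine, purely for illustration; it's just the arithmetic from the paragraph above made explicit.

# A tongue-in-cheek sketch of the padding rule above.
# 'honest_hours' is my own name for the gut-feel estimate; nothing official.
def scotty_estimate(honest_hours):
    padded = honest_hours * 10   # add an order of magnitude
    return padded * 4            # then give Scotty his factor of four

print(scotty_estimate(3))  # a 'three hour' task becomes a 120 hour promise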

Wednesday, April 17, 2013

Compiled versus interpreted


Some languages ( C, C++ ) are typically compiled; others ( Python, Common Lisp ) are typically interpreted.

What's the difference?

It's worth noting that any language can be compiled -or- interpreted. All the interpreted languages I'm familiar with also have compilers. I've heard that there is a C interpreter out there, but that making a sane C++ interpreter is difficult, possibly impossible. Still, there -are- interpreters out there for what are traditionally compiled languages.

Also worth noting up front is that the end result is the same either way: your code-as-written gets broken down, parsed, and ultimately executed as machine instructions on the computer's CPU.

So, back to the question. What's the difference?

A compiled language will produce an executable by default. The compiler will come along, analyze your code, build a symbol table where it keeps track of the names of things, their datatypes, and so on, and then it will create a whole bunch of machine code. The file that's produced can then be run by the OS later. Since your code was compiled ahead of time, the symbol table is really easy for the computer to deal with - when the program runs across a symbol, for example a function or variable name, it already knows where to look for it.
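A native compiler's symbol table isn't something you can easily poke at in a blog post, but Python's own compile step builds something loosely similar that you -can- inspect. This is only an analogy to make the 'table of names the compiler tracks' idea concrete; the snippet and the names in it are mine, not anything canonical.

# Peeking at the name and constant tables Python's compile step records.
# This is only an analogy for a native compiler's symbol table.
source = "def area(width, height):\n    return width * height\n\nlabel = 'rectangle'\n"
code = compile(source, '<example>', 'exec')
print(code.co_names)    # ('area', 'label') -- the names this chunk defines or refers to
print(code.co_consts)   # constants recorded at compile time, including area's compiled body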

For an interpreted program, life is a little more difficult. A lot of things that would be handled by the symbol table now have to be handled on-the-fly by the interpreter. On the other hand, interpreted languages can offer dynamic typing, which is a boon, as well as dynamic scoping, and they tend towards being more platform independent ( Java, for example, compiles to bytecode, which is then interpreted by the Java virtual machine, and is famous for its 'write once, run anywhere' slogan ). They also have their downsides; the main one is that they carry a bit more overhead than a compiled language. With a compiled language, the grunt work of turning your code-as-written into machine code is done at compile time, and then the compiler exits. It's no longer in memory, taking up resources. An interpreted language, however, needs its interpreter. Otherwise, there's no machine code, and nothing gets done.
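To make the 'handled on-the-fly' point and dynamic typing a bit more concrete, here's a small Python sketch. The dis module is part of the standard library; the particular function and variable names are just mine for illustration.

import dis

# Dynamic typing: the same name can hold values of different types over time.
thing = 42
thing = "forty-two"   # perfectly legal; the type travels with the value, not the name

def greet(name):
    return "Hello, " + name

# dis shows the bytecode the interpreter walks through while the program runs.
# Note the LOAD instructions: names get looked up as they're encountered,
# rather than being pinned down to fixed addresses ahead of time.
dis.dis(greet)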

So, long story short: the interpreted languages need an interpreter. The compiled languages get compiled, but don't need the compiler once they've been made into an executable. Either way, your code is still getting turned into machine instructions for the CPU; there's just a difference in how those instructions get there.