Wednesday, April 24, 2013

Give Scotty his Due


People are bad at making long-term planning estimates.

People are -really, really- bad at making long-term planning estimates.

And just to be clear, I am definitely included in the set of 'people'.

And when I say 'long-term planning estimate', I am specifically talking about the estimate of how long a complex task will take. And by that, I don't mean something like implementing a 2-3 tree or rewriting malloc. Those are complex tasks, sure, but there's lots of help available for them, and we already have metrics for similar things.

I'm talking about the kind of long-term estimate that we are so often asked to make in industry. How long will this big semi-specified project take to finish? Or even how long will this fully specified project take to finish?

How often do you hear about a project releasing on time, on budget?

To be fair, there is a certain amount of confirmation bias going on. We don't hear about the on-time, on-budget projects as much because they're deemed unremarkable and not newsworthy. That's the way things are -supposed- to work, and when things work as designed, there's very little fanfare.

However, any number of us can think of projects that have gone massively over time and over budget. Promised big, delivered late.

I'm looking at you, Half-Life 2 Episode 3. Or whatever it is you're going to be called when you're finally released. And for software that's known not to be vaporware, there's Duke Nukem Forever, which was in development for longer than the moon program was.

Granted, these are two extreme cases, and in the case of Duke Nukem Forever the developers seemed firmly intent on shooting themselves in the foot, but we all know of dozens of other properly managed software projects that have gone both over time and over budget.

What's going on? Why are we so bad at making estimates?

I think we're consistently answering the wrong question.

To get a correct estimate would require a rather large investment of effort. We'd need to research how other, similar projects have fared. We'd need to look deep into the domain in which we're developing, because software often requires two sets of knowledge: knowledge of programming, which we have, and knowledge of the field we're programming for, which we need to learn on the fly. For example, it'd be bad form for me to try to program even something as simple as a restaurant POS system without having some idea of what a restaurant's needs are and what its use cases look like. So now I need to factor in research and learning time. Then we need to determine how long testing will take, which again should be done not by just sitting and thinking about it briefly, but by comparing how long testing has taken for similar projects.

That's a lot of work. And that's before we get to all-too-common problems like funding hiccoughs, software and hardware issues, or spec changes.

You know what's easier? Determining roughly how confident I feel that I can accomplish this task, and multiplying that by how complex I think the task is. Then making an estimate based on -that-.

I think that's the question we're actually answering when we give a time estimate. Not how long we think it'll take - that question is hard, and very possibly can't be answered correctly. But I absolutely know how confident I feel and how complex this looks.

So what's the fix?

Not making estimates might be a good start. But management hates that, and that's perfectly reasonable. The business world is not made on good will and best wishes. Doing all the research necessary to make a good estimate is a nice second. But the company might not wish to pay for the effort required there, and if you're doing something truly novel, there may not be any projects similar enough to compare against. Also, unquestionably, we all think we can do better than our peers, statistics be damned.

I just give Scotty his due.

Scotty, of course, is the engineer of the USS Enterprise, who rather famously would multiply his time estimates for how long something would take to get done. He knew Kirk would always tell him he had half that time to do it. So he'd give estimates -four times- larger than what he actually thought it'd take him. If he got it done on time, well, good, no sweat. However, when the captain ( manager for us ) cut him back, he still had slush room. And he -still- had slush room after that. The captain chopped by half, he over-estimated by four, so he still had double the amount of time he thought he'd need. And of course, being a dramatic entity on a hit sci-fi show, he always got things done either just-in-time for dramatic effect, or under time, and could be called a 'miracle worker'.

We're not Scotty, but his methods just might work for us. Take your estimate. Add an order of magnitude. Then multiply that by four. Give Scotty his due. Maybe that estimate will actually shake out.
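For the literal-minded, here's a minimal Python sketch of the padding arithmetic from the Scotty story ( the ten-day gut-feel estimate and the scotty_estimate helper are made up for the example ):

    def scotty_estimate(gut_feel_days, padding_factor=4):
        # Pad the gut-feel estimate the way Scotty would.
        padded = gut_feel_days * padding_factor
        after_manager_cut = padded / 2  # the captain always cuts the estimate in half
        return padded, after_manager_cut

    quoted, remaining = scotty_estimate(10)
    print("Quoted estimate:", quoted, "days")     # 40 days
    print("After the manager's cut:", remaining)  # still 20.0 days of slush room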

Wednesday, April 17, 2013

Compiled versus interpreted


Some languages ( C, C++ ) are typically compiled; others ( Python, Common Lisp ) are typically interpreted.

What's the difference?

It's worth noting that any language can be compiled -or- interpreted. All the interpreted languages I'm familiar with also have compilers. I've heard that there is a C interpreter out there, but that making a sane C++ interpreter is difficult, possibly impossible. Still, there -are- interpreters out there for what are traditionally compiled languages.

Also worth noting up front is that the end result is the same either way: your code-as-written will be broken down, parsed, and eventually turned into instructions that the computer's CPU carries out.

So, back to the question. What's the difference?

A compiled language will produce an executable by default. The compiler will come along, analyze your code, build a symbol table where it keeps track of the names of things, their datatypes, and so on, and then it will emit a whole bunch of machine code. The file that's produced can then be run by the OS later. Since your code was compiled, symbols are resolved ahead of time - when the program runs across a function or variable reference, it already knows where to look for it.

For an interpreted program, life is a little more difficult. A lot of the things that were handled by the symbol table at compile time now have to be handled on-the-fly by the interpreter. On the other hand, interpreted languages can offer dynamic typing, which is a boon, and dynamic scoping, and they tend towards being more platform independent ( Java, for example, compiles to bytecode, which is then interpreted by the Java virtual machine, and is famous for its 'write once, run anywhere' slogan ). They also have their downsides; the main disadvantage is that they carry a bit more overhead than a compiled language. With a compiled language, the grunt work of turning your code-as-written into machine code is done at compile time, and then the compiler exits. It's no longer in memory, taking up resources. An interpreted language, however, needs its interpreter around the whole time. Otherwise, nothing gets executed, and nothing gets done.
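To make a couple of those points concrete, here's a small snippet in Python, one of the interpreted languages mentioned above. It shows dynamic typing in action, and uses the standard dis module to peek at the bytecode the CPython interpreter actually executes - the translation work still happens, it just happens at run time:

    import dis

    def add(a, b):
        return a + b

    # Dynamic typing: the same function handles ints and strings at runtime.
    print(add(2, 3))          # 5
    print(add("foo", "bar"))  # foobar

    # Peek at the bytecode instructions the CPython interpreter executes for add().
    dis.dis(add)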

So, long-story-short, the interpreted languages need an interpreter. The compiled languages get compiled, but don't need the compiler once they've been made into an executable. Either way, your code is still getting turned into machine instructions for the CPU, there's just a difference in how those instructions get there.

Tuesday, March 5, 2013

My Workspace

I was asked by someone about my workspace. This is kind of an opinion piece. My way is certainly not the One True Way to set up a coding environment, and shouldn't be taken as such. Hell, even for -me- it's not a One True Way. It's possibly not even a -Better- way. It's just a way that lets me get things done without wanting to punch the machine.

So, first, platform. If I'm on a Linux box, my work gets done at the command line, all the time, every time. I still prefer to have a GUI with a desktop and graphical applications so I can Google for something real fast or distract myself for a few hours with that neat marble game, but actual coding work gets done at the terminal. I'm fond of Ubuntu's tab-capable terminal. I'll almost always make that sucker super big, open a second tab, navigate one tab to where I want to be in the file system, and use the other to run my current editor of choice. For most languages, I'll use vim. If I'm coding in Common Lisp, I'll use emacs, simply because the emacs/SLIME combination is, for my purposes, extremely hard to beat.

On Mac, I use a hybrid approach. I'll have the terminal open to wherever I suspect I'm going to be doing a lot of compiling and running of code. For actual editing, well, this really depends on my mood. If I just need to make a small, fast edit, I'll use vim from the command line. If I'm working with Common Lisp at all, again, emacs is the tool of choice. If I'm doing major work, Sublime Text 2 has been amazing. I don't even fully utilize all of the stuff it can do. I just really like its slick appearance and some of its capabilities, like highlighting every instance of a keyword and changing them all at once, regexp search, and a few other things it does well.

On Windows, well, I don't typically code on Windows. When I do, I usually struggle along with Visual Studio. I'm not saying anything bad about Visual Studio; in fact, it seems to be a really slick tool that would work fantastically if I'd just sit down and take the time to get really familiar with it. I don't, so I find it slightly annoying to work with.

You may have noticed that unless I'm on Windows, I'm not using a dedicated IDE. Yes, I know vim can be made to be very much its own IDE, but I haven't done that yet, so it doesn't count. It's not from a lack of trying. I certainly find myself in Visual Studio often enough, and I tried Eclipse for a few months, as well as XCode. The problem with all these tools is that I don't really want to learn a new tool. I get frustrated by starting a new project in an IDE. Also, Eclipse was really, really slow on my Mac for reasons I didn't investigate.

Also, part of me absolutely loves only relying on tools which are almost universally accessible. Sit me down in front of any given Mac or Linux box, and I can code, without feeling frustrated about my favorite tool not being available. That's -neat-.

It's worth noting that almost all of my work to date involves either command-line apps or a handful of OpenGL programs using SDL. If I were to do more work that required frameworks, or that had to talk to a specific platform's GUI API, I would probably pretty quickly adapt to using an IDE. In particular, I've played with Qt Creator in the past, and I really liked it.

What are the benefits to my workflow, if any? Well, it works for me, and I can work pretty fast this way. Which should really be the goal of any work environment. If it lets you do the work you need to do, it's a good work environment.

Wednesday, February 27, 2013

Dates on a computer

Dates are one of those things in Computer Science that keep finding new and more entertaining ways to behave really badly. So here. Read this XKCD comic, and then everyone, please just follow the ISO standard. Hell, if everyone follows the ISO standard, everyone can keep doing that annoying thing where they store a date in an integer. That'll work okay until well after my lifetime is over. Note that other habits for storing a date as a single integer don't sort as well ( the ISO format, with the dashes taken out, sorts perfectly fine using the standard integer comparators ), or they have edge cases that'll behave oddly.

For example, if you stored dates as month-day-year, and you need to store January first, 2001, let's first convert that to its numerical equivalent in month-day-year: 01-01-2001. To get your integer, remove the dashes.

That leading zero gets lost, and we wind up with 1012001. If you're using -three- ints to store the date ( a distressingly common thing I see ), -two- leading zeroes get lost in the conversion to a single integer, and we wind up with 112001. And I have no idea what either of those numbers means when your custom date format gets passed to my code.
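Here's a minimal Python sketch of both problems, using a few made-up dates: year-month-day integers sort chronologically with plain integer comparison, month-day-year integers don't, and the leading zeroes vanish along the way.

    dates = [(2001, 1, 1), (1999, 12, 31), (2000, 6, 15)]  # (year, month, day)

    iso_ints = [y * 10000 + m * 100 + d for (y, m, d) in dates]      # YYYYMMDD
    mdy_ints = [m * 1000000 + d * 10000 + y for (y, m, d) in dates]  # MMDDYYYY

    print(sorted(iso_ints))  # [19991231, 20000615, 20010101] - chronological order
    print(sorted(mdy_ints))  # [1012001, 6152000, 12311999] - not chronological, and
                             # January first, 2001 has already lost its leading zero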

What I'm asking is this: if you're going to be sloppy about your date formats, at least store them in a single int, in the ISO format.

Though what I -really- want is for your dates to actually be a robust, first-class object in your system, but I understand that's a pain to code for.

If you decided to store dates as a count of seconds on your system, well, okay, that's fine, I can work with that. Just please don't ignore things like the 2038 problem. ( Hint: If you're going to count seconds, use a bigger integer. )

Wait, what's the 2038 problem?

Okay. Well, assuming a single signed 32-bit integer is storing your time information, and assuming you're working on an architecture that treats the beginning of time as 1970-01-01 at 00:00:00 UTC ( so, pretty much all Unix-based systems ), AND you're storing time as the number of seconds since this beginning of time, the last moment that can be recorded correctly is 2038-01-19, at 03:14:07 UTC. One second later, the integer overflows and wraps around to a negative number, and suddenly it's December of 1901 for everyone.
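If you want to see the rollover moment for yourself, here's a quick Python check, assuming the classic signed 32-bit counter described above:

    from datetime import datetime, timezone

    last_good = 2**31 - 1  # largest value a signed 32-bit counter can hold
    print(datetime.fromtimestamp(last_good, tz=timezone.utc))
    # 2038-01-19 03:14:07+00:00 -- one second later, the counter overflows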


Note to actual professionals reading this blog: I'm still a college student. These are problems I deal with. Somebody tell me better in the comments.

Trivia point: if we're counting the number of years since 'the beginning of time', last year was year 42. I hope everyone remembered their towels.

Tuesday, February 19, 2013

Late

Update will be late. I'll post something about my workflow later this week. Also, an announcement that's only exciting to me, but is really exciting for all that.

Tuesday, February 12, 2013

Technical Debt

This post is not about money.

Any serious programmer who has been programming for any reasonable period of time, whether as a hobbyist amateur or making big money at the big company, will eventually have to make this decision:

Accomplish your task by doing something dirty -now- and fixing it later, or do it the correct way ( whatever your idea of the correct way may be ), taking longer, possibly even taking time that isn't available. And in almost every situation, the temptation to do it dirty and do it now is overwhelming.

Doing it dirty and doing it now has the advantage of instant gratification. You can see the results of your work sooner, get immediate benefits, and spend time on something more interesting or less vexing. Of course, you’ll fix it proper later. Maybe during a maintenance cycle or during code review or in the next patch or when you revisit this program. Whenever it is, it is usually in the indeterminate and unplanned future but we as programmers are -certain- we’ll fix it.

Of course, we never fix it. We’ll forget how we programmed our quick and dirty fix in a week, forget the problem domain in a month, and just plain forget the whole program in a year. Either we will move on to another project, or the demands of this project will always stay high enough that we are never able to quite get back to fixing our quick and dirty code.

And in a high-entropy environment, one quick and dirty fix becomes two becomes many. And we never get back to create those proper solutions we’ve always dreamed of. This is technical debt: taking out a loan of time now, and then never paying it back. The debt is typically paid in code understanding and maintenance, and the quick and dirty fix can wind up costing us far more time than creating the proper solution in the first place would have.

It’s important to try to remember this. The technical debt for most projects will be incurred, sooner or later. It’s hard to keep in mind during crunch time or during finals or when you -just want to draw a box on the screen, dammit-, but that technical debt will tend to come back. Pay it now, and pay it forward. Don’t fall into the trap.

As a final note, though, moderation must be exercised here. While a quick and dirty fix now is almost never worth the end cost, a good solution now is often better than a perfect, diamond-polished solution later, depending on what you are working on. The good judgement that comes with experience will help a lot here, but given the choice, I would recommend most programmers try for perfection rather than settle for quick and dirty.

Wednesday, February 6, 2013

Programming Paradigms

This is a placeholder for some terminology.

There are some ‘big’ programming paradigms that I’m familiar with. Procedural, functional, and object oriented are ones I have hands-on experience with. There’s also imperative versus declarative, and the idea of reflection, and macros, and... well. Programming languages tend to support one or more of these paradigms, and that, in turn, will affect the way in which a coder will code. Since production languages tend to be Turing complete, you can technically force any language into any paradigm. Having said that, programming languages are tools, and just like real tools, while you can force them into tasks they weren’t designed for, it’s probably better to just change tools.

Real fast, for those who aren’t programmers: Turing completeness describes a machine that can perform any computation a computer can perform. You can imagine a machine that reads symbols printed on an infinite ribbon of tape, interprets those symbols to carry out an arbitrary number of instructions, and can modify those symbols as it goes. For practical purposes, if a programming language has an IF construct, some way to loop or jump, and can read from, write to, and modify memory, it’s Turing complete.

IMPERATIVE programs are programs that can be described as a series of orders. Imperatives, if you will. PRINT a number, GOTO a line number, do this, do that. You can think of it as marching orders for the machine. Procedural and object oriented programming tend to fall in this category.

DECLARATIVE programs just describe what the program should accomplish, not how it should go about doing it. Functional and logic programming languages tend to fall into this paradigm, as do database languages such as SQL.
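A rough illustration of the split, in Python - Python isn’t a declarative language, but the contrast between spelling out -how- to build a result and just describing -what- the result should be still comes through:

    numbers = [1, 2, 3, 4, 5]

    # Imperative: explicit marching orders, step by step.
    squares = []
    for n in numbers:
        squares.append(n * n)

    # Declarative-flavored: describe the result, let the language handle the looping.
    squares_again = [n * n for n in numbers]

    print(squares == squares_again)  # True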


PROCEDURAL programs are the ones I am most familiar with. They have a list of commands to be followed, and procedures which can be called. C and BASIC are procedural languages, with the callable procedures being known as ‘functions’ in C and ‘subroutines’ in BASIC. I think BASIC also has functions, but it’s been so long since I used it, I don’t really recall.
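A tiny procedural-style sketch in Python ( the procedure names are made up for the example ): the program is just a handful of procedures, called one after another.

    def read_input():
        return [3, 1, 2]

    def process(data):
        return sorted(data)

    def report(result):
        print("Result:", result)

    # The 'main' program is just the procedures, invoked in order.
    data = read_input()
    result = process(data)
    report(result)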

FUNCTIONAL programs are called that because they can be thought of as the evaluation of mathematical functions. Haskell is an example here. They are noted for their lack of side effects, which is a way of saying that they don’t modify state the way procedural/imperative languages do.
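Python isn’t Haskell, but a small functional-flavored sketch gets the idea across: pure functions whose output depends only on their input, with nothing outside them being modified.

    def square(n):
        # Pure: the result depends only on n; no outside state is touched.
        return n * n

    def total(numbers):
        # Combine functions instead of sequencing statements.
        return sum(map(square, numbers))

    print(total([1, 2, 3]))  # 14, every time, no matter what else the program has done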

OBJECT ORIENTED programs have, well, objects. These objects tend to have a number of characteristics, such as encapsulation and message passing. An object will tend to have a public interface, which is the way of passing commands to the object, and an implementation, which code outside the object doesn’t need to concern itself with. Objects can also often pass messages to each other, and act on those messages. C++ and Java are object oriented languages.
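And a bare-bones object-oriented sketch in Python, using a made-up Counter class: a small public interface for passing the object commands, and an implementation detail that callers aren’t meant to touch.

    class Counter:
        def __init__(self):
            self._count = 0  # implementation detail, hidden behind the interface

        def increment(self):
            # Public interface: tell the object to count one more.
            self._count += 1

        def value(self):
            # Public interface: ask the object what it has counted so far.
            return self._count

    c = Counter()
    c.increment()
    c.increment()
    print(c.value())  # 2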

Hopefully this is clearer than the relevant Wiki pages.