Modularity is its own reward

A while back a friend and I were talking about programming. They'd recently taken it up in the job they were doing and were relatively fresh to the discipline. Asking for a code review from a colleague, they'd been told that their Python was pretty good, but that it could do with breaking out into smaller functions.

My friend couldn't see the point of this - the code worked, it did the job, what's the problem? I had a crack at explaining why. Didn't do very well.

More recently I was getting some feedback for something small I'd written. It was, of course, awesome - like all the code I write.1 They said they liked it, but that the modularity wasn't entirely necessary because the individual modular parts wouldn't be reused.

So this is just a scratch piece to try to explain why I think modularity is the most important thing.

Oh and the title is an homage to my favourite xkcd comic:

tail recursion is its own reward

Wait, modularity?

When I say 'modular', I mean small and isolated and independent. A class that's a few lines long, the one line method in Ruby, the short function. At a larger level I mean, well, larger small things. A file with one class in it, or one function in it. Maybe I mean microservices. Maybe not.

Look, I mean small things. That's all. But why do I think that they're their own reward? Well, I don't2 - but I think that, most of the time, more good things come out of keeping the code smaller than larger. Such as...

Easier to Test

What sort of tests do you love? When there are hundreds of them - which ones make you happy inside? In my (admittedly limited) experience, it tends to be the ones that run really quickly and don't randomly fail. Sure, the ones that exercise the whole system are nice and necessary, but the ones that make me smirk a little are the unit tests that whizz by in the blink of an eye.

And in order to have those fast little tests, you need small little bits of code to test.

TDD makes us write the tests first - and the easiest tests to write are the ones that cover single, simple ideas that you want to implement. TDD wants us to write small tests that consequently should lead us to write small pieces of code.

Performing TDD produces code with tests - this is a given. But I find that people celebrate this more than what I think the bigger prize is: you have been forced into writing your code modularly, bringing with it other and possibly greater advantages.

Easier to Comprehend

This is probably the most important one. If your code is small and independent, then there is a much higher chance of you and everyone else understanding what it does. If a single function / class / method is longer than a screen, I would go so far as to say that it's near impossible to understand what it does.

If you're programming in small, easy to comprehend3 parts, then there is more chance that you'll be understood - if only because you'll have had to give them names. Possibly bad names, but names all the same. Names you'll be able to read and know what they mean and so what the parts do. Or at the worst, names that you read, don't understand, and then read the code, understand that because it's short, and then rename it with something (hopefully) better.

Easier to Reuse

Yes, reuse is good - it's a good benefit of small pieces of code. You write that Fibonacci function, you can use it everywhere that you need a Fibonacci number. It is part of the wonderful magic of small, independent things.

Whether you do reuse a part of code is often by the by - it can often come later on when you have a better idea about the thing that you're trying to build.

This all sounds familiar...

Look, if you think you've heard this before then you probably have - hell, I only worked out what I was talking about when I got to the end of writing it. It's basically the first two bullet points of the Unix philosophy - do one thing well, and integrate with other programs.

But as we're not writing programs - we're just writing parts of programs - we don't have to worry about the requirement for generalization (that can come later, if at all). That integration is eased internally if we make the modules truly independent of each other.

Like Lego blocks - some are small and general (think a 3x4 flat piece), others small and specific (think a laser gun), but they're all small and easy to integrate with each other.


  1. For 'awesome' read 'adequate'. 

  2. Barely anything is its own reward. Maybe pizza. 

  3. From the Latin com together, and prehendere grasp. It's literally easier to hold stuff together when it's small. 

Learning the C Programming Language

Part Two: Types

This post follows on from my first post about the C programming language, and is the second in a series of posts about learning C.

C has types - what does that mean? This post is going to focus on types and the role they perform in C. We're going to show how they're used but, more importantly, we're going to look at why they're used.

All the types!

A type in C is a type of data. For instance, if you want to use a whole number in C you can use an int, whereas if you want to use a decimal number you would need a different type of data, say a float.

The first temptation to watch out for, definitely from the Ruby side of my brain, is to think of types as being like classes. Types are ways of storing data in a computer (as we'll make clear), and not objects with methods, inheritance, attributes and all the other object oriented stuff. Forget about classes.

So to store an integer C has int. But it also has long, which will store a bigger integer, and, on top of that we have long long, which will store a really big integer. And if you just happen to know that the number you want to store is really, really big but will never be negative, you can go so far as to use the type unsigned long long.
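Just to give a flavour of what these all look like, here's a minimal sketch - the variable names are made up, the values are just illustratively big numbers that each type is guaranteed to be able to hold, and the %d, %ld, %lld and %llu bits are printf's placeholders for each type (more on printf below):

#include <stdio.h>

int main() {
    /* each type reserves a different amount of memory, so holds a different range */
    int small = 32767;
    long bigger = 2147483647L;
    long long really_big = 9223372036854775807LL;
    unsigned long long huge = 18446744073709551615ULL;

    printf("int:                %d\n", small);
    printf("long:               %ld\n", bigger);
    printf("long long:          %lld\n", really_big);
    printf("unsigned long long: %llu\n", huge);
}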

This may all sound a little ridiculous from the point of view of a Rubyist or a JavaScripter - I mean, I can see why maybe a float is different to an int, but why do I need all these different ways to talk about integers?

The reason cuts to the heart of what a type system is for, and why C is a lower level programming language than Ruby and JavaScript.

Types for the memory

In C when we declare a variable we declare it with its type:

int my_number;

We can then assign a value to it:

int my_number;
my_number = 5;

In Ruby, you'd just need the second line, and in JavaScript it would be the same but int would be replaced with var.1

So, what's going on under the hood of your computer when you say "OK computer, let's have a variable"?

This is the bit where I remember the video of the American politician explaining that the internet is just a series of tubes.2 Well, I'm about to be just as reductive and say that your computer's memory is 'just' a big long line of ones and zeroes.

a series of zeroes and ones

So Matrix.

Each one of these ones and zeroes is called a bit (BInary digIT - geddit?). And eight of those in a row is called a byte (no idea, you look it up).3 A byte is a good level of abstraction to work from for C, so let's replace our image of a very long list of zeroes and ones with a very long row of boxes, each box holding a byte of information.

a series of empty boxes

Less Matrix, but we're cool, right?

Why we have types

So now we need to do something with this memory - write programs! OK, more specifically, we need to keep hold of an integer. And we can do that by reserving a specific portion of that very long row of boxes to keep the number in. But how many boxes do we need? Well, basic maths tells us that a single byte can hold any whole number from 0 up to 255, as long as it's positive.4 Cool - so now we can keep hold of the number 5.

a series of empty boxes

But we might need to store much bigger numbers - what if we added 255 to that variable? We'd not have enough space to store the number 260. So maybe we should reserve more bytes in memory to hold it. How many? I don't know - maybe 128 of them, just to be safe.

a series of empty boxes

But isn't that terribly inefficient? We'd be reserving a lot of bytes which would always be 0 if we never kept a number bigger than 20. I mean, this is C - the year is 1971, and the most memory we're going to have available is 64KB. We don't want to run out of memory messing around with piddly little positive integers... so how much space do we need to allocate to store a number?

And that's why we have different types for different magnitudes of integer. For small numbers there are things like char (a single byte) and int, and for bigger numbers we've got the mighty long long and unsigned long long.

The type of a variable is the space reserved for it in your computer's memory.5 C offers us control over memory allocation, at the price of us actually having to care about memory.

For instance, there's char, which is good for storing information about a single ASCII character (more about them later). But if we need to keep hold of a number bigger than 255, we can go with int, which is guaranteed to store a number between -32767 and 32767 - a range that needs two bytes.

We say "guaranteed", because a system's implementation of C could allocate more memory to an int, so the C standard tells us the maximum number a type can definitely store. In reality it's larger - on my Macbook Pro the maximum size of an int is in fact between -2147483648 and 2147483647 - four bytes in fact.

Integer overflow

Let's try some of this stuff out - here's a fun program.

#include <stdio.h>

int main() {
    int int_number;
    int_number = 2000000000;
    printf("int_number: %d\n", int_number);

    int_number = int_number + 2000000000;
    printf("int_number + 2000000000: %d\n", int_number);
}

Here we've got the main() function again, which runs on execution. We're declaring a variable of type int called int_number on line 4 and assigning it the value of two billion on line 5. Then we're printing it out - printf() can take a format string as its first argument, allowing later arguments to be interpolated into the string. %d is the placeholder for an int to be inserted, so the value of int_number is printed in place of the %d in the string.

Then we reassign int_number to the value of int_number plus another two billion. And finally we print out the value of int_number again.

To compile and run it take a look at the first post in this series. Try it now and see what you get.

Something pretty odd, right? Maybe it'll be different on your computer, but for me the result of 2000000000 + 2000000000 is -294967296. Which is just wrong.

What happened? We just experienced integer overflow: C quite happily adds the two numbers together and tries to store the result in the variable, but if the type of the variable isn't big enough to hold the new number, C will just store as many bits as it can in the space it's got. Look, try this variation:

#include <stdio.h>

int main() {
    int int_number;
    int_number = 2147483647;
    printf("int_number: %d\n", int_number);

    int_number = int_number + 1;
    printf("int_number + 2000000000: %d\n", int_number);
}

You should get -2147483648, not 2147483648.

Integer overflow is like the moment when all the numbers on your car's odometer are 9s and then they all roll over at once to 0s - you've run out of space to represent the new number with the digits you're using. And for 'digits' in our example read 'bits': 01111111111111111111111111111111 (which is 2147483647) rolls over to 10000000000000000000000000000000, which is the representation of -2147483648 in binary.6
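If you'd like to see those bit patterns for yourself, here's a minimal sketch. It leans on the %x placeholder, which prints a number out in hexadecimal (a compact way of writing the underlying bits), and on the INT_MAX and INT_MIN constants from limits.h, which we used above:

#include <stdio.h>
#include <limits.h>

int main() {
    /* the biggest int: a 0 followed by 31 ones, 7fffffff in hex */
    printf("%d is %x in hex\n", INT_MAX, (unsigned int)INT_MAX);

    /* the smallest int - what the addition rolled over to: a 1 followed by 31 zeroes, 80000000 in hex */
    printf("%d is %x in hex\n", INT_MIN, (unsigned int)INT_MIN);
}

On a machine with four-byte ints this prints 7fffffff followed by 80000000 - exactly the odometer rolling over.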

Fixing integer overflow

To solve this problem we need a ~bigger boat~ larger type to store our number in, which is as easy as changing the int to an unsigned long long (and changing the %d placeholders to %llu, the placeholder for an unsigned long long):

#include <stdio.h>

int main() {
    unsigned long long int_number;
    int_number = 2000000000;
    printf("int_number: %d\n", int_number);

    int_number = int_number + 2000000000;
    printf("int_number + 2000000000: %d\n", int_number);
}

We should now be getting a nice round four billion.

Types in Ruby and JavaScript

Ruby and JavaScript also have types - we just don't get to see them as often, and they're not as granular. JavaScript numbers always take up 8 bytes - big enough to handle most numbers - and Ruby just switches the class of a number as it grows, between classes like Fixnum and Bignum. These are both good solutions, and they take away the headache of having to think about the correct type to use to represent an integer - but they also take away the freedom to manage memory directly.

Practically speaking…

In practice, when I write C, I start by using ints, wait until I see errors that are due to integer overflow, and then find-and-replace the ints to long long or unsigned long long. On my highly specced modern computer I'm not too worried about tinkering with how much memory my toy C programs are using.

But it's nice to know I can.


  1. Or let or const or whatever the new flavour of the month is. Or you could do it in a single line, var number = 5, which C will also let you do: int number = 5 

  2. The late senator Ted Stevens 

  3. Worth noting that the size of a byte was only fixed when IBM decided it would be 8 bits. Maybe take a look at this

  4. Eight ones, 11111111, in binary is 255 in decimal. 

  5. This may be a contentious statement, and is worthy of another post, but here I'm referring to type as early programmers would have understood the idea of a type of data, rather than the types of type theory, based on Bertrand Russell's solution to the set-theoretic paradoxes, which was later brought into computer science by way of Alonzo Church and languages like ML, and which functional programmers tend to wax lyrical about in languages like Scala. Take a look at this blog post and this short post

  6. If you want to know why this is, take a look at some articles on Two's Complement. This one is pretty good too. unsigned types don't have to worry about this and so can consequently store larger, non-negative integers. 

Fixing your last bash command

Guy I know - Oliver - command line ninja. Never makes a mistake. Can configure an AWS in a single long bash command. Typing speed through the roof. Bet you know someone like that too.

We mere mortals make mistakes and, while it's always good to learn from your mistakes, the first thing you have to do is fix them.

And to fix them you need to learn how to fix them.

Simple replace

Say you've typed an impossibly long command into the terminal with one irritating mistake. For me, it's usually something to do with xargs or curl:

  curl -s -I -X POST https://en.wikipedia.org/wiki/CURL | grep HTTP | cut -d ' ' -f 2

Not the greatest command - but say I couldn't spell wikipedia...

  curl -s -I -X POST https://en.wikpedia.org/wiki/CURL | grep HTTP | cut -d ' ' -f 2

First solution: up and left

Naive, and effective. Press up to show the last command, keep tapping left until you get to the bit of the command you need to change, backspace to remove what you don't need and then type in what you do need.

Second solution: bash vi mode

Bash has a vi mode, which can be activated by adding the following to your .bashrc.

set -o vi

If you're comfortable with vi you can now hit Escape to bounce into normal mode, Ctrl-P to go back to the last command, b a few times to get to the word you need to change... etc.

Vi mode is great - if you know a bit of vi. But you might not. So...

Third solution: quick substitution

How about something a little smarter:

^wikpedia^wikipedia

This is the bash quick substitution history expansion command - it runs the last command, substituting the first instance of the characters after the first caret with the characters after the second caret.

Pretty neat, huh? But that will only work for the first instance - what if we need to replace every instance of wikpedia in the last command?

Fourth solution: full history substitution

Bash uses the ! character as the history expansion character - it is used to substitute a part of your current command with a previously executed command1. One ! does nothing - but the previous command can be accessed with the !! sequence. So, to print out the last command, try:

echo !!

These history expansions can also take modifier options to change the string before it gets inserted. The syntax is <select>:<modifier>. For instance, to put the last command in quotes:

echo !!:q

And to perform a global substitution on it:

echo !!:gs/wikpedia/wikipedia

There is lots that can be done with the above syntax - just take a look at the documentation.

Fifth solution: retype the command

Seriously, by the time you've remembered how to do some of the above, wouldn't it have just been easier to type it out again?

Just don't mess it up this time, right?


  1. And this is the reason I have to escape ! whenever I use it in commit messages 

Learning the C Programming Language

Part 1: hello, world

I've started to learn C. There's a number of reasons for this. First, it was the 'proper' programming language when I was a kid. Second, I've been learning quite a bit of Go recently, and just about every other page on the excellent Go Blog has a sentence that starts with "In C…".1

Third, I've started a course on Data Structures and Algorithms on Coursera, inspired by the ever-inspirational Denise Yu. The course only accepts submissions in four languages: Python, Java, C++ and C.2 Going through that list my mind went "Basically Ruby with bleaugh indentation, bleaugh Java bleaugh, sounds really hard, sounds hard." So I went with 'sounds hard'.

I thought I'd try and capture my process of learning C as it might be useful to others in a similar position - i.e. no computer science background but know how to program in Ruby and JavaScript. I'll be approaching this in a series of posts, most of which will be following the loose structure of a presentation I gave on C.3

Background

C was invented by Dennis Ritchie at Bell Labs in the 1970s in order to write UNIX.4 He needed a language that provided sufficient abstraction to program quickly and efficiently, while at the same time being able to communicate directly with the computer's memory addresses to allow a programmer to perform low-level optimizations. It has been a remarkably popular language, being used to write other languages (Ruby is written in C, and NodeJS wouldn't work without libuv, also written in C), and heavily influencing most modern programming languages (Java, Ruby, JavaScript and, most obviously, Go).

hello, world

Another of the ways C has influenced programming is through The C Programming Language by Ritchie and Brian Kernighan. The authors' initials gave their name to a style of formatting code (K&R), along with giving us the de facto standard for your first program: "hello, world".

1 #include <stdio.h>
2 
3 int main() {
4     printf("hello, world\n");
5 }

Line 1: Include a file called stdio.h. It contains declarations for the functions in the C standard library that deal with I/O - input/output. In this case, printing to the terminal.

Line 3: All C programs start with a function called main - this is the function that gets executed when the program is, well, executed. The arguments (of which we're using none at all) are in the parentheses. The body of the function is in the curly braces. So far so JavaScript. For people who have never seen a typed language the int is a little surprising. All it's telling us (and C) is that the return value for this function is an integer. We'll talk about types later, but for now the int is almost working like def in Ruby or function in JavaScript - it's a keyword declaring a function.

Line 4: The meat of the program. Here we're calling a function called printf which has already been written for us as a part of the C standard library - this is why we did that #include at the beginning. We're calling it with a single argument, a string literal inside double quotes, that just says "hello, world" with a newline character (\n) at the end.

At the end of the line we put a semi-colon to tell C that the line has finished.

Compiling and running

If you put all of that into a file called hello-world.c, save it, head to the terminal and type5

gcc hello-world.c

Then if you ls the same directory you should see a new file called a.out.6 If you then run this with ./a.out, you'll see hello, world. Mission accomplished.

gcc is the GNU Compiler Collection,7 which will compile your C program into an executable that your computer can run. All this means is that instead of translating each line of your program into something your computer can understand as you run it, as with something like Ruby or JavaScript, we're translating the whole program in one go before we run it.

a.out isn't that informative, so in order to get a better filename we can pass a flag to GCC:

gcc hello-world.c -o hello-world

Which outputs to the file hello-world, which we can now run with ./hello-world.

Success, we're now all C programmers!

Why main returns an int

If you've run programs on the command line before you may be aware that each program that runs returns an exit code. You may even have been (un)lucky enough to see something along the lines of Error: non-zero exit code. On a POSIX system, a process returns a number to the process that called it to let it know how things went - this is the exit code. 0 is the good one; every other number is some flavour of 'gone wrong'.

The default return value for main, if we don't explicitly return a value, is 0. We can change this behaviour in our hello-world by returning an explicit value using the keyword return (very Ruby, so JavaScript).

#include <stdio.h>

int main() {
    printf("hello, world\n");
    return 1;
}

(don't forget the semicolon!)

Experiment with different return values. Remember to recompile your program each time you do. You may be able to see the returned value in your terminal's prompt. Otherwise you can echo out the last command's exit status with the command echo $?.


  1. Don't believe me? Look at this 

  2. You can now do the same course with a more diverse set of languages. 

  3. This was given at a Makers Academy alumni event. To view the speaker's notes tap n

  4. A longer discussion of the origins of C was written by Ritchie and is available here 

  5. This assumes you have gcc installed, which is likely if you've been developing on your computer for a while. 

  6. You want to know why it's a.out? Read Ritchie's C History - link above. 

  7. Yes, it used to be called the GNU C Compiler - acronyms are so wonderfully flexible... 

Double Dash

If you're anything like me you'll find your directories liberally scattered with some pretty strange directory and file names.

$ ls -la

-rw-r--r--   1 gypsydave5  1482096370   647 11 Oct 20:15 :w
drwxr-xr-x   2 gypsydave5  1482096370    68 10 Feb 08:55 -p/
-rw-r--r--   1 gypsydave5  1482096370  2900 11 Oct 20:15 \(

Hopefully you're nothing like me and you never get this, but I'm both a sloppy and an impatient typist, and so I will ~occasionally~ often mash the keyboard in Vim and name a file after the write command, or somehow create a directory called -p because I was using the recursive flag on mkdir.1

On that subject, let's try and get rid of the -p directory:

$ rm -rf -p

rm: illegal option -- p
usage: rm [-f | -i] [-dPRrvW] file ...
       unlink file

Hmmm, that sucks. What about...

$ rm -rf "-p"

rm: illegal option -- p
usage: rm [-f | -i] [-dPRrvW] file ...
       unlink file

Boo. Happily there's a *nix convention to help with these situations: --. Double-dash tells the command you're running that everything that comes after it is not to be treated as a command option, but is instead a filename. So:

$ rm -rf -- -p
$ ls -la

-rw-r--r--   1 gypsydave5  1482096370   647 11 Oct 20:15 :w
-rw-r--r--   1 gypsydave5  1482096370  2900 11 Oct 20:15 (

DISCO!

This behaviour is implemented in most of the command line tools you'll use on a *nix system - it's useful to know.


  1. I'm still not sure how I managed this. But I'm staring at the evidence now, so I know it must've happened.