The End of Null Checking

As any software engineer who works primarily in C++ can tell you, one of the largest classes of bugs in any code base comes from unchecked dereferencing of null pointers. A healthy double-digit percentage of crashes can be fixed with a simple “null check”. The “null check” is often the first thing you look for when attempting to address a crash in someone else’s code, because even if you don’t know anything about their code, you still know it’s one of the easiest things to get wrong.

Any engineer worth their salt feels a deep sense of guilt if they dereference a pointer without first checking that it isn’t null. It’s like you’ve gotta look over your shoulder first to see if anyone is watching before you do it. After years of conditioning, the null checks just magically emerge as you type, and it takes great courage to force yourself not to write them.

But here’s a little secret: I don’t null check anymore.

Yes, you heard that correctly. It took a little bit of convincing, and some unwrapping of all the conditioning that has been inflicted on my poor null-shy brain over the long years, but I no longer check for null before dereferencing a pointer.

But what exactly happens, then, if the pointer is actually null when I dereference it?

I don’t care.

I don’t care about doing null checks, and the result is that my code is simpler, and it’s more fireproof. In fact, I’ve dramatically reduced the chance that I will crash due to a null pointer.

‘How can this be?’, you wonder. Well, I’m going to tell you.

It all started one day when I was working with a Bool Provider.

The Story of a Bool Provider

What’s a Bool Provider, you ask?

Well, instead of passing around a boolean variable whose value is static, we could make things a bit more dynamic by possibly passing around a function that gives us the value of that boolean variable. Or even better, we can pass around an object, which can be serialized, which exposes a function that provides a bool. And that, ladies and gentlemen, is a Bool Provider.

In my particular case, I was dealing with a pointer to a Bool Provider. To get the actual boolean value, I had to write some code that looked like this:

bool Value = false;

if (Provider)
{
     if (Provider->ProvideBool(Value))
     {
          // 'Value' has been properly assigned by the Bool Provider
     }
     else
     {
          // Bool Provider failed to provide a bool, handle the error
     }
}
else
{
     // We can't get the Bool Provider, handle the error
}

That is so VERBOSE!!!

I mean, I’m a C++ developer, so what do I care? Well, as it turns out, sometimes I do care, and it makes me cry a little when I have to write such verbose code. In this case it really stung, because before the Bool Provider, we were just passing a bool around, and the assignment of ‘Value’ was just one very simple line of code.

Well, the first win here is that perhaps we don’t really need to distinguish between handling two different kinds of errors – we just want to know if the bool was successfully retrieved or not.

bool Value = false;

if (Provider && Provider->ProvideBool(Value))
{
     // 'Value' has been properly assigned by the Bool Provider
} 
else
{
     // We can't get the bool for reasons, handle the error
}

I mean, that’s a little better, but it still really sucks. It’s a lot more than one line of code.

Now, there are many cases where I’m fine using the default value, without any extra error checking, if the BoolProvider failed for some reason. Possibly most cases. And if I don’t care about handling an error, I don’t really even need to branch at all. Just the expression inside of the ‘if’ statement should be enough to do the job.

The result looks kind of strange though.

bool Value = false;
Provider && Provider->ProvideBool(Value);

// 'Value' *may* have been assigned by the Bool Provider.  Good enough.
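
For anyone who wants to compile the snippets above, here is a minimal sketch of what such a Bool Provider might look like. Only the shape of ProvideBool() (fill in a bool, return whether it worked) comes from the code above; the constructor, the Valid and Stored fields, and the ReadOrDefault() helper are assumptions made up for illustration.

```cpp
// Hypothetical minimal Bool Provider. Only the ProvideBool signature is
// taken from the snippets above; everything else is assumed.
class BoolProvider
{
public:
     BoolProvider(bool InValid, bool InStored) : Valid(InValid), Stored(InStored) {}

     // Writes the bool into OutValue and returns true on success.
     // On failure, OutValue is left untouched and false is returned.
     bool ProvideBool(bool& OutValue) const
     {
          if (!Valid)
          {
               return false;
          }
          OutValue = Stored;
          return true;
     }

private:
     bool Valid;
     bool Stored;
};

// The "two weird lines" wrapped in a helper: Value keeps its default
// unless the provider exists and succeeds.
inline bool ReadOrDefault(const BoolProvider* Provider)
{
     bool Value = false;
     // The cast to void only silences "unused result" warnings.
     (void)(Provider && Provider->ProvideBool(Value));
     return Value;
}
```

With this in place, ReadOrDefault(nullptr) and a provider that fails both fall back to false, which is the "good enough" behavior described above.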

I now have two lines of kind of weird looking code, and from here I’m basically stuck.

To tell you the truth, what I really wanted was a function on the Bool Provider that just directly returned the bool I was looking for. Perhaps this function had optional parameters that indicated what the result should be if I ran into an error, as well as a parameter that indicated whether an error had occurred or not. Something like this:

// Returns the value of the Bool we are looking for
// Optional parameters for handling error cases
bool BoolProvider::GetBool(bool Default = false, bool* Failed = nullptr);

Empowered with this baby, I could write a super simple line of code, like this:

bool Value = Provider->GetBool();

Now that’s WAY better!

Conditioning backfires

Unfortunately, this is about where my null check spider sense started punching me in the face. Like a recipient of electro-shock therapy, my eye started to twitch.

My shiny new GetBool() function was exposed to a crash that I had been diligently safeguarding against the whole time.

Then I had an idea.

It was a weird idea. It made my years’ worth of null check conditioning crawl under my skin.

What if I did my null check inside the GetBool() function? I mean, just because Provider is null, doesn’t mean it will crash when I call the function. In fact, as every C++ developer with years of nullptr access violation experience knows, many times the crash from dereferencing a null pointer will occur inside of a function called on a nullptr. The remedy, in this case, is to figure out where that function was being called from, and then safely wrap a null check around it.

So executing on this weird idea, here’s basically what GetBool() should look like on the inside.

// Returns the value of the Bool we are looking for, has default parameters to handle error cases
// (the default values live on the declaration shown earlier, so they are not repeated here)
bool BoolProvider::GetBool(bool Default /* = false */, bool* Failed /* = nullptr */)
{
     bool Result = Default;

     if (this)
     {
          // access 'this' to stuff the desired bool into Result
          // if that fails, write to Failed
     }
     else
     {
          // do some stuff without using 'this' to handle the failure
          if (Failed)
          {
               *Failed = true;
          }
     }

     return Result;
}

It took me a while to convince myself that this was actually legal.

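Obviously, you can access ‘this’ inside of a member function. In this particular case, if ‘this’ is null, we will report that we Failed and return the Default, and do so without ever dereferencing ‘this’. If ‘this’ is not null, we will do our normal GetBool() processing. (One caveat for the language lawyers: the C++ standard classifies calling a non-static member function through a null pointer as undefined behavior, so this pattern leans on what the compiler actually generates, and an optimizer is allowed to assume ‘this’ is never null and drop the check.)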

I convinced myself that it was not only legal, but that it also had a sufficient level of error handling, if ever I wanted it.
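
If you want the same one-line call site without testing ‘this’ at all, one variation is to make the provider pointer an ordinary parameter of a free function. This is a sketch of that variation, not the member function described above; the BoolProvider struct here is a hypothetical stand-in, and only ProvideBool(), Default and Failed come from the article.

```cpp
// Hypothetical stand-in for the Bool Provider; only ProvideBool's
// signature comes from the article.
struct BoolProvider
{
     bool Valid = true;
     bool Stored = false;

     bool ProvideBool(bool& OutValue) const
     {
          if (!Valid) { return false; }
          OutValue = Stored;
          return true;
     }
};

// Free-function form of GetBool: the provider pointer is an ordinary
// parameter, so checking it for null is plain, well-defined C++.
inline bool GetBool(const BoolProvider* Provider, bool Default = false, bool* Failed = nullptr)
{
     bool Result = Default;
     const bool Ok = Provider && Provider->ProvideBool(Result);

     // Report success or failure only if the caller asked for it.
     if (Failed)
     {
          *Failed = !Ok;
     }
     return Ok ? Result : Default;
}
```

The call site becomes bool Value = GetBool(Provider); and it behaves the same whether Provider is null, failing, or healthy.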

But was it good? There are lots of legal things that you can do in C++ that will earn you a place sitting on Satan’s lap, in the fieriest heart of hell. I tend to shy away from ‘technically legal’ solutions while coding, because the short term win is almost always outweighed by the long term loss. So is this weird solution actually good? Is it a pattern that I wouldn’t mind seeing replicated?

Maybe it is too soon in the article to ask whether this pattern is good or not. You see, while I’ve now been stripped of all of my null pointer conditioning, it’s likely that you haven’t been. And so your null check spider sense may be punching you in the face right now, which makes it hard for you to see any goodness at all.

After deciding to define my GetBool() function with the null check on the inside, I immediately had to grapple with the reality of what I had just created.

The following code is valid:

BoolProvider* Provider = nullptr;
bool Value = Provider->GetBool();

If your null check spider sense hadn’t yet face punched you up to this point, it should have just about now.

Making Sense

After grappling with some severe cognitive dissonance, I started to wrap my mind around the whole situation. Little by little, I started to actually like what I had just done. I started to wonder how it was that we had ever done it differently.

And that’s when I realized that I could finally join the elite ranks of programmer snobs who complain about the way C++ approaches Object Oriented Programming. (As it turns out, after revisiting a few of those gripes, turning them sideways and squinting really hard, it appears to me that their gripes are essentially just derivatives of my gripes).

Here’s my first gripe: ‘this’ is implicitly defined for Member Functions.

Actually, that’s not really a gripe against the language (I actually appreciate not having to type ‘this’), so much as it is a realization that this fact is the source of some really bad coding patterns that have wormed their way into a programming zeitgeist which afflicts Object Oriented C++ engineers the world over. My gripe is more with the zeitgeist than it is with the language.

Imagine a world where ‘this’ was defined explicitly. If we forced the member function to be defined the same way it is under the hood, as a thiscall, it would look something like this:

bool MyObject::Function(MyObject* this, ... other params ...)

And for a const member function, instead of placing the const keyword after the function declaration, you would do this:

bool MyObject::ConstFunction(const MyObject* this, ... other params ...)

While on this little imaginary trip you must also imagine that since ‘this’ is no longer implicit, we must use it explicitly within the body of the function as well, while accessing any member of ‘this’. And of course, before making any such access, we should have first checked that ‘this’ isn’t null.

Hopefully you realize that this imaginary view of a Member Function renders it almost equivalent to a normal Function (caveats: member functions have access to non-public members of ‘this’, and can be virtual).
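
You can get surprisingly close to this imaginary world in real C++ with a static member function that takes the object as an explicit parameter. The sketch below is only an illustration: the Counter class and its functions are hypothetical, and the parameter is called Self because ‘this’ is a keyword and can’t be used as a parameter name.

```cpp
class Counter
{
public:
     // Imaginary explicit-'this' style, written with what real C++ offers:
     // a static member function that takes the object as a parameter.
     // Being a member, it may still touch private fields through Self.
     static int Increment(Counter* Self, int Amount)
     {
          if (!Self)
          {
               return 0; // no object: respond without touching any member
          }
          Self->Count += Amount;
          return Self->Count;
     }

     // Const flavour: the constness moves onto the explicit parameter.
     static int Get(const Counter* Self)
     {
          return Self ? Self->Count : 0;
     }

private:
     int Count = 0;
};
```

Calls look like Counter::Increment(Ptr, 2), and because Self is an ordinary parameter, passing a null pointer is no different from passing null to any other function.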

With this equivalency in mind, I will now ask you a question:

If you are passing a pointer as a parameter into a normal Function, would you think it is better to do the null check inside the function, where you only have to do it once, or outside the function where you’d have to do it every time you call the function? The answer is that in most cases, you’d want to do the null check inside.

In fact, imagine that you were writing non-Object Oriented procedural code, and someone informed you that you couldn’t or shouldn’t do the null check inside. You might (rightly) think that forcing all clients of the function to do their own null check would just be asking for a multitude of bugs. If you had a large enough code base, where this rule was globally enforced, you might go so far as to say that it was utterly insane.

But this is exactly how we treat the ‘this’ pointer. Primarily because it’s implicit.

If doing the null check inside is mostly good for the case of a normal Function, or even for the case of any standard parameter that we might be passing into a Member Function, why is it suddenly not good for the case of that special, implicit, parameter called ‘this’?

Every Rule has Exceptions

There are cases, for sure, where you don’t want to do the null check inside the function. I’m going to highlight some arguments (and non-arguments) for doing the null check outside, rather than inside.

A non-argument for placing the null check on the outside is that perhaps you want some response other than what the Function defines internally as its response to that pointer being null. I call this a non-argument, because an internal null check doesn’t preclude an external null check. Think of the internal null check as the default response to the pointer being null, with the external null check as an optional override to that response.
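
Here is a small sketch of that default-plus-override idea using a normal function. The Player type, DisplayName() and Greeting() are hypothetical names invented for the example.

```cpp
#include <string>

struct Player
{
     std::string Name;
};

// Internal null check: the function defines the default response.
inline std::string DisplayName(const Player* P)
{
     return P ? P->Name : std::string("<nobody>");
}

// A caller that wants a different response adds its own outer check,
// overriding the default only where it matters.
inline std::string Greeting(const Player* P)
{
     if (!P)
     {
          return "Waiting for a player to join";
     }
     return "Hello, " + DisplayName(P);
}
```

Most callers just use DisplayName() and accept its default; Greeting() is the rare caller that cares enough to check for itself.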

Another non-argument for placing the null check outside, which only applies to a Member Function, is that the compiler forces you. Some very pedantic compilers deem it their business to enforce their own ideology on your code. They will emit a warning if you null check ‘this’ inside of a member function. If you are operating in a ‘warnings as errors’ paradigm, then this can stop your null pointer re-conditioning efforts in their tracks. I call this a non-argument, because the folks enforcing this are doing so in response to the zeitgeist, rather than actual principle. Also, there are always ‘technically legal’ ways to fool the compiler into allowing you to null check ‘this’ inside of a member function.

Now, to some plausible arguments.

You may be working on some high performance code, and you want to guarantee that the null check was done higher up in the call tree, and you don’t want to pay for that check further down. In this case, you may end up doing what I do. You’d define your function as taking a reference, rather than a pointer. (Technically, if you use a reference, you are no longer within the realm of the “how to deal with pointers” discussion.) If this option isn’t available for some reason, you’d annotate the function, warning folks not to pass in a nullptr.
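
As a rough sketch of that split, the outer function below takes a pointer and checks it once, while the hot inner function takes a reference and checks nothing. Particle, Integrate() and IntegrateAll() are hypothetical names chosen for the example.

```cpp
#include <cstddef>
#include <vector>

struct Particle
{
     float Position = 0.0f;
     float Velocity = 0.0f;
};

// Hot inner function: takes a reference, so there is no null check to pay for.
inline void Integrate(Particle& P, float DeltaTime)
{
     P.Position += P.Velocity * DeltaTime;
}

// Outer function: one null check up front, then the hot loop runs check-free.
// Returns how many particles were updated.
inline std::size_t IntegrateAll(std::vector<Particle>* Particles, float DeltaTime)
{
     if (!Particles)
     {
          return 0;
     }
     for (Particle& P : *Particles)
     {
          Integrate(P, DeltaTime);
     }
     return Particles->size();
}
```

The reference in Integrate()'s signature is the annotation: nobody can call it without first producing an actual object.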

Finally, there is a very good, in fact unavoidable, argument for doing an external null check that applies only to certain types of Member Function. Member Functions differ from normal Functions in that they can be defined as virtual. A virtual function requires a vtable lookup before actually calling the function, and the vtable lookup counts as dereferencing the pointer. You must always null check a pointer before calling a virtual function.

One More Complaint

We don’t live in a world where ‘this’ is defined explicitly for a Member Function, but that’s OK. We can actually take all of the considerations we’ve derived by imagining that world, and assemble them into a paradigm that is useful for the implicit ‘this’ world. In this paradigm we almost always null check ‘this’ inside of a member function, and almost never null check on the outside of the function.

Under this paradigm, we call member functions on null pointers with abandon, and it actually makes our lives better, not worse.

Before formulating this paradigm, I’m going to register my second gripe with the C++ language folks. Unlike my first pseudo-gripe, this one is a legitimate gripe against the language:

How come there isn’t a convention that allows us to pass ‘this’ to a member function as a reference instead of a pointer??

Imagine a new world now, where the compiler can enforce that a member function be called with . but not with ->

The ability to create member functions like this would complete the analogy created earlier between Member Functions and normal Functions. In the explicit ‘this’ world, these ‘reference only’ member functions would be written as:

void MyObject::Func(MyObject& this);
void MyObject::ConstFunc(const MyObject& this);

Now, coming back to the implicit ‘this’ world, let’s say that we identify such member functions in a manner similar to the way we identify const member functions, by placing some token at the end of the function declaration. Since we want to identify that this function should be called from a reference, let’s use the token ‘&’.

Consider the following member function declarations that include our new special token:

void MyObject::Func() &;
void MyObject::ConstFunc() const&;

MyObject* A;

A->Func();   // <--- produces a compiler error
(*A).Func(); // <--- does not produce a compiler error

If it were possible to declare such functions, this would be the primary mechanism that I would use to indicate and enforce that the null check occurs externally, rather than internally. This is the method I would prefer for handling the legitimate cases where we can’t or shouldn’t null check internally, such as performance critical or virtual functions.
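
A note on the real language: since C++11 a trailing ‘&’ on a member function does exist, but it is a ref-qualifier that restricts whether the object is an lvalue or an rvalue, and A->Func() still compiles with it because *A is an lvalue. The closest real equivalent I know of is a normal function that takes a reference. The sketch below shows that approach; MyObject::Value and the CanCallFunc probe are hypothetical, added only so the compile error can be demonstrated without breaking the build.

```cpp
#include <type_traits>
#include <utility>

struct MyObject
{
     int Value = 0;
};

// 'Reference only' function: callers must hand over an object, not a pointer.
inline void Func(MyObject& Self)
{
     Self.Value += 1;
}

// Compile-time probe: true when Func(Arg) is a well-formed call.
// Func returns void, so decltype(...) matches the defaulted 'void' parameter.
template <typename Arg, typename = void>
struct CanCallFunc : std::false_type {};

template <typename Arg>
struct CanCallFunc<Arg, decltype(Func(std::declval<Arg>()))> : std::true_type {};

static_assert(CanCallFunc<MyObject&>::value, "Func(*A) compiles");
static_assert(!CanCallFunc<MyObject*>::value, "Func(A) would be a compile error");
```

Writing Func(*A) makes the dereference visible at the call site, which is exactly the signal that a null check should already have happened.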

But now we need to come back to the real world. ‘this’ is implicit, and there doesn’t exist a convention where we can pass ‘this’ to a member function by reference instead of as a pointer.

We’ll make the best of what we’ve got.

Follow the Rules

Let’s start building an actual formulation of my null pointer paradigm, which works for the real world.

Rule #1: If a pointer is passed into a function, we expect it to be null checked internally. If we want null to be checked externally, we enforce this by passing a reference instead of a pointer. This is true for normal functions as well as member functions.

Rule #2 (since we can’t fully enforce Rule #1 for ‘this’): Mark member functions that require an external null check. For the code that I write at home, I add an underscore to the beginning of the function name. If I’m using a function that has an underscore at the beginning, it’s a signal that I need to have previously null checked the object. (One caution: C++ reserves names that begin with an underscore followed by an uppercase letter, so a leading underscore is only safe in front of a lowercase name.) Note: ALL virtual functions should have this mark.

Rule #3: Any member function that doesn’t have the ‘external null check’ mark should internally check that ‘this’ is not null before accessing any members of ‘this’. (For the first little while, you may want to force yourself to explicitly type out ‘this->’ for each member access, just to know you aren’t fooling yourself.) If ‘this’ is null, the member function can respond by operating on any non-this data available, including parameters passed into the function.

Rule #4: Try to keep marked functions as non-public as possible. This helps mitigate the lack of compiler supported ‘reference only’ member functions, reducing as much as possible the need for clients to support an external null check.

Rule #5: With Rule #2 and Rule #4 combined, the implication is that virtual functions are preferred to be non-public. This may require a public facing non-virtual function which does the internal null check before calling the virtual function.

Rule #6: No public facing naked access of member fields. If you retrieve these values through accessor functions, it gives the object a chance to do an internal null check and possibly return a default value.
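
To make the rules a little more concrete, here is a sketch of how a small class hierarchy might follow them. It is a variation rather than a literal rendering: the public entry points are static functions that take the object as an explicit Self pointer instead of testing ‘this’, so the null check stays within defined C++, and the mark is a trailing underscore rather than a leading one. Shape, Square and all of their members are hypothetical.

```cpp
class Shape
{
public:
     virtual ~Shape() = default;

     // Public, non-virtual entry point (Rules #1 and #5). The object arrives
     // as an explicit pointer, so the null check is ordinary C++.
     static double Area(const Shape* Self, double Default = 0.0)
     {
          return Self ? Self->Area_() : Default;
     }

protected:
     // Marked function (Rules #2 and #4): virtual, non-public, and only
     // ever reached through a pointer that has already been checked.
     virtual double Area_() const = 0;
};

class Square : public Shape
{
public:
     explicit Square(double InSide) : Side(InSide) {}

     // Accessor instead of a public field (Rule #6), with a default for null.
     static double GetSide(const Square* Self, double Default = 0.0)
     {
          return Self ? Self->Side : Default;
     }

protected:
     double Area_() const override { return Side * Side; }

private:
     double Side;
};
```

Clients only ever see Shape::Area() and Square::GetSide(), both of which tolerate null; the virtual Area_() stays out of their reach.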

I’ve discovered that the largest risk of following this paradigm is that I may end up writing code that does “too much” internal null checking. For instance, calling 5 member functions in a row results in 5 null checks. There are definitely ways to address this, but possibly those ways have just a bit more friction than simply calling 5 public facing, internal null checking, functions in a row. The other downside is that there’s a bit more boilerplate to write up front, especially when virtual functions come into play.

I follow these rules very regularly with my home code (I don’t do it much at work – the zeitgeist and all). I understand the cost of using this paradigm – it’s definitely not entirely free. But I weigh the up front boilerplate, and the risk of falling into the ‘too many null checks’ trap, against what I get back: all of the external null checking that I no longer have to do, the peace of mind that I will likely never get a null access violation again in my entire life, and the simplicity and readability of my code. I find that it’s definitely worth it.