[Admin] Systemd is bad parsing and should feel bad [security] [systemd]

FYI

Via: https://blog.erratasec.com/2018/10/systemd-is-bad-parsing-and-should-feel.html

Systemd has a remotely exploitable bug in its DHCPv6 client. That means
anybody on the local network can send you a packet and take control of
your computer. The flaw is a typical buffer overflow. Several news
stories have pointed out that this client was written from scratch
rather than reusing existing code, as if that were the moral failing.
That's not the problem.

The problem is that it was written from scratch without taking advantage
of the lessons of the past. It makes the same mistakes all over again.

In the late 1990s and early 2000s, we learned that parsing input is a
problem. The traditional ad hoc approach you were taught in school is
wrong. It's wrong from an abstract, theoretical point of view. It's
wrong from a practical point of view: it's error-prone and leads to
spaghetti code.

The first thing you need to unlearn is byte-swapping. I know this was
some sort of epiphany you had when you learned network programming, but
byte-swapping is wrong. If you find yourself using a macro to swap
bytes, like the be16toh() macro used in this code, then you are doing it
wrong.

But, you say, the network byte-order is big-endian, while today's Intel
and ARM processors are little-endian. So you have to swap bytes, don't
you?

No. As proof of the matter I point to every language other than C/C++.
They don't swap bytes. Their internal integer format is undefined.
Indeed, something like JavaScript may be storing numbers as floating
point. You can't muck around with the internal format of their integers
even if you wanted to.

An example of byte swapping in the code is something like this:
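(A minimal sketch of the pattern; the struct and function names here
are hypothetical stand-ins, not the exact systemd source:)

    #include <endian.h>
    #include <stddef.h>
    #include <stdint.h>

    /* A struct is cast over raw packet bytes, then the big-endian
     * length field is byte-swapped with be16toh(). */
    struct DHCP6Option {
            uint16_t code;              /* big-endian on the wire */
            uint16_t len;               /* big-endian on the wire */
            uint8_t data[];
    } __attribute__((packed));

    static int option_parse_hdr(uint8_t *buf, size_t buflen,
                                uint16_t *optlen)
    {
            struct DHCP6Option *option = (struct DHCP6Option *)buf;

            if (buflen < offsetof(struct DHCP6Option, data))
                    return -1;

            *optlen = be16toh(option->len);  /* the byte-swap */
            return 0;
    }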

In this code, it's taking a buffer of raw bytes from the DHCPv6 packet
and "casting" it as a C internal structure. The packet contains a
two-byte big-endian length field, "option->len", which the code must
byte-swap in order to use.

Among the errors here is casting an internal structure over external
data. From an abstract theory point of view, this is wrong. Internal
structures are undefined. Just because you can sort of know the
definition in C/C++ doesn't change the fact that they are still
undefined.

From a practical point of view, this leads to confusion, as the
programmer is never quite clear as to the boundary between external and
internal data. You are supposed to rigorously verify external data,
because the hacker controls it. You don't keep double-checking and
second-guessing internal data, because that would be stupid. When you
blur the lines between internal and external data, then your checks get
muddled up.

Yes you can, in C/C++, cast an internal structure over external data.
But just because you can doesn't mean you should. What you should do
instead is parse data the same way as if you were writing code in
JavaScript. For example, to grab the DHCP6 option length field, you
should write something like:
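(One line, valid in either language; the variable names are
illustrative:)

    length = buf[offset]*256 + buf[offset+1];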

The thing about this code is that you don't know whether it's JavaScript
or C, because it's both, and it does the same thing for both.

Byte "swapping" isn't happening. We aren't extracting an integer from a
packet, then changing it's internal format. Instead, we are extracting
two bytes and combining them. This description may seem like needless
pedantry, but it's really really important that you grok this. For
example, there is no conditional macro here that does one operation for
a little-endian CPU, and another operation for a big-endian CPU -- it
does the same thing for both CPUs. Whatever words you want to use to
describe the difference, it's still profound and important.

The other thing that you shouldn't do, even though C/C++ allows it, is
pointer arithmetic. Again, it's one of those epiphany things C
programmers remember from their early days. It's something they just
couldn't grasp until one day they did, and then they fell in love with
it. Except it's bad. The reason you struggled to grok it is because it's
stupid and you shouldn't be using it. No other language has it, because
it's bad.

I mean, back in the day, it was a useful performance optimization.
Iterating through an array by advancing a pointer could be faster than
incrementing an index. But that's virtually never the case today, and it
just leads to needless spaghetti code. It leads to convoluted
constructions like the following, at the heart of this bug, where you
have to do arithmetic on the pointer as well as on the length you are
checking against. This nonsense leads to confusion and, ultimately,
buffer overflows.
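
(An illustrative sketch of that construct, not the exact systemd code:)

    #include <stddef.h>
    #include <stdint.h>

    /* The pointer and the remaining length are adjusted in lockstep,
     * so every bounds check must track two moving values at once. */
    static void walk_options(uint8_t *buf, size_t buflen)
    {
            while (buflen >= 4) {
                    size_t optlen = (size_t)(buf[2] << 8) | buf[3];

                    /* both the pointer and the length change here; if
                     * optlen exceeds what is actually left, buflen
                     * underflows and the loop keeps reading */
                    buf += 4 + optlen;
                    buflen -= 4 + optlen;
            }
    }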

In a heckofalot of buffer overflows over the years, there's a construct
just like this lurking near the bug. If you are going to do a rewrite of
code, then this is a construct you need to avoid. Just say no to pointer
arithmetic.

In my code, you see a lot of constructs where it's buf, offset, and
length. The buf variable points to the start of the buffer and is never
incremented. The length variable is the max length of the buffer and
likewise never changes. It's the offset variable that is incremented
throughout.
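
(A sketch of that style, as a hypothetical option-walking loop:)

    #include <stddef.h>
    #include <stdint.h>

    /* buf and length never change; only offset moves, so every bounds
     * check reads the same way. */
    static void walk_options(const uint8_t *buf, size_t length)
    {
            size_t offset = 0;

            while (offset + 4 <= length) {
                    size_t optlen = buf[offset + 2] * 256
                                    + buf[offset + 3];

                    if (offset + 4 + optlen > length)
                            return;          /* malformed input: stop */

                    offset += 4 + optlen;    /* only the offset moves */
            }
    }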

Because of this simplicity, buffer overflow checks become obvious: it's
always "offset + x < length", and easy to verify. In contrast, here is
the fix for the DHCPv6 buffer overflow. That it is checking for an
external buffer overflow is far less obvious:
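
(What follows paraphrases the shape of the fix rather than quoting the
exact commit:)

    #include <errno.h>
    #include <stddef.h>

    /* A bounds check buried in pointer-arithmetic state, returning the
     * kernel error code ENOBUFS when the remaining room is too small
     * for the option being appended. */
    static int append_option(size_t remaining, size_t optlen)
    {
            if (remaining < 4 + optlen)
                    return -ENOBUFS;

            /* ... copy the 4-byte header plus optlen bytes ... */
            return 0;
    }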

Now let's look at that error code. That's not what ENOBUFS really means.
It's an operating-system error code with a specific meaning about kernel
buffers. Overloading it for userland code is inappropriate.

That argument is a bit pedantic, I grant you, but that's only the start.
The bigger issue is that it identifies the symptom, not the problem. The
ultimate problem is that the code failed to sanitize the original input,
allowing the hacker to drive computation this deep into the system. The
error isn't that the buffer is too small to hold the output; the
original error is that the input was too big. Imagine if this gets
logged and the sysadmin reviewing dmesg asks themselves how they can
allocate bigger buffers to the DHCP6 daemon. That is entirely the wrong
idea.

Again, we go back to lessons of 20 years that this code ignored, the
necessity of sanitizing input.

Now let's look at assert(). This is a feature in C that we use to
double-check things in order to catch programming mistakes early. An
example is the code below, which is checking for programming mistakes
where the caller of the function may have used NULL-pointers:
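
(An illustrative sketch with made-up names, not systemd's code:)

    #include <assert.h>
    #include <stddef.h>

    /* Normal assert() usage: catching a programming mistake, here a
     * caller that passed NULL pointers. */
    static int client_parse_message(const void *client,
                                    const void *message, size_t len)
    {
            assert(client);              /* caller bug if NULL */
            assert(message);             /* caller bug if NULL */

            /* ... parse len bytes of message ... */
            return 0;
    }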

This is pretty normal, but now consider this other use of assert().
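
(Again an illustrative sketch of the anti-pattern, not the exact
systemd code:)

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* assert() applied to a value that came out of a network packet:
     * a hostile packet can trip it, and compiling with NDEBUG removes
     * the check entirely. */
    static void parse_option(const uint8_t *packet, size_t packet_len)
    {
            size_t optlen = packet[2] * 256 + packet[3];

            assert(4 + optlen <= packet_len);  /* validating input */
            /* ... */
    }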

This isn't checking for errors by the programmer; it is instead
validating input. That's not what you are supposed to use asserts for.
These are very different things. It's a coding horror that makes you
shriek and run away when you see it. In fact, that's my Halloween
costume this year: using asserts to validate network input.

This reflects a naive misunderstanding by programmers who don't grasp
the difference between out-of-band checks that validate the code and the
code's actual job of validating input. Like the buffer overflow check
above, an EINVAL because a programmer made a mistake is a vastly
different error from an EINVAL because a hacker tried to inject bad
input. These aren't the same things; they aren't even in the same realm.

Conclusion

Rewriting old code is a good thing -- as long as you are fixing the
problems of the past and not repeating them. We have 20 years of
experience with what goes wrong in network code. We have academic
disciplines like langsec that study the problem. We have lots of good
examples of network parsing done well. There is really no excuse for
code that is of this low quality.

This code has no redeeming features. It must be thrown away and
rewritten yet again, this time by an experienced programmer who knows
what error codes mean, how to use asserts properly, and, most of all,
who has experience at network programming.