Drew DeVault

For my first little thread here on Fosstodon, I thought it would be cool to talk about an interesting topic I was researching this week: address sanitizer (ASan). This tool audits memory usage in C and C++ programs using Clang/LLVM, and the implementation is so simple and brilliant that I was really quite impressed by it.

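First of all, some background. On x86_64 systems with 48-bit virtual addresses, there are two usable parts of virtual memory: the lower half goes from 0 to 0x00007FFFFFFFFFFF, and the upper half goes from 0xFFFF800000000000 to ~0. Useful memory (physical memory, shared memory, DMA, etc) can only be mapped in these address ranges. On Linux the upper half is reserved for the kernel, so everything a user program touches lives in the lower half.

ASan splits that lower half again into what it calls low memory (below 0x7fff8000) and high memory (0x10007fff8000 and up). Generally speaking, the user's program image (code, globals, etc) is stored in low memory, at least for a non-PIE executable, and the heap and stack are stored at opposite ends of high memory.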

ASan uses mmap at runtime to set up two "shadow" memory areas, one for high memory and one for low memory. This produces a virtual address space layout similar to what you see in this image. The memory layout is designed such that each byte in the shadow area describes eight bytes of "real" memory, so each shadow area is one eighth the size of the memory it covers.
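
For reference, here is a rough sketch of the default Linux/x86_64 layout, using the region boundaries given in the comments of ASan's runtime source. The enum and function names are made up for illustration, and other platforms use different constants.

```c
#include <stdint.h>

/* Regions of ASan's default Linux/x86_64 layout. The names here are
 * hypothetical; only the boundary addresses come from ASan itself. */
enum region {
    LOW_MEM, LOW_SHADOW, SHADOW_GAP, HIGH_SHADOW, HIGH_MEM, NOT_USER
};

/* Classify an address by comparing it against each region's last byte. */
enum region classify(uint64_t addr)
{
    if (addr <= 0x00007fff7fffULL) return LOW_MEM;
    if (addr <= 0x00008fff6fffULL) return LOW_SHADOW;
    if (addr <= 0x02008fff6fffULL) return SHADOW_GAP;  /* never accessible */
    if (addr <= 0x10007fff7fffULL) return HIGH_SHADOW;
    if (addr <= 0x7fffffffffffULL) return HIGH_MEM;
    return NOT_USER; /* non-canonical hole or kernel half */
}
```

The shadow gap is the range whose own shadow would land back inside the shadow areas, so it is left inaccessible on purpose.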

Here's a screenshot of some Hare code that sets up similar memory mappings. MAP_NORESERVE is important here: it tells the kernel not to reserve swap space for the mapping up front, but just to set up the memory mapping. This leaves the shadow areas mapped in the process's address space, but does not allocate physical memory to fill them. When you first read or write an address in these areas, a page fault occurs and the kernel automatically maps a zero-filled page at the appropriate address.
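
The same idea can be sketched in C. This is a minimal illustration, not ASan's actual setup code: the function name reserve_shadow is invented, and it lets the kernel pick the address, whereas ASan places its shadow areas at fixed addresses.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and MAP_NORESERVE on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Reserve `len` bytes of zero-filled address space without committing
 * physical memory up front. Returns NULL on failure.
 * reserve_shadow is a hypothetical name used only in this sketch. */
void *reserve_shadow(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Touching a byte in the returned region is what triggers the page fault; pages that are never touched never cost any physical memory.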

Now for the trick: the compiler instruments every load and store instruction it compiles to compute (address >> 3) + 0x7fff8000, which gives the shadow address corresponding to that address in real memory. It then checks the shadow byte stored there. If it is zero, all eight bytes it covers are addressable and the instruction proceeds. If it is non-zero and the access touches bytes it marks as off-limits, an ASan error is detected and the program stops with a backtrace and other error details.
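
Written out as C, the check the compiler inserts looks roughly like the sketch below. It assumes the usual shadow encoding (0 means all eight bytes are addressable, 1 to 7 means only that many leading bytes are, negative means none are) and an access that stays within a single 8-byte granule. The function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SHADOW_OFFSET 0x7fff8000ULL

/* Map an application address to the address of its shadow byte. */
uint64_t shadow_addr(uint64_t addr)
{
    return (addr >> 3) + SHADOW_OFFSET;
}

/* Given the shadow byte covering `addr`, decide whether an access of
 * `size` bytes starting at `addr` is allowed. Returns 1 if allowed. */
int access_ok(int8_t shadow, uint64_t addr, size_t size)
{
    if (shadow == 0)
        return 1; /* whole granule addressable: the fast path */
    if (shadow < 0)
        return 0; /* whole granule poisoned */
    /* Partially addressable: the last byte touched must lie below `shadow`. */
    return (addr & 7) + size - 1 < (uint64_t)shadow;
}
```

Note the parentheses around the shift: in C, `addr >> 3 + SHADOW_OFFSET` would shift by the sum, since `+` binds tighter than `>>`.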

This allows the compiler to *poison* addresses by setting the corresponding bytes in the shadow area to a non-zero value.

When you allocate a stack variable or a global, the compiler adds a "red zone", unused bytes on either end of the allocation, and marks them as poisoned. Thus, buffer overflows are detectable.
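
As a toy illustration of red zones, the sketch below models a single stack frame holding a 10-byte buffer. Everything here is hypothetical: the frame layout, the names, and the poison value are invented, and the shadow is a small array indexed by frame offset rather than a real mapping.

```c
#include <stddef.h>
#include <stdint.h>

#define GRANULE 8
#define POISON ((int8_t)-1) /* placeholder; real ASan uses distinct codes */

/* Toy frame: 16-byte red zone, 10-byte buffer, padding, 16-byte red zone. */
enum { FRAME_SIZE = 48, BUF_OFF = 16, BUF_LEN = 10 };

int8_t frame_shadow[FRAME_SIZE / GRANULE];

/* Poison the whole frame, then unpoison exactly BUF_LEN bytes at BUF_OFF. */
void setup_frame(void)
{
    size_t g;
    for (g = 0; g < FRAME_SIZE / GRANULE; g++)
        frame_shadow[g] = POISON;
    for (g = 0; g < BUF_LEN / GRANULE; g++)
        frame_shadow[BUF_OFF / GRANULE + g] = 0;
    if (BUF_LEN % GRANULE) /* trailing partial granule */
        frame_shadow[(BUF_OFF + BUF_LEN) / GRANULE] = BUF_LEN % GRANULE;
}

/* Would a 1-byte access at this frame offset be allowed? */
int byte_ok(size_t off)
{
    int8_t s = frame_shadow[off / GRANULE];
    return s == 0 || (s > 0 && (int8_t)(off % GRANULE) < s);
}
```

After setup_frame(), offsets 16 through 25 pass the check, while offset 26 (one past the end of the buffer) and offset 15 (one before it) are rejected.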

Memory allocations work similarly, but with a bonus that memory is poisoned when freed. Thus, use-after-free is detectable.
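
The heap case can be sketched with a toy bump allocator over a small arena. This is an illustration under simplifying assumptions, not how ASan's allocator works: the names and poison values are invented, addresses are arena offsets, and freed memory is never reused (the real runtime keeps freed chunks in a quarantine for a while instead).

```c
#include <stddef.h>
#include <stdint.h>

#define GRANULE 8
#define ARENA_SIZE 256
#define REDZONE 16
#define RED ((int8_t)-1)   /* placeholder poison values, */
#define FREED ((int8_t)-3) /* not ASan's real codes      */

int8_t heap_shadow[ARENA_SIZE / GRANULE];
size_t next_off;

/* Start with the whole arena poisoned. */
void toy_init(void)
{
    for (size_t g = 0; g < ARENA_SIZE / GRANULE; g++)
        heap_shadow[g] = RED;
    next_off = 0;
}

/* Allocate n bytes after a fresh red zone; returns an arena offset,
 * or (size_t)-1 if the arena is full. */
size_t toy_malloc(size_t n)
{
    size_t rounded = (n + GRANULE - 1) / GRANULE * GRANULE;
    size_t off = next_off + REDZONE;
    if (off + rounded + REDZONE > ARENA_SIZE)
        return (size_t)-1;
    for (size_t i = 0; i < n / GRANULE; i++)
        heap_shadow[off / GRANULE + i] = 0;
    if (n % GRANULE)
        heap_shadow[(off + n) / GRANULE] = (int8_t)(n % GRANULE);
    next_off = off + rounded;
    return off;
}

/* Poison the whole allocation so later accesses are caught. */
void toy_free(size_t off, size_t n)
{
    size_t rounded = (n + GRANULE - 1) / GRANULE * GRANULE;
    for (size_t i = 0; i < rounded / GRANULE; i++)
        heap_shadow[off / GRANULE + i] = FREED;
}

/* Would a 1-byte access at this arena offset be allowed? */
int toy_byte_ok(size_t off)
{
    int8_t s = heap_shadow[off / GRANULE];
    return s == 0 || (s > 0 && (int8_t)(off % GRANULE) < s);
}
```

Using a different poison value for freed memory than for red zones is what lets an error report distinguish a use-after-free from a buffer overflow.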

The performance implications of this are pretty agreeable, on the order of 2x CPU time and 3-4x memory usage. Much better than, say, Valgrind, which uses a dynamic recompilation approach that adds on the order of 20x overhead.

It's a beautifully simple solution that catches most out-of-bounds and use-after-free bugs in C family programming languages.

I plan on implementing ASan for Hare soon, and we'll likely see a cproc implementation as well. /thread
