Telios I

Memory Safety, or Why mmap Is Unsafe

I’ve recently been diving into unsafe Rust, and I’ve been fascinated with how things interact. Two of the most interesting aspects, to me, are memory safety and ownership.

Now, I’m going to make a major disclaimer here: I am by no means an expert in Rust, or memory safety, or even low-level programming. While I have been writing Rust for a while, I haven’t delved into the unsafety aspects of the language. This post will be an attempt to explain why I haven’t.

The Star of the Show: mmap

Before I go too far into Rust and memory unsafety, I first want to talk about the mmap(2) syscall. I’m going to copy the signature verbatim and describe some of its uses before we move forward. If you’re already familiar with mmap, feel free to skip ahead.

mmap(2) maps a file or device into memory; its counterpart, munmap(2), removes the mapping.

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *addr, size_t length);

At its core, mmap maps the file described by fd, starting at offset, at the address addr, valid for length bytes. This allows us to read from and write to the file described by fd as if it were just in memory. (The file descriptor can be freely closed after the mapping is created.) If addr is null, the kernel will choose an address for us (and return it). If the call fails, it’ll return a special value (MAP_FAILED, not NULL), and set errno.

The mmap call supports many flags in flags, but there are three that we’re really interested in:

  • MAP_SHARED/MAP_SHARED_VALIDATE: These flags share the mapped memory with other processes. MAP_SHARED_VALIDATE behaves the same as MAP_SHARED, except that it validates all of the other flags that we pass and fails the call if any are unknown. (MAP_SHARED just ignores unknown flags.)
  • MAP_PRIVATE: This flag creates “a private copy-on-write mapping.” Updates to the mapping do not carry over to the file, or to other processes; they are private to the process. However, according to the manual, “it is unspecified whether changes made to the file after the mmap() call are visible in the mapped region.”
  • MAP_ANON/MAP_ANONYMOUS: The mapping is not backed by any file. fd is ignored, but for compatibility reasons, it’s suggested to pass -1. (The offset argument should also be set to 0.)

Creating an Anonymous Private Mapping

We’ll start in C, so that we can translate it over to Rust later. For this example, we’ll create an anonymous private mapping. This is useful for allocating memory - we ask the operating system for a chunk of memory that we can use for our own purposes. Many allocators (e.g., malloc implementations) use mmap for some of their allocations. I will gloss over the details of pages for this post, since we don’t need to know what they are for now.

For this example, we’ll create a buffer of 32 bytes, and write the "foo" string to it, before printing. This is a very simple example, but it’s just a way to demonstrate how mmap works. Behold!

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main() {
  void *addr = mmap(NULL, 32, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); // (1)
  if (addr == MAP_FAILED) { // (2)
    perror("mmap");
    exit(EXIT_FAILURE);
  }

  char *bytes = addr;

  bytes[0] = 'f';
  bytes[1] = 'o';
  bytes[2] = 'o';
  bytes[3] = '\0';

  printf("%s\n", bytes); // (3)

  if (munmap(addr, 32) == -1) { // (4)
    perror("munmap");
    exit(EXIT_FAILURE);
  }

  return 0;
}

First, for (1), we create the mapping; since we do not care where the allocation is in our address space[1], we’ll let the system choose the address for us by passing NULL as the first parameter, and it’ll return the address it chose. We want 32 bytes, so we pass 32 as the second parameter. Since we want to be able to read and write to the memory, we pass PROT_READ | PROT_WRITE as the third parameter. And since this is a private allocation with no backing file, we’ll pass MAP_ANON as the fourth parameter, combined with MAP_PRIVATE to make it private. With no backing file, we’ll pass -1 as the fifth parameter and 0 as the sixth parameter.

If the allocation failed for some reason, in (2), mmap will have returned MAP_FAILED[2]. If this happens, we should handle the error by printing an error message and exiting the program. (Note that perror reads errno and prints a message based on its value.)

We’ll then put the string in the buffer - individually assigning the characters to not overcomplicate things - and use printf to print the string.

Finally, we unmap the memory. After that point, any memory accesses to the mapped region will be invalid and result in undefined behavior. Fun!

$ cc -Wall -Wextra private.c -o private
$ ./private
foo

And it works as expected!

Creating a Shared Mapping for a File

Now, we’re going to use mmap to create a shared mapping for a file. This is a bit more involved, but it’s the same basic concept:

#define _GNU_SOURCE

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <sys/stat.h>

int main() {
  int fd = open("example.txt", O_RDWR | O_CREAT, 0644); // (1)
  if(fd == -1) {
    perror("open");
    exit(EXIT_FAILURE);
  }

  if(fallocate(fd, 0, 0, 4) == -1) { // (2)
    perror("fallocate");
    exit(EXIT_FAILURE);
  }

  struct stat st;
  if(fstat(fd, &st) == -1) {
    perror("fstat");
    exit(EXIT_FAILURE);
  }

  void *addr = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); // (3)
  if (addr == MAP_FAILED) {
    perror("mmap");
    exit(EXIT_FAILURE);
  }

  char *bytes = addr;

  bytes[0] = 'f';
  bytes[1] = 'o';
  bytes[2] = 'o';
  bytes[3] = '\0';

  printf("%s\n", bytes);

  // Unmap the same length that was mapped, not a hard-coded size.
  if (munmap(addr, st.st_size) == -1) {
    perror("munmap");
    exit(EXIT_FAILURE);
  }

  return 0;
}

It’s very similar to what we were previously doing, but there are a few changes:

  1. We’re now opening the file. We need to ensure that the file exists; so if it doesn’t, we’ll need to create it.
  2. However, if we just created the file, it may be empty; in order to write 4 bytes to it, we need to ensure that the file is at least 4 bytes long, and this call to fallocate ensures that. If the file is already longer, this call does not change its size. If we don’t do this, one of two things happens:
    • If the file has nothing in it, i.e., is size 0, the call to mmap will fail with EINVAL. (Since Linux 2.6.12, the manual says. Neat!)
    • If the file does not have enough space in it, we’ll get undefined behavior when we try to write past the end of the file[3].
  3. Then, we perform the mmap. We’ll map the entire file, from start to end, into memory; as a shared mapping. If we do not use MAP_SHARED here, any modifications we make to the mapped memory will not be reflected in the file.
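Mappings are made in whole pages, which is why a write slightly past the end of a small file lands in the zero-filled tail of the last page rather than in unmapped memory. Here is a minimal sketch of that arithmetic in Rust. The helper names are hypothetical, and the page size is passed in as a parameter because 4096 is only a typical value; real code should query it at runtime.

```rust
/// Rounds `len` up to the next multiple of `page_size`.
/// Assumption: `page_size` is a non-zero power of two (4096 is typical).
pub fn round_up_to_page(len: usize, page_size: usize) -> usize {
    debug_assert!(page_size.is_power_of_two());
    (len + page_size - 1) & !(page_size - 1)
}

/// Number of bytes in the final mapped page that lie beyond the end of the
/// file. Writes there are not carried through to the file.
pub fn tail_padding(file_len: usize, page_size: usize) -> usize {
    round_up_to_page(file_len, page_size) - file_len
}
```

For the 4-byte example.txt and a 4096-byte page, round_up_to_page(4, 4096) is 4096, so 4092 bytes of the mapped page are not backed by the file.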

At the end, this is what we get:

$ cc shared.c -o shared
$ ./shared
foo
$ hexdump -C example.txt
00000000  66 6f 6f 00                                       |foo.|
00000004

Cool. Now we’ve got a solid idea of what we’re working with.

Let’s Try This in Rust

Let’s try to bring this over to Rust. There already exist multiple crates that handle interacting with mmap, but we won’t use them here yet because that would be spoilers.

We’ll use libc. libc contains libc::mmap (and libc::mmap64, for 64-bit file offsets, which we don’t need right now). Let’s take a look at the signature for libc::mmap:

pub unsafe extern "C" fn mmap(
    addr: *mut c_void,
    len: size_t,
    prot: c_int,
    flags: c_int,
    fd: c_int,
    offset: off_t,
) -> *mut c_void;

This makes sense; all of these match the arguments we need to pass to mmap, and they all have the same types as the corresponding arguments in the C function.

We’ll create a very small wrapper around it to handle the error checking and returning a Result:

#![allow(unsafe_code)]

use libc::{c_int, c_void, off_t, size_t};
use std::ptr::NonNull;

pub unsafe fn mmap(
    addr: *mut c_void,
    len: size_t,
    prot: c_int,
    flags: c_int,
    fd: c_int,
    offset: off_t,
) -> Result<NonNull<c_void>, std::io::Error> {
    let addr = libc::mmap(addr, len, prot, flags, fd, offset);
    if addr == libc::MAP_FAILED {
        Err(std::io::Error::last_os_error())
    } else {
        Ok(NonNull::new_unchecked(addr))
    }
}

pub unsafe fn munmap(addr: NonNull<c_void>, len: size_t) -> Result<(), std::io::Error> {
    let result = libc::munmap(addr.as_ptr(), len);
    if result == -1 {
        Err(std::io::Error::last_os_error())
    } else {
        Ok(())
    }
}

io::Error::last_os_error() reads the last value of errno and creates a new io::Error from it, which is perfect for our needs. Note that we are always going to pass std::ptr::null_mut() as the first argument to mmap; doing anything else would complicate things.
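The "return -1 and set errno" convention shows up in most of the libc calls in this post, so the check can be factored out. This is a minimal sketch that assumes a hypothetical helper named check_minus_one, which is not part of the sys module above. It does not cover mmap itself, which signals failure with MAP_FAILED rather than -1.

```rust
use std::io;

/// Hypothetical helper: turns a C-style return value into an `io::Result`.
/// Many libc calls return -1 on failure and store the reason in errno.
pub fn check_minus_one(ret: i32) -> io::Result<i32> {
    if ret == -1 {
        // This must run right after the failing call, before any other
        // call has a chance to overwrite errno.
        Err(io::Error::last_os_error())
    } else {
        Ok(ret)
    }
}
```

With it, a wrapper body could shrink to something like check_minus_one(unsafe { libc::munmap(ptr, len) }).map(|_| ()), at the cost of one more layer to read through.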

Anonymous Private Mapping in Rust

Let’s keep things simple and just… copy over what we did for the first anonymous private mapping example:

mod sys;

use std::ffi::CStr;

#[allow(unsafe_code)]
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let map = unsafe { // (1)
        self::sys::mmap(
            std::ptr::null_mut(),
            32,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_ANON | libc::MAP_PRIVATE,
            -1,
            0,
        )
    }?;

    {
        let bytes_ptr = map.as_ptr().cast::<u8>(); // (2)
        let slice_ptr = std::ptr::slice_from_raw_parts_mut(bytes_ptr, 32); // (3)
        let slice = unsafe { &mut *slice_ptr };
        slice[0..4].copy_from_slice(b"foo\0"); // (4)
        let string = CStr::from_bytes_until_nul(&slice[..])?.to_str()?; // (5)
        println!("{string}");
    }

    unsafe { self::sys::munmap(map, 32) }?; // (6)
    Ok(())
}

This… is already intimidating. But we’ll step through it.

One of the things that I like to do is document everything - for every use of unsafe - what makes it unsafe, and why we can use it safely. Let’s start from the top.

At (1), we’re calling mmap. We haven’t yet introduced references, so we don’t have to worry about any invariants with those. Holding pointers isn’t unsafe; calling std::ptr::null_mut() isn’t unsafe, as long as we don’t dereference the result. We know that when mmap returns successfully, it returns an allocation, starting at the address returned, and continuing for the specified length in bytes. mmap cannot return a NULL pointer, and the API we wrote captures that; the pointer uses std::ptr::NonNull (which is a fancy *mut T with features). However, for reasons we’ll get into later, we do need the allocation to be both readable and writable[4].
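The wrapper leans on NonNull::new_unchecked, which trusts that a successful mmap never yields NULL. A more defensive variant could check both conditions. This is only a sketch: MAP_FAILED is redefined locally as (void *) -1, which is an assumption that holds on Linux and other common Unix platforms, and classify_mmap_result is a hypothetical name.

```rust
use std::ffi::c_void;
use std::ptr::NonNull;

/// Assumption: MAP_FAILED is `(void *) -1`, as on Linux. Real code would
/// import `libc::MAP_FAILED` instead of redefining it.
pub const MAP_FAILED: *mut c_void = !0usize as *mut c_void;

/// Hypothetical helper: treats both MAP_FAILED and NULL as "no mapping".
pub fn classify_mmap_result(ret: *mut c_void) -> Option<NonNull<c_void>> {
    if ret == MAP_FAILED {
        None
    } else {
        // `NonNull::new` returns `None` for a null pointer, so no unchecked
        // constructor is needed.
        NonNull::new(ret)
    }
}
```

The extra null check costs one comparison and removes one proof obligation from the caller.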

Surprisingly, the cast at (2) can be dangerous! mmap returns *mut c_void, so we need to cast that into something usable. Here, we’re just casting it into *mut u8. The cast isn’t marked as unsafe because we’re not yet dereferencing the pointer; only the dereference would be unsafe. Consider instead if we were to cast it into a *mut bool, for example. bool only has two valid bit patterns - it is “undefined behavior for an object with the boolean type to have any other bit pattern.” However, we don’t have any guarantee what the underlying value would be[5]! If we cast it into a *mut bool and then try to read the first element, that could potentially be undefined behavior - unsafe!
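One way to sidestep the invalid-bit-pattern problem is to read the memory as u8, for which every bit pattern is valid, and only then decide whether it is a legal bool. A minimal sketch, with hypothetical helper names:

```rust
/// Hypothetical helper: interprets a raw byte as a `bool` only when the bit
/// pattern is one of the two valid ones.
pub fn byte_to_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        // Any other pattern is not a valid `bool`, so refuse to convert.
        _ => None,
    }
}

/// Reads one byte behind `ptr` as a `u8` and validates it before producing
/// a `bool`.
///
/// # Safety
/// `ptr` must be non-null and valid for reading one byte.
pub unsafe fn read_bool_checked(ptr: *const u8) -> Option<bool> {
    byte_to_bool(unsafe { ptr.read() })
}
```

The unsafe part is reduced to "is this pointer readable?"; the question of whether the value is a valid bool is answered in safe code.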

Another thing to be concerned about here is alignment. Certain types have alignment requirements, where the address of a value of the type needs to be a multiple of the alignment. For example, on most 64-bit targets u64 requires an alignment of 8 bytes, meaning that the address of a u64 must be a multiple of 8. In our case, mmap returns a page-aligned address, which means that the address is a multiple of the page size (typically 4096 bytes).
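The alignment rule can be checked with plain integer arithmetic on the address. A small sketch, assuming a hypothetical helper named is_aligned_for; it asks the compiler for the alignment rather than hard-coding 8, since the exact value depends on the target.

```rust
use std::mem::align_of;

/// Hypothetical helper: returns true when the address in `ptr` is a
/// multiple of the alignment that `T` requires on the current target.
pub fn is_aligned_for<T>(ptr: *const u8) -> bool {
    (ptr as usize) % align_of::<T>() == 0
}
```

Any page-aligned address passes this check for every primitive type, while the address one byte after a u64 fails it for u64 on targets where u64 has an alignment greater than 1.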

Finally, this is where the magic happens. At (3), we convert the raw pointer into a pointer to a slice. Recall that in Rust, a pointer to a slice is actually a fat pointer, which contains both the pointer to the data and the length of the pointed-to data. Creating a slice pointer from our pointer is not unsafe, because we haven’t done anything with it yet!
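The "fat pointer" claim is easy to observe by comparing pointer sizes. A short sketch follows; the function names are made up for illustration, and the two-word layout is what current compilers produce rather than a documented guarantee.

```rust
use std::mem::size_of;

/// Builds a raw slice pointer over `len` bytes starting at `data`.
/// This is safe: nothing is dereferenced here.
pub fn raw_byte_slice(data: *mut u8, len: usize) -> *mut [u8] {
    std::ptr::slice_from_raw_parts_mut(data, len)
}

/// Returns (size of a thin pointer, size of a slice pointer) in bytes.
/// The slice pointer carries the length next to the address.
pub fn pointer_widths() -> (usize, usize) {
    (size_of::<*mut u8>(), size_of::<*mut [u8]>())
}
```

On a 64-bit target, pointer_widths() returns 8 and 16: one word for the address, one for the length.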

It becomes unsafe the moment we convert our pointer to a reference, as we now have to abide by the rules of references. Let’s go through a few of them, and keep in mind this is by no means an exhaustive list:

  1. It must be aligned. u8 has an alignment of 1 byte, so it is always aligned; even still, mmap returns a page-aligned address, which means that any value that has an alignment smaller than a page size is guaranteed to be aligned.
  2. It must be non-null. We know this to be true, because mmap does not return null pointers.
  3. It must be dereferenceable for the type it points to. The documentation states that if size_of_val(t) > 0, then t must be dereferenceable for size_of_val(t) bytes. Here, size_of_val(t) would correspond to the number of elements in the slice, times the size of the element in the slice; len * size_of::<u8>(). Since the allocation was 32 bytes, and size_of::<u8>() == 1, and len is 32, the slice is dereferenceable for 32 bytes, satisfying this constraint.
  4. The pointer must point to a valid value of the type it points to. In our case, the pointer points to a u8, so it must point to a valid u8. Thankfully, for u8, there are no invalid bit patterns, so this constraint is satisfied.
  5. Aliasing. Rust has aliasing rules; while a &mut reference exists, no other reference to that data can exist; and while a & reference exists, no other mutable reference to that data can exist. Here, we’re creating the first (mutable) reference to the data; we had not created any other references to that data.
  6. Lifetimes. Rust has rules around lifetimes, but since we’re freely casting into a reference, we have to constrain the lifetime. The reference cannot outlive the allocation. In this case, the reference to map cannot live past point (6), when the allocation is de-allocated. We’ve ensured that by enclosing the reference in a scope that ends before point (6).
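Rules 5 and 6 are the ones that are easiest to break by accident, and an owning wrapper lets the borrow checker enforce them. The sketch below is not the post's code: Region is a hypothetical type, and std::alloc stands in for mmap/munmap so that the example needs no external crate. The shape would be the same with a mapping.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ptr::NonNull;

/// Hypothetical owner of a zeroed byte region. `std::alloc` stands in for
/// mmap/munmap here; this is an assumption made to keep the sketch small.
pub struct Region {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl Region {
    /// Allocates `len` zeroed bytes. Returns `None` for `len == 0`, an
    /// invalid layout, or an allocation failure.
    pub fn new(len: usize) -> Option<Region> {
        if len == 0 {
            return None;
        }
        let layout = Layout::array::<u8>(len).ok()?;
        // SAFETY: `layout` has a non-zero size.
        let raw = unsafe { alloc_zeroed(layout) };
        NonNull::new(raw).map(|ptr| Region { ptr, layout })
    }

    /// Borrowing `self` mutably ties the slice's lifetime to the region and
    /// rules out a second live reference to the same bytes.
    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: the pointer is non-null, aligned for `u8`, valid for
        // `layout.size()` bytes, zero-initialized, and uniquely owned.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for Region {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `alloc_zeroed` with this exact layout.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}
```

Because the slice borrows from the Region, code that tries to use the slice after the Region is dropped fails to compile, which replaces the hand-written scope with a compiler-checked one.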

That covers our bases! It should be safe to cast this pointer to a reference. We’ll then copy the data into that slice at (4), then at (5) we’ll convert our “C-string” (nul-terminated) to a &str. While we could unsafely cast the pointer to a &str, we’ll instead use the CStr type from the standard library to safely convert the pointer to a &str.
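The safe conversion at (5) can be isolated into a function that works on any byte slice. A minimal sketch, assuming a hypothetical helper named c_string_prefix that folds both failure modes (no NUL byte, invalid UTF-8) into None:

```rust
use std::ffi::CStr;

/// Hypothetical helper: returns the text before the first NUL byte in `buf`.
/// Returns `None` if `buf` contains no NUL or the bytes are not valid UTF-8.
pub fn c_string_prefix(buf: &[u8]) -> Option<&str> {
    CStr::from_bytes_until_nul(buf).ok()?.to_str().ok()
}
```

Given a buffer that starts with b"foo\0", this returns Some("foo") no matter what follows the terminator.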

Ok, so far, so good.

Shared File Mapping in Rust

Since the anonymous, private mapping went so well, this should be easy, right? Unfortunately, Rust’s standard library has no wrapper for fallocate. It can report a file’s size through File::metadata, but to mirror the C version we’ll call fstat directly, so we’ll add both to the sys module:

// ...previous functions

use std::os::fd::AsRawFd;

pub fn fstat<F: AsRawFd>(file: F) -> Result<libc::stat, std::io::Error> {
    let mut buf = std::mem::MaybeUninit::zeroed();
    let result = unsafe {
        libc::fstat(file.as_raw_fd(), buf.as_mut_ptr())
    };
    if result == -1 {
        Err(std::io::Error::last_os_error())
    } else {
        Ok(unsafe { buf.assume_init() })
    }
}

pub fn fallocate<F: AsRawFd>(
    file: F,
    mode: c_int,
    offset: off_t,
    len: off_t,
) -> Result<(), std::io::Error> {
    let result = unsafe {
        libc::fallocate(file.as_raw_fd(), mode, offset, len)
    };
    if result == -1 {
        Err(std::io::Error::last_os_error())
    } else {
        Ok(())
    }
}

Annoying, but there are crates out there that do this sort of thing for us; we’re avoiding them for the time being.

Now onto the actual display:

mod sys;

use std::ffi::CStr;
use std::os::fd::AsRawFd as _;

#[allow(unsafe_code)]
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = std::fs::OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open("example.txt")?; // (1)

    self::sys::fallocate(file.as_raw_fd(), 0, 0, 32)?; // (2)
    let stat = self::sys::fstat(file.as_raw_fd())?; // (3)

    let map = unsafe {
        // (4)
        self::sys::mmap(
            std::ptr::null_mut(),
            usize::try_from(stat.st_size)?,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_SHARED,
            file.as_raw_fd(),
            0,
        )
    }?;

    {
        let bytes_ptr = map.as_ptr().cast::<u8>();
        let slice_ptr = std::ptr::slice_from_raw_parts_mut(bytes_ptr, 32);
        let slice = unsafe { &mut *slice_ptr }; // (5)
        slice[0..4].copy_from_slice(b"foo\0");
        let string = CStr::from_bytes_until_nul(&slice[..])?.to_str()?;
        println!("{string}");
    }

    unsafe { self::sys::munmap(map, 32) }?; // (6)
    Ok(())
}

This is very similar to the previous instance. We open the file at (1), fallocate on (2), load the file size at (3), mmap at (4), mutate at (5), and unmap at (6).

However, this version holds a critical safety issue that the previous variant didn’t. Can you spot it?


The issue in this version is libc::MAP_SHARED. If we read the documentation for MAP_SHARED again:

Share this mapping. Updates to the mapping are visible to other processes mapping the same region, and (in the case of file-backed mappings) are carried through to the underlying file.

In other words, if process A updates its mapping of the file, the update should be visible to process B. Put differently, process A is able to mutate an object in process B’s memory.

Conceptually, we can think of this as the kernel holding a mutable reference to the memory returned by mmap[6]. While the kernel holds a mutable reference, we cannot create any sort of reference to any part of the memory region[^atomic-memory] - which leaves us with no way to read/write to that memory region in safe Rust!

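It’s not just MAP_SHARED, either; a file-backed MAP_PRIVATE mapping leads to the same problem. As quoted earlier, it is unspecified whether changes made to the file after the mmap call are visible in the mapped region, so another process writing to the file can still change the memory underneath us.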

It’s a non-starter.
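If references are off the table, what remains is raw-pointer access. The sketch below copies bytes in and out with read_volatile and write_volatile and never forms a reference to the shared region. The function names are hypothetical, and this is not a claim that the approach is sound for cross-process sharing: volatile access only stops the compiler from caching or eliding the accesses, and a concurrent writer can still produce torn results.

```rust
use std::ptr;

/// Copies `out.len()` bytes from `src`, one volatile byte at a time, without
/// creating a reference to the source region.
///
/// # Safety
/// `src` must be valid for reads of `out.len()` bytes.
pub unsafe fn volatile_copy_out(src: *const u8, out: &mut [u8]) {
    for (i, slot) in out.iter_mut().enumerate() {
        // Each read is performed as written. Another writer can still change
        // bytes between reads, so the copy may mix old and new data.
        *slot = unsafe { ptr::read_volatile(src.add(i)) };
    }
}

/// Writes `data` to `dst`, one volatile byte at a time.
///
/// # Safety
/// `dst` must be valid for writes of `data.len()` bytes.
pub unsafe fn volatile_copy_in(dst: *mut u8, data: &[u8]) {
    for (i, byte) in data.iter().enumerate() {
        unsafe { ptr::write_volatile(dst.add(i), *byte) };
    }
}
```

Copying into a private buffer first means the CStr parsing from earlier runs on memory that nobody else can change, at the cost of an extra copy.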

What Now

Unfortunately, there’s not much of a way around it. If you add invariants around how the file is accessed - use OS abstractions to prevent concurrent file access, and file permissions, etc. - you can try to convince yourself that it will be fine. But Linux doesn’t have mandatory file locking, and many different concurrency issues could make it easy to trip yourself up. It’s possible to use it as an interprocess communication mechanism, but there are plenty of other forms of interprocess communication that don’t involve so many footguns (UNIX sockets, named/unnamed pipes, TCP/IP sockets)…

Plus, using mmap to map files into memory hides errors that occur when reading or writing to the file - and with very little guarantee that what you’re writing will make it to disk[7]!

Maybe next time I’ll write about fsync.

  1. You probably shouldn’t be messing with this unless you know what you’re doing. While the first parameter does allow you to specify where you want the mapping to be, it’s generally better to let the system choose the address for you - lest you accidentally clobber something else in your address space. And there’s no real good way to determine where everything is in your address space in a way that prevents race conditions. 

  2. Note that mmap doesn’t have to fail only because the OS ran out of memory - it’s entirely possible that even when mmap returns, the OS hasn’t even actually allocated the memory yet. (And when you first access the map, it may trigger an OOM error.) It can fail because too much memory has been mmaped, or you passed invalid data (I love the man docs on this - it says it can return EINVAL if “we don’t like addr, length, or offset”). 

  3. While this is undefined behavior, it may not actually cause a SIGSEGV or SIGBUS. The manual states that we would get a SIGBUS if we attempt to access a page of the buffer that lies beyond the end of the file, but since our file is so small, it fits in one page. In the Notes section, it specifically states “the remaining bytes in the partial page at the end of the mapping are zeroed when mapped, and modifications to that region are not written out to the file.” 

  4. Something that I haven’t been able to determine is whether or not it does actually need to be writable to construct a &mut reference to it; i.e., whether or not PROT_WRITE is required to hold a &mut reference to it. I suspect, given the declared safety constraints, that it is not actually required - &mut requires that (a) it is dereferenceable, and (b) no other reference to the allocation exists. Since, with PROT_READ, we can read from the &mut, I could see the argument that (a) is satisfied, even if you can’t write to it. This ties back into the idea that &mut is really just a way to express an exclusive hold on the memory region, rather than actually mutating the memory region. But since nothing I’ve read clearly indicated it, I’ll refrain from making the assumption. 

  5. Ok, technically, mmap by default clears uninitialized pages - according to the man page, Linux 2.6.33 added the MAP_UNINITIALIZED flag, which causes mmap to not clear the page. This is only respected if the Linux kernel is configured to allow it. Let’s say someone mmaps a random anonymous page, and the kernel sets up a region of RAM to back that page. If the kernel did not clear the page, then the process could read whatever was left in that region - even if that region had previously held secrets that shouldn’t be shared! Thus, the kernel is often configured not to allow it. The MAP_UNINITIALIZED entry doesn’t say what “clearing” means, but the MAP_ANONYMOUS entry says the mapping’s contents are initialized to zero. …which would be a valid bit pattern for a Rust bool. Lucky! 

  6. This… is a little more unclear. The mmap crate, archived in 2022, bypasses this issue by washing its hands of it; it returns a raw pointer for you to deal with. memmap/memmap2 makes creating a file-backed memory map unsafe, but doesn’t say what about it is unsafe. rustix gives you a raw pointer. However, it is a fact that the contents of the memory region can change due to out-of-process (or even in-process) memory accesses, as enabled by the kernel. Assumptions that the memory can’t change if we don’t change it aren’t valid. read_volatile/write_volatile may help (volatile prevents the compiler from optimizing access to memory, especially since the “memory” can change from under us), but it won’t help prevent read/write tears when another process (or our process) changes the region in the file while we’re reading/writing to it. (Unless, of course, you’re doing an atomic read/write). Regardless, it’s unsafe. 

  [^atomic-memory]: Again, this may be fine for certain types - for example, if your target platform guarantees atomic read/writes, then you could cast the pointer to e.g. &[AtomicU8], and load each byte atomically… this can be used to communicate across multiple threads using atomic load/store instructions, but those already have an expectation of interior mutability - which is the issue at hand. 

  7. Unless you use msync(2), but again, this hides any number of I/O errors while trying to persist the changes to disk, and you’ll be none the wiser.