~clay

merely my musings

Getting past ptrace()


The holidays have given me a chance to relax and geek around with Mac OS X, and I’ve finally gotten around to installing the Developer Tools package, which includes the GNU C compiler (gcc) and the GNU debugger (gdb). Over the years I’ve gotten pretty comfortable using gdb to troubleshoot programs on SPARC and Intel platforms. Debugging requires that you know a bit about assembly language, and I had learned x86 assembly back in the day when I was coding fun little graphics toys for DOS, and had learned some SPARC assembly trying to, uhm, correct an annoying license issue in a piece of commercial software.

During the holiday break I figured I’d learn some PowerPC (PPC) assembly while I still had the chance, given Apple’s decision to move to x86 early next year. Debugging simple programs isn’t much fun, though, so I figured I’d start poking around with a big application. One annoying thing kept happening every time I fired up the app under the debugger: it exited immediately with a strange error code:

% gdb /Applications/blah.app/Contents/MacOS/blah
(gdb) run
Program exited with code 055.

Bummer. I remembered the same problem happening with that commercial Solaris app years before, but I never paid much attention to it back then, because I could work around it by attaching to the program after it was already up and running. Apple seems to be a bit smarter than that, though: whenever I attached to a running copy of the application, GDB itself seg faulted:

(gdb) attach 17813
Attaching to program: `/Applications/blah.app/Contents/MacOS/blah', process 17813.
Segmentation fault

Since I wanted to know why the application was exiting, I figured I’d step through it one instruction at a time until I found the culprit.

In simple programs it’s usually sufficient to set a breakpoint on the main() function, but that skips over the C run-time start-up code. Since I wanted to step through each and every instruction in the program, including the start-up code, I used Apple’s otool command to disassemble the application’s text section, which gave me the address of the first instruction. I’ll demonstrate with /bin/ls:

[satellite:~] clay% otool -tvV /bin/ls | head -5
/bin/ls:
(__TEXT,__text) section
00001ac4 or r26,r1,r1
00001ac8 addi r1,r1,0xfffc
00001acc rlwinm r1,r1,0,0,26
00001ad0 li r0,0x0

This shows that the first instruction in /bin/ls is located at address 0x1ac4, so I set a breakpoint on that address and started running the program. Again, using /bin/ls to demonstrate:

(gdb) break *0x1ac4
Breakpoint 1 at 0x1ac4
(gdb) run
Starting program: /bin/ls

Breakpoint 1, 0x00001ac4 in ?? ()

Now it helps to see the instruction that’s about to be executed, so I set up a display that prints the instruction at $pc (the program counter) every time the program stops:

(gdb) display/i $pc
1: x/i $pc 0x1ac4: mr r26,r1

At first glance it may seem that GDB and otool print different instructions at address 0x1ac4:

otool:   or r26,r1,r1
GDB:     mr r26,r1

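GDB helpfully prints extended mnemonics like mr (move register), which are easier to read than the literal instructions otool prints, but the two are the same instruction: mr r26,r1 is just shorthand for or r26,r1,r1, since ORing r1 with itself and storing the result in r26 simply copies r1 into r26.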
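To make the equivalence concrete, here is a small C sketch that decodes a PowerPC or instruction and prints it either literally (the way otool shows it) or with the mr shorthand (the way GDB shows it). It assumes the standard X-form layout for or (primary opcode 31, extended opcode 444). The function name format_or is hypothetical, and the instruction word 0x7C3A0B78 used below is my own hand encoding of or r26,r1,r1, since otool -tvV doesn’t show raw bytes. This is not how either tool is actually implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* PowerPC numbers bits from the most significant end, so the 6-bit
   primary opcode (bits 0-5) sits at the top of the 32-bit word. */
#define PPC_OPCD(w) (((w) >> 26) & 0x3fu)  /* primary opcode          */
#define PPC_RS(w)   (((w) >> 21) & 0x1fu)  /* first source register   */
#define PPC_RA(w)   (((w) >> 16) & 0x1fu)  /* destination register    */
#define PPC_RB(w)   (((w) >> 11) & 0x1fu)  /* second source register  */
#define PPC_XO(w)   (((w) >> 1) & 0x3ffu)  /* extended opcode         */
#define PPC_RC(w)   ((w) & 1u)             /* record bit ("or." form) */

/* Formats the PowerPC "or rA,rS,rB" instruction in w into buf.
   If use_mr is nonzero and rS == rB, the extended mnemonic
   "mr rA,rS" is printed instead, which is how GDB displays it.
   Returns 0 on success, or -1 if w is not an "or" instruction. */
int format_or(uint32_t w, char *buf, size_t len, int use_mr)
{
    assert(buf != NULL && len > 0);
    if (PPC_OPCD(w) != 31u || PPC_XO(w) != 444u)
        return -1;

    unsigned ra = (unsigned)PPC_RA(w);
    unsigned rs = (unsigned)PPC_RS(w);
    unsigned rb = (unsigned)PPC_RB(w);
    const char *dot = PPC_RC(w) ? "." : "";

    if (use_mr && rs == rb)
        snprintf(buf, len, "mr%s r%u,r%u", dot, ra, rs);
    else
        snprintf(buf, len, "or%s r%u,r%u,r%u", dot, ra, rs, rb);
    return 0;
}
```

Calling format_or(0x7C3A0B78, buf, sizeof buf, 0) produces or r26,r1,r1, and passing 1 as the last argument produces mr r26,r1. A word that isn’t an or, such as 0x38000000 (li r0,0x0, the fourth instruction in the otool listing), returns -1.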

Now the plan was to step through the program one instruction at a time with stepi until I found the offending bit of code. This was tedious, so I ended up using nexti quite a bit, which steps over function calls instead of stepping into them and speeds things up. The only problem was that I would sometimes step right over the function that made the program exit. Whenever that happened, I added a breakpoint on the address of that function call, restarted the program, and used stepi to step into the offending function. After several rounds of this (I think I was up to 11 breakpoints), I finally found where the program was exiting:

0x90054204 in ptrace ()
1: x/i $pc 0x90054204 <ptrace +36>: sc
(gdb) stepi
Program exited with code 055.

“sc” is the PPC instruction that invokes a system call. Since GDB is nice enough to print symbols when it can, I decided to see if ptrace is a documented system call in Darwin. Sure enough, it is:


PTRACE(2) BSD System Calls Manual PTRACE(2)

NAME
ptrace -- process tracing and debugging

SYNOPSIS
#include <sys/types.h>
#include <sys/ptrace.h>

int
ptrace(int request, pid_t pid, caddr_t addr, int data);

DESCRIPTION
ptrace() provides tracing and debugging facilities.

Well, that sounds interesting: I was having trouble debugging the process, and the process was calling a system call that provides debugging facilities. The man page describes how a debugger (like GDB) might use ptrace() to control another process, but that wasn’t what I was interested in. I was looking for a way a debugged program might be able to control the debugger. Reading on, the man page hinted that this might be possible:


[...] except for one special case noted below, all ptrace() calls are made by the tracing process [...]

Hmm, does that one special case allow a traced process to invoke ptrace() to control a debugger? Sure enough, it does:


PT_DENY_ATTACH

This request is the other operation used by the traced process; it allows a process that is not currently being traced to deny future traces by its parent. All other arguments are ignored. If the process is currently being traced, it will exit with the exit status of ENOTSUP; otherwise, it sets a flag that denies future traces. An attempt by the parent to trace a process which has set this flag will result in a segmentation violation in the parent.
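For the curious, a program only needs a single call to make this request. The sketch below is my guess at the general shape, not code recovered from the app: the wrapper name deny_debugger_attach is hypothetical, and the non-Apple branch exists only so the file compiles on systems that don’t define PT_DENY_ATTACH.

```c
#include <assert.h>
#include <errno.h>

#ifdef __APPLE__
#include <sys/types.h>
#include <sys/ptrace.h>
#endif

/* Asks the kernel to refuse any future attempt to trace this process.
   Returns 0 once the deny flag is set.  Per the man page, a process that
   is already being traced never gets a return value: it exits with
   status ENOTSUP instead.  Where PT_DENY_ATTACH doesn't exist, this
   sketch simply fails with ENOSYS. */
int deny_debugger_attach(void)
{
#if defined(__APPLE__) && defined(PT_DENY_ATTACH)
    /* The pid, addr and data arguments are ignored for this request. */
    return ptrace(PT_DENY_ATTACH, 0, 0, 0);
#else
    errno = ENOSYS;
    return -1;
#endif
}
```

Going by the man page, calling something like this early in main() would be enough to produce both symptoms I ran into: the immediate exit when launched under GDB and the crash when GDB attaches later.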

Bingo! This explains both the seg fault and the exit immediately after startup. GDB prints exit codes in octal, so code 055 is 45 in decimal, which is the value of ENOTSUP on Darwin. Now that I knew why the program wasn’t cooperating with GDB, I could coerce it into playing nicely. First, a disassembly of the ptrace() function:

(gdb) disas
Dump of assembler code for function ptrace:
0x900541e0 <ptrace +0>: li r7,0
0x900541e4 <ptrace +4>: mflr r0
0x900541e8 <ptrace +8>: bcl- 20,4*cr7+so,0x900541ec <ptrace +12>
0x900541ec <ptrace +12>: mflr r8
0x900541f0 <ptrace +16>: mtlr r0
0x900541f4 <ptrace +20>: addis r8,r8,4091
0x900541f8 <ptrace +24>: lwz r8,7860(r8)
0x900541fc <ptrace +28>: stw r7,0(r8)
0x90054200 <ptrace +32>: li r0,26
0x90054204 <ptrace +36>: sc
0x90054208 <ptrace +40>: b 0x90054210 <ptrace +48>
0x9005420c <ptrace +44>: b 0x90054230 <ptrace +80>
0x90054210 <ptrace +48>: mflr r0
0x90054214 <ptrace +52>: bcl- 20,4*cr7+so,0x90054218 <ptrace +56>
0x90054218 <ptrace +56>: mflr r12
0x9005421c <ptrace +60>: mtlr r0
0x90054220 <ptrace +64>: addis r12,r12,4091
0x90054224 <ptrace +68>: lwz r12,7792(r12)
0x90054228 <ptrace +72>: mtctr r12
0x9005422c <ptrace +76>: bctr
0x90054230 <ptrace +80>: blr
End of assembler dump.

Now I don’t know exactly what all of that means, but I saw the “sc” instruction in the middle of the function and knew I wanted to avoid making that system call. The obvious thing to try was to set a breakpoint on function entry and then use the jump command to go straight to the last instruction in the function, the blr at 0x90054230, which returns to the caller and effectively bypasses that pesky system call:

(gdb) break ptrace
Breakpoint 11 at 0x900541f4
(gdb) run
Starting program: /Applications/blah.app/Contents/MacOS/blah

Breakpoint 11, 0x900541f4 in ptrace ()
(gdb) jump *0x90054230
Continuing at 0x90054230.

When I did that, breakpoint 11 fired again, so I did the jump again, and breakpoint 11 fired again, so, well, the short story is that I had to do the jump six times before the program continued on. Maybe the developers were paranoid and called ptrace() six times. Whatever the reason, after I finally got past all six ptrace() calls, the program started up just fine:

(gdb) jump *0x90054230
Continuing at 0x90054230.
Reading symbols for shared libraries . done

The program ran as normal under the control of the debugger. Now that I know how to get it to run under GDB, I can start poking around with some of the more interesting bits of code.

I hope everyone’s having a nice holiday, and I promise to post more often! :)

Written by clay

December 26th, 2005 at 6:38 am

Posted in Geek
