Evan A. Sultanik, Ph.D.

Evan's First Name @ Sultanik .com

Computer Security Researcher
Trail of Bits

Adjunct Professor
Drexel University College of Computing & Informatics
Department of Computer Science

Recent Content:

iPhone Toolchain on Linux

A Tutorial

I have an iPhone. I also use Google Mail as my web-based mail client. Unfortunately, there is really no good way to get push Gmail on an iPhone. Even now, post firmware 3.0, these are the best ways:
  1. Pay for a service like MobileMe.
    Problem: service fees seem like overkill, and the push E-mail would be the only benefit I’d get from it.
  2. Wrap Gmail’s IMAP service in an Exchange server. There are some paid services that do this; however, Z-Push is free (if one can host it oneself).
    Problem: the iPhone only supports a single Exchange server at a time. Therefore, I’d have to choose between getting push E-mail versus over-the-air calendar/contacts synchronization that is currently provided through Google’s own “Sync” Exchange wrapper.
  3. Write an app that uses the new Push Notification service in firmware 3.0 to remotely push mail to the phone.
    Problem: this would probably be a very lucrative solution (i.e., I’ll bet lots of people would pay a nominal one-time fee for this app), but it would almost definitely be rejected from the App Store. Furthermore, it would require me to set up a back-end server running 24/7 to push the updates.
  4. Jailbreak the phone and write a daemon that runs in the background, connects to Google’s IMAP service, and goes into IDLE mode.
    Problem: the only Apple device I own is my iPhone; how might I compile my own apps for it? (Sure, my wife does have a PowerBook, but that would be cheating, right? Right‽)
Read on to discover how I was able to set up an iPhone development toolchain on Linux.

File Drop

Computer-to-Computer File Transfer for the Masses

Continuing the recent theme of posting-random-scripts-as-blog-entries…

I recently needed a quick and dirty way to send a really large (~1 gigabyte) file to someone. We were both on the same LAN, so it didn’t really make sense for me to upload it to my externally hosted web server. I do not have a web server installed on my laptop and, at the time, it seemed like overkill to install a web server just so I could send him my file. Using a thumb drive or scp would have been an option, but each would require the recipient to be physically at my computer (despite being on the same LAN, he was a 10 minute walk away). Therefore, I gave myself a 10 minute deadline to code my own solution (plus it would be a fun diversion from writing my journal paper due later that day).

Given that I had a whole 10 minutes (an eternity when it comes to Perl hacking), I figured I might as well make my method generalized (i.e., not only should my script be able to send files, but it should also be able to receive).

First, I had to decide on a method. FTP seemed like a logical choice, but, besides really tech savvy people, who has full-blown FTP clients installed these days? In keeping with my generality goal, my solution would ideally be usable by, say, my mom. And moms don’t know ‘bout my FTP. Everyone, my mom (and my mom’s mom) included, has a web browser and knows how to use it. Therefore, good ol’ HTTP it was. And I even had a bunch of old code to hack together!

I ended up with a script that I call filedrop. Here’s the usage:

$ filedrop
Version: filedrop 0.1 2009-07-01 http://www.sultanik.com/
Copyright (C) 2009 Evan A. Sultanik

Usage: filedrop [OPTIONS] FILE_PATH

  -s           send a file by hosting it on a local web server (default)
  -r           receive a file by accepting it from a local web server.
               FILE_PATH should be a directory to which the files should be
               saved.  FILE_PATH will default to ‘.’ in this mode.
  -n, --num=N  quit after sending/receiving N files.  If N is less than zero
               the program will send/receive files until manually
               terminated.  If N is zero then the program will immediately
               quit.  Default is -1.

And here’s an example of how the file transfer went down:

LeEtH4X0r: Y0 Home Skillet! Can you fry me up some juarez‽
Me: Indubitably!
$ filedrop -s -n1 ./hugefile.tar.gz
Server running at: http://my_ip:47489/
Me: Go to http://my_ip:47489/

Here’s the code:

#!/usr/bin/perl -w

use HTTP::Daemon;
use HTTP::Status;

my $version = "0.1";
my $date = "2009-07-01";
my $copyright = "2009";

my $port = 80;

sub print_usage {
    print "Version: filedrop $version $date http://www.sultanik.com/\n";
    print "Copyright (C) $copyright Evan A. Sultanik\n\n";
    print "Usage: filedrop [OPTIONS] FILE_PATH\n\n";
    print "Options:\n";
    print "  -s           send a file by hosting it on a local web server (default)\n";
    print "  -r           receive a file by accepting it from a local web server.\n";
    print "               FILE_PATH should be a directory to which the files should be\n";
    print "               saved.  FILE_PATH will default to '.' in this mode.\n";
    print "  -n, --num=N  quit after sending/receiving N files.  If N is less than zero\n";
    print "               the program will send/receive files until manually\n";
    print "               terminated.  If N is zero then the program will immediately\n";
    print "               quit.  Default is -1.\n";
    print "\n";
}

my $mode = "s";
my $num = -1;

my $last = "";
my $nextIsN = 0;
foreach my $arg (@ARGV) {
    if($arg eq "-s") {
        $mode = "s";
    } elsif($arg eq "-r") {
        $mode = "r";
    } elsif($arg eq "-n") {
        # The count is the next argument; skip the reset at the end of the loop.
        $nextIsN = 1;
        next;
    } elsif($arg =~ /-n(\d+)/) {
        $num = $1;
    } elsif($arg =~ m/--num=(\d+)/) {
        $num = $1;
    } elsif($nextIsN) {
        $num = $arg;
    } else {
        if(!($last eq "")) {
            print_usage() && die("Invalid option: " . $last . "\n");
        }
        $last = $arg;
    }
    $nextIsN = 0;
}
if($last eq "" && $mode eq "s") {
    print_usage() && die("Path to a file to host expected!\n");
} elsif($last eq "" && $mode eq "r") {
    $last = ".";
}

my $file = $last;

exit(0) if($num == 0);

my $d = HTTP::Daemon->new(LocalPort => $port) || HTTP::Daemon->new() || die;
print "Server running at: ", $d->url, "\n";
my $servings = 0;
while(my $c = $d->accept) {
    while(my $r = $c->get_request) {
        if($mode eq "s") {
            if($r->method eq 'GET') {
                print "Someone's downloading!\n";
                $c->send_file_response($file);
                print "Download finished!\n";
                $servings++;
            } else {
                $c->send_error(RC_FORBIDDEN);
            }
        } elsif($mode eq "r") {
            if($r->method eq 'POST') {
                print "Someone is uploading!\n";
                my $url = $r->content;
                # Peel one multipart section off the front of the request body per iteration.
                while($url =~ m/.*?-+(\d+)\r\nContent-Disposition:.*? filename="([^"]+)".*?\r\n\r\n(.*?)\r\n-+\1-+(.*)$/ism){
                    my $id = $1;
                    my $filename = $2;
                    my $content = $3;
                    $url = $4;
                    my $newName = $filename;
                    my $i = 0;
                    $newName = $filename . "." . ++$i while(-e $file . "/" . $newName);
                    if($i > 0) {
                        print "A file named $filename already exists in $file!\n";
                        print "Saving to " . $file . "/" . $newName . " instead.\n";
                        $filename = $newName;
                    }
                    open(OUTFILE, ">" . $file . "/" . $filename) or die("Error opening $file/$filename for writing!\n");
                    binmode OUTFILE;
                    print OUTFILE $content;
                    close(OUTFILE);
                    print "Received $filename (ID: $id)\n";
                    $servings++;
                }
                $h = HTTP::Headers->new;
                $h->header('Content-Type' => 'text/html');
                my $msg = "<html><head><title>Uploaded</title></head><body><h1>Uploaded</h1>";
                $msg .= "<p><a href=\"/\">Click here</a> to upload another file.</p>" if($num < 0 || $servings < $num);
                $msg .= "</body></html>";
                $r = HTTP::Response->new( HTTP_OK, "", $h, $msg);
                $c->send_response($r);
            } elsif($r->method eq 'GET') {
                print "Someone connected! Sending the upload form...\n";
                $h = HTTP::Headers->new;
                $h->header('Content-Type' => 'text/html');
                $r = HTTP::Response->new( HTTP_OK, "", $h, "<html><head><title>Upload</title></head><body><form method=\"post\" enctype=\"multipart/form-data\"><p>Please specify a file, or a set of files:</p><input type=\"file\" name=\"datafile\"><input type=\"submit\" value=\"Send\"></form></body></html>");
                $c->send_response($r);
                print "Sent!\n";
            } else {
                $c->send_error(RC_FORBIDDEN);
            }
            last if($num > 0 && $servings >= $num);
        }
        last if($num > 0 && $servings >= $num);
    }
    $c->close;
    undef($c);
    last if($num > 0 && $servings >= $num);
}
close($d);
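For comparison, the send mode can be approximated with Python's standard library alone. This is only a sketch under assumptions, not a port of the script above: the names `make_handler` and `serve_file` are made up for illustration, it uses `http.server` instead of HTTP::Daemon, and it implements only the `-s` behaviour (answer GET requests with the file, then quit after `num` requests).

```python
import os
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(path):
    """Build a request handler that answers every GET with the file at `path`."""

    class FileDropHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Length", str(os.path.getsize(path)))
            self.send_header(
                "Content-Disposition",
                'attachment; filename="%s"' % os.path.basename(path),
            )
            self.end_headers()
            with open(path, "rb") as source:
                # Stream in chunks so a ~1 GB file never sits in memory at once.
                shutil.copyfileobj(source, self.wfile)

        def log_message(self, fmt, *args):
            pass  # keep the console quiet

    return FileDropHandler


def serve_file(path, num=1, port=0):
    """Serve `path` for `num` requests; port 0 lets the OS pick a free port."""
    server = HTTPServer(("", port), make_handler(path))
    print("Server running at: http://localhost:%d/" % server.server_address[1])
    for _ in range(num):
        server.handle_request()  # blocks until one request has been handled
    server.server_close()
```

Calling `serve_file("./hugefile.tar.gz", num=1)` would roughly correspond to `filedrop -s -n1 ./hugefile.tar.gz`.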

Mail Notifier

Gmail Notifications in Linux

An example of the notifier, well, notifying.

I recently caught a glimpse of how Gmail Notifier works on a friend’s Mac. It looked pretty cool. Unfortunately for me, though, there’s no reasonable facsimile in Linux. Sure, there are a couple options, but they aren’t available in Gentoo’s package management system. Given my recent experience dealing with E-mail from Perl, I figured it would be just as easy to write my own E-mail notifier as it would be to manually install these programs (along with their dependencies). I was right. I just spent the last ~20 minutes (while idling through a meeting) writing such an app. The code follows below. Its only dependency is XOSD.

Disclaimer: I blatantly cribbed some of my code from Flavio Poletti (for the MTA stuff) and Bill Luebkert (for the password input).

Future work: right now the code simply polls the mail server once every three minutes. In the future I’ll post an update that uses IMAP Idle to reduce bandwidth.

#!/usr/bin/perl -w

use Term::ReadKey;    END { ReadMode ('restore'); }    # just in case
use Mail::IMAPClient;
use IO::Socket::SSL;
use File::HomeDir;

my $username = 'youremail@domain.com';
my $sleeptime = 180; # Time between checks, in seconds.
my $conffile = File::HomeDir->my_home . "/.checkmail";


$canceled = 0;
$inwhile = 0;

sub get_passwd {
    # legal clear passwd chrs (26+26+10+24=86): "a-zA-Z0-9!#$%&()*+,-./:;<=> ?@[\]^";
    my @legal_clear = ('a'..'z', 'A'..'Z', '0'..'9', split //,
                       '!#$%&()*+,-./:;<=> ?@[\]^');
    my %legal_clear; foreach (@legal_clear) { $legal_clear{$_} = 1; }
    $| = 1;    # unbuffer stdout to force unterminated line out
    ReadMode ('cbreak');
    my $ch = '';
    while (defined ($ch = ReadKey ())) {
        last if $ch eq "\x0D" or $ch eq "\x0A";
        if ($ch eq "\x08") {    # backspace
            print "\b \b" if $passwd;    # back up 1
            chop $passwd;
            next;
        }
        if ($ch eq "\x15") {    # ^U
            print "\b \b" x length $passwd;    # back 1 for each char
            $passwd = '';
            next;
        }
        if (not exists $legal_clear{$ch}) {
            print "\n'$ch' not a legal password character\n";
            print 'Password: ';
            next;
        }
        $passwd .= $ch;
    }
    print "\n";
    ReadMode ('restore');
    return $passwd;
}

$SIG{'INT'} = 'INT_handler';

sub INT_handler {
    exit(0) if(!$inwhile);
    $canceled = 1;
    print "\nCaught Signal; exiting gracefully!\n";
}

print "Password: ";
my $password = &get_passwd();

while(!$canceled) {
    $inwhile = 1;

    my $socket = IO::Socket::SSL->new(
        PeerAddr => 'imap.gmail.com',
        PeerPort => 993,
        );
    if(!$socket) {
        # Perhaps we lost the internet connection?
        print STDERR "Warning: lost internet connection!\n";
        sleep $sleeptime;
        next;
    }
    my $greeting = <$socket>;
    my ($id, $answer) = split /\s+/, $greeting;
    die "problems logging in: $greeting" if $answer ne 'OK';

    my $client = Mail::IMAPClient->new(
        Socket   => $socket,
        User     => $username,
        Password => $password,
        Uid => 1,
        )
        or die "new(): $@";
    $client->login() or die 'login(): ' . $client->LastError();

    die("Failed authentication!\n") unless $client->IsAuthenticated();

    $client->examine('INBOX') or die "Could not examine: $@\n";
    my @msgs = $client->unseen or die "Could not search the inbox! $@\n";

    my $last_max = -2;
    if(-e $conffile) {
        # Load the old largest
        open(CONFFILE, "<" . $conffile) or die("Error opening " . $conffile . "\n");
        while(<CONFFILE>) {
            my $line = $_;
            $last_max = $1 if($line =~ /^\s*last_max_uid\s*=\s*(\d+)\s*$/i);
        }
        close(CONFFILE);
    }

    my $max = -1;
    my @over;
    for my $msg (@msgs) {
        $max = $msg if $msg > $max;
        push(@over, $msg) if $msg > $last_max;
    }

    if($max >= 0) {
        open(CONFFILE, ">" . $conffile) or die("Error opening $conffile for writing!\n");
        print CONFFILE "last_max_uid = " . $max . "\n";
        close(CONFFILE);
    }

    if($last_max >= 0) {
        open(OSDC, "| osd_cat -c green -p middle -A center -s 2 -l 5 -f \"-bitstream-bitstream vera serif-*-*-*-*-17-*-*-*-*-*-*-*\"");
        for my $m (@over) {
            my $hashref = $client->parse_headers($m, "From")
                or die "Could not parse_headers: $@\n";
            print OSDC "New mail from " . $hashref->{"From"}->[0] . "!\n";
        }
        close(OSDC);
    }

    sleep $sleeptime;
}
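The bookkeeping in the loop above (remember the largest UID seen so far, announce only unseen messages above it) is easy to lose among the IMAP calls, so here it is in isolation. This is a sketch in Python rather than Perl; the function names `load_last_max` and `new_messages` are hypothetical, and it assumes the same one-line `last_max_uid = N` format that the script writes to `~/.checkmail`.

```python
import re

# Same pattern the Perl script uses to read its config file.
LAST_MAX_RE = re.compile(r"^\s*last_max_uid\s*=\s*(\d+)\s*$", re.IGNORECASE)


def load_last_max(conf_text):
    """Return the stored last_max_uid, or -2 when nothing has been stored yet."""
    last_max = -2
    for line in conf_text.splitlines():
        match = LAST_MAX_RE.match(line)
        if match:
            last_max = int(match.group(1))
    return last_max


def new_messages(unseen_uids, last_max):
    """Return (UIDs to announce, new high-water mark or -1 if nothing is unseen)."""
    new_max = max(unseen_uids, default=-1)
    if last_max < 0:
        # First run: record the high-water mark but announce nothing,
        # mirroring the `$last_max >= 0` check in the script.
        return [], new_max
    return [uid for uid in unseen_uids if uid > last_max], new_max


announce, high_water = new_messages([40, 41, 42, 45], load_last_max("last_max_uid = 41\n"))
print(announce, high_water)  # [42, 45] 45
```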

Awaiting Death

In which I coerce processes to email me as they die.

I’ve been running a number of experiments recently that require a lot of computing time. “A lot” in this case being on the order of days. It would therefore be nice to have a script that would automatically E-mail me when my experiments finish so I know to check the results. I fully expected there to be some magic shell script out there somewhere dedicated to this very purpose: sending out an E-mail when a specified process dies. Something like this:

$ ./run_experiments&
[1] 1337
$ emailwhendone 1337
Awaiting process 1337’s death...

As far as I can tell, however, there is no such script/program. So, as usual, I took it upon myself to write my own. The E-mailing part turned out to be a bit trickier than I had expected.

I didn’t want my script to be dependent on the existence of a local mail server; therefore, I first tried using sSMTP. It turns out that sSMTP requires one to hard-code the remote SMTP server address in a .conf file, so that approach was out.

Next I tried Mail::Sendmail; however, that module’s support for authentication is poor at best. That module also doesn’t support SSL, so emailing through servers like Google Mail is out.

Therefore, I finally settled on using Net::SMTP::SSL, which unfortunately has four dependencies. Luckily for me, those dependencies are all easily installable on Gentoo:

  1. dev-perl/Authen-SASL
  2. dev-perl/IO-Socket-SSL
  3. dev-perl/Net-SSLeay
  4. dev-perl/Net-SMTP-SSL

I call my script emailwhendone because, well, that’s exactly what it does. The code follows at the end of this post.

Disclaimer: I blatantly cribbed some of my code from Robert Maldon (for the MTA stuff) and Bill Luebkert (for the password input).

The script can be given one of two parameters: either the PID of the process for which to wait or the unique name of the process (if there are multiple processes with the same name you will need to use the PID). Right now I have the recipient E-mail address hard-coded; it should be fairly self evident from the code how to customize this. Here’s an example:

$ ./run_experiments&
[1] 1337
$ emailwhendone 1337
Password for youremail@domain.com: *******************
Waiting for process 1337 (run_experiments) to finish...
The process finished!
Sending an email to youremail@domain.com...

Here’s the code:

#!/usr/bin/perl -w

use Net::SMTP::SSL;
use Term::ReadKey;    END { ReadMode ('restore'); }    # just in case

my $destination = 'youremail@domain.com';
my $server = 'smtp.domain.com';
my $port = 465;


sub usage {
    print " Usage: emailwhendone [PID|PROCESS_NAME]\n";
}

my $pid = $ARGV[0] or die &usage();
chomp(my $hostname = `hostname`);    # strip the trailing newline
my $pidmatch = -1;
my $processmatch = "";
my @pidmatches;

open PRO, "/bin/ps axo pid,comm |" or die 'Failed to open pipe to `ps`';

while(<PRO>) {
    if($_ =~ m/^\s*(\d+)\s+(.+)$/) {
        my $matchpid = $1;
        my $matchprocess = $2;
        if($matchpid eq $pid) {
            $pidmatch = $matchpid;
            $processmatch = $matchprocess;
            @pidmatches = ($matchpid);
        } elsif($pid =~ m/^\s*\Q$matchprocess\E\s*$/) {
            # \Q...\E keeps regex metacharacters in the process name literal.
            $pidmatch = $matchpid;
            push(@pidmatches, $matchpid);
            $processmatch = $matchprocess;
        }
    }
}

close PRO;

if(scalar(@pidmatches) <= 0) {
    if($pid =~ m/^\s*\d+\s*$/) {
        print "Error: no process with ID " . $pid . "!\n";
    } else {
        print "Error: no process named \"" . $pid . "\"!\n";
    }
    exit(1);
} elsif(scalar(@pidmatches) > 1) {
    print "There are multiple PIDs that match this process name!\n";
    for my $match (@pidmatches) {
        print $match . "\t" . $pid . "\n";
    }
    exit(2);
}

sub get_passwd {
    # legal clear passwd chrs (26+26+10+24=86): "a-zA-Z0-9!#$%&()*+,-./:;<=> ?@[\]^";
    my @legal_clear = ('a'..'z', 'A'..'Z', '0'..'9', split //,
                       '!#$%&()*+,-./:;<=> ?@[\]^');
    my %legal_clear; foreach (@legal_clear) { $legal_clear{$_} = 1; }
    $| = 1;    # unbuffer stdout to force unterminated line out
    ReadMode ('cbreak');
    my $ch = '';
    while (defined ($ch = ReadKey ())) {
        last if $ch eq "\x0D" or $ch eq "\x0A";
        if ($ch eq "\x08") {    # backspace
            print "\b \b" if $passwd;    # back up 1
            chop $passwd;
            next;
        }
        if ($ch eq "\x15") {    # ^U
            print "\b \b" x length $passwd;    # back 1 for each char
            $passwd = '';
            next;
        }
        if (not exists $legal_clear{$ch}) {
            print "\n'$ch' not a legal password character\n";
            print 'Password: ', "*" x length $passwd; # retype *'s
            next;
        }
        $passwd .= $ch;
        print '*';
    }
    print "\n";
    ReadMode ('restore');
    return $passwd;
}

print "Password for " . $destination . ": ";
my $password = get_passwd();

sub send_mail {
    my $subject = $_[0];
    my $body = $_[1];
    my $smtp;

    if (not $smtp = Net::SMTP::SSL->new($server,
                                        Port => $port,
                                        Debug => 0)) {
        die "Could not connect to server.\n";
    }

    $smtp->auth($destination, $password)
        || die "Authentication failed!\n";

    $smtp->mail($destination . "\n");
    $smtp->to($destination . "\n");
    $smtp->data();
    $smtp->datasend("From: " . $destination . "\n");
    $smtp->datasend("To: " . $destination . "\n");
    $smtp->datasend("Subject: " . $subject . "\n");
    $smtp->datasend("\n");    # blank line separates the headers from the body
    $smtp->datasend($body . "\n");
    $smtp->dataend();
    $smtp->quit;
}

print "Waiting for process " . $pidmatch . " (" . $processmatch . ") to finish...";

my $done = 0;
do {
    $done = 1;
    open PRO, "/bin/ps axo pid |" or die 'Failed to open pipe to `ps`';
    while(<PRO>) {
        if($_ =~ m/^\s*$pidmatch\s*$/) {
            $done = 0;
        }
    }
    close PRO;
    sleep(1) if(!$done);    # poll once a second instead of spinning
} while(!$done);

print "The process finished!\nSending an email to " . $destination . "...";

&send_mail('Process ' . $pidmatch . ' (' . $processmatch . ') on ' . $hostname . ' finished!', 'It\'s done!');

print "\n";
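The same wait-then-notify idea can be sketched in Python using only the standard library. This is an illustration under assumptions rather than a translation: the helper names (`process_alive`, `wait_for_exit`, `build_message`, `send_mail`) are hypothetical, the liveness check uses signal 0 instead of parsing `ps` output (POSIX only), and the mail goes out through `smtplib.SMTP_SSL` in place of Net::SMTP::SSL.

```python
import os
import smtplib
import time
from email.message import EmailMessage


def process_alive(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_exit(pid, poll_seconds=1.0):
    """Block until the process disappears, polling once per interval."""
    while process_alive(pid):
        time.sleep(poll_seconds)


def build_message(address, pid, hostname):
    """Compose the notification, sent from and to the same address."""
    msg = EmailMessage()
    msg["From"] = address
    msg["To"] = address
    msg["Subject"] = "Process %d on %s finished!" % (pid, hostname)
    msg.set_content("It's done!")
    return msg


def send_mail(msg, server, port, password):
    """Deliver the message over an SSL-wrapped SMTP connection."""
    with smtplib.SMTP_SSL(server, port) as smtp:
        smtp.login(msg["From"], password)
        smtp.send_message(msg)
```

One caveat with the signal-0 check: a finished child of the waiting program still counts as existing until it has been reaped, so this approach suits watching unrelated processes, which is the use case here.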

Visualizing Twitter

Journey to the Center of the Twitterverse

I’ve now been using Twitter for about six months. While Twitter’s minimalism is no doubt responsible for much of its success, I often pine for some additional social networking features. High up on that list is a simple way of representing my closest neighbors—perhaps through a visualization—without having to manually navigate individual users’ following/followers pages. A well designed representation could be useful in a number of ways:

  1. It could expose previously unknown mutual relationships (i.e., “Wow, I didn’t know X and Y knew each other!”);
  2. It could reveal mutual acquaintances whom one did not know were on Twitter; and
  3. Metrics on the social network could be aggregated (e.g., degrees of separation).
This afternoon I spent an hour or so hacking together a Python script, which I have dubbed TwitterGraph, to accomplish this. Here is an example of the ~100 people nearest to me in the network:

The code for TwitterGraph follows at the end of this post. The code depends on the simplejson module and also imagemagick. It uses the Twitter API to construct the network graph. You don’t need to have a Twitter account for this to work; it doesn’t require authentication. Each IP is, however, limited to 100 API calls per hour, unless your IP has been whitelisted. My script takes this into account. Each Twitter user requires three API calls to download their information, so one can load about 33 users per hour before reaching the rate limit. TwitterGraph saves its data, so successive runs will pick up where the previous one left off. Finally, TwitterGraph also ranks each user using the PageRank algorithm.
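The rate-limit arithmetic is simple enough to check by hand, but a tiny sketch makes the budget explicit. The function names below are hypothetical; the only inputs taken from the text are the 100-calls-per-hour limit and the three calls needed per user.

```python
def users_per_budget(api_calls, calls_per_user=3):
    """How many users can be fully downloaded with this many API calls."""
    return api_calls // calls_per_user


def hours_to_crawl(n_users, hourly_limit=100, calls_per_user=3):
    """Whole hours needed to download n_users at the given hourly limit."""
    per_hour = users_per_budget(hourly_limit, calls_per_user)
    return -(-n_users // per_hour)  # ceiling division


print(users_per_budget(100))  # 33
print(hours_to_crawl(100))    # 4
```

So a graph of roughly 100 users, like the one pictured in the post, would take about four separate hourly runs.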

Usage: paste the code below into TwitterGraph.py and run the following:

$ chmod 755 ./TwitterGraph.py
$ ./TwitterGraph.py
You have 100 API calls remaining this hour; how many would you like to use now? 80
What is the twitter username for which you’d like to build a graph? ESultanik
Building the graph for ESultanik (output will be ESultanik.dot)...
$ dot -Tps ESultanik.dot -o ESultanik.ps && epstopdf ESultanik.ps && acroread ESultanik.pdf
$ dot -Tsvgz ESultanik.dot -o ESultanik.svgz

There are also (unnecessary) command line options, the usage for which should be evident from the sourcecode.
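For readers unfamiliar with PageRank, here is a minimal sketch of the textbook power-iteration form. It is not the update rule used in the script below, which redistributes rank with its own damping scheme; the function name `pagerank`, the damping value of 0.85, and the toy graph are all assumptions made for illustration. Each user passes rank along to the people they follow.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Compute PageRank; `links` maps each node to the list of nodes it points to."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # Ignore links to nodes outside the graph we downloaded.
            targets = [t for t in links[node] if t in links]
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly over everyone.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


# Toy graph: x and y both follow hub, hub follows no one.
ranks = pagerank({"hub": [], "x": ["hub"], "y": ["hub"]})
print(max(ranks, key=ranks.get))  # hub
```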


import simplejson
import urllib2
import urllib
import getopt, sys
import re
import os

class TwitterError(Exception):
  def message(self):
    return self.args[0]

def CheckForTwitterError(data):
    if 'error' in data:
      raise TwitterError(data['error'])

def fetch_url(url):
    opener = urllib2.build_opener()
    url_data = opener.open(url).read()
    return url_data

def remaining_api_hits():
    json = fetch_url("http://twitter.com/account/rate_limit_status.json")
    data = simplejson.loads(json)
    return data['remaining_hits']

def get_user_info(id):
    global is_username
    global calls
    json = None
    calls += 1
    if is_username:
        json = fetch_url("http://twitter.com/users/show.json?screen_name=" + str(id))
    else:
        json = fetch_url("http://twitter.com/users/show.json?user_id=" + str(id))
    data = simplejson.loads(json)
    return data

def get_friends(id):
    global calls
    calls += 1
    json = fetch_url("http://twitter.com/friends/ids.json?user_id=" + str(id))
    data = simplejson.loads(json)
    return data

def get_followers(id):
    global calls
    calls += 1
    json = fetch_url("http://twitter.com/followers/ids.json?user_id=" + str(id))
    data = simplejson.loads(json)
    return data

last_status_msg = ""
def update_status(message):
    global last_status_msg
    # clear the last message
    sys.stdout.write("\r")
    p = re.compile(r"[^\s]")
    sys.stdout.write(p.sub(' ', last_status_msg))
    # write the new message over it
    sys.stdout.write("\r")
    sys.stdout.write(message)
    sys.stdout.flush()
    last_status_msg = message

def clear_status():
    global last_status_msg
    last_status_msg = ""

def save_state():
    global history
    global user_info
    global friends
    global followers
    global queue
    global username
    data = simplejson.dumps([history, user_info, friends, followers, queue])
    bakfile = open(username + ".json", "w")
    bakfile.write(data)
    bakfile.close()

def build_adjacency():
    global friends
    idxes = {}
    adj = []
    idx = 0
    for user in friends:
        idxes[user] = idx
        idx += 1
        adj.append([0.0] * len(friends))
    for user in friends:
        if len(friends[user]) <= 0:
            continue
        amount_to_give = 1.0 / len(friends[user])
        for f in friends[user]:
            if str(f) in idxes:
                adj[idxes[user]][idxes[str(f)]] = amount_to_give
    return [idxes, adj]

try:
    opts, args = getopt.getopt(sys.argv[1:], "hu:c:r", ["help", "user=", "calls=", "resume"])
except getopt.GetoptError, err:
    print err
    sys.exit(2)

max_calls = -1
username = ""
load_prev = None

for o, a in opts:
    if o in ("-h", "--help"):
        print "Usage: TwitterGraph.py [-h] [-u USER] [-c CALLS] [-r]"
        sys.exit()
    elif o in ("-u", "--user"):
        username = a
    elif o in ("-c", "--calls"):
        max_calls = int(a)
    elif o in ("-r", "--resume"):
        load_prev = True
    else:
        assert False, "unhandled option"

if max_calls != 0:
    # First, let's find out how many API calls we have left before we are rate limited:
    update_status("Contacting Twitter to see how many API calls are left on your account...")
    max_hits = remaining_api_hits()
    if max_calls < 0 or max_hits < max_calls:
        update_status("You have " + str(max_hits) + " API calls remaining this hour; how many would you like to use now? ")
        max_calls = int(raw_input())
        if max_calls > max_hits:
            max_calls = max_hits
if username == "":
    print "What is the twitter username for which you'd like to build a graph? ",
    username = re.compile(r"\n").sub("", raw_input())

update_status("Trying to open " + username + ".dot for output...")
dotfile = open(username + ".dot", "w")
print "Building the graph for " + username + " (output will be " + username + ".dot)..."

is_username = True
history = {}
queue = [username]
calls = 0
user_info = {}
friends = {}
followers = {}

# Let's see if there's any partial data...
if os.path.isfile(username + ".json"):
    print "It appears as if you have some partial data for this user."
    resume = ""
    if not load_prev:
        print "Do you want to start off from where you last finished? (y/n) ",
        resume = re.compile(r"\n").sub("", raw_input())
    if load_prev == True or resume == "y" or resume == "Y" or resume == "yes" or resume == "Yes" or resume == "YES":
        is_username = False
        bakfile = open(username + ".json", "r")
        [history, user_info, friends, followers, queue] = simplejson.loads(bakfile.read())
        bakfile.close()
        print str(len(friends)) + " friends!"
        print "Loaded " + str(len(history)) + " previous Twitterers!"
        print "The current queue size is " + str(len(queue)) + "."
    else:
        print "You are about to overwrite the partial data; are you sure? (y/n) ",
        resume = re.compile(r"\n").sub("", raw_input())
        if not (resume == "y" or resume == "Y" or resume == "yes" or resume == "Yes" or resume == "YES"):
            sys.exit(0)

while len(queue) > 0 and calls + 3 <= max_calls:
    next_user = queue.pop(0)
    # Let's just double-check that we haven't already processed this user!
    if str(next_user) in history:
        continue
    update_status(str(next_user) + "\t(? Followers,\t? Following)\tQueue Size: " + str(len(queue)))
    if next_user in user_info:
        info = user_info[next_user]
    else:
        try:
            info = get_user_info(next_user)
        except urllib2.HTTPError:
            update_status("It appears as if user " + str(next_user) + "'s account has been suspended!")
            print ""
            continue
    uid = next_user
    if is_username:
        uid = info['id']
        history[uid] = True
        is_username = False
    user_info[uid] = info
    update_status(info['screen_name'] + "\t(? Followers,\t? Following)\tQueue Size: " + str(len(queue)))
    followers[uid] = get_followers(uid)
    for i in followers[uid]:
        if str(i) not in history:
            history[i] = True
            queue.append(i)
    update_status(info['screen_name'] + "\t(" + str(len(followers[uid])) + " Followers,\t? Following)\tQueue Size: " + str(len(queue)))
    friends[uid] = get_friends(uid)
    for i in friends[uid]:
        if str(i) not in history:
            history[i] = True
            queue.append(i)
    update_status(info['screen_name'] + "\t(" + str(len(followers[uid])) + " Followers,\t" + str(len(friends[uid])) + " Following)")
    print ""

# Get some extra user info if we have any API calls remaining
# Find someone in the history for whom we haven't downloaded user info
for user in history:
    if calls >= max_calls:
        break
    if not user in user_info:
        try:
            user_info[user] = get_user_info(user)
        except urllib2.HTTPError:
            # This almost always means the user's account has been disabled!
            pass

if calls > 0:
    save_state()

# Now download any user profile pictures that we might be missing...
update_status("Downloading missing user profile pictures...")
if not os.path.isdir(username + ".images"):
    os.mkdir(username + ".images")
user_image_raw = {}
for u in friends:
    _, _, filetype = user_info[u]['profile_image_url'].rpartition(".")
    filename = username + ".images/" + str(u) + "." + filetype
    user_image_raw[u] = filename
    if not os.path.isfile(filename):
        # we need to download the file!
        update_status("Downloading missing user profile picture for " + user_info[u]['screen_name'] + "...")
        urllib.urlretrieve(user_info[u]['profile_image_url'], filename)
update_status("Profile pictures are up to date!")
print ""

# Now scale the profile pictures
update_status("Scaling profile pictures...")
user_image = {}
for u in friends:
    _, _, filetype = user_info[u]['profile_image_url'].rpartition(".")
    filename = username + ".images/" + str(u) + ".scaled." + filetype
    user_image[u] = filename
    if not os.path.isfile(filename):
        # we need to scale the image!
        update_status("Scaling profile picture for " + user_info[u]['screen_name'] + "...")
        os.system("convert -resize 48x48 " + user_image_raw[u] + " " + user_image[u])
update_status("Profile pictures are all scaled!")
print ""

update_status("Building the adjacency matrix...")
[idxes, adj] = build_adjacency()
print ""
update_status("Calculating the stationary distribution...")
iterations = 500
damping_factor = 0.25
st = [1.0]*len(friends)
last_percent = -1
for i in range(iterations):
    users = 0
    for u in friends:
        users += 1
        percent = round(float(i * len(friends) + users) / float(iterations * len(friends)) * 100.0, 1)
        if percent > last_percent:
            last_percent = percent
            update_status("Calculating the stationary distribution... " + str(percent) + "%")
        idx = idxes[str(u)]
        given_away = 0.0
        give_away = st[idx] * (1.0 - damping_factor)
        if give_away <= 0.0:
            continue
        for f in friends[u]:
            if str(f) not in friends:
                continue
            fidx = idxes[str(f)]
            ga = adj[idx][fidx] * give_away
            given_away += ga
            st[fidx] += ga
        st[idx] -= given_away
print ""
# Now calculate the ranks of the users
deco = [ (st[idxes[u]], i, u) for i, u in enumerate(friends.keys()) ]
# Sort so that the user with the largest stationary value comes first
deco.sort()
deco.reverse()
rank = {}
last_st = None
last_rank = 1
for st, _, u in deco:
    if last_st == None:
        rank[u] = 1
    elif st == last_st:
        rank[u] = last_rank
    else:
        rank[u] = last_rank + 1
    last_rank = rank[u]
    last_st = st
    print user_info[u]['screen_name'] + "\t" + str(rank[u])

update_status("Generating the .dot file...")

# Now generate the .dot file
dotfile.write("digraph twitter {\n")
dotfile.write("  /* A TwitterGraph automatically generated by Evan Sultanik's Python script! */\n")
dotfile.write("  /* http://www.sultanik.com/                                                 */\n")
for user in friends:
    dotfile.write("  n" + str(user) + " [label=< <table border=\"0\"><tr><td><img src=\"" + user_image[user] + "\" /></td></tr><tr><td>" + user_info[user]['name'])
    if not (user_info[user]['name'] == user_info[user]['screen_name']):
        dotfile.write("<br/>(" + user_info[user]['screen_name'] + ")")
    dotfile.write("<br/>Rank: " + str(rank[user]) + "</td></tr></table> >")
    if user_info[user]['screen_name'] == username:
        dotfile.write(" color=\"green\" shape=\"doubleoctagon\"")
    dotfile.write("];\n")
dotfile.write("\n")
for user in friends:
    for f in friends[user]:
        if str(f) in friends:
            dotfile.write("  n" + str(user) + " -> " + " n" + str(f) + ";\n")
dotfile.write("}\n")
dotfile.close()
print ""
clear_status()


In which Evan and Joe teach you how to make beautiful documents.

Earlier today, Joe Kopena and I once again presented our tag-team LaTeX talk. Not familiar with LaTeX? Why not read the Wikipedia article! It’s essentially a professional grade system for beautifully typesetting documents/books. There are various books and Internet tutorials that do a fairly good job of introducing the basics, so, in our talk, Joe and I cover some more advanced topics and also general typesetting snags that novices often encounter. We always get requests for our slides after each of our talks, so I figured I’d post them online (which is the purpose of this blog entry).

Note that the entire presentation was created in LaTeX using Beamer. You may also want to read my notes on BibTeX, which will eventually become a part of our talk. You can read some of Joe’s notes on LaTeX on his personal wiki, here. Feel free to browse and/or post any of your general typesetting questions to this public mailing list.

On the Economics of Higher Education

In which I apply flimsy math and hand-waving to justify the time I’ve wasted in school.

There has been much “messaging on twitter” [sic] and “posting to blogs” [sic] of late regarding the economic benefit of pursuing a graduate degree in Computer Science. For example, there are claims, among other things, that a masters degree will require 10 years to earn back the income lost during study. A Ph.D. will require a staggering 50 years. Most everything I’ve read cites this article based upon Dr. Norman Matloff’s testimony to the U.S. House Judiciary Committee Subcommittee on Immigration. Curiously, the article everyone seems to cite does not itself have a bibliography. It does, however, credit “a highly biased pro-industry National Research Council committee” for calculating these numbers. Five to ten minutes of “searching on Google” [sic] and I was unable to find a report from the National Research Council corroborating such a claim. Can anyone point me to a link?

I do not dispute that these numbers may be correct in some cases; the purpose of this blog entry is to point out that, at least in the case of most with whom I’ve matriculated, they are flat out false.

Here is my (admittedly simple) mathematical model:

$n=\frac{t ( E[s_w] + c )}{E[s_a]-E[s_w]},$
  • $t$ is the number of years spent in school;
  • $E[s_w]$ is the expected salary one would have earned if one did not attend school;
  • $c$ is the net monetary cost of attending school per year, such as tuition paid, books purchased, &c. This value should also take into account any income earned during a school year (e.g., one’s stipend) and in many cases will be a negative number;
  • $E[s_a]$ is one’s expected salary after graduating school; and
  • $n$ is the number of years one would have to work after graduating to make up for lost income.

Note that this model does not take attrition into account.

As an example, let’s say John is a Ph.D. student who, through a research assistantship, receives tuition remission and a stipend of $20,000 a year. This is quite reasonable (and actually a bit conservative according to this study). If John had not chosen to pursue a Ph.D., he would have been hired in a $65k entry-level position, which is slightly on the high end. Once he has graduated (in the quite average term of five years), he expects to receive a salary of $85k which, according to this survey, is on the low end. We also, however, have to account for taxes! From my own experience and from consulting virtually every graduate student I know, John will receive a refund for practically all of the money taxed from his stipend. Without going to school, John would be in the 25% tax bracket, with a normalized income of about $52k (taking the tiered bracketing system into account). After earning his Ph.D., John would have a normalized income of about $67k. Plugging these values into the model (in thousands of dollars, with $c=-20$ because the stipend is income rather than a cost) we get:

$n=\frac{5 \times ( 52 + (-20) )}{67-52} \approx 11.$
Therefore, John will require about 11 years to recoup the income lost during school.
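For anyone who wants to plug in their own numbers, the model can be sketched in a few lines of Python. This is only an illustration of the formula above; the function name `years_to_recoup` and its parameter names are made up for this example, and the check that the post-degree salary exceeds the no-degree salary is an added assumption (the formula is meaningless otherwise).

```python
def years_to_recoup(t, s_w, c, s_a):
    """Years of work after graduating needed to make up for lost income.

    t   -- number of years spent in school
    s_w -- expected yearly salary without attending school, E[s_w]
    c   -- net yearly cost of school (negative when a stipend exceeds costs)
    s_a -- expected yearly salary after graduating, E[s_a]
    """
    if s_a <= s_w:
        # Assumption added for this sketch: with no salary gain the
        # lost income is never recovered, so refuse to divide.
        raise ValueError("s_a must exceed s_w for the degree to pay off")
    return t * (s_w + c) / (s_a - s_w)


# John's numbers from the example, in thousands of dollars per year.
john = years_to_recoup(t=5, s_w=52, c=-20, s_a=67)
print(round(john, 1))  # 10.7
```

Rounding 10.7 up gives the 11 years quoted for John.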

I think I was relatively conservative with my income estimates, and that’s still a lot less time than 50 years! I plugged my own stats/estimates into the model and I project that I will need fewer than five years (and I don’t even make as much as some other students I know)! Furthermore, with a Ph.D., John theoretically has more potential for advancement/promotion. Once the 11 years are over, he will have much more earning potential than a degreeless John (assuming the market for Ph.D.s remains strong, which I don’t think is a huge assumption given the lack of domestic technical/science Ph.D.s in the US right now).

Computer Science

An Introduction

People often ask me what I do or about what I am studying. Many have certain misconceptions and stereotypes that render the simple answer of “Computer Science” insufficient. For example, the vast majority of non-technical people with whom I’ve talked seem to think that learning new programming languages and writing programs are the primary areas of study for computer-related university majors. That’s like believing literature majors go to university to learn the intricacies of using pens and typewriters. In the ~7 years—and counting (gasp!)—in which I’ve been in higher education, I haven’t been taught a single programming language.

The following is an attempt on my part to answer these questions, in the hopes that I can hereafter simply refer people to this page instead of having to explain this for the thousandth time.

Hacking the Law

Thought Experiments Testing the Limits of the Law


First of all, I am neither a lawyer nor a trained ethicist. The following is a list of thought experiments related to “hacking” (i.e., testing the limits of) the law. Unless otherwise noted, I have not done any research to confirm whether or not the questions posed herein are either novel or have already been answered. Although the following contains some material related to computers, I have tried my best to write it in such a way as to be accessible to the widest audience.

Copyrighting a Number

Is it legal?

It is obviously legal to copyright an artistic work, like a digital photo. A digital photo, however, is really stored on a computer’s hard drive as a sequence of numbers, each representing the color of a dot in the picture. This sequence of numbers could be combined (for example, by stringing the digits together end to end) such that it amounts to a single, unique number. Would it be legal for one to give that number (which uniquely represents the copyrighted image) to a friend? The friend could then divide that number back into its sequence on the hard drive, thus reconstructing the original copyrighted picture. If copyrighting numbers is not legal, then I do not see why what I just described would not be legal.
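For the technically inclined, here is a minimal sketch of one way a picture could be turned into a single number and back. It assumes the picture is just a list of color values between 0 and 255 and treats those values as the digits of one very large base-256 number. The function names `pixels_to_number` and `number_to_pixels` are invented for this illustration, and no real image format works exactly this way.

```python
def pixels_to_number(pixels):
    """Pack a sequence of 0-255 color values into one integer."""
    # A leading sentinel byte of 1 keeps black (zero) dots at the start
    # of the picture from vanishing when the bytes are read as a number.
    return int.from_bytes(b"\x01" + bytes(pixels), "big")


def number_to_pixels(n):
    """Recover the original color values from the integer."""
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return list(raw[1:])  # drop the sentinel byte


picture = [0, 128, 255, 64]       # a tiny four-dot "picture"
n = pixels_to_number(picture)     # the single number one could hand to a friend
assert number_to_pixels(n) == picture
```

The friend only needs the number `n` and knowledge of the scheme to reconstruct the picture exactly.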

The issue is actually a bit more complicated than it seems.

It is entirely possible that the method used to convert the digital picture to a single number could be slightly modified (e.g., by adding 1 to the resulting number). If the recipient of the number does not know that this was done then the resulting reconstructed picture will look like noise. If the recipient knows to subtract 1 from the number before reconstructing the picture, however, the picture will be exactly the same as the copyrighted picture.

To add even more complication, it is entirely possible that, by adding 1 to the number, the improperly decoded picture might in fact become a completely different copyrighted picture.


  1. Person X has a copyrighted picture, called picture A, that he/she legally owns.
  2. X converts the picture to a number, $n$.
  3. X sends the number $n+1$ to person Y.

Case 1:

  • Y knows to subtract 1, so Y converts the number $(n+1)-1=n$ back to a picture, resulting in picture A.

Case 2:

  • Y does not know to subtract 1, so Y converts the number $n+1$ to a picture, resulting in a completely different picture B.
  • Picture B turns out to be copyrighted by person Z.
  • Neither person X nor person Y has ever even seen picture B before.

At what point is copyright lost?

Related to copyrighting a number is the following.

When the picture is represented as a sequence of numbers (representing the colors of the individual dots in the picture), it is possible to increment each of the colors of the individual dots. For example, let’s say the dot in the upper left corner of picture A is currently black. We could iteratively increment the color of that dot so that it eventually becomes white (going through a sequence of lightening grays in the process). We could even increment all of the dots in the picture at the same time.

Now, let’s say picture A is a photo of the Mona Lisa of which we do not own the copyright. Picture B is a photo of the Empire State Building that we took and of which we therefore own the copyright. Both of the pictures have the same dimensions; therefore each dot in picture A has a corresponding dot in picture B.

Now, we iteratively increment the dots in A such that they all move toward the color of their corresponding dot in picture B. Let’s call the result of this picture C. At the beginning, C will look exactly like picture A. At the end, C will look exactly like picture B. In the middle of the process, C will look like a linear combination of A and B.
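The morph can be sketched as a simple linear interpolation. The following is an illustration only, assuming each picture is a flat list of gray values between 0 and 255; the function name `morph` and the idea of counting a fixed number of steps are my own choices for the example.

```python
def morph(a, b, step, total_steps):
    """Return picture C after `step` of `total_steps` increments from a toward b."""
    if len(a) != len(b):
        raise ValueError("pictures must have the same dimensions")
    w = step / total_steps  # 0.0 means all A, 1.0 means all B
    # Each dot of C is a weighted mix of the corresponding dots of A and B.
    return [round((1 - w) * pa + w * pb) for pa, pb in zip(a, b)]


a = [0, 100, 200]    # stand-in for picture A
b = [200, 100, 0]    # stand-in for picture B

print(morph(a, b, 0, 10))   # [0, 100, 200]   (exactly A)
print(morph(a, b, 5, 10))   # [100, 100, 100] (halfway between)
print(morph(a, b, 10, 10))  # [200, 100, 0]   (exactly B)
```

Every intermediate value of `step` yields a different picture C, which is what makes the two questions below hard to answer.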

Question 1

At what point during the “morph” from A to B will the “copyright” of picture C transition from that of picture A to picture B?

Question 2

Is there any point during the process at which picture C might not be protected by either picture A’s or picture B’s copyright?

Celebrating 200 Poetic Years

In which Rob and I embark on yet another crazy trip.

Rob Lass and I have shared many an adventure. We have embarked on a number of multi-day cycling trips. He accompanied me on a crazy U-Haul road trip to the Canadian border to retrieve a 1.5 tonne pallet of IBM servers I had acquired. We have masqueraded as lawyerly fat-cats at whiskey festivals. We both share an unnatural fascination with the life and works of Leslie Lamport. We were once collectively mooned and subsequently chided by Jello Biafra. Yet another time, we shared drinks in the hotel bar of a Holiday Inn in Monmouth, NJ, sitting next to Ron Jeremy. We have also shared a number of moments in close proximity to RMS (an activity which, incidentally, I recommend only in moderation).

I was not in the least surprised, then, when Rob approached me about going down to Baltimore for the bicentennial anniversary of Edgar Allan Poe’s birth, followed by a stakeout of Poe’s grave to catch the Poe Toaster. The intervening hours were to be filled at The Horse You Came In On Saloon, which was supposedly one of Poe’s favorite hangouts, and is said to be the last place he was seen before his death. I heartily endorsed this plan.

The first matter of business was to make our two hour road trip as pleasant as possible. This obviously entailed gratuitous electronics.

How We Roll

Upon our arrival at Westminster Hall (the location of the bicentennial ceremony), we first set out to examine Poe’s grave in what remained of the daylight.

Rob and Evan at Poe's Grave
Please ignore the two fops and focus your attention on the fence in the background: this is the one over which we suspect the toaster makes his entrance. The building behind the fence is the Law Library of the University of Maryland. The courtyard between the fence and the building is secured and only accessible from either the interior of the library or by scaling two consecutive fences in an adjacent alley (more on this below).

Charm City Cakes (of Ace of Cakes fame) created a cake for the event.

Charm City Poe Cake
The cake was raffled off to the guests, and I am sorry to report that neither of us won.

I’d also like to report that many Poe fans are certified weirdos. Some also have extreme dedication.

Extreme Dedication
In this particular case, however, I am not sure what the object of the dedication is (the ceremony overlapped with the Baltimore Ravens’ unsuccessful bid at the Super Bowl).

The celebration as a whole, however, was quite fun, including a number of very good performances. Rob and I also got to know John Astin, which turned out to be somewhat of a letdown. But he’s ancient, so it’s okay.

The View from Inside Westminster Hall

Afterward we got a bite to eat and caught the tail end of said Ravens game at The Horse You Came In On.

The Horse You Came In On
I learned four things from this experience:
  1. Yuengling seems to be as popular in Baltimore as it is in Philly;
  2. in Baltimore, Yuengling is not pronounced “lager;”
  3. despite the fact that Baltimore lost to the Pittsburgh Steelers and my car has a PA license plate, no one mistook my car for that of a Steelers fan and flipped it over in a riot (as would undoubtedly have been the case if Baltimore were populated by Philadelphia sports fans); and
  4. the “frat” scene seems to descend on The Horse You Came In On immediately after the completion of sports games.

The gate closest to the monument.

We got back to the graveyard around 00:30 on the 19th to find a crowd of about 60 people. We really didn’t know what to expect; apparently neither did anyone else, as wild rumors started to fly. One rumor claimed that the toaster often made rounds to the fences surrounding the graveyard to say hi (and undoubtedly sign countless autographs and pose for pictures). Another rumor claimed that the toaster was none other than Poe House curator Jeff Jerome himself. This is all complicated by the fact that Poe actually has two graves (he was exhumed in the late 19th century to make way for his monument and re-buried in the back of the graveyard, a location not visible from the sidewalk/gates). The grave in the back is the one at which Rob and I were photographed above. Some people thought the toaster visited the monument (which is visible from the street), while others thought that he visited the grave in the back. There were therefore two groups of people, each clustered around the gate closest to one of the graves. The “monument” group seemed to be a mix of the aforementioned weirdos with a healthy dose of hipsters. They spent their time reading poetry. The group at the other gate (closest to the back grave) was decidedly more hardcore; spirits flowed from many a hip flask.

The gate closest to the rear grave (where the toaster usually goes).

At this latter gate, Rob and I met up with a guy who had actually attended this thing before; in fact, he claimed to have attended every year since 1983. He and his son (a teenager) come every year to try to get a picture of the toaster, most likely to sell to a magazine (there is only one known photo of the toaster, from a 1990 issue of Life magazine, reproduced here). He said that the toaster almost always goes to the back grave. The toaster gets no cooperation from any authorities; neither the Westminster Burial Grounds nor the UMD Law Library provides him with any assistance. Jeff Jerome camps out in the church every year simply to confirm that the toaster is the same person as the year before (i.e., there is not an impostor) and also to ensure the identity of the toaster remains secret (because if his identity were ever revealed the magic of the tradition might be lost). Jerome does not know who exactly the toaster is, however, and he does not want to know. Once the toaster arrives, does his toast, and makes his exit, Jerome goes into the graveyard, collects the bottle of liquor, flowers, and any notes the toaster may have left, puts them in the church, and leaves. It is Jerome’s exit that cues the hordes of weirdos, hipsters, alcoholics, and amateur journalists that the toaster has come and done his deed.

The alley next to the graveyard.

At around 01:30, the man’s teenage son came up to his father saying that he had been surveilling the alley adjacent to the graveyard that I mentioned above. Three guys had gone in, but he only saw two of them come out. Rob immediately walked down to the alley and I followed close behind. Rob got there first and apparently saw two guys on the other side of the two fences (one of which was about 10 feet tall). One fellow jumped over the brick wall to the graveyard. The other hid behind a small half wall, peeked his head out to look at Rob, and then sprinted over the wall to follow his companion. About five minutes later, camera flashes could be seen reflecting off of the walls of the law library, seeming to emanate from the area of the back grave. We assumed this was the Poe Toaster having pictures taken for his own record. We waited for another hour or so but nothing happened. It was cold, and the toaster had likely already come and gone, so we drove home.

All in all, it was an awesome adventure.

You can read Rob’s account of it here.