Evan A. Sultanik, Ph.D.

Evan's First Name @ Sultanik .com

Computer Security Researcher
Trail of Bits

Adjunct Professor
Drexel University College of Computing & Informatics
Department of Computer Science

Recent Content:

New website!

…for the fifth—and by no means last—time.

So, I’ve gotten rid of MediaWiki and switched back to Drupal (now in version 7). The old version of this site is still available here. I’ll be posting another blog entry in the coming days explaining why I’ve chosen to abandon MediaWiki. I’ve already migrated most of the content from my old website, but there is still some to go. The address for my old RSS feed still works; however, Google Reader seems to be ordering the entries non-chronologically, and I am not sure why.

Grading PDFs (a Workflow)

Annotation of multi-page PDFs using open-source tools.

I'm currently teaching a class and receive all of the homework submissions digitally (in the form of PDFs). Printing out the submissions seems like a waste, so I devised a workflow for efficiently grading the assignments digitally by annotating the PDFs. My method relies solely on free, open-source tools.

Here's the general process:

  1. Import the PDF into GIMP. GIMP will automatically create one layer of the image for each page of the PDF.
  2. Add an additional transparent layer on top of each page layer.
  3. Use the transparent layer to make grading annotations on the underlying page. One can progress through the pages simply by hiding the upper layers.
  4. Save the image as an XCF (i.e., GIMP format) file.
  5. Use my xcflayers2pngs script (see below) to export the layers of the XCF file into independent PNG image files.
  6. Use the composite function of ImageMagick to overlay the annotation layers with the underlying page layers, converting to an output PDF.

Below is the code for my xcflayers2pngs script. It is a Scheme Script-Fu script embedded in a Bash script that exports each layer of the XCF to a PNG file.

#!/bin/bash
{
cat <<EOF
; Build a list of the image's layers. gimp-image-get-layers returns the
; number of layers and an array of layer IDs (ordered top to bottom).
(define (get-all-layers image)
  (let* (
	 (layer-info (gimp-image-get-layers image))
	 (i (car layer-info))
	 (all-layers (cadr layer-info))
	 (bottom-to-top '())
	 )
    (while (> i 0)
        (set! bottom-to-top (append bottom-to-top (cons (aref all-layers (- i 1)) '())))
        (set! i (- i 1))
        )
      (reverse bottom-to-top)
      )
    )

(define (format-number base-string n min-length)
  (let* (
	 (s (string-append base-string (number->string n)))
	 )
    (if (< (string-length s) min-length)
	(format-number (string-append base-string "0") n min-length)
	s)))

(define (get-full-name outfile i)
  (string-append outfile (format-number "" i 4) ".png")
  )

(define (save-layers image layers outfile layer)
  (let* (
	 (name (get-full-name outfile layer))
	 )
    (file-png-save RUN-NONINTERACTIVE image (car layers) name name 0 9 1 1 1 1 1)
    (if (> (length layers) 1)
	(save-layers image (cdr layers) outfile (+ layer 1)))
    ))

(define (convert-xcf-to-png filename outfile)
  (let* (
	 (image (car (gimp-file-load RUN-NONINTERACTIVE filename filename)))
	 (layers (get-all-layers image))
	 )
    (save-layers image layers outfile 0)
    (gimp-image-delete image)
    )
  )

(gimp-message-set-handler 1) ; Send all of the messages to STDOUT
EOF

echo "(convert-xcf-to-png \"$1\" \"$2\")"

echo "(gimp-quit 0)"
} | gimp -i -b -

Running xcflayers2pngs file.xcf output will create output0000.png, output0001.png, output0002.png, ..., one for each layer of file.xcf. Each even-numbered PNG file will correspond to an annotation layer, while each odd-numbered PNG file will correspond to a page of the submission. We can then overlay each annotation onto its associated page using the following ImageMagick trickery:

convert output???[13579].png null: output???[02468].png -layers composite output.pdf

We can automate this process by creating a Makefile rule to convert a graded assignment in the form of an XCF into a PDF:

%.pdf : %.xcf
        rm -f $**.png
        ./xcflayers2pngs $< $*
        convert $*???[13579].png null: $*???[02468].png -layers composite $@
        rm -f $**.png
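
For example, assuming a graded submission saved as hw1.xcf (the filename is just for illustration), the entire export-and-composite pipeline then reduces to a single command:

$ make hw1.pdf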

Gender Representation on the Internet

In which I discover that male names appear much more often than female names on the Internet.

There is a lot that has happened since last August. I successfully defended my Ph.D., for one. I could give a report on our post-defense trip to Spain. I could talk about some interesting work I'm now doing. Instead, I'm going to devote this blog entry to gender inequity.

This all started in November of last year in response to one of Dave's blog posts. Long story short: he was blogging about a girl he had met; in an effort to conceal her identity (lest she discover the blog entry about herself), he replaced her name with its MD5 hash. Curious, I decided to brute-force the hash to retrieve her actual name. This was very simple in Perl:

#!/usr/bin/perl -w
use Digest::MD5 qw(md5_hex);
my $s = $ARGV[0] or die("Usage: crackname MD5SUM\n\n");
system("wget http://www.census.gov/genealogy/names/dist.female.first") unless(-e 'dist.female.first');
open(NAMES, 'dist.female.first') or die("Error opening dist.female.first for reading!\n");
while(<NAMES>) {
   if($_ =~ m/^\s*(\w+)/) {
       my $name = lc($1);
       # Try the name both capitalized and lowercase, with and without a trailing newline:
       if(md5_hex(ucfirst($name)) eq $s || md5_hex($name) eq $s ||
md5_hex(ucfirst($name) . "\n") eq $s || md5_hex($name . "\n") eq $s) {
           print ucfirst($name) . "\n";
           exit(0);
       }
   }
}
close(NAMES);
exit(1);
Note that I am using a file called dist.female.first, which is freely available from the US Census Bureau. This file contains the most common female first names in the United States, sorted by popularity, according to the most recent census.

This script was able to crack Dave's MD5 hash in about 10 milliseconds.
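
If you want to try it yourself without Dave's actual hash, you can compute a hash on the fly and feed it to the script (I'm calling the script crackname here, per its usage message; "Jennifer" is just an example name from the census list):

$ ./crackname $(echo -n "Jennifer" | md5sum | cut -d' ' -f1)
Jennifer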

This got me thinking: For what else could this census data be used?

My first idea was also inspired by Dave. You see, he was writing a novel at the time. Wouldn't it be great if I could create a tool to automatically generate plausible character names for stories?

#!/usr/bin/perl -w
use Cwd 'abs_path';
use File::Basename;
my($scriptfile, $scriptdir) = fileparse(abs_path($0));
my $prob;
$prob = $ARGV[0] or $prob = rand();
system("cd $scriptdir ; wget http://www.census.gov/genealogy/names/dist.all.last") unless(-e $scriptdir . 'dist.all.last');
system("cd $scriptdir ; wget http://www.census.gov/genealogy/names/dist.male.first") unless(-e $scriptdir . 'dist.male.first');
system("cd $scriptdir ; wget http://www.census.gov/genealogy/names/dist.female.first") unless(-e $scriptdir . 'dist.female.first');
# Return a random name from $filename whose cumulative frequency (third
# column of the census file) is the smallest value at or above $percent.
sub get_rand {
    my($filename, $percent) = @_;

    open(NAMES, $filename) or die("Error opening $filename for reading!\n");
    $percent *= 100.0;
    my $nameval = -1;
    my @names;
    my $lastname;
    while(<NAMES>) {
        if($_ =~ m/^\s*(\w+)\s+([^\s]+)\s+([^\s]+)/) {
            $lastname = ucfirst(lc($1));
            if($3 >= $percent) {
                last if($nameval >= $percent && $3 > $nameval);
                $nameval = $3;
                push(@names, $lastname);
            }
        }
    }
    close(NAMES);
    return $lastname if($#names < 0);
    return $names[int(rand($#names + 1))];
}
sub random_name {
    my($male, $p) = @_;
    my $firstnameprob;
    my $lastnameprob;
    do {
        $firstnameprob = rand($p);
        $lastnameprob = $p - $firstnameprob;
    } while($lastnameprob > 1.0);
    return &get_rand($male ? 'dist.male.first' : 'dist.female.first', $firstnameprob) . " " . &get_rand('dist.all.last', $lastnameprob);
}
sub flushall {
    my $old_fh = select(STDERR);
    $| = 1;
    select(STDOUT);
    $| = 1;
    select($old_fh);
}
print STDERR "Male: ";
&flushall();
print &random_name(1, $prob) . "\t";
&flushall();
print STDERR "\nFemale: ";
&flushall();
print &random_name(0, $prob) . "\n";

This script does just that. Given a real number between 0 and 1 representing the scarcity of the name, it randomly generates a name according to the census distribution of names in the United States. Values closer to zero produce more common names, and values closer to one produce rarer names. The parameter can be thought of as the scarcity percentile of the name: a value of $x$ means that the name is less common than $x$% of the other names. Note, though, that I'm not actually calculating the joint probability distribution between first and last names (for efficiency reasons), so the value you input doesn't necessarily correspond to the probability that a given first/last name combination occurs in the US population.

$ ./randomname 0.0000001
Male: James Smith
Female: Mary Smith
$ ./randomname 0.5
Male: Robert Shepard
Female: Shannon Jones
$ ./randomname 0.99999
Male: Kendall Narvaiz
Female: Roxanne Lambetr

The “Male” and “Female” portions are actually printed to STDERR. This allows you to use this in scripting without having to parse the output:

$ ./randomname 0.75 2>/dev/null
Gerald Castillo Christine Aaron

But I didn't stop there. Here's the punchline of this Shandy-esque recounting:
Inspired by Randall Munroe-style Google-result frequency charts, I became interested in seeing how the frequency of names in the US correlates with the frequency of names on the Internet. I therefore quickly patched my script to retrieve search result counts using the Google Search API. I generated 60 random names (half male, half female) for increasing scarcity values (in increments of 0.01). The results are pretty surprising:

Note that the $y$-axis is on a logarithmic scale.

As expected, the number of Google search results is exponentially related to the scarcity of the name. What is unexpected is the disparity between the representation of male names and female names on the Internet: on average, a male name of a given scarcity has over 6.6 times as many Google results as a female name of equal scarcity!
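
I haven't reproduced the patched script here, but the query portion amounted to something like the following sketch. It assumes the (now-deprecated) Google AJAX Search API and its estimatedResultCount field; the get_result_count helper is illustrative, not the exact code I ran:

#!/usr/bin/perl -w
# Sketch: look up the estimated number of Google results for each name given
# on the command line. Assumes the (now-deprecated) Google AJAX Search API.
use strict;
use LWP::Simple qw(get);
use JSON qw(decode_json);
use URI::Escape qw(uri_escape);

sub get_result_count {
    my($name) = @_;
    my $url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q='
            . uri_escape('"' . $name . '"');
    my $json = get($url) or return undef;
    return decode_json($json)->{responseData}->{cursor}->{estimatedResultCount};
}

foreach my $name (@ARGV) {
    my $count = get_result_count($name);
    print $name . "\t" . (defined($count) ? $count : 'N/A') . "\n";
    sleep(1); # rate-limit our queries
}

Combined with randomname (with STDERR discarded, as above), this pairs each generated name with its estimated result count.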

鯖の味噌煮 (Saba no Misoni)

A Receipt for My Favorite Japanese Dish

Software

  • Fillets from one large mackerel cut in 4cm pieces.
  • 2.5cm ginger cut in matchsticks
  • 2 tbsp. mirin
  • 1 tbsp. sake
  • 1.5 tbsp. sugar
  • 1 tbsp. soy sauce
  • 1 cup dashi stock (or water)
  • 5 tbsp. red (“aka”) miso paste
  • 2 scallions cut in 2cm pieces, whites and greens separated.

Algorithm

  1. Combine dashi, soy, mirin, sake, 3 tbsp. miso, and sugar in a sauce pan and bring to boil.
  2. Add ginger, scallion whites, and mackerel, cover with a tinfoil otoshi buta (drop lid), and simmer on medium heat for 10 minutes, basting every few minutes.
  3. Stir in remaining miso and cook for an additional 5 minutes.
  4. Remove from heat, add scallion greens, replace otoshi buta, and cool to room temperature in pan.

Azerbaijani Sous

A recipe for my wife’s favorite stew.

Although they are not Azeri, my wife and her family lived in Bakı from the mid-1980s through the first half of the Nagorno-Karabakh war (my father-in-law, a Colonel in the Soviet Army, was stationed there). During that time my mother-in-law developed both an appreciation for and an ability to cook Azerbaijani cuisine. One dish that became my wife's favorite is called “Азербайджанский соус” (pronounced /ˌæzərbaɪˈdʒɑːnskʲɪj sos/, transliterated Azerbaidzhanskiy sous, English: Azerbaijani sauce). Sous, when pronounced by a Russian, sounds like the English word “so” with an “s” consonant added at the end. It is more of a stew than a sauce, at least in my mother-in-law's interpretation. The main ingredients include eggplant, peppers, tomato, cilantro, and a whole chicken.

Given that my mother-in-law's rendition is so delicious, I wanted to learn how to reproduce it in my own kitchen. Having mastered the art of cookery through observation, my mother-in-law—like most other old-world home cooks—is not a woman of weights and measures; her recipes are conveyed in units of “pinches of this” and “splashes of that.” This makes reproducing a dish in a foreign kitchen quite difficult.

One of the problems with finding a formal, written recipe for Sous is that the same word has many other meanings in French gastronomy (e.g., “sous-chef” and “sous-vide”), which makes Google searches (even in Russian) rife with false positives. Furthermore, I can't seem to find any English-language recipes. For all of these reasons, I set out to grok my mother-in-law's method. I think I have converged upon a fairly stable recipe (with some of my own modifications), which I have outlined below. In doing so, I have tried to highlight the changes I have made to the original recipe.

Biggest

A solution for digital hoarding.

I have a problem. I admit it. I have a problem deleting files. In the “good times”—viz., when I have gigabytes to spare on my hard drive—I simply don’t bother deleting temporary files. That video I encoded/compressed to MPEG? Sure, I’ll keep the raw original! Why not? Just in case I ever need to re-encode it at a higher bitrate, you see.

Inevitably, I run low on disk space months later, at which point I’ve forgotten where all of those pesky large files are living.

Enter my script, which I simply call biggest. This script will conveniently print the $n$ biggest files that are rooted at a given directory. Here’s an example:

$ biggest 10
. [92MB]
|- art [15MB]
|  |- .svn [7MB]
|  |  `- text-base [7MB]
|  |     |- heat.png.svn-base [2MB]
|  |     `- SWATipaq.png.svn-base [2MB]
|  |
|  |- heat.png [2MB]
|  `- SWATipaq.png [2MB]
|
|- os [7MB]
|  `- os.pdf [3MB]
|
|- .svn [9MB]
|  `- text-base [9MB]
|     `- proposalpresentation.pdf.svn-base [8MB]
|
|- eas28@palm [14MB]
|- ESultanikPhDProposalPresentation.tar.gz [12MB]
|- APLTalk.pdf [9MB]
|- proposalpresentation.pdf [8MB]
`- proposalhandouts.pdf [7MB]

It is available on GitHub, here:
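
The real script produces the tree output shown above; the underlying idea, though, is just a recursive walk that records file sizes and keeps the largest ones. A stripped-down, flat-list approximation (not the actual script from GitHub) looks roughly like this:

#!/usr/bin/perl -w
# Flat-list approximation of biggest: print the n largest files under a directory.
# This is a simplified sketch, not the script linked above.
use strict;
use File::Find;

my $n   = shift(@ARGV) || 10;
my $dir = shift(@ARGV) || '.';

my %size;
find(sub { $size{$File::Find::name} = -s $_ if -f $_; }, $dir);

my @biggest = (sort { $size{$b} <=> $size{$a} } keys %size)[0 .. $n - 1];
foreach my $file (@biggest) {
    last unless defined($file);
    printf("%s [%dMB]\n", $file, $size{$file} / (1024 * 1024));
}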

Ballmer Peak

A Generalization

Through my many years of coding, I have come to this realization:

The so-called “Ballmer Peak”, as it is currently understood, is but a two-dimensional projection of what in reality is a higher-dimensional space, viz.,


AAMAS 2010

The Ninth International Conference on Autonomous Agents and Multiagent Systems

If you've been keeping up, you know that I was attending the Ninth International Conference on Autonomous Agents and Multiagent Systems (AAMAS) last week in Toronto, Canada. The conference went really well, as did the two workshops I also attended:

  1. The Third International Workshop on Optimisation in Multi-Agent Systems (OptMAS); and
  2. The Twelfth International Workshop on Distributed Constraint Reasoning (DCR), which Rob and I co-chaired.

I presented my paper on distributed algorithms for solving art gallery and dominating set problems on Thursday. AAMAS also allows full papers to be accompanied by a poster presentation of the work. This was my first time making a poster purely in LaTeX, and it was a very smooth experience. I created a poster template for the Drexel CS department, which can be downloaded here. You can view and download the presentation slides and poster for my paper here.

Success in Linux

Linux: Simultaneously the source of and solution to all of my computing problems.

Oh Linux, you’re simultaneously the source of and solution to all of my computing problems. I recently had my own version of Linux “success”. Read on to hear the whole story.

Retirement Planning through Treasure Hunting

A metaphor for the process of earning a Ph.D.

I've deliberated long and hard to decide whether or not to publish this blog entry.

“I inevitably end up liking things that require extensive deliberation as to whether or not I like them.” (Evan Sultanik, @ESultanik)

I'm afraid that, since I've not yet completed my Ph.D., it will be misconstrued as boastful or, worse yet, self-exculpatory. This is not my intention. If anything, I hope that my observations below will help others pursuing a Ph.D. both to rationalize their own situation and to educate others about the process.


It's been a long while since I've posted to this blog, in large part because I've been in the throes of wrapping up my degree and hunting for a job. As a result, I get a lot of questions from non-academics about when I will finish. It seems like many people are under the impression that a Ph.D. is “what you get if you tough it out and stay in school and study a few more years after undergrad.” While this is technically true, many either don't realize or don't understand that a Ph.D.—at least in the sciences, engineering, and mathematics—requires independent research. A Ph.D. isn't earned through studying hard and passing tests. It's particularly frustrating when people see that I've been pursuing my Ph.D. for ~4 years (an average, if not short, duration for the degree) and immediately draw the conclusion that, “Since [he] hasn't graduated yet, [he] must not be studying hard enough!” Over the years, I've developed a metaphor that I give to people to explain the process:

Earning a Ph.D. is like trying to fund one's retirement through treasure hunting.

When I started my Ph.D., I had to choose a topic to study. That's like choosing a location in a vast field on which to begin digging for buried treasure. You know that the treasure is out there somewhere, but you're not sure where or how deep. This is not entirely up to chance: one's advisor(s)—experts in gold digging—do help in choosing the location, and there are ways to intelligently predict where the treasure might lie. The trouble is that one might dig for years and years and be left with nothing but a huge pile of dirt. Worse yet, one might be only a few millimeters of soil away from the gold and not even know it. Every once in a while one might find a small nugget of gold in the dirt, egging him or her on, but there is no guarantee that one will eventually hit the mother lode. In other instances, one knows exactly where the treasure lies, but it is underneath an impenetrable rock that requires years and years of chipping away to exhume. In some cases one does find a huge treasure chest, but more often than not one's fortune is amassed from the accumulation of small nuggets.

Defending one's dissertation, then, is like choosing to go into early retirement on the strength of the fortune one has amassed from treasure hunting. Is it enough? Will I be able to support myself on what I've found so far? In order to complete one's degree, one has to defend his or her work to a committee of experts—all of whom are expert treasure hunters and, in a sense, one's competition. You have to convince them that your fortune will be enough to support yourself. And you have to do all of that without wasting too much time creating fanciful metaphors of debatable import.