# Evan A. Sultanik, Ph.D.

Evan's First Name @ Sultanik .com

Computer Security Researcher
Trail of Bits

Universitato Drexel Colegio de Komputanta kaj Informa Teknologio
Departemento de Komputscienco

## Tracking Trains

### Reverse Engineering and Hacking Text Messages for Great Good

I’ve been regularly commuting between Philly and DC on Amtrak for over six months now. Being in the Northeast corridor—the only profitable region in Amtrak’s network—and specifically being in the NYC↔Philly↔Washington trifecta that accounts for about a third of Amtrak’s overall nationwide ridership, the service is generally excellent. Not Shinkansen excellent, but good enough to take me where I need to go in relative comfort, in a third less time than would be necessary to drive. And in English, too, so I can die with a smile on my face, without feelin’ like the good Lord gypped me.

For a pseudo-public entity, Amtrak does a surprisingly good job keeping up with the technological times. It’s had free WiFi on its trains for a number of years, the conductors scan tickets using ruggedized iPhones, and its iOS app lets me store and organize my myriad tickets in Passbook. One can even present tickets via an Apple Watch.

The one gripe I have about the system is that, despite Amtrak’s excellent online tools for tracking the exact location of trains, there is no good way to get useful train status alerts. I often work at a location ~45 minutes outside of DC, so I want to know whether my train is delayed before I depart for the station.

The Southbound trains I take from Philadelphia originate in either New York or Boston. Therefore, I want to get an alert texted to my phone when the train has departed New York. If it departed New York on time, then I can proceed as normal. If it was delayed, then I can hit the snooze button. If it didn’t even leave yet, then I know that it will be at least an hour before it arrives in Philly.

The first thing I had to do was figure out a way to send text messages for free. Every major cellular provider has some sort of gateway service where an E-mail sent to the proper address will be forwarded to the associated phone number. Therefore, I created a simple Python library (in pure Python; no dependencies) for sending free (to the sender) text messages to phones from every major international carrier:

Next, I had to reverse engineerdevise a way to get accurate train status updates. This was relatively straightforward, but I hesitate to go into details because it might implicate me in several terms-of-use violations.

I pieced this all together into a new service that I have been using and allowing several fellow commuters to beta-test for the past couple months. I’m excited to release it to the public now:

http://trains.sultanik.com/

This website has no affiliation with any railways or train operators. If you are a railway or train operator, please keep in mind that this is a toy website created by a single guy with altruistic motives. Please don’t sue me. Instead, let’s talk about how I can give you what I’ve created here so that you might host it and provide it as a service for your customers.

No profit is made off of the services on this website. In fact, I lose money by running it. Therefore, there are certain concessions that have been made due to lack of fuding. Namely, there is the possibility that a malicious party could illegally intercept the messages sent between a user’s web browser and this server, potentially altering his or her account and alerts. I have implemented as many security features as possible to protect against this, but in order to be almost foolproof I would need to host the site on SSL, which requires money. If you desire additional security and features, please consider donating.

Enjoy, and please drop me a line if you find the service useful!

## Musical Uncanny Valley

### In which I compare annoying music to artisanal ketchup.

Recently, a friend of mine shared a link to Max Richter’s solo debut album Memoryhouse on social media:

While I appreciate Richter’s work and think it’s good, it’s very hard for me to enjoy it. The problem is that I find much of it—and particularly Memoryhouse—too reminiscent of Philip Glass’. The latter’s minimalism and distinctive brand of ostinato, harmonic chord progressions, variations in half-time, &c., is clearly being referenced in Memoryhouse, albeit with perhaps “post-minimalist” orchestration. For example, when I first listened to November, the first track of Memoryhouse (q.v. above), it immediately reminded me of Glass’s String Quartet No. 3, which predates Memoryhouse by about 20 years:

It’s a bit like when a TV show wants to parody Jeopardy but doesn’t have the budget to license rights to the Jeopardy Think! music, so it creates a slightly different version which ends up sounding wrong, despite the fact that if the original Jeopardy theme had never existed this new version would be just as popular. What is called a musical pastiche.

That just sounds wrong to me, to the extent that my subconscious is offended by it. It’s like going to a Michelin 3-starred restaurant and being served artisanal, house-made ketchup, from locally sourced organic tomatoes and garlic harvested from the chef’s own private garden during the last full moon in spring: No matter how good that ketchup tastes, it’s not going to taste as right as Heinz, because that’s what you grew up eating. And heaven forbid you’re served Hunt’s. Did we lose a war?

In my mind, a pastiche is distinct from something like an homage or inspiration since its similarity to the source material is much more noticeable. For example, Glass’s predilection for pairing low-pitch ostinato with higher-pitch simple melodies is technically very similar to Mozart’s modus operandi, yet we rarely ever hear Glass being directly compared to Mozart.

A number of years ago I had a subscription to the Philadelphia Orchestra and attended a concert debuting a new symphony by a relatively unknown composer (whose name escapes me). The theme was “The United States.” I’m guessing the concert was held around the time of Independence Day, but I neither remember nor care to look it up. The whole thing was very frustrating for me to sit through, because it was clear to me that every single movement was simply a pastiche of the work of a much more famous American composer. I would have much rather heard a performance of “the real thing.”

Music from completely different genres can also either purposefully or accidentally trick our brains into finding similarity. From almost exactly five years ago today:

Been bugging me for years: Does Aesop Rock’s “9-5er’s Anthem” quote Gershwin’s “Rhapsody in Blue”?
@ESultanik
Evan Sultanik

The field of æsthetics has produced a hypothesis of what is know as the uncanny valley: When an object moves and/or looks very similar to (but not exactly like) a human being, it causes revulsion to the observer. Here is a video with some examples, if you want to be creeped out a bit. As objects become increasingly similar to the likeness of a human, human observers become increasingly empathetic toward the object. However, once the object passes a certain threshold of human likeness, the human starts to be repulsed, until another likeness threshold at which point the object is almost indistinguishable from a real human.

CC BY-SA 3.0, from here

I posit that there is a similar phenomenon in music, and that is what I am experiencing when I listen to Memoryhouse. One of the theories explaining the uncanny valley is that conflicting perceptual cues cause a sort of cognitive dissonance. It is well-known that the brain behaves almost identically when imagining a familiar piece of music as it does when listening to it. This suggests that the brain is internally “playing” the music in synchrony with what it is hearing, anticipating each note. In a sense, one’s brain is subconsciously humming along to the tune. My theory is that when a song (or, particularly, a pastiche) is similar enough to another song that is much more familiar, this evokes the same synchronous imagining. However, once the pastiche deviates from anticipated pattern, this ruins the synchrony and causes cognitive dissonance.

Excuse the pun, but on that note I’ll leave you with something that will hopefully not be in your musical uncanny valley: This wonderful, recent recomposition of Glass’s String Quartet No. 3 for guitar:

## Defending Cyberspace

### or: No, I thought you were doing it.

Yesterday I attended the Balancing Act symposium on big data, cybersecurity, and privacy. During the panel discussion, an excerpt from Clarke’s oft-quoted and debatably hyperbolic book on cyber war was invoked:

At the beginning of the era of strategic nuclear war capability, the U.S. deployed thousands of air defense fighter aircraft and ground-based missiles to defend the population and the industrial base, not just to protect military facilities. Every major city was ringed with Nike missile bases to shoot down Soviet bombers. At the beginning of the age of cyber war, the U.S. government is telling the population and industry to defend themselves. As one friend of mine asked, “Can you imagine if in 1958 the Pentagon told U.S. Steel and General Motors to go buy their own Nike missiles to protect themselves? That’s in effect what the Obama Administration is saying to industry today.”

That passage has always struck me as proffering a false equivalence. The US Government has near absolute control over the country’s airspace, and is thereby responsible to defend against any foreign aggression therein. Cyber attacks, on the other hand, are much less overt—with the possible exception of relatively unsophisticated and ephemeral denial of service (DOS) attacks. A typical attacker targeting a private company is most likely interested stealing intellectual property. That’s certainly the trend we’ve seen since the publication of Clarke’s book back in 2012, vi&., Sony, Target, Home Depot, &c. The way those types of attacks are perpetrated is more similar to a single, covert operative sabotaging or exfiltrating data from the companies rather than an overt aerial bombardment.

The actual crime happens within the private company’s infrastructure, not on the “public” Internet. The Internet is just used as a means of communication between the endpoints. More often than not the intermediate communication is encrypted, so even if a third party snooping on the “conversation” were somehow able to detect that a crime were being committed, the conversation would only be intelligible if the eavesdropper were “listening” at the endpoints.

A man has a completely clean criminal record. He drives a legally registered vehicle, obeying all traffic regulations, on a federal interstate highway. A police officer observing the vehicle would have no probable cause to stop him. The man drives to a bank which he robs using an illegal, unregistered gun. Had a police officer stopped the bank robber while in the car, the gun would have been found and the bank robber arrested. Does that give the police the right to stop cars without probable cause? Should the bank have expected the government to protect it from this bank robber?

The only way I can see for the government to provide such protections to the bank, without eroding civil liberties (i.e., the fourth amendment), would be to provide additional security within the bank. Now, that may be well and good in the bank analogy, but jumping back to the case of cyber warfare, would a huge, multi-national company like Sony be willing to allow the US Government to install security hardware/software/analysts into its private network? I think not.

## Social Signals Part 2

### The (gratuitous) math behind the magic.

In the last post, I presented a new phenomenon called Social Signals. If you have not yet read that post, please do so before reading this continuation. Herein I shall describe the nitty-gritty details of the approach. If you don’t care for all the formalism, the salient point is that we present an unsupervised approach to efficiently detect social signals. It is the approach that achieved the results I presented in the previous post. Read on for the details.

## Social Signals

### or, the basis for an article that was nominated for best paper and subsequently rejected.

Looking back on my previous job at The Johns Hopkins University, I am struck by and grateful for the breadth of different topics I was able to research: distributed phased array radar resource scheduling, Linux kernel integrity monitoring, TLS, hypervisors, probabilistic databases, streaming algorithms, and even some DNA sequencing, just to name a few.

Toward the end of my—to abuse an overloaded term—tenure at JHU/APL, I became involved in their social media analysis work. For example, I had some success creating novel algorithms to rapidly geolocate the context of social media posts, based solely upon their textual content. But one of the most interesting things I encountered along these lines was a phenomena that we came to call Social Signals.

I wrote a paper about this, which was simulaneously nominated for best paper and yet rejected from a prestegious Computer Science conference. Read on for the full story.

## Success in OS X

### 10 easy steps (and lots of unnecessary prose) on how to set up a new PowerBook in 48 hours or more.

Five years ago I wrote about my travails solving some networking issues on my Linux laptop. I equated the experience to a classic XKCD comic in which a simple software fix escalates to a life-and-death situation. The same thing happened to me again, this time on Mac OS X. Read on to find out.

## Hashing Pointers

### or: How I Learned to Stop Worrying and Love C++11

For as long as I’ve understood object-oriented programming, I’ve had an ambivalent relationship with C++. On the one hand, it promised the low-level control and performance of C, but, on the other, it also had many pitfalls that did not exist in other high-level managed languages like Java. I would often find myself in a position where I wouldn’t be sure exactly what the compiler would do under-the-hood. For example, when passing around objects by value would the compiler be “smart” enough to know that it can rip out the guts of an object instead of making an expensive copy? Granted, much of my uncomfortableness could have been remedied by a better understanding of the language specification. But why should I have to read an ~800 page specification just to understand what the compiler is allowed to do? The template engine and the STL are incredibly powerful, but it can make code just as verbose as Java, one of Java’s primary criticisms. Therefore, I found myself gravitating toward more “purely” object-oriented languages, like Java, when a project fit with that type of abstraction, and falling back to C when I needed absolute control and speed.

A couple years ago, around the time when compilers started having full support for C++11, I started a project that was tightly coupled to the LLVM codebase, which was written in C++. Therefore, I slowly started to learn about C++11’s features. I now completely agree with Stroustrup: It’s best to think of C++11 like a completely new language. Features like move semantics give the programmer complete control over when the compiler is able to move the guts of objects. The new auto type deduction keyword gets rid of a significant amount of verbosity, and makes work with complex templates much easier. Coupled with the new decltype keyword, refactoring object member variable types becomes a breeze. STL threads now make porting concurrent code much easier. That’s not to mention syntactic sugar like ranged for statements, constructor inheritance, and casting keywords. And C++ finally has lambdas! C++11 seems to be a bigger leap forward from C++03 than even Java 1.5 (with its addition of generics) was to its predecessor.

As an example, I recently needed an unordered hash map where the keys were all pointers. For example,

std::unordered_map<char*,bool> foo;
I wanted the keys to be hashed based upon the memory addresses of the character pointers, not the actual strings. This is similar to Java’s concept of an IdentityHashMap. Unfortunately, the STL does not have a built-in hash function for pointers. So I created one thusly:
/* for SIZE_MAX and UINTPTR_MAX: */
#include <cstdint>
namespace hashutils {
/* hash any pointer */
template<typename T>
struct PointerHash {
inline size_t operator()(const T* pointer) const {
#if SIZE_MAX < UINTPTR_MAX
/* size_t is not large enough to hold the pointer’s memory address */
addr %= SIZE_MAX; /* truncate the address so it is small enough to fit in a size_t */
#endif
}
};
}

Note that I am using auto here to reduce verbosity, since it is evident that addr is a uintptr_t from the righthand side of the assignment. The hashutils::PointerHash object allows me to do this:
std::unordered_map<char*,bool,hashutils::PointerHash<char>> foo;

The neat part is that C++11 has a new using keyword that essentially lets me generically alias that definition:
template<typename K,typename V>
using unordered_pointer_map = std::unordered_map<K,V,hashutils::PointerHash<typename std::remove_pointer<K>::type>>;

unordered_pointer_map<char*,bool> foo;

Note the use of std::remove_pointer<_>, a great new STL template that gets the base type of a pointer.

In another instance, I wanted to have a hash map where the keys were pointers, but the hash was based off of the dereferenced version of the keys. This can be useful, e.g., if you need to hash a bunch of objects that are stored on the heap, or whose memory is managed outside of the current scope. This, too, was easy to implement:

namespace hashutils {
template<typename T>
inline size_t hash(const T& v) {
return std::hash<T>()(v);
}

/* hash based off of a pointer dereference */
template<typename T>
struct PointerDereferenceHash {
inline size_t operator()(const T& pointer) const {
return hash(*pointer);
}
};

/* equality based off of pointer dereference */
template<typename T>
struct PointerDereferenceEqualTo {
inline bool operator()(const T& lhs, const T& rhs) const {
return *lhs == *rhs;
}
};

template<typename K,typename V>
using unordered_pointer_dereference_map = std::unordered_map<K,V,PointerDereferenceHash<K>,PointerDereferenceEqualTo<K>>;
}

Note that, through the magic of the C++ template engine, this code supports keys that are pure pointers as well as C++11’s new smart pointers.

As another example of the afforementioned auto keyword and ranged for statements, this is how easy it is in C++11 to hash an entire collection (e.g., a std::vector or std::set):

namespace hashutils {
class HashCombiner {
private:
size_t h;
public:
HashCombiner() : h(0) {}
template <class T>
inline HashCombiner& operator<<(const T& obj) {
h ^= hash(obj) + 0x9e3779b9 + (h << 6) + (h >> 2);
return *this;
}
operator size_t() const { return h; }
};

/* hash any container */
template<typename T>
struct ContainerHash {
size_t operator()(const T& v) const {
HashCombiner h;
for(const auto& e : v) {
h << e;
}
return h;
}
};
}

Then, to make all sets hashable (and thereby valid to be used as keys in a map), simply add this:
namespace std {
template<typename... T>
struct hash<set<T...>> : hashutils::ContainerHash<set<T...>> {};
}


I realize that this post is a collection of rather mundane code snippets that are nowhere near a comprehensive representation of the new language features. Nevertheless, I hope that they will give you as much hope and excitement as they have given me, and perhaps inspire you to (re)visit this “new” language called C++11.

## Killing Programs Softly

### A quick script to gently kill intermittently unresponsive programs on OS X.

Seven months ago I asked the following question on StackExchange:

Sometimes, when I have many applications open doing many memory and IO-intensive things, my computer inevitably starts to thrash a bit. While waiting for things to settle down, I often decide to close some applications that don’t really need to be open. The trouble is that many of the applications (especially ones that have background/idle processes) tend to be intermittently unresponsive until the thrashing subsides, so it either takes a very long time for them to get focus to send +q, or when I go to close them by clicking on their icon in the dock I am only presented with the option to force quit. I know that those applications aren’t permanently unresponsive, so I’d prefer to send them a gentle TERM signal and have them quit gracefully when they are able. I usually end up killing them by using pkill from the terminal, however, that’s not always feasible, especially if the terminal is also hosed.

What is the easiest way to gently send the signal to kill a process if/when that process is intermittently unresponsive? (In a situation in which access to the terminal and/or starting a new application is not convenient.)

The question didn’t get much fanfare on StackExchange, and the only answer so far has been to use AppleScript to essentially programmatically send the +q signal to the application. I’ll bet it’s basically equivalent to using pkill from the terminal to send a SIGTERM to the process, but it might work in the event that my terminal emulator app is also unresponsive. Anyhow, it was the best solution I had, so I matured the idea by making a friendly standalone AppleScript that enumerates all of the currently running processes and prompts the users for which to gently kill. Here it is:

local procnames
tell application “System Events”
set procnames to (get the name of every process whose background only is false and name is not “GentleKill”)
end tell
(choose from list procnames with prompt “Which applications would you like to gently kill?” with multiple selections allowed)
if result is not false then
set tokill to result
set text item delimiters to {”, “}
display dialog “Are you sure you want to gently kill “ & tokill & “?”
repeat with prog in tokill
tell application prog to quit
end repeat
end if