Recent Content:
Note: My advice is specifically related to the higher education system in the US; other countries have very different systems. My advice is also solely based on experience in the field of Computer Science; this may translate to other STEM fields, but certainly not all academic fields.
Do you really want a Ph.D.?
Make sure you are doing this for the right reasons!
Reasons to get a Ph.D.
- You want to work in academia. If you want to get a tenure track research position at a university, you need a Ph.D.
- You need it to get a promotion at work. This is typically only true at larger corporations, where having certain credentials is necessary to progress through the ranks.
- You enjoy being in an academic environment and solely want the experience. This is perfectly valid. Being in grad school can (but isn’t guaranteed to) be incredibly invigorating.
Reasons not to get a Ph.D.
Dat Phizzle Dizzle Doh
The novelty of putting those letters after your name and/or being called “Doctor” wears off real quick.You just want to teach at a university
Universities are really hungry for adjunct professors these days. You usually don’t need a Ph.D. to teach. Try teaching a class or two before you commit to getting a Ph.D. and making that your career. I’ve taught a bunch of classes, both at the undergraduate and graduate levels, including developing the entire syllabus, assignments, and slides all from scratch. It is a lot of work, and usually not worth what they pay you.You want to work on your dissertation topic for the rest of your career
I know dozens, maybe hundreds of Ph.D.s, and I can count on one hand the number of people who continued research related to their dissertation topic after graduation.Imposter Syndrome
I’ve worked with plenty of people with no degree who are far more capable than people with multiple grad degrees.You’re probably not going to get a job in academia
I know several people who got Ph.D.s from top tier schools like CMU and MIT and all of them either ended up in industry or teaching at small liberal arts schools with no graduate program (because they were dead set on ascending the ivory tower). The academic market is super tough, and will only get tougher thanks to the inevitable closure of smaller schools and a general preference for adjuncts over tenure-track professors. Ph.D. programs do not limit the number of students they accept based on the anticipated number of professorial job openings. There are many more Ph.D.s than professorships!Even if you do get your dream job in academia, you’ll probably hate it and/or lose it
I don’t think I know of a single professor, tenured or otherwise, who loves their job. They need to take on a huge teaching load thanks to the dearth of adjuncts and/or assume administrative positions that do not incur an increased salary.The financial implications and opportunity cost of leaving your job for three to ten years
Some studies have concluded that getting a Ph.D. will require up to fifty years to outweigh the opportunity cost of staying in the workforce and advancing your career. But in some circumstances it can be as few as five years. This is because there is a lot of variance in the compensation for graduate students. Be prepared for three to ten years of significantly reduced income.The Snake Fight
Beware.Common reasons you will fail
There are myriad reasons why you might fail to complete your Ph.D. that are almost all completely out of your control.Survivor bias is real. Trust me, I’m a survivor.
Only two out of the ~dozen students in my research lab ever completed their degree. Keep that in mind when you take advice from the survivors.Someone “scoops” your research
This is typically more of an issue in the more theoretical specializations, but it does happen. If someone solves your problem before you, you’ve got to start over. I’ve written an essay about this.Your advisor switches schools
Sometimes this can be a good thing; for example, a friend of mine’s advisor transferred from an average school to MIT right before my friend graduated, and he was able to transfer over all of his credits, defend at MIT, and get MIT on his diploma. But I also know a bunch of people who were bitten by this, where the new school would not accept their credits and/or there was no funding at the new school for tuition remission and a stipend, so they basically had to drop out.Your advisor goes on sabbatical
It’s hard to be advised if your advisor is off galavanting for a year or two. Good advisors plan for this, but I know of several cases where students had to drop out because their advisor didn’t plan ahead.You don’t get along with your advisor, or they go AWOL, or their tenure application is denied, (or they die)
Ever look at a 101-level STEM textbook? There are usually at least two authors: The one who knows what they’re talking about, and the one who speaks English as a native language. I had two advisors for basically the same reason. My first advisor had lots of research grants and contracts which afforded me tuition remission and a good stipend. But that work was mostly applied science; nothing was deep enough to turn into a thesis topic. So I took on a co-advisor who had just been hired as a new tenure-track professor and was full of cool ideas. A few months later he unfortunately, suddenly, died. Then I took on a third co-advisor (who was happy to take me on because he didn’t have to worry about paying me from his own grants). This amounted to more work for me, since I had to do the work to “pay the bills” with my original advisor as well as my actual research with my new co-advisor. But it also gave me a bunch of freedom to just work on what I wanted. With that said, I had two labmates who were working under my late co-advisor who were unable to find a place for themselves after his death, and never completed their dissertations. A lot of it is luck. If a professor’s tenure application is denied, they are usually given a year to wrap up their business and leave the school. Most departments have contingency plans to pair orphaned graduate students with new advisors, but this can be a devastating blow to most students, for all intents and purposes similar to their passing away.You don’t have enough time to focus on your research
In my case, I was effectively working a full time job doing applied research for my first advisor in addition to being a “regular” graduate student with my co-advisor. This entailed 60+ hour work weeks for basically my entire Ph.D. It’s super hard to focus on research if you have a full time job; I only know of one or two people who successfully completed a part-time Ph.D. while working a full time job at the same time.You still want to get a Ph.D.?
Things you need to do before applying
Do you already have a master’s degree?
Some schools in the US do not require you to take any classes, while most others require you to effectively earn a Master’s in the process of getting your Ph.D. If you do not have a Master’s yet, try and choose a school that effectively forces you to earn a Master’s in the process of getting a Ph.D., as that will be a nice consolation prize in the event that you do not complete your dissertation. If you do have a Master’s, check on your prospective school’s degree requirements and whether they will accept your existing credits. If a school requires you to take a bunch of classes and you don’t want to earn a second Master’s, make sure they will waive that requirement for you. This affected me when I was deciding on which program to join, since I already had a Master’s and many schools wouldn’t budge on forcing me to re-do all of my Master’s classes.Decide in what you want to specialize
More on this below, but it’s best if you know what you want to research before you apply to a program. You don’t choose a school for a Ph.D. the same way you choose a school for an undergrad or even a Master’s degree. Choose the school based upon how your desired research aligns with what the professors there are doing, not based upon the name or prestige of the school.Read the entire Ph.D. Comics archive
Seriously, it’s really good, and completely accurate.How to get into a good program
Find potential advisors first
Compile a list of everyone who is actively doing research in an area related to your desired specialization. Send them an E-mail stating your desire to pursue a Ph.D. Ask them if they are taking on new students, and, if so, what their time frame is. The easiest way to get accepted to a Ph.D. program is to have a professor advocating on your behalf. If a professor wants to work with you, they can guide and even fast-track you through the admissions process.You should not have to pay your way through a Computer Science Ph.D.
Don’t expect to be rich, but every decent computer science Ph.D. program should offer tuition remission and a slightly-above-the-poverty-line stipend, either by acting as a teaching assistant (TA) or as a research assistant (RA). Do not accept any less. If the school wants you to pay for your Ph.D., something is wrong. Being an RA is preferable to a TA.Having a source of external funding lined up will get you into almost any school
The biggest limiting factor in taking on new students both at the departmental and professorial levels is funding. A professor will not advise a new student unless they have enough funding (either through grants/contracts that fund RA-ship, or through departmental TA-ships). If there are no TA openings and/or the professor doesn’t have enough grant funding to pay for your stipend and tuition remission, then they won’t take you on as a student. In such situations, you can either pay for your own tuition and forego a stipend (never, ever, do this!), or you can come to the table with your own external source of funding! This can be through a grant program (e.g., the NSF GRFP). If a professor doesn’t have to worry about how to “feed” you, they’ll be thrilled to work with you, and it will give you a lot more freedom in directing your own research.How to minimize your chances of failing
Talk to your potential advisor’s current and past students
Do this preferably before even applying. How many of the professor’s students actually graduate? Where do they end up after graduation? How many papers does each student publish per year? How is travel to conferences funded?

Interview your potential advisor in advance
What is a typical week in the life of an advisee like? How many regular meetings are there? How often do students typically meet with their advisor 1-on-1? Will the professor expect you to take any specific classes, regardless of the school’s degree requirements? Professors will often require you to take certain classes they think are really good. Is the professor planning to go on sabbatical any time soon? Do they have tenure? (Typically, professors will earn and usually take a sabbatical shortly after earning tenure. If a professor applies for tenure and is denied, they usually only get a year before they are fired. Some schools give professors two chances at getting tenure.) What is the professor’s teaching load like? Would they expect you to teach and/or TA? Or could you be an RA full time? What is their research grant pipeline like? When do their current grants end?Choose a topic that is unlikely to be “scooped”
I already linked to this above, but please read this essay I wrote. Try and choose a dissertation topic that can’t be duplicated by someone else first. This usually means choosing a topic that—even if it is inherently theoretical—could worst-case be “proven” empirically in the event that you can’t prove it formally.Remember that your dissertation is unlikely to ever be read by anyone other than your committee members
It is, in a sense, a rite of passage. If you don’t intend to go into academia, there is no shame in doing the minimum amount possible that can get you graduated. With that said, my dissertation is super awesome and you should totally read it right now. Plz, become the sixth person to have ever read it.
What to expect after you have a Ph.D.


Breaking into Google Headquarters
In which I tempt the criminal justice system while flaunting the California statute of limitations.

Over the past few years I’ve been a frequent contributor to and editor of the journal PoC or GTFO. Part of my contributions have been to help create several of the recent file polyglots for the electronic releases. I just presented a talk on this topic at BSides Philly, 2017.
A polyglot is a file that can be interpreted as multiple different filetypes depending on how it is parsed. While polyglots serve the noble purpose of being a nifty parlor trick, they also have much more nefarious uses, e.g., hiding malicious printer firmware inside a document that subverts a printer when printed, or a document that displays completely different content depending on which viewer opens it. This talk does a deep dive into the technical details of how to create such special files, using examples from some of the recent issues of the International Journal of PoC||GTFO. Learn how we made a PDF that is also a valid NES ROM that, when emulated, displays the MD5 sum of the PDF. Learn how we created a PDF that is also a valid PostScript document that, when printed to a PostScript printer, produces a completely different document. Oh, and the PostScript also prints your /etc/passwd file, for good measure. Learn how to create a PDF that is also a valid Git repository containing its own LaTeX source code and a copy of itself. And many more!
Here is a recording of the talk,
as well as my slide deck:
Has there been an increase in potential residency in Philadelphia?
Analyzing Zoning Density Changes, 2012–2017
In contrast to my last entry, I figured I should provide a counterexample to Betteridge’s law of headlines.
About a week ago, Jon Geeting posted an inquiry to the Code for Philly #project-ideas
Slack channel
asking,
I had recently been toying with another mapping-related side-project (which is not yet complete and therefore I’ll have to leave you in suspense), so I offered Jon my assistance.
In the remainder of this article, I first describe the regulatory reforms that have occured over the past five years in the Philadelphia zoning code, and why they render before-and-after zoning change analysis challenging. I then describe my technical approach to overcoming the challenges, which will likely only appeal to computer scientists; you can feel free to skip that section unless you plan on reusing, extending, or repurposing my code. Finally, I conclude with the preliminary results of my analysis.
Primer on Philly Zoning Reform
In 2012, Philly’s zoning code underwent an overhaul that basically amounted to a complete rewrite. Prior to that, the zoning code hadn’t been touched for the previous 50 years. The number of zoning district types decreased by over a third, and the rules regulating them were significantly simplified and deobfuscated.

Over the past five years, the borders of many zoning districts were redrawn; some of this may have even occurred instantaneously when transitioning from the old zoning code to the new. For example, the 600 block on the North side of Washington Avenue used to be zoned R-10A, but is now subdivided: The interior parcels have been split into a separate RSA-5 (single family row home) district and those fronting Washington Avenue have been rezoned into CMX-2 (mixed-use). The corner parcel at 7th street is now RM-1 (multi-family residential). Likewise, the 500 block used to almost entirely be zoned R-10, but is now a mixture of RM-1 fronting 6th street.

Furthermore, the GPS coordinates of the zoning districts for the pre-2012 and post-2012 datasets on Open Data Philly don’t always line up. So, even if a zoning district’s borders didn’t change, the GPS coordinates of its new boundary might not be identical to its pre-2012 boundary. Therefore, there is not a straightforward way of mapping old zoning districts to new ones, even using automation.
Even if the zoning boundaries did line up perfectly, there are over thirty thousand zoning districts in Philly: Mapping old districts to new ones using a naïve approach will require making over one billion “Does this old district line up with this other new one?” comparisons. Even if a human could do one comparison per second working eight hours per day, he or she would require close to a hundred years to complete the mapping. By then, there’s a good chance there might be another change to the zoning code 🤣! Even if a computer could do a hundred comparisons per second, it would take about four months non-stop to complete. So, how did I complete the mapping in less than a day? I answer that in the following section. But if you’re impatient and just want to skip to the analysis, it’s safe to skip ahead.
Technical Approach
Here’s a rough outline of the naïve algorithm I mentioned above:
1: | procedure Merge-Zoning($Z$, $Z^\prime$) |
Require: $Z = \{D_1, D_2 \ldots\}$ is a set of the old zoning districts, each being a list $D = \{p_{1}, p_{2}, \ldots\}$ of its boundary points. $Z^\prime$ is the list of new zoning districts. | |
Ensure: $M = \{ I_1, I_2, \ldots \}$ is a set of merged zoning districts, where each $I \in M$ is a tuple $\langle i, \{ j_1, j_2, \ldots \}, \{ D_1, D_2, \ldots \}$ with the first element containing the index $Z^\prime_i$ of the new district, the second element containing the set of old districts $\{Z_{j_1}, Z_{j_2}, \ldots\}$ it subsumes, and the third element containing its new boundary. | |
2: | for all $D_i^\prime \in Z^\prime$ do |
3: | for all $D_j \in Z$ do |
4: | if $D^\prime \cap D \neq \emptyset$ then |
/* The zoning districts intersect */ | |
5: | $J \gets \{j$ for all $j$ in all preexisting districts in $D_j \in M$ that intersect with $D_i^\prime\}$ |
6: | Append $\langle i, J \cup \{ j \}, D_i^\prime \cap D_j \rangle$ to $M$ |
7: | end if |
8: | end for |
9: | end for |
10: | end procedure |
Using a combination of Shapely and pyproj makes things like intersecting polygons (i.e., zoning district boundaries) and calculationg the area inside polygons easy.
I’m not sure which algorithm Shapely uses for polygon intersection, but its computational complexity is likely $\Omega((n\times m)\log(n\times m))$, where $n$ and $m$ are the number of edges in the polygons being intersected. The zoning districts have on average 39 edges, so that means a single intersection is going to take at least 5000 operations. Therefore, it’s not realistic to hit the afforementioned speed of a hundred automated comparisons per second. (In fact, my final implementation ended up averaging about two seconds per comparison.)
The first optimization we can make stems from the observation that zoning districts are non-overlapping. Therefore, any portion of an old zoning district that overlaps with a new zoning district can be deleted once they are merged:
1: | procedure Merge-Zoning($Z$, $Z^\prime$) |
2: | for all $D_i^\prime \in Z^\prime$ do |
3: | for all $D_j \in Z$ do |
4: | if $D^\prime \cap D \neq \emptyset$ then |
/* The zoning districts intersect */ | |
5: | $J \gets \{j$ for all $j$ in all preexisting districts in $D_j \in M$ that intersect with $D_i^\prime\}$ |
6: | Append $\langle i, J \cup \{ j \}, D_i^\prime \cap D_j \rangle$ to $M$ |
7: | $D_j \gets D_j \setminus D^\prime_i$ |
8: | if $D_j = \emptyset$ then |
/* The old district is fully contained within $D^\prime_i$ */ | |
9: | break |
10: | end if |
11: | end if |
12: | end for |
13: | end for |
14: | end procedure |
The final optimization that gets the running time down to tens of
hours instead of days is to get rid of the cumbersome $J$ calculation
using some clever caching. If you’re interested, check out intersect_maps.py
in my code.
I could probably reduce the computation time by another order of magnitude by more intelligently pruning old zoning districts, e.g., by storing them in a datastructure like an R-Tree. I chose not to do that because it would have required more development time, and the existing computation time of “less than a day” was sufficient. I may add that feature in the future.
Preliminary Analysis Results
At this point we have merged the old zoning districts with the new, producing a KML file which can be loaded into Google Earth to interactively explore the zoning changes. You can download it here.
Structural Density
The first thing I’d like to discuss is what I call structural density. (I’m a Computer Scientist, not a City Planner, so please excuse me if I’m not using the correct terminology.) I wanted to know: What is the maximum amount of square feet of building the zoning code would allow to be built on a parcel if whatever’s currently there were replaced? I calculated the structural density metric for both pre-2012 and post-2012 zoning.

The results turned out to be less interesting than I had hoped; you can download the KML file here. The overall square footage capacity of the city has increased with the new zoning code by over 38 million square feet, which might seem like a lot, but that only constitutes about one half of one percent change. Most of the changes seemed to emanate from the reduction in the number of zoning district types, so old zoning districts without a good modern analogue either had to be “rounded down” or “rounded up.”
Residential Density
The next metric I investigated was what I call residential density: an upper bound on the number of residents a parcel can support. This is of course poorly defined, because the zoning code itself doesn’t explicitly limit residency. I describe my methodology and parameterization for estimating it in the following section.

You can download the KML file for this map here. The residential capacity of the city increased by about 386 thousand (~7%) from a maximum of 5,229,657 in 2012 to a maximum of 5,615,158 today. Assuming the population grows at a steady but slow rate of 0.4% per year, it will take at most $$\frac{\log\frac{\mbox{residential capacity}}{\mbox{current population}}}{\log(1 + \mbox{growth rate})} = \frac{\log\frac{5615158}{1567872}}{\log 1.004}\approx 320\ \mbox{years}$$ before the city will reach maximum residency under the current zoning regulations. You heard it here first, folks! Time to get the ball rolling with a new Zoning Code Commission so we’re prepared for 2337!
Seriously, though, there are some other interesting patterns in the map. There are a few clusters of down-zoning and reduced residential capacity around the city, particularly in Strawberry Mansion/Brewerytown, Queen Village/Southwark, deep South Philly, Kensington, Harrowgate, and Frankford. The majority of these seem to be due to old R-10 parcels (which permitted multi-family use) being re-zoned RSA-5, which only permits single-family use. This primarily happened in preexisting single family townhouse communities that aren’t likely to be redeveloped any time soon. But re-zoning from R-10 to RSA-5 will constitute a 60 to 70% reduction in residential capacity of a typical parcel.
With that said, across the board and even within the down-zoned clusters, there are pockets of up-zoning, almost exclusively around large retail corridors like American street. This is thanks to up-zoning to newly created mixed-use districts like CMX and IRMX.
Callowhill, NoLibs, Fishtown, and the river wards were the standout beneficiaries of mixed-use up-zoning, with an excess of what were previously G-2 (medium density industrial) districts that have now become a mixture of CMX, IRMX, and RMX.

Methodology and Caveats
The Internet tells me that the average number of residents per household in Philadelphia is 2.35. (Someone should double-check that with the latest census.) I also assume that the minimum space per resident is 450 sqft.; lowering that requirement will increase the maximum residency estimates for multi-family homes. For districts that only permit single family homes, I say that their maximum residency is 2.35. Likewise, for duplexes: 4.7. For zoning districts that have a maximum height/floors and a minimum percentage of the lot that must be open (e.g., R-10 and RM-1), I calculate the maximum residency as: $$\frac{(\mbox{Lot Area}) \times (1 - \mbox{Minimum Open Area Percentage}) \times (\mbox{Maximum Floors})}{450\ \mbox{sqft.}}.$$ For districts with a Floor Area Ratio (FAR, e.g., R-11 and RM-2), I calculate the maximum residency as: $$\frac{(\mbox{Lot Area}) \times (\mbox{FAR})}{450\ \mbox{sqft.}}.$$ Finally, I round the maximum residency down to the greatest multiple of 2.35.
The old zoning code had a nice series of tables that let me easily determine the parameters for most zoning district classes such that I could estimate maximum residency. However, not all of the zoning classes were covered! I didn’t have time yet to pore over the pages and pages of fifty-year-old legalese to grok the parameters for the missing classes, so instead I used a relative mapping from old zoning classes to new zoning classes that is provided in the new zoning code. Here are the missing zoning classes and their mappings:
C1 | → | CMX-1 | L1 | → | I-1 | |
C2 | → | CMX-2 | L2 | → | I-1 | |
OC | → | CMX-2 | L3 | → | I-1 | |
C3 | → | CMX-3 | L4 | → | ICMX | |
C4 | → | CMX-4 | L5 | → | ICMX | |
C5 | → | CMX-5 | G1 | → | I-2 | |
LR | → | I-3 | G2 | → | I-2 |
For zoning districts with bonuses, I always assume that the maximum bonus is achieved when making residency and square footage estimates.
Be mindful that we have to take some of these changes with a grain of salt, because they affect large preexisting planned communities/neighborhoods that weren’t before and aren’t now likely to be redeveloped. For example, the theoretical maximum residency of the Society Hill Towers dropped by over 50%, but they ain’t going anywhere anytime soon.
If you’d like to help out or just follow our progress, feel free to
join the newly created Code for Philly #zoning
Slack channel.