Saturday, February 7, 2015

I Redesigned Our Hiring Process

Why?

I've always disliked the software hiring process, from both sides of the equation. I find that the things which tend to be asked - the Microsoft-style brain teasers, the algorithmic brain picking, the whiteboard coding and all the rest - are optimized to select people who are good at those exercises, but for the large majority of software jobs those aren't the right things to select for. In the interest of full disclosure I'll admit to some bias: for a variety of reasons the typical tech interview is not something at which I excel, yet I've managed to do pretty well for myself. With an n of 1, that at least suggests something isn't right here.

The software interviews at my company are standard fare: a candidate will talk to 8-12 people spread over 4-6 one-hour sessions. Most of the panels go through the usual progression - the candidate describes their current work, answers some questions meant to assess their technical knowledge and does some sort of whiteboard coding. The whiteboard questions are fairly typical; chances are you've seen at least one of them before. Candidates will also have taken a canned test using Codility before they're brought in, solving a handful of small problems which are scored automatically. When a candidate completes the process I don't see how anyone could really have a great feel for them - there's simply not enough time to cover all of the bases... will they fit in? Are they smart? Can they get things done?

Another problem is that I know via second- and third-hand tales that our process has turned people off. The key point is that people talk, and in our case they're saying negative things - all of the tales I heard only crossed my ears by coincidence, so there's clearly a bad image out there.

A few months ago my company formed a new department, merging together a few similar groups of software engineers and data scientists. We've been given a lot of leeway to run things how we'd like, and apparently people have noticed that I constantly gripe about our hiring process, because I was asked to come up with a new scheme. My goals were:

  • Determine if they're smart and if they can get things done. Really the only two things which matter in the long run.
  • Provide situations more similar to our actual work life instead of artificial constructs
  • The household names can optimize for near zero false positives. They get more resumes per day than we get in a year (I made that up, but probably close). We can't do that. A false positive sucks, but going understaffed for a year because we passed on 10 false negatives sucks more.
  • Understand that accomplished devs often have a prior body of work - let them show it to us instead of reducing it to resume bullet points
  • Spread the screening duties around, both to keep people sane and to serve as professional development for our own engineers
  • Remember that candidates are not supplicants; they are at least as important as (if not more so than!) we are. We need to be selling ourselves as much as they need to sell themselves.
  • Foster an environment that helps present our company as a place that software developers would love coming to
The latter two are more abstract and were just points of emphasis for our group. Specifically, there were two issues I wanted to address. The first was that I felt we (and almost every company I've ever interviewed with or for) put up a wall which said "Tell us why we should deign to hire you". It's no secret that there are many bad stereotypes of tech interviewers - e.g. the guy who asks the same stupid "gotcha" question but doesn't realize there's more than one correct answer, or the alpha geek who asks some esoterica in an attempt to crush your soul if you don't know the answer. Instead, I wanted our approach to be warm and welcoming - being thorough and being a dick don't have to go together. Candidates should feel that they're getting an opportunity to demonstrate their talents in a fair environment instead of being put under some crucible of artificial pressure.

The other is that the company I work for is extremely prestigious in its field, but that field is not software engineering. Worse, we're a non-profit (read: low salaries) and household names of the software industry are literally across the street. It's a tough hill to climb when you're neither a sought-after destination nor in the ballpark of the salaries of companies within 100 feet of you. Upper management asks how we can become notable for software prowess; my stance is that this can only happen organically over time, and one path is making sure that when candidates come in we seem like a place where a software developer would want to be. It's hard to say exactly how one could do this, but we can start by not going with the same trite hiring process every other mediocre software shop is using. Present a positive atmosphere, set ourselves apart from the pack a bit, and hopefully even if things don't work out with a candidate they'll have positive things to say about us.

Ok, great. So what are we doing now? 

What are we doing that's so different? I'll admit that nothing here is novel. Only one small piece of this was something I came up with on my own. Instead I spent a lot of time reading about how other companies do things and what folks felt about those ideas, and then strung together a path that I felt would work best. The reality is that nothing is perfect - if there were a perfect process, we'd all be using it and there'd be no false positives or negatives. There's also no solution that won't piss someone off, but we can try to minimize that number.

We start with a pretty typical screening process. We asked HR not to filter incoming resumes (let us handle the buzzword bingo please!) and they are screened by two people, with the bar being "could this person possibly make it through the process?". This is followed by a phone screen which lasts about half an hour and is roughly a 50/50 sales job on both sides, with the screener trying to glean enough technical info to answer that same question. In our previous process the candidate would now be handed an online Codility test, and here is where we start to deviate. We ask the candidate to provide coding samples: an online portfolio, a Github account, or anything else they might have. We recognize that this isn't always possible (and skews towards younger folks!) but when it is we feel it's superior to Codility. First, senior developers are often put off by such tests when they have a body of work they can demonstrate. Second, real code is always more informative than an artificial test. If they don't have samples, we fall back to Codility, where we've chosen the most program-y of the questions and make it clear that this is simply a convenient way to generate material, not a graded test. The sample is then reviewed by two people, with the bar being "would it scare me if this person ever touched my code repository?". All of the screening roles are rotated among our engineers - both to balance the load and as professional development.

If all goes well, the candidate is brought on site where we have four sessions (plus one with an HR rep):

1) A technical discussion with a group of 4-6 team members. 

This is your standard interview session except that brain teasers, whiteboard coding and similar things are verboten. Interviewers are instructed to guide an organic discussion about technology to assess both breadth and depth of knowledge, using something from the candidate's recent work history as a seed. Instead of asking the 90th person in a row what a pure virtual function in C++ is, perhaps you can discern that they know their stuff by simply talking to them. I contend that when this works you glean just as much (if not more) information and manage not to have the candidate sweating bullets and worrying about impressing you. Also, because it's an actual conversation it's easier to get a grasp on the candidate's communication skills - they're not doing mental gymnastics trying to figure out what trick you're currently playing on them.

Why 4-6 team members? Partly to help keep a lively discussion going - more people means more opportunity to participate. Another positive is that everyone sees the same thing, so it's easier to compare notes afterwards. If an interviewer is having a bad day, the rest will know they were being harsh rather than having to take their word that the candidate was awful. Lastly, we have a social environment and people need to operate while mixed in with several people at once - this is a more realistic situation than being locked in a room with different sets of two people.

2) A code review

Remember those code samples they sent us? You might ask, "How do you know they actually wrote it?" Well, we don't. However, after this session we can be sure that either they did or they understand it well enough that they could have.

Three engineers will sit with the candidate and talk through the code samples the candidate supplied during screening. This isn't an opportunity to pick on their choice of where to put their curly braces, but rather to discuss why they made the design choices they did, why they opted to do X instead of Y, how they would improve what they've done given the opportunity, what trade-offs they made, etc.

Why this instead of whiteboarding a cycle detection algorithm for linked lists? The review allows us to probe their technical knowledge, assess their self-awareness (do they understand what they've done well and poorly?), test their communication skills (can they explain what they've done to a fresh audience?) and suss out whether they actually authored the code in the first place.

3) A friendly lunch

The candidate will have lunch with 1-3 people, largely non-software folks from our department. This isn't graded (except in extreme cases) and is intended to give the candidate a chance to recharge while getting to know more of the people they'd be interacting with regularly.

4) A coding session

The candidate can either bring in a laptop or use one we keep for this purpose, loaded with standard development tools. They'll be told that they have two hours to complete an assignment which should be treated as if they'd been handed the task in an actual work environment. It's up to them what that means: unit tests, documentation, actually works, whatever. Two team members will be in the room with them the whole time and they're told to treat them as they would coworkers. If the candidate wants to sit in silence they can do so, but they can also use the folks in the room to bounce ideas off of, ask questions, or discuss the most recent episode of Game of Thrones if they choose - it's all up to them. When they've finished or the two hours are up, two more team members will be brought in and the four engineers will proceed with a code review in the same format as the earlier session.

Our hope was to find a completely real-world problem for candidates to solve, although finding a good one which could be solved in two hours proved elusive. Sadly the one we're going with for now, while meatier than your typical whiteboard problem, still ends up being fairly artificial. If we required an exact tech stack we could do more, but since we're fairly language agnostic - not to mention agnostic about frameworks, ORMs, etc. - there are too many variables to allow a Real Application. Hopefully we can improve this over time.

Ok, you've blabbed for a long time now. How is it working out?

Uh, I'll admit that I can't say for sure. We've yet to have a candidate come in for a live interview, although I've already found a lot more value in the coding samples than in the Codility tests we used to get. So far the screening has been going well; we'll get a better picture in the coming months as people come in and things work or don't.

To make matters worse, the first person scheduled for a live interview backed out when he heard about the coding session. I'm OK with this: we make it clear that we're trying to simulate the job they're applying for, and it doesn't seem unreasonable to ask them to actually do that for us. We'll never know for sure, but this seems like a situation where things worked out for the best. When I was researching the process I found overwhelming support for this type of coding session over whiteboarding, so I'm hoping it's just bad luck that our first candidate found it off-putting.

Perhaps this will prove to have been a giant waste of time but I have faith that it'll go a long way to meeting the goals I stated above, time will tell.

Tuesday, December 30, 2014

Mei Mei Street Kitchen's amazing kung pao chicken dip

A few years ago at a popup dinner a few of us had Mei Mei's amazing kung pao chicken dip. It managed to capture the appropriate flavors while hardening your arteries on contact. After some harassment (er, kind requests) they posted it up on their blog, but that didn't last through a website redesign. A lovely friend of mine managed to save it, and voila, here it is.

I think there's some sort of secret sauce (perhaps simply their amazing personalities!) missing here, as mine never comes out quite as good. Granted, I only pestered them for it because they said great cooks would never be afraid to give out recipes, since it's the cook and not the recipe - and I'll be the first to admit that I'm a fraction of the cook those guys are! Either way, there's a lot of room for personal exploration here; I never make it quite the same way twice.

Personally a few modifications I tend to make:

  • I use 16oz of cream cheese and often 16oz of cheese
  • I trim the large fat chunks off of the thighs (which is what I use) and render that down as the base fat
Without further ado, here it is:

Writing Down Recipes Only Works If You Don't Throw Them Out: An approximation of our kung pao chicken dip recipe

November 27, 2012 by Irene
Someone asked if we would share how we made the kung pao chicken dip, a spicy, cheesy Mei Mei creation that debuted at our Staff-Meal-Gallows-pop-up. We love sharing recipes. We said we'd try. So here we are! The short answer is that you make kung pao chicken with the freshest and most delightful ingredients you can find, then mix it up with some really delicious cheese, and bake it in the oven. Pretty ridiculously simple. That's the formula we follow for a lot of our food, actually. The long answer is as follows:

*Thanks to Nicole for writing a killer blog post and taking some gorgeous photos, including that one.*
*Kung Pao Chicken Dip makes enough for four people as a snack/starter, or for Irene for lunch.*

Ingredients:
  • 1 pound of chicken (preferably dark meat), diced, marinating for maybe 30 minutes in a glug of soy sauce, two or so glugs of Chinese cooking wine, and 2 teaspoons of corn starch
  • Neutral cooking oil (peanut, canola, grape seed)
  • 10 dried red Chinese chilies OR 2-3 or more fresh jalapenos, minced
  • 3 stalks of scallions, sliced
  • 4 cloves garlic, minced
  • 1 TBS or so grated ginger
  • Kung pao sauce, comprising all this stuff whisked together:
    • 1 tablespoon Chinese black vinegar, or substitute good-quality balsamic vinegar
    • 1 teaspoon soy sauce
    • 1 teaspoon hoisin sauce
    • 1 teaspoon sesame oil
    • 2 teaspoons sugar
    • 1 teaspoon cornstarch
    • 1 teaspoon ground Sichuan peppercorns
  • 8oz cream cheese (I think we used 16oz actually and I'm not sorry)
  • 8 oz cheddar cheese, grated
  • Peanuts and whatever other garnishy stuff you're into.

Directions
  1. Heat a wok or cast iron skillet on high until it seems really ripping hot. Add a few tablespoons of your cooking oil of choice, and then the chilies. Stir-fry for about a minute, until your kitchen smells really good and you're maybe crying a little bit. A little smoke is good.
  2. Add the ginger, garlic, and scallions. Stir-fry some more.
  3. Add the chicken and fry for another two minutes, or until the chicken is cooked.
  4. Pour in the sauce. Kill the heat. Taste it. Needs more salt? More heat? More sugar? Add it now.
  5. Fold in the cream cheese and cheddar. Feel your inner Paula Deen grow strong.
  6. If you haven't already started eating it with a spoon, you can transfer it to a ramekin or baking dish and bake for 20 minutes at 375F.
  7. Top it with more cheese and garnish with candied nuts, more scallions, or more fresh chilies.
  8. Serve with crusty bread, chicken skin chips, or whatever else seems delicious.

You may already know that we're not super focused on measurements, or using exactly the ingredients called for, so have some fun and let us know how everything turns out!

Sunday, February 23, 2014

twitteR now supports database persistence

For a long time now I've wanted to add the ability to store data from twitteR into an RDBMS. In the past I've done things by concatenating new results onto old results, which simply becomes unwieldy. I know that many people have doctored up their own solutions for this, but it seemed useful to have it baked in. Unfortunately I never had the time or energy to do this so the idea languished. But then dplyr happened - it provides some revolutionary tools for interacting with data stored in a database backend. I figured I'd kill two birds with one stone by finally implementing this project, which in turn would give me a lot of data to play with. This is all checked in to master on github.

This is still a work in progress, so please let me know if you have any comments, particularly as regards making it more seamless to use.

 First, some basics:

  • While theoretically any DBI based backend will work, currently only RMySQL and RSQLite are supported.
  • The only types of data able to be persisted are tweet (status) objects and user objects. Granted, this likely covers 95%+ of use cases.
  • Data can be retrieved as either a list of the appropriate object or as a data.frame representing the table. Only the entire table will be retrieved - my expectation is that it will be simpler for users to interact with data via things like dplyr.
To get started, you must register your database backend. You can either create a DBI connection from one of the supported packages or call one of the available convenience methods (which will return the connection as well as register it with twitteR).
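
A minimal sketch of both approaches, using SQLite (the convenience helper names like register_sqlite_backend() are assumptions - check the package for the exact spelling):

```r
library(twitteR)
library(RSQLite)

# Option 1: create a DBI connection yourself and hand it to twitteR
conn <- dbConnect(SQLite(), dbname = "code2013.db")
register_db_backend(conn)

# Option 2: let a convenience helper create and register the connection for you
conn <- register_sqlite_backend("code2013.db")
```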


To continue, suppose we have a list of tweets we want to persist. Simply call store_tweets_db() with your list and they'll be persisted into your database. By default they will be persisted to the table tweets but you can change this with the table_name argument.
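
A quick sketch of what that looks like (the search term and custom table name here are just examples):

```r
# a list of status objects, e.g. from searchTwitter()
tweets <- searchTwitter("#rstats", n = 100)

store_tweets_db(tweets)                                 # goes to the default "tweets" table
store_tweets_db(tweets, table_name = "rstats_tweets")   # or to a custom table
```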


Finally, to retrieve your tweets from the database the function is load_tweets_db(). By default this will return a list of the appropriate object, although by specifying as.data.frame=TRUE the result will be a data.frame mirroring the actual table. Much like store_tweets_db() there is a table_name argument.
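
And the corresponding retrieval, sketched out:

```r
# back as a list of status objects
tweet_list <- load_tweets_db()

# or as a data.frame mirroring the table, which plays nicely with dplyr
tweet_df <- load_tweets_db(as.data.frame = TRUE)

# reading from a non-default table
other <- load_tweets_db(table_name = "rstats_tweets")
```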


Note that for user data there is a mirror set of functions, store_users_db() and load_users_db(), and the default table name is users.

Saturday, January 25, 2014

An updated look at the #code2013 language rankings

A few weeks ago I compared the #code2013 rankings from Twitter to TIOBE's rankings, although when I collected the #code2013 data people were still chiming in, albeit at a slowing pace. As I visually scanned the new tweets it seemed like there was a huge increase in Delphi & Object Pascal compared to the data I had collected previously, and it made me curious whether this was a real effect or just coincidence. Luckily I had continued to collect the #code2013 data after I made that post, so I had an opportunity to find out: I now had 6028 tweets, 1404 more than last time.

At the same time, I commented in my original post that I was unhappy with the mechanism I used to strip manual retweets (i.e. manually adding RT instead of using a built-in retweet), as I had removed any tweet from the data which contained an RT. Because people often add commentary to the left of the RT, I created a new function which keeps anything to the left of the RT (as well as MT), which should leave more usable data. This code now appears in the github version of twitteR as the function strip_retweets(). Unfortunately, this didn't make much of a difference - applying the new function to the original data set only gave me 23 more tweets worth of data, oh well. It was the thought that counted.
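
Usage is a one-liner; a sketch, where the arguments for toggling the manual-RT and MT handling are an assumption about the interface:

```r
# `tweets` is the list of status objects returned by searchTwitter();
# anything left of a manual RT/MT is kept, the rest is stripped
cleaned <- strip_retweets(tweets, strip_manual = TRUE, strip_mt = TRUE)
```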

I processed the new dataset the same way as the previous batch (all code included as a single gist below), and sure enough there was a large skew toward Delphi & Pascal in this batch. Note that I tried to morph any usage of "object pascal" into a single "delphi/object pascal" entry, but presumably most people mentioning "pascal" mean Delphi:


Despite the inclusion of about 30% more data, the results are very similar. So what happens if we look at the updated data against the TIOBE data as I did the first time?

Sure enough - when visually compared to the original, the Pascal entries gained quite a lot (bouncing one of my favorites, Scala, down a tier). There were some other changes - most notably ABAP & C# gained while Fortran lost - but only ABAP had a very noticeable gain.

What happens if we only look at the new tweets against the TIOBE rankings? How much of a skew does Delphi show now?


As expected, Delphi took a huge leap forward. Also as expected, some of the fringe languages fell off of this plot - which makes sense, as we have about a third of the data and therefore fewer opportunities to make the grade. You can also see some languages like R (another favorite) and ObjC dropping while others like Haskell and Matlab gain.

So what happened? It seems reasonable to me to expect a fairly steady distribution over time, but clearly the social aspect of Twitter is affecting things, causing viral gains and losses over time.


Thursday, January 2, 2014

Comparing the #code2013 results with the current TIOBE rankings

The TIOBE language rankings have always been controversial, but in the absence of more meaningful metrics they tend to be viewed as holy writ. Over the last few days of 2013 a hashtag called #code2013 was started by Twitter user @deadprogram. The idea of the hashtag was that users would tweet which languages they had used over the last year. I felt this would be an interesting comparison to the TIOBE rankings - the latter is based on search engine popularity, but the #code2013 rankings would be based on what people actually report using.

To do this I used my R library twitteR to pull 4624 tweets with this hashtag and then started pulling them apart to see what I could see. I had previously pulled the tweets using the searchTwitter() function and loaded them into my R session. From there my first step was to try to remove retweets. Removing the new style Twitter retweets is simple, and after that I removed anything with RT in the text. The latter isn't perfect and is likely to throw out good data (e.g. "lang1 lang2 lang3 RT @deadprogram: What programming languages have you used this year? Tweet using #code2013 Please do it and also RT!") but it seemed unlikely to radically skew the results. The R code I used to do this was:
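
Roughly, assuming `tweets` holds the list of status objects returned by searchTwitter() (the isRetweet field name and variable names here are assumptions, not the exact original code):

```r
library(twitteR)

# tweets <- searchTwitter("#code2013", n = 5000)   # roughly how the data was pulled

# drop the built-in ("new style") retweets
no_native_rt <- Filter(function(s) !s$isRetweet, tweets)

# then the blanket removal of anything with "RT" in the text
no_manual_rt <- Filter(function(s) !grepl("RT", s$text, fixed = TRUE), no_native_rt)
length(no_manual_rt)  # ended up at 3745 tweets
```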

This left 3745 tweets, so we lost about 1000 to retweets. Considering the number of RTs thrown out here, one thought might be to redo this by removing everything to the right of the RT instead of doing a blanket removal of anything with RT in the text.

The next step was to read in the TIOBE rankings (well, the top 50). Visually inspecting a sampling of the #code2013 tweets and looking at the TIOBE data made it clear that I would have to massage the language names a bit, as there were a few problems. The most notable issues were things like "Objective C" or "emacs lisp", since I was planning on tokenizing languages by whitespace. Similarly, TIOBE defines "delphi/object pascal" but people in #code2013 tended to say either "Delphi" or "object pascal". It would be an impossible task to perfectly clean up the #code2013 data, but I made a few adjustments to help things along:
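
A sketch of the kind of cleanup involved (the exact substitutions here are illustrative, not the original list):

```r
# collapse multi-word language names into single whitespace-free tokens so
# splitting on whitespace doesn't break them apart; applied to the tweet text
# (and to the TIOBE names) further down
clean_lang_text <- function(text) {
  text <- gsub("objective[ -]?c", "objective-c", text, ignore.case = TRUE)
  text <- gsub("emacs lisp", "emacs-lisp", text, ignore.case = TRUE)
  # fold "object pascal" into the same token as "delphi"; the TIOBE
  # "delphi/object pascal" entry gets the same treatment on its side
  text <- gsub("object pascal", "delphi", text, ignore.case = TRUE)
  text
}
```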

I wanted to normalize all of the text to be lowercase, but this presented an issue. A relatively small number of tweets (67, to be exact) were in a language encoding that tolower() wasn't fond of. Instead of fighting encoding issues I chose to throw these out as well. I looped through all of the statuses and if I was able to convert one to lowercase I kept it; otherwise I threw it out:
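
Roughly (variable names are mine, carrying on from the filtered list above):

```r
# pull the text out of each remaining status and lowercase it, dropping the
# handful whose encoding makes tolower() error out
texts <- sapply(no_manual_rt, function(s) s$text)
lowered <- lapply(texts, function(txt) {
  tryCatch(tolower(txt), error = function(e) NULL)
})
texts <- unlist(Filter(Negate(is.null), lowered))
```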

Finally we're getting somewhere. I tokenized each status on any whitespace as well as . or , characters. From there I filtered each status to only contain words which exist in the TIOBE language list. The potential downside here is that #code2013 could contain languages that don't exist in the top 50 TIOBE languages and/or alternate spellings, but this seemed unlikely to affect the outcome of this exercise in a meaningful way, so it was a convenient way to normalize things. This resulted in 40 languages from #code2013 being considered. Once that was done I created a data.frame with columns for the language name, the frequency count and a tier code. The tier code will be used to color the final plot and covers the ranges 1-5, 6-10, 11-15, 16-25 and 26-40.
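
A sketch of that step, assuming a `tiobe` data.frame with lowercased `language` names and a `tiobe_rank` column read in from the top 50:

```r
texts <- clean_lang_text(texts)

# tokenize on whitespace plus . and , then keep only TIOBE top-50 names
tokens <- unlist(strsplit(texts, "[[:space:].,]+"))
tokens <- tokens[tokens %in% tiobe$language]

counts <- as.data.frame(table(tokens), stringsAsFactors = FALSE)
names(counts) <- c("language", "count")

# tier code based on each language's position within the #code2013 counts
counts$code2013_rank <- rank(-counts$count, ties.method = "first")
counts$tier <- cut(counts$code2013_rank,
                   breaks = c(0, 5, 10, 15, 25, 40),
                   labels = c("1-5", "6-10", "11-15", "16-25", "26-40"))
```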


Ok. Now we're cooking. What I wanted to see here was how the rankings differed, so I created a bar plot showing the frequency counts of the #code2013 hits, with the Y axis being the languages and the X axis being the counts. The languages were ordered by their position in the TIOBE rankings, and the bars were colored by the #code2013 tier I mentioned previously. This is what the results looked like:
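
A rough sketch of the plotting code (the merge with the TIOBE ranks and the column names are assumptions carried over from the previous step):

```r
library(ggplot2)

# join the #code2013 counts onto the TIOBE ranks and order the bars by TIOBE rank
plot_data <- merge(counts, tiobe, by = "language")
plot_data$language <- reorder(factor(plot_data$language), -plot_data$tiobe_rank)

ggplot(plot_data, aes(x = language, y = count, fill = tier)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(x = NULL, y = "#code2013 mentions", fill = "#code2013 tier")
```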



In general the top 10-ish are roughly the same, although the starkest trend is that the top 5 and the next chunk are largely reversed. The top #code2013 languages were javascript, ruby, python, java & php, while those sit at numbers 9, 11, 8, 2 & 6 in TIOBE respectively. Similarly, 4 of the top 5 TIOBE languages are in the 6-10 tier, with the #10 #code2013 language (scala) being all the way down at TIOBE #33.

I might be way off base here, but looking at the rankings of the #code2013 languages tells me a couple of things. One is that, unsurprisingly, web development still rules the roost: javascript, ruby, python, java, php. The other is that data analysis & big data (I loathe the term, but c'est la vie) is coming on stronger than TIOBE recognizes, considering some of the darlings of that world - notably Python, Scala, Haskell & R - are doing better in #code2013 than in TIOBE.

For the record, my tweet in this hashtag was: "Scala, java, R, python, matlab, C++ #code2013" so I have to say I'm pleasantly surprised to see some of my favorite languages (which would be the first four I mentioned, although not in that order) looking like a better combination than TIOBE would suggest.

Edit #1: Hadley Wickham suggested that I include a scatterplot of the data. Considering that one of the main motivations for this exercise was to force myself to figure out how his ggplot2 library worked I figured I'd oblige:
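
Something along these lines, reusing the merged data from the bar plot (exactly which columns map to which aesthetics is my guess):

```r
# #code2013 mention counts against TIOBE rank, labelled by language
ggplot(plot_data, aes(x = tiobe_rank, y = count)) +
  geom_point() +
  geom_text(aes(label = language), vjust = -0.7, size = 3) +
  labs(x = "TIOBE rank", y = "#code2013 mentions")
```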



Friday, December 28, 2012

Seven Languages Week 1 - Ruby

I just completed the first chapter of Seven Languages in Seven Weeks, which focuses on Ruby. On the assumption that the other six languages will follow the same pattern, the author divvied each language up into three separate "days" (which was welcome, as I'd been expecting 7 lessons, which would have dramatically lessened the chances of my completing any of these in a single week). The first lesson teaches the basic syntax of the language, the second lesson ties things together with more complicated structures, and at least in the Ruby chapter the third lesson was designed to show off "something cool" about the language - in this case, the metaprogramming facilities of Ruby.

I've had some minor experience with Ruby before, about the same extent that this book provided - the sort of introduction one gets over a couple of lectures or a 30ish page book chapter. As such it wasn't too confusing but we were definitely in the shallow end of the Ruby pool.

My first (err, second) impression of Ruby is that, if one ignores Rails, there's not really any reason to prefer it over one's favorite scripting language, e.g. Python, Perl, etc. I'm sure if a Ruby zealot were reading this they'd come up with a thousand reasons why I'm wrong, but I simply can't imagine enough power existing in Ruby to warrant the sort of time investment it'd require for me to reach for Ruby first as opposed to something like Python (which I'm much more comfortable with) for scripting tasks, glue code, small apps, etc. Both are dynamically typed, object oriented languages featuring duck typing, metaprogramming facilities, functional programming trappings, etc. I'm sure one could spend all day describing how one tidbit is better in their favorite language in this fight, but if I had to spend hundreds of hours achieving that level of fluency in the 'other' language it ain't worth it. And the last word on the silly Ruby vs Python debate that I just created - the Rubyists seem to love jumping up and down being sooo proud of "everything is an object!", which is something that always turns me off from a language (the zealots always spouting it, not the statement itself being true).

All that said, what little I've seen of Rails was promising but this book didn't cover Rails at all. I have a fair amount of experience with Django (a popular web framework in Python) which was easy to use but Rails seems particularly easy to use and quick to develop on.

As promised, my solutions to the exercises (which could probably be gleaned from a google search) are available on my github account. I ran into a few minor road blocks - on Day 2 I spent a lot of time figuring that there had to be a much better solution, and on Day 3 I had misread the problem. Neither are really worth talking about here - after looking around on the web it seems that most people arrived at similar solutions (with varying degrees of proper Ruby idiom).

Tuesday, December 25, 2012

Seven Languages in Seven Weeks

About a year ago I bought Seven Languages in Seven Weeks with an eye towards broadening my horizons in regards to programming languages. The idea of the book is stated by the title - the author gives you a brief introduction to seven different languages spanning multiple programming paradigms, with the idea being that one spends a week on each. The languages were mainly selected on the basis of being less common (e.g. Python and Javascript were passed over for Prolog and Io respectively) yet still somewhat well known/useful. Each language is separated into three lessons which come with a bit of homework - both reading (e.g. language API) and programming assignments.

I finally managed to find some time to sit down with the book and have worked through the first two days of the first language - Ruby. I have a small amount of prior experience with Ruby, having previously started the Coursera SaaS class (granted, 'started' means only the first week or two). I'll make a Ruby-specific post when I'm finished.

I'm also planning on putting my solutions to the coding questions on my Github account - I'm sure they'll make someone with real experience in the language cringe, so if anyone happens to come across this and wants to give me some constructive criticism, feel free.