A Docker image for WebKeepass

WebKeepass is a small Dancer2 web application that exposes a KeePass database with a web frontend. I developed it so that I could securely access my KeePass database from any device.

Although a bit rough around the edges, the project got some users and GitHub watchers (apparently it fills a gap for people other than myself), and I started receiving feature requests and bug reports.

More interestingly, I recently received an email from a Docker contributor who took the time to build a Docker image for WebKeepass.

The corresponding GitHub project is there.

If you’re using it, feel free to drop a comment here. I’d be happy to get in touch with people who installed it from Docker; maybe that could lead to a tutorial.

The art of interviewing programmers, the fizzbuzz test, the white board and other thoughts

The Sphinx

When it comes to interviewing software engineers, it’s quite a challenge to come up with a good interview plan. It’s also very important to test the applicant when you’re hiring programmers: good presentation skills and nice words can deceive a lot. I’ve seen graduates from prestigious schools come in with an impressive CV and a very nice way of describing their past experiences, and still, they weren’t able to write a “for” loop with a simple conditional branching structure.

Coding Horror has a famous entry about that, “Why can’t programmers.. program”, and FizzBuzz is now very popular in many tech interviews, but apparently not everyone has understood its purpose.

I came across this article recently and read so many wrong things in it that I couldn’t resist writing a follow-up.

The article starts by saying we (the readers) are wrong about technical interviews. The assumption is quite big: everyone is wrong except the author.

But what really puzzles me is the ideas rather than the wording.

The Fizz Buzz test

Back to the Fizz Buzz test: the author says it’s just about testing the ability to manipulate the modulo operator. This is utterly wrong.
That test is a very good way to see whether the applicant is able to write a for loop with an if-elsif-else chain. And more importantly, they must be able to do that in less than 5 minutes.

The time constraint is very important. I personally use this test (or variants) as a preliminary step to my “programming” test.
If more than five minutes are needed, or if there are mistakes (it happens a lot, surprisingly, even on a simple problem like “Fizz buzz”), then I know the candidate isn’t able to think easily with basic code structures in mind.

Common mistakes are:

  • the variable used with the modulo operator is the parameter of the function, not the one from the for loop
  • the for loop is simply forgotten (only an if-then-else structure, no iteration at all!)
  • the conditional statements are badly organized and the code does not work
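
For reference, a version that avoids these three mistakes could look like the following Python sketch. It assumes the usual rules of the game (multiples of 3 give "Fizz", multiples of 5 give "Buzz", multiples of both give "FizzBuzz"); the limit parameter and the choice to return a list are made up for this illustration.

```python
def fizzbuzz(limit):
    """Return the FizzBuzz sequence from 1 to limit as a list of strings."""
    results = []
    # The loop is the part that often gets forgotten entirely.
    for i in range(1, limit + 1):
        # Test the loop variable i, not the function parameter.
        # The combined case comes first, otherwise it is never reached.
        if i % 15 == 0:
            results.append("FizzBuzz")
        elif i % 3 == 0:
            results.append("Fizz")
        elif i % 5 == 0:
            results.append("Buzz")
        else:
            results.append(str(i))
    return results


if __name__ == "__main__":
    print("\n".join(fizzbuzz(15)))
```

Nothing in there is clever: one loop and one conditional chain, which is exactly why the time taken to write it says so much.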

But even if the candidate manages to implement a fizzbuzz(x) function without errors, if it took 15 minutes or more, something else is wrong: it took too much effort. The ability to implement that function in a few minutes is a very good measure of the programmer’s skills.

So no, fizz buzz is not about the modulo operator, at all. It’s about the very basic skills one must have to apply for a programmer role.

What about the whiteboard?

In the article, the author advises us not to ask people to write code on the whiteboard, because it’s too far from reality and “you are asking them to write code under time pressure, with somebody watching”. Yes, exactly, that’s the point.

Of course in real life, nobody writes a code sample on a whiteboard, but the ability to do so, under a bit of pressure, is a very good way to see how the person handles a bit of stress, and how natural it is for them to think in code. You can compare that to a musician: playing your guitar alone in your room is not the same thing as playing it on stage, in front of many people watching. If you need a very quiet and peaceful mood to be able to code something simple, then you will have trouble working in a team with business expectations based on your code. That’s a no-brainer.

The whiteboard also brings something very interesting to the table: it leaves the applicant alone with only what they deeply know about programming. There is no way to fall back on trial and error with the code sample they’re writing; it forces the person to compile the code they write in their mind. And if you’re able to write a working code sample on a whiteboard, it’ll be a piece of cake to do so with your computer.

It’s not surprising that Google, where you can find a lot of brilliant engineers, uses this interviewing technique intensively: applicants who are invited on-site go through no fewer than 5 consecutive interviews, all of them “on the whiteboard”.

As explained by Steve Yegge, in his famous post “Get that job at Google”:

You want your mind to be in the general “mode” of problem solving on whiteboards. If you can do it on a whiteboard, every other medium (laptop, shared network document, whatever) is a cakewalk. So plan for the whiteboard.

The author also thinks that “Whiteboard and coding interviews also routinely ask people to re-implement some common solution or low-level data structure. This is the opposite of good engineering.” I’m surprised he thinks the medium (the whiteboard) is the same thing as the message (the question asked).

If I use a whiteboard to run my interviews, it doesn’t mean I’ll ask the same questions as another person doing an interview on a whiteboard. And about “good engineering skills”, it’s perfectly possible to use a whiteboard to get a feeling for them, for instance by using the very good problems from the “Programming Pearls” book.
I do that all the time, at the very end of the interview (if the FizzBuzz test was passed, no need to go there if not).

About the degrees and certifications

Although I strongly disagree with the previous items, some points make sense. I agree with the author that a degree has very little value in itself. It’s possible to see people with a good degree who are completely unable to deal with basic problems, or unable to write a for loop. Surprisingly, you can have a degree in computer science without being able to construct an if-else structure…

On the other hand, there are brilliant people without a degree, so I don’t give a lot of credit to degrees myself; it’s just a weak signal to consider. The technical interview is far more reliable for evaluating real skills and knowledge.

Programming language skills?

Interviewing on skills in a specific programming language appears to be a trap. We did that in the beginning but we changed our strategy. If you organize your interview (and the job offers) around one single programming language and how effective the programmer is with it, you’ll end up with two issues:

  • you narrow the scope of people you can reach to those “experts”
  • you tie your vision to a programming language

If you want to build a company around experts of a specific language, you can, but it’s a bet you make on the future of that language, its community and its ability to evolve well over time with your products and your industry ecosystem. It’s not necessarily a bad strategy, but it brings limitations.

If, on the other hand, you open your job positions to “Good Software Engineers” and you stop thinking about “the programming language”, you open a lot of doors. Your job positions instantly become more open, and you can meet very talented people who don’t yet know your programming language but who will be able to learn it quickly.

The other benefit of that approach is that if at some point you want to embrace a new technology, designed for a new language, you already have people able to adapt to that situation. This may not be the case if the team is built around some language experts; they might be reluctant to make the switch, most likely because they love programming in their favorite language. Which is a natural feeling when you’ve been working in a specific way for many years.

A programming language is a way to implement a solution to a problem; it should not be a purpose, it’s a tool.

Some guidelines

Besides the points where I disagree, the good thing about this article is that it shows how difficult it can be to interview engineers, when you want to identify the right people for your team.

It really is quite hard. I’ve been doing engineer interviews for some years now and I have clearly learned a lot. My first interviews were very bad compared to those I do today. In the very first ones, I wasn’t even doing technical tests or any kind of evaluation of the person’s actual skills. Now I’m armed with quite a dense cheat sheet of questions and problems, organized by topic, from which I can pick during the interview.

Depending on the reactions of the candidate, I can adapt my questions to harder or easier topics in order to get a good feeling after the hour we spent together.

The only part that is missing from my interview plan is how to evaluate the person’s state of mind, or how well he or she would fit in with the team’s spirit. Would the person enjoy working with us, and would the rest of the team feel the same about that new person? That’s probably the most important thing to evaluate, and sadly, it’s maybe the hardest thing to picture during an interview.

At the end of the day, when we hire someone, we’re trying to build a team, and teams are effective if the people inside work well together.

Here is the real challenge: beyond finding skilled people, it’s finding skilled people who can work together, and as your team grows, it’s more and more of a challenge.

File::Sip, a Perl module to read huge text files with limited memory

When slurping makes you sick, try to sip!

Even though we live in a world where buying a server with 500 GB of RAM is possible, there can always be a situation where we don’t have enough memory available. What about log files from a web server that handles a billion requests every day? What if you need to parse these files as efficiently as possible with a limited amount of main memory available?

When the file is small enough, or when the memory is big enough, it’s not an issue: you just bring in File::Slurp::Tiny and slurp the whole file into memory, and you can then access any line you like. But when one of these conditions is not satisfied, the rules of the game change: you need to be able to access any line of the file without loading the whole content into memory.

What do you do? Well, that could be an interesting job-interview question!

At work we were in a similar situation a couple of months ago: we wanted a way to iterate over a file without loading it into memory. After a quick search on metacpan, we realized nothing was doing what we wanted, so as good citizens, we implemented it and are now releasing it to CPAN. So, let me introduce File::Sip.

I could have titled this blog post “When slurping makes you sick, try to sip”, because that’s the whole idea: instead of loading the whole content of the file into memory, File::Sip builds an index of the position of each line’s first character, accessible with the corresponding line number. Don’t get me wrong, File::Sip is slower than File::Slurp::Tiny, because it needs an init phase to scan the whole file and build its index. But if you want to parse a 10 GB file on a system where you only have 4 GB of RAM for the current process, it won’t be a problem for File::Sip: only the current line and the index are in memory, nothing more.
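
To make the indexing idea concrete, here is a minimal sketch in Python. It is not File::Sip’s code or API: the LineIndex class and its read_line method are names made up for this illustration, and it assumes a UTF-8 text file with "\n" line endings. It only shows the general technique of scanning once to record where each line starts, then seeking straight to a line on demand.

```python
import os
import tempfile


class LineIndex:
    """Random access to the lines of a file through an index of byte offsets.

    Hypothetical illustration only, not the File::Sip interface.
    """

    def __init__(self, path):
        self.path = path
        self.offsets = []  # offsets[n] is the byte position where line n starts
        # Init phase: one full scan of the file, keeping only the offsets.
        with open(path, "rb") as handle:
            position = 0
            for line in handle:
                self.offsets.append(position)
                position += len(line)

    def __len__(self):
        return len(self.offsets)

    def read_line(self, number):
        """Return line `number` (0-based), newline included if present."""
        with open(self.path, "rb") as handle:
            handle.seek(self.offsets[number])
            return handle.readline().decode("utf-8")


if __name__ == "__main__":
    # Small demo on a throwaway file.
    with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
        tmp.write(b"first\nsecond\nthird\n")
    try:
        index = LineIndex(tmp.name)
        print(len(index), repr(index.read_line(1)))
    finally:
        os.remove(tmp.name)
```

The trade-off is the one described above: building the index costs a full pass over the file, but afterwards only the offsets and the line being read are held in memory.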

The project has been released on CPAN and is hosted, as usual, in Weborama’s GitHub repo. Comments, issues, patches and forks are welcome!

BTsync, the peer-to-peer cloud solution, a Dropbox that you own.

Clouds on Flickr

What if you could have the benefit of a Dropbox-like solution without relying on any third-party for storing your data?

That’s exactly what BitTorrent Sync provides. It’s currently released in beta but works out of the box. The program starts in the background and a small web interface on port 8888 allows you to control the instance. The interesting part is that it’s built in an entirely decentralized manner: btsync is peer-to-peer software. Just like with Git repositories, a btsync instance can be shared (read-only or read-write) with any other btsync instance. Sharing is done with a key: copy/paste a key (or scan a barcode with your mobile phone) and you’re done, the sync can start.

Simple, efficient. Setting up a btsync instance on your private server is enough to emulate a Dropbox-like system.

There are Android and iPhone apps, and they work like a charm.

What is missing, though, is a web view of the shared folders: it would be very handy to be able to browse the content without mounting it.

I’m close to killing my Dropbox; that would be one less centralized service!

Dark Mail, secure yet easy-to-use email protocol

Post office boxes, on Flickr

It seems that a new email protocol is going to be developed. It’s called Dark Mail and will be an attempt at providing an easy-to-use yet secure protocol for on-line communications.

Emails are sent in clear text all over the globe. The vast majority of users send their emails without encryption because encrypting them is quite complex: you need to handle private and public keys yourself, which implies a minimum of knowledge on the subject. At the very least, you need to know about PGP and how to configure your email client to deal with it. It also means your recipient has to do the same. As you can imagine, that’s why most users keep using unencrypted emails.

Compare that to secure websites and that little https that appears in green in your browser’s location bar when you access your bank account. There, keys have just been exchanged between your browser and the website: same game, but everything is done without you needing to worry about it.

What if that were possible for emails? Providing the guarantee that any email you send is encrypted and can only be read by the recipient you chose: that’s what Dark Mail is supposed to provide.

Quite a challenge, but clearly an interesting project to follow.

This reminds me of a talk given recently by Benjamin Bayart about the Internet, where he questioned the consequences of a world where the population needs to protect itself from its government: When you want to hide your communications from your own government, it means you’re not far from a civil war. It means your government is a threat to your own privacy.

Is it a good thing that we’re seeing projects such as Dark Mail emerge? Maybe Bayart is right and it’s a sign that we’re entering an on-line war between citizens and their governments, whose consequences can quickly become very bad for all of us. But after PRISM, maybe these initiatives aren’t something we can avoid anymore.