The architecture of StackOverflow
One of the most interesting talks of recent weeks, and a rare insight into one of the most active sites on the web: Marco Cecconi of StackOverflow talks about the general server architecture, why they don’t unit-test (!), how they release (5 times a day), and shows some awesome server load screenshots. It’s fascinating that they run one of the most trafficked sites (which also uses long-polling “real-time” messaging!) on just 25 servers, most of them at 10% load all the time. “We could run it on just 5 servers if needed”. Awesome. Nice statements regarding caching and using existing code, too.
I really like the Get-Things-Done attitude and the simple but productive view on workflow (use multiple monitors, don’t be the nerd sitting in front of a laptop). The code is not perfect (lots of static methods), they don’t even test, and they have only a handful of developers (!), yet nearly no downtime. Ah yes, and they run one of the most successful sites in the history of the internet.
“Languages are just tools”. “You’ll be successful anyways, or fail anyways [it does not depend on the language].” I really like that guy. And by the way, they mainly use .NET for the site. Make sure you also check out the links below, especially #5, which shows the current tech stack used in the company.
And by the way, have you noticed that EXTREMELY huge presentation screen? Awesome! They obviously did this in a cinema or a university’s main lecture hall.
Update #1:
The slides of this talk:
https://speakerdeck.com/sklivvz/the-architecture-of-stackoverflow-developer-conference-2013
Update #2:
Thread on news.ycombinator.com regarding this topic, which Marco Cecconi (and other StackOverflow IT guys) have joined:
https://news.ycombinator.com/item?id=7052835
Update #3:
Excellent article in the StackOverflow tech blog showing how StackExchange was built back in 2008 (lots of technical details):
http://blog.stackoverflow.com/2008/09/what-was-stack-overflow-built-with/
Update #4:
Official 2009 database dump (legally available directly on StackOverflow):
http://blog.stackoverflow.com/2009/06/stack-overflow-creative-commons-data-dump/
Update #5:
AWESOME! Full up-to-date list of software, technology, methods and servers used for StackExchange:
http://meta.stackoverflow.com/questions/10369/which-tools-and-technologies-are-used-to-build-the-stack-exchange-network
Update #6:
This excellent YouTube comment by Joseph Lust sums it up perfectly:
* Use static methods everywhere instead of OOP
* Write the least code possible
* Keep entire site compilation under 10s
* Cache every single object, it’s faster
* Design to scale Up before Out
* Use 368GB memory for your servers/db’s
* Don’t write tests, have your users find defects
* Don’t reinvent square wheels
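The “cache every single object” point above can be sketched roughly like this. It is only a minimal Python illustration under assumptions, not StackOverflow’s actual code (the site runs on .NET): the `TtlCache` class, the `load_question` loader and the 60-second lifetime are all made up for the example.

```python
import time


class TtlCache:
    """Tiny in-process cache with per-entry expiry (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader(key) only on a miss."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


# Hypothetical stand-in for an expensive database query.
db_calls = []


def load_question(question_id):
    db_calls.append(question_id)
    return {"id": question_id, "title": f"Question {question_id}"}


cache = TtlCache(ttl_seconds=60)
first = cache.get_or_load(42, load_question)
second = cache.get_or_load(42, load_question)  # served from memory
```

The second lookup never reaches `load_question`, which is the whole idea: the expensive call happens once per lifetime of the entry, and every request in between is answered from memory.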
TDD doesn’t work well with static classes? That’s news to me. Why does a test case care about instantiation? Why should it? I suppose there’s no writing a failed test case for Math.Add(1,2), eh?
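To illustrate the point made in this comment: a static method without hidden state can be tested with no instantiation at all. The following is a minimal Python sketch, not anything from the talk; `MathUtil` is a made-up stand-in for the `Math.Add` mentioned above.

```python
import unittest


class MathUtil:
    """Hypothetical utility class with a stateless static method."""

    @staticmethod
    def add(a, b):
        return a + b


class MathUtilTest(unittest.TestCase):
    def test_add(self):
        # No MathUtil instance is created; the method is called on the class.
        self.assertEqual(MathUtil.add(1, 2), 3)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MathUtilTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
```

Things generally get harder only when a static method reaches for shared mutable state or an external resource such as a database or cache, because a test then cannot easily swap that dependency out.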
I can’t see how TDD forces object lifetime that way. IoC and static classes are at odds, but there is no need for that static-everything madness. Whole graphs can be reused by many requests with just one static container class, thus avoiding massive garbage collections. They are sacrificing maintainability, the “Fear Driven Development” is the proof.
Interesting points about frequent releases and short feedback loops. You don’t need extensive manual or unit tests if you can release constantly to an alpha site where real users will find issues for you – not a good idea for every business but works well for them.
I didn’t imagine SO runs on so many static methods, though Marco’s explanation, “many instances make the GC run so often that it causes low performance, therefore we cannot use TDD, which doesn’t work well with static methods”, raises so many interesting points. Very interesting talk. Thanks!
It’s very strange that at a “Developer Conference”, Mr. Cecconi had to explain concepts like garbage collection!
I think most people were just too shy to raise their hands :) As most conferences are quite expensive, I highly doubt that the audience is mostly beginners.
“Use multiple monitors” is just as much posturing as “look cool using a tiny laptop” is. If you’re doing something really hard, like solving a tricky algorithmic problem or chasing an elusive bug, it *doesn’t matter* what kind of monitor’s in front of you because most of the action is in your head anyway. A good programmer can do these things while walking around, exercising (like I will be in a minute), or taking a crap. Multiple big screens can help sometimes, but they can also hurt if they encourage a level of multi-tasking a.k.a. self-interruption that disrupts good concentration.
Hmm, good point, but basically I agree with Mr. Cecconi’s opinion, as lots of developers block themselves (and their workflows) with tiny screens, tiny eye-hurting font sizes, small laptop keyboards, not using a mouse (!), no second screen, not using an additional analog notebook (to keep your stuff organized OFF the screen), and attitudes à la “I want to solve this by myself” instead of taking the shortest way, etc. This is a super-interesting topic, but we should not discuss it here.
Why should I use a mouse? As a developer I prefer to have two hands on the keyboard. A mouse is nothing but a distraction and slows us down.
Why should we NOT use a mouse? Switching / moving / organizing things with a keyboard is nearly impossible and slows us down extremely. But in general I think everybody should find his/her personal workflow, so if you can work in a wonderful way without a mouse, then do so. I would also say it depends on what you are doing. I get frustrated when frontend people need 10 seconds to close a window when they try to reach a tiny close button with a sticky touchpad. :) Please let’s skip or move the discussion, as it’s not really related to the above article. Cheers!
I just learned a lot about how Stack Overflow works behind the scenes. Thanks.