The cost of bureaucracy in software companies

Many software companies suffer from the same problem: as they grow, they fall into the trap of adding unnecessary bureaucracy. I’m not talking about making sure a work environment is safe, that people are looked after, or that no bullying takes place. Maintaining a healthy workplace needs a certain level of procedure and policy.

However, as organizations grow it becomes all too easy to create unnecessary policies, ones so tight that they kill any sense of autonomy.

Let’s take an extreme example. Say that one day someone gets caught torrenting files at work. Apart from firing them, what else could happen? A policy could be set, firewalls could be turned up to max, teams could be brought in to police things, and the software available to users could be locked down to a standard set. There are a lot of ways to stop this, but now that one person has ruined things for everyone who comes after.

Let’s look at this mathematically. Say you bring in a firewall team, a security team and a compliance team to keep people in check, and some talented developer needs to run a new service on port 8080 (really, that isn’t even an odd port!).

Now a change that would have taken no time suddenly involves 3 new people, each one bringing their own level of efficiency. If each person, the developer included, is 80% efficient, here’s what happens:

Before

Overall Efficiency = 80%

After

Overall Efficiency = 80% * 80% * 80% * 80% = 40.96%

It gets worse!

That’s right: by adding just 3 people to a decision, each working at 80%, we’ve managed to cut overall efficiency roughly in half!

Now think about decisions in any modern organization. Think what it would take to change some text on a production website, or to give feedback on requirements. As companies grow they bring in more departments, and every decision point requires more people to discuss it.

Even the best people, working at 90%, can get stumped by this. At that pace, a decision that needs 6 other people means you run at about 48%.

With people working at 50%, the four-person example above falls to about 6% overall, essentially grinding progress to a halt.
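The arithmetic here is one per-person efficiency multiplied by itself once for each person involved. A minimal Python sketch, assuming everyone works at the same rate and the rates simply multiply (the function name `overall_efficiency` is made up for illustration):

```python
def overall_efficiency(per_person: float, people: int) -> float:
    """Compound efficiency when `people` each work at `per_person` (0.0 to 1.0).

    Assumes the simple model from this post: every person involved in a
    decision multiplies the overall efficiency by their own.
    """
    if not 0.0 <= per_person <= 1.0:
        raise ValueError("per_person must be between 0 and 1")
    if people < 1:
        raise ValueError("people must be at least 1")
    return per_person ** people


if __name__ == "__main__":
    # (per-person efficiency, number of people involved in the decision)
    scenarios = [(0.8, 1), (0.8, 4), (0.9, 7), (0.5, 4)]
    for rate, people in scenarios:
        result = overall_efficiency(rate, people)
        print(f"{people} people at {rate:.0%}: {result:.2%}")
```

Plugging in your own head count for a typical sign-off chain makes the cost of each extra approver easy to see.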

Summary

Some level of bureaucracy is necessary, but I fail to see how so many organizations miss the most basic point: autonomy really matters, both to individuals and to the organizations themselves.


Debugging a Cloudera Hadoop install in the cloud

From the outset I will be honest: Hadoop is a nightmare to set up. Its versions are all over the place, mismatches lead to random failures, and it’s just not a fun thing to be doing. However, the nice people at Cloudera have a much easier solution to all of this: they provide a management interface to install your cluster. Though this is almost seamless, there are a few gotchas that can catch you out. My hints are below:

Firewalls

The nodes in a Hadoop cluster talk to each other in a lot of different ways. The number of ports you need open, depending on your configuration, is mind-blowing, and given the way things are with Hadoop it’s also ever changing! The shortcut is to shut off your firewall using a command such as:

service iptables stop

Or the equivalent for your Linux version. Now I’m well aware this isn’t best practice, but if you’re just getting something up and running to test out, or are hitting a brick wall and want to make sure it’s not a firewall problem, then it’s a good test. Later on I’ll cover a long-term fix for this.
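If you would rather confirm a firewall problem before switching the firewall off, one option is to test whether a given port is reachable from another node. The following is a minimal Python sketch, not part of Cloudera’s tooling; the function name `port_is_open`, the host name and the port numbers are all placeholders to swap for your own:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical host name and ports; substitute the ones your cluster uses.
    host = "datanode1"
    for port in (22, 8080):
        state = "open" if port_is_open(host, port) else "closed or filtered"
        print(f"{host}:{port} is {state}")
```

A port that reports closed with the firewall running and open with it stopped points at a firewall rule rather than at Hadoop itself.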

DNS

Hadoop expects a fully working DNS setup, but this isn’t always in line with how cloud providers set up their servers. For instance my host of choice is Rackspace, who are awesome by the way, and when you set up new nodes they each get a name so you can do things like:

ping datanode1

However, if you have 3 data nodes there is no way for nodes 2 or 3 to know about data node 1, or each other. If you end up in this state your Hadoop cluster gets into all sorts of a mess: some systems use DNS, some use IPs, and it’s impossible to know what’s going on.

The fix for this is easy: you need a working DNS system. This can be achieved either by setting up a fully working DNS server (various cloud providers support this, or you can roll your own on a Linux box) or, if you have a small cluster, by doing it manually. The /etc/hosts file contains a list of IP to name mappings separated by tabs, such as

127.0.0.1    localhost
198.0.0.1    node1

Now all you need to do is add IP to name mappings for any other servers in your cluster and you’re sorted.
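For more than a couple of nodes, typing these lines by hand gets error-prone. As a rough sketch in Python, the entries can be generated from one mapping and the result checked on each node. The helper names `hosts_lines` and `unresolved` are invented for this example, and the addresses for node2 and node3 are placeholders:

```python
import socket


def hosts_lines(nodes: dict[str, str]) -> list[str]:
    """Format {name: ip} pairs as tab-separated /etc/hosts lines."""
    return [f"{ip}\t{name}" for name, ip in nodes.items()]


def unresolved(names: list[str]) -> list[str]:
    """Return the names that this machine cannot resolve to an IP address."""
    missing = []
    for name in names:
        try:
            socket.gethostbyname(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Placeholder cluster layout; replace with your own names and addresses.
    cluster = {"node1": "198.0.0.1", "node2": "198.0.0.2", "node3": "198.0.0.3"}
    print("\n".join(hosts_lines(cluster)))
    print("Not yet resolvable here:", unresolved(list(cluster)))
```

The printed lines can be appended to /etc/hosts on every node; running the check again afterwards should report an empty list.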

Long Term

Longer term, the fixes suggested above just aren’t feasible. Shutting off the firewall is not smart, and manually setting up DNS is a long process. This advice is just to help you over that first hurdle and get things working. If you plan to invest in a production Hadoop cluster, I suggest using a tool such as Puppet to set up your servers so they are ready for Cloudera but also secure.
