JavaScript on Rails is here, and it promises to be as good as Ruby on Rails!

After years of an undeserved reputation as that half-baked, inconsistent scripting language used to validate form fields in browsers, JavaScript - or more precisely, ECMAScript - appears to be progressing in leaps and bounds. Steve Yegge predicted a few months ago that in its next avatar (ECMAScript 4), it would have what it takes to be the Next Big Language. ECMAScript 4 supports a whole bunch of totally sexy (I can't think of a better adjective) features. To quote from the Wikipedia article:
  • Classes
  • Packages and namespaces
  • Optional static typing
  • Generators and iterators
  • Destructuring assignment (likely)
  • JSON Encoding/Decoding

Not to mention performance improvements as a consequence of the optional static typing.

Steve also obviously believes in putting his code where his mouth is, because he's gone and ported the whole of Rails - yes, you got that right, ported it line by blessed line - to JavaScript. His implementation uses the Rhino engine, which runs on the JVM. My guess is that this port of Rails to JavaScript will be far more effective than other attempts using mainstream languages like Java. As a language, JavaScript is as open and expressive as Ruby, if not more so. If you want an example of JavaScript's expressiveness, go check out the superb jQuery library if you haven't already done so. It will knock you off your feet, I guarantee you.

This just makes the case stronger for bringing business logic to the browser and getting rid of all those annoying get/post parameter based web applications. I mean, seriously: if an architect suggested building a desktop thick client where the controllers and models lived only on the server and the UI communicated with the controller by passing strings to it to trigger state changes in the model, he'd be considered officially insane. But the vast majority of state interaction type web applications (those with complex domain models) use exactly such an architecture, and nobody considers it odd.

Bottom line - once ECMAScript 4 is out and browsers start supporting it, all the 'thick clients are dead, long live the browser' weenies will finally have a case. But only because the browser will have stopped being thin.

You may also want to read: Bringing business logic to the browser, or why you should develop in JavaScript

Be the nail that sticks out - how to get hired by an interesting company

When I've posted in the past about recruitment and how ThoughtWorks strives to hire truly passionate and competent people, at least one comment asks "But do we really need that level of competence in plenty?" or something similar. It's an interesting question, but it's obvious from the hiring practices of India's vast outsourcing industry that most people don't think so. The question is, what does this mean for the developer who cares about his code?

Traditional career paths treat writing code as a menial task (that damn 'software is a commodity' philosophy). To get anywhere, you have to make team lead in 2 years, architect/project manager in 7, and then you start scaling the heights of upper management. You need, essentially, to go from being a hacker to being a suit. This means that at most companies, it's people with 0-4 years of experience who actually write code, and most of the time they have little say in design or other decision making - a highly frustrating position to be in.

So the obvious step for an alpha geek in the wrong job is to find another, more suitable one. The truth is, there aren't that many programming jobs out there which would suit, and worse, they're distributed across companies ranging from Google down to small start-ups in stealth mode. One ray of light is that the personnel departments (I dislike the phrase 'Human Resources' - I mean, come on...) of these companies are as desperate to hire you as you are to work for them. But in an industry flooded with 9-to-5 developers for whom writing code is 'just a job', separating the wheat from the chaff can be very difficult indeed.

Here is some advice to help get yourself noticed by such companies. Interestingly, following it will also filter out employers like the one you're trying to get away from. It boils down to two words: stand out.
  1. Be opinionated. You can't write code, care about it, and not have opinions. In your current organisation you may get hammered for it, but hey, that's why you want to quit in the first place, right?
  2. State these opinions somewhere, for the record - a blog, perhaps. It shows the evolution of your opinions over time and gives you a talking point during an interview. Please don't blindly copy someone else's; of course, if I needed to tell you that, I'm targeting the wrong audience with this post :-). I'm assuming you'd be too proud to do that anyway.
  3. You don't need to make your resume conform to the industry standard. For god's sake, some of the things I see under the 'Objectives' section of resumes are plain silly. I'm pretty sure the people who wrote them thought they were silly too, but felt forced to put them in because everyone knows an objective is expected in a resume. If a company won't hire you because your resume doesn't have an objective along the lines of 'To strive for the betterment of the organisation and for personal growth to achieve success', then you probably don't want to work there, right?
  4. Don't make assumptions about career paths. Most organisations where alpha geeks are respected also have flexible roles.
  5. Say which programming languages you like in your resume, and say why. Like I said, if you think Java is an ugly language and someone won't hire you because of that (despite the fact that you've done n projects in Java, know it well and thus have a sound basis for your judgement), then you probably don't want to work there, right?
  6. Try to publish some code on your blog, on Google Code or somewhere similar, and link to it from your blog and your resume. Put your code where your mouth is - if you are what you claim to be in your resume, prove it in code. My initial impression of a candidate goes up substantially when their resume links to their code.

If you have any additional suggestions which I can add to this list, please let me know and I'll drop them in.

Update 2007-07-01: As promised, here are more suggestions distilled from the comments on this post. Do keep them coming.
  1. Sunil: Some people I've come across have half-baked opinions that they throw around to impress. I'm sure they'll be shot down during interviews. Know your stuff before you go around pronouncing opinions.
  2. Amy Isikoff Newell: Be a woman.
  3. Amy Isikoff Newell: Be honest. If you took time off for something important (like having kids), say so in your resume instead of trying to hide the 'gap', as they call it.
  4. Neil Bartlett: Look for opportunities to speak at conferences. Start small - your local user group, for example - and work up from there; in a little while you could be presenting at JavaOne.
  5. Sidu: List your programming experience as far back as it goes. I have more years of BASIC than C++, and more of C++ than Java/C#/Ruby (and this is true for several of my classmates from school). People who programmed in school usually choose it as a career because they've tried it for a while and decided they like it, not because it offers good job prospects (a common motivation in India).

You may also want to read: What developers look for when they consider a job offer

The best of the worst: Standard patterns of horrible code from Java and .Net

There are some standard worst practices in code which make my teeth ache when I see them. They can appear at any time in a code base and look very innocent, but they will turn everything around them into spaghetti in short order. Someone somewhere near you is typing out one of these insidious bits of code as you read this. You have been warned.
I've already written about the one where people create namespaces and constants to handle types - something like:

public interface CarType {
    public static final String AUDI = "audi";
    public static final String MARUTI = "maruti";
}
The constants are usually ints or, even worse, Strings. The latter start popping up everywhere, in one bad case eventually showing up as a piece of text on a UI.
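If the set of types really is fixed, a Java 5 enum gives you the same convenience with actual type safety - a minimal sketch (CarType's use in a Quote class is invented for illustration):

public enum CarType {
    AUDI, MARUTI
}

class Quote {
    private final CarType type;

    Quote(CarType type) { this.type = type; }

    long premiumInRupees() {
        // the compiler checks this switch; a typo in a String constant
        // would sail straight through and, as above, could end up on the UI
        switch (type) {
            case AUDI:   return 50000;
            case MARUTI: return 12000;
            default:     throw new IllegalArgumentException("unknown type: " + type);
        }
    }
}

But enough of that, let's move on to today's rant.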

Today, recruitment bribed some of us developers into a lunchtime code review session by ordering from Subway (our interview process requires candidates to submit code). Some of the code I reviewed brought back two of the most annoying, yet widespread, development worst practices. Here they are, one from Java and the other from .Net.

.Net: Everything is a DataSet
Aargh. Why oh why is there this obsession with mapping everything to the rows and columns of a DataSet? Everything is not a DataSet. They're People and Employees and Magnolias. I've seen code where there wasn't a single developer-created class! And this from people with several years of experience, not freshers. The flow inevitably looks something like: create DataSet -> read from XML file/web service -> optionally pipe it around a bit using a few web services -> pump into DataSet -> perform operations, creating and manipulating DataTables, DataColumns and DataRows by the dozen, with no classes and tons of procedural logic -> bind these to UI objects. If you're really unlucky, that 'tons of procedural logic' will be in a class which inherits from Form. A more thorough, consistent and violent violation of OO principles I have yet to come across.

I would suggest using XStream .Net and NHibernate instead; they give you the short-term convenience that DataSets offer while keeping real domain classes, and hence code quality, intact over the long haul.
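For illustration, here's roughly what the XML-to-object step looks like with XStream (the Java flavour shown here; the .Net port follows similar lines - Person and the sample XML are invented):

import com.thoughtworks.xstream.XStream;

public class Person {
    private String name;  // populated directly by XStream via reflection
    private int age;

    public String toString() { return name + " (" + age + ")"; }

    public static void main(String[] args) {
        XStream xstream = new XStream();
        xstream.alias("person", Person.class);  // map the <person> element to Person
        Person person = (Person) xstream.fromXML(
                "<person><name>Anjali</name><age>30</age></person>");
        System.out.println(person);  // prints: Anjali (30)
    }
}

You end up handing real People around instead of indexing into DataRows, and the compiler gets to help you.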

Java: Every field must have a getter and a setter
That strange and wonderful idea, the bean, has made a whole generation of Java developers completely indifferent to encapsulation. The bean was created to solve a specific category of problem, but its structure has spread like a virus, destroying the integrity of the most innocent of domain objects. When doctors wish to test the reflexes of such developers, the preferred method is no longer a mallet applied smartly just below the kneecap. No, you need only give them a class in an editor, ask them to add a field, and check whether they automatically add a getter and a setter for it.
The only times using a setter is acceptable are:
  • if the state of the class cannot be corrupted by using it (calling setText on a TextBox can change its state, but does not corrupt it)
  • if the framework demands it
Getters aren't as bad as setters, but over time on a large code base they encourage the processing of data outside of the class to which it belongs. From there, it's a short and slippery slope to code duplication and other evils. In combination, getters and setters can reduce once healthy domain objects to pale DTOs.

I would suggest simply avoiding setters except where a change of state to that field cannot corrupt the object. A lot of frameworks now understand this and help developers do away with setters. Hibernate, for example, allows you to configure field-level access (it will populate your domain object's fields directly using reflection, even if they are private), thus removing the need for setters.
For getters, use them only when the data read using the getter will never be processed - in other words, only in situations like binding to a UI or persisting to a database.
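To make that concrete, here's a sketch of a domain object that guards its own state - no setters, with behaviour living next to the data it operates on (Account and its rules are invented for illustration):

public class Account {
    private long balanceInPaise;  // no setter anywhere in sight

    public Account(long openingBalanceInPaise) {
        this.balanceInPaise = openingBalanceInPaise;
    }

    public void deposit(long amountInPaise) {
        if (amountInPaise <= 0)
            throw new IllegalArgumentException("deposits must be positive");
        balanceInPaise += amountInPaise;
    }

    public void withdraw(long amountInPaise) {
        if (amountInPaise > balanceInPaise)
            throw new IllegalStateException("insufficient funds");
        balanceInPaise -= amountInPaise;
    }

    // read-only, and only for binding to a UI or persisting
    public long getBalanceInPaise() { return balanceInPaise; }
}

Anything that wants to change the balance has to go through deposit or withdraw, so the rules protecting the object's state live in exactly one place.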

Bringing business logic to the browser, or why you should develop in JavaScript

A few weeks ago, I observed an interesting side-effect of building an AJAX web application: content served asynchronously is not visible to search engines. I wrote a post about it, trying to express the idea that web applications might be categorised by whether their content is the kind that is valuable to index for search (Wikipedia, Amazon) or the kind where indexing simply doesn't make sense (Google Docs, the vast majority of enterprise web applications). Unfortunately, I made two mistakes. One was using the terms 'web site' and 'web application' to distinguish between the two - those terms already mean a lot of things to a lot of people. The other was that the idea was raw and my articulation poor.

That idea has stayed with me, though, and having thought about it a fair bit since that last post, I've come to the conclusion that what I made the focus of my post was merely a side-effect of other, larger trends around web frameworks, the relevance of HTML and the way we do web development in general. Please bear with me while I put these ideas down, and feel free to flame me to your heart's content in the comments section :-). I'm looking forward to the feedback on this.

First off, I want to get rid of the language baggage for the purposes of this post. I'm going to christen the two types of web applications I have in mind the web-based Information Publisher application and the web-based State Interaction application.

An Information Publisher is a web application whose primary focus is to serve content. The content can be video, audio, text, whatever, but the job of the application is just to serve it. Classic examples are Wikipedia and Google search: in both cases, there is information which needs to be served to an audience for consumption. The fact that content in Wikipedia is fairly static while that in Google search is highly dynamic doesn't change the fact that both essentially publish information.

Contrast this with the State Interaction application, where the objective is to let users interact with various entities and affect their state (again, Google Docs and most enterprise applications). You may publish information as the end product (generated reports, say), but this is not to be confused with the act of creating that information by allowing user interactions with various entities.

The interesting thing is that many web applications contain both types, and you see one or the other depending on what you're doing. Any decent wiki is a good example: in 'edit' mode it's a State Interaction application (a developer would immediately see entities like Page, Content and History), and when you're viewing wiki content it's an Information Publisher. The authentication functionality that many websites offer is again a State Interaction built into what is otherwise an Information Publisher.

Now, let me explain why I think this distinction is important.
The common underlying infrastructure available to both types of applications is the web browser, with rendering through HTML and communication with the server through gets, posts or AJAX.

In the case of an Information Publisher, this infrastructure is ideally suited to the task. HTML was, after all, intended for precisely such uses. Communication with the server to request information can be handled easily and elegantly using name-value pairs (parameters) in a get or post request. A Google search request for the word 'hello' looks like this: http://www.google.co.in/search?q=hello. Nice.
State Interaction applications, on the other hand, usually have a whole bunch of entities which the user needs to interact with, and which usually also need to interact with each other. These live on the server, and their interaction with the user happens through a user interface rendered in HTML. Changes to the state of an entity happen, again, by sending name-value pairs or a JSON string to the server, where these are parsed and some action is taken to alter the states of various objects. To put it bluntly, you have a bunch of objects which demand a high degree of interactivity, and we get our UI to talk to them by passing strings around! Not so nice.
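In miniature, that string-passing conversation looks something like this hypothetical servlet-style controller (all names invented):

import javax.servlet.http.HttpServletRequest;

public class CartController {
    private final ShoppingCart cart = new ShoppingCart();

    // The UI's only way to reach the model: ship name-value pairs at a
    // controller, which parses them back into types and method calls.
    public void handle(HttpServletRequest request) {
        String action = request.getParameter("action");  // e.g. "addItem"
        if ("addItem".equals(action)) {
            cart.add(request.getParameter("sku"),
                     Integer.parseInt(request.getParameter("quantity")));
        }
    }

    static class ShoppingCart {  // stand-in domain object
        void add(String sku, int quantity) {
            // alter domain state here
        }
    }
}

Every interaction pays this parse-and-dispatch tax; in a thick client, the view would simply call cart.add(...) itself.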

That developers have felt this pain can be seen in the evolution of web development over time. A bunch of frameworks and tools build abstractions over this infrastructure to make State Interaction application development easier. For example, ASP.Net 1.1 tried to bring in abstractions which mimicked those used in WinForms. However, attaching an event trigger to a check box (something we do routinely in thick clients) would cause a page reload every time that box was checked or unchecked. The abstraction was defeated by the limitations of the underlying infrastructure (posts). You could develop the same way as you did in WinForms, but the results were far from satisfactory. Things like this made it obvious that while we did need an abstraction, it couldn't simply mimic the desktop world, where the UI and the model are a method call (or ten :-)) apart. Sure, you're still using MVC, but very differently from how you would in a thick client, and a whole lot of web frameworks like Spring and Rails have sprung up to support this style of abstraction. These still fail to address the fundamental problem, though: highly interactive UIs cannot effectively communicate with their models (data binding, anyone?) by passing strings around.

Why is it that people say thick clients are more interactive than web clients? That it's true is not in doubt, or we wouldn't have such a hullabaloo about AJAX and the responsiveness it introduces. Often, this lack of responsiveness is blamed on the rendering engine of the browser, which renders content using HTML. Obviously, you're told, a markup language cannot be as flexible and easy to build UI elements in as a thick client rendering environment with abstractions like Panels, MenuBars and what have you. But this is no longer true: the entire HTML DOM is available for us to manipulate using JavaScript, and the DOM tree structure is remarkably similar to the tree structure of nested widgets in a thick client - something the GWT has used to build up an excellent abstraction, but more on that later in the post.
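As a quick taste before we get there: composing a fragment of UI with the GWT looks roughly like this - you write a widget tree in Java, and the compiled JavaScript realises it as a DOM subtree (the module and its widgets are invented for illustration):

import com.google.gwt.core.client.EntryPoint;
import com.google.gwt.user.client.ui.Label;
import com.google.gwt.user.client.ui.RootPanel;
import com.google.gwt.user.client.ui.TextBox;
import com.google.gwt.user.client.ui.VerticalPanel;

public class HelloForm implements EntryPoint {
    public void onModuleLoad() {
        VerticalPanel form = new VerticalPanel();  // a nested widget, like a Panel in a thick client
        form.add(new Label("Name"));
        form.add(new TextBox());
        RootPanel.get().add(form);                 // attach the subtree to the page
    }
}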

Therefore, I concluded that this lack of responsiveness has less to do with the rendering medium and more to do with the fact that the communication pipe between the UI and its backing model is far less effective in a web application than in a thick client.
Which of course raises the question: why have we been so poor at bringing the model from the server to the browser? The obvious answer is performance. Until recently (in fact, I'll go so far as to say until the release of Firefox 2), the performance of JavaScript was so poor as to prevent its use for anything more than a handful of field validations. Creating more than a couple of dozen DropAreas on a page using Scriptaculous would make dragging anything so slow that it was next to unusable.

But JavaScript performance has improved in leaps and bounds, and it is now possible to build full-fledged MVC architectures running purely in the browser, much like a thick client, with AJAX used purely to sync the model on the client with the model on the server - exactly as a thick client would. One of the earliest abstractions developed to support this model of State Interaction application design was the GWT. People have shied away from developing applications purely in JavaScript for many reasons, but the GWT eliminated most of them in one fell swoop. There is also a general awareness now that developing within a disciplined framework makes life a lot easier (a lesson learned from Rails), and we're seeing JavaScript MVC frameworks like Jamal and TrimPath surface which build on this experience to make disciplined development in JavaScript easier.

We've seen pure JavaScript client applications before, in websites like Netvibes - a feed reader with a thick client feel and a bunch of desktop UI entities like windows, titlebars and tabs, which has been around since 2005. However, it's only now that this style of development is starting to move into the mainstream.
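Here's the shape this takes, in a GWT-flavoured sketch (all names invented): the view talks to a model object living right there in the browser through plain method calls, and only the model sync goes over the wire.

import com.google.gwt.user.client.ui.Button;
import com.google.gwt.user.client.ui.ClickListener;
import com.google.gwt.user.client.ui.TextBox;
import com.google.gwt.user.client.ui.Widget;

// The model lives in the browser and is mutated by ordinary method calls.
class Document {
    private String title;

    Document(String title) { this.title = title; }

    void rename(String newTitle) {
        this.title = newTitle;
        // the only server round trip belongs here: an asynchronous
        // call (GWT RPC, say) to sync the server's copy of the model
    }
}

public class DocumentView {
    private final Document document;
    private final TextBox titleBox = new TextBox();
    private final Button renameButton = new Button("Rename");

    public DocumentView(Document document) {
        this.document = document;
        renameButton.addClickListener(new ClickListener() {
            public void onClick(Widget sender) {
                // a plain method call - no strings shipped to a controller
                DocumentView.this.document.rename(titleBox.getText());
            }
        });
    }
}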

Of course, these abstractions still suffer from limitations imposed by the underlying infrastructure. For one, JavaScript doesn't support threading, so data binding in a GWT application must be handled delicately or you could end up with annoying screen freezes. But all things said and done, these applications are still far more responsive than traditional web applications where the model sits only on the server. As importantly, they are far easier to develop since, as I said before, server calls are purely for model syncing and your view objects can talk to model objects using method calls.

Having said that, client-side models are a fairly bad idea for an Information Publisher, for the simple reason that they're quite unnecessary. The challenge, in my opinion, is to clearly identify which portions of a website require State Interactions and which are good old Information Publishers, and implement each accordingly using the appropriate technologies. I'd gotten quite gung-ho about the GWT and developed an Information Publisher type website with it, only to realise later that none of the content was available through a Google search (I did fix that issue, but it was a hack which won't scale). You can imagine why this could be crucially important to some websites, especially those which sell products or services. AJAX and dynamic rendering are pretty cool, but should be used appropriately.

To summarise: when developing a web application, it is important to identify which parts of it are Information Publishers and which are State Interactions. As the complexity of the state interactions increases, one should seriously consider bringing the model from the server to the client and using AJAX just to keep the model data in sync. Tools like the GWT, Jamal and TrimPath not only make this possible, but also supply a whole lot of infrastructure (unit testing and debugging in the GWT, scaffolding in TrimPath) to make the developer's life easier when developing in JavaScript.

Update: 2007/06/27
I'd originally titled this article 'Why things like the GWT and Jamal are going to help keep web developers sane' but changed it because it was rather vague.

You may also want to read: When should you choose Google's GWT for your web app?

Summary of a talk on Ruby deployment - highly recommended

Ron Evans has summarised a talk given by Ezra Zygmuntowicz on day 2 of RailsConf. This is a must-read. The post is here. Look for the section titled 'Xen And The Art of Deployment'.

The 'pre-run tasks' unit testing anti-pattern

The fundamental idea behind unit tests is that you run them all the time. You will probably run a single test every two or three minutes as you make changes to your code, and a suite of tests at least every twenty minutes, probably more often. This means that every change you make to a piece of logic is followed by running the test for that piece to validate the change.
Given this style of usage, everything about your tests should be built to suit. Some of the obvious and oft-repeated best practices supporting this include:

  • Ensuring that every unit test is atomic, testing just a single piece of logic. A fairly good rule of thumb is one test per method, with methods no more than a few (<15) lines long

  • Ensuring that every unit test has a fresh state and does not depend on any previous tests to create that state


Now, the 'pre-run' anti-pattern surfaces where a set of infrastructure tasks needs to run to set up a suitable environment for the tests. I'm not referring to the things you'd do in your setUp() method (which sets up program state rather than the environment in which the program runs), but rather to situations where, say, your tests require a set of XML files to run against and these need to be copied into the test directory from somewhere else.
This is something I've noticed happening when developers run tests only from ant, rake or whatever build script they're using. There, it's natural to just add these tasks to the dependencies of the test task and be done with it; the dependency graph in the script ensures that the environment is as it should be for those tests.
The trouble is that this raises the threshold for running tests, and low thresholds matter: run frequently enough, tests change the behaviour patterns of a team quite dramatically over time. Because of these external dependencies, developers are discouraged from using unit tests the way they're meant to be used - you're now forced to run your tests from a build script.
You want to encourage people to run tests in the manner I described above. Ideally, you should be able to do a fresh checkout of a code base, go in, run any one test directly, and have it go green.
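One way to lower that threshold is to pull the environment setup into the test itself, so nothing has to run before it - a sketch in JUnit, where ConfigParser is a made-up class standing in for whatever your tests exercise:

import java.io.File;
import java.io.FileWriter;
import junit.framework.TestCase;

public class ConfigParserTest extends TestCase {
    private File configFile;

    protected void setUp() throws Exception {
        // create the fixture the test needs, instead of expecting a
        // build script to have copied it into place beforehand
        configFile = File.createTempFile("config", ".xml");
        FileWriter writer = new FileWriter(configFile);
        writer.write("<config><timeout>30</timeout></config>");
        writer.close();
    }

    protected void tearDown() {
        configFile.delete();  // leave nothing behind for the next test
    }

    public void testReadsTimeoutFromConfigFile() throws Exception {
        ConfigParser parser = new ConfigParser(configFile);
        assertEquals(30, parser.timeout());
    }
}

A test like this goes green straight after a fresh checkout, whether it's run from the IDE, the command line or the build script.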