Thursday, 19 May 2011

Running Jasmine Tests in a Continuous Integration Build (with TeamCity)

On the last couple of projects I’ve worked on, the need for JavaScript testing has grown as the amount of client-side behaviour has increased. Recent improvements in JavaScript components and libraries have made it easier to move more display behaviour to the client and create richer, more responsive UIs. To write tests covering this code I evaluated the QUnit and Jasmine test frameworks. Jasmine better matched the BDD style I wanted for my tests (particularly the ability to nest tests into specs and sub-specs).

We currently run all our unit and integration tests as part of our TeamCity build process, but how do we integrate JavaScript tests, which ideally need to run in an instance of one or more target browsers? The solution is reasonably straightforward, but has one limitation (which I hope is temporary).

Enter JsTestDriver. This awesome project loads JavaScript files into a browser that has been ‘captured’ by a server process and executes them there. Full details of how this is done can be found at the project site. For a CI build, the captured browser and server process live on a remote server. The limitation is that this needs an interactive session, and so far I’ve only managed to get it working under my own session, which I leave active and which causes many failed builds when the servers ‘accidentally’ restart. A problem for another day.

Once this remote server is configured, it’s really just a matter of setting up the YAML configuration file (jsTestDriver.conf here):

server: http://server-name:9876

load:
  - jasmine/jasmine.js  
  - JasmineAdapter.js

  - fileundertest.js
  - fileundertest.specs.js

And executing something like the following script:

if not exist Out\JsTestDriverResults mkdir Out\JsTestDriverResults
java -jar JsTestDriver-1.3.2.jar --config jsTestDriver.conf --testOutput Out\JsTestDriverResults --reset --verbose --tests all

The server value is the endpoint configured on the remote server. The load values are the JavaScript files that will be loaded and executed. jasmine.js is the Jasmine framework file and JasmineAdapter.js is a great JsTestDriver adapter that adds Jasmine support. Once these are loaded, any Jasmine tests in subsequent files will be executed.

In the example above, fileundertest.js contains the code we want to test and fileundertest.specs.js contains the Jasmine tests. Obviously you can add as many files as required and even include dependencies such as jQuery, though that starts to get a little ropey. For the interested reader, there is a Jasmine plugin that supports jQuery and DOM manipulation.

The results are output to Out\JsTestDriverResults in a JUnit-compatible format which TeamCity can read when this script is executed as part of a build. The results then show up in TeamCity and any failing tests are easy to spot.

Sunday, 23 January 2011

MongoDB Replication Performance

Disclaimer: these results are based on me playing around with MongoDB. I have very little experience/understanding of it, so don’t take anything below as the truth. As we all know, YMMV.

We’re mulling over using MongoDB at work to share state between members of our system. One of the reasons I like it is that it has (recently) become straightforward to create replica sets to provide redundancy. However, if you want to ensure that changes have been replicated to a quorum of machines, or indeed verify that a write has even reached any MongoDB server, you need to call getLastError. This is all handled transparently in the MongoDB drivers by SafeMode.

This is fine, but we need to understand the performance impact that waiting for replication entails, since our application will be blocked while it happens. I set up a virtual test environment for this (I even set up a domain controller, but that’s another story). I installed MongoDB on two servers and configured them to be part of the same replica set.

My test program used the ‘official’ driver and inserted 1000 Person records into a collection. The collection was dropped before each mode was tested. The test numbers were consistent regardless of the ordering of the modes.

using System;
using System.Diagnostics;
using MongoDB.Driver;

namespace MongoDBTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var server = MongoServer.Create("mongodb://lab-s02:27017");

            Run(server, SafeMode.False, "False");
            Run(server, SafeMode.True, "True");
            Run(server, SafeMode.FSyncTrue, "FSyncTrue");
            Run(server, SafeMode.W2, "W2");
            Run(server, SafeMode.Create(true, true, 2), "W2FSyncTrue");

            Console.ReadKey();
        }

        private static void Run(MongoServer server, SafeMode safeMode, string safeModeName)
        {
            var test = server.GetDatabase("test", SafeMode.FSyncTrue);
            var people = test.GetCollection("people");
            if (people.Exists())
                people.Drop();
            InsertRecords(server, safeMode, safeModeName);
        }

        private static void InsertRecords(MongoServer server, SafeMode safeMode, string safeModeName)
        {
            var db = server.GetDatabase("test", safeMode);
            var people = db.GetCollection("people");
            var sw = new Stopwatch();
            sw.Start();
            var count = 1000;
            for(int i = 0; i < count; i++)
            {
                people.Insert(new Person
                {
                    _id = i,
                    FirstName = "First Name " + i,
                    LastName = "Last Name " + i,
                    Age = (i % 100)
                });
            }
            sw.Stop();
            Console.WriteLine("SafeMode.{0, -12} inserted {1:N0} records in {2:N0}ms ({3:N0}rec/s)", safeModeName + ":", count, sw.ElapsedMilliseconds, count * 1000.0 / sw.ElapsedMilliseconds);
        }
    }

    public class Person
    {
        public int _id { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public int Age { get; set; }
    }
}

I tested 5 modes:

  • SafeMode.False: no call to getLastError (we cannot verify the write was received)
  • SafeMode.True: calls getLastError to confirm the write was received by the server (but not replicated)
  • SafeMode.FSyncTrue: calls fsync (to flush the in-memory state to disk), then getLastError to confirm the write/fsync succeeded (but not replicated)
  • SafeMode.W2: calls getLastError to confirm the write was received and replicated to 2 servers (including the master)
  • SafeMode.W2FSyncTrue (custom): calls getLastError to confirm both that the write was flushed to disk and that it was replicated to 2 servers

And the results were as follows:

SafeMode.False: 61ms (16,393 rec/s)
SafeMode.True: 179ms (5,587 rec/s)
SafeMode.FSyncTrue: 6,806ms (147 rec/s)
SafeMode.W2: 31,244ms (32 rec/s)
SafeMode.W2FSyncTrue (custom): 28,092ms (36 rec/s)

This clearly shows that there is a stiff price to pay for waiting for replication: around 31ms per insert here, versus well under a millisecond for fire-and-forget writes. Ensuring the write has been flushed to disk is a lot cheaper and may be sufficient for our purposes. These tests were run on virtual machines hosted on my workstation, so it would be interesting to see the results on dedicated servers.

Sunday, 10 January 2010

Getting psake and TeamCity to Play Nice – Part 1

I love build automation. It makes me feel all warm inside. I’ve been using TeamCity over the last few weeks to set up build automation for many of our current projects and it’s working really well. What’s good about TeamCity?

  • Easy to setup with great results out of the box
  • Gives you tons of information and graphs about your builds
  • Highly flexible
  • Integration with bug trackers
  • Plug-in for VS for pre-tested commit
  • Windows tray notifier

We wanted to use psake as our build scripting tool because XML is too verbose for this task and PowerShell gives us as much scripting power as we could ever need. Setting up psake to build our solutions is relatively trivial:

properties {
    $base_dir = Resolve-Path .
    $source_dir = "$base_dir\Source"
    $build_dir = "$base_dir\Build"
    $sln_file = "$source_dir\MySolution.sln"
    $nunitconsole_path = "$base_dir\Libraries\NUnit\nunit-console.exe"
    $mstest_path = "C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\MSTest.exe"

    $website_output = "$build_dir\Website"
    $configuration = "Release"

    $nunit_assemblies =
        "`"$source_dir\UnitTests\bin\Release\UnitTests.dll`"",
        "`"$source_dir\IntegrationTests\bin\Release\IntegrationTests.dll`""
    $mstest_assemblies =
        "`"$source_dir\DatabaseTests\bin\Release\DatabaseTests.dll`""
}


##Exposed Tasks##

task default -depends Build
task Build -depends Clean, Compile, Test, CopyToOutputDirectory

##End Exposed Tasks##


task Clean {
    & msbuild $sln_file /p:Configuration=$configuration /t:Clean
    remove-item $build_dir -recurse -force -ErrorAction SilentlyContinue
}

task Compile {
    & msbuild $sln_file /p:Configuration=$configuration /t:Build
}

task Test {
    & $nunitconsole_path $nunit_assemblies
    foreach($mstest_assembly in $mstest_assemblies) {
        & $mstest_path /testcontainer:$mstest_assembly
    }
}

task CopyToOutputDirectory {
    copy "$source_dir\UI.deploy\$configuration" $website_output -recurse
}

The result of calling ‘invoke-psake build.ps1 Build’:

[Image: console output from invoke-psake showing each task executing]

Ok, so we can build and test our application using psake and get visual feedback. Great. Now we want to run this build in TeamCity.

Getting it to Run in TeamCity

This is not a piece of cake. Straight away there is a documented issue with PowerShell that prevents it from being executed simply by putting PowerShell in the ‘Command executable’ field:

[Image: TeamCity build runner settings with PowerShell in the ‘Command executable’ field]

This just causes TeamCity to hang indefinitely, as PowerShell seems to sit waiting for input. It didn’t even seem to trigger the hanging build detection.

The workaround is to use a batch file (I name it Build.bat) to invoke PowerShell, piping something (anything) to its standard input so it doesn’t wait for input:

echo abc | powershell .\run-psake.ps1 Build

run-psake.ps1 is a PowerShell script that ensures the environment is configured (i.e. that the psake module is loaded) and then executes invoke-psake:

# Make sure the psake module is loaded before invoking it
if ((get-module -Name "psake") -eq $null) {
    import-module psake
}
invoke-psake .\build.ps1 $args

So now we can set the ‘Command executable’ simply to Build.bat:

[Image: TeamCity build runner settings with Build.bat as the ‘Command executable’]

Now TeamCity will run the build, but it will not report build failures, test failures or build metrics. You might get some kudos from your boss when every single build reports as ‘Success’ regardless of whether it compiles or the tests pass, but it won’t help you actually deliver your project. There are probably several ways to address this and I will detail the method I found in my next post.

Monday, 30 November 2009

Database Source Control

I’ll take it as a given that everyone reading this thinks source controlling code is essential. Not everyone feels the same way about database source control; or maybe they do, but they don’t know how to do it.

What’s The Problem?

Even in my short career, I’ve had more than enough exposure to shared database hell. With more than one developer on a project, managing database changes by hand is incredibly time consuming and error prone, even with a dedicated ‘guardian’ responsible for the schema. With a single developer on a project, not having a simple way to manage and track changes is still awkward.

The problem comes from the fact that as a system evolves, the database changes. These changes need to be communicated to the rest of the team so that when they run their local build, or less likely their integration tests, the application doesn’t break because of a missing column or table. Some bright person might have the idea of a single database that everyone works on, so there is only one version of the database. Great, problem solved.

That is, until someone wants to make a change. Not having much alternative, they edit the shared database, and suddenly everyone’s local build breaks. Or multiple people debug at the same time against the shared database, which practically guarantees strange behaviour or deadlocks. Even worse, if you do have a ‘guardian’, you might be waiting until next Tuesday to apply your urgent change, crippling your productivity.

Shared database proponents will have you believe that there is data in the database that is needed for the application to run. These people are fools, and need to open their eyes to the wonders of unit tests and integration tests. It is the application logic we’re developing, not the data. There is likely some data that is required for an application to run (AKA static data). This can be dealt with in other ways.

If you go for the shared database approach, how do you know what version of the schema you are at? How confident are you that Johnny hasn’t gone and changed the data type on that column, or deleted some critical lookup data? There are few guarantees working in that way.

Finally, how do you effortlessly deploy the database to a variety of different environments? The first time is easy, just do a backup and restore (barf, don’t take this as advice). But what then? The different environments will have data in them so you can’t backup/restore any more. So you’re left with a change script, either manually written or generated by a tool. The latter is probably OK for some people and might work for you, but it is still quite a manual process, and it doesn’t address the previous problems anyway.

In summary, it is borderline criminal to use a shared database for development. By ‘shared database’ I mean anything other than a local database dedicated to me. It is equally criminal not to source control your database.

What’s The Solution?

There are a few key requirements that I think need to be addressed to solve these problems. The solution needs to be:

  • Simple
  • Deployable to multiple environments
  • Able to either re-create or upgrade the database
  • Command-line runnable (for builds, scripting and integration testing etc.)

I’ve given VSTS Database Edition a good go and while it is a decent solution in general, I really dislike the deployment mechanism. It is difficult to deploy the database outside of the IDE, which is why the rest of this post is not about it.

My solution follows K. Scott Allen’s guidance on versioning databases. It is a custom implementation that borrows many ideas from the Tarantino project. The required setup for the Tarantino deployer is less than elegant and seems to depend on both NAnt and Red Gate SQL Compare, neither of which we use internally.

How It Works

Every set of changes to our database schema is scripted in a separate file. If a developer wants to add two tables and delete a column, they can create a SQL script of these commands. The script must exist in a specific folder and follow the naming convention XXX_SomeDescription.sql where XXX is a manually incremented number. This script is checked in to source control in the same way as any other file. Stored procedures, views and triggers are scripted separately to a different directory. Static data is scripted as simple conditional insert/update statements in a third directory.

The tool itself is a command line executable that has two modes of operation – reset and upgrade. Reset drops the database entirely and rebuilds it from the change scripts. It also adds a log table to the database to track what the current version is. Upgrade looks at this log table and executes only the change scripts that haven’t been executed. All the scripts are run transactionally so if any fail, they roll back and further execution halts. One rule of this system is that developers never edit a script once it’s checked in.
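
To make the upgrade flow concrete, here is a rough C# sketch of it (this is not the actual tool; the MigrationLog table, its columns and the error handling are illustrative, and it glosses over details such as scripts containing GO-separated batches):

using System.Data.SqlClient;
using System.IO;
using System.Linq;

public static class MigrationRunner
{
    public static void Upgrade(string connectionString, string migrationDirectory)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            // The log table records which change scripts have already been applied
            var currentVersion = GetCurrentVersion(connection);

            // Scripts follow the XXX_SomeDescription.sql convention, so the leading
            // number gives us both the ordering and the version
            var pendingScripts = Directory.GetFiles(migrationDirectory, "*.sql")
                .Select(file => new
                {
                    File = file,
                    Version = int.Parse(Path.GetFileName(file).Split('_')[0])
                })
                .Where(script => script.Version > currentVersion)
                .OrderBy(script => script.Version);

            // Run everything in one transaction: if any script fails, the whole
            // upgrade rolls back and execution halts
            using (var transaction = connection.BeginTransaction())
            {
                foreach (var script in pendingScripts)
                {
                    Execute(connection, transaction, File.ReadAllText(script.File));
                    Execute(connection, transaction, string.Format(
                        "INSERT INTO MigrationLog (Version, AppliedOn) VALUES ({0}, GETDATE())",
                        script.Version));
                }
                transaction.Commit();
            }
        }
    }

    private static int GetCurrentVersion(SqlConnection connection)
    {
        using (var command = new SqlCommand("SELECT ISNULL(MAX(Version), 0) FROM MigrationLog", connection))
            return (int)command.ExecuteScalar();
    }

    private static void Execute(SqlConnection connection, SqlTransaction transaction, string sql)
    {
        using (var command = new SqlCommand(sql, connection, transaction))
            command.ExecuteNonQuery();
    }
}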

So, how does this solution support multiple environment configurations? Each environment is configured in a configuration file named MyConfiguration.deploy. These files look something like this:

Server: (local)
Database: XYZ
IntegratedSecurity: true
MigrationDirectory: ..\..\Source\Database\Migrations
DataDirectory: ..\..\Source\Database\Data
ProgDirectory: ..\..\Source\Database\Programmability
SuppressPrompts: true

Running The Tool

When running the tool, you specify a configuration to deploy and it looks in the appropriate .deploy file for the settings. The default configuration is specified in the tool’s deployer.exe.config, meaning the most common deployment (presumably the local one) can be executed just by running the exe. The command line options are below:

deployer.exe [/reset] [/version:X] [Configuration]

/reset - Resets the database (default is upgrade)
/version:X - Deploys up to version X (default is latest)
Configuration - The Configuration to deploy (default in deployer.exe.config)

This can very easily be executed from anywhere. The process to change the database is to add a new change script and run the executable. Simples.

[Image: a typical deployer.exe session showing a reset to version 3 followed by an upgrade]

This is a typical session for the tool. It’s not particularly fancy, but it gets the job done. The first command will drop and recreate the Development database up to version 3. The second command upgrades the database from there. Notice how the views and stored procedures are recreated both times.

[Image: the migration log table recording each applied change script]

This is the migration table within the database, providing all the required information for versioning.

Integration Testing and CI Builds

For integration testing we just run something akin to the below in the assembly SetUp method.

Process.Start("deployer.exe", "/reset Local_Testing").WaitForExit();

Therefore, every time we run our integration tests, we get a fresh database to test with. This could also be used for automated deployments from the CI server, but I prefer to manually execute database changes to non-development environments as they are often more involved. This is clearly an area for improvement though.
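
Fleshed out a little, an assembly-level fixture along these lines does the job (a sketch assuming NUnit 2.x; the fixture name and the exit-code check are my own additions):

using System;
using System.Diagnostics;
using NUnit.Framework;

[SetUpFixture]
public class DatabaseSetup
{
    [SetUp]
    public void ResetDatabase()
    {
        // Rebuild the Local_Testing database from scratch before any tests run
        var deployer = Process.Start("deployer.exe", "/reset Local_Testing");
        deployer.WaitForExit();

        if (deployer.ExitCode != 0)
            throw new Exception("Database reset failed with exit code " + deployer.ExitCode);
    }
}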

Summary

Writing a deployment mechanism like this is really straightforward. If you have a decent knowledge of ADO.NET and follow convention over configuration it shouldn’t take more than a day or two to write. If you don’t have a solution for database source control at the moment consider the existing solutions such as VSTS Database Edition, Tarantino or Migrator.Net to name but a few; or, if you fancy stretching your skills, throw something like this together. I enjoyed the challenge.

Sunday, 1 November 2009

My poor interpretation of Command-Query Separation

Bertrand Meyer’s definition of command-query separation (according to good ol’ Wikipedia) is:

Every method should either be a command that performs an action, or a query that returns data to the caller, but not both.

I like the separation of concerns that this brings to the table and the implicit guarantee that performing a query will never, ever, change the state of the system. I can run my little query methods until the cows come home and I know that all the data is safe and consistent. I like that.

Having this clear separation also helps define the boundaries of ‘must test’ and ‘should test’. Any commands that change the system state are a must test, because bugs in them are likely to leave the system state inconsistent or invalid. It is still important to test the queries, just not as important (though they can’t be neglected, as the queries will eventually drive the commands).

I’ve taken these principles and created a simple architecture around them for one of my personal projects. For the command side of things I create a class that contains the data required to perform the command. I also define a command handler with the command as a generic type.

public class User
{
    public virtual Guid Id { get; set; }
    public virtual string Name { get; set; }
    public virtual string Email { get; set; }

    // NHibernate needs a parameterless constructor to materialise entities it loads
    protected User() { }

    public User(string name, string email)
    {
        Name = name;
        Email = email;
    }
}

public class NewUserCommand
{
    public string Name { get; set; }
    public string Email { get; set; }
}

public interface ICommandHandler<TCommand>
{
    void Handle(TCommand command);
}

public class NewUserCommandHandler : ICommandHandler<NewUserCommand>
{
    public void Handle(NewUserCommand command)
    {
        using (var session = NHibernateHelper.SessionFactory.OpenSession())
        using (var tx = session.BeginTransaction())
        {
            var user = new User(command.Name, command.Email);
            session.Save(user);
            tx.Commit();
        }
    }
}

In order to invoke this command handler, I publish commands to a Command Bus. The code below uses Castle Windsor in order to resolve the correct command handler.

public interface ICommandBus
{
    void Send<TCommand>(TCommand command);
    void SendAsync<TCommand>(TCommand command);
}

public class CommandBus : ICommandBus
{
    public static void Configure(IWindsorContainer container)
    {
        container.AddComponentLifeStyle<ICommandBus, CommandBus>(LifestyleType.Transient);

        container.Register(
            AllTypes
                .FromAssemblyContaining(typeof(CommandBus))
                .BasedOn(typeof(ICommandHandler<>))
                .WithService.FirstInterface()
                .Configure(x => x.LifeStyle.Transient));
    }

    public void Send<TCommand>(TCommand command)
    {
        var commandHandler = IoC.Container.Resolve<ICommandHandler<TCommand>>();
        commandHandler.Handle(command);
    }

    public void SendAsync<TCommand>(TCommand command)
    {
        var commandHandler = IoC.Container.Resolve<ICommandHandler<TCommand>>();
        ThreadPool.QueueUserWorkItem(x => commandHandler.Handle(command));
    }
}

The reason for having both a synchronous Send and an asynchronous SendAsync is that often I will want to wait for a command to complete before I query the updated model. If I don’t want to query the updated model, I can ‘fire and forget’ the command.

The usage of the command bus from an MVC controller is as simple as:

CommandBus.Send(new NewUserCommand
                    {
                        Name = "Joe Bloggs",
                        Email = "joe.bloggs@fake.com"
                    });
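
And when I don’t need to read the updated model afterwards, the fire-and-forget version is just as simple:

CommandBus.SendAsync(new NewUserCommand
                    {
                        Name = "Joe Bloggs",
                        Email = "joe.bloggs@fake.com"
                    });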

The only responsibility the controller has is to publish the correct commands. This makes testing a breeze. It is also pretty much the embodiment of the Open-Closed Principle; if I want to add new features to the system, I create new commands and command handlers.
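
For example, supporting a (hypothetical) change-of-email feature would just mean adding a new command and handler alongside the existing ones, following exactly the same pattern:

public class ChangeUserEmailCommand
{
    public Guid UserId { get; set; }
    public string NewEmail { get; set; }
}

public class ChangeUserEmailCommandHandler : ICommandHandler<ChangeUserEmailCommand>
{
    public void Handle(ChangeUserEmailCommand command)
    {
        using (var session = NHibernateHelper.SessionFactory.OpenSession())
        using (var tx = session.BeginTransaction())
        {
            // Load the existing user; NHibernate persists the change when the transaction commits
            var user = session.Get<User>(command.UserId);
            user.Email = command.NewEmail;
            tx.Commit();
        }
    }
}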

I’ve only used this approach for a handful of commands so far, but it works pretty nicely. I’ll detail the querying aspect of my architecture in another post.

Sunday, 25 October 2009

Finding time to blog (and an unrelated rant)

I can’t seem to carve out the time to get a blog post done (I know this is a bit ironic). It’s not like I don’t have the time, it just doesn’t appear to be very high on my list of priorities. I need to put more effort in, as I think it will be a fantastic chance for me to learn and develop. Therefore, I’m going to commit to writing a post once a fortnight (for now) until I get used to it, starting 1st November.

//Start Rant

On a completely unrelated note, I was doing some research into the current world financial situation. I was truly shocked to discover the true size of the current US (and hence world) debt. When including derivatives, I’ve seen figures between 200 and 1,500 trillion dollars. That’s 4 to 30 times the GDP of the entire world! The world is in a gargantuan hole, and in my opinion it’s been almost exclusively caused by big banks and their invisible mass fraud. The hole is so big, governments are powerless – indeed, governments are at the mercy of banks, praying they don’t burst the bubble. Obama promised so much in his campaign rhetoric, but he’s in bed with the same elite group that put us in this situation. He could act bravely, instead of feeding the fire with silly amounts of tax payer funded ‘recovery’ money, but he won’t.

The recent talk of a recovery is nonsense; there’s no recovery from this, at least not in the long term. I expect that within the next decade, possibly the next few years, we’ll see hyperinflation and the collapse of governments. I hope I’m very wrong.

//End Rant

Thursday, 1 October 2009

First impressions of Git

Git is a distributed version control system. After reading about it on and off for probably over a year, and seeing many open source projects migrate to it, I thought I owed it to myself to give it a go. So I downloaded msysgit, opened Git Bash and started typing commands.

We use TFS at work, and compared to that the Git model is a breath of fresh air. The single working directory is the biggest thing to hit me so far – I hate how with TFS, branching means creating a second copy of the same code on your hard drive and then checking it in. What a waste of time, effort and disk space! Creating a branch with Git doesn’t change any of your files, so it’s instant. Switching between branches is also extremely fast and easy; I’d dread having to try that with TFS.

It’s also so quick. There’s none of the 1-2s delay you get in TFS when you start editing a file (that is so freaking annoying). Why can’t TFS let me edit a cached copy and check the file out in the background? Maybe warn me if it can’t connect and my changes might be lost. At least then it wouldn’t break my flow. It’s hard enough to get into the zone as it is; I hate it when my tools get in the way.

I’ve only had the chance to play with it for a couple of hours and use the basic commands, but from what I’ve seen so far it’s very promising. The only concern I have is what the merge conflict resolution process is like; I’ll have to try some complex branching and merging to get a feel for that.

There is an excellent article by Ryan Carson on Git: Why You Should Switch from Subversion to Git. Also, the book Pro Git by Scott Chacon (one of the guys behind GitHub) is available online.

Watch out TFS, Git’s a coming.