Medical records and the new playing field

January 31, 2010 Indraneel

Recently one of our clients was looking to build a product in the healthcare domain that involved accessing and modifying medical records. As a part of the initial engagement I did a comparative study of a few EHRs, EMRs and PHRs. There are just too many of them now, each targeted at a market of its choice. What impressed me is the number of integrations that are possible with some of these systems. The good ones come with a drug database, insurance claims processing, ePrescription integration and a host of others. However, what really struck me is the fact that very few of them use modern application delivery methods. In fact, a lot of them are still stuck in desktop application/client-server mode. Some have the ASP model. Few, very few, have made the transition to SaaS. Needless to say, these systems need to head in the direction the world is headed.
The electronic revolution in medical instruments has been going on for a very long time now. But the rise in the use of consumer medical instruments is pretty recent. Today we have blood glucose and blood pressure monitors in almost every home. And this got me thinking: how cool would it be if these devices could automatically send their measurements to, say, the PHRs? Yes, we are looking at a new playing field if you are thinking what I’ve been thinking.

Categories: Healthcare

Getting started with the Cloud

January 25, 2010 Indraneel

There has been so much hoo-ha about the cloud in recent days that it’s really difficult to filter out the noise. A lot of times I get this question from people: “So how do we start using the cloud?” With the slew of cloud vendors and underlying technologies, it’s really pretty easy to start using the cloud in our day-to-day work. In this post I will talk only about the IaaS cloud, Amazon cloud components in particular. But you can choose any vendor you like.

Rapid provisioning and disposal of engineering environments

 
One of the biggest advantages of the cloud is that you can fire up an instance in the blink of an eye. OK, maybe not in the blink of an eye, but if you are using Amazon EC2, it takes about 3-5 minutes to fire up an instance. That is probably umpteen times faster than your IT department provisioning a server for you. It gives us the opportunity to rapidly provision engineering environments like testing, staging etc. at will. It also saves a great deal of money if you throw away the instance when you don’t need it. Say you are towards the end of the first sprint of the first release of the product. The product owner needs to see a demo of the working product at the end of the sprint. He would also like to play with the product a little. But you don’t want to give out your testing environment, because the product owner and the testers would step on each other’s toes. So what do you do? You fire up another instance of the AMI that you saved with all the configuration, prerequisite software and libraries needed to run the product, and deploy the finished product to that instance right before the day of the demo. How much time would it take? Not more than 3-5 minutes for launching the instance, and maybe 10 minutes for deploying the product if you have set up automated deployment.
When the product owner is done playing around, you can just bring down the instance with the click of a button. It is really that simple.
[Figure: Cloud - Rapid provisioning of engineering environments]
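The provision-and-dispose cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names, the default instance type and the AMI id are hypothetical, and `ec2` is expected to be an EC2 client such as the one returned by `boto3.client("ec2")` (a third-party package that needs AWS credentials), whose `run_instances` and `terminate_instances` calls are used here.

```python
from typing import Any


def launch_demo_instance(ec2: Any, ami_id: str, instance_type: str = "m1.small") -> str:
    """Start exactly one instance from a saved AMI and return its instance id."""
    response = ec2.run_instances(
        ImageId=ami_id,              # the AMI saved with all prerequisites installed
        InstanceType=instance_type,  # hypothetical default, pick what the product needs
        MinCount=1,
        MaxCount=1,
    )
    return response["Instances"][0]["InstanceId"]


def dispose_instance(ec2: Any, instance_id: str) -> None:
    """Terminate the instance once the demo is over."""
    ec2.terminate_instances(InstanceIds=[instance_id])
```

With a real client this would be used as `demo_id = launch_demo_instance(client, "ami-12345678")` before the demo and `dispose_instance(client, demo_id)` afterwards. Passing the client in as a parameter keeps the helpers easy to try out against a stub before pointing them at a real account.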

Deployment of several versions of products

 
In order to keep development work on groups of features independent of each other for one of the products that we were building, we created branches on which groups of developers would work. This is an excellent strategy if you have short development cycles, because if there is a group of features that need to go into the product simultaneously, you can have a (relatively) long-running branch. However, these branches have to be tested independently. Also, just before the release, you will have to do away with these branches, deploy the trunk after the merge has happened and test that too. So you will need different versions of the product, residing in separate branches, deployed at the same time. And that’s where the cloud comes in handy, because you can provision and dispose of environments in a jiffy.
[Figure: Cloud - Automated Deployments]

Saving application state

 
Have you ever faced the situation where you found a huge bug in a product but the developers could not replicate it and marked the bug as invalid? I bet you have. A lot of times even saving the bug as a video, as mentioned in one of my posts, does not help unearth the real problem. That’s because, most of the time, these issues are embedded in the environment or the data. How good would it be if you could freeze the entire state of the application (session, data, environment and all), bundle it all up and give it to the development team? Some developers will hate you for that, but I’m sure they’ll thank you later. One of the biggest advantages the cloud gives us is bundling up the entire environment. If you are using Amazon EC2, you can just bundle up the instance, data, environment, session and all into an AMI and go on your way of testing the application again. Keep in mind that an AMI captures what is on disk, so session state that lives only in memory has to be persisted before you bundle.
[Figure: Cloud - Saving the application state]

Cloudification of a part of the application services

 
If you are in an organization where you cannot deploy anything outside the organization’s firewalls due to security policy, or if the product itself is such that using public infrastructure like the Amazon cloud is not a viable option (one example could be IP protection), you can still use the cloud. Let me take the example of an application that indexes huge documents or maybe media, among a lot of other things. Now, indexing media is very computationally intensive. And what is the cloud good for if it can’t take up the burden of computationally intensive stuff? You can keep the entire application inside the firewall and host a web service on the cloud which listens for indexing requests, fires up an instance as soon as it gets one and processes the request. Better yet, you can enable Amazon CloudWatch and set thresholds so that more instances are fired up when one instance is thrashing. Simple but powerful.

 
[Figure: Cloudify a part of application services]

As you can see, it’s really easy to start taking advantage of public cloud components. So go ahead and get started today.

Continuous performance testing

January 23, 2010 Indraneel

Once I asked an engineering manager what they were doing to ensure that the product they were building would be able to endure peak load. After a blank stare, all I got was that they were adding features and would worry about performance later, when the development hubbub settled down.
After the product was built, one of the team members ran a performance test and found that a lot of the user actions took an inordinate amount of time to complete even under a concurrent user load of just 5.
Leaving performance considerations until the end of the development cycle is just dangerous. I have seen teams who were forced to make architectural changes right before the release and then scurry to meet release deadlines, because they never found out what the performance impact of an architectural decision they took 4 sprints ago was.
That said, in one of the product teams that I was working with, we found out that some of the user actions were taking an inordinate amount of time only because users complained about it. This is perhaps the last thing that you want. But anyway, we agreed that we needed to know the performance impact of each new feature we pushed out and each enhancement we made. We were doing weekly releases, so we knew that it would require quite a bit of automation. Choosing performance scenarios, creating scripts, running them and analyzing the results was too much to be done manually every week.
Enter Bamboo, JMeter and Gruff. Bamboo would run the performance scripts created in JMeter along with the build every Tuesday before the build got pushed to the staging server. We stored the results for the scenarios in MySQL, created graphs with Gruff and showed them as Bamboo “artifacts” for the Tuesday builds. Yes, I know what you are thinking. Maybe we should have used rrdtool instead of MySQL and Gruff. We tried using rrdtool and gave up, mainly because we wanted to keep all our performance data, not just a window of it. Since we were not running very heavy loads, we decided to generate the load from the Bamboo machine itself, since it was a pretty decent machine. We kept the target machine isolated to make sure nothing else was done on it except performance tests.
A Ruby script would trigger the JMeter scripts just after the build was complete, bundled and deployed on the target server by Capistrano. For result aggregation we used the XSLT provided by JMeter and tweaked it to produce XML instead of HTML. The aggregated data went into the MySQL database. And then we used Gruff to create the graphs and store them as Bamboo artifacts.
Here is a little sample of what it looked like in the end.

[Figure: Performance test with JMeter, Bamboo and Gruff]

With the performance tests running with every Tuesday build, it was now easy to see the performance impact of adding new features or enhancements.
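To give a feel for the aggregation step, here is a rough Python sketch. It is not the tweaked XSLT the team used; it is a stand-in that reads a JMeter XML results file and computes per-scenario numbers that could then be stored and graphed. It assumes the XML result format in which every request is a `sample` or `httpSample` element carrying `lb` (label), `t` (elapsed milliseconds) and `s` (success) attributes; the function name is made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from statistics import mean


def aggregate_jtl(jtl_xml: str) -> dict:
    """Return {label: {"samples", "avg_ms", "max_ms", "errors"}} per scenario."""
    elapsed = defaultdict(list)
    errors = defaultdict(int)
    root = ET.fromstring(jtl_xml)
    # Only top-level samples are counted, so nested sub-samples
    # (redirects, embedded resources) are not counted twice.
    for sample in root:
        if sample.tag not in ("sample", "httpSample"):
            continue
        label = sample.get("lb")
        elapsed[label].append(int(sample.get("t")))  # t = elapsed time in ms
        if sample.get("s") != "true":                # s = success flag
            errors[label] += 1
    return {
        label: {
            "samples": len(times),
            "avg_ms": mean(times),
            "max_ms": max(times),
            "errors": errors[label],
        }
        for label, times in elapsed.items()
    }
```

Running this after every build and appending one row per label and build number to a table is enough to plot a trend line per scenario, which is the part that makes a slow-down visible the week it is introduced.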

Pre-loading the distributed cache

January 22, 2010 Indraneel

We have been using memcache for distributed caching in one of the products that we were working on. We had 4 application servers with 2GB of memory dedicated to memcache on each of the machines. A lot of times when we delivered new functionality or a patch, we needed to restart the memcache servers. And then for a day, the site thrashed. We got horrible response times for a lot of our pages which were supposed to be cached. We dug a little deeper and found out that the cache loader, which was supposed to pre-load the cache, needed about 50 hours to complete. No wonder we were getting horrible response times on pages that were supposed to be cached. What we actually needed was a multi-threaded pre-loader running on two different machines. With some code optimizations and 16 threads pre-loading the pages into memcache, the pre-load took about 3 hours.
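The shape of such a pre-loader can be sketched in Python as below. This is a minimal illustration, not the code we ran: `render_page` is a hypothetical callable that builds the page for a key, and `cache` is any client object with a `set(key, value)` method, which is the interface the common memcache client libraries expose.

```python
from concurrent.futures import ThreadPoolExecutor


def preload_cache(keys, render_page, cache, workers=16):
    """Render every page in `keys` on a thread pool and store it in `cache`.

    Returns the number of pages loaded.
    """
    def load_one(key):
        cache.set(key, render_page(key))
        return key

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all workers and re-raises the first worker exception.
        return len(list(pool.map(load_one, keys)))
```

To spread the work over two machines, each machine can be given a disjoint slice of the key list. Threads help here mostly because building a page spends its time waiting on the database and the network rather than on the CPU.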

More about email deliverability

December 20, 2009 Indraneel

Some time ago I wrote a post about how to send emails without getting flagged as SPAM. Unfortunately I had missed out on a few things.

rDNS

One of the most important things in email deliverability is rDNS (also known as the PTR record). rDNS stands for “reverse” DNS. Here is an example of how rDNS works:

  • A mail header says that the sender is abc@abc.com and it was sent from 111.222.333.444
  • The receiving mail server verifies with an rDNS lookup whether 111.222.333.444 really points to abc.com

Mail servers like AOL and Google are very particular about rDNS, i.e. they check for reverse DNS entries for each mail they receive. Your important newsletter is almost sure to end up in the bulk email folder if you haven’t set up the reverse DNS entry for your domain.
Now if you are hosting your nice web application on Amazon EC2, reverse DNS won’t work for you, because Amazon won’t set your reverse lookup. You will have to use a third-party email service like authsmtp or fastmail to send out mass emails with maximum deliverability.
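The check in the example above can be approximated with Python's standard `socket` module. Treat this as a sketch of the idea only: the function names are invented for the example, real receiving servers apply their own (and differing) policies, and the version here also confirms that the hostname resolves back to the same IP address, which is a common extra step.

```python
import socket


def reverse_lookup(ip):
    """PTR lookup: return the hostname the IP address maps back to."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname):
    """A-record lookup: return every IPv4 address the hostname resolves to."""
    return socket.gethostbyname_ex(hostname)[2]


def passes_rdns_check(ip, sender_domain, reverse=reverse_lookup, forward=forward_lookup):
    """True if the IP has a PTR record inside sender_domain that resolves back to it."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        domain = sender_domain.lower()
        in_domain = hostname == domain or hostname.endswith("." + domain)
        return in_domain and ip in forward(hostname)
    except OSError:
        # No PTR record, or the hostname does not resolve.
        return False
```

The two lookup functions are parameters so the logic can be exercised with canned data; called with the defaults, it performs real DNS queries.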

DomainKeys Identified Mail

DomainKeys Identified Mail (DKIM) is a method for email authentication (as the DKIM website says). Basically DKIM allows the sender of an email to sign the email using public key cryptography. Prominent email services like Yahoo, Gmail and Fastmail implement DKIM. This is how it works in a nutshell:

  • The sender of the email adds a header field named “DKIM-Signature” which contains a digital signature of selected headers and the body of the email message
  • The receiving SMTP server does a DNS lookup and gets the public key for the signing domain
  • It uses the public key to verify the signature, which confirms that the signed content was not altered in transit (nothing is decrypted; the message itself is not encrypted)

You can use JavaMail with DKIM for sending out that important newsletter from your web application.
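To make the moving parts a little more concrete, here is a small Python sketch of the pieces of DKIM verification that need nothing beyond the standard library: reading the tags of a DKIM-Signature header, working out which DNS name holds the public key, and computing the body hash carried in the `bh=` tag. The function names are made up for the example, only "simple" body canonicalization with SHA-256 is handled, and the actual RSA signature check is left out because it needs a cryptography library.

```python
import base64
import hashlib


def parse_dkim_signature(header_value):
    """Split a DKIM-Signature header value into its tag=value pairs."""
    tags = {}
    for part in header_value.split(";"):
        if "=" in part:
            name, value = part.split("=", 1)
            # This sketch simply drops all whitespace (including folding) in values.
            tags[name.strip()] = "".join(value.split())
    return tags


def public_key_dns_name(tags):
    """DNS name whose TXT record holds the signer's public key."""
    return f"{tags['s']}._domainkey.{tags['d']}"


def body_hash(body: bytes) -> str:
    """Value of the bh= tag for 'simple' body canonicalization with SHA-256."""
    # Simple canonicalization: drop trailing empty lines, end with one CRLF.
    canonical = body.rstrip(b"\r\n") + b"\r\n"
    return base64.b64encode(hashlib.sha256(canonical).digest()).decode("ascii")
```

A receiver would compare `body_hash(received_body)` with the `bh` tag, fetch the TXT record at `public_key_dns_name(tags)`, and then verify the `b` signature over the signed headers with that key.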

Sender Policy Framework

Sender Policy Framework (SPF) uses a specially formatted DNS record which specifies which machines can send emails for that domain.

  • For example the owner of abc.com can determine which hosts are allowed to send emails whose sender email address ends in @abc.com
  • Receivers who check SPF can reject messages from unauthorized hosts before receiving the message body

And here’s what the SPF record may look like:

abc.com TXT "v=spf1 ip4:111.222.333.444 -all"
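How a receiver might evaluate such a record can be sketched in Python with the standard `ipaddress` module. This is a deliberately partial evaluator written for illustration: it understands only the `ip4`, `ip6` and `all` mechanisms and silently skips everything else (`a`, `mx`, `include` and friends need DNS lookups). The function name is hypothetical, and because 111.222.333.444 is only a placeholder and not a valid address, the usage below assumes addresses from the documentation range 192.0.2.0/24.

```python
import ipaddress


def check_spf(record: str, sender_ip: str) -> str:
    """Evaluate an SPF record against sender_ip; handles ip4/ip6 and all only."""
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return "none"
    results = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        # A leading qualifier is optional; "+" (pass) is the default.
        qualifier, mechanism = (term[0], term[1:]) if term[0] in results else ("+", term)
        if mechanism == "all":
            return results[qualifier]
        if mechanism.startswith(("ip4:", "ip6:")):
            network = ipaddress.ip_network(mechanism[4:], strict=False)
            if ip in network:
                return results[qualifier]
    # Nothing matched and there was no "all" mechanism.
    return "neutral"
```

For a record such as `v=spf1 ip4:192.0.2.0/24 -all`, a sender inside the listed range is accepted and any other address hits the final `-all` and fails, which is what lets a receiver reject the message before accepting its body.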

White lists, Black lists

  • Blacklists are lists of domain names and IP addresses which are known to send SPAM emails. Basically they are the lists of known offenders.
  • Whitelists are the opposite – lists by which an ISP allows someone to bypass spam filters when sending emails to its subscribers


So... Get Registered

MxToolBox and ReturnPath are both good options

JIRA-git integration

December 19, 2009 Indraneel

I’ve been messing around with JIRA for quite some time now. So when one of our product teams got stuck integrating JIRA with git, I tried to help. There was a JIRA-git plugin over here and it worked fine. Problems arose when the plugin did not show commits made to branches in JIRA. So I went ahead and fixed it. Now the plugin pulls in all the changes from the branches, but unfortunately it does not show the branch name along with the name of the changed file in the JIRA tab. I’ll try and fix that soon. In the meantime you can download and use the modified plugin from this location.

Continuous Integration and Automated Deployment for PHP

January 16, 2009 Indraneel

I had set up Continuous Integration for a few Ruby on Rails products including Workstreamr and PaidInterviews. It was easy with rake, the ci_reporter plugin, rcov and Bamboo. I also used Capistrano for automated deployments after each build. Development of a new product started a few days ago and I was called upon to set up the Continuous Integration platform again. It was different this time though. This new product was to be developed in PHP.
Rake is a general purpose build tool just like ant and maven. And Capistrano is the best deployment tool I have encountered so far. So the choice was easy. This is how I created a rake task to execute all the unit tests and produce test coverage reports with PHPUnit.

Yeah, yeah, I know, I could have done the same thing with a shell script. But when I start creating tasks for database migration and such, things are going to get messy and tracking dependencies wouldn’t be easy with shell script.

PHPUnit can also produce coverage reports. So I included the second task which would produce coverage reports.

Now for the deployment script.

And now I bundle all this up neatly in a shell script and give it to Bamboo. Here is what my build-deploy.sh script looks like:

#!/bin/bash
# Stop at the first failing step so a broken build is never deployed.
set -e
rake build:report_coverage
cap remote deploy

The PC Quest article

January 16, 2009 Indraneel

Some of my comments appeared in the January 2009 edition of PC Quest. The online version of it is available over here.

Categories: Cloud Computing

Cloud Computing – Large scale computing for everyone

November 2, 2008 Indraneel

At the beginning of 2008, the New York Times ingested 405,000 very large TIFF images, 3.3 million articles in SGML and 405,000 XML files mapping articles to rectangular regions in the TIFFs, using Amazon Web Services, Hadoop and some custom code. This data was converted to a more web-friendly 810,000 PNG images and 405,000 JavaScript files containing JSON in less than 36 hours.

Why is this a big deal?

NASA has been computing on a far greater scale than this for a long time. NASA’s weather simulation software is computationally a lot more complex than indexing documents.
So why is this a big deal? It’s a big deal because it was neither NASA nor CERN, not even Google. It was a business that did this without buying a single machine. They rented computing power on the fly. They rented slices of a cloud.

What is cloud computing?

A few days ago a journalist said “There is clear consensus that there is no consensus on what cloud computing is”. I like to think of cloud computing as the commercialization of computing resources like CPU cycles, storage, memory etc just like public utilities like electricity, water or natural gas. At the very core of the cloud is virtualization. Virtualization is a technique in which software is used to completely simulate or emulate hardware.

Types of clouds

I see two distinct categories of clouds that vendors are selling today:

  • Infrastructure as a Service – IaaS vendors sell raw compute power – CPU cycles, memory, bandwidth etc. IaaS clouds are complex but with the complexity comes flexibility. Most cloud vendors allow root access to an instance. And hence, specialized knowledge is necessary to handle such flexibility.
  • Platform as a Service – PaaS refers to those clouds which provide frameworks and infrastructure on which users can build applications. PaaS clouds are built on IaaS clouds. Most PaaS clouds are very restrictive. They generally allow users to build applications on a particular set or sets of technologies. For example Google App Engine allows users to build applications using Python only. Portability is an inherent issue with PaaS clouds, because of the lack of standards in this domain. So if you have built an application using Google App Engine and BigTable you probably won’t be able to port the data to any other cloud without spending a huge amount of time and money.

Inside the cloud

At a very high level clouds are made up of the following layers:

[Figure: Inside the cloud]

  1. At the very bottom is the hardware layer. Many cloud vendors build their clouds out of off-the-shelf server-class hardware. For example Joyent uses Dell servers with quad core Intel processors for their cloud. Plumbing refers to the networking elements in the cloud, with all the fast routers, switches and load balancers connected by fiber optic cabling. Clusters, made up of ordinary server-class machines, make up the skeleton of the cloud.
  2. Storage services refer to the storage provided by the cloud. Most cloud vendors offer SAN or NAS storage. Provisioning is generally on the fly and users can ask for a virtually unlimited amount of storage.
  3. As mentioned earlier, virtualization is at the very core of the cloud. Virtualization has made creating a software machine as a clone of an existing one super fast. Think of the cluster (mentioned earlier) as one mega machine with one host OS managing all its resources. Creating virtual machines with pre-defined CPU and memory is fast and easy. Many vendors like Amazon Web Services use Xen virtualization.
  4. Platform services are a bunch of pre-installed and packaged goodies that a user of the cloud gets whenever an instance of the cloud is brought up. The LAMP stack supported by AWS and Joyent is an example of platform services.
  5. No matter what the vendor says, if it takes more than 10-15 minutes to bring up an instance, then it is not a cloud. The web services layer is the one that enables users to templatize an instance, bring up a new instance from a template, take backups, restore from a backup etc. instantly, as and when needed.

Just in time deployment using the cloud

Deployment of products is messy business. Not so long ago, fledgling organizations had to first calculate the amount of computing resources needed for a launch, translate that into hardware requirements, call up the hardware vendors or the hosting company, wait until they provisioned the hardware and then install and configure the software. Provisioning, installation and configuration took several days. It was a lose-lose scenario for everyone. Product success meant another cycle of calls and provisioning while the users suffered from unresponsive software caused by heavy load. Product failure meant huge losses due to unused hardware.
Not any more with the advent of the cloud. Now product launches can happen at the click of a button with just enough computing resources sitting behind a Virtual IP. The utilization of the resources is closely monitored. New, templatized instances of the cloud are instantiated whenever the threshold for the monitored utilization is reached. The users of the cloud pay for what they use at any instant of time. The users of the product never find it unresponsive, since computing resources are always adequate, just in time to meet the users’ needs.
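The threshold rule at the heart of this can be written down in a few lines of Python. It is a toy version for illustration only: the function name, the 75% and 25% thresholds and the one-instance-at-a-time step are all assumptions, and real monitoring services add cool-down periods and upper limits on top of a rule like this.

```python
from statistics import mean


def target_instance_count(current, cpu_samples, scale_up_at=75.0,
                          scale_down_at=25.0, minimum=1):
    """Return how many instances should run, given recent CPU utilization in percent."""
    average = mean(cpu_samples)
    if average >= scale_up_at:
        return current + 1      # threshold reached: bring up one more templatized instance
    if average <= scale_down_at and current > minimum:
        return current - 1      # idle capacity: dispose of one instance and stop paying for it
    return current
```

Averaging over several samples instead of reacting to a single reading keeps one short spike from launching an instance that is not needed a minute later.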

Large scale computing for all

Until now, the ability to do large scale computing was within the reach of only an elite club of businesses. Google, Amazon and Yahoo were amongst the very few in the club. Few businesses had the means to lay their hands on infrastructure of that scale. Cloud computing has changed that. Today, a ‘large’ Amazon EC2 instance with 4 EC2 compute units (which is equivalent to the capacity of 4 Opteron or Xeon processors) and 7.5 Gigs of memory costs as little as $288 per month. Users can choose from quite a few operating systems and scale up and down on the fly. Application development platforms like JBoss Enterprise Application Platform and Ruby on Rails come built into it. Clouds have opened the doors of large scale computing to virtually everyone.

Back to blogosphere

November 2, 2008 Indraneel

September was a crazy month. PaidInterviews got launched at DEMO and I worked really hard. By the end of September, things eased up. I got some breathing room. The financial crisis had changed the world by then. I took a couple of vacations for Durga Puja and Diwali. And oh boy, did I need them! I’m well rested and focused now. I’ll post a few things very soon.

Categories: Product Engineering