Tuesday 4 June 2013

From Continuous Delivery to Continuous Quality Delivery



Everybody is talking about Continuous Delivery (CD) these days. At most of the conferences I attended, CD was a topic discussed amongst developers and DevOps. Having worked on a couple of projects recently, I made some interesting observations. I think that CD is not just about DevOps being involved, and QA is not just about testing. It is in fact a confluence of both those roles and many more.

Continuous Quality Delivery means making the application available to the end user in a stable state as often as possible, while delivering some business value at the same time.

As a QA I can contribute to the CD process in many ways, thus transforming it into Continuous Quality Delivery. The objective is to avoid defects in the first place, or at least to find and fix them as early in the development cycle as possible.

When I joined the team, I noticed that we were trying to achieve CD, but the release process itself was too long: it took about two days to get something out into production.
I took the initiative of getting people together to refine the release process. After a few facilitated sessions spread over a couple of weeks of continuous improvement, we got to a place where we could actually release a working piece of software in a couple of hours.

In a typical delivery team it is not always possible to make every check-in committed by a developer production ready, so it is a good idea to use feature toggles. (I will leave why feature toggles are better than release branches for a separate discussion.) Feature toggles are generally used for different reasons: to hide incomplete features, or to observe user behaviour by toggling a completed feature on or off.
This is a powerful technique for controlling the state of the application, but it needs continuous maintenance: once a feature is complete, the toggle has to be removed along with its code and associated tests. As a QA I monitored from the beginning which stories would need toggles and which toggles needed to be removed, thus keeping the quality of the application and the code base high.
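To make the maintenance point concrete, here is a minimal sketch of how a toggle check might look in Java. The toggle names and the in-memory map are hypothetical; real projects would usually load toggle state from configuration or use a library such as Togglz.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FeatureToggles {

    private final Map<String, Boolean> toggles = new ConcurrentHashMap<>();

    // Toggle states would normally come from configuration, not be hard-coded.
    public FeatureToggles() {
        toggles.put("newCheckoutFlow", false);      // incomplete feature, hidden in production
        toggles.put("recommendationsPanel", true);  // complete feature, switched on for users
    }

    public boolean isEnabled(String feature) {
        return toggles.getOrDefault(feature, false);
    }
}

// Usage at the point where the feature is rendered or invoked:
//   if (toggles.isEnabled("newCheckoutFlow")) { renderNewCheckout(); }
//   else { renderLegacyCheckout(); }
// Once "newCheckoutFlow" is live everywhere, the toggle, the else-branch and
// their tests should all be deleted -- that is the maintenance cost described above.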
I also had conversations with the BAs to see if we could write and line up stories in such a way that we did not need toggles in the first place, keeping toggle maintenance to a minimum. We could do this by playing stories so that the backend functionality was developed first, before moving on to the front-end visible stories.

Continuous Quality Delivery or otherwise, it is a good practice for a QA to pair with the BAs in story reviews to bridge gaps in requirements, identify test data and identify additional test cases.
Similarly, it is beneficial to interact constantly with developers, ensuring a good level of automated test coverage and finding defects while they are still developing. I found it immensely useful to attend tech huddles, dev analysis and code reviews to understand the code and implementation better. This enabled me to think of testing scenarios beyond just black-box testing. I also found it necessary to write as many automated tests as we could to minimise manual testing and regression time.

Build time plays a significant role in how quickly a developer can commit code and how soon it can go through the different environments to production. There came a time when the team started feeling the pain of increased build times. I started tracking the build time and making it visible to the team. I then got together with the operations people to add more build agents so that we could run the tests in parallel. This surfaced a new challenge when we found out that some of the tests were dependent on each other, so I paired with the developers to make the tests autonomous. This helped us reduce the build time.
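As an illustration of what "autonomous" means here, a hedged JUnit sketch follows; UserRepository and User are hypothetical stand-ins for the real domain. Each test creates and cleans up its own data, so tests no longer depend on each other's leftovers and can safely be split across parallel build agents.

import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.assertTrue;

public class UserSearchTest {

    private UserRepository repository;
    private User testUser;

    @Before
    public void setUp() {
        repository = new UserRepository();
        // Each test creates its own uniquely named data instead of relying on a
        // record left behind by another test, so execution order no longer matters.
        testUser = repository.create("user-" + System.nanoTime());
    }

    @After
    public void tearDown() {
        repository.delete(testUser); // leave no shared state behind for other tests
    }

    @Test
    public void findsUserByName() {
        assertTrue(repository.search(testUser.getName()).contains(testUser));
    }
}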

The QAs were constantly trying to keep the manual regression test suite to a minimum to reduce manual testing time. While newer tests were being added to the suite, older tests were being discarded, although scenarios covering core business functionality were always retained.

The QAs were using a Blue-Green deployment strategy to release the application into production. This ensured that we had almost no downtime and also gave us the confidence of testing on a "production" environment before it turned "live".

The QAs were also holding retrospectives to make sure that, as a role, we were always improving our contribution to Continuous Quality Delivery.

The project was a grand success, and the road map in this area was to go from weekly releases to daily releases.





Wednesday 29 May 2013

Performance does matter

Assumptions -- This is my experience on a Web based Java application. There will be a separate post around Stress and Load testing.

When you start off on a project, it is very easy to forget about performance testing and its significance.
I tried my best to sell the idea of performance testing to the business but failed initially. The lesson I learned was to talk in the language of the stakeholders. Performance testing is not something we do at the end. By monitoring application performance at regular intervals, it is easier to analyse and fix the code. It also educates us to avoid the same mistakes in the future. Thus we can avoid big-bang performance testing and fixing right before the release. Stakeholders need to be informed about these advantages of continuous performance testing.

There are two types of performance testing namely "front end" and "back end".
Front end performance deals with how different browsers respond to the scripts. It analyses the page load / wait times of different components of the page, which translates into user behaviour. Statistics state that on average a user is happy to wait around four seconds for a page to load; anything over ten seconds and you start losing potential customers.
Back end performance deals with how different services and servers cope with the load. Database performance is also covered in this area.
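As a concrete example of front-end measurement, here is a hedged sketch using Selenium WebDriver and the browser's Navigation Timing API; this is not necessarily the tooling you would pick, and the URL is a placeholder.

import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class PageLoadTimer {
    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("https://example.com"); // placeholder URL
            // Navigation Timing: time from navigation start to the load event firing.
            long loadMillis = (Long) ((JavascriptExecutor) driver).executeScript(
                "return window.performance.timing.loadEventEnd"
                    + " - window.performance.timing.navigationStart;");
            System.out.println("Page load took " + loadMillis + " ms");
        } finally {
            driver.quit();
        }
    }
}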

We can run performance tests at different levels, namely application, service and unit level. I treat these tests similarly to how I would treat my functional tests, and as with automated tests, the higher the level of the test, the more difficult it gets to analyse the problem.

Different tools allow you to run these tests, and the tools may vary based on the programming language you are using.
For example, JProfiler can be used to monitor JVMs and the application at a lower level (classes and methods), or dotTrace in the .Net world.
We can then test the application at a service / API level (perhaps using tools like JMeter to generate the load).
At the top level, we hit the application through the web interface, or even headless for starters. I have used Browsermob (Neustar), or VSTS if you are a .Net addict.
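For the service / API level, here is a hedged, minimal load-generation sketch in plain Java. In practice a tool like JMeter does this (and far more) for you; the endpoint URL, user count and request count are placeholders.

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SimpleLoadTest {

    static long timeOneRequest(String target) throws Exception {
        long start = System.nanoTime();
        HttpURLConnection conn = (HttpURLConnection) new URL(target).openConnection();
        conn.getResponseCode();          // wait for the response status
        conn.disconnect();
        return (System.nanoTime() - start) / 1_000_000; // milliseconds
    }

    public static void main(String[] args) throws Exception {
        final String target = "http://localhost:8080/api/search"; // placeholder endpoint
        final int requestsPerUser = 50;
        int concurrentUsers = 20;

        ExecutorService pool = Executors.newFixedThreadPool(concurrentUsers);
        List<Future<List<Long>>> futures = new ArrayList<>();
        for (int u = 0; u < concurrentUsers; u++) {
            futures.add(pool.submit(new Callable<List<Long>>() {
                public List<Long> call() throws Exception {
                    List<Long> timings = new ArrayList<>();
                    for (int i = 0; i < requestsPerUser; i++) {
                        timings.add(timeOneRequest(target));
                    }
                    return timings;
                }
            }));
        }

        List<Long> all = new ArrayList<>();
        for (Future<List<Long>> f : futures) {
            all.addAll(f.get());
        }
        pool.shutdown();

        long total = 0;
        for (long t : all) total += t;
        System.out.println(all.size() + " requests, average " + (total / all.size()) + " ms");
    }
}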

Running performance tests from your local machines limits you to the performance of those machines themselves, so it is recommended to run them in the cloud. You could fire the test scripts so that they run from different build agents or virtual machines.

We need to make sure the performance test environments are configured and set up to be production-like. It is a common belief that having a production-like environment is very expensive. The alternative solution is to fire up virtual machines. This worked for me, as in some cases the production environments were also hosted on virtual machines.
It is highly beneficial to have the performance test environments on independent infrastructure. If they are in a shared environment, then we need to analyse how much and how often the other systems will impact the performance of the application.
Because our application was hosted on virtual machines, we had the ability to scale up the machines and services to observe performance improvements.

We had to make sure that the performance test scripts we wrote mimicked actual user behaviour. For application-level tests we compared statistics between the beta and the legacy systems, which helped us configure our tests accordingly. We ran these tests from Browsermob (Neustar), but monitoring was done through internally built tools using Graphite and Gdash.
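Our monitoring tooling was internal, but to illustrate how little it takes to feed numbers into Graphite, here is a hedged sketch that pushes one measurement over Graphite's plaintext protocol (carbon typically listens on port 2003); the host name, metric path and value are placeholders.

import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

public class GraphiteReporter {

    public static void send(String host, int port, String metric, double value) throws Exception {
        long epochSeconds = System.currentTimeMillis() / 1000;
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream())) {
            // Graphite plaintext format: "<metric.path> <value> <timestamp>\n"
            out.write(metric + " " + value + " " + epochSeconds + "\n");
            out.flush();
        }
    }

    public static void main(String[] args) throws Exception {
        send("graphite.internal", 2003, "perf.beta.search.response_ms", 412.0); // placeholders
    }
}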

What and how you monitor plays a significant role.
From the front end perspective, the things you typically need to monitor are the response time (how long it takes to serve a request) and the throughput (how many requests it can serve per second). If you have a legacy system, you can compare these statistics generated from production with the stats on the performance environment.
Ideally the monitoring tools should sit as close to the application servers as possible. If the monitoring tools are too far away in space and time, you will not have accurate real-time statistics. If the tools are in the same environment as the system under test, that too may affect the performance of the environment.
Common issues are memory leaks and the CPU reaching its limits. So from the back end perspective, the things you will find helpful to monitor are CPU usage and memory across the various machines and services.
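As a small worked example, here is how raw response times might be reduced to the two headline numbers, throughput and a high percentile of response time. It is only a sketch: the percentile uses the simple nearest-rank method, and the inputs come from whatever load run you captured.

import java.util.Collections;
import java.util.List;

public class PerfStats {

    // latenciesMs: one entry per request; durationSeconds: how long the run lasted
    public static void report(List<Long> latenciesMs, long durationSeconds) {
        Collections.sort(latenciesMs);
        double throughput = latenciesMs.size() / (double) durationSeconds;
        // Nearest-rank 95th percentile of the sorted response times.
        long p95 = latenciesMs.get((int) Math.ceil(latenciesMs.size() * 0.95) - 1);
        System.out.printf("throughput: %.1f req/s, 95th percentile: %d ms%n", throughput, p95);
    }
}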

Ideally you should not take network lag into account as part of the metrics: there is only so much you can do about it, and it can hide the actual problems in the system under test. Hence, run the tests from the cloud servers located closest to your environments.

Once you have run the performance tests and captured the results, it is time to tackle the problem. Finding the bottleneck and fixing it may result in another bottleneck surfacing.

While performance testing, change one thing at a time. For example, try not to change the script and the application version at the same time; if you then see any discrepancies, you do not know whether it was the application version or the script that caused them. Hence it is a good idea to create baselines for every change you make and take baby steps towards the end result.

One good practice is to run automated performance test scripts as part of your Continuous Integration pipeline. Taking this approach, we had to set a performance threshold beyond which the build would fail. This gives us fast feedback, just like any other functional tests.
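A hedged sketch of what such a threshold check could look like as a JUnit test in the pipeline; the budget figure and the way the measured value is obtained are placeholders, not what our project used.

import org.junit.Test;
import static org.junit.Assert.assertTrue;

public class PerformanceThresholdTest {

    private static final long P95_BUDGET_MS = 800; // budget agreed with the team/stakeholders

    @Test
    public void ninetyFifthPercentileStaysWithinBudget() {
        long measuredP95 = loadLatestP95FromResults();
        assertTrue("95th percentile " + measuredP95 + " ms exceeds budget of " + P95_BUDGET_MS + " ms",
                measuredP95 <= P95_BUDGET_MS);
    }

    private long loadLatestP95FromResults() {
        // Hypothetical: in practice this would parse the most recent load-test report.
        return 650;
    }
}

If this test fails, the build fails, and the team gets the same fast feedback they would get from a broken functional test.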


Some lessons learned:

1) Something which helped increase the performance of the system was caching. We should have some level of intelligent caching; too much caching will not help either, as users would start seeing stale information. Caching can be considered at three different levels: the Content Delivery Network (CDN) level, the application level, or even the service / database level.
While coding we should make sure that static content is cached at some level; in our case the static content was being served from one of the services, which was eating up all of its memory (a sketch of one approach follows after this list).

2) Services and databases should not be doing background tasks while their primary purpose is to serve the customer. Having separate instances of these services do the background tasks decreases the risk of performance problems.

3) Sometimes database queries too can be expensive. Monitoring these queries and trying to minimise the calls to the database is an effective way of increasing system performance (see the caching sketch after this list).

4) There may be instances where some sections of a page make extra calls to the backend systems to retrieve additional information, even though not all end users require it. It is better to analyse these cases and wrap those sections so that the additional calls are made only when a user genuinely wants that information (a small sketch of this also follows below).
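For point 1 above, here is a hedged sketch of one way to mark static content as cacheable in a Java web application, using a servlet filter that sets a Cache-Control header; the max-age and the mapping are illustrative, not what the project actually used.

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletResponse;

public class StaticContentCacheFilter implements Filter {

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        HttpServletResponse httpResponse = (HttpServletResponse) response;
        // Allow shared caches (a CDN) and browsers to keep static assets for a day.
        httpResponse.setHeader("Cache-Control", "public, max-age=86400");
        chain.doFilter(request, response);
    }

    @Override
    public void init(FilterConfig filterConfig) { }

    @Override
    public void destroy() { }
}

The filter would be mapped to the static asset paths (for example /static/*) in web.xml or with @WebFilter, so repeat requests are served by the CDN or the browser rather than by the service that was running out of memory.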
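For point 3, one hedged way to cut repeated identical calls to the database is a short-lived in-memory cache in front of the expensive query. ProductRepository and Product are hypothetical, and a library cache (Guava's, for example) would usually be preferable to hand-rolling one.

import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CachedProductLookup {

    private static final long TTL_MILLIS = 60_000; // results considered fresh for a minute

    private final ProductRepository repository;
    private final ConcurrentMap<String, CacheEntry> cache = new ConcurrentHashMap<>();

    public CachedProductLookup(ProductRepository repository) {
        this.repository = repository;
    }

    public List<Product> findByCategory(String category) {
        CacheEntry entry = cache.get(category);
        if (entry == null || entry.isStale()) {
            // One expensive query per category per minute instead of one per request.
            entry = new CacheEntry(repository.findByCategory(category));
            cache.put(category, entry);
        }
        return entry.products;
    }

    private static final class CacheEntry {
        final List<Product> products;
        final long createdAt = System.currentTimeMillis();

        CacheEntry(List<Product> products) { this.products = products; }

        boolean isStale() { return System.currentTimeMillis() - createdAt > TTL_MILLIS; }
    }
}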
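And for point 4, a hedged sketch of the "load it only when asked" idea: the main page stops embedding the extra backend call, and a separate endpoint like this one is hit only when the user expands that section of the page. The reviews service and all names here are hypothetical.

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ProductReviewsServlet extends HttpServlet {

    private final ReviewsService reviewsService = new ReviewsService(); // hypothetical backend client

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException {
        String productId = request.getParameter("productId");
        // This backend call now happens only for the users who open the reviews panel.
        response.setContentType("application/json");
        response.getWriter().write(reviewsService.fetchReviewsAsJson(productId));
    }
}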



Note: Browsermob has its own API through which we can write the performance test scripts. We can write two types of scripts: one which runs the test through the browser, and the other which is headless. The costs differ based on which one you run.

P.S. It is tricky to write performance tests with sessions and logins in them.