Cloud: Why? (Mind the Gap Series, #1)

We are living in a posh world: there are now many ways to achieve a simple requirement with software, maybe way too many. Too many languages, too many patterns, too many platforms, too many developers, and so on. Now we have the cloud, which increases the speed of delivery while reducing infrastructure costs and complexity. It also has one hell of a learning curve, but that's an article for another day. As mentioned in the intro, I'll be talking about how we implemented a cloud transformation, namely on Azure, at my latest client. They are one of the largest investment companies, with a very large (and until recently, very complex) organisational chart and offices all over the world.

Imagine the complexity and diversity of the organisation, the amount of effort required to make the transformation (even partially), and the challenges, which were not just technological but also political. We managed to pull it off as a team, and I'm hoping this article series will make the journey visible to you.

Let's kick off the series with a short, simple question: Why? Why did they decide to go cloud?



Why?

The short answer is, why not?

Some time ago, my client decided to move away from their existing on-premise trading application and adopt Aladdin from BlackRock. The version of Aladdin they bought was a cloud-based service, hosted by BlackRock itself. This was a great thing: they didn't have to buy big servers on racks, and even cooling those is a cost. There are certain SLAs BlackRock needs to honour, so your business continues without interruption (you can check BlackRock's website for all the features, otherwise I would have to charge them for advertising) (yes, joking).

Ultimately, this was game-changing and a great step towards the future; but it also brought many other questions: How do you connect with an external application, which is at the heart of your business? And if your heart lives outside of your body, how do you connect the veins properly so your body still gets blood pumped to it?

So, if the technology is so advanced and you are getting a new heart, why not update the rest of the organs as well? If you are changing the core application you have, and it is in the cloud, it simply makes sense to start carrying the rest of the applications to the cloud as well. Or at least, start developing new applications there.

It would solve most of your on-premise issues, such as infrastructure bottlenecks, and it would definitely speed up your delivery. It's also a great investment for the future, and it would be quite ironic if an investment company didn't make it.

What?

Let's take a step back and talk about our game plan here. There were certain things we needed to do: we didn't have an Azure subscription to start with, let alone an operating model. We had to come up with a roadmap.

Our aim was not simple, but it was clear:

  • Create a development environment for our developers to start working right away
  • Create an operating model that will support both us and applications
  • Integrate with not just Aladdin, but also with on-premise apps
  • Craft applications using the available tools on Azure and with best practices
  • Create an API layer to act as an integration point between apps and consumers
  • Create CI/CD pipelines to ensure software quality
  • Craft a solution to process and store batch data in a central store, so anyone can reach the data they need easily
  • Process the data through the store and make it available to other apps like Power BI
  • Create in-company best practices and distribute them across teams

With these goals in mind, we started making small castles in our sandpit.

How?

This is where we came in. I joined the team through James Saffron, long after the Aladdin decision. It was shortly after the cloud provider had been selected and a little headway had been made. When I joined, there was already a basic version of a cloud operating model in place for development in a sandpit environment, but the non-prod and prod environments were still under discussion. AD had recently been synced up with Azure AD, and we had a subscription that we could mess around with. The Retail team was already elbow-deep in it with their Data Lake project.

I joined the team to create reporting and analytics tools from the incoming Aladdin data, using Azure Data Lake Store and Analytics. The Data Lake might not have been the perfect tool for the job, but it still did the job well.

Just before our project, the company kicked off the API project, which used Apigee Edge to create an API layer. Like us, it would ingest the Aladdin data, but it would expose it via REST APIs. The APIs would serve both internal and third-party consumers. Their goal was to provide a live endpoint, while we were producing analytics and reporting data.

Attacking from both sides, we started tackling the problems and building our platform. We had many gaps to fill, which I'll mention briefly below. I'll also explain each of them in its own article.

Development Environment

The first thing we needed in order to begin development, or at least experimentation, was an Azure subscription that we could play around with. We needed a subscription with full control over our resources, so the first thing we had to do was create a subscription model in three stages:

  • Sandpit: A full playground for us to experiment in, building and demolishing resources as we wish. No real data is allowed in this environment.
  • Non-prod: Development environments for our projects, once we had decided how to build them. We still have full control over our resources, except for subscription-level actions. Includes Dev, Integration and Test environments. Contains non-sensitive, non-production data.
  • Prod: A no-fly zone for developers, except for read-only permissions. Contains Pre-Production and Production environments; only the release and support teams have contributor permissions.
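To make the three stages concrete, here is a minimal Python sketch of how such a model could be written down as data. It is an illustration only, not the configuration we actually used: the team names, the `Tier` structure and the `can_modify` helper are hypothetical, and mapping "full playground" to Owner and "full control except subscription-level actions" to Contributor is my assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    """Ordered access levels, loosely modelled on Azure's Reader/Contributor/Owner."""
    NONE = 0
    READER = 1
    CONTRIBUTOR = 2
    OWNER = 3


@dataclass(frozen=True)
class Tier:
    name: str
    environments: tuple
    data_policy: str
    access: dict  # hypothetical team name -> Access


# Assumed mapping of the three stages described above.
TIERS = {
    "sandpit": Tier(
        "sandpit", ("Sandpit",), "no real data",
        {"developers": Access.OWNER},
    ),
    "non-prod": Tier(
        "non-prod", ("Dev", "Integration", "Test"),
        "non-sensitive, non-production data",
        {"developers": Access.CONTRIBUTOR},
    ),
    "prod": Tier(
        "prod", ("Pre-Production", "Production"), "production data",
        {
            "developers": Access.READER,
            "release": Access.CONTRIBUTOR,
            "support": Access.CONTRIBUTOR,
        },
    ),
}


def can_modify(team: str, tier_name: str) -> bool:
    """Return True if the team may create or change resources in the tier."""
    level = TIERS[tier_name].access.get(team, Access.NONE)
    return level.value >= Access.CONTRIBUTOR.value
```

With this sketch, `can_modify("developers", "prod")` is false while `can_modify("release", "prod")` is true, which mirrors the "read-only for developers" rule in Prod.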

The existing on-premise developer desktops were quite limited in both specs and permissions. They didn't allow us to install new tools to experiment with, and the specs wouldn't have been able to handle them anyway. To overcome this, we were issued high-performance Azure VMs to develop our applications on. We had admin permissions on the machines, allowing us to experiment freely. Of course, they were still behind a firewall to prevent any type of data loss.

Source Control, CI/CD

We chose Visual Studio Team Services for our source control, build and deployment needs. VSTS is a magnificent tool, allowing us to work through Git and deploy through our release pipeline.

We created our own accounts on VSTS and started development. After the process became stable, we promoted it to become company-wide and all the teams started using it.

Application Security

The on-premise world (the Windows-based one, anyway) relies on Active Directory for security. On Azure, we have Azure Active Directory, which provides the same capabilities in different ways. Azure AD was created and synced up from AD for us. This allowed us to utilise existing user credentials, as well as the security groups, in the cloud.

Azure Active Directory supports both OpenID Connect and OAuth 2.0, allowing us to authenticate and authorise both our users and applications against the directory. Azure AD also supports role-based access control as well as security groups. This allowed us to assign users and groups to application-specific roles, reducing the complexity of AD groups and increasing the flexibility of applications.
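As a rough illustration of application-specific roles, the sketch below checks a role against the claims of an already validated token. Azure AD puts the app roles assigned to a user or application in the `roles` claim; everything else here is an assumption made for the example, including the role names, the `authorise` helper and the sample claims. Signature and audience validation are assumed to have happened before this point.

```python
def has_role(claims: dict, required_role: str) -> bool:
    """Check the 'roles' claim of a decoded, already validated token."""
    return required_role in claims.get("roles", [])


def authorise(claims: dict, required_role: str) -> None:
    """Raise PermissionError unless the caller holds the required app role."""
    if not has_role(claims, required_role):
        subject = claims.get("sub", "<unknown>")
        raise PermissionError(
            f"{subject} lacks required role {required_role!r}"
        )


# Hypothetical claims, shaped like the payload of a decoded access token.
sample_claims = {
    "sub": "user-123",
    "roles": ["Reports.Read"],
}

authorise(sample_claims, "Reports.Read")  # passes silently
```

The point of the design is that the application only knows about its own role names; who ends up in a role, whether a single user or a whole security group, is decided in the directory.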

Credential Rotation

On-premise environments are mostly locked down and not exposed to the internet. But the moment you step out into the cloud, you need to take precautions such as rotating your credentials. Microsoft didn't offer an out-of-the-box solution for this. We created a solution that rotates the credentials frequently and updates the consumer applications' configuration accordingly.
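The details of our solution belong in a later article, but the general idea can be sketched in a few lines of Python. This is not our implementation: the `RotatingCredential` class, the in-memory storage, the callback mechanism and the one-day maximum age are all assumptions chosen to keep the example self-contained.

```python
import secrets
from datetime import datetime, timedelta, timezone


class RotatingCredential:
    """Holds a secret and replaces it once it is older than max_age."""

    def __init__(self, name, max_age, now=None):
        self.name = name
        self.max_age = max_age
        self.previous = None   # kept so consumers can overlap briefly
        self.listeners = []    # callbacks that update consumer configuration
        self._issue(now or datetime.now(timezone.utc))

    def _issue(self, now):
        # Generate a fresh random secret and record when it was issued.
        self.current = secrets.token_urlsafe(32)
        self.issued_at = now

    def subscribe(self, callback):
        """Register callback(name, new_secret), called after each rotation."""
        self.listeners.append(callback)

    def rotate_if_due(self, now=None):
        """Rotate when the secret has reached max_age; return True if rotated."""
        now = now or datetime.now(timezone.utc)
        if now - self.issued_at < self.max_age:
            return False
        self.previous = self.current
        self._issue(now)
        for callback in self.listeners:
            callback(self.name, self.current)
        return True


# Usage: a hypothetical consumer keeps its configuration in a dict.
app_config = {}
credential = RotatingCredential("storage-key", max_age=timedelta(days=1))
credential.subscribe(lambda name, value: app_config.update({name: value}))
credential.rotate_if_due()  # no rotation yet, the secret is brand new
```

In a real setup the secret would live in a proper secret store and `rotate_if_due` would run on a schedule; the callback is where each consumer application's configuration gets updated.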

On-premise Connectivity

In order to access on-premise resources, our API proxies were deployed within an App Service Environment, which had firewall access to internal resources. All traffic from Azure to on-premise went through these APIs. Traffic in the other direction, from on-premise to Azure, went out directly, using outgoing internet access permissions.

Conclusion

There are many other things I haven't mentioned; one reason is that the article is already too long. The other is that I need to fish out all the details buried deep in my mind along the way.

Keep an eye out for new articles in this series; there are many more coming. I am hoping they will be much more detailed and will help you with your transformation as well.