How to minimise service disruptions and recover fast

Liberty IT Consulting Group is an Australian IT company. Liberty services the banking and finance sector and specialises in transformation project management. Key areas of focus are core banking, digital, integration, migrations, cloud, project quality assurance, project governance, advisory, and project recovery. Liberty’s business analysts, project managers, directors, PMO professionals, engineers, and agile practitioners are the finest in their field.

Visit us at https://lnkd.in/gZ7FeqkJ for all your project delivery needs and let us be your solution implementation partner of choice. 

#Technology #Digital #Outages

TRINA: Theo, tell us what to do to minimise service disruptions and how we recover. In recent history, we’ve seen some pretty significant outages at some of the major financial services systems. Sometimes they are even reported on social media before the operations teams themselves know that the systems are down.

Many of these outages have caused havoc and delays and have infuriated consumers and merchants. What steps should we be taking to minimise service disruptions and, should they occur, how do we best recover fast from them?

THEO: Yeah, so look, I think everybody sitting here has dealt with, in the simplest case, a log file that runs out on the database and brings the whole bank down. I think we’ve all dealt with those ones. Last year, Akamai went down and I think … four banks … three of the large banks went down at the same time. So, as I say, these things happen, even with the best of planning. I would say that today, for me, there are two areas to look at. One is smart monitoring and dashboards, to prevent an incident before it happens. The other is the design of your production architecture.

In terms of the monitoring, and I’ll delve into both areas here, most of the companies here, I think, use Splunk or New Relic or a similar application. But what we’re also seeing, and what we’re advocating, is to add AI apps, tools and technology that will self-learn your environment, bring up more scenarios and provide self-healing of your systems and platforms as well. I think those things, which we’ve seen implemented with our customers, have gone a long way towards preventing these incidents from happening.
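As a rough illustration of monitoring that learns a baseline rather than relying on a fixed alert threshold, here is a minimal Python sketch. It is not how Splunk, New Relic or any AI tool works internally; it is a simple rolling-statistics stand-in, and the class name BaselineMonitor, its parameters and the sample metric (percentage of database log space used) are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class BaselineMonitor:
    """Flags readings that deviate sharply from a recently learned baseline.

    Hypothetical helper for illustration only: window, threshold and
    min_samples are assumed values, not recommendations.
    """

    def __init__(self, window=60, threshold=3.0, min_samples=10):
        self.samples = deque(maxlen=window)  # recent "normal" readings
        self.threshold = threshold           # allowed deviation, in standard deviations
        self.min_samples = min_samples       # readings needed before alerting starts

    def observe(self, value):
        """Return True if value looks anomalous against the baseline."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = pstdev(self.samples)
            # A perfectly flat baseline (sigma == 0) never alerts in this sketch.
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                anomalous = True
        if not anomalous:
            # Only normal readings update the baseline.
            self.samples.append(value)
        return anomalous


# Example: percentage of database log space in use, sampled periodically.
monitor = BaselineMonitor()
for reading in [50, 51, 49, 50, 52, 48, 50, 51, 49, 50, 52, 48]:
    monitor.observe(reading)
spike_detected = monitor.observe(95)  # sudden jump in log space usage
```

In a real deployment the alert would feed a dashboard or trigger an automated remediation step; that wiring is left out here.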

The other one that we do, a key design feature, is what we call a circuit breaker design in terms of your production architecture.

What that means is that when you look at the components of your production stack, if one component starts failing, it disconnects, or the circuit breaker trips, which means that although one component is down, a large part of the bank is still operating. It is not bringing the whole bank down.

What we’ve also been advocating to our customers is to move to cloud-native platforms, especially multi-cloud environments, because that by itself provides more resilience in terms of your deployment. For example, if you look at your cloud deployment architecture, a multi-cloud approach provides a sort of automatic load balancing, and it allows elastic scaling of your Kubernetes clusters as well. And if you use a service mesh, you can easily move production in real time. So we’re not even talking about restoration time, right? In today’s world, we don’t need downtime. As I said before, with the technologies available in the cloud, you can really minimise your downtime if you deploy properly on a cloud architecture. From our side, that is one of the key reasons we’re driving customers to adopt cloud deployments as well.
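The circuit breaker design Theo describes can be sketched in a few lines of Python. This is a minimal sketch of the general pattern, not Liberty's implementation: the names CircuitBreaker and CircuitOpenError, the failure threshold and the reset timeout are all assumptions made for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Isolates a failing component so callers fail fast instead of piling up.

    Hypothetical sketch: failure_threshold and reset_timeout are assumed values.
    """

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable clock, useful for testing
        self.failures = 0
        self.opened_at = None                       # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Breaker is open: reject immediately so the rest of the system keeps running.
                raise CircuitOpenError("component is isolated; use a fallback")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Trip on reaching the threshold, or re-trip if a trial call failed.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each call to a downstream component in breaker.call(...) and catch CircuitOpenError to serve a fallback, such as a cached balance or a "temporarily unavailable" message, so that one failed component does not take the other services down with it.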

Liberty IT Consulting Group
ABN: 83 614 846 098
