
Designing for Failure and Recovering from Failure – c110044gwpl

Course #: c110044gwpl

Duration: 0.8 Hours

The Designing for Failure and Recovering from Failure module covers practices for designing systems that tolerate failure. It takes a closer look at three specific practices and then explores how to recover from failure.

Objectives

  • Design and implement a circuit-breaker pattern for failure management
  • Limit the effects of outages by using limited blast radius practices
  • Improve application resiliency with chaos testing
  • Learn software testing resiliency practices
  • Learn when to use Availability Zones versus Multi-zone Region failure domains on IBM Cloud

Audience

This course is intended for learners who are pursuing professional-level site reliability engineer certification on IBM Cloud.

Prerequisites

Before starting this curriculum, the target audience should understand:
  • System Thinking
  • DevOps practices
  • Cloud Architecture
  • Software engineering principles
  • System administration
  • Network and OSI model
  • Networking and security practices for IBM Cloud
  • Incident management
  • Root cause analysis

The target audience should also be able to:
  • Write code proficiently
  • Create runbooks as a reference
  • Make system components serviceable
  • Interpret data and statistics to determine actions
  • Use LogDNA, Sysdig, Grafana, Prometheus, and Kibana
  • Interpret schematics
  • Drive incidents to resolution
  • Remediate underlying sources of unreliability
  • Create and configure VMs
  • Create and configure containers on IBM Kubernetes Service (IKS)/Red Hat OpenShift Kubernetes Services (ROKS)
  • Create and configure containers using OpenShift
  • Create and configure serverless applications
  • Configure for high availability and scalability

Topics

Module Introduction
Topic 1: Designing for Failure
Topic 2: Designing and Implementing Circuit-breaker Pattern for Failure
Topic 3: Limiting Effects of an Outage Using Limited Blast Radius Practices
Topic 4: Improving Application Resiliency with Chaos Testing
Topic 5: Software Testing Resiliency Practices
Topic 6: When to Use Availability Zones Versus Multi-zone Regions Failure Domains on IBM Cloud
Module Summary
