At the AWS Summit in Dubai

Great day at the AWS Summit in Dubai. Nice refresher and networking.

Glitch or Outage?

Do you find it difficult to tell whether that spike on your service was a glitch or an outage?

This is what I think:

  • Less than a minute: Often considered a glitch, especially if it self-corrects quickly and has minimal impact on users.
  • 1-5 minutes: Could be a glitch or an outage, depending on factors such as service criticality and regulatory requirements. [To be defined]
  • More than 5 minutes: Almost certainly an outage.
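
As a rough sketch, the thresholds above could be written as a small shell function. The function name classify_incident and the label for the middle band are my own placeholders, and I am assuming the 1-5 minute band includes both 60 and 300 seconds:

```shell
#!/bin/sh
# Classify an incident by its duration in seconds.
# The name classify_incident and the "depends" label are hypothetical;
# the 60s and 300s cut-offs come from the list above.
classify_incident() {
  seconds="$1"
  if [ "$seconds" -lt 60 ]; then
    echo "glitch"
  elif [ "$seconds" -le 300 ]; then
    # The 1-5 minute band is still [To be defined]
    echo "depends"
  else
    echo "outage"
  fi
}

classify_incident 45
classify_incident 180
classify_incident 900
```

In practice the middle band would need a per-service rule (criticality, regulation) rather than a single label.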

Now, what about a series of glitches? :P

Eid Mubarak

While many are enjoying the Eid break, it's still a hectic time for managers, especially in Engineering, as we work to complete performance reviews before next week’s deadline. Personally, balancing these reviews with other business priorities has been challenging to say the least.

In the meantime, we're busy addressing tasks from the SAMA pre-audit, which is adding to the workload. I'm really pleased with the progress we’re making toward preparing GCP as a potential second site in KSA, and I want to acknowledge the hard work and dedication from everyone involved.

Self-disposing network troubleshooting pod

This script launches a temporary network troubleshooting pod in Kubernetes, opens an interactive shell in it, and deletes the pod when you exit the shell. If a pod with the same name already exists, the script simply attaches to it and leaves it in place. Good for debugging network issues (e.g., DNS, connectivity, HTTP requests) and running ad-hoc commands in a disposable environment.

I have this aliased as 'nt'.

#!/bin/bash

# Define the name of the pod
POD_NAME="nettools"

# Check if the pod exists
if kubectl get pod "$POD_NAME" -n default &> /dev/null; then
  echo "Pod $POD_NAME exists. Executing command..."
  kubectl exec -it "$POD_NAME" -n default -- /bin/bash
else
  # Create the pod
  echo "Creating pod $POD_NAME..."
  kubectl run "$POD_NAME" -n default --image=wbitt/network-multitool:latest --restart=Never --overrides='{"spec": {"terminationGracePeriodSeconds": 2}}' -- sleep infinity

  # Wait until the pod is Ready; give up after 60s instead of looping forever
  # (for example, if the image cannot be pulled)
  echo "Waiting for pod $POD_NAME to be ready..."
  if ! kubectl wait --for=condition=Ready "pod/$POD_NAME" -n default --timeout=60s; then
    echo "Pod $POD_NAME did not become ready. Deleting it..." >&2
    kubectl delete pod "$POD_NAME" -n default --grace-period=0
    exit 1
  fi

  # Shell into the pod
  echo "Shelling into the pod $POD_NAME..."
  kubectl exec -it "$POD_NAME" -n default -- /bin/bash

  # Delete the pod after exiting the shell
  echo "Deleting pod $POD_NAME..."
  kubectl delete pod "$POD_NAME" -n default --grace-period=0
fi
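
For completeness, here is roughly how the alias could be set up. The script path ~/bin/nettools.sh is an assumption (put the script wherever you like), and the commands in the comments use hypothetical names purely as examples of what you might run inside the pod:

```shell
# Add to ~/.bashrc (or your shell's equivalent).
# The path below is hypothetical; point it at wherever the script is saved.
alias nt="$HOME/bin/nettools.sh"

# Once inside the pod, typical checks might look like this
# (my-service is a placeholder service name):
#   dig kubernetes.default.svc.cluster.local
#   curl -sv http://my-service.default.svc.cluster.local:8080/
```

Remember to make the script executable (chmod +x) before using the alias.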

What is Platform Engineering?

I like to summarise it as follows...

  1. Treat your development teams as clients
  2. Abstract environment and tooling complexity
  3. Enable self-service in a cost-effective, secure manner that is manageable at scale

Another round of layoffs...

Another round of layoffs at my company. A whole team of 4 was cut, and an Engineering Manager and a Vice President were not spared either. All in all, 8 people were made redundant in a company of around 100.

With tech layoffs happening around the world, I feel this is the new normal. Hiring and firing will continue. We have to accept a level of risk now, wherever we work.

The only advice:

  • Give your best, but not at the expense of your own time
  • Be more visible with your work
  • Invest in your relationships with your peers, not with the Company
  • Always focus on your own learning and skills so you are hirable