AWS Certified DevOps Engineer - Professional Completed

Congratulations! You have successfully completed the AWS Certified DevOps Engineer - Professional exam and you are now AWS Certified. You can now use the AWS Certified DevOps Engineer - Professional credential to gain recognition and visibility for your proven experience with AWS services.
...
Overall Score: 78%
Topic Level Scoring:
1.0  Continuous Delivery and Process Automation: 70%
2.0  Monitoring, Metrics, and Logging: 93%
3.0  Security, Governance, and Validation: 75%
4.0  High Availability and Elasticity: 91%

Supercharge your CloudFormation templates with Jinja2 Templating Engine

If you are working in an AWS public cloud environment, chances are that you have authored a number of CloudFormation templates over the years to define your infrastructure as code. As powerful as this tool is, it has a glaring shortcoming: the templates are fairly static, with no inline template expansion feature (think GCP Cloud Deployment Manager). Due to this limitation, many teams end up copy-pasting similar templates to cater for minor differences like environment (dev, test, prod, etc.) and resource names (S3 bucket names, etc.)

Enter Jinja2, a modern and powerful templating language for Python. In this blog post I will demonstrate a way to use Jinja2 to enable dynamic expressions and perform variable substitution in your CloudFormation templates.

First, let's get the prerequisites out of the way. To use Jinja2, we need to install Python, pip and of course Jinja2.

Install Python

$ sudo yum install python

Install pip

$ curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py"
$ sudo python get-pip.py

Install Jinja2

$ pip install Jinja2

To invoke Jinja2, we will use a simple Python wrapper script.

$ vi j2.py

Copy the following contents to the file j2.py

import os
import sys
import jinja2

# Read a template from stdin, render it with the shell environment
# exposed as 'env', and write the result to stdout
sys.stdout.write(jinja2.Template(sys.stdin.read()).render(env=os.environ))

Save and exit the editor

Now let’s create a simple CloudFormation template and transform it through Jinja2:

$ vi template1.yaml

Copy the following contents to the file template1.yaml

---
AWSTemplateFormatVersion: '2010-09-09'
Description: Simple S3 bucket for {{ env['ENVIRONMENT_NAME'] }}
Resources:
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: InstallFiles-{{ env['AWS_ACCOUNT_NUMBER'] }}

As you can see, it's the most basic CloudFormation template, with one exception: we are using Jinja2 variables to substitute in environment variables. Now let's run this template through Jinja2.

Let's first export the environment variables:

$ export ENVIRONMENT_NAME=Development
$ export AWS_ACCOUNT_NUMBER=1234567890


Run the following command:

$ cat template1.yaml | python j2.py

The result of this command will be as follows:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: Simple S3 bucket for Development
Resources:
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: InstallFiles-1234567890

As you can see, Jinja2 has expanded the variables in the template. This gives us a powerful mechanism for injecting environment variables into our CloudFormation templates.
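Since `env` is just `os.environ` passed into the template, you can also use the dictionary `get` method inside a Jinja2 expression to fall back to a default when a variable is not exported. A small illustrative fragment (the default value here is an assumption, not from the original template):

```yaml
# Falls back to 'Development' if ENVIRONMENT_NAME is not set in the shell
Description: Simple S3 bucket for {{ env.get('ENVIRONMENT_NAME', 'Development') }}
```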

Let's take another example: what if we wanted to create multiple S3 buckets in an automated manner? Generally in such a case we would have to copy-paste the S3 resource block. With Jinja2, this becomes a matter of adding a simple "for" loop:

$ vi template2.yaml

Copy the following contents to the file template2.yaml

---
AWSTemplateFormatVersion: '2010-09-09'
Description: Simple S3 bucket for {{ env['ENVIRONMENT_NAME'] }}
Resources:
{% for i in range(1,3) %}
  S3Bucket{{ i }}:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: InstallFiles-{{ env['AWS_ACCOUNT_NUMBER'] }}-{{ i }}
{% endfor %}

Run the following command:

$ cat template2.yaml | python j2.py

The result of this command will be as follows:

AWSTemplateFormatVersion: '2010-09-09'
Description: Simple S3 bucket for Development
Resources:
  S3Bucket1:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: InstallFiles-1234567890-1
  S3Bucket2:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: InstallFiles-1234567890-2

As you can see, the resulting template has two S3 resource blocks. The output of the command can be redirected to another template file to be used later in stack creation.
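To capture the rendered output for stack creation, you can redirect it in the shell (for example `python j2.py < template2.yaml > template2-rendered.yaml`) or wrap the same idea in a small helper. A minimal sketch; the file names in the usage note are illustrative:

```python
import jinja2

def render_template(text, env):
    # Expand Jinja2 expressions in 'text' using the given variable mapping
    return jinja2.Template(text).render(env=env)

# Illustrative usage (file names are examples):
# import os
# with open("template2.yaml") as f:
#     rendered = render_template(f.read(), os.environ)
# with open("template2-rendered.yaml", "w") as f:
#     f.write(rendered)
```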

I am sure you will appreciate the possibilities Jinja2 brings to your CloudFormation templates. Do note that I have barely scratched the surface of this topic; I highly recommend having a look at the Template Designer documentation at http://jinja.pocoo.org/docs/2.10/templates/ to explore more possibilities. If you are using Ansible, note that Ansible uses Jinja2 templating to enable dynamic expressions and access to variables, in which case you can drop the Python wrapper script mentioned in this article and use Ansible directly for template expansion.

Enable Cost Allocation Tags to differentiate project based billing

When running in an AWS public cloud environment, there is often a need to dissect billing across different projects for accounting and accrual purposes. AWS provides a mechanism to aggregate related platform costs using a feature known as Cost Allocation Tags. With this feature you can designate tags on your AWS resources to track costs at a detailed level.

From the AWS Documentation:

Activating tags for cost allocation tells AWS that the associated cost data for these tags should be made available throughout the billing pipeline. Once activated, cost allocation tags can be used as a dimension of grouping and filtering in Cost Explorer, as well as for refining AWS budget criteria.

For example, to view cost allocation based on various project resources in your AWS account, you can tag these resources (EC2 instances, S3 buckets, etc.) with a tag named "Project". Next, the Project tag can be activated as a Cost Allocation Tag. From then on, AWS will include this tag in the associated cost data to allow filtering based on the tag in Cost Explorer reports.

Let’s walk through the steps of setting this up:

  1. Log in to your AWS Management Console
  2. Tag all the resources with the tag key "Project" and a value corresponding to each of your projects. Understand that this may not be possible for every resource type.
  3. Navigate to My Billing Dashboard > Cost Allocation Tags
  4. Under the User-Defined Cost Allocation Tags section, select the tag "Project" and click the "Activate" button.

Fig-1

Once a tag is activated it will take around 24 hours for billing data to appear under this tag.
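Tagging the resources (step 2 above) can also be scripted rather than done by hand. A minimal sketch using the boto3 EC2 `create_tags` call; the client is passed in as a parameter, and the resource IDs and project name in the usage note are hypothetical:

```python
def tag_project_resources(ec2_client, resource_ids, project):
    # EC2 CreateTags applies the same tag set to every resource ID passed in,
    # so instances, volumes, etc. can be tagged in one call
    ec2_client.create_tags(
        Resources=resource_ids,
        Tags=[{"Key": "Project", "Value": project}],
    )

# Illustrative usage (requires AWS credentials; IDs are hypothetical):
# import boto3
# ec2 = boto3.client("ec2", region_name="ap-southeast-2")
# tag_project_resources(ec2, ["i-0123456789abcdef0", "vol-0123456789abcdef0"], "ProjectX")
```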

Next, to view the costs under a project, do the following:

  1. Log in to your AWS Management Console
  2. Navigate to My Billing Dashboard > Cost Explorer
  3. Click “Launch Cost Explorer”
  4. On the right side of the page, under the Filters section, click the Tag filter, select the Project tag, then select a tag value to filter costs by project

[Screenshot: Cost Explorer tag filter]

As you can see from the screenshot below, we now know exactly how much each project is costing per day (or per month, if selected).

[Screenshot: daily cost per project in Cost Explorer]
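The same project-filtered view is available programmatically through the Cost Explorer `GetCostAndUsage` API. A minimal sketch that just builds the request; the dates and project name in the usage note are made up:

```python
def project_cost_query(project, start, end):
    # Daily unblended costs, restricted to resources tagged Project=<project>
    return {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "Filter": {"Tags": {"Key": "Project", "Values": [project]}},
    }

# Illustrative usage (requires AWS credentials):
# import boto3
# ce = boto3.client("ce")
# result = ce.get_cost_and_usage(**project_cost_query("ProjectX", "2018-01-01", "2018-01-31"))
```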

Some important points to consider:

  • Cost allocation tagging is managed via the master billing account at the root of the AWS organization. If your account is part of an organization, you will have to contact that account's administrator to enable the cost allocation tags. [Screenshot: error message shown in member accounts]
  • The error message in the screenshot above will always appear in accounts that have not been granted billing management permission.
  • Some charges, notably bandwidth (data transfer) charges, cannot be tagged and thus cannot be accounted for under cost allocation tagging. A common pattern in such cases is to calculate each project's percentage of the tagged costs and apportion the unaccounted charges based on this percentage.
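The apportioning pattern in the last bullet is simple arithmetic: split the untagged charges in proportion to each project's share of the tagged spend. A minimal sketch with made-up figures:

```python
def apportion_untagged(tagged_costs, untagged_total):
    # Split the untagged total in proportion to each project's tagged spend
    total_tagged = sum(tagged_costs.values())
    return {
        project: cost + untagged_total * (cost / total_tagged)
        for project, cost in tagged_costs.items()
    }

# e.g. apportion_untagged({"ProjectA": 600.0, "ProjectB": 400.0}, 100.0)
# gives ProjectA roughly 660.0 and ProjectB roughly 440.0
```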

Install SSL Certificate on load balancer

Go to the Load Balancer console.

Click Change under "SSL Certificate".

Select "Upload a new SSL Certificate" and fill in the fields:

  a. Give the certificate a name
  b. In the Private Key box, copy/paste the contents of the file <wildcard_authoritiesonline_com_au.key>
  c. In the Public Key Certificate box, copy/paste the contents of the file <wildcard_authoritiesonline_com_au.crt>
  d. In the Certificate Chain box, copy/paste the contents of the file <DigiCertCA.crt>

Click Save.

Check the website URL from a browser. The certificate should be updated immediately and you should not see any certificate error.
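For a classic load balancer, the same change can be scripted with boto3: upload the certificate material to IAM, then point the HTTPS listener at the returned ARN. A minimal sketch; the certificate name, load balancer name, and PEM contents in the usage note are placeholders:

```python
def install_ssl_certificate(iam_client, elb_client, cert_name, body, private_key,
                            chain, lb_name, lb_port=443):
    # Upload the certificate material to IAM...
    resp = iam_client.upload_server_certificate(
        ServerCertificateName=cert_name,
        CertificateBody=body,
        PrivateKey=private_key,
        CertificateChain=chain,
    )
    arn = resp["ServerCertificateMetadata"]["Arn"]
    # ...then switch the listener on the given port to the new certificate
    elb_client.set_load_balancer_listener_ssl_certificate(
        LoadBalancerName=lb_name,
        LoadBalancerPort=lb_port,
        SSLCertificateId=arn,
    )
    return arn

# Illustrative usage (requires AWS credentials; names and PEM strings are placeholders):
# import boto3
# arn = install_ssl_certificate(boto3.client("iam"), boto3.client("elb"),
#                               "wildcard-2018", cert_pem, key_pem, chain_pem,
#                               "my-load-balancer")
```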


Watching the watcher – Monitoring the EC2Config Service

The EC2Config service is a nifty Windows service provided by Amazon that performs many important chores on instances launched from AWS Windows Server 2003-2012 R2 AMIs. These tasks include (but are not limited to):

  • Running initial start-up tasks when the instance is first started (e.g. executing the user data, setting a random Administrator account password)
  • Displaying wallpaper information on the desktop background
  • Running Sysprep and shutting down the instance

More details about this service can be found on Amazon's webpage.

Another important aspect of the EC2Config service is that it can be configured to send performance metrics to CloudWatch. Examples of these metrics are Available Memory, Free Disk Space, and Page File Usage, to name a few. The problem we faced was that sometimes this service would either stop or fail to start due to a misconfigured configuration file. Having this service running at all times was critical for monitoring and compliance reasons.

To make sure that this service was running and publishing metrics to CloudWatch, we came up with a simple solution: a Python script written as a Lambda function that queries Windows performance metrics for the last 10 minutes (the function is scheduled to run every 30 minutes, configurable through a Lambda trigger) and sends an alert if the metric is missing.

Following is the code written for this purpose. The salient features of the code are:

  1. The function lambda_handler is invoked by Lambda
  2. Variables are initialised; currently these are hard-coded in the function, but they could also be parametrised using the Environment Variables feature of a Lambda function
  3. EC2 and CloudWatch client objects are initialised
  4. Running instances are retrieved based on the "running" filter
  5. If an instance has been running for less than the requested period, it is ignored (this avoids false alarms for instances started in the last few minutes)
  6. The CloudWatch metric 'Available Memory' for the instance is retrieved for the last 10 minutes. This can be substituted with any other metric name; also take note of the Dimensions of the metric
  7. The datapoint result is inspected; if no datapoint is found, the instance is added to a list (later used for the alert)
  8. If the list has entries, an alert is sent via an SNS topic

#
#
# AWS Lambda Python script to query for Cloudwatch metrics for all running
# EC2 instance and if unavailable send a message through an SNS topic
# to check for EC2Config service
#
# Required IAM permissions:
#   ec2:DescribeInstances
#   sns:Publish
#   cloudwatch:GetMetricStatistics
#   dynamodb:Query and dynamodb:UpdateItem on the CWCheckData table
#
# Setup:
# Check these in the code (Search *1 and *2):
#   *1 : Confirm details of the parameters
#   *2 : Confirm details of the dimensions
#   Define Environment Variable "CustomerID" while creating Lambda function

from __future__ import print_function
import boto3
import sys
import os
from calendar import timegm
from datetime import datetime, timedelta
import json
import decimal
from boto3.dynamodb.conditions import Key, Attr
from botocore.exceptions import ClientError

class DecimalEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, decimal.Decimal):
            if o % 1 > 0:
                return float(o)
            else:
                return int(o)
        return super(DecimalEncoder, self).default(o)

def dynamodb_create_table():
    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.create_table(
        TableName='CWCheckData',
        KeySchema=[
            {
                'AttributeName': 'instance_id',
                'KeyType': 'HASH'  #Partition key
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'instance_id',
                'AttributeType': 'S'
            }

        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 5,
            'WriteCapacityUnits': 5
        }
    )

    print("CW_Missing_Metrics: Table status:", table.table_status)

# Get one value from table
def dynamodb_get_single_value(table_name, qry_col_name, qry_col_value, rslt_col_name):

    ret_value = ""

    _region = "ap-southeast-2"  # Region
    dynamodb = boto3.resource("dynamodb", _region)

    table = dynamodb.Table(table_name)

    try:
        response = table.query(
            KeyConditionExpression=Key(qry_col_name).eq(qry_col_value)
        )

    except ClientError as e:
        print("CW_Missing_Metrics: Error (dynamodb_get_single_value): ", e.response['Error']['Message'])
    else:
        for i in response['Items']:
            ret_value = i[rslt_col_name]

    return ret_value

# Set one value from table
def dynamodb_set_single_value(table_name, upd_col_name, upd_col_value, new_col_name, new_col_value):

    _region = "ap-southeast-2"  # Region
    dynamodb = boto3.resource("dynamodb", _region)

    table = dynamodb.Table(table_name)

    try:
        response = table.update_item(
            Key={
                upd_col_name: upd_col_value
            },
            UpdateExpression="set {0} = :b".format(new_col_name),
            ExpressionAttributeValues={
                ':b': new_col_value
            },
            ReturnValues="UPDATED_NEW"
        )
    except ClientError as e:
        print("CW_Missing_Metrics: Error (dynamodb_set_single_value): ", e.response['Error']['Message'])
    else:
        print("CW_Missing_Metrics: Successfully added/updated record to new value")

def check_tag_present_x(instance, tag_name):
    # Return True if the instance carries a tag with the given key
    for tag in instance.tags:
        if tag['Key'] == tag_name:
            return True

    return False

def check_tag_present(instance, tag_name, tag_value):
    for tag in instance.tags:
        if tag['Key'] == tag_name:
            if tag['Value'] == tag_value:
                return True

    return False

def send_alert(list_instances, topic_arn):
    if topic_arn == "":
        print("CW_Missing_Metrics: Missing topic ARN. Returning without sending alert.")
        return

    instances = ""

    for s in list_instances:
        instances += s
        instances += "\n\n"

    subject = os.getenv('CustomerID', '') + " - Warning: Missing CloudWatch metric data"
    message = "Warning: Missing CloudWatch metric data for the following instance id(s): \n\n" + instances + "Check the EC2Config service is running and the config file in C:\\Program Files\\Amazon\\Ec2ConfigService\\Settings is correct."

    print("CW_Missing_Metrics: *** Sending alert ***")
    print("CW_Missing_Metrics: Message: {0}".format(message))

    client = boto3.client('sns')
    response = client.publish(TargetArn=topic_arn, Message=message, Subject=subject)

def lambda_handler(event, context):

    # *1-Provide the following information
    _instancetagname = 'Environment'  # Main filter Tag key
    _instancetagvalue = 'PROD'  # Main filter Tag value
    _period = int(10)  # Period in minutes
    _namespace = 'WindowsPlatform'  # Namespace of metric
    _metricname = 'Available Memory'  # Metric name
    _unit = 'Megabytes'  # Unit
    _topicarn = 'arn:aws:sns:ap-southeast-2:862017364710:CloudWatchMissingMetrics'  # SNS Topic ARN to write message to
    _min_minutes = 1440  # Max minutes to wait before sending next alert for an instance, One Day = 1440 minutes
    _region = "ap-southeast-2"  # Region

    ec2 = boto3.resource('ec2', _region)
    cw = boto3.client('cloudwatch', _region)

    filters = [{'Name': 'instance-state-name', 'Values': ['running']}]

    instances = ec2.instances.filter(Filters=filters)

    print("CW_Missing_Metrics: Reading Cloud watch metric for last {0} minutes.".format(_period))

    start_time = datetime.utcnow() - timedelta(minutes=_period)
    end_time = datetime.utcnow()

    print("CW_Missing_Metrics: List of running instances:")

    list_instances = []

    for instance in instances:

        if check_tag_present(instance, _instancetagname, _instancetagvalue) == False:
            # print ("Tag/Value missing, ignoring instance ", instance.id)
            continue

        cwTag = "Cloudwatch Server Name"
        if check_tag_present_x(instance, cwTag) == False:  # Tag missing, ignore
            # print ("***** Tag ", cwTag, " missing, ignoring instance ", instance.id)
            continue

        print("CW_Missing_Metrics: Checking ", instance.id)

        i = 1

        date_s = instance.launch_time
        date_s = date_s.replace(tzinfo=None)
        # date_s = datetime.datetime.now(date_s.tzinfo)
        new_dt = datetime.utcnow() - date_s

        instance_name = [tag['Value'] for tag in instance.tags if tag['Key'] == 'Name'][0]
        cw_server_name = [tag['Value'] for tag in instance.tags if tag['Key'] == 'Cloudwatch Server Name'][0]
        cw_server_name = cw_server_name.lower()
        minutessince = int(new_dt.total_seconds() / 60)

        # print("Instance id:",instance.id)
        # print("Instance name:",instance_name)
        # print("Launch time:",instance.launch_time)
        # print("Instance uptime:",minutessince,"min\n")

        if minutessince < _period:
            print("CW_Missing_Metrics: Not looking for data on this instance as uptime is less than requested period.")
            continue

        metrics = cw.get_metric_statistics(
            Namespace=_namespace,
            MetricName=_metricname,
            Dimensions=[{'Name': 'Server Name', 'Value': cw_server_name}],
            # Dimensions=[{'Name': 'InstanceId','Value': instance.id}], # *2
            StartTime=start_time,
            EndTime=end_time,
            Period=300,
            Statistics=['Maximum'],
            Unit=_unit
        )

        datapoints = metrics['Datapoints']
        # print("datapoints array=====>", datapoints)

        for datapoint in datapoints:
            if datapoint['Maximum']:
                # print i,")\nInstance name:",instance_name,"\nInstance id:",instance.id,"\nDatapoint Data:",datapoint['Maximum'],"\nTimeStamp: ",datapoint['Timestamp'],"\n=============================\n"
                print(i, ")\nDatapoint Data:", datapoint['Maximum'], "\nTimeStamp: ", datapoint['Timestamp'], "\n")
                i += 1
            else:
                print("CW_Missing_Metrics: Cloudwatch has no Maximum metrics for", _metricname, "instance id: ", instance.id)

        if i == 1:  # No data point found
            # print ("Data points not found.")
            print("CW_Missing_Metrics: Cloudwatch has no metrics for", _metricname, " for instance id: ", instance.id)
            list_instances.append(instance_name + " (" + instance.id + ")" + ", CW Server Name: " + cw_server_name)

        print("=================================================\n")

    #DEBUG
    #list_instances.append('i-0a25dc7ba6b4a5d3b')

    list_instances_for_alert = []

    for s in list_instances: # these instances in 'list_instances' have missing metrics

        # Check if instance was reported in last 24 hr
        last_checked = dynamodb_get_single_value("CWCheckData", "instance_id", s, "last_checked")

        if (last_checked == ""):
            print ("CW_Missing_Metrics: First alert for Instance {0}.".format(s))
            fmt = '%Y%m%d%H%M%S'  # ex. 20110104172008 -> Jan. 04, 2011 5:20:08pm
            now_str = datetime.now().strftime(fmt)

            # Set alert sending date in DB
            dynamodb_set_single_value("CWCheckData", "instance_id", s, "last_checked", now_str)
            list_instances_for_alert.append(s)

        else:
            fmt = '%Y%m%d%H%M%S'  # ex. 20110104172008 -> Jan. 04, 2011 5:20:08pm
            now_str = datetime.now().strftime(fmt)
            rec_datetime = datetime.strptime(last_checked, fmt)
            rec_datetime = rec_datetime.replace(tzinfo=None)
            now_datetime = datetime.strptime(now_str, fmt)
            new_dt = now_datetime - rec_datetime
            min_last_alert= int(new_dt.total_seconds() / 60)

            if (min_last_alert > _min_minutes):
                # Set alert sending date in DB
                print("CW_Missing_Metrics: New alert for Instance {0}.".format(s))
                dynamodb_set_single_value("CWCheckData", "instance_id", s, "last_checked", now_str)
                list_instances_for_alert.append(s)
            else:
                print("CW_Missing_Metrics: Alert already sent for instance '{0}' within last {1} minutes.".format(s, _min_minutes))

    if len(list_instances_for_alert) > 0:
        send_alert(list_instances_for_alert, _topicarn)

################################ Main ################################

#Main
#boto3.setup_default_session(profile_name='vicroads')
#print ('CW_Missing_Metrics: Loading function...')
#lambda_handler(0,0)
#dynamodb_create_table()
#dynamodb_get_single_value("CWCheckData", "instance_id", "i-0de559fcf8bfd5053-1", "last_checked")
#dynamodb_set_single_value("CWCheckData", "instance_id", "i-0de559fcf8bfd5053-1", "last_checked", "DDDDDDD")

################################ Main ################################

Please note: the function needs certain permissions to execute, so the following policy should be attached to the Lambda function's role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1493179460000",
      "Effect": "Allow",
      "Action": ["ec2:DescribeInstances"],
      "Resource": ["*"]
    },
    {
      "Sid": "Stmt1493179541000",
      "Effect": "Allow",
      "Action": ["sns:Publish"],
      "Resource": ["*"]
    },
    {
      "Sid": "Stmt1493179652000",
      "Effect": "Allow",
      "Action": ["cloudwatch:GetMetricStatistics"],
      "Resource": ["*"]
    },
    {
      "Sid": "DynamoDBCWCheckData",
      "Effect": "Allow",
      "Action": ["dynamodb:Query", "dynamodb:UpdateItem"],
      "Resource": ["*"]
    }
  ]
}