Hands on CloudOps


1. Overview

The aim of these labs is to provide hands-on experience with the fundamentals of Cloud Operations in AWS. This is a fundamentals-level workshop, but an advanced level might be provided in the future, if required.

What you will learn

  • Be comfortable and confident with the AWS environment
  • Get the basics of users and permissions management
  • Be able to manage users in AWS
  • Be able to deploy resources in AWS in an automatic way
  • Understand how to analyze logs in AWS
  • Be able to analyze metrics in AWS
  • Be able to create alarms in response to events
  • Be able to set up storage in a correct way, by using AWS best principles

NOTE: The source code can be found in my Git repository https://github.com/ronaldtf/aws-cloudops-workshop

2. Set up a CLI

First of all, we need to set up our environment so that we can later deploy our resources. To do so, we should follow the AWS documentation; the procedure is described below anyway.

In the IAM section of the AWS Console, go to Users and select the user you want to use for the deployment. Select Security credentials and Create access key (if not previously created). Copy the access key ID and secret access key provided.

If you do not find the file ~/.aws/credentials in the cases below, type:

$ aws configure

and set the default region (e.g. eu-west-1).

Case 1: Unique account for users and services

If you are not using MFA and simply use a single account for both users and services (which is not recommended), just set the values copied above in an ~/.aws/credentials file:

    [default]
    aws_access_key_id = YOURACCESSKEYID
    aws_secret_access_key = YOURSECRETACCESSKEY

Case 2: AWS Users Account and AWS Services Account separately

If you use MFA and a user account for the IAM users and a service account for the services (and where you want to deploy your services) as shown in the image below…

… you must follow this configuration:

  • Install the aws-mfa tool
$ pip install aws-mfa
  • Define a profile (e.g. myprofile-long-term) and update your ~/.aws/config file by including the profile options.
[profile default]
...

[profile myprofile-long-term]
region = eu-west-1
output = json
role_arn = arn:aws:iam::123456789012:role/role_to_be_assumed
source_profile = default

Note that the source_profile is the profile that gives us permission to perform the assume role. Therefore, we need to define its credentials in the ~/.aws/credentials file.

  • Update your credentials file by creating a profile whose name is your profile name with ‘long-term’ appended (replace the values below with the appropriate ones for your case):
[myprofile-long-term]
aws_access_key_id = YOURACCESSKEYID
aws_secret_access_key = YOURSECRETACCESSKEY
aws_mfa_device = arn:aws:iam::123456789012:mfa/iam_user
  • Run the aws-mfa command to set the credentials with the profile
aws-mfa --profile myprofile

This will place the right credentials in the ~/.aws/credentials file. However, note that these credentials will expire and we will need to execute the command above again once they have expired.

Task: Set up your own environment.
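
The short-term credentials written by aws-mfa carry an expiration timestamp. As a rough illustration of how an expiry check works, here is a minimal sketch; the `expiration` key and its `YYYY-MM-DD HH:MM:SS` format reflect what aws-mfa writes by default, but verify against your own ~/.aws/credentials file:

```python
import configparser
from datetime import datetime

def credentials_expired(credentials_text, profile, now=None):
    """Return True if the profile's 'expiration' timestamp is in the past."""
    config = configparser.ConfigParser()
    config.read_string(credentials_text)
    expires_at = datetime.strptime(config[profile]["expiration"],
                                   "%Y-%m-%d %H:%M:%S")
    return (now or datetime.utcnow()) >= expires_at

# Example with a synthetic credentials file:
sample = """
[myprofile]
aws_access_key_id = YOURACCESSKEYID
aws_secret_access_key = YOURSECRETACCESSKEY
expiration = 2020-01-01 00:00:00
"""
print(credentials_expired(sample, "myprofile"))  # True once the date is past
```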

3. Create an EC2 instance

An EC2 instance is a virtual machine in AWS. In order to define an EC2 instance we need, at least:

  • Region: EC2 is a regional service and, therefore, all the resources will be deployed in the specific region you specify
  • Name: the name of the EC2 instance you’re going to deploy
  • AMI (Amazon Machine Image): an AMI contains the operating system and additional applications you want to use.
    1. standard images: Images provided by AWS, the community or Marketplace
    2. custom images: Images you create with your custom applications
  • Instance type: It contains:
    1. instance family: it indicates the purpose of the EC2 instance, e.g. compute optimized, memory optimized, etc.
    2. instance size: the size of the instance within the specific family
    Both elements translate into a number of vCPUs and an amount of memory, as well as other additional features such as network optimization, serial console access, etc.
  • VPC and subnet: networking infrastructure where the EC2 instance is deployed. This is strictly MANDATORY, as otherwise AWS would not know where to deploy the instance

NOTE: An EC2 instance can have multiple network interfaces from different subnets. However, all the network interfaces from the different subnets must belong to the same Availability Zone

  • Security group: It indicates the rules to access the EC2 instance from the outside (inbound traffic), as well as the rules for traffic from the EC2 instance to the outside (outbound traffic)
  • Storage: It can be of 2 types:
    1. Instance storage: it is ephemeral (volatile) storage attached to the host machine. Instances that use it as a root volume cannot be stopped
    2. Volume storage: this is network-attached storage (EBS volumes), which is durable. There are different types, depending on the purpose:
      • General Purpose (SSD)
      • Provisioned IOPS (SSD)
      • Cold HDD
      • Throughput Optimized HDD
      • Magnetic

Instance storage is not supported by all the instance types

NOTE: HDD disks cannot be used as root volumes

  • Key pair (optional): if specified, you need to provide the private key (in Linux) or use it to retrieve the password (in Windows) when connecting to the EC2 instance
  • Additional settings, such as user data (a script to run when launching the EC2 instance), CloudWatch monitoring, instance profile, etc., are optional and would be covered in an advanced workshop

Task: Go to the AWS Console and launch an instance in a default VPC (public subnet), assigning a security group that allows connecting via SSH. Try to connect to the EC2 instance by specifying your private key.

4. Create a basic lambda

In this section we are going to create a lambda function with some basic settings.

Task: Go to the AWS Console and try to create an AWS Lambda. You’ll find more details below in this section

These are the steps when creating an AWS Lambda:

  1. Click on Create function button
  2. Specify Author from scratch, as we are going to create a blank one
  3. Specify some settings:
    • Function name: the name of the AWS Lambda function
    • Runtime: the “language”. In this case, specify Python 3.8
    • Architecture: x86_64
    • Permissions: Specify Create a new role with basic Lambda permissions
  4. Copy the following code into lambda_function.py (use the code below, as the code in the link has an error):
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

"""
Purpose

Shows how to implement an AWS Lambda function that handles input from direct
invocation.
"""

import logging
import os


logger = logging.getLogger()

# Define a list of Python lambda functions that are called by this AWS Lambda function.
ACTIONS = {
    'plus': lambda x, y: x + y,
    'minus': lambda x, y: x - y,
    'times': lambda x, y: x * y,
    'divided-by': lambda x, y: x / y}


def lambda_handler(event, context):
    """
    Accepts an action and two numbers, performs the specified action on the numbers,
    and returns the result.

    :param event: The event dict that contains the parameters sent when the function
                  is invoked.
    :param context: The context in which the function is called.
    :return: The result of the specified action.
    """
    # Set the log level based on a variable configured in the Lambda environment.
    logger.setLevel(os.environ.get('LOG_LEVEL', logging.INFO))
    logger.debug('Event: %s', event)

    action = event.get('action')
    func = ACTIONS.get(action)
    # Read the operands as-is; convert them only once we know they exist,
    # so a missing parameter does not raise an uncaught TypeError.
    x = event.get('x')
    y = event.get('y')
    result = None
    try:
        if func is not None and x is not None and y is not None:
            result = func(int(x), int(y))
            logger.info("%s %s %s is %s", x, action, y, result)
        else:
            logger.error("I can't calculate %s %s %s.", x, action, y)
    except ZeroDivisionError:
        logger.warning("I can't divide %s by 0!", x)

    response = {'result': result}
    return response

Task: Test the lambda function with the ‘divided-by’ action:

{
 "action": "divided-by",
 "x": "27",
 "y": "4"
}
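
If you want to sanity-check the handler logic on your own machine before testing in the console, a condensed local re-implementation of the ACTIONS dispatch (for experimenting offline, not the code you deploy) could look like this:

```python
# Condensed local copy of the dispatch logic, for experimenting offline.
ACTIONS = {
    'plus': lambda x, y: x + y,
    'minus': lambda x, y: x - y,
    'times': lambda x, y: x * y,
    'divided-by': lambda x, y: x / y,
}

def invoke_locally(event):
    func = ACTIONS[event['action']]
    return {'result': func(int(event['x']), int(event['y']))}

print(invoke_locally({'action': 'divided-by', 'x': '27', 'y': '4'}))
# -> {'result': 6.75}
```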

Task: Test the lambda function by changing the action and/or the values. What would happen if you divide by 0?

Task: Would you know how to debug the lambda? Ask the instructor

NOTE: Lambdas can be deployed within VPCs, and we can set additional configuration settings such as parameters, etc. Understanding these settings is out of the scope of this workshop.

5. Create a role and assume it

In this lab you have to create a role, which you will later assume in the console.

Task: try to create a role by yourself and assume it on your own. If you do not know how to do it or you need some help, see the steps below.

Create a role

In order to create a role we need to define 2 components:

  1. Trust relationship: who is going to assume the role (i.e. to whom we grant the permission to assume it)
  2. Permissions: the permissions which are granted

Trust relationship

We are going to assume that someone from the same account will assume this role. The simplest way is to indicate this when creating the role. Otherwise, we can set it later by specifying the following trust policy (replace AWS_ACCOUNT_ID below, if needed):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Principal": {
                "AWS": "<AWS_ACCOUNT_ID>"
            },
            "Condition": {}
        }
    ]
}

Permissions

For this part, we have 2 options:

  • We can use one of the existing AWS managed policies
  • We can create a new one

Task: For this lab, create a new policy. Use the AWS Policy Generator to create a policy that grants S3 permissions for all the resources and all the Get, List and Describe operations.
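
For comparison with what the Policy Generator produces, the requested policy can be sketched programmatically; the Sid and the exact action wildcards below are illustrative choices, not the only valid ones:

```python
import json

# Sketch of an S3 read-only policy: all resources, Get/List/Describe actions.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3ReadOnly",
            "Effect": "Allow",
            "Action": ["s3:Get*", "s3:List*", "s3:Describe*"],
            "Resource": "*",
        }
    ],
}
print(json.dumps(policy, indent=4))
```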

Task: Do you need a permissions boundary?

Assume the role

In order to assume a role from the AWS Console, we need to go to the account menu at the top right of the console and click on “Switch Role”. To switch roles we need to specify:

  • Account: for this lab, the same as your AWS account ID
  • Role: the role name to be assumed (the one you created before)
  • Display name (optional): a name you want to keep for this switching role

NOTE: Be aware that the role name is case sensitive!

6. Assume a role from an EC2 instance

In this lab you just need to do the same as before but, instead of using a trust relationship for the account, you need to specify the EC2 service.

Task: Create a role as before but specify the EC2 services in the trust relationship

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sts:AssumeRole"
            ],
            "Principal": {
                "Service": [
                    "ec2.amazonaws.com"
                ]
            }
        }
    ]
}

Now, you need to launch an EC2 instance.

Task: Go to “Actions”=>“Security”=>“Modify IAM role”. Specify the role you just created.

Task: Enter in the EC2 instance and verify that you can list the services (in the example below we assume you’re using the Ireland region)

$> aws s3 ls --region eu-west-1

Task: Why don’t you need to run aws configure in this case?

7. Create a bucket policy and verify its access

In this lab, you have to:

  1. Create a bucket (with default settings)
  2. Put an object in the bucket
  3. Assume the role you created before
  4. Verify that you have access to read the object you’ve put
  5. Go back to the original user
  6. Update the bucket policy by setting the policy below (replace the bucket name accordingly)
  7. Assume the role you created

Task: Can you read the object again?

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Deny",
            "Principal": "*",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::<BUCKET_NAME>",
                "arn:aws:s3:::<BUCKET_NAME>/*"
            ]
        }
    ]
}

Task: Replace the policy above with the following one and test it again

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:Get*"
            ],
            "Resource": [
                "arn:aws:s3:::<BUCKET_NAME>",
                "arn:aws:s3:::<BUCKET_NAME>/*"
            ]
        }
    ]
}
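
What you observe in this lab follows IAM’s evaluation order: an explicit Deny always wins over any Allow, and with no matching statement the request is implicitly denied. A toy model of that rule (ignoring principals, conditions, and most wildcard forms; not the real evaluation engine):

```python
def evaluate(statements, action, resource):
    """Toy policy evaluation: explicit Deny wins, then Allow, else implicit deny."""
    def matches(pattern, value):
        # Only the trailing-* wildcard used in the policies above is supported.
        return value == pattern or (
            pattern.endswith('*') and value.startswith(pattern[:-1]))

    decision = 'implicit-deny'
    for stmt in statements:
        actions = stmt['Action'] if isinstance(stmt['Action'], list) else [stmt['Action']]
        resources = stmt['Resource'] if isinstance(stmt['Resource'], list) else [stmt['Resource']]
        if (any(matches(a, action) for a in actions)
                and any(matches(r, resource) for r in resources)):
            if stmt['Effect'] == 'Deny':
                return 'explicit-deny'  # Deny overrides everything else
            decision = 'allow'
    return decision

deny = {'Effect': 'Deny', 'Action': ['s3:GetObject'],
        'Resource': ['arn:aws:s3:::mybucket/*']}
allow = {'Effect': 'Allow', 'Action': ['s3:Get*'],
         'Resource': ['arn:aws:s3:::mybucket/*']}
print(evaluate([allow, deny], 's3:GetObject', 'arn:aws:s3:::mybucket/file.txt'))
# -> explicit-deny
```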

8. Update a simple template and validate it

For this lab, you can use this CloudFormation template file from my Github account

Task: Use the CloudFormation designer to validate the template. Take a moment to examine the interaction between the components, understand the CloudFormation template, etc.

Task: Try to modify the template and make it inconsistent. What happens with the validation?

9. Create and update a CloudFormation stack

For this lab, you can use this CloudFormation template file from my Github account

Create a CloudFormation stack

Create a CloudFormation stack with the template provided. In the output you should see a URL to access a web page.

Task: Do you have access to it? Are you able to find the problem?

Update the CloudFormation stack

Now that you have found the problem, you have to update the stack with the modified template.

Task: Update the stack and access the URL. You should be able to access it.

10. Create and update a CloudFormation stack set

In order to create the same stacks in multiple accounts/regions, you can use AWS CloudFormation Stack Set.

In this lab we are going to deploy a CloudFormation Stack Set in a single account, but we could do exactly the same thing by specifying other multiple accounts.

These are the steps to follow:

  1. Create a Stack for the parent account with this template. This stack creates a role that allows assuming a role in another account.
  2. Create a Stack for the child account with this template. This stack creates the role in the child account that grants the parent account access to perform operations there (the role to be assumed).
  3. Create a Stack Set with this template. This template deploys an S3 bucket in the regions and accounts specified.

NOTE: When you create a Stack Set, a CloudFormation stack is created in every region and account you specify.

The templates above are placed in this repository: https://github.com/ronaldtf/aws-cloudops-workshop/tree/master/cf-stackset

Task: Create a Stack Set that deploys the stack in 2 regions in the same account.

Task: Try to create the same Stack Set again. What is the problem? How do you fix it?

11. Create an alarm and run a stress test to trigger it

In this lab we are going to use a metric that monitors the CPU utilization of an EC2 instance. We will use that metric and a threshold to trigger an alarm.

First of all, remember that the states of an alarm are:

  • INSUFFICIENT_DATA: when there is not sufficient data to determine whether the alarm should be triggered or not
  • OK: if the metric has not crossed the threshold
  • ALARM: when the threshold has been exceeded and some action might be needed.

Task 1: Create an SNS topic

  1. Go to the SNS (Simple Notification Service) and create a topic. Use the default settings to create the topic.
  2. Subscribe your email address to the topic. To be able to get notifications from the topic, you must click on the link of the confirmation email you have received.
  3. Write down the ARN of the SNS topic

Task 2: Create an EC2 instance (if you do not have any)

In case you do not have any instance available for this step, follow the steps in a previous lab to launch an EC2 instance.

Task 3: Create an alarm

Go to CloudWatch Alarms and create an alarm with the following data:

  • Metric: CPUUtilization (for the corresponding EC2 instance)
  • Statistic: set it to Average
  • Period: set it to 5 minutes
  • Threshold: set it to static, and greater than 80 (80% of CPU)
  • Datapoints: 1 out of 1
  • Missing data treatment: leave the default value
  • Action: Notification (select the topic you created in Task 1)

By setting this action we have created an alarm which is triggered when the CPUUtilization of the selected instance is higher than 80% for 5 minutes.

NOTE: for this example, we are using default metrics with standard resolution (datapoints collected every minute)

You should see that the current state of the alarm is INSUFFICIENT_DATA but, after a while, it will change to OK.
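
The evaluation just configured (‘1 out of 1’ datapoint, static threshold) can be modelled in a few lines; this is a simplification of the idea, not CloudWatch’s actual algorithm:

```python
def alarm_state(datapoints, threshold):
    """Simplified '1 out of 1' alarm: evaluate the latest period average."""
    if not datapoints:
        return 'INSUFFICIENT_DATA'
    return 'ALARM' if datapoints[-1] > threshold else 'OK'

print(alarm_state([], 80))      # INSUFFICIENT_DATA (no datapoints yet)
print(alarm_state([42.5], 80))  # OK
print(alarm_state([93.1], 80))  # ALARM
```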

Task 4: Make the alarm be triggered

In order to trigger the alarm, we need to increase the CPU utilization of the EC2 instance. To do so, we need to connect to the instance (e.g. via SSH) and run the following actions:

  1. Install the stress package:
$> sudo yum update -y
$> sudo amazon-linux-extras install epel -y
$> sudo yum install stress -y
  2. Stress the CPU (100 workers and timeout after 300 seconds):
$> stress -c 100 -t 300

Task: Check again the alarm status. Has it reached the ALARM state?

12. Create a custom metric filter and create an alarm from it

The purpose here is to demonstrate how we can create a custom metric filter from CloudWatch logs and how we can use the resulting metrics as normal metrics.

NOTE: Because of the lack of log data, the instructor will demonstrate how this can be configured in a real scenario

13. Set up CloudWatch agent and create an alarm based on OS metrics

In order to perform the actions below you must have an EC2 instance. Create one, if you do not have any, by following one of the labs in a previous section.

Grant permissions to the EC2 instance

  1. Create a role with EC2 as the trusted entity and with the CloudWatchAgentServerPolicy permissions attached
  2. Attach the created role to the EC2 instance

NOTE: The steps indicated above are not detailed, as they were part of previous labs

Configure the CloudWatch agent in EC2 instance

$> sudo yum install amazon-cloudwatch-agent
$> sudo amazon-linux-extras install collectd -y
$> sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
$> sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json
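
The wizard writes its answers to /opt/aws/amazon-cloudwatch-agent/bin/config.json. As a rough reference (your generated file will differ depending on your wizard answers), a minimal configuration collecting memory and root-disk usage might look like:

```json
{
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"]
      },
      "disk": {
        "measurement": ["used_percent"],
        "resources": ["/"]
      }
    }
  }
}
```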

Set up an alarm

Create an alarm based on one of the OS metrics.

14. Create an EventBridge rule that triggers a notification regularly

In this lab we will create an EventBridge rule that:

  • runs every minute (e.g. with the schedule expression rate(1 minute))
  • is enabled
  • publishes to an SNS topic

Task: Create the EventBridge rule with the specifications above. All the dependent services should have already been in place from previous labs.

15. Set up a notification in S3 to trigger an action when an object is PUT

Task: what is the difference between S3 events and CloudTrail events?

Task: what is the difference between CloudWatch logs and CloudTrail events?

In this lab we are going to trigger an event every time we insert an object (with a PUT action) into a specific bucket.

  1. Create a S3 bucket (or reuse the bucket created in a previous lab)
  2. In the S3 bucket, go to “Properties” => “Event Notifications”
  3. Create an event notification. You need:
    • A name
    • A prefix (to filter objects)
    • A suffix (to filter objects)
    • Object action
    • (optional additional parameters)
  4. Specify an SNS topic as the destination

Task: Have you received an email once you have put an object in the bucket?

16. Create a CloudTrail trail

When creating a CloudTrail trail, we need to specify:

  • trail name: name of the CloudTrail trail
  • bucket creation: whether the audit logs are going to be placed in an existing or a new bucket, and which one it will be
  • encryption: by default, logs are encrypted (as information is critical)
  • log file validation (this is out of scope from this lab)
  • CloudWatch logs: apart from S3, logs can be placed as CloudWatch logs

Task: create a trail with logs in an existing bucket in S3, without encryption

17. Analyze a CloudTrail trail

For the trail created before, go to the selected bucket and open one of the files. If nothing has been created yet, do not worry! CloudTrail trails can take up to 15 minutes to send the audit information to S3 and, optionally, CloudWatch.

To avoid keeping you waiting, you can have a look at the following extracts:

Example 1

{"Records": [{
    "eventVersion": "1.0",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "EX_PRINCIPAL_ID",
        "arn": "arn:aws:iam::123456789012:user/Alice",
        "accessKeyId": "EXAMPLE_KEY_ID",
        "accountId": "123456789012",
        "userName": "Alice"
    },
    "eventTime": "2014-03-06T21:22:54Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "StartInstances",
    "awsRegion": "us-east-2",
    "sourceIPAddress": "205.251.233.176",
    "userAgent": "ec2-api-tools 1.6.12.2",
    "requestParameters": {"instancesSet": {"items": [{"instanceId": "i-ebeaf9e2"}]}},
    "responseElements": {"instancesSet": {"items": [{
        "instanceId": "i-ebeaf9e2",
        "currentState": {
            "code": 0,
            "name": "pending"
        },
        "previousState": {
            "code": 80,
            "name": "stopped"
        }
    }]}}
}]}

Task: Could you identify what this audit log corresponds to?

Example 2

{"Records": [{
    "eventVersion": "1.0",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "EX_PRINCIPAL_ID",
        "arn": "arn:aws:iam::123456789012:user/Alice",
        "accountId": "123456789012",
        "accessKeyId": "EXAMPLE_KEY_ID",
        "userName": "Alice"
    },
    "eventTime": "2014-03-06T21:01:59Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "StopInstances",
    "awsRegion": "us-east-2",
    "sourceIPAddress": "205.251.233.176",
    "userAgent": "ec2-api-tools 1.6.12.2",
    "requestParameters": {
        "instancesSet": {"items": [{"instanceId": "i-ebeaf9e2"}]},
        "force": false
    },
    "responseElements": {"instancesSet": {"items": [{
        "instanceId": "i-ebeaf9e2",
        "currentState": {
            "code": 64,
            "name": "stopping"
        },
        "previousState": {
            "code": 16,
            "name": "running"
        }
    }]}}
}]}

Task: Could you identify what this audit log corresponds to?

Example 3

{"Records": [{
    "eventVersion": "1.0",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "EX_PRINCIPAL_ID",
        "arn": "arn:aws:iam::123456789012:user/Alice",
        "accountId": "123456789012",
        "accessKeyId": "EXAMPLE_KEY_ID",
        "userName": "Alice"
    },
    "eventTime": "2014-03-24T21:11:59Z",
    "eventSource": "iam.amazonaws.com",
    "eventName": "CreateUser",
    "awsRegion": "us-east-2",
    "sourceIPAddress": "127.0.0.1",
    "userAgent": "aws-cli/1.3.2 Python/2.7.5 Windows/7",
    "requestParameters": {"userName": "Bob"},
    "responseElements": {"user": {
        "createDate": "Mar 24, 2014 9:11:59 PM",
        "userName": "Bob",
        "arn": "arn:aws:iam::123456789012:user/Bob",
        "path": "/",
        "userId": "EXAMPLEUSERID"
    }}
}]}

Task: Could you identify what this audit log corresponds to?

Example 4

{"Records": [{
    "eventVersion": "1.04",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "EX_PRINCIPAL_ID",
        "arn": "arn:aws:iam::123456789012:user/Alice",
        "accountId": "123456789012",
        "accessKeyId": "EXAMPLE_KEY_ID",
        "userName": "Alice"
    },
    "eventTime": "2016-07-14T19:15:45Z",
    "eventSource": "cloudtrail.amazonaws.com",
    "eventName": "UpdateTrail",
    "awsRegion": "us-east-2",
    "sourceIPAddress": "205.251.233.182",
    "userAgent": "aws-cli/1.10.32 Python/2.7.9 Windows/7 botocore/1.4.22",
    "errorCode": "TrailNotFoundException",
    "errorMessage": "Unknown trail: myTrail2 for the user: 123456789012",
    "requestParameters": {"name": "myTrail2"},
    "responseElements": null,
    "requestID": "5d40662a-49f7-11e6-97e4-d9cb6ff7d6a3",
    "eventID": "b7d4398e-b2f0-4faa-9c76-e2d316a8d67f",
    "eventType": "AwsApiCall",
    "recipientAccountId": "123456789012"
}]}

Task: Could you identify what this audit log corresponds to?
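
Each trail file is a JSON document with a Records array like the extracts above; when digging through real trails, a small summarizer can help. A sketch (the sample record below is abridged from Example 4):

```python
import json

def summarize_trail(document):
    """One line per CloudTrail record: when, who, which service/API, any error."""
    lines = []
    for record in json.loads(document)['Records']:
        user = record.get('userIdentity', {}).get('userName', 'unknown')
        line = (f"{record['eventTime']} {user} "
                f"{record['eventSource']} {record['eventName']}")
        if 'errorCode' in record:
            line += f" ERROR={record['errorCode']}"
        lines.append(line)
    return lines

sample = '''{"Records": [{"eventTime": "2016-07-14T19:15:45Z",
  "userIdentity": {"userName": "Alice"},
  "eventSource": "cloudtrail.amazonaws.com", "eventName": "UpdateTrail",
  "errorCode": "TrailNotFoundException"}]}'''
for line in summarize_trail(sample):
    print(line)
# -> 2016-07-14T19:15:45Z Alice cloudtrail.amazonaws.com UpdateTrail ERROR=TrailNotFoundException
```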

18. Create a bucket and access it privately

In this lab we are going to demonstrate how to access an S3 bucket privately (without internet access).

This is not actually a lab but a demo the instructor will provide in the live workshop.

19. Manage policies and ACLs

In a previous lab you created an S3 bucket policy that granted/restricted access to a given account/role.

Task: would you be able to make a bucket publicly readable?

If we want to make a bucket publicly readable, we need to modify its bucket policy to grant public access. One way is by specifying the following bucket policy (remember to replace BUCKET_NAME with the right name):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetObject*",
                "s3:GetBucket*"
            ],
            "Resource": [
                "arn:aws:s3:::<BUCKET_NAME>",
                "arn:aws:s3:::<BUCKET_NAME>/*"
            ]
        }
    ]
}

Task: Does it work?

20. Create a lifecycle rule

For this lab you will need to:

  1. Go to an existing S3 bucket
  2. Within the bucket, go to the Management tab
  3. Click on Create a Lifecycle rule

The important part of the lifecycle rule is what action to perform.
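
As a reference, this is roughly what a lifecycle rule looks like in the JSON form accepted by `aws s3api put-bucket-lifecycle-configuration` (the console builds the same structure for you; the rule ID, prefix, and day counts below are example values):

```json
{
  "Rules": [
    {
      "ID": "archive-then-expire",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```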

Task: Is there any action you do not understand? Ask the instructor

21. Use S3 access endpoints with policies

Even though this is an advanced topic, we will show how to do it as a demo.

Note: The instructor will demonstrate how to perform this during the workshop

22. (Optional) Cross account access to S3

The goal of this lab is to allow access to an S3 bucket from another account:

  • AccountA has a private bucket BucketA
  • AccountB has a role that allows access to the bucket BucketA in AccountA
  • BucketA has a policy that allows AccountB to access it

Task: You should be able to do this lab with the concepts covered so far. However, you will need multiple accounts.
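
As a sketch, BucketA’s policy in AccountA could grant the AccountB role read access like this (the account ID, role name, and bucket name are placeholders to replace):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CrossAccountRead",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<ACCOUNT_B_ID>:role/<ROLE_NAME>"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<BUCKET_A_NAME>",
                "arn:aws:s3:::<BUCKET_A_NAME>/*"
            ]
        }
    ]
}
```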

NOTE: The instructor will perform a demo for this.

23. Congratulations!

Now that you know…

  • How to interact with the AWS Console
  • How to set up your environment to work with AWS
  • How to manage security in your AWS account and services
  • How to monitor your resources
  • How to audit your account
  • How to define alerts to be aware of abnormal behaviors
  • How to deploy resources in AWS

… it’s time to put your hands on new projects and start new challenges!