# Derive Application Log Insights Using Amazon CloudWatch Connector for Automated Data Analytics on AWS

## Background

This repository provides an AWS CDK solution that demonstrates the capabilities of the Automated Data Analytics on AWS solution, as described in the accompanying blog [here](link). [Automated Data Analytics on AWS (ADA)](https://aws.amazon.com/solutions/implementations/automated-data-analytics-on-aws/) is an AWS Solution that enables users to derive meaningful insights from data in a matter of minutes through a simple and intuitive user interface. ADA offers an AWS-native data analytics platform that is ready to use out of the box by data analysts for a variety of use cases. Using ADA, teams can ingest, transform, govern and query diverse datasets from a range of data sources without requiring specialist technical skills. ADA provides a set of [pre-built connectors](https://docs.aws.amazon.com/solutions/latest/automated-data-analytics-on-aws/data-connectors-guide.html) to ingest data from a wide range of sources, including Amazon Simple Storage Service (Amazon S3), Amazon Kinesis Data Streams, Amazon CloudWatch, AWS CloudTrail, and Amazon DynamoDB.

Using this repository, we demonstrate how an application developer or an application tester can leverage ADA to derive operational insights into applications running on AWS. We also show how ADA can connect to different data sources in AWS without copying the data from the source. We first [deploy the ADA solution](https://docs.aws.amazon.com/solutions/latest/automated-data-analytics-on-aws/deploy-the-solution.html) into an AWS account and [set up the solution](https://docs.aws.amazon.com/solutions/latest/automated-data-analytics-on-aws/setting-up-automated-data-analytics-on-aws.html) by creating [data products](https://docs.aws.amazon.com/solutions/latest/automated-data-analytics-on-aws/creating-data-products.html) using data connectors. ADA's data products allow users to connect to a wide range of data sources and to query the datasets as if they were relational database tables. We then use ADA's Query Workbench to join the separate datasets and query the correlated data to gain insights. We also demonstrate how ADA can be integrated with BI tools such as Tableau to visualise the data and build reports.

## Solution overview

In this section, we present the solution architecture for the demo and explain the workflow. For the purposes of the demonstration, a bespoke application is simulated using an AWS Lambda function that emits logs in the [Apache Log Format](https://httpd.apache.org/docs/2.4/logs.html#accesslog) at a preset interval scheduled by Amazon EventBridge. This standard format can be produced by many different web servers and read by many log analysis programs. The application (AWS Lambda) logs are sent to an Amazon CloudWatch Log Group. The historical application logs are stored in an Amazon S3 bucket for reference and querying purposes. A lookup table with a list of [HTTP status codes](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) and their descriptions is stored in an Amazon DynamoDB table. These three serve as the sources from which data is ingested into ADA for correlation, query and analysis. We [deploy the ADA solution](https://docs.aws.amazon.com/solutions/latest/automated-data-analytics-on-aws/deploy-the-solution.html) into an AWS account and set it up. We then create data products within ADA for the Amazon CloudWatch Log Group, the Amazon S3 bucket, and the Amazon DynamoDB table. Once the data products are configured, ADA provisions the data pipelines that ingest the data from the sources into the ADA platform. Using ADA's Query Workbench, users can then query the ingested data using plain SQL for application troubleshooting or issue diagnosis.
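
To make the simulated application concrete, the sketch below shows how such a log-generating Lambda function and its 2-minute schedule could be defined with the AWS CDK in TypeScript. The construct names, runtime and inline handler here are illustrative assumptions, not the exact code in this repository; the sample log line is the canonical example from the Apache documentation.

```
import { Stack, StackProps, Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';

export class LogGenStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // A Lambda function that emits one Apache-Common-Log-Format entry per
    // invocation. The line lands in the function's CloudWatch Log Group
    // (/aws/lambda/<function-name>), which the ADA CloudWatch data product
    // later ingests.
    const logGenFn = new lambda.Function(this, 'AdaLogGenLambdaFunction', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromInline(`
        exports.handler = async () => {
          console.log('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326');
        };
      `),
    });

    // An EventBridge rule that invokes the function at a 2-minute interval.
    new events.Rule(this, 'LogGenSchedule', {
      schedule: events.Schedule.rate(Duration.minutes(2)),
      targets: [new targets.LambdaFunction(logGenFn)],
    });
  }
}
```

A stack like this is all the "application" needs: every invocation appends one access-log-style line to CloudWatch for ADA to pick up.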

Refer to the diagram below for an overview of the architecture and the workflow of using ADA to gain insights into application logs.


The workflow includes the following steps:
1. An AWS Lambda function is scheduled to be triggered at a 2-minute interval using Amazon EventBridge.
1. The AWS Lambda function emits logs that are stored in a specified Amazon CloudWatch Log Group under /aws/lambda/CdkStack-AdaLogGenLambdaFunction. The application logs are generated using the Apache Log Format schema but are stored in the Amazon CloudWatch Log Group in JSON format.
1. Data products for Amazon CloudWatch, Amazon S3 and Amazon DynamoDB are created in ADA. The Amazon CloudWatch data product connects to the Amazon CloudWatch Log Group where the application (AWS Lambda) logs are stored. The Amazon S3 connector connects to an Amazon S3 bucket folder where the historical logs are stored. The Amazon DynamoDB connector connects to an Amazon DynamoDB table where the status codes referenced by the application and historical logs are stored.
1. For each of the three data products, ADA deploys the data pipeline infrastructure to ingest data from the source. Once the data ingestion is complete, users can write SQL queries via ADA's Query Workbench.
1. The user logs in to the ADA portal and composes SQL queries from the Query Workbench to gain insights into the application logs (an example query is shown after this list). The user can optionally save a query and share it with other ADA users in the same domain. ADA's query feature is powered by Amazon Athena, a serverless, interactive analytics service that provides a simplified, flexible way to analyze petabytes of data.
1. Tableau is configured to access the ADA data products via ADA's egress endpoints. The user creates a dashboard with two charts. The first chart is a heat map that shows the prevalence of HTTP error codes correlated with the application API endpoints. The second chart is a bar chart that shows the top 10 application APIs with a total count of HTTP error codes from the historical data.
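
As an example of step 5, the query below joins the ingested historical logs with the DynamoDB status-code lookup to list the top 10 API endpoints by HTTP error count, similar to the data behind the Tableau bar chart. The domain name (`appinsights`), data product names (`historical_logs`, `status_codes`) and column names are hypothetical; substitute the names chosen when creating the data products.

```
SELECT hl.endpoint,
       hl.status_code,
       sc.description,
       COUNT(*) AS error_count
FROM appinsights.historical_logs AS hl
JOIN appinsights.status_codes AS sc
  ON hl.status_code = sc.status_code
WHERE hl.status_code >= 400   -- HTTP error responses only
GROUP BY hl.endpoint, hl.status_code, sc.description
ORDER BY error_count DESC
LIMIT 10;
```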

## Prerequisites

To perform this demo end to end as described in the [blog](link), the user needs the following prerequisites:

1. Install the [AWS Command Line Interface](https://aws.amazon.com/cli/), the AWS CDK [prerequisites](https://docs.aws.amazon.com/cdk/v2/guide/work-with.html), the TypeScript-specific [prerequisites](https://docs.aws.amazon.com/cdk/v2/guide/work-with-cdk-typescript.html) and [git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).
1. [Deploy](https://docs.aws.amazon.com/solutions/latest/automated-data-analytics-on-aws/deploy-the-solution.html) the ADA solution in the user's AWS account in the North Virginia (us-east-1) region.
    1. Provide an admin email while launching the ADA CloudFormation stack. It is needed for ADA to send the root user password. An admin phone number is required to receive One-Time Password (OTP) messages if Multi-Factor Authentication (MFA) is enabled. For this demo, MFA is not enabled.
1. Build and deploy the [sample application](https://github.com/aws-samples/operational-insights-with-automated-data-analytics-on-aws) (AWS CDK) solution so that the following resources are provisioned in the user's AWS account in the North Virginia (us-east-1) region:
    1. An AWS Lambda function that simulates the logging application, and an Amazon EventBridge rule that invokes the application Lambda function at a 2-minute interval.
    1. An Amazon S3 bucket with the relevant bucket policies and a .csv file that contains the historical application logs.
    1. An Amazon DynamoDB table with the lookup data.
    1. Relevant IAM roles and permissions required for the services.
1. (Optional) Install Tableau [Desktop](https://www.tableau.com/products/desktop), a third-party business intelligence tool. We are using Tableau Desktop version 2021.2. There is a cost involved in using a licensed version of the Tableau Desktop application; for additional details, refer to the Tableau licensing documentation.

## Setting up the Sample Application Infrastructure using AWS CDK

The steps to clone the repository and set up the AWS CDK project are listed below. Before running the commands, be sure to [configure](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) your AWS credentials. Create a folder, open a terminal and navigate to the folder where the AWS CDK solution is to be installed.

```
* `gh repo clone vaijusson/ADALogInsights` clone the project into a local folder
* `npm install` install the library dependencies
* `npm run build` compile typescript to js
* `npm run watch` watch for changes and compile
* `cdk deploy` deploy this stack to your default AWS account/region
* `cdk diff` compare deployed stack with current state
* `cdk synth` emits the synthesized CloudFormation template
```

These steps perform the following:
1. Install the library dependencies
1. Build the project
1. Generate a valid AWS CloudFormation template
1. Deploy the stack using AWS CloudFormation in the user's AWS account

The deployment takes about 1-2 minutes and creates the Amazon DynamoDB lookup table, the application Lambda function and the Amazon S3 bucket containing the historical log files, listing them as stack outputs.

## Tear Down

Tearing down the sample application infrastructure is a two-step process. First, to remove the infrastructure provisioned for the purposes of this demo, execute the following command in the terminal:

```
cdk destroy
```

When prompted with the following question, enter 'y' and CDK will delete the resources deployed for the demo:

```
Are you sure you want to delete: CdkStack (y/n)? y
```

Alternatively, the resources can be removed from the AWS Console by navigating to the CloudFormation service, selecting the CdkStack stack and choosing the Delete option. Second, if the ADA solution is no longer needed, uninstall it by deleting the ADA CloudFormation stack from the account in which it was deployed.

## Security