AWS Databases - Robel T. Fesshaye

AWS RDS (Relational Data Storage)

RDS runs on virtual machines
You can’t log into RDS operating systems
Patching of the RDS OS and DB is Amazon’s responsibility
RDS is NOT serverless but Aurora severless is serverless.
Used for relational OLTP (Online Transaction Processing)
Types of DBs on AWS: SQL Server. Oracle, MySQL Server, PostgreSQL, Aurora, MariaDB, DynamoDB(No SQL)
RDS has two key features:

- - Multi-AZ – for Disaster Recovery
  - Read Replicas – for performance
There is no data charge between primary and secondary instances
Some of the common causes of not being able to connect to a DB instance on AWS are:
The DB is still being created
The local firewall is stopping communication traffic
The security groups for the DB have not been properly configured

Data Warehousing

Used for business intelligence. Tools like Cognos,Jaspersoft, SQL Server Reporting Services, Oracle Hyperion, SAP NetWeaver
Used to pull in very large and complex data sets. Usually used by management to do queries on data (such as current performance vs targets etc)
Use different types of architecture both from a database perspective and infrastructure layer.
AWS data warehouse solution is called Redshift

OLTP vs OLAP

They differ in terms of the types of queries you will run.

OLTP (Online Transaction Processing) via RDS
OLAP (Online Analysis Processing) via Redshift

Backups

There are two different types of Backups for RDS:
1. Automated Backups
2. Database Snapshots
Automated backups

- recoverable at any time within the configurable retention period i.e. between 1 and 35 days.
- There is 1 full daily snapshot, and transaction logs for the remainder of the day.
- When you do a recovery, AWS will first choose the most recent backup, and then apply transaction logs relevant to that day. This allows you to do a point in time recovery down to a second, within the retention period.
- recovery is supported during the retention period.
- Automated backups are enabled by default. You get free storage matching the size of the DB.
- Backup data is stored in S3 and free storage space equal to the size of your database
- Choose a backup window during a time of reduced load; storage i/o may be suspended during backup, and users may experience increased latency.

– Database Snapshots

Are done manually (user initiated)
They are stored even if the original RDS is deleted (unlike automated backups)
When restoring from an automated backup or manual snapshot, the restored version will be a new RDS instance with a new DNS endpoint.

Encryption

Encryption at rest is supported for:

- MySQL
- Oracle
- SQL Server
- PostgreSQL
- MariaDB
- AuroraDB (more recently)

Encryption is done using the AWS key management service (KMS)
If the RDS instance is encrypted the data stored at rest in the underlying storage is encrypted, as rae its automated backups, read replicas amd snapshots.
Encrypting existing DBs is not supported. To do this, you’ll need to create a new encrypted instance, and migrate data to it. The encryption key can be stored in KMS.

Failover

Multi-AZ allows you to have an exact copy of your production database in another AZ.
supports failover to the same DNS endpoint. Note that Amazon RDS uses DNS names, not IP addresses.
When Multi-AZ failover is enabled, AWS creates a primary RDS instance, and synchronously replicates data to a standby instance in a different AZ.
If the primary DB instance fails, the following happens:

- The standby replica (which is stored in a different AZ to the primary) is promoted to become the new primary.
- The DNS record of the DB instance is changed to point to the new primary.
- The original primary DB instance is terminated, and a new standby replica is created.

Failover is for DR (Disaster Recovery) only. It is not possible to use your DR DB as a read replica.
Failover and Multi-AZ is only supported for:
- MySQL Server, PostgreSQL, Oracle, Maria DB
Multi-AZ deployments are not supported for SQL Server or Aurora

Read Replicas

Read replicas allow you to have a read-only copy of your production database. This is achieved by using asynchronous replication from the primary RDS instance to the read replica. You can read replicas primarily for very read-heavy database workloads.
Are for scaling, not for DR.
Are only supported by:
- MySql
- PostgreSQL
- MariaDB
- Aurora
Can be promoted to a standalone, writable DB
Have their own DNS address
You can have read replicas that have Multi-AZ
You can create read replicas of multi-AZ source databases.
Must have automatic backups turned on in order to deploy a read replica.
You can have upto 5 read replica copies of any database
We can have a read replica in a second region

AWS Oracle DB does not support read replicas [ AWS RDS read replicas are not supported by SQL server, Oracle or Aurora ]
It’s possible to have read replicas of read replicas, but watch out for latency.
Read replicas are supported in separate regions from the source DB for both MySQL and MariaDB, but NOT PostgreSQL.

DynamoDB [ DyanamoDB FAQ ]

Non-relational
Supports both documents and Key-Value data
Extremely important for the Developer exam. Only an overview is needed or the solutions architect exam.
Always on SSD – magnetic disks are not supported
Spread across 3 disting geographic data centers (meaning 3 different facilities. AWS is non specific about whether this means different AZs, countries, etc)
Great fit for mobile. Web, gaming, ad-tech, IoT and may other apps.

DynamoDB consists of:

Collections (aka tables)
Documents (aka rows), which contain key-value pairs (aka fields)

DynamoDB tables support a primary key which can be either:

a single-attribute partition key, i.e. UserId
a composite partition-sort key, i.e. UserId (partition), Timestamp (sort)

Provisioned capacity / provisioned throughput capacity controls the read and write capacity on a table – when you create a table, you specify how much provisioned throughput capacity you want to reserve for reads and writes (RCU – Read Capacity Units, WCU – Write Capacity Units). DyanamoDB will then reserve the necessary resources to meet your throughput needs while ensuring consistent, low-latency performance. Your provisioned throughput capacity can be changed after table creation, and can be increased/decreased as necessary.

DynamoDB cross-region replication is suitable for:

- Efficient disaster recovery – by replicating tables in multiple data centers, you can switch over to using DynamoDB tables from another region in case a data center failure occurs.
- Faster reads
- Easier traffic management
- Easy regional migration
- Live data migration

– Reads and Writes

- - Dynamo supports eventually consistent reads (usually within 1 sec), and strongly consistent reads
  - Write capacity units are billed in blocks of 10
  - 1 write capacity unit = 1 write per second
  - Read capacity units are billed in blocks of 50
  - Read Types:

- - - strongly consistent reads : returns a result that reflects all writes that received a successful response prior to the read.
    - eventually consistent reads: consistency across all copies of data is usually reached within a second. Repeating a read after a short time should return the updated data. Enabled by default.
  - DynamoDB can be expensive for writes, but is cheap for reads.

Redshift

Petabyte scale data warehouse service in the cloud.
Columnar storage – data is stored sequentially, and is organised by column

Redshift can be configured:

- Single node (160GB)
- Multi-node:

- - Leader node – manages client connections and receives queries
  - Compute nodes – for data storage and queries. Can have up to 128 of Compute Nodes.

Redshift supports advanced compression

- Columnar data stores can be compressed much more than row-based data stores because similar data is stored sequentially on disk.
- Doesn’t require indexes or materialized views
- Analyses your data and picks the best compression method for it
Supports automatically distributing your query among nodes for massively parallel processing. Redshift automatically distributes data and query load across all nodes.

Redshift Backups

Enabled by default with a 1 day retention period.
Max retention period is 35 days.
Redshift attempts to maintain at least three copies of your data (the original and replica on the compute nodes and a backup in Amazon S3)
Redshift can also asynchronously replicate your snapshots to S3 in another region for disaster recovery.

Redshift Pricing

Charged for compute node hours; you’re only charged for compute nodes, not for leader nodes.
Charged for 1 unit per node per hour for the billing period
Will be charged for backups
Will be charged for data transfer

Security Considerations:

Data is encrypted in transit using SSL, and encrypted at rest using AES256
Is not designed for Multi-AZ
Designed for management reports / BI (Business Intelligence)
In the event of an AZ outage, data snapshots can be restored to a different AZ

Redshift Availability:

Only available in 1 AZ
Can restore snapshots to new AZs in the event of an outage.

Aurora [Aurora FAQ]

Aurora is intended to compete with Oracle, and is MySQL and PostgreSQL compatible.
Start with 10GB, scales in 10GB increments to 64TB (storage Autoscaling)
Compute resources can scale up to 32vCPUs and 244GB of memory.
2 copies of your data is contained in each availability zone, with a minimum of 3 availability zones. 6 copies of your data.
Only runs on AWS infrastructure. This means that it’s not possible to run a local Aurora DB.
Up to 5x better performance than MySQL, and 1⁄10 the cost of commercial DBs

Aurora Scaling

Aurora designed to transparently handle the loss of upto two copies of data without affecting database write availability and upto three copies without affecting read availability.
Aurora storage is also self-healing. Data blocks and disks are continuously scanned for errors and repaired automatically.

Aurora replicas types

Aurora Replicas (currently 15)
MySQL read replicas (currently 5)

Backup with Aurora

Automated backups are always enabled on Amazon Aurora DB instances. Backups don’t impact database performance.
You can also take snapshots with Aurora. This also doesn’t impact on performance.
You can share Aurora snapshots with other AWS accounts

Elasticache

Elasticache is a web service that makes it easy to deploy, operate and scale an in-memory cache in the cloud. The service improves the performance of web applications by allowing you to retrieve information from a fast, managed, in-memory caches, instead of replying entirely on slower disk-based databases.
Elasticache can be used to increase performance on read-heavy databases (frequent identical queries)

Elasticache supports two open-source in-memory caching engines:

1. Memcached: Single-AZ
2. Redis – similar to Memcached, but generally preferable; Redis was developed using lessons learned from Memcached: Multi-AZ

Q & A

Question 1: Which of the following is not a feature of DynamoDB?

A. The ability to perform operations by using a user-defined primary key
B. Data reads that are either eventually consistent or strongly consistent
C. The primary key can either be a single attribute or a composite
D. The ability to store relational based data

Question 2: What happens to the I/O operations of a single-AZ RDS instance during a database snapshot or backup?

A. I/O may be briefly suspended while the backup process initializes (typically under a few seconds) and you may experience a brief period of elevated latency
B. Nothing
C. I/O operations to the database are sent to a secondary instance of a Multi-AZ installation (for the duration of the snapshot)
D. I/O operations will function normally

Question 3: You can RDP or SSH in to an RDS instance to see what is going on with the operating system.

A. True
B. False

Question 4: AWS’s NoSQL product offering is known as ________.

A. RDS
B. DynamoDB
C. MongoDB
D. MySQL

Question 5: Which set of RDS database engines is currently available?

A. Aurora, MySQL, SQL server, Cassandra
B. PostgreSQL, MariaDB, MongoDB, Aurora
C. MariaDB, SQL Server, MySQL, Cassandra
D. Oracle, SQL Server, MySQL, PostgreSQL

Question 6: When creating an RDS instance, you can select the Availability Zone into which you deploy it.

A. True
B. False

Question 7: RDS Reserved instances are available for multi-AZ deployments.

A. True
B. False

Question 8: With new RDS DB instances, automated backups are enabled by default?

A. True
B. False

Question 9: In RDS, what is the maximum value I can set for my backup retention period?

A. 15 days
B. 30 days
C. 35 days
D. 45 days

Question 10: If I wanted to run a database on an EC2 instance, which of the following storage options would Amazon recommend?

A. RDS
B. S3
C. Glacier
D. EBS

Question 11: How many copies of my data does RDS – Aurora store by default?

A. 3
B. 6
C. 2
D. 1

Question 12: MySQL installations default to port number ________.

A. 1433
B. 3306
C. 3389
D. 80

Question 13: If you want your application to check RDS for an error, have it look for an ______ node in the response from the Amazon RDS API.

A. Incorrect
B. Error
C. Abort
D. Exit

Question 14: Which AWS DB platform is most suitable for OLTP?

A. ElastiCache
B. RDS
C. DynamoDB
D. Redshift

Question 15: Which of the following is most suitable for OLAP?

A. ElastiCache
B. RDS
C. DynamoDB
D. Redshift

Question 16: Which AWS service is ideal for Business Intelligence Tools/Data Warehousing?

A. Elastic Beanstalk
B. ElastiCache
C. DynamoDB
D. Redshift

Question 17: What data transfer charge is incurred when replicating data from your primary RDS instance to your secondary RDS instance?

A. The charge is double the standard data transfer charge
B. The charge is the same as the standard data transfer charge
C. There is no charge associated with this action
D. The charge is double the standard data transfer charge

Question 18: Under what circumstances would I choose provisioned IOPS over standard storage when creating an RDS instance?

A. If you use online transaction processing in your production environment
B. If you have workloads that are not sensitive to latency/lag
C. If your business was trying to save money
D. If this was a test DB

Question 19: If you are using Amazon RDS Provisioned IOPS storage with a Microsoft SQL Server database engine, what is the maximum size RDS volume you can have by default?

A. 500GB
B. 1TB
C. 32TB
D. 16TB
E. 6TB

Question 20: Which of the following AWS services is a non-relational database?

A. RDS
B. Redshift
C. DynamoDB
D. ElastiCache

Question 21: You are hosting a MySQL database on the root volume of an EC2 instance. The database is using a large number of IOPS, and you need to increase the number of IOPS available to it. What should you do?

A. Migrate the database to an S3 bucket
B. Migrate the database to Glacier
C. Add 4 additional EBS SSD volumes and create a RAID 10 using these volumes
D. Use CloudFront to cache the database

Question 22: In RDS, changes to the backup window take effect ________.

A. After 30 minutes
B. The next day
C. Immediately
D. You can’t backup in RDS

Question 23: Amazon’s ElastiCache uses which two engines?

A. Redis and Memory
B. Reddit and Memcrush
C. Redis and Memcached
D. MyISAM & InnoDB

Question 24: When you add a rule to an RDS DB security group, you must specify a port number or protocol.

A. True
B. False

Answers

1-D 2-A 3-B 4-B 5-D 6-A 7-A 8-A 9-C 10-D 11-B 12-B 13-B 14-B 15-D 16-D 17-C 18-A 19-D 20-C 21-C 22-C 23-C 24-B