AWS Interview Questions

Master AWS with these real-world interview questions and answers.

Beginner Questions

Core concepts, syntax, and foundational command-line knowledge.

Easy Associate Level AWS

What is the AWS Shared Responsibility Model?

AWS and customers share security responsibilities — the line depends on the service type:

AWS is responsible for: Security “of” the cloud — physical data centers, hypervisors, networking hardware, managed service infrastructure.

You are responsible for: Security “in” the cloud — your operating systems, your application code, IAM configurations, data encryption, network configuration (VPC, security groups), and patching guest OS on EC2.

For managed services like RDS or Lambda, AWS takes on more responsibility (OS patching), but you still own IAM, data, and network controls.

Easy Associate Level AWS

What is the difference between S3 Standard, S3 Infrequent Access, and S3 Glacier?

AWS S3 offers storage classes with different cost/access tradeoffs:

Standard: High durability, low latency, high throughput. For frequently accessed data.
Standard-IA (Infrequent Access): Same latency as Standard but cheaper storage cost. Higher per-retrieval cost. Use for data accessed less than once a month.
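Glacier Instant Retrieval: For archive data accessed a few times per year. Millisecond retrieval.
Glacier Flexible Retrieval: For archives that can tolerate retrieval times of minutes to hours.
Glacier Deep Archive: Lowest cost. Standard retrieval completes within 12 hours (bulk retrieval within 48 hours). Use for compliance/regulatory long-term retention.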

Use S3 Lifecycle Policies to automatically transition objects between classes based on age.
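As a minimal sketch of such a policy, the Python snippet below builds a lifecycle configuration in the dictionary shape that boto3's put_bucket_lifecycle_configuration accepts. The rule ID, the logs/ prefix, and the 30/90/365-day thresholds are illustrative assumptions, not recommendations, and nothing is sent to AWS.

```python
import json


def build_lifecycle_config(prefix, transitions):
    """Build an S3 lifecycle configuration dict.

    transitions: list of (days, storage_class) tuples, in any order.
    """
    ordered = sorted(transitions)  # earliest transition first
    return {
        "Rules": [
            {
                # Hypothetical rule name derived from the prefix.
                "ID": f"tier-down-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": storage_class}
                    for days, storage_class in ordered
                ],
            }
        ]
    }


# Assumed thresholds: IA after 30 days, Glacier IR after 90, Deep Archive after 365.
config = build_lifecycle_config(
    "logs/",
    [(90, "GLACIER_IR"), (30, "STANDARD_IA"), (365, "DEEP_ARCHIVE")],
)
print(json.dumps(config, indent=2))
```

With real credentials, a dict like this could be passed as the LifecycleConfiguration argument of the boto3 call; here it is only printed.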

Easy Associate Level AWS

What is the difference between IAM users, groups, roles, and policies in AWS?

Users: Individual identities for people or applications with long-term credentials (access key + secret).

Groups: Collections of users that share the same permissions. Manage permissions at group level, not individually.

Roles: Identities assumed temporarily by AWS services (EC2, Lambda), federated users, or cross-account access. No long-term credentials — they use short-lived tokens. This is the preferred approach.

Policies: JSON documents that define permissions. Attached to users, groups, or roles.

Best practice: Always use roles over users for AWS service authentication.

Intermediate Questions

Infrastructure management, deployment strategies, and delivery flows.

Medium Senior Level AWS

What is AWS Backup?

AWS Backup is a fully managed backup service that centralizes and automates data protection across AWS services. It provides a unified backup console to configure and audit backup activity, define backup policies, and monitor backup and restore jobs. Backup supports EC2, EBS, RDS, DynamoDB, EFS, FSx, and more. It enables you to set backup schedules, retention policies, and lifecycle transitions. Backup provides cross-region and cross-account backup capabilities, ensuring business continuity and regulatory compliance. It integrates with AWS Organizations for centralized backup management across multiple accounts.

Medium Senior Level AWS

What is Amazon Athena?

Amazon Athena is an interactive query service for analyzing data directly in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage, and you pay per query based on the amount of data scanned. It integrates with AWS Glue Data Catalog for metadata management and supports data formats including CSV, JSON, Parquet, and ORC. Athena can query petabyte-scale datasets in S3; partitioning the data and using columnar formats such as Parquet or ORC reduces both the data scanned and the cost. It also supports federated queries across other sources, including relational and NoSQL databases.

Medium Senior Level AWS

What is Amazon SNS?

Amazon Simple Notification Service (SNS) is a fully managed messaging service for both application-to-application (A2A) and application-to-person (A2P) communication. SNS provides topics for high-throughput, push-based, many-to-many messaging between distributed systems, microservices, and event-driven serverless applications. It supports SMS, email, mobile push notifications, and HTTP/S endpoints. SNS enables fan-out messaging patterns, message filtering, and message attributes. It provides durability, security, and high availability with redundant infrastructure across multiple availability zones.

Medium Senior Level AWS

What is Amazon VPC?

Amazon Virtual Private Cloud (VPC) lets you provision a logically isolated section of the AWS cloud where you can launch AWS resources in a virtual network that you define. You have complete control over your virtual networking environment, including selection of IP address range, creation of subnets, and configuration of route tables and network gateways. VPC provides security features including security groups and network access control lists. You can create an AWS Site-to-Site VPN connection (IPsec) between your corporate data center and your VPC to extend your existing infrastructure into the cloud.

Medium Senior Level AWS

What is Amazon S3?

Amazon Simple Storage Service (S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. It allows you to store and retrieve any amount of data from anywhere on the web. S3 is designed for 99.999999999% (11 nines) durability and stores data redundantly across multiple devices and facilities. It offers various storage classes for different use cases, lifecycle management, versioning, encryption, and access control. S3 integrates with many AWS services and supports features like static website hosting, event notifications, and cross-region replication.

Medium Senior Level AWS

What is AWS Shield?

AWS Shield is a managed Distributed Denial of Service (DDoS) protection service that safeguards applications running on AWS. Shield Standard provides automatic protection against the most common network and transport layer DDoS attacks at no additional charge. Shield Advanced is a paid tier that adds detection and mitigation for larger and more sophisticated attacks. It includes 24/7 access to the AWS Shield Response Team (SRT, formerly the DDoS Response Team), real-time attack notifications, and DDoS cost protection. Shield Advanced can protect CloudFront, Route 53, ELB, and Elastic IP resources.

Medium Senior Level AWS

What is Amazon Macie?

Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect sensitive data in AWS. It automatically discovers, classifies, and protects sensitive data such as personally identifiable information (PII) and financial data stored in Amazon S3. Macie provides dashboards and alerts to help you understand how your data is being accessed and used, helping organizations meet compliance requirements and protect sensitive information.

Medium Senior Level AWS

What is AWS GuardDuty?

GuardDuty is a threat detection service that continuously monitors your AWS accounts and workloads for malicious activity and delivers detailed security findings for visibility and remediation. It uses machine learning, anomaly detection, and integrated threat intelligence to identify and prioritize potential threats. GuardDuty analyzes data from AWS CloudTrail, VPC Flow Logs, and DNS logs. It helps protect AWS accounts, workloads, and data by detecting unauthorized behavior, compromised instances, and reconnaissance by attackers.

Medium Senior Level AWS

What is AWS Config?

AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS resources. It continuously monitors and records AWS resource configurations and allows you to automate evaluation against desired configurations. Config provides detailed resource inventory, configuration history, and configuration change notifications. It helps with compliance auditing, security analysis, change management, and troubleshooting by tracking how resources are configured and how they change over time.

Medium Senior Level AWS

What is AWS CloudTrail?

CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS account. It records API calls made on your account and delivers log files to your S3 bucket. CloudTrail provides visibility into user activity by tracking actions taken through AWS Management Console, SDKs, command line tools, and other AWS services. It helps with security analysis, resource change tracking, and compliance auditing by maintaining a comprehensive history of AWS API calls.

Medium Senior Level AWS

What is AWS Detective?

Amazon Detective analyzes and investigates security findings using machine learning to visualize resource behavior and identify potential security issues. It automatically collects log data from CloudTrail, VPC Flow Logs, and GuardDuty to create detailed visualizations of activities and relationships between AWS resources. It helps security teams quickly investigate and respond to potential threats by providing interactive graphs that show entity relationships and anomalous behavior patterns.

Medium Senior Level AWS

What is AWS ECS and when would you choose it over EKS?

ECS (Elastic Container Service) is AWS’s native container orchestrator. EKS (Elastic Kubernetes Service) is managed Kubernetes.

Choose ECS when:

Your team is AWS-native and doesn’t have Kubernetes expertise
You want lower operational overhead (no Kubernetes control plane concepts to manage)
Tight AWS service integration is a priority (IAM roles per task, ALB integration is simpler)

Choose EKS when:

You need Kubernetes-native features (CRDs, Operators, Helm ecosystem)
You have multi-cloud or hybrid requirements
Your team already has Kubernetes expertise

Medium Senior Level AWS

Explain AWS VPC and its core components (subnets, route tables, IGW, NAT).

A VPC (Virtual Private Cloud) is your isolated network within AWS.

Subnets: Subdivisions of your VPC in a specific AZ. Public subnets have a route to the IGW; private subnets do not.
Route Tables: Rules defining where traffic is directed. A public subnet’s route table has 0.0.0.0/0 → IGW.
Internet Gateway (IGW): Allows public subnets to communicate with the internet.
NAT Gateway: Allows private subnets to make outbound internet requests (e.g., pulling packages) without exposing them to inbound internet traffic.
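To make the public/private distinction concrete, here is a small Python sketch of how a route table picks a target. VPC route tables use the most specific matching route (longest prefix); the table contents and the igw-example / nat-example target names are hypothetical placeholders.

```python
import ipaddress


def pick_route(route_table, destination_ip):
    """Return the target of the most specific (longest-prefix) matching route."""
    dest = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no route: traffic is dropped
    _, target = max(matches, key=lambda m: m[0].prefixlen)
    return target


# Hypothetical route tables for a 10.0.0.0/16 VPC.
public_rt = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"}
private_rt = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-example"}

print(pick_route(public_rt, "10.0.1.5"))   # traffic inside the VPC stays local
print(pick_route(public_rt, "8.8.8.8"))    # internet-bound traffic from a public subnet
print(pick_route(private_rt, "8.8.8.8"))   # internet-bound traffic from a private subnet
```

The only difference between the two tables is the default route's target, which is what makes a subnet "public" or "private".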

Medium Senior Level AWS

What is the difference between an AWS Security Group and a Network ACL?

Security Groups (SGs): Stateful firewalls at the instance level. If you allow inbound traffic, the corresponding outbound response is automatically allowed. Rules are allow-only (no deny rules).

Network ACLs (NACLs): Stateless firewalls at the subnet level. You must explicitly allow both inbound and outbound traffic. Rules are evaluated in order (by rule number) and support both allow and deny.

In practice: Use Security Groups for most use cases. Use NACLs as an additional layer for blocking specific IP ranges (e.g., blocking a bad actor’s IP at the subnet boundary).
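The ordered, allow-or-deny evaluation of a NACL can be sketched in a few lines of Python. This is a deliberately simplified model (one direction, a single port per rule, no protocol field), and the rule numbers and CIDR ranges are made-up examples.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int   # lower numbers are evaluated first
    cidr: str
    port: int
    action: str   # "allow" or "deny"


def evaluate_nacl(rules, source_ip, port):
    """Return the action of the first matching rule, in rule-number order."""
    src = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if src in ipaddress.ip_network(rule.cidr) and port == rule.port:
            return rule.action
    return "deny"  # the implicit final rule denies anything unmatched


# Hypothetical inbound rules: block one range, then allow HTTPS from everywhere.
inbound = [
    NaclRule(100, "0.0.0.0/0", 443, "allow"),
    NaclRule(90, "203.0.113.0/24", 443, "deny"),
]

print(evaluate_nacl(inbound, "203.0.113.7", 443))   # hits rule 90 first
print(evaluate_nacl(inbound, "198.51.100.1", 443))  # falls through to rule 100
```

Because a NACL is stateless, a real setup would need a matching outbound rule list for return traffic; a security group would not.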

Advanced Questions

Enterprise orchestration, deep architectural concepts, and scaling issues.

Hard Lead / Architect Level AWS

How do you implement least-privilege IAM policies and why is it critical?

Least-privilege means granting only the exact permissions needed to perform a task — no more. This limits blast radius if credentials are compromised.

Implementation steps:

Start with deny-all, add allows: Begin with minimal permissions and add only what’s needed.
IAM Access Analyzer: Use to identify unused permissions and generate least-privilege policies based on CloudTrail logs.
Policy conditions: Add StringEquals conditions to restrict resources by tag, region, or account.
Permission boundaries: Cap the maximum permissions a principal can have, even if attached policies are more permissive.

"Condition": {
  "StringEquals": {
    "aws:RequestedRegion": "us-east-1"
  }
}
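For context, the condition fragment above only has meaning inside a full policy statement. The sketch below assembles one in Python; the bucket name, the single s3:GetObject action, and the function name are assumptions chosen for illustration.

```python
import json


def read_only_bucket_policy(bucket_name, region):
    """Build a narrowly scoped identity policy: one action, one bucket, one region."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                "Condition": {
                    "StringEquals": {"aws:RequestedRegion": region}
                },
            }
        ],
    }


# Hypothetical bucket name; anything not listed here is implicitly denied.
policy = read_only_bucket_policy("example-reports-bucket", "us-east-1")
print(json.dumps(policy, indent=2))
```

Everything not explicitly allowed stays denied by default, so the policy grows only when a concrete need appears.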

Real Production Scenarios

Real-world architecture, system migration, and design challenges.

Medium Senior Level AWS

What is AWS Lambda?

AWS Lambda is a serverless compute service that runs code in response to events without requiring server management. It automatically scales applications by running code in response to triggers such as API requests, file uploads, or database changes. Lambda supports multiple programming languages including Python, Node.js, Java, Go, and .NET. You only pay for the compute time consumed, with no charges when code is not running. Lambda functions can process data, respond to HTTP requests via API Gateway, and integrate with other AWS services. It eliminates infrastructure management overhead and provides automatic high availability and fault tolerance.

Medium Senior Level AWS

What is Amazon CloudWatch?

Amazon CloudWatch is a monitoring and observability service that provides data and actionable insights for AWS resources, applications, and services. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events. It provides a unified view of AWS resources and applications, and lets you set alarms with automated actions, build dashboards, and analyze logs. CloudWatch supports custom metrics, distributed tracing through X-Ray integration, and anomaly detection using machine learning. It helps you monitor application performance, optimize resource utilization, and respond to system-wide performance changes.

Medium Senior Level AWS

What is Amazon Redshift?

Amazon Redshift is a fast, fully managed cloud data warehouse for analyzing data using standard SQL and existing Business Intelligence tools. AWS states that Redshift can deliver up to ten times the performance of traditional data warehouses through columnar storage, data compression, and massively parallel processing. It supports petabyte-scale data warehousing with automated backups, snapshots, and monitoring. Redshift Spectrum allows querying data directly in S3 without loading it first. Redshift integrates with various data loading tools, supports federated queries, and provides end-to-end encryption for data security.

Medium Senior Level AWS

What is Amazon SQS?

Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications. SQS offers two types of queues: Standard queues for maximum throughput and best-effort ordering, and FIFO queues for guaranteed ordering and exactly-once processing. It eliminates the complexity and overhead of managing message-oriented middleware. SQS provides server-side encryption, dead-letter queues for handling message processing failures, and integration with AWS services. It scales elastically and provides reliable message delivery.

Medium Senior Level AWS

What is AWS CodeDeploy?

AWS CodeDeploy is a deployment service that automates application deployments to Amazon EC2 instances, on-premises instances, serverless Lambda functions, or Amazon ECS services. CodeDeploy enables you to rapidly release new features, helps avoid downtime during deployment, and handles the complexity of updating applications. It supports blue/green deployments, rolling deployments, and canary deployments. CodeDeploy integrates with your existing software release process and continuous delivery toolchain. It provides centralized control over application deployments and detailed deployment history and logs.

Medium Senior Level AWS

What is AWS CodeBuild?

AWS CodeBuild is a fully managed build service that compiles source code, runs tests, and produces ready-to-deploy software packages. CodeBuild scales continuously and processes multiple builds concurrently, eliminating the need to provision, manage, and scale your own build servers. It provides preconfigured build environments for popular programming languages and allows you to create custom build environments. CodeBuild integrates with source control systems, supports caching for faster builds, and provides detailed logs and metrics. You pay only for the compute resources you use during builds.

Medium Senior Level AWS

What is AWS Fargate?

AWS Fargate is a serverless compute engine for containers that works with both Amazon ECS and EKS. It eliminates the need to provision and manage servers, letting you focus on building applications. Fargate allocates the right amount of compute resources, eliminating the need to choose instance types or scale cluster capacity. You pay only for the resources required to run your containers. Fargate automatically scales, patches, secures, and manages infrastructure, allowing you to deploy and manage containers without worrying about the underlying servers. It provides workload isolation and improved security through design.

Medium Senior Level AWS

What is Amazon EKS?

Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes service that makes it easy to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane. EKS runs upstream Kubernetes and is certified Kubernetes conformant, ensuring compatibility with existing Kubernetes tooling and plugins. It automatically manages the availability and scalability of Kubernetes control plane nodes, provides integrated logging and monitoring, and offers security best practices. EKS integrates with AWS services like VPC, IAM, ELB, and works with AWS Fargate for serverless compute.

Medium Senior Level AWS

What is Amazon ECS?

Amazon Elastic Container Service (ECS) is a fully managed container orchestration service that makes it easy to deploy, manage, and scale containerized applications. ECS eliminates the need to install and operate your own container orchestration software. It supports Docker containers and allows you to run applications on a managed cluster of EC2 instances or serverless with AWS Fargate. ECS provides deep integration with AWS services, security, networking, and monitoring capabilities. It helps you focus on building applications instead of managing infrastructure.

Medium Senior Level AWS

What is Amazon Route 53?

Amazon Route 53 is a highly available and scalable Domain Name System (DNS) web service designed to route end users to Internet applications. Route 53 connects user requests to AWS infrastructure running in EC2, ELB, S3, and external resources. It offers domain registration, DNS routing, and health checking of resources. Route 53 supports multiple routing policies including simple, weighted, latency-based, failover, geolocation, and geoproximity routing. It is backed by a 100% availability SLA and integrates with other AWS services for application health monitoring.

Medium Senior Level AWS

What is Amazon RDS?

Amazon Relational Database Service (RDS) is a managed database service that makes it easy to set up, operate, and scale relational databases in the cloud. RDS provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching, and backups. It supports multiple database engines including Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and SQL Server. RDS provides high availability with Multi-AZ deployments, read replicas for scaling read traffic, and automated backup and recovery capabilities.

Medium Senior Level AWS

What is AWS Lambda?

AWS Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources. You can run code for virtually any type of application or backend service without provisioning or managing servers. Lambda executes your code only when needed and scales automatically from a few requests per day to thousands per second. You pay only for the compute time consumed. Lambda supports multiple programming languages and integrates with other AWS services, making it ideal for building microservices, data processing, and event-driven applications.

Medium Senior Level AWS

What is Amazon EC2?

Amazon Elastic Compute Cloud (EC2) is a web service that provides secure, resizable compute capacity in the cloud. It allows you to launch virtual servers (instances) on demand, choosing from various instance types optimized for different workloads. EC2 offers complete control over computing resources with the ability to scale capacity up or down based on demand. You can choose from multiple operating systems, configure security and networking, and manage storage. EC2 integrates with other AWS services and provides flexible pricing options including On-Demand, Reserved Instances, and Spot Instances.

Medium Senior Level AWS

What is Amazon Inspector?

Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS. It automatically assesses applications for vulnerabilities, deviations from best practices, and exposure risks. Inspector evaluates network accessibility of EC2 instances, security state of applications, and checks for common vulnerabilities. It produces detailed findings with severity ratings and remediation recommendations, helping security teams identify security issues before deployment.

Medium Senior Level AWS

What is the difference between Amazon RDS and Aurora, and when should you use each?

Amazon RDS and Aurora are both managed relational database services from AWS, but they differ significantly in architecture, performance, and capabilities.

Amazon RDS

RDS is a managed service that handles common database administration tasks for traditional database engines.

Supported Engines

MySQL, PostgreSQL, MariaDB
Oracle, Microsoft SQL Server
Db2

Architecture

Traditional single-server or Multi-AZ setup
Synchronous replication for Multi-AZ standby
EBS-backed storage (e.g., gp2, gp3, io1)
Up to 15 read replicas for MySQL, MariaDB, and PostgreSQL; up to 5 for Oracle and SQL Server

Key Features

Automated backups, patching, monitoring
Multi-AZ for high availability (standby not readable)
Point-in-time recovery
Familiar database engine compatibility

Amazon Aurora

Aurora is a cloud-native relational database engine built from the ground up for cloud performance and availability.

Supported Engines

Aurora MySQL (compatible with MySQL 5.7/8.0)
Aurora PostgreSQL (compatible with PostgreSQL)

Architecture

Distributed, shared storage layer across 6 copies in 3 AZs
Storage automatically scales from 10GB to 128TB
Up to 15 Aurora Replicas (all readable)
Continuous backup to S3

Performance Advantages

Up to 5x the throughput of standard MySQL (AWS's published claim)
Up to 3x the throughput of standard PostgreSQL (AWS's published claim)
Faster failover: typically under 30 seconds

Additional Features

Aurora Serverless: Auto-scales compute up/down to zero
Aurora Global Database: Multi-region replication with < 1 second lag
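Aurora Multi-Master: Multiple read-write instances (legacy feature, available only on Aurora MySQL version 1, which has reached end of life)
Backtrack: Rewind the database to a specific point in time without restoring from backup (Aurora MySQL only)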

Comparison

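Feature	RDS	Aurora
Engines	MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, Db2	MySQL-compatible, PostgreSQL-compatible
Storage	EBS volumes per instance	Distributed, shared cluster volume
Read Replicas	Up to 5 or 15, depending on engine	Up to 15
Failover	1-2 minutes	Typically < 30 seconds
Storage scaling	Manual, or optional storage autoscaling	Automatic
Cost	Lower for simple workloads	Higher base cost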

When to Use Each

Use RDS when:

Running Oracle or SQL Server (no Aurora equivalent)
Cost is primary concern for small workloads
You need exact MySQL/PostgreSQL feature compatibility

Use Aurora when:

High performance and availability are critical
Multi-region replication required
Serverless or variable workload patterns
Read-heavy workloads that need many low-lag replicas (up to 15 sharing one storage volume)

Medium Senior Level AWS

How does AWS Auto Scaling work and what are the different scaling policies?

AWS Auto Scaling monitors your applications and automatically adjusts compute capacity to maintain steady, predictable performance at the lowest possible cost.

Core Components

Auto Scaling Group (ASG)

Defines the group of EC2 instances to scale
Specifies minimum, maximum, and desired capacity
Distributes instances across multiple Availability Zones

Launch Template / Launch Configuration

Defines the instance configuration (AMI, instance type, key pair, security groups)

Health Checks

EC2 health checks (default)
ELB health checks (recommended for web apps)

Scaling Policies

1. Target Tracking Scaling

Maintains a specific metric at a target value automatically.

Example: Keep average CPU utilization at 60%
- AWS automatically adds/removes instances to maintain this target

Best for most use cases – simple to configure and responsive.

2. Step Scaling

Scales based on CloudWatch alarm breaches with step adjustments.

Example:
- CPU 60-70%: Add 2 instances
- CPU 70-90%: Add 4 instances  
- CPU > 90%: Add 8 instances
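The step table above maps directly onto a small function. This Python sketch is only a model of the decision, not of the Auto Scaling service itself; how the shared boundaries (exactly 70% or 90%) are assigned is an assumption, as is the max_size cap in the usage.

```python
def step_adjustment(cpu_percent):
    """Return how many instances to add for a given average CPU reading."""
    if cpu_percent > 90:
        return 8
    if cpu_percent >= 70:   # assumption: 70% and 90% fall in the middle step
        return 4
    if cpu_percent >= 60:
        return 2
    return 0                # below the alarm threshold: no scale-out


def new_desired_capacity(current, cpu_percent, max_size):
    """Apply the step adjustment, never exceeding the group's maximum size."""
    return min(current + step_adjustment(cpu_percent), max_size)


for cpu in (50, 65, 75, 95):
    print(cpu, "->", step_adjustment(cpu))
```

The larger the breach, the larger the step, which is what distinguishes step scaling from the fixed adjustment of simple scaling.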

3. Simple Scaling

Legacy policy: adds or removes a fixed number of instances based on a single alarm, then waits for the cooldown period before responding to further alarms.
AWS recommends Target Tracking or Step Scaling instead.

4. Scheduled Scaling

Scales based on predictable load patterns.

Example: Increase to 20 instances every Monday 8 AM,
reduce to 5 instances every Friday 8 PM

5. Predictive Scaling

Uses ML to predict future traffic and proactively scales in advance.

Analyzes historical patterns
Creates scaling schedules automatically
Ideal for cyclical traffic patterns

Lifecycle Hooks

Hooks allow you to run custom actions when instances launch or terminate:

Launch hook: Install software, run tests before instance joins the group
Terminate hook: Drain connections, backup data before termination

Best Practices

Use Target Tracking as the primary policy
Enable multiple AZs for fault tolerance
Use launch templates over launch configurations
Set appropriate cooldown periods to prevent rapid scaling oscillation
Use warm pools for applications with long startup times

Medium Senior Level AWS

What is the difference between ALB, NLB, and CLB in AWS?

AWS provides three types of load balancers under the Elastic Load Balancing (ELB) service, each designed for different use cases.

Application Load Balancer (ALB)

Operates at Layer 7 (HTTP/HTTPS).

Routing: Content-based routing by URL path, host, headers, query strings
Protocols: HTTP, HTTPS, WebSockets, HTTP/2, gRPC
Use cases: Microservices, container-based apps, web applications
Features: Sticky sessions, authentication (Cognito, OIDC), Lambda targets, WAF integration

Example: Route /api/* to API servers, /images/* to image servers
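A rough Python model of that routing example: listener rules are checked in priority order and the first matching path pattern wins, with a default action as the fallback. The target group names and priorities are hypothetical, and fnmatch is only an approximation of ALB's wildcard syntax.

```python
from fnmatch import fnmatchcase


def route_request(rules, path, default_target):
    """Return the target of the lowest-priority-number rule whose pattern matches."""
    for _priority, pattern, target in sorted(rules):
        if fnmatchcase(path, pattern):
            return target
    return default_target  # the listener's default action


# Hypothetical listener rules: (priority, path pattern, target group).
rules = [
    (20, "/images/*", "image-servers"),
    (10, "/api/*", "api-servers"),
]

print(route_request(rules, "/api/v1/users", "web-default"))
print(route_request(rules, "/images/logo.png", "web-default"))
print(route_request(rules, "/index.html", "web-default"))
```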

Network Load Balancer (NLB)

Operates at Layer 4 (TCP/UDP).

Performance: Handles millions of requests per second with extremely low latency
Protocols: TCP, UDP, TLS
Use cases: High-performance gaming, financial trading, IoT, real-time streaming
Features: Static IP addresses, Elastic IP support, preserves source IP

Classic Load Balancer (CLB)

Operates at Layer 4 and Layer 7 (legacy).

Status: Legacy – AWS recommends migrating to ALB or NLB
Protocols: HTTP, HTTPS, TCP, SSL
Limitation: Fewer features; no target groups, so no path- or host-based routing and no dynamic port mapping for containers

Comparison

Feature	ALB	NLB	CLB
OSI Layer	7	4	4/7
Protocols	HTTP/HTTPS/gRPC	TCP/UDP/TLS	HTTP/HTTPS/TCP/SSL
Latency	Low	Ultra-low	Medium
Static IP	No	Yes	No
WebSockets	Yes	Yes	Limited
Path routing	Yes	No	No

When to Use Which

ALB: Most web applications, microservices, REST APIs, gRPC
NLB: Ultra-high performance, TCP/UDP apps, Static IP requirement, gaming
CLB: Avoid for new workloads – migrate to ALB or NLB

Medium Senior Level AWS

What is the difference between SQS, SNS, and EventBridge in AWS?

SQS, SNS, and EventBridge are all AWS messaging services but serve different purposes and communication patterns.

Amazon SQS (Simple Queue Service)

SQS is a point-to-point message queue for decoupling distributed systems.

Pattern: Producer → Queue → Consumer (pull-based)
Delivery: At-least-once delivery, messages persist until consumed or expired
Use cases: Task queues, background job processing, load leveling
Types: Standard (best-effort ordering) and FIFO (exactly-once, ordered)

Example: Order service puts messages in SQS; fulfillment service processes them at its own pace.

Amazon SNS (Simple Notification Service)

SNS is a publish-subscribe (pub/sub) messaging service.

Pattern: Publisher → Topic → Multiple Subscribers (push-based)
Delivery: Fan-out to multiple endpoints simultaneously
Subscribers: SQS queues, Lambda functions, HTTP endpoints, email, SMS
Use cases: Fan-out notifications, alert broadcasting, mobile push

Example: Payment event publishes to SNS; billing, analytics, and email services all receive it simultaneously.

Amazon EventBridge

EventBridge is a serverless event bus for event-driven architectures.

Pattern: Event Source → Event Bus → Rules → Targets (content-based routing)
Delivery: Route events based on content/patterns
Sources: AWS services, custom apps, SaaS applications (Salesforce, Zendesk, etc.)
Use cases: Event-driven architectures, microservice decoupling, AWS service integration
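Content-based routing can be illustrated with a toy matcher in Python. It handles only exact-value matching on nested fields, which is a small subset of what real EventBridge event patterns support, and the source, detail fields, and target names are invented for the example.

```python
def matches(pattern, event):
    """True if every field in the pattern is present in the event with an allowed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of acceptable values.
            return False
    return True


def route(rules, event):
    """Return the targets of all rules whose pattern matches the event."""
    return [target for pattern, target in rules if matches(pattern, event)]


# Hypothetical rules on a custom event bus.
rules = [
    ({"source": ["orders.service"], "detail": {"status": ["PAID"]}}, "billing-lambda"),
    ({"source": ["orders.service"]}, "audit-queue"),
]

event = {
    "source": "orders.service",
    "detail-type": "OrderUpdated",
    "detail": {"status": "PAID", "total": 42},
}
print(route(rules, event))
```

Unlike an SNS topic, where every subscriber is a candidate for every message, each rule here decides from the event's content whether its target is invoked.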

Comparison

Feature	SQS	SNS	EventBridge
Pattern	Queue	Pub/Sub	Event Bus
Consumers	One per message (competing consumers)	Multiple	Multiple
Routing	None (consumers poll the queue)	All subscribers, optionally narrowed by filter policies	Content-based rules
SaaS integration	No	No	Yes
Schema registry	No	No	Yes

When to Use Which

SQS: Decouple services, handle burst traffic, ensure reliable processing
SNS: Broadcast to multiple services simultaneously
EventBridge: Complex routing, AWS service events, third-party SaaS integration
SNS + SQS: Combined fan-out with reliable processing per subscriber

Easy Associate Level AWS

What is the difference between horizontal and vertical scaling in AWS?

Vertical Scaling (Scale Up): Increase the size of an existing instance (e.g., t3.medium → c5.4xlarge). Simple, but capped by the largest available instance size. Resizing an EC2 instance requires stopping and restarting it, so expect brief downtime.

Horizontal Scaling (Scale Out): Add more instances behind a load balancer. No theoretical ceiling. Enables high availability and fault tolerance because traffic is spread across multiple instances in multiple AZs.

AWS Auto Scaling Groups with Application Load Balancers enable fully automated horizontal scaling based on metrics like CPU or custom CloudWatch metrics.

Hard Lead / Architect Level AWS

Explain AWS Lambda cold starts and how to mitigate them in production.

A cold start occurs when Lambda needs to initialize a new execution environment — download the code, start the runtime, run your initialization code. This adds 100ms-1s+ of latency on the first request.

Mitigation strategies:

Provisioned Concurrency: Pre-warm a set number of Lambda execution environments. Eliminates cold starts for warmed instances (at extra cost).
Minimize package size: Smaller deployment packages initialize faster.
Use faster runtimes: Node.js and Python cold start faster than Java/C#.
Move init code outside the handler: DB connections and SDK clients initialized at module level persist across invocations.
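Lambda SnapStart: AWS takes a snapshot of the initialized execution environment and restores from it instead of re-running initialization. Introduced for Java; check the current documentation for other supported runtimes.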
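The "move init code outside the handler" strategy looks like this in a Python function. The _create_client helper and its 50 ms sleep are stand-ins for a real database connection or SDK client, and INIT_COUNT exists only to show that initialization runs once.

```python
import time

INIT_COUNT = 0


def _create_client():
    """Stand-in for an expensive setup step such as opening a DB connection."""
    global INIT_COUNT
    INIT_COUNT += 1
    time.sleep(0.05)  # simulate slow initialization
    return {"connected": True}


# Module-level code runs once per execution environment (during the cold start),
# not on every invocation.
CLIENT = _create_client()


def handler(event, context):
    # Warm invocations reuse the client created above.
    return {"ok": CLIENT["connected"], "echo": event.get("name")}


# Simulate two invocations landing on the same warm environment.
print(handler({"name": "first"}, None))
print(handler({"name": "second"}, None))
print("initializations:", INIT_COUNT)
```

Had _create_client() been called inside handler, every invocation would pay the setup cost, not just the cold start.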

Medium Senior Level AWS

What is AWS CloudWatch and what are its main components?

CloudWatch is AWS’s native observability service with four main areas:

Metrics: Time-series data from AWS services (CPU, NetworkIn, etc.) and custom metrics you publish.
Logs: CloudWatch Logs for storing, searching, and analyzing log data from EC2, Lambda, ECS, etc.
Alarms: Alerts triggered when metrics exceed thresholds. Can trigger SNS, Auto Scaling, Lambda.
Dashboards: Visual widgets to display metrics across services in real-time.

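For ad hoc analysis, query logs in place with CloudWatch Logs Insights, which has its own purpose-built query language. For heavier analytics, ship logs to Amazon OpenSearch Service (an ELK-style search and dashboard stack).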

Medium Senior Level AWS

How do you reduce AWS costs in a cloud environment? What are your go-to strategies?

Cloud cost optimization is an ongoing practice. High-impact strategies:

Right-sizing: Use AWS Cost Explorer and Compute Optimizer to identify oversized EC2 instances.
Reserved Instances/Savings Plans: Commit to a 1-year or 3-year term for stable workloads. Savings can reach up to 72% compared with On-Demand pricing.
Spot Instances: Use for stateless, fault-tolerant, or batch workloads. Up to 90% savings.
S3 Lifecycle policies: Auto-transition to cheaper storage tiers.
Delete idle resources: Audit unused EIPs, old snapshots, unattached EBS volumes.
Auto Scaling: Scale down to zero or minimum outside business hours.

Hard Lead / Architect Level AWS

How does IAM assume-role work and how do you implement cross-account access securely?

Cross-account access uses the sts:AssumeRole API. A role in Account B has a trust policy that allows Account A to assume it:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT_A_ID:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

An entity in Account A calls aws sts assume-role to get temporary credentials for Account B (valid for 1 hour by default, up to 12 hours if the role's maximum session duration allows). Security controls:

Add ExternalId condition for third-party access (prevents confused deputy attacks)
Add MFA condition for sensitive roles
Use SCPs at the AWS Organization level to restrict what can be assumed
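As a sketch of the first two controls, the Python helper below builds a trust policy with optional ExternalId and MFA conditions. The account ID, the external ID value, and the function name are placeholders; sts:ExternalId and aws:MultiFactorAuthPresent are the standard condition keys for these checks.

```python
import json


def build_trust_policy(trusted_account_id, external_id=None, require_mfa=False):
    """Build a role trust policy that lets another account call sts:AssumeRole."""
    condition = {}
    if external_id is not None:
        # Guards against the confused deputy problem for third-party access.
        condition["StringEquals"] = {"sts:ExternalId": external_id}
    if require_mfa:
        # The caller must have authenticated with MFA.
        condition["Bool"] = {"aws:MultiFactorAuthPresent": "true"}

    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if condition:
        statement["Condition"] = condition
    return {"Version": "2012-10-17", "Statement": [statement]}


# Placeholder account ID and external ID.
policy = build_trust_policy("111122223333", external_id="example-external-id")
print(json.dumps(policy, indent=2))
```

In practice, narrowing the Principal from the account root to a specific role ARN tightens the trust further.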

Hard Lead / Architect Level AWS

How would you architect a highly available, multi-region AWS deployment?

Multi-region HA involves several layers:

DNS: Route 53 with health checks and latency/failover routing policies to direct users to the nearest healthy region.
Data replication: RDS cross-Region read replicas (or Aurora Global Database) that can be promoted during a regional failure. DynamoDB Global Tables for active-active.
Edge: CloudFront CDN with origins in multiple regions.
Infrastructure: Identical infrastructure in each region managed by Terraform.
DR strategy: Define RTO (Recovery Time Objective) and RPO (Recovery Point Objective) to determine your architecture (Pilot Light, Warm Standby, or Active-Active).

Troubleshooting Scenarios

Live system debugging, incident diagnostics, and latency resolution.

Medium Senior Level AWS

What is AWS Route 53 and how do you implement DNS failover?

Amazon Route 53 is a scalable and highly available DNS web service that routes end users to internet applications and supports domain registration.

Key Features

DNS Resolution

Route 53 translates domain names (example.com) into IP addresses. It supports all standard DNS record types: A, AAAA, CNAME, MX, TXT, NS, SOA, and Route 53-specific alias records.

Routing Policies

Simple: Route traffic to a single resource
Weighted: Split traffic by percentage between resources (A/B testing, gradual rollouts)
Latency: Route to the region with lowest network latency
Geolocation: Route based on user’s geographic location
Geoproximity: Route based on geographic location with configurable bias
Failover: Active-passive failover routing
Multivalue Answer: Responds with up to 8 healthy records

Implementing DNS Failover

Active-Passive Failover Setup

Create Health Checks

Configure health checks for your primary endpoint (HTTP/HTTPS/TCP)
Set the request interval (10 or 30 seconds) and the failure threshold (consecutive failed checks before the endpoint is marked unhealthy)

Create Primary Record

   Type: A
   Routing Policy: Failover
   Failover Type: Primary
   Value: [primary endpoint IP]
   Health Check: my-primary-health-check
   TTL: 60

Create Secondary Record

   Type: A
   Routing Policy: Failover
   Failover Type: Secondary
   Value: [backup IP or S3 static site]
   TTL: 60

Failover Behavior

If primary health check fails, Route 53 routes to secondary
When primary recovers, traffic automatically returns
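The active-passive behavior above reduces to a simple decision, modeled here in Python. This is not a Route 53 API call; the record dictionaries and IP addresses (taken from documentation-reserved ranges) are illustrative, and only the health check name comes from the record definition above.

```python
def resolve_failover(primary, secondary, health):
    """Answer with the primary record while its health check passes, else the secondary."""
    if health.get(primary["health_check"], False):
        return primary["value"]
    return secondary["value"]


# Illustrative records mirroring the primary/secondary setup.
primary = {"value": "192.0.2.10", "health_check": "my-primary-health-check"}
secondary = {"value": "198.51.100.20"}

print(resolve_failover(primary, secondary, {"my-primary-health-check": True}))
print(resolve_failover(primary, secondary, {"my-primary-health-check": False}))
```

Clients may keep using a cached answer until the TTL expires, which is why failover records typically use a short TTL such as 60 seconds.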

Active-Active Failover

Use Weighted routing with health checks:

Both endpoints active with equal weight (50/50)
Route 53 automatically removes unhealthy endpoints
Traffic redistributes to healthy endpoints

Multi-Region Failover Pattern

Route 53 (Failover routing)
├── us-east-1 ALB (Primary)
│   └── Auto Scaling Group
└── eu-west-1 ALB (Failover)
    └── Auto Scaling Group

Health Check Types

Endpoint health checks: HTTP/HTTPS/TCP checks on IP or domain
Calculated health checks: Combine results of multiple health checks
CloudWatch alarm health checks: Based on CloudWatch alarm state
