Beginner Questions
Core concepts, syntax, and foundational command-line knowledge.
What is the AWS Shared Responsibility Model?
AWS and customers share security responsibilities — the line depends on the service type:
AWS is responsible for: Security “of” the cloud — physical data centers, hypervisors, networking hardware, managed service infrastructure.
You are responsible for: Security “in” the cloud — your operating systems, your application code, IAM configurations, data encryption, network configuration (VPC, security groups), and patching guest OS on EC2.
For managed services like RDS or Lambda, AWS takes on more responsibility (OS patching), but you still own IAM, data, and network controls.
What is the difference between S3 Standard, S3 Infrequent Access, and S3 Glacier?
AWS S3 offers storage classes with different cost/access tradeoffs:
- Standard: High durability, low latency, high throughput. For frequently accessed data.
- Standard-IA (Infrequent Access): Same latency as Standard but cheaper storage cost. Adds a per-GB retrieval fee and a 30-day minimum storage charge. Use for data accessed less than once a month.
- Glacier Instant Retrieval: For archive data accessed a few times per year. Millisecond retrieval.
- Glacier Deep Archive: Lowest cost. Standard retrieval completes within 12 hours (bulk retrieval within 48 hours). Use for compliance/regulatory long-term retention.
Use S3 Lifecycle Policies to automatically transition objects between classes based on age.
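As a rough illustration of the lifecycle idea, the sketch below builds a rule in the dictionary shape that boto3's put_bucket_lifecycle_configuration accepts. The logs/ prefix, the rule ID, and the 30/90/365-day thresholds are assumptions chosen for the example, not recommendations.

```python
import json

def build_lifecycle_rule(prefix, transitions):
    """Return a lifecycle configuration dict for objects under `prefix`.

    `transitions` is a list of (age_in_days, storage_class) pairs.
    """
    return {
        "Rules": [
            {
                "ID": f"tiering-{prefix.strip('/')}",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": storage_class}
                    for days, storage_class in sorted(transitions)
                ],
            }
        ]
    }

# Assumed thresholds, for illustration only.
lifecycle_config = build_lifecycle_rule(
    "logs/",
    [(30, "STANDARD_IA"), (90, "GLACIER_IR"), (365, "DEEP_ARCHIVE")],
)
print(json.dumps(lifecycle_config, indent=2))
```

With boto3 installed and credentials configured, this dictionary could be passed as the LifecycleConfiguration argument of put_bucket_lifecycle_configuration on an S3 client.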
What is the difference between IAM users, groups, roles, and policies in AWS?
Users: Individual identities for people or applications with long-term credentials (a console password, or an access key ID and secret access key).
Groups: Collections of users that share the same permissions. Manage permissions at group level, not individually.
Roles: Identities assumed temporarily by AWS services (EC2, Lambda), federated users, or cross-account access. No long-term credentials — they use short-lived tokens. This is the preferred approach.
Policies: JSON documents that define permissions. Attached to users, groups, or roles.
Best practice: Always use roles over users for AWS service authentication.
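To make "JSON documents that define permissions" concrete, here is a minimal sketch that assembles a read-only policy for a single bucket. The bucket name and the Sid labels are hypothetical; the resulting document could be attached to a user, group, or role.

```python
import json

def read_only_bucket_policy(bucket_name):
    """Build an identity policy that only allows listing and reading one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }

# "example-reports-bucket" is a made-up name for illustration.
bucket_policy = read_only_bucket_policy("example-reports-bucket")
print(json.dumps(bucket_policy, indent=2))
```

Note that s3:ListBucket applies to the bucket ARN while s3:GetObject applies to the objects inside it, which is why the two statements use different Resource values.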
Intermediate Questions
Infrastructure management, deployment strategies, and delivery flows.
What is AWS ECS and when would you choose it over EKS?
ECS (Elastic Container Service) is AWS’s native container orchestrator. EKS (Elastic Kubernetes Service) is managed Kubernetes.
Choose ECS when:
- Your team is AWS-native and doesn’t have Kubernetes expertise
- You want lower operational overhead (no Kubernetes control plane concepts to manage)
- Tight AWS service integration is a priority (IAM roles per task, ALB integration is simpler)
Choose EKS when:
- You need Kubernetes-native features (CRDs, Operators, Helm ecosystem)
- You have multi-cloud or hybrid requirements
- Your team already has Kubernetes expertise
Explain AWS VPC and its core components (subnets, route tables, IGW, NAT).
A VPC (Virtual Private Cloud) is your isolated network within AWS.
- Subnets: Subdivisions of your VPC in a specific AZ. Public subnets have a route to the IGW; private subnets do not.
- Route Tables: Rules defining where traffic is directed. A public subnet’s route table has 0.0.0.0/0 → IGW.
- Internet Gateway (IGW): Allows public subnets to communicate with the internet.
- NAT Gateway: Allows private subnets to make outbound internet requests (e.g., pulling packages) without exposing them to inbound internet traffic.
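The relationship between subnets and route tables can be sketched with Python's standard ipaddress module. The 10.0.0.0/16 CIDR and the igw-EXAMPLE and nat-EXAMPLE targets are placeholders, and the lookup below is a simplified longest-prefix match rather than a model of everything a VPC router does.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")  # assumed VPC CIDR
# Carve the VPC into /24 subnets and take the first two.
public_subnet, private_subnet = list(vpc.subnets(new_prefix=24))[:2]

# Route tables as (destination CIDR, target) pairs; target IDs are placeholders.
public_routes = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "igw-EXAMPLE")]
private_routes = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "nat-EXAMPLE")]

def resolve_route(routes, destination):
    """Return the target of the most specific route matching `destination`."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The route with the longest prefix wins.
    return max(matches, key=lambda match: match[0].prefixlen)[1]

print(public_subnet, private_subnet)
print(resolve_route(public_routes, "10.0.1.25"))       # stays inside the VPC
print(resolve_route(public_routes, "93.184.216.34"))   # leaves via the IGW
print(resolve_route(private_routes, "93.184.216.34"))  # leaves via the NAT gateway
```

The only difference between the two tables is the default route's target, which is what makes one subnet public and the other private.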
What is the difference between an AWS Security Group and a Network ACL?
Security Groups (SGs): Stateful firewalls at the instance level (attached to the instance's network interfaces). If you allow inbound traffic, the corresponding outbound response is automatically allowed. Rules are allow-only (no deny rules).
Network ACLs (NACLs): Stateless firewalls at the subnet level. You must explicitly allow both inbound and outbound traffic. Rules are evaluated in order (by rule number) and support both allow and deny.
In practice: Use Security Groups for most use cases. Use NACLs as an additional layer for blocking specific IP ranges (e.g., blocking a bad actor’s IP at the subnet boundary).
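The ordered, allow-and-deny evaluation of a NACL versus the allow-only model of a security group can be illustrated with a small simulation. This is only a sketch of rule matching: it ignores ports, protocols, and the stateful connection tracking that security groups perform, and the CIDR ranges are documentation addresses standing in for real ones.

```python
import ipaddress

# (rule number, CIDR, action); values are illustrative only.
nacl_rules = [
    (100, "0.0.0.0/0", "allow"),
    (90, "203.0.113.0/24", "deny"),  # stands in for a bad actor's range
]

def evaluate_nacl(rules, source_ip):
    """Apply rules in ascending rule-number order; the first match decides."""
    address = ipaddress.ip_address(source_ip)
    for _, cidr, action in sorted(rules):
        if address in ipaddress.ip_network(cidr):
            return action
    return "deny"  # the implicit final rule denies anything unmatched

def evaluate_security_group(allowed_cidrs, source_ip):
    """Security groups hold only allow rules; no match means no access."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)

print(evaluate_nacl(nacl_rules, "203.0.113.7"))    # hits rule 90 first
print(evaluate_nacl(nacl_rules, "198.51.100.10"))  # falls through to rule 100
print(evaluate_security_group(["10.0.0.0/16"], "10.0.3.4"))
```

Because rule 90 is evaluated before rule 100, the deny takes effect even though a broader allow exists, which is the behavior that makes NACLs useful for blocking specific ranges.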
Advanced Questions
Enterprise orchestration, deep architectural concepts, and scaling issues.
How do you implement least-privilege IAM policies and why is it critical?
Least-privilege means granting only the exact permissions needed to perform a task — no more. This limits blast radius if credentials are compromised.
Implementation steps:
- Start with deny-all, add allows: Begin with minimal permissions and add only what’s needed.
- IAM Access Analyzer: Use to identify unused permissions and generate least-privilege policies based on CloudTrail logs.
- Policy conditions: Add StringEquals conditions to restrict resources by tag, region, or account.
- Permission boundaries: Cap the maximum permissions a principal can have, even if attached policies are more permissive.
"Condition": {
  "StringEquals": {
    "aws:RequestedRegion": "us-east-1"
  }
}
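How a permission boundary caps an identity policy can be sketched as a set intersection. This is a deliberately simplified model: it treats permissions as exact action names and ignores wildcards, resources, conditions, SCPs, and resource-based policies. The action sets are made up for the example.

```python
def effective_actions(identity_allows, boundary_allows, explicit_denies=frozenset()):
    """Simplified model: an action is permitted only if both the identity
    policy and the permission boundary allow it and nothing denies it."""
    return (set(identity_allows) & set(boundary_allows)) - set(explicit_denies)

# Hypothetical action sets, for illustration only.
identity_policy_actions = {"s3:GetObject", "s3:PutObject", "iam:CreateUser"}
boundary_actions = {"s3:GetObject", "s3:PutObject", "dynamodb:Query"}

allowed = effective_actions(
    identity_policy_actions,
    boundary_actions,
    explicit_denies={"s3:PutObject"},
)
print(sorted(allowed))
```

Here iam:CreateUser is dropped because the boundary does not include it, dynamodb:Query is dropped because the identity policy never granted it, and s3:PutObject is removed by the explicit deny.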
Real Production Scenarios
Real-world architecture, system migration, and design challenges.
What is the difference between horizontal and vertical scaling in AWS?
Vertical Scaling (Scale Up): Increase the size of an existing instance (e.g., t3.medium → c5.4xlarge). Simple, but it has a ceiling (the largest available instance size), and resizing an EC2 instance requires stopping and restarting it, which means downtime.
Horizontal Scaling (Scale Out): Add more instances behind a load balancer. No theoretical ceiling. Enables high availability and fault tolerance because traffic is spread across multiple instances in multiple AZs.
AWS Auto Scaling Groups with Application Load Balancers enable fully automated horizontal scaling based on metrics like CPU or custom CloudWatch metrics.
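A metric-driven scale-out decision can be approximated with a proportional rule: grow or shrink the group so the average metric moves toward a target. This is a simplified sketch, not the actual Auto Scaling algorithm, and the default group limits below are assumptions.

```python
import math

def desired_capacity(current_instances, metric_value, target_value,
                     min_size=1, max_size=10):
    """Proportional scaling rule, clamped to the group's min and max size."""
    raw = math.ceil(current_instances * metric_value / target_value)
    return max(min_size, min(max_size, raw))

# Four instances averaging 80% CPU against a 50% target.
print(desired_capacity(4, 80, 50))
# The same group at 20% CPU can shrink.
print(desired_capacity(4, 20, 50))
# A demand spike is capped by max_size.
print(desired_capacity(8, 100, 50))
```

Rounding up errs on the side of extra capacity, and the clamp reflects the minimum and maximum sizes configured on the group.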
Explain AWS Lambda cold starts and how to mitigate them in production.
A cold start occurs when Lambda needs to initialize a new execution environment: download the code, start the runtime, and run your initialization code. This adds roughly 100ms to 1s or more of latency to any request that lands on a new environment, not just the first request after a deployment.
Mitigation strategies:
- Provisioned Concurrency: Pre-warm a set number of Lambda execution environments. Eliminates cold starts for warmed instances (at extra cost).
- Minimize package size: Smaller deployment packages initialize faster.
- Use faster runtimes: Node.js and Python cold start faster than Java/C#.
- Move init code outside the handler: DB connections and SDK clients initialized at module level persist across invocations.
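- Lambda SnapStart (originally Java, later extended to other runtimes such as Python and .NET): AWS takes a snapshot of the initialized execution environment and restores from it instead of initializing from scratch.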
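The "move init code outside the handler" advice can be demonstrated locally. In this sketch, create_client is a stand-in for an expensive setup step such as opening a database connection; the counter exists only to show that initialization runs once while the handler runs many times.

```python
import time

STATS = {"init_calls": 0}

def create_client():
    """Stand-in for an expensive setup step such as opening a DB connection."""
    STATS["init_calls"] += 1
    time.sleep(0.05)  # simulated initialization cost
    return {"connected": True}

# Module-level code runs once per execution environment, during the cold start.
CLIENT = create_client()

def handler(event, context):
    """Lambda-style entry point that reuses the module-level client."""
    return {"statusCode": 200, "reused_client": CLIENT["connected"]}

# Two invocations on the same (warm) environment share one client.
handler({}, None)
handler({}, None)
print(STATS["init_calls"])
```

If create_client were called inside handler instead, every invocation would pay the setup cost, warm or not.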
What is AWS CloudWatch and what are its main components?
CloudWatch is AWS’s native observability service with four main areas:
- Metrics: Time-series data from AWS services (CPU, NetworkIn, etc.) and custom metrics you publish.
- Logs: CloudWatch Logs for storing, searching, and analyzing log data from EC2, Lambda, ECS, etc.
- Alarms: Alerts triggered when metrics exceed thresholds. Can trigger SNS, Auto Scaling, Lambda.
- Dashboards: Visual widgets to display metrics across services in real-time.
For advanced analytics, ship logs to Amazon OpenSearch Service (an ELK-style stack) or query them in place with CloudWatch Logs Insights, which uses its own pipe-delimited query language.
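The way an alarm turns metric datapoints into a state can be sketched as an "M out of N" check. This simplified version assumes a greater-than comparison and ignores missing-data handling and the INSUFFICIENT_DATA state; the CPU samples are made up.

```python
def alarm_state(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Return "ALARM" if enough of the most recent datapoints breach the threshold."""
    recent = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in recent if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"

cpu_percent = [42, 55, 91, 88, 95]  # made-up samples, one per period

# 2 of the last 3 periods must exceed 80%.
print(alarm_state(cpu_percent, threshold=80, evaluation_periods=3, datapoints_to_alarm=2))
# 4 of the last 5 periods must exceed 80%.
print(alarm_state(cpu_percent, threshold=80, evaluation_periods=5, datapoints_to_alarm=4))
```

Requiring several breaching datapoints rather than one is a common way to avoid alerting on short spikes.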
How do you reduce AWS costs in a cloud environment? What are your go-to strategies?
Cloud cost optimization is an ongoing practice. High-impact strategies:
- Right-sizing: Use AWS Cost Explorer and Compute Optimizer to identify oversized EC2 instances.
- Reserved Instances/Savings Plans: Commit to a 1-year or 3-year term for stable workloads. Saves up to 72% compared with On-Demand pricing.
- Spot Instances: Use for stateless, fault-tolerant, or batch workloads. Up to 90% savings.
- S3 Lifecycle policies: Auto-transition to cheaper storage tiers.
- Delete idle resources: Audit unused EIPs, old snapshots, unattached EBS volumes.
- Auto Scaling: Scale down to zero or minimum outside business hours.
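A quick back-of-the-envelope calculation helps compare these strategies. Every number in this sketch is hypothetical: the $0.10 hourly rate, the 40% commitment discount, and the 12-hour, 22-day schedule are assumptions, not AWS prices.

```python
HOURS_PER_MONTH = 730  # common approximation for monthly estimates

def monthly_cost(hourly_rate, instance_count, discount=0.0, hours=HOURS_PER_MONTH):
    """Estimated monthly cost after applying a fractional discount."""
    return round(hourly_rate * instance_count * hours * (1 - discount), 2)

# Four instances at a hypothetical $0.10/hour On-Demand rate.
on_demand = monthly_cost(0.10, 4)
# Same fleet with an assumed 40% commitment discount.
committed = monthly_cost(0.10, 4, discount=0.40)
# Same fleet running only 12 hours a day on 22 working days.
office_hours = monthly_cost(0.10, 4, hours=12 * 22)

print(on_demand, committed, office_hours)
```

Under these assumptions, scheduling the fleet for business hours saves more than the commitment discount, which is why the right strategy depends on the workload's usage pattern.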
How does IAM assume-role work and how do you implement cross-account access securely?
Cross-account access uses the sts:AssumeRole API. A role in Account B has a trust policy that allows Account A to assume it:
# Trust policy on role in Account B
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::ACCOUNT_A_ID:root" },
      "Action": "sts:AssumeRole"
    }
  ]
}
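Account A’s entity calls aws sts assume-role to get temporary credentials for Account B. Sessions last 1 hour by default and can run up to the role's configured maximum session duration (at most 12 hours). Because the trust policy names the account root, the caller in Account A also needs an identity policy that allows sts:AssumeRole on the role's ARN. Security controls: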
- Add ExternalId condition for third-party access (prevents confused deputy attacks)
- Add MFA condition for sensitive roles
- Use SCPs at the AWS Organization level to restrict what can be assumed
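The ExternalId and MFA controls are expressed as conditions on the trust policy. The sketch below assembles such a policy; the account ID 111122223333 and the external ID value are placeholders, and the function name is invented for this example.

```python
import json

def build_trust_policy(trusted_account_id, external_id=None, require_mfa=False):
    """Trust policy letting `trusted_account_id` assume the role, with optional guards."""
    condition = {}
    if external_id is not None:
        # Guards against the confused deputy problem for third-party access.
        condition["StringEquals"] = {"sts:ExternalId": external_id}
    if require_mfa:
        condition["Bool"] = {"aws:MultiFactorAuthPresent": "true"}

    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if condition:
        statement["Condition"] = condition
    return {"Version": "2012-10-17", "Statement": [statement]}

# Placeholder account ID and external ID, for illustration only.
trust_policy = build_trust_policy("111122223333", external_id="example-external-id")
print(json.dumps(trust_policy, indent=2))
```

The caller then supplies the same external ID when assuming the role, for example through the --external-id option of aws sts assume-role.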
How would you architect a highly available, multi-region AWS deployment?
Multi-region HA involves several layers:
- DNS: Route 53 with health checks and latency/failover routing policies to direct users to the nearest healthy region.
- Data replication: RDS cross-Region read replicas with promotion capability. DynamoDB Global Tables for active-active.
- Edge: CloudFront CDN with origins in multiple regions.
- Infrastructure: Identical infrastructure in each region managed by Terraform.
- DR strategy: Define RTO (Recovery Time Objective) and RPO (Recovery Point Objective) to determine your architecture (Pilot Light, Warm Standby, or Active-Active).
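The DNS layer's behavior, sending users to the nearest region that passes its health check, can be sketched as a small selection function. This is a toy model of latency routing combined with health checks, not how Route 53 is implemented, and the latency figures are invented.

```python
def pick_region(regions):
    """Choose the healthy region with the lowest latency, or None if all are down."""
    healthy = [region for region in regions if region["healthy"]]
    if not healthy:
        return None
    return min(healthy, key=lambda region: region["latency_ms"])["name"]

# Invented latency figures as seen from one user's location.
regions = [
    {"name": "us-east-1", "latency_ms": 20, "healthy": True},
    {"name": "eu-west-1", "latency_ms": 95, "healthy": True},
]

print(pick_region(regions))    # nearest healthy region
regions[0]["healthy"] = False  # simulate a failed health check
print(pick_region(regions))    # traffic shifts to the remaining region
```

How quickly real traffic shifts depends on health check intervals and DNS TTLs, which feed directly into the RTO you can promise.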