AWS DevOps Interview Questions

Are you preparing for an AWS DevOps Interview? This curated list of AWS DevOps Interview Questions covers technical expertise and analytical thinking skills.

1. What is AWS DevOps?

AWS DevOps integrates AWS cloud services with DevOps practices to automate and optimize software delivery.
It uses tools like AWS CodePipeline (CI/CD), CloudFormation (IaC), CloudWatch (monitoring), and ECS/EKS (containers) for scalable, reliable deployments.
Focus areas: infrastructure automation, security (IAM/KMS), cost optimization, and collaboration across development/operations teams.

2. AWS DevOps Interview Topics

  • Core: CI/CD pipelines (CodeBuild/CodeDeploy), IaC (CloudFormation/Terraform), configuration management (Ansible).
  • Services: EC2, S3, Lambda, VPC, CloudWatch/X-Ray.
  • Security: IAM roles, encryption (KMS), compliance (GDPR).
  • Advanced: Serverless architectures, container orchestration (ECS, Kubernetes), disaster recovery.
  • Scenarios: Troubleshooting deployment failures, optimizing CloudFormation templates, and reducing latency.
  • Tools: Jenkins, Docker, Prometheus, Git.
  • Concepts: Blue/green deployments, autoscaling, cost optimization (Cost Explorer)
AWS DevOps Interview Questions

1. How do you uncommit the changes that have already been pushed to GitHub?

To undo committed changes already pushed to GitHub, use git revert to create a new commit, reversing the unwanted changes.

git revert HEAD
git push origin main

This preserves history and avoids disrupting collaborators. If you must erase the commit, use git reset to remove it locally, then force-push (git push -f). However, force-pushing can disrupt others’ work; use it cautiously and communicate with your team. Always prefer git revert on shared branches to maintain traceability.
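The reset-and-force-push path described above can be sketched end to end in a throwaway local repository (commit messages and identities are illustrative; the push is commented out because the demo has no remote):

```shell
# Demo in a throwaway repo (assumes git >= 2.28 for `init -b`).
cd "$(mktemp -d)" && git init -q -b main .
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty -m "good commit"
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty -m "bad commit"

git reset --hard HEAD~1    # drop "bad commit" from the local branch
# git push -f origin main  # then overwrite the remote (coordinate with your team first)
git log --oneline          # only "good commit" remains
```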

2. If a file is suddenly deleted in git, how do you get it back?

If a file is deleted but not yet committed, restore it with:

git checkout -- filename

If the deletion was committed, retrieve it from a previous commit. Find the commit hash via git log, then:

git checkout <commit-hash> -- filename

For example, to restore config.yml from the commit before the deletion:

git checkout HEAD^ -- config.yml

3. Can you increase the size of the root volume without shutting down the instance?

On AWS, you can resize an EBS root volume without stopping the instance. Modify the volume size in the EC2 console or via CLI. After resizing, extend the file system:

  • For Linux:
sudo growpart /dev/xvda 1
sudo resize2fs /dev/xvda1

Verify with df -h. Note: resize2fs works for ext4 file systems; for XFS, use sudo xfs_growfs / instead.
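The resize itself can be triggered from the CLI; a minimal sketch (the volume ID and size are placeholders, and the commands require AWS credentials):

```shell
# Grow the EBS volume to 100 GiB, then poll the modification state
aws ec2 modify-volume --volume-id vol-0abc1234 --size 100
aws ec2 describe-volumes-modifications --volume-ids vol-0abc1234
```

Once the modification reaches the optimizing or completed state, extend the file system as shown above.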

4. If you lost the .pem file, how will you connect to the EC2 instance?

If the .pem file is lost:

  1. Create a new key pair in AWS.
  2. Stop the instance, detach the root volume.
  3. Attach the volume to another instance and replace ~/.ssh/authorized_keys with the new public key.
  4. Reattach the volume to the original instance and start it.

Alternatively, use AWS Systems Manager (SSM) Session Manager if configured, which enables shell access without any key pair.
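If the instance runs the SSM agent and has an instance profile with SSM permissions, a session can be opened directly (the instance ID is a placeholder, and the Session Manager plugin must be installed locally):

```shell
aws ssm start-session --target i-0abc12345def67890
```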

5. An S3 bucket policy allows only read-only access, but your IAM policy grants you full access. Can you modify S3 objects?

If the bucket policy grants read-only access and your IAM policy allows full access, you can still modify objects: within the same AWS account, access is evaluated as the union of IAM and bucket policies, so either one can grant permission unless an explicit Deny is present. For example, if your IAM user has s3:PutObject permission and the bucket policy contains no explicit Deny, you can overwrite files despite the bucket’s read-only policy. Always follow the principle of least privilege to avoid unintended access.
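Effective permissions can be checked without touching the bucket by using the IAM policy simulator (the ARNs below are placeholders):

```shell
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::111122223333:user/devops \
  --action-names s3:PutObject \
  --resource-arns arn:aws:s3:::my-bucket/*
```

The EvalDecision field in the output shows whether the action would be allowed or denied.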

6. Difference between Classic ELB and Application ELB?

Classic Load Balancer (CLB) operates at both Layer 4 (TCP) and Layer 7 (HTTP/HTTPS), offering basic load balancing.
Application Load Balancer (ALB) is Layer 7-focused, supporting advanced features like path-based routing, host-based routing, and integration with containers/serverless.
For example, ALB can route /api requests to a backend service and /static to a CDN. ALB also natively supports WebSockets and HTTP/2, while CLB lacks these. CLB is legacy; AWS recommends ALB for modern architectures.
Use ALB for microservices or multi-tier apps requiring granular traffic control.
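Path-based routing like the /api example above is configured as a listener rule; a sketch with placeholder ARNs:

```shell
aws elbv2 create-rule \
  --listener-arn arn:aws:elasticloadbalancing:us-east-1:111122223333:listener/app/my-alb/123/456 \
  --priority 10 \
  --conditions Field=path-pattern,Values='/api/*' \
  --actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/api-tg/789
```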

7. How many subnets can be associated with a route table?

Each subnet is linked to exactly one routing table. However, a routing table can be associated with multiple subnets.
For instance, a public routing table (with an internet gateway route) might map to three public subnets across Availability Zones (AZs).
A private routing table (with a NAT gateway route) could map to five private subnets.
Use the AWS CLI to verify:

aws ec2 describe-route-tables --route-table-id <rtb-id>  
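Associating additional subnets with the same route table is a single call per subnet (IDs are placeholders):

```shell
aws ec2 associate-route-table --route-table-id rtb-0abc1234 --subnet-id subnet-0abc1234
aws ec2 associate-route-table --route-table-id rtb-0abc1234 --subnet-id subnet-0def5678
```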

8. All the IPs in your VPC are exhausted, but you need to provision more resources. What do you do?

If your subnet’s IPs are depleted:

  1. Add a secondary CIDR block to the VPC (e.g., associate 10.1.0.0/16 alongside the original 10.0.0.0/16).
  2. Create a new subnet with a larger CIDR range (e.g., /23 instead of /24).
  3. Terminate unused instances or release Elastic IPs.
  4. Use smaller instance types (e.g., t3.micro uses 1 IP vs. larger instances with multiple NICs).
    Example: A /24 subnet (251 usable IPs) can’t host 300 instances; create a /23 subnet (507 IPs) instead.
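Steps 1 and 2 above can be sketched with the CLI (the VPC ID, CIDRs, and AZ are placeholders):

```shell
# Attach a secondary CIDR to the VPC, then carve a larger subnet out of it
aws ec2 associate-vpc-cidr-block --vpc-id vpc-0abc1234 --cidr-block 10.1.0.0/16
aws ec2 create-subnet --vpc-id vpc-0abc1234 --cidr-block 10.1.0.0/23 --availability-zone us-east-1a
```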

9. Are you only using CloudWatch for monitoring?

CloudWatch is foundational, but we complement it with:

  • AWS X-Ray for tracing distributed apps (e.g., latency in API Gateway).
  • Datadog/Prometheus for containerized apps (e.g., EKS pod metrics).
  • CloudTrail for auditing API activity.
    Example: Use CloudWatch for CPU alarms, X-Ray for tracing Lambda cold starts, and Prometheus for custom Kubernetes metrics.

10. If you’re using load balancing in 2 availability zones, then which load balancer should you use?

Use Application Load Balancer (ALB) for HTTP/HTTPS traffic across two AZs. ALB requires subnets in each AZ for high availability.

Example:

aws elbv2 create-load-balancer --name my-alb --subnets subnet-123 subnet-456  

ALB distributes traffic to targets (e.g., EC2, ECS) in both AZs. For TCP/UDP, use Network Load Balancer (NLB). Avoid Classic ELB—it lacks AZ-aware routing granularity.

11. Can you write a Dockerfile that deploys a static web server in a Linux environment?

Here’s a minimal Dockerfile using Nginx on Alpine Linux:

# Stage 1: Build static assets (optional)  
FROM node:18-alpine AS build  
WORKDIR /app  
COPY . .  
RUN npm install && npm run build  

# Stage 2: Deploy to web server  
FROM nginx:alpine  
COPY --from=build /app/dist /usr/share/nginx/html  
COPY nginx.conf /etc/nginx/conf.d/default.conf  
EXPOSE 80  
CMD ["nginx", "-g", "daemon off;"]  

This uses multi-stage builds to optimize size. Replace npm steps if pre-built files exist. The nginx.conf can define server rules. For Apache, replace nginx:alpine with httpd:alpine and adjust paths.
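Assuming the Dockerfile above is saved in the project root, it can be built and run locally (the image name and host port are arbitrary):

```shell
docker build -t static-web .
docker run -d -p 8080:80 static-web
curl http://localhost:8080   # should return the site's index page
```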

12. Is it possible to run any VM in AWS without creating an EC2 instance?

No, EC2 is AWS’s primary VM service.
However, AWS Fargate runs containers without managing EC2 instances (serverless containers).
AWS Lambda executes code without servers but isn’t a VM.
For lightweight VM-like workloads, Firecracker microVMs (the open-source technology underlying Lambda and Fargate) can be self-managed, but only on hosts with virtualization support such as bare-metal EC2; AWS does not expose them directly.
Example: Fargate tasks mimic VM behavior but abstract infrastructure management.

13. I want to create a Jenkins pipeline with 10 different stages, and based on my input it should execute only some of them, not all. How would you configure that?

Use a Jenkinsfile with parameters and when directives:

pipeline {  
  agent any  
  parameters {  
    string(name: 'STAGE_SELECTOR', defaultValue: 'build,test', description: 'Comma-separated list of stages to run')  
  }  
  stages {  
    stage('Build') {  
      when { expression { params.STAGE_SELECTOR.contains('build') } }  
      steps { sh 'make build' }  
    }  
    stage('Test') {  
      when { expression { params.STAGE_SELECTOR.contains('test') } }  
      steps { sh 'make test' }  
    }  
    // Add 8 more stages with similar conditions  
  }  
}  

This skips stages based on user input during the pipeline trigger.

14. What are the Terraform modules? Have you used any modules in the project?

Terraform modules are reusable configurations encapsulating resources. For example, a VPC module can standardize network setups:

module "vpc" {  
  source  = "terraform-aws-modules/vpc/aws"  
  version = "3.14.0"  
  cidr    = "10.0.0.0/16"  
  azs     = ["us-east-1a", "us-east-1b"]  
}  

I’ve used modules for EKS clusters, S3 buckets, and IAM roles to enforce consistency. Modules reduce redundancy and simplify scaling (e.g., reusing an S3 module for 10 buckets).

15. Is it possible to configure communication between two servers that have only private access?

Yes, use VPC peering or Transit Gateway for cross-account/VPC communication. For same-VPC servers:

  1. Ensure both are in the same subnet or route traffic via private subnets.
  2. Configure security groups to allow inbound/outbound rules (e.g., allow port 443 from server A’s SG to server B’s SG).
  3. Example:
resource "aws_security_group_rule" "allow_private" {  
  type              = "ingress"  
  from_port         = 443  
  to_port           = 443  
  protocol          = "tcp"  
  source_security_group_id = aws_security_group.serverA.id  
  security_group_id = aws_security_group.serverB.id  
}  

16. What happens when you delete /var/lib/docker/overlay?

Deleting this directory (or /var/lib/docker/overlay2 with the default overlay2 driver) removes Docker’s OverlayFS storage data, including all image and container layers.
Running containers will crash due to missing layers, and any data not persisted in external volumes or backups will be lost.
For example, a PostgreSQL container without volume mounts will lose its database. To recover:

  • Restore from backups.
  • Rebuild images and containers.
  • Use docker system prune -a to clean up safely.
    Never delete manually—use Docker commands for cleanup.
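The safe cleanup commands mentioned above, side by side (docker volume prune is irreversible, so inspect first):

```shell
docker system df          # show space used by images, containers, and volumes
docker system prune -a    # remove unused containers, networks, and images
docker volume prune       # remove unused volumes (confirm before accepting)
```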

17. Write a simple script that calls with “Foo” prints “bar” and when called with “bar” prints “Foo”. Every other option should print “Try again.”?

#!/bin/bash  
input=$1  
case $input in  
  "Foo") echo "bar" ;;  
  "bar") echo "Foo" ;;  
  *) echo "Try again" ;;  
esac  

Save as script.sh, run chmod +x script.sh, and test:

./script.sh Foo  # Output: bar  
./script.sh bar  # Output: Foo  
./script.sh test # Output: Try again  

18. Tell all the scenarios to implement security in Kubernetes.

  • RBAC: Restrict access with roles and role bindings.

apiVersion: rbac.authorization.k8s.io/v1  
kind: Role  
metadata:  
  name: pod-reader   # illustrative name  
rules:  
- apiGroups: [""]  
  resources: ["pods"]  
  verbs: ["get", "list"]  

  • Network Policies: Block unwanted cross-pod traffic.
  • Secrets Management: Use Kubernetes Secrets or integrate with Vault.
  • Pod Security Standards: Enforce non-root users (PodSecurityPolicy was removed in Kubernetes 1.25).
  • Image Scanning: Use Trivy in CI/CD.
  • Audit Logging: Track API server activity.

19. Your EKS application is experiencing higher-than-expected traffic. How would you automatically scale the Pods?

Use Horizontal Pod Autoscaler (HPA) with CPU/metrics:

apiVersion: autoscaling/v2  
kind: HorizontalPodAutoscaler  
metadata:  
  name: my-app-hpa  
spec:  
  scaleTargetRef:  
    apiVersion: apps/v1  
    kind: Deployment  
    name: my-app  
  minReplicas: 2  
  maxReplicas: 10  
  metrics:  
  - type: Resource  
    resource:  
      name: cpu  
      target:  
        type: Utilization  
        averageUtilization: 70  

Combine with Cluster Autoscaler to add nodes if resources are exhausted.
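A comparable CPU-based HPA can also be created imperatively with kubectl (the deployment name matches the manifest above):

```shell
kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10
kubectl get hpa my-app   # watch current vs. target utilization
```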

20. Your team needs to be alerted when the CPU usage of any Pod in your EKS cluster exceeds 80% for more than 5 minutes. How would you set this up?

  • Install Prometheus & Alertmanager via Helm:
helm install prometheus prometheus-community/kube-prometheus-stack  
  • Define a PrometheusRule:
apiVersion: monitoring.coreos.com/v1  
kind: PrometheusRule  
metadata:  
  name: cpu-alert  
spec:  
  groups:  
  - name: cpu-usage  
    rules:  
    - alert: HighPodCPU  
      expr: sum(rate(container_cpu_usage_seconds_total{namespace="default"}[5m])) by (pod) > 0.8  
      for: 5m  
      labels:  
        severity: critical  
  • Configure Alertmanager to send alerts to Slack/email.

Note: the expr above measures absolute CPU usage in cores (firing above 0.8 cores); to alert on a true 80% of a Pod’s CPU limit, divide by the Pod’s limit.

21. Your team wants a Grafana dashboard to visualize the HTTP request latency of your applications running in EKS. How would you achieve this?

  1. Collect Metrics: Deploy Prometheus with a service monitor for your app.
  2. Add Data Source: Link Prometheus to Grafana.
  3. Import/Create Dashboard:
    • Use a pre-built dashboard (e.g., ID 315 for Kubernetes).
    • Add a panel with this query:
      `histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))`
    • Label axes as “Latency (seconds)” and “Time”.
