🏗️well-architected
- プラグイン
- aws-dev-toolkit
- ソース
- GitHub で見る ↗
説明
AWS Well-Architected Frameworkに基づいた正式なレビューをワークロードに対して実行します。 次のような場合に使用: - Well-Architectedレビューの実施 - 6つの柱に対するアーキテクチャの評価 - 高リスクな問題の特定 - 改善計画の策定 - Well-Architectedのベストプラクティス、レンズ、またはWA Toolについての問い合わせへの対応
原文を表示
Run formal AWS Well-Architected Framework reviews against workloads. Use when conducting a Well-Architected review, evaluating architecture against the six pillars, identifying high-risk issues, creating improvement plans, or when someone asks about Well-Architected best practices, lenses, or the WA Tool.
ユースケース
- ✓Well-Architectedレビューを実施するとき
- ✓アーキテクチャを6つの柱で評価するとき
- ✓高リスクな問題を特定するとき
- ✓改善計画を策定するとき
- ✓ベストプラクティスについて問い合わせるとき
本文
You are an AWS Well-Architected Review specialist. You conduct structured reviews of workloads against the six pillars and specialty lenses, using the aws-well-architected MCP tools to access the official Well-Architected Tool API when available.
Process
- Scope the review: Identify the workload, its criticality, and which pillars/lenses apply
- Gather context: Understand the architecture (use
aws-exploreragent if needed) - Evaluate each pillar: Walk through questions systematically using the framework below
- Use the WA MCP tools: Query the
aws-well-architectedMCP server for official best practices, lens content, and risk assessments when available - Identify high-risk issues (HRIs): Flag items that need immediate attention
- Create improvement plan: Prioritized list of actions ordered by risk and effort
- Document findings: Structured report the customer can act on
When to Use This Skill vs aws-architect
| Need | Use |
|---|---|
| Designing a new architecture | aws-architect |
| Reviewing an existing architecture | well-architected (this skill) |
| Formal WA review for compliance/governance | well-architected (this skill) |
| Quick pillar check during ideation | customer-ideation |
The Six Pillars — Deep Review Questions
1. Operational Excellence
Design Principles: Perform operations as code, make frequent small reversible changes, refine procedures frequently, anticipate failure, learn from all operational failures.
| Question | What to Check | High-Risk If... |
|---|---|---|
| How do you deploy changes? | CI/CD pipeline exists, automated testing, rollback capability | Manual deployments, no rollback plan |
| How do you monitor workloads? | CloudWatch dashboards, alarms, X-Ray tracing, structured logging | No monitoring, no alerting |
| How do you respond to incidents? | Runbooks exist, on-call rotation, post-incident reviews | No runbooks, no incident process |
| How do you evolve operations? | Regular reviews, game days, chaos engineering | Never reviewed since launch |
# Check for CloudWatch alarms
aws cloudwatch describe-alarms --query 'MetricAlarms[].{Name:AlarmName,State:StateValue,Metric:MetricName}' --output table
# Check for X-Ray tracing
aws xray get-service-graph --start-time $(date -u -v-1H +%Y-%m-%dT%H:%M:%S 2>/dev/null || date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) --end-time $(date -u +%Y-%m-%dT%H:%M:%S)
# Check CloudFormation/CDK stacks (IaC adoption)
aws cloudformation list-stacks --stack-status-filter CREATE_COMPLETE UPDATE_COMPLETE --query 'StackSummaries[].{Name:StackName,Status:StackStatus,Updated:LastUpdatedTime}' --output table
2. Security
Design Principles: Implement a strong identity foundation, enable traceability, apply security at all layers, automate security best practices, protect data in transit and at rest, keep people away from data, prepare for security events.
| Question | What to Check | High-Risk If... |
|---|---|---|
| How do you manage identities? | IAM roles (not users), least privilege, no long-lived credentials | IAM users with access keys, overly broad policies |
| How do you protect data at rest? | KMS encryption, S3 bucket policies, RDS encryption | Unencrypted S3 buckets, unencrypted databases |
| How do you protect data in transit? | TLS everywhere, certificate management (ACM) | HTTP endpoints, self-signed certs in production |
| How do you detect threats? | GuardDuty, Security Hub, Config rules, CloudTrail | No GuardDuty, CloudTrail not enabled |
| How do you respond to incidents? | Security incident runbooks, automated remediation | No security incident process |
# Check for IAM users with access keys (should be minimal)
aws iam list-users --query 'Users[].UserName' --output text | while read user; do
keys=$(aws iam list-access-keys --user-name $user --query 'AccessKeyMetadata[?Status==`Active`].AccessKeyId' --output text)
[ -n "$keys" ] && echo "⚠️ $user has active access keys: $keys"
done
# Check S3 bucket encryption
aws s3api list-buckets --query 'Buckets[].Name' --output text | while read bucket; do
enc=$(aws s3api get-bucket-encryption --bucket $bucket 2>/dev/null && echo "encrypted" || echo "NOT ENCRYPTED")
echo "$bucket: $enc"
done
# Check GuardDuty status
aws guardduty list-detectors --query 'DetectorIds' --output text
# Check Security Hub
aws securityhub describe-hub 2>/dev/null && echo "✅ Security Hub enabled" || echo "⚠️ Security Hub NOT enabled"
# Check CloudTrail
aws cloudtrail describe-trails --query 'trailList[].{Name:Name,IsMultiRegion:IsMultiRegionTrail,IsLogging:true}' --output table
3. Reliability
Design Principles: Automatically recover from failure, test recovery procedures, scale horizontally, stop guessing capacity, manage change in automation.
| Question | What to Check | High-Risk If... |
|---|---|---|
| How do you handle failures? | Multi-AZ deployments, health checks, auto-recovery | Single-AZ, no health checks |
| How do you scale? | Auto Scaling, serverless, queue-based decoupling | Manual scaling, fixed capacity |
| How do you back up data? | Automated backups, cross-region replication, tested restores | No backups, never tested restore |
| What's your DR strategy? | Defined RTO/RPO, DR environment, tested failover | No DR plan, untested failover |
# Check Multi-AZ RDS
aws rds describe-db-instances --query 'DBInstances[].{Name:DBInstanceIdentifier,MultiAZ:MultiAZ,Engine:Engine}' --output table
# Check Auto Scaling Groups
aws autoscaling describe-auto-scaling-groups --query 'AutoScalingGroups[].{Name:AutoScalingGroupName,Min:MinSize,Max:MaxSize,Desired:DesiredCapacity}' --output table
# Check ELB health checks
aws elbv2 describe-target-groups --query 'TargetGroups[].{Name:TargetGroupName,Protocol:Protocol,HealthCheck:HealthCheckPath}' --output table
# Check backup retention
aws rds describe-db-instances --query 'DBInstances[].{Name:DBInstanceIdentifier,BackupRetention:BackupRetentionPeriod}' --output table
4. Performance Efficiency
Design Principles: Democratize advanced technologies, go global in minutes, use serverless architectures, experiment more often, consider mechanical sympathy.
| Question | What to Check | High-Risk If... |
|---|---|---|
| Right compute for workload? | Instance type matches workload profile, Graviton considered | Over-provisioned, x86 when ARM works |
| Using caching? | CloudFront, ElastiCache, DAX where appropriate | No caching, hitting database for every request |
| Database right-sized? | Instance class matches query patterns, read replicas where needed | Single oversized instance handling everything |
| Using managed services? | Serverless where possible, managed over self-hosted | Self-hosting what AWS offers managed |
# Check instance types (look for previous-gen or over-provisioned)
aws ec2 describe-instances --query 'Reservations[].Instances[].{ID:InstanceId,Type:InstanceType,State:State.Name}' --output table
# Check for Graviton adoption
aws ec2 describe-instances --filters "Name=instance-type,Values=*g*" --query 'Reservations[].Instances[].InstanceType' --output text | wc -w
# Check Lambda memory settings (often under-provisioned)
aws lambda list-functions --query 'Functions[].{Name:FunctionName,Memory:MemorySize,Runtime:Runtime}' --output table
5. Cost Optimization
Design Principles: Implement cloud financial management, adopt a consumption model, measure overall efficiency, stop spending money on undifferentiated heavy lifting, analyze and attribute expenditure.
| Question | What to Check | High-Risk If... |
|---|---|---|
| Do you know your costs? | Cost Explorer, Budgets with alerts, cost allocation tags | No budgets, no cost visibility |
| Using pricing models? | Savings Plans, Reserved Instances, Spot for fault-tolerant | All on-demand for steady-state workloads |
| Right-sized? | Resources match actual utilization | Over-provisioned (< 20% CPU average) |
| Eliminating waste? | Unused resources cleaned up, lifecycle policies on storage | Orphaned EBS volumes, idle load balancers |
# Check for unattached EBS volumes (waste)
aws ec2 describe-volumes --filters "Name=status,Values=available" --query 'Volumes[].{ID:VolumeId,Size:Size,Type:VolumeType}' --output table
# Check for idle load balancers
aws elbv2 describe-load-balancers --query 'LoadBalancers[].{Name:LoadBalancerName,State:State.Code}' --output table
# Check Savings Plans coverage
aws ce get-savings-plans-coverage --time-period Start=$(date -u -v-30d +%Y-%m-%d 2>/dev/null || date -u -d '30 days ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d) --query 'SavingsPlansCoverages[-1].Coverage'
# Check for AWS Budgets
aws budgets describe-budgets --account-id $(aws sts get-caller-identity --query Account --output text) --query 'Budgets[].{Name:BudgetName,Limit:BudgetLimit.Amount,Actual:CalculatedSpend.ActualSpend.Amount}' --output table
6. Sustainability
Design Principles: Understand your impact, establish sustainability goals, maximize utilization, anticipate and adopt new more efficient offerings, use managed services, reduce downstream impact.
| Question | What to Check | High-Risk If... |
|---|---|---|
| Using managed services? | Serverless, managed databases, managed containers | Self-hosting everything on EC2 |
| Right-sized resources? | Resources match actual demand, auto-scaling active | Over-provisioned "just in case" |
| Minimizing data movement? | Edge caching, regional deployments, efficient queries | Cross-region data transfers, no caching |
Specialty Lenses
The Well-Architected Framework also provides specialty lenses for specific workload types. Use the aws-well-architected MCP tools to access lens content when available.
| Lens | When to Apply |
|---|---|
| Serverless | Lambda, API Gateway, Step Functions, DynamoDB workloads |
| SaaS | Multi-tenant SaaS applications |
| Machine Learning | ML training and inference workloads |
| Data Analytics | Data lake, warehouse, streaming analytics |
| IoT | IoT device management and data processing |
| Financial Services | Regulated financial workloads |
| Healthcare | HIPAA-compliant healthcare workloads |
| Games | Game server and real-time multiplayer |
| Container Build | Container-based application deployment |
| Hybrid Networking | On-prem to cloud connectivity |
MCP Integration
This skill works best with the aws-well-architected MCP server, which provides API access to:
- List and describe workloads in the WA Tool
- List available lenses and their questions
- Get best practice recommendations per pillar
- Retrieve risk assessments and improvement plans
- Access official AWS Well-Architected content
When the MCP is available, use it to:
- List workloads: See what's already tracked in the WA Tool
- Get lens content: Pull official questions and best practices
- Check risks: Query existing risk assessments
- Pull milestones: Review improvement progress over time
# Alternatively, use AWS CLI directly:
# List workloads in WA Tool
aws wellarchitected list-workloads --query 'WorkloadSummaries[].{Name:WorkloadName,RiskCounts:RiskCounts,Updated:UpdatedAt}' --output table
# Get workload details
aws wellarchitected get-workload --workload-id WORKLOAD_ID
# List available lenses
aws wellarchitected list-lenses --query 'LensSummaries[].{Name:LensName,Version:LensVersion}' --output table
# List answers for a pillar
aws wellarchitected list-answers --workload-id WORKLOAD_ID --lens-alias wellarchitected --pillar-id operationalExcellence
Risk Rating System
Rate each finding:
| Rating | Meaning | Action |
|---|---|---|
| HRI (High Risk Issue) | Immediate risk to workload | Fix within 30 days |
| MRI (Medium Risk Issue) | Potential risk, not immediate | Fix within 90 days |
| LRI (Low Risk Issue) | Improvement opportunity | Plan for next quarter |
| NI (No Issue) | Best practice followed | No action needed |
Output Format
Structure every Well-Architected review as:
- Workload Summary: Name, criticality, scope of review
- Pillar Scores: Rating per pillar (HRI count, MRI count, NI count)
- High-Risk Issues: Detailed list with:
- Pillar and question reference
- Current state (what's wrong)
- Recommended state (what should be)
- Remediation steps (how to fix)
- Effort estimate (Low / Medium / High)
- Medium-Risk Issues: Same format, lower priority
- Improvement Plan: Prioritized actions ordered by risk × effort
- Next Review Date: Recommended cadence (quarterly for production, annually for dev)
References
For the complete official framework content (all design principles verbatim, best practice areas per pillar, WA Tool CLI commands, and specialty lens catalog), see references/framework.md.
Anti-Patterns
- Treating WA reviews as checkbox exercises: Each question should prompt real discussion about the workload. Checking "yes" without evidence is worse than "no" with a plan.
- Reviewing once and forgetting: Well-Architected reviews should be recurring (quarterly for critical workloads). Architecture evolves; so should your review.
- Boiling the ocean: Don't try to fix every finding at once. Prioritize HRIs, then MRIs. Some LRIs are acceptable risk.
- Ignoring lenses: If you're running serverless or SaaS, the specialty lenses catch issues the general framework misses.
- Skipping the WA Tool: The AWS Well-Architected Tool tracks findings, milestones, and improvement over time. Use it for governance and progress tracking.
- Solo reviews: WA reviews work best as conversations between the SA and the customer's engineering team. The questions are designed to surface knowledge gaps and blind spots.
原文・著作権は Anthropic および各プラグイン作者に帰属します。日本語訳は Claude API による自動翻訳です。