
Superpowers vs Agent Skills vs Matt Pocock's Skills

Superpowers vs Agent Skills vs Matt Pocock's Skills: compare autonomy, lifecycle coverage, and daily tooling to choose coding-agent skills with less risk.


Superpowers vs Agent Skills vs Matt Pocock's Skills does not have one universal winner. The right choice depends on the work you want to delegate to the agent: discovering the problem, shipping a feature with broad validation, or using focused commands in a daily engineering flow.

My practical rule is simple: use Superpowers when the problem still needs careful thinking before code. Use Agent Skills when the problem is already clear and you want broad validation. Use Matt Pocock's Skills when you want a lighter set of commands to align requirements, do TDD, diagnose bugs, and improve architecture without adopting a whole framework.

That rule comes from the skill files themselves:

| Recommendation | Evidence in the project | What it suggests |
|---|---|---|
| Use Superpowers when the problem still needs thinking before code | brainstorming/SKILL.md requires the agent to understand context, ask questions, write a design, and wait for approval before implementation. The README.md also describes the flow as approved design, planning, TDD, worktrees, subagents, and review. | It is strongest when the task has ambiguity, risk, or architecture decisions. |
| Use Agent Skills when the problem is clear and you want broad validation | README.md organizes the pack around /spec, /plan, /build, /test, /review, and /ship. using-agent-skills/SKILL.md routes work into UI, browser testing, security, performance, observability, and launch readiness. | It is strongest when the feature has a defined goal and needs several quality lenses. |
| Use Matt Pocock's Skills when you want smaller, more manual commands | README.md presents the pack as daily skills for real engineering. docs/invocation.md separates human-invoked and model-invoked skills. Skills like /grill-me, /to-prd, /to-issues, /tdd, /diagnosing-bugs, and /improve-codebase-architecture target specific pains. | It is strongest when you want human control and focused tools, not a whole methodology. |

Another way to read the comparison is to separate the why, the how, and the what behind each choice.

| Project | Why use it | How it works | What you get |
|---|---|---|---|
| Superpowers | Because the biggest risk is implementing too early without understanding the problem. | It first forces brainstorming, approved design, and a plan. Then it executes with TDD, worktrees, subagents, and review after each step. | A fuller trail of reasoning, planning, execution, review, and evidence. |
| Agent Skills | Because the biggest risk is missing a quality lens before shipping. | It organizes delivery into spec, plan, build, test, review, and ship, with dedicated skills for UI, API, security, performance, observability, and launch. | Checklist-style product-engineering delivery with broad validation. |
| Matt Pocock's Skills | Because the biggest risk is the agent drifting away from human intent or adding too much process. | The human calls focused commands to grill requirements, generate a PRD, break issues down, do TDD, diagnose bugs, and review architecture. | A smaller, more controlled kit for improving specific decisions without replacing your whole workflow. |

What does each project optimize for?

Superpowers presents itself as a complete software development methodology for coding agents. Its flow pulls the agent into brainstorming, approved design, detailed planning, isolated worktrees, Test-Driven Development (TDD), subagents implementing tasks, and review after each step.

Agent Skills presents itself as a production-grade engineering skill pack for agents. It is organized by lifecycle: define, plan, build, verify, review, and ship. The pack includes skills for UI, APIs, security, performance, documentation, observability, CI/CD, and launch readiness.

Matt Pocock's Skills presents itself as a set of daily skills for real engineering, not vibe coding. The focus is human control, small skills, easy adaptation, and useful commands like /grill-me, /grill-with-docs, /tdd, /diagnosing-bugs, /to-prd, /to-issues, and /improve-codebase-architecture.

I refreshed all three repositories on June 30, 2026 before writing this post. In that snapshot, Superpowers had 14 SKILL.md files; Agent Skills had 24; Matt Pocock's Skills had 25 active skills outside deprecated and in-progress folders (36 if you count everything).
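The "active" count depends on which folders are excluded. Below is a minimal TypeScript sketch of that filter. It assumes the excluded folders are literally named `deprecated` and `in-progress` (the real folder names may differ). It also assumes the input is a list of relative paths that a file walk has already produced. The helper names and sample paths are hypothetical.

```typescript
// Hypothetical helper: counts SKILL.md files, and separately counts those
// outside excluded folders. Folder names are assumptions, not repository facts.
const EXCLUDED_FOLDERS = new Set(["deprecated", "in-progress"]);

function isSkillFile(path: string): boolean {
  return path.split("/").pop() === "SKILL.md";
}

function isActive(path: string): boolean {
  // Active means no directory segment matches an excluded folder name.
  return !path.split("/").some((segment) => EXCLUDED_FOLDERS.has(segment));
}

function countSkills(paths: string[]): { all: number; active: number } {
  const skills = paths.filter(isSkillFile);
  return { all: skills.length, active: skills.filter(isActive).length };
}

// Made-up paths for illustration only.
const samplePaths = [
  "skills/engineering/tdd/SKILL.md",
  "skills/productivity/grill-me/SKILL.md",
  "skills/deprecated/old-skill/SKILL.md",
  "skills/in-progress/review/SKILL.md",
  "README.md",
];
console.log(countSkills(samplePaths)); // { all: 4, active: 2 }
```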

| Criterion | Superpowers | Agent Skills | Matt Pocock's Skills |
|---|---|---|---|
| Core idea | Execution methodology for agents | Engineering lifecycle packaged as skills | Daily toolkit for engineers |
| Skills in snapshot | 14 | 24 | 25 active |
| Organization | Brainstorm, plan, execute, review, finish | Define, plan, build, verify, review, ship | Engineering, productivity, misc, personal |
| Main strength | Autonomy with subagents and worktrees | Broad validation and shipping coverage | Clear requirements, TDD, debugging, architecture |
| Main risk | Can feel heavy for small tasks | Can widen scope too much | More personal and less complete as a system |

How do the workflows differ?

Superpowers tries to stop the agent from jumping into code too early. Brainstorming comes before implementation. After that, the agent writes a plan, works in isolation, uses TDD, and goes through review. The philosophy is strong: process before speed.

Agent Skills tries to cover the full path of delivery. The pack has commands and skills to turn an idea into a spec, break tasks down, implement in slices, test, review, simplify, measure performance, check security, and prepare launch. The philosophy is coverage: each phase has a matching skill.

Matt Pocock's Skills tries to fix common agent failures without taking the whole process out of your hands. The center is alignment: interrogate requirements, create shared language, use TDD, diagnose bugs with discipline, and improve code design. The philosophy is control: small skills you choose and adapt.

| Moment | Superpowers | Agent Skills | Matt Pocock's Skills |
|---|---|---|---|
| Vague requirement | Brainstorming and approved design | Interview, spec, and plan | /grill-me or /grill-with-docs |
| Implementation | Plan with subagents and reviews | Incremental slices | /implement or /tdd |
| Tests | Strict red-green-refactor | TDD inside a broader strategy | /tdd as a practical loop |
| Debugging | Systematic debugging | Debugging and error recovery | /diagnosing-bugs |
| Architecture | Planning and review | Design, docs, and ADRs | /codebase-design and /improve-codebase-architecture |

Which skills are equivalent?

The mapping is not perfect, because each project slices the process differently. Still, you can map the main intent behind each skill.

| Intent | Superpowers | Agent Skills | Matt Pocock's Skills |
|---|---|---|---|
| Routing | using-superpowers | using-agent-skills | ask-matt |
| Requirements discovery | brainstorming | interview-me, idea-refine | grill-me, grill-with-docs |
| PRD or spec writing | brainstorming, writing-plans | spec-driven-development | to-prd |
| Task breakdown | writing-plans | planning-and-task-breakdown | to-issues |
| Implementation | executing-plans, subagent-driven-development | incremental-implementation | implement |
| TDD | test-driven-development | test-driven-development | tdd |
| Bug investigation | systematic-debugging | debugging-and-error-recovery | diagnosing-bugs |
| Code review | requesting-code-review, receiving-code-review | code-review-and-quality | review (still in progress) |
| Completion verification | verification-before-completion | shipping-and-launch, browser-testing-with-devtools | typecheck, focused tests, and final suite through implement |
| Git and worktrees | using-git-worktrees, finishing-a-development-branch | git-workflow-and-versioning | git-guardrails-claude-code, merge conflicts |
| Architecture | Planning, review, and systematic debugging | api-and-interface-design, documentation-and-adrs | codebase-design, domain-modeling, improve-codebase-architecture |
| Parallelism | dispatching-parallel-agents, subagent-driven-development | specialist agents and orchestration | not the focus |

The most important difference in this table is who holds control. In Matt Pocock's Skills, many high-level skills are invoked by the human. In Superpowers, the router and workflows are more automatic. Agent Skills sits in the middle: it covers many phases but still structures delivery around human checkpoints.

What other evidence supports this reading?

The primary evidence is in the skill files themselves, not in the benchmark. I documented the full trail in the comparison repo's EVIDENCE.md. The reading is:

  • Matt Pocock's Skills: docs/invocation.md separates human-invoked skills from model-invoked skills. Human-invoked skills set disable-model-invocation: true, so they are only reachable when a person types the command. In the snapshot, 13 of the 25 active skills were human-invoked, including grill-me, grill-with-docs, to-prd, to-issues, triage, improve-codebase-architecture, and implement. This supports the claim that it keeps more control with the human.
  • Superpowers: brainstorming/SKILL.md blocks implementation until the design is approved. After that, README.md and subagent-driven-development/SKILL.md tell the agent to create a plan, execute tasks with subagents, review each step, and avoid pausing between tasks unless blocked, genuinely ambiguous, or done. This supports the claim that it becomes more autonomous after approval.
  • Agent Skills: README.md, using-agent-skills/SKILL.md, and spec-driven-development/SKILL.md organize the pack around define, plan, build, verify, review, and ship, with 24 skills, human checkpoints, and /build auto for one approved implementation pass with verification. This supports the middle position.
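A single frontmatter field decides whether a skill is human-invoked or model-invoked. To make that concrete, here is a rough TypeScript sketch that reads the frontmatter of a SKILL.md string and classifies the skill. It uses a deliberately naive line-based parser rather than a real YAML library. The sample skill text is invented; only the grill-me name and the `disable-model-invocation: true` field come from the evidence above.

```typescript
type Invocation = "human" | "model";

// Naive frontmatter reader: expects the file to open with a '---' block of
// simple `key: value` lines. This is an illustration, not a full YAML parser.
function readFrontmatter(markdown: string): Record<string, string> {
  const lines = markdown.split(/\r?\n/);
  const fields: Record<string, string> = {};
  if (lines[0]?.trim() !== "---") return fields;
  for (let i = 1; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "---") break; // end of frontmatter
    const colon = line.indexOf(":");
    if (colon > 0) fields[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return fields;
}

// A skill with disable-model-invocation: true can only be triggered by a person.
function classifySkill(markdown: string): Invocation {
  return readFrontmatter(markdown)["disable-model-invocation"] === "true" ? "human" : "model";
}

// Invented sample files for illustration.
const grillMe = `---
name: grill-me
description: (invented) interview the user about a plan
disable-model-invocation: true
---
Skill body...`;

const exampleModelSkill = `---
name: example-model-skill
description: (invented) a skill the model may trigger on its own
---
Skill body...`;

console.log(classifySkill(grillMe)); // human
console.log(classifySkill(exampleModelSkill)); // model
```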

The local benchmark is secondary evidence. It helps show how those instructions play out in one controlled React task, but it does not prove general performance across repositories, models, or prompts.

What changes during a real feature?

In a real feature, Superpowers tends to spend more time at the beginning. That is useful when there are architecture decisions, hidden risks, or real ambiguity. The cost appears when the task is small, because the ritual can become heavier than the problem.

Agent Skills tends to move faster toward execution when the problem is already well defined. Its advantage shows up in validation: UI, API, security, performance, tests, documentation, and launch all have their own skills. The risk is that the agent may apply too large a checklist to a small change.

Matt Pocock's Skills tends to be more direct and less ceremonial. It shines when you want to call one specific skill for one specific pain: align the requirement, generate a PRD, break issues down, run TDD, investigate a bug, or improve architecture. The risk is expecting the same broad coverage as Agent Skills or the same autonomous flow as Superpowers.

This is how I would choose:

  • Use Superpowers for large refactors, architecture work, hard bugs, risky migrations, and features where the question is not mature yet.
  • Use Agent Skills for product features, pages, APIs, UI flows, audits, release preparation, and changes that need broad coverage.
  • Use Matt Pocock's Skills for a more manual and sharp workflow, when you want to choose focused commands without handing the whole process to a framework.
  • Do not run several active meta-routers at once. They can compete on commands, triggers, and TDD philosophy.
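One visible symptom of competing packs is duplicated skill names. The equivalence table above lists test-driven-development in both Superpowers and Agent Skills, and each pack ships its own router. The TypeScript sketch below flags both problems. The pack contents are a hand-picked subset of names from that table, not complete lists. The check only catches identical names and known routers; it cannot detect overlapping triggers or conflicting TDD philosophies.

```typescript
// Hand-picked subset of skill names per pack, taken from the equivalence table.
const installedPacks: Record<string, string[]> = {
  superpowers: ["using-superpowers", "brainstorming", "test-driven-development", "systematic-debugging"],
  "agent-skills": ["using-agent-skills", "spec-driven-development", "test-driven-development", "debugging-and-error-recovery"],
  "mattpocock-skills": ["ask-matt", "grill-me", "tdd", "diagnosing-bugs"],
};

// Router names from the "Routing" row of the table.
const ROUTERS = new Set(["using-superpowers", "using-agent-skills", "ask-matt"]);

// Maps each skill name defined by two or more packs to the packs that define it.
function findCollisions(packs: Record<string, string[]>): Map<string, string[]> {
  const owners = new Map<string, string[]>();
  for (const [pack, skills] of Object.entries(packs)) {
    for (const skill of skills) owners.set(skill, [...(owners.get(skill) ?? []), pack]);
  }
  return new Map([...owners].filter(([, packNames]) => packNames.length > 1));
}

// Lists every router skill that is currently installed.
function activeRouters(packs: Record<string, string[]>): string[] {
  return Object.values(packs).flat().filter((skill) => ROUTERS.has(skill));
}

for (const [skill, packNames] of findCollisions(installedPacks)) {
  console.log(`${skill}: ${packNames.join(", ")}`); // test-driven-development: superpowers, agent-skills
}
console.log(activeRouters(installedPacks).length); // 3
```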

What did Om Mishra's benchmark show?

The most concrete test I found comparing two of them was Om Mishra's article, Superpowers vs Agent-Skills: Faster Shipping, Safer Reasoning. He ran a comparison with the same model, the same repository, the same prompt, separate worktrees, and only one changed factor: the skill framework.

The main numbers:

| Metric | Superpowers | Agent Skills |
|---|---|---|
| First code change | About 12 minutes | About 8 minutes |
| Total time | About 22 minutes | About 22 minutes |
| Validation passes | 5 | 7 |
| Tests added | 7 | 8 |
| Replans | 1 | 1 |
| Context rereads | 0 | 0 |

The fair reading is simple: in that scenario, Agent Skills reached code faster and validated more. Superpowers invested more in initial reasoning. The test does not include Matt Pocock's Skills, so it should not rank all three. It does show the central trade-off between broad validation and deeper reasoning before execution.

How did I run a local benchmark?

To compare all three with the same task, I created two local exercises. The first was a real implementation benchmark. The second was a smaller tabletop benchmark to compare the PRD, SPEC, and reasoning shape each skill set tends to produce.

The main benchmark lives in .context/skill-benchmark/. The task was to build the same React app three times, once per competitor, each isolated in its own directory:

  • implementations/addyosmani/ (Agent Skills)
  • implementations/superpowers/
  • implementations/mattpocock/

The implemented app was a React Todo Decision Board. It had to support:

  • Add a todo with title, priority, and optional due date.
  • Show a list with title, priority badge, due date, and completed state.
  • Toggle completion.
  • Delete a todo.
  • Filter by all, active, and completed.
  • Filter by priority.
  • Clear completed todos.
  • Show summary counts for total, active, completed, and active high-priority todos.
  • Persist to localStorage.
  • Handle malformed stored JSON without breaking the interface.
  • Use labels, keyboard-reachable controls, and status that does not depend only on color.
  • Use a polished responsive dark UI without an external component library.
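Both persistence requirements, saving to localStorage and surviving malformed stored JSON, come down to validating on read and guarding on write. Below is a minimal TypeScript sketch built on some assumptions. The `Todo` shape, the low/medium/high priority levels, and the storage key are all hypothetical. The post names an `isTodo` helper in one implementation but does not show it, so the version here is my own. Storage is passed in as a small subset of the Web Storage API, which means `window.localStorage` fits and the logic also runs outside a browser.

```typescript
type Priority = "low" | "medium" | "high"; // assumed levels

interface Todo {
  id: string;
  title: string;
  priority: Priority;
  dueDate?: string; // optional ISO date string
  completed: boolean;
}

// Subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORAGE_KEY = "todo-decision-board"; // hypothetical key

function isTodo(value: unknown): value is Todo {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.title === "string" &&
    (v.priority === "low" || v.priority === "medium" || v.priority === "high") &&
    (v.dueDate === undefined || typeof v.dueDate === "string") &&
    typeof v.completed === "boolean"
  );
}

// Never throws: malformed JSON or a non-array payload yields an empty list,
// and individual invalid entries are dropped.
function loadTodos(store: KeyValueStore): Todo[] {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed.filter(isTodo) : [];
  } catch {
    return [];
  }
}

// Reports failure (for example, a full quota) instead of throwing.
function saveTodos(store: KeyValueStore, todos: Todo[]): boolean {
  try {
    store.setItem(STORAGE_KEY, JSON.stringify(todos));
    return true;
  } catch {
    return false;
  }
}
```

In a component, `loadTodos(window.localStorage)` could seed the initial state, and the boolean from `saveTodos` gives the UI something to show when a write fails.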

Each competitor produced the same artifact types:

  • PRD.md
  • SPEC.md
  • IMPLEMENTATION_NOTES.md
  • src/App.tsx
  • src/styles.css
  • src/main.tsx
  • src/types.ts
  • index.html
  • package.json

The second exercise lives in .context/skill-framework-benchmark/. It was smaller: a React component called TaskRadar, with a PRD, SPEC, and implementation. The goal was not to measure a full UI, but to observe the kind of output each framework encourages.

TaskRadar had to:

  • Add a task with title, priority, effort, and optional project.
  • Mark a task as done.
  • Filter by status and priority.
  • Show total tasks, completed tasks, remaining effort, and next task.
  • Keep core logic testable outside React.
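Keeping core logic testable outside React usually means the summary and filters are pure functions of the task list. The TypeScript sketch below rests on two assumptions the post does not define. First, effort is a number of abstract points. Second, "next task" means the highest-priority unfinished task, with lower effort breaking ties. All names are hypothetical.

```typescript
type TaskPriority = "low" | "medium" | "high";

interface Task {
  title: string;
  priority: TaskPriority;
  effort: number; // assumed: abstract effort points
  project?: string;
  done: boolean;
}

interface RadarSummary {
  total: number;
  completed: number;
  remainingEffort: number;
  nextTask: Task | undefined;
}

const PRIORITY_RANK: Record<TaskPriority, number> = { high: 0, medium: 1, low: 2 };

// Pure: no React, no I/O, so it can be unit-tested directly.
function summarize(tasks: readonly Task[]): RadarSummary {
  const open = tasks.filter((t) => !t.done);
  // Assumed rule: highest priority first, then smallest effort.
  const nextTask = [...open].sort(
    (a, b) => PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] || a.effort - b.effort,
  )[0];
  return {
    total: tasks.length,
    completed: tasks.length - open.length,
    remainingEffort: open.reduce((sum, t) => sum + t.effort, 0),
    nextTask,
  };
}

// Also pure: filter by status and priority.
function filterTasks(
  tasks: readonly Task[],
  status: "all" | "open" | "done",
  priority: TaskPriority | "all",
): Task[] {
  return tasks.filter(
    (t) =>
      (status === "all" || (status === "done") === t.done) &&
      (priority === "all" || t.priority === priority),
  );
}
```

The React component then only renders `summarize(tasks)` and `filterTasks(...)`, which keeps the component thin.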

In the smaller exercise, the artifacts were:

  • shared-task.md, with the common prompt.
  • superpowers/, with the Superpowers-style PRD, SPEC, and implementation.
  • agent-skills/, with the Agent Skills-style PRD, SPEC, and implementation.
  • mattpocock/, with the Matt Pocock-style PRD, SPEC, and implementation.
  • report.md, with the comparative scoring.

This was not a scientific benchmark or a timed race with three independent agents. It was a controlled comparison using the skill files as the operating mode. The goal was to observe process differences: where the human decides, where the agent keeps going, how PRD and SPEC are structured, what validation shows up, and how each workflow turns the same task into code.

The minimum validation for the implementations was strict TypeScript through bunx tsc. Because the root repository already had a tsconfig.json, I ran the check with --ignoreConfig so it validated only the benchmark files. All three implementations passed that check.

What came out of the local benchmark?

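| Criterion | Superpowers | Agent Skills | Matt Pocock's Skills |
|---|---|---|---|
| Requirement alignment | 4 | 5 | 4 |
| Human-in-the-loop | 4 | 4 | 5 |
| Autonomy and delegation | 5 | 4 | 3 |
| PRD quality | 3 | 5 | 5 |
| SPEC quality | 4 | 5 | 4 |
| Implementation shape | 4 | 5 | 4 |
| Testability | 5 | 4 | 4 |
| Scope control | 4 | 4 | 5 |
| Accessibility | 3 | 5 | 4 |
| Verification discipline | 5 | 4 | 4 |
| Total | 41 | 45 | 42 |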
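The totals are unweighted sums of the ten criterion scores. This short TypeScript snippet encodes the rows of the table and recomputes each column as a consistency check.

```typescript
// Scores copied from the table: [Superpowers, Agent Skills, Matt Pocock's Skills].
type Scores = [number, number, number];

const processScores: Record<string, Scores> = {
  "Requirement alignment": [4, 5, 4],
  "Human-in-the-loop": [4, 4, 5],
  "Autonomy and delegation": [5, 4, 3],
  "PRD quality": [3, 5, 5],
  "SPEC quality": [4, 5, 4],
  "Implementation shape": [4, 5, 4],
  Testability: [5, 4, 4],
  "Scope control": [4, 4, 5],
  Accessibility: [3, 5, 4],
  "Verification discipline": [5, 4, 4],
};

// Sums each column; every criterion carries equal weight.
function columnTotals(rows: Record<string, Scores>): Scores {
  const totals: Scores = [0, 0, 0];
  for (const [a, b, c] of Object.values(rows)) {
    totals[0] += a;
    totals[1] += b;
    totals[2] += c;
  }
  return totals;
}

console.log(columnTotals(processScores)); // [ 41, 45, 42 ]
```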

Superpowers produced the strongest execution loop. The output naturally separated pure logic from React and started from the idea of tests before the component.

Agent Skills produced the most complete package. The PRD, SPEC, accessibility checklist, and implementation were closest to a product delivery.

Matt Pocock's Skills produced the clearest domain language. The output was smaller, better named, and more concerned with the right question before implementation.

What if the analysis focuses only on React code quality?

The reading changes a bit. The benchmark above measures process, PRD, SPEC, validation, and implementation shape. A stronger review should use React, React Patterns, Next.js, and Vercel skills as judges after implementation, not as helpers during implementation.

For this exercise, the applicable review lens is React: state, types, accessibility, persistence, logic separation, and maintainability. Because the app was plain React in a Vite-style setup, the Next.js and Vercel lenses should not judge Server Components, routes, caching, images, or deployment. They are still useful because they show that this benchmark does not measure production readiness on a Next/Vercel stack.

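| React code criterion | Superpowers | Agent Skills | Matt Pocock's Skills |
|---|---|---|---|
| State model | 4 | 4 | 5 |
| Type safety and data validation | 4 | 4 | 5 |
| Interface accessibility | 4 | 5 | 4 |
| localStorage resilience | 3 | 3 | 5 |
| Responsive visual polish | 4 | 5 | 4 |
| Maintainability | 4 | 4 | 5 |
| React total | 23 | 25 | 28 |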

Superpowers was good at pure logic and simplicity. It validated malformed JSON when reading from localStorage, separated helpers like getSummary and isTodo, and kept the UI understandable. Its weak point is that localStorage writes are not caught, and the state grows inside the component.

Agent Skills was stronger in accessibility and visual delivery. It added a form error with role="alert", aria-live for the count, explicit labels, CSS variables, and a more organized responsive layout. The weak point is similar: persistence writes to localStorage without catching failures.

Matt Pocock's Skills was strongest as React code. The useReducer creates a clearer domain model, the actions make the flow predictable, filters are validated before entering state, and persistence reports invalid saved data or write failure. It is not the most visually polished output, but it is the best starting point for evolving the app.
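The `useReducer` style credited to Matt Pocock's Skills can be sketched as a pure reducer over a discriminated union of actions. This illustrates the pattern and is not the benchmark's actual code. The action names, the state shape, and the `parseStatusFilter` guard are assumptions. Because the reducer is pure, it can be tested without rendering anything.

```typescript
type Priority = "low" | "medium" | "high";
type StatusFilter = "all" | "active" | "completed";

interface Todo { id: string; title: string; priority: Priority; dueDate?: string; completed: boolean }
interface BoardState { todos: Todo[]; status: StatusFilter }

// Each user intent is one explicit action, which keeps the flow predictable.
type Action =
  | { type: "added"; todo: Todo }
  | { type: "toggled"; id: string }
  | { type: "deleted"; id: string }
  | { type: "statusFilterChanged"; status: StatusFilter }
  | { type: "completedCleared" };

const STATUS_FILTERS: readonly string[] = ["all", "active", "completed"];

// Validate untrusted input (e.g. a <select> value) before it can enter state.
function parseStatusFilter(value: string): StatusFilter | null {
  return STATUS_FILTERS.includes(value) ? (value as StatusFilter) : null;
}

function boardReducer(state: BoardState, action: Action): BoardState {
  switch (action.type) {
    case "added":
      return { ...state, todos: [...state.todos, action.todo] };
    case "toggled":
      return {
        ...state,
        todos: state.todos.map((t) => (t.id === action.id ? { ...t, completed: !t.completed } : t)),
      };
    case "deleted":
      return { ...state, todos: state.todos.filter((t) => t.id !== action.id) };
    case "statusFilterChanged":
      return { ...state, status: action.status };
    case "completedCleared":
      return { ...state, todos: state.todos.filter((t) => !t.completed) };
    default:
      return state;
  }
}
```

In a component, this would be wired as `const [state, dispatch] = useReducer(boardReducer, initialState)`. A change handler would then dispatch `statusFilterChanged` only when `parseStatusFilter` returns a non-null value.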

So the honest conclusion is: Agent Skills won the delivery package, but Matt Pocock's Skills won internal React code quality. Superpowers was strongest as an execution and testability loop, but not as the final UI.

When should you choose Superpowers?

Choose Superpowers when you want the agent to act like an engineer who first understands the problem, then designs a solution, then implements. The value appears when the work is long, uncertain, or risky.

Good cases:

  • Planning a feature with many product and architecture decisions.
  • Fixing a bug with no obvious cause.
  • Running a migration with regression risk.
  • Using subagents for independent tasks.
  • Working in isolated worktrees with review after each step.

The signal is this: you do not only want working code. You want a trail of reasoning, planning, execution, review, and evidence.

When should you choose Agent Skills?

Choose Agent Skills when you want lifecycle coverage. It is strong when the feature already has a clear goal and needs to pass through several quality lenses.

Good cases:

  • Building or reviewing UI.
  • Creating an API or module contract.
  • Running security, performance, or accessibility review.
  • Preparing a delivery with a release checklist.
  • Standardizing a product workflow from start to finish.

The signal is this: you want the agent to remember everything a senior team would check before production.

When should you choose Matt Pocock's Skills?

Choose Matt Pocock's Skills when you want a smaller, more adaptable kit that sits closer to daily engineering work. It is less "agent operating system" and more "sharp toolbox".

Good cases:

  • Use /grill-me to avoid misunderstood requirements.
  • Use /grill-with-docs to align requirements and update project context.
  • Use /tdd to force real feedback before implementation grows.
  • Use /diagnosing-bugs to investigate instead of guessing.
  • Use /improve-codebase-architecture when the code starts turning into mud.

The signal is this: you want strong human control, focused commands, and practical discipline without installing a process that is too large.

How would I test all three in the same repository?

I would test all three with the same task, in three clean Conductor workspaces, starting from origin/main. The prompt must be identical. Answers to clarifying questions must also be identical, or the test stops being fair.

I would measure:

  • Time to first code change.
  • Total time until the agent declares completion.
  • Diff size.
  • Files touched.
  • Validation commands executed.
  • Real evidence from lint, tests, or build.
  • Visual quality, if the task touches UI.
  • i18n quality, if the task is bilingual.
  • Whether the agent followed repository rules.
  • Where it overreached, skipped a step, or invented validation.
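Several of these metrics can be derived from a timestamped log of each run. The TypeScript sketch below assumes a hypothetical event log that you record yourself; neither Conductor nor the agents are claimed to emit this format. From that log it computes time to first code change, total time, files touched, a rough diff size, and the validation commands that were executed. The regex that decides what counts as validation is also an assumption.

```typescript
// Hypothetical per-run event log; not a Conductor or agent API.
type RunEvent =
  | { kind: "start"; at: number } // epoch milliseconds
  | { kind: "file-edit"; at: number; file: string; linesChanged: number }
  | { kind: "command"; at: number; command: string; exitCode: number }
  | { kind: "done"; at: number };

interface RunMetrics {
  minutesToFirstCodeChange: number | null;
  totalMinutes: number | null;
  filesTouched: number;
  diffLines: number;
  validationCommands: string[];
}

const VALIDATION_PATTERN = /\b(lint|test|build|tsc)\b/; // assumed heuristic

function summarizeRun(events: RunEvent[]): RunMetrics {
  const start = events.find((e) => e.kind === "start")?.at;
  const done = events.find((e) => e.kind === "done")?.at;
  const edits = events.filter((e): e is Extract<RunEvent, { kind: "file-edit" }> => e.kind === "file-edit");
  const commands = events.filter((e): e is Extract<RunEvent, { kind: "command" }> => e.kind === "command");
  const toMinutes = (ms: number) => ms / 60_000;
  const firstEdit = edits.length > 0 ? Math.min(...edits.map((e) => e.at)) : undefined;
  return {
    minutesToFirstCodeChange:
      start !== undefined && firstEdit !== undefined ? toMinutes(firstEdit - start) : null,
    totalMinutes: start !== undefined && done !== undefined ? toMinutes(done - start) : null,
    filesTouched: new Set(edits.map((e) => e.file)).size,
    // Sums per-edit counts, so it overstates the diff when the same lines change twice.
    diffLines: edits.reduce((sum, e) => sum + e.linesChanged, 0),
    validationCommands: commands.filter((c) => VALIDATION_PATTERN.test(c.command)).map((c) => c.command),
  };
}
```

For the final diff size, `git diff --stat origin/main` in each workspace is a more reliable number than summed edit counts.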

For this site, a good task is implementing a bilingual /lab/agent-skills page comparing the three projects. It forces routing, metadata, sitemap, i18n, layout, accessibility, and validation, without depending on an external API or secret.

What is my pick?

My pick is Agent Skills when I want speed with broad coverage, Superpowers when I want autonomy with more reasoning before execution, and Matt Pocock's Skills when I want smaller commands to guide my own workflow.

If the work is clear, Agent Skills tends to be more complete. If the work still needs discovery, Superpowers tends to protect the process better. If I already know what I want and only need discipline at critical points, Matt Pocock's Skills tends to be the lightest path.


TL;DR

Superpowers vs Agent Skills vs Matt Pocock's Skills is a choice about work shape. Superpowers favors autonomy, worktrees, subagents, strict TDD, and continuous review. Agent Skills favors full lifecycle coverage, specialized skills, broad validation, and phase checkpoints. Matt Pocock's Skills favors a smaller, direct, adaptable kit for requirement alignment, TDD, bug diagnosis, and architecture improvement.

Do not choose by hype. Choose by task risk. For uncertain, long, architectural work, start with Superpowers. For clear feature work, broad validation, and product delivery, start with Agent Skills. For human control with focused commands, start with Matt Pocock's Skills.

Transparency Note

This article was produced with AI. The comparison repository and benchmarks were run by agents using only the skills mentioned in this article, without extra React, React Patterns, Next.js, or Vercel skills acting as judges during the original run. The author's review focused on text coherence, clarity, cited evidence, and general consistency.

There is still work to do: review the produced results more carefully and study the authors' skills in more depth.

Written by AI, reviewed by Thiago Marinho

June 30, 2026 · Brazil