Shareuhack | 5 Claude Code Skills That Actually Work: Lessons from Running an AI Agent Fleet

Published April 30, 2026 · Updated May 7, 2026
Written by Luna · Researched by Mia · Reviewed by Eno · Continuously Updated · 9 min read

You tell Claude "write tests first, then implement." It replies "Got it, writing tests first." You come back and find the implementation finished, with a few happy-path tests tacked on at the end. The problem isn't that Claude doesn't understand you. The problem is your workflow has no phase gate.

mattpocock/skills took off after its open-source release in late April 2026 and has since passed 50K stars (MIT license). The reason isn't better prompts. It does something fundamentally different: it gives the AI structural production rules. If the Red test hasn't failed, the Green implementation can't start. If you're a Claude Code subscriber or are building with Claude Code, this guide shares what we learned running these skills in our agent fleet: the 5 picks that made the biggest difference, plus a workflow chain you can copy directly.

TL;DR

  • Skills aren't better prompts — they're workflow modules with phase gates
  • Our 5 picks: tdd, to-prd, to-issues, grill-me, caveman
  • Install: npx skills@latest add mattpocock/skills — done in 5 minutes
  • Best combo: grill-me → to-prd → to-issues → tdd (full dev pipeline)
  • Skills alone trigger ~20% of the time; with hooks, Scott Spence measured 84% across 200+ prompts — your mileage may vary

Why Claude Ignores "Write Tests First" (And What Actually Fixes It)

Nearly everyone using Claude Code for development has hit this: you write "use TDD, write tests first" in your prompt. Claude acknowledges. Then it writes the implementation and backfills tests.

The root cause isn't comprehension failure — it's that prompt-level instructions are fundamentally suggestions. When processing complex tasks, Claude acts on what it calculates as the most efficient path. For a language model, writing implementation first and deriving tests afterward is the more "natural" sequence. Your prompt is a nudge, not a gate.

The TDD skill fixes this by defining a structural phase gate: the Red phase must produce a failing test, and the test must actually fail, before the Green phase (implementation) is allowed to start. This is the essential difference between a prompt nudge and structural enforcement.
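
To make the gate idea concrete, here is a minimal sketch in Python. It is not how the tdd skill is implemented (the skill is a set of instructions, not a program); the class and method names are hypothetical, and the only assumption is that a non-zero exit code from your test runner means at least one test failed.

```python
from enum import Enum


class Phase(Enum):
    RED = "red"
    GREEN = "green"


class TddGate:
    """Tracks the TDD phase and refuses to open Green until Red is proven."""

    def __init__(self) -> None:
        self.phase = Phase.RED
        self.saw_failing_run = False

    def record_test_run(self, exit_code: int) -> None:
        # Assumption: a non-zero exit code means at least one test failed.
        if self.phase is Phase.RED and exit_code != 0:
            self.saw_failing_run = True

    def start_green(self) -> None:
        # The gate itself: no observed failure, no implementation phase.
        if not self.saw_failing_run:
            raise RuntimeError("Red phase incomplete: no failing test run recorded")
        self.phase = Phase.GREEN
```

A prompt nudge corresponds to calling `start_green()` and hoping the tests were run first. The gate makes that order checkable: a passing or missing test run leaves `saw_failing_run` false, and the transition raises.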

Where Skills Fit: The 4-Layer Architecture

Before picking skills, understand Claude Code's 4-layer system — putting things in the wrong layer is where most people go wrong.

| Layer | Mechanism | Execution Guarantee | Best For |
| --- | --- | --- | --- |
| CLAUDE.md | Loaded every session | Probabilistic | Persistent project rules, keep under 200 lines |
| Skills (SKILL.md) | Lazy-loaded (description always present; body only on invoke) | Probabilistic | Reusable workflow modules, playbooks |
| Subagents | Isolated context workers | Deterministic scope isolation | Parallel or context-isolated tasks |
| Hooks | Shell scripts on lifecycle events | Fully deterministic | Zero-exception enforcement: format checks, lint, tests |

Key insight: once CLAUDE.md grows past roughly 200 lines, rules buried in the noise start to be skipped without any warning. Marmelab's engineering team verified this in production, and we hit the same issue: certain rules were silently skipped, and it took a while to trace the cause.
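
A cheap way to notice the problem early is to check the file's length in CI or a pre-commit step. The sketch below is illustrative: the 200-line budget is the soft threshold discussed in this article, not a limit enforced by Claude Code, and the function names are our own.

```python
LINE_BUDGET = 200  # soft threshold from this article, not a Claude Code constant


def line_count(text: str) -> int:
    """Number of lines in the file contents, blank lines included."""
    return len(text.splitlines())


def over_budget(text: str, budget: int = LINE_BUDGET) -> bool:
    """True when the contents exceed the line budget."""
    return line_count(text) > budget
```

In practice you would pass in the file contents, for example `over_budget(Path("CLAUDE.md").read_text(encoding="utf-8"))`, and fail the check or print a warning when it returns true.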

Skills' lazy-load design solves this. Only the description (max 1,536 chars) stays in persistent context. The full SKILL.md body loads only when you invoke /skill-name. This lets you move complex workflows out of CLAUDE.md into skills, keeping CLAUDE.md lean.
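
Since the description is the only part that is always loaded, it is worth checking that it exists and fits the limit. This is a rough sketch: the frontmatter parser is deliberately simplistic (single-line `key: value` pairs only), the example skill file is invented, and the 1,536-character limit is the figure quoted above.

```python
MAX_DESCRIPTION_CHARS = 1536  # limit quoted in this article


def read_frontmatter(skill_md: str) -> dict[str, str]:
    """Parse single-line `key: value` pairs between the leading `---` markers.

    Deliberately simplistic: no nested keys, no multi-line values.
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def description_fits(skill_md: str) -> bool:
    """True when a description exists and stays within the character limit."""
    description = read_frontmatter(skill_md).get("description", "")
    return 0 < len(description) <= MAX_DESCRIPTION_CHARS


# Hypothetical skill file, invented for illustration.
EXAMPLE_SKILL = """---
name: example-skill
description: Use when the user asks for a phase-gated workflow.
context: fork
---
Full instructions go here and load only on invoke.
"""
```

For real skill files with multi-line or quoted values, a proper YAML parser would be the safer choice.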

If you want to dive deeper into CLAUDE.md's three-tier priority system and .claude/rules/ path scoping, see our Claude Code Setup Guide. This article focuses on which community skills are worth installing.

Why mattpocock/skills Hit 50K+ Stars

Matt Pocock is a well-known TypeScript educator and creator of Total TypeScript, with high trust in the TS community. But mattpocock/skills (50K+ stars, MIT license, released late April 2026) didn't go viral on name recognition alone — it landed at the exact moment developers realized prompt engineering isn't enough. They need workflow engineering.

More importantly, Skills aren't exclusive to Claude Code. Agent Skills (agentskills.io) is an open standard designed for cross-IDE compatibility: Claude Code, Cursor, Gemini CLI. The skills you install aren't IDE-locked plugins — they're cross-platform workflow protocols.

The ecosystem is growing fast:

  • hesreallyhim/awesome-claude-code: The most complete community directory covering skills, hooks, orchestrators, plugins
  • ComposioHQ/awesome-claude-skills: Role-based bundles (e.g., "Web Wizard" = 5-skill combo)
  • alirezarezvani/claude-skills: 232+ skills spanning engineering, marketing, compliance, C-level advisory — engineers are just early adopters

This isn't one repo going viral. It's the ecosystem migrating from "everyone writes their own prompts" to "shared standardized workflows."

Our 5 Picks: The Skills Our Agent Fleet Actually Uses

From mattpocock/skills' 14 skills plus the broader ecosystem, here are the 5 that produced the clearest quality improvement in our agent fleet:

| Skill | Command | Core Behavior | Best For |
| --- | --- | --- | --- |
| tdd | /tdd | Phase-gated TDD: Red must fail → Green allowed → forced minimal implementation | Any feature that needs test coverage |
| to-prd | /to-prd | Synthesizes conversation into structured PRD, auto-submits as GitHub Issue | Turning vague ideas into clear specs |
| to-issues | /to-issues | PRD → vertical slice Issues, marked HITL/AFK, dependency-sorted | Breaking large features into assignable tasks |
| grill-me | /grill-me | Exhaustive decision-tree questioning until every branch has a clear answer | Clarifying fuzzy ideas before writing code |
| caveman | /caveman | Strips verbose output, saves ~65-75% output tokens while maintaining full technical accuracy | Long sessions to save tokens; best for mechanical tasks, use caution for complex reasoning |

Our agent fleet runs an almost identical flow: CEO creates strategy issue → Mia breaks into collect/synthesize → Luna claims and executes → board-complete auto-creates the next task. mattpocock's to-prd → to-issues → tdd chain is essentially the same architecture — the difference is we implement it with GitHub Issues + automation scripts, while mattpocock packages it into one-click skill modules.

TDD Skill Deep Dive: What Phase Gate Actually Means

The TDD skill is the single highest-impact skill in mattpocock/skills. Its core mechanism:

1. Red Phase (write failing tests): The skill instructs Claude to write tests that must run and fail. This failure isn't a bug — it's by design. Before implementation exists, tests should fail.

2. Green Phase (minimal implementation): Only after Red tests confirm failure does the implementation phase begin. The skill enforces "write only the minimal code to make tests pass" — nothing more.

3. Subagent isolation: The TDD skill uses context: fork, running the test-writing agent and implementation agent in separate contexts. This prevents a common problem: when the same context knows both "what tests expect" and "how to implement," Claude tends to skip Red and write passing code directly.

The difference from "just tell Claude to write tests first": a prompt is a suggestion that Claude can choose to ignore; a phase gate is structure, so Green cannot start until the Red tests have run and failed.

Scott Spence tested over 200 prompts, pushing trigger rates from ~20% (skills alone) to 84% (with hooks that auto-inject TDD phase assessment before each prompt). alexop.dev validated similar results in a Vue project. Your results may differ based on language, framework, and task complexity, but the trend is clear: skills alone aren't stable enough — they need hooks as backup.
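
A prompt-time hook of this kind can be very small. The sketch below is not the hook used in the tests cited above, and the reminder wording is invented. It assumes that Claude Code's UserPromptSubmit hook passes a JSON payload with a `prompt` field on stdin and that text printed to stdout on a zero exit code is added to the context; check the current hooks documentation before relying on either.

```python
import json

# Hypothetical reminder text; not the wording used in the cited experiments.
REMINDER = (
    "Before acting, state which TDD phase applies (Red or Green). "
    "If no failing test exists yet, use the tdd skill and start with Red."
)


def build_context(raw_event: str) -> str:
    """Return text to inject for a prompt event, or an empty string to stay silent."""
    try:
        event = json.loads(raw_event)
    except json.JSONDecodeError:
        return ""  # never break the session over a malformed payload
    if not isinstance(event, dict):
        return ""
    prompt = str(event.get("prompt", ""))
    if not prompt.strip():
        return ""
    return REMINDER


# As a hook script, the file would end with:
#   import sys
#   print(build_context(sys.stdin.read()), end="")
```

The skill still carries the workflow; the hook only makes sure the model is reminded of it on every prompt instead of depending on the skill description being picked up.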

The Workflow Chain (Manual Sequence): grill-me → to-prd → to-issues → tdd

A single skill has value, but the real power of skills is the workflow chain — manually sequencing multiple skills into a complete development pipeline. Note: these skills don't auto-chain; you trigger each step manually. A full run takes roughly 45-90 minutes depending on requirement complexity:

Step 1: /grill-me (clarify requirements)
  • Input: A vague idea ("I want to build a dashboard")
  • Output: Decision tree exhausted; every branch has a clear answer

Step 2: /to-prd (structured spec)
  • Input: The grill-me conversation output
  • Output: Structured PRD, auto-submitted as GitHub Issue

Step 3: /to-issues (vertical slices)
  • Input: PRD Issue
  • Output: Multiple vertical-slice Issues, marked HITL (needs human) or AFK (can auto-execute), dependency-sorted

Step 4: /tdd (execute each Issue)
  • Input: Single Issue
  • Output: Code + tests that passed phase-gated TDD
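
The "dependency-sorted" output of Step 3 is what decides the order in which Step 4 runs. As a rough illustration (the issue names, labels and data shape are invented, and this is not the to-issues skill's own logic), a topological sort from Python's standard library yields an order in which every issue comes after the issues it depends on:

```python
from graphlib import TopologicalSorter

# Hypothetical issues: id -> (label, ids it depends on). Invented for illustration.
ISSUES = {
    "schema": ("AFK", []),
    "api": ("AFK", ["schema"]),
    "ui": ("HITL", ["api"]),
    "docs": ("AFK", ["api"]),
}


def execution_order(issues: dict[str, tuple[str, list[str]]]) -> list[str]:
    """Return issue ids so that every issue comes after its dependencies."""
    graph = {issue_id: set(deps) for issue_id, (_, deps) in issues.items()}
    return list(TopologicalSorter(graph).static_order())


def unattended(issues: dict[str, tuple[str, list[str]]], order: list[str]) -> list[str]:
    """Issues labelled AFK, kept in execution order."""
    return [issue_id for issue_id in order if issues[issue_id][0] == "AFK"]
```

Here `schema` always precedes `api`, which precedes both `ui` and `docs`; the relative order of `ui` and `docs` is not fixed because neither depends on the other. The AFK subset is what you could hand to /tdd without waiting for a human.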

This chain's logic mirrors our fleet's daily operations: strategy issue → task breakdown → isolated execution → auto-complete. The difference is mattpocock packages each node as a standardized skill anyone can npx install and use immediately.

After first install, run /setup-matt-pocock-skills to configure per-repo settings (issue tracker location, triage labels, docs path).

Skills + Hooks: From Probabilistic to Deterministic Execution

This is the most counterintuitive part: Skills are probabilistic.

No matter how complete your SKILL.md is, Claude can still skip skill instructions when focused on complex tasks. This isn't a bug — it's the nature of language models. They trade off between multiple objectives, and sometimes "complete the task" outweighs "follow the process."

Hooks are fully deterministic. They're shell scripts bound to Claude Code lifecycle events (like PreToolUse, PostToolUse) that execute unconditionally every time they trigger.
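
Hooks are registered in Claude Code's settings file. The sketch below builds such a registration as a Python dict and renders it as JSON. The script path and the lint command are hypothetical; the event names and the matcher/hooks/type/command layout reflect our understanding of the settings format, so compare against the current hooks documentation before copying.

```python
import json

# Script path and lint command are hypothetical placeholders.
HOOK_SETTINGS = {
    "hooks": {
        "PreToolUse": [
            {
                "matcher": "Bash",
                "hooks": [
                    {"type": "command", "command": "python3 .claude/hooks/git_guard.py"}
                ],
            }
        ],
        "PostToolUse": [
            {
                "matcher": "Edit|Write",
                "hooks": [
                    {"type": "command", "command": "npm run lint"}
                ],
            }
        ],
    }
}


def render_settings(settings: dict) -> str:
    """Serialize the hook configuration as it would appear in .claude/settings.json."""
    return json.dumps(settings, indent=2)
```

The first entry runs a guard script before every Bash tool call; the second runs a linter after every file edit or write. Neither depends on the model deciding to do so.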

The combination strategy:

  • Skills define "what to do": TDD's Red/Green phase gate, PRD's output structure
  • Hooks ensure "it will be done": Check TDD phase before each prompt, run lint after each code write

mattpocock/skills' git-guardrails-claude-code is a great example — it uses hooks to intercept dangerous git operations (force push, reset --hard). Not "suggesting" Claude shouldn't do it, but blocking at the shell level. The setup-pre-commit skill configures Husky hooks, making linting and tests mandatory before every commit.
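
To show what shell-level blocking looks like, here is a minimal guard written from scratch. It is not the code of git-guardrails-claude-code. It assumes that a PreToolUse hook receives a JSON payload on stdin with `tool_name` and `tool_input.command`, and that exit code 2 blocks the tool call and returns stderr to Claude. The patterns are illustrative and intentionally conservative (they also catch `--force-with-lease`).

```python
import json
import re
from typing import Optional

# Illustrative patterns for the two operations named above.
BLOCKED = [
    (re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)"), "force push"),
    (re.compile(r"\bgit\s+reset\b.*--hard\b"), "reset --hard"),
]


def violation(command: str) -> Optional[str]:
    """Return the label of the first blocked operation found, or None."""
    for pattern, label in BLOCKED:
        if pattern.search(command):
            return label
    return None


def decide(raw_event: str) -> tuple[int, str]:
    """Map a PreToolUse payload to (exit_code, message); exit code 2 blocks the call."""
    try:
        event = json.loads(raw_event)
    except json.JSONDecodeError:
        return 0, ""
    if not isinstance(event, dict) or event.get("tool_name") != "Bash":
        return 0, ""
    tool_input = event.get("tool_input")
    if not isinstance(tool_input, dict):
        return 0, ""
    label = violation(str(tool_input.get("command", "")))
    if label:
        return 2, f"Blocked by git guard: {label} is not allowed"
    return 0, ""


# As a hook script, the file would end with:
#   import sys
#   code, message = decide(sys.stdin.read())
#   print(message, file=sys.stderr)
#   sys.exit(code)
```

The decision happens outside the model: whatever Claude intended, a matching command never reaches the shell.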

Installation & Quick Start

# Install all mattpocock/skills
npx skills@latest add mattpocock/skills

# Or install a single skill
npx skills@latest add mattpocock/skills/tdd

After installation, skills live in .claude/skills/. In your Claude Code session:

  1. Verify installation: Type / in Claude Code and confirm the skill list shows /tdd, /grill-me, etc. If they don't appear, check that .claude/skills/ contains the corresponding SKILL.md files
  2. Run /setup-matt-pocock-skills: Configure issue tracker, triage labels, docs path
  3. Start with /grill-me: No code required, pure conversation — immediately feel the difference from a regular prompt
  4. Global vs project scope: Place in ~/.claude/skills/ for global (all projects) or .claude/skills/ for project-level (commit to repo, share with team)
  5. What context: fork means: Setting this in SKILL.md frontmatter makes the skill execute in an isolated subagent, fully separated from the main session context
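
If the skill list looks wrong, the check from step 1 can be scripted. This sketch assumes one directory per skill with a SKILL.md inside, under the project-level and global locations from step 4; the function names are our own.

```python
from pathlib import Path


def installed_skills(skills_root: Path) -> list[str]:
    """Names of subdirectories under `skills_root` that contain a SKILL.md file."""
    if not skills_root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in skills_root.iterdir()
        if entry.is_dir() and (entry / "SKILL.md").is_file()
    )


def skills_by_scope(project_dir: Path, home_dir: Path) -> dict[str, list[str]]:
    """Skills found at project level and at global (home directory) level."""
    return {
        "project": installed_skills(project_dir / ".claude" / "skills"),
        "global": installed_skills(home_dir / ".claude" / "skills"),
    }
```

Calling `skills_by_scope(Path.cwd(), Path.home())` from the repository root shows at a glance which scope a skill was installed into, and whether a directory is missing its SKILL.md.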

Community resources: if mattpocock/skills isn't enough, hesreallyhim/awesome-claude-code is the most complete directory, ComposioHQ/awesome-claude-skills has role-based bundles, and alirezarezvani/claude-skills catalogues 232+ skills.

Risk Disclosure: Honest Limitations

From our agent fleet experience, here's what you should know before installing:

Skills are still probabilistic. Installation does not equal guaranteed execution. During complex tasks, Claude may skip skill instructions. Don't expect "install and forget" — reliable execution requires the skills + hooks dual layer.

/caveman's boundaries. Caveman strips verbose output and is designed to maintain full technical accuracy. It works excellently for mechanical coding tasks. But for tasks requiring deep chain-of-thought reasoning (complex math or logic), excessive compression may affect reasoning quality — per a March 2026 arXiv paper, conciseness constraints improved accuracy by 26 percentage points on some benchmarks but showed potential downsides on complex reasoning tasks.

/grill-with-docs time cost. The full interview flow takes 15-20 minutes. For small features or hotfixes, just start coding — running the full decision-tree is overkill.

forrestchang/andrej-karpathy-skills complements mattpocock/skills. karpathy-skills defines "what not to do" guardrails (defense); mattpocock/skills defines "how to do things structurally" workflows (offense). They don't conflict — stack them.

Trigger rate data scope. The 20% → 84% trigger rate cited here comes from Scott Spence's testing across 200+ prompts, with alexop.dev validating similar results in a Vue project. The sample is more reliable than a single test, but results may differ across languages, frameworks, and task complexity levels.

Conclusion: From "Smart but Chaotic" to "Engineering Discipline"

Skills don't solve Claude's capability problem — they solve its behavioral discipline problem. An AI that can do anything, without phase gates or structured processes, is like a brilliant engineer who never runs tests — fast output, unpredictable quality.

The recommended starting path: run npx skills@latest add mattpocock/skills, start with /grill-me to feel the difference, then try the full grill-me → to-prd → to-issues → tdd chain after a week. Your AI workflow will evolve from "reminding it every time" to "executing on process automatically."

FAQ

Are Skills the same as Claude Code slash commands (.claude/commands/)?

Not quite. .claude/commands/ are project-level custom slash commands (static prompt templates). Skills are full workflow modules with SKILL.md structure, frontmatter definitions, optional context: fork (subagent isolation), and cross-IDE compatibility via the open standard (agentskills.io). Both can be invoked with /, but skills are designed as shareable, composable, cross-platform production workflows.

Agent Skills is an open standard — can I use them in Cursor or Gemini CLI?

Yes. Agent Skills (agentskills.io) is an Anthropic-led open standard designed for cross-IDE compatibility including Claude Code, Cursor, and Gemini CLI. mattpocock/skills follows this standard, so they theoretically work in any IDE that supports skills. Actual compatibility varies by IDE version — check your IDE's skills documentation.
