Your AI Agent Should Review Your Pull Requests

I review pull requests every day. Most of the time, it goes like this: I skim the diff, check for obvious bugs, leave a couple of comments about naming or structure, and approve. The whole thing takes 15 minutes if I'm focused, 40 if Slack keeps pinging me.
Here's the thing nobody wants to admit: I miss stuff. Everyone does. After the third PR of the day, your eyes glaze over. You stop reading the migration file closely. You trust that the tests pass because they passed last time. And then, two weeks later, something breaks in production and you trace it back to a change that went through review without a single comment.
AI agents can do something about this. Not the "AI linting" tools that tell you a variable name is too short. I mean actual agents that read the full context of a PR, understand what it's trying to do, and flag the things that matter.
What makes an agent different from a linting bot
Every team has ESLint. Maybe Prettier. Maybe SonarQube running in CI. These tools are fine. They catch formatting issues, unused imports, basic security patterns. They don't understand your codebase.
An AI agent reviewing your PR can:
- Read the linked issue or ticket to understand what problem the code solves
- Look at previous PRs that touched the same files
- Notice when a migration doesn't have a rollback
- Catch logic bugs that compile perfectly but do the wrong thing
- Flag when an API endpoint suddenly accepts new parameters without validation
That last one bit my team a few months ago. Someone added an optional field to a request body. No validation, no sanitization, nothing in the docs. The linter was happy. The types compiled. It shipped. Two weeks later, someone passed HTML in that field and we had a stored XSS.
An agent reading that PR would have caught it. Not because it's smarter than a human, but because it doesn't get bored at 4pm on a Friday.
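To make that concrete, here's a hedged reconstruction of the pattern in Python. The framework, route, and field names are hypothetical, not our actual code; the shape is what matters: a new optional field that gets stored verbatim and rendered later, with no validation or escaping in between.

```python
# Hypothetical reconstruction of the pattern, not our actual code.
from flask import Flask, request, jsonify

app = Flask(__name__)
profiles = {}  # stand-in for the real datastore

@app.route("/api/profile/<user_id>", methods=["POST"])
def update_profile(user_id):
    payload = request.get_json()
    # The risky change: a new optional field stored verbatim.
    # Nothing rejects or escapes HTML, so whatever lands here gets
    # rendered later, which is the stored XSS.
    profiles[user_id] = {"display_name": payload.get("display_name", "")}
    # The fix the review should have asked for: cap the length and
    # escape HTML (e.g. markupsafe.escape) before storing.
    return jsonify(profiles[user_id])
```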
The setup that actually works
There are two approaches here, and one of them is bad.
The bad approach: Install a SaaS tool that comments on every single PR with a wall of AI-generated suggestions. Your team ignores them all within a week. I've watched this happen at three different companies.
The approach that works: Give an AI agent access to your repo, let it run as part of your workflow, and configure it to only speak up when something actually matters.
Here's what this looks like with an OpenClaw agent on UniClaw (a code sketch of the loop follows the list):
- Your agent watches GitHub webhooks for new PRs
- When a PR opens, the agent clones the branch and reads the diff
- It pulls context: the linked issue, recent commits to the same files, your team's coding conventions (from a CONVENTIONS.md or similar)
- It reviews the changes against that context
- It posts comments only when it finds something worth flagging
- It stays quiet when everything looks fine
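Here's a minimal sketch of that loop, assuming the agent shells out to the GitHub CLI (gh) for the diff, the PR description, and the comment instead of cloning. The review_with_model function is a placeholder for however your agent talks to its model; none of this is a UniClaw-specific API.

```python
import pathlib
import subprocess

def sh(*args: str) -> str:
    """Run a command and return its stdout."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout

def review_with_model(diff: str, context: str) -> str:
    """Placeholder: send the diff and context to whatever model your agent
    uses and return review findings, or an empty string if nothing matters."""
    raise NotImplementedError

def review_pr(pr_number: int) -> None:
    # The diff and the PR description (which links the issue) come from gh.
    diff = sh("gh", "pr", "diff", str(pr_number))
    description = sh("gh", "pr", "view", str(pr_number), "--json", "title,body")

    # Team conventions, if the repo keeps them in a file.
    conventions_path = pathlib.Path("CONVENTIONS.md")
    conventions = conventions_path.read_text() if conventions_path.exists() else ""

    findings = review_with_model(diff, description + "\n" + conventions)

    # Comment only when there is something worth flagging; otherwise stay quiet.
    if findings:
        sh("gh", "pr", "comment", str(pr_number), "--body", findings)
```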
The "stays quiet" part is what separates a useful agent from an annoying one. If your tool comments on every PR with "looks good!" or low-value style nitpicks, people will tune it out. The agent should be like that senior engineer who only speaks up when they spot a real issue.
What to look for (and what to ignore)
After running this for a while, I've found the agent is most useful at catching these categories:
Security gaps — missing input validation, SQL queries built with string concatenation, secrets accidentally committed, auth checks that don't cover new endpoints. These are the bugs that humans skip when they're reviewing fast.
Missing error handling — a new API call that doesn't handle the failure case. A database query that assumes it'll always return results. These are boring to review manually, which is exactly why they slip through.
Inconsistencies with existing patterns — your codebase does pagination one way in 47 endpoints, and this PR introduces a 48th that does it differently. A human reviewer might not remember the convention. The agent just checked.
Migration safety — no rollback script, no backward compatibility, column renames that'll break the old code still running during deployment. This one alone has saved my team from actual production incidents.
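To show what the migration case looks like, here's a hedged Alembic-style sketch (table and column names made up). An upgrade that renames a column while the old code is still deployed, paired with an empty downgrade, is exactly the kind of change a tired reviewer approves and an agent flags.

```python
# Hypothetical Alembic migration the agent would flag.
from alembic import op

def upgrade():
    # Renames a column while the old code is still running during deploy;
    # anything still reading "email" breaks until the new release is live.
    op.alter_column("users", "email", new_column_name="primary_email")

def downgrade():
    # Empty rollback: if the deploy goes badly, there is no way back.
    pass
```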
What the agent should NOT do (a sample rule set covering both lists follows below):
- Comment on style choices that are already covered by your formatter
- Suggest renaming variables based on its own preferences
- Add "nice work!" comments (nobody needs that from a bot)
- Block PRs without a human confirming the issue
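One way to encode that split is a small rule set the review skill loads before it reads a diff. The names and structure here are my own convention, not an OpenClaw format:

```python
# Hypothetical review rules the skill reads; names are illustrative.
REVIEW_RULES = {
    "flag": [
        "missing input validation on new or changed endpoints",
        "SQL built by string concatenation",
        "secrets or tokens committed in the diff",
        "new API calls without error handling",
        "migrations without a rollback or backward compatibility",
        "departures from established patterns (pagination, auth checks)",
    ],
    "ignore": [
        "formatting already covered by the formatter",
        "variable naming preferences",
        "praise or 'looks good' comments",
    ],
    "mode": "comment-only",  # never block merges without a human confirming
}
```

Feed the whole thing into the review prompt. The point is that the do-not-flag list is just as explicit as the flag list.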
The trust problem
There's a reasonable objection here: "If I can't trust the AI to be right 100% of the time, what's the point?"
The point is that nothing in your pipeline clears that bar today. You trust your linter. You trust your test suite. Neither is perfect. The agent is one more layer, and it catches a different category of bugs than the other layers.
In my experience, a well-configured agent catches something meaningful on about 15-20% of PRs; the rest of the time it stays silent. That ratio is what you want. If it's flagging every single PR, your thresholds are too sensitive. If it never speaks up, it's either misconfigured or your team writes perfect code (unlikely).
The key configuration decision: should the agent be able to block merges, or only comment? My recommendation is comments only, at least at first. Let your team build trust with it. After a month of useful catches, you can graduate it to requiring dismissal before merge.
Setting this up on UniClaw
Your OpenClaw agent on UniClaw already has shell access, the ability to run scripts, and persistent memory. Adding code review is a skill you configure, not a separate product you buy.
The basic flow:
```
# In your agent's skills directory
~/.openclaw/skills/code-review/SKILL.md
```
The skill tells your agent how to do each of the following; the comment-posting step is sketched after the list:
- Authenticate with GitHub (personal access token stored securely)
- Listen for webhook events (or poll for new PRs on a cron)
- Clone and read diffs
- Apply your review rules
- Post comments via the GitHub API
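That last step is one small API call. Here's a sketch with requests; the owner, repo, and token variable name are placeholders, and PRs share GitHub's issue-comment endpoint:

```python
import os
import requests

def post_pr_comment(owner: str, repo: str, pr_number: int, body: str) -> None:
    """Post a comment on a pull request via GitHub's issue-comment endpoint."""
    token = os.environ["GITHUB_TOKEN"]  # the PAT your agent stores
    resp = requests.post(
        f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/comments",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        json={"body": body},
    )
    resp.raise_for_status()
```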
The whole thing runs on the same $12/month machine that handles your other agent tasks. No per-seat pricing, no separate subscription for "AI code review." Your agent just has another job.
You can also set it up with cron-based polling if webhooks feel like too much infrastructure. Have the agent check for open PRs every 15 minutes, review any new ones it hasn't seen, and move on.
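The polling version can be this small, reusing the review_pr sketch from earlier and keeping a record of PRs it has already looked at (the file path is just my choice). Point a crontab entry at it every 15 minutes and you're done:

```python
import json
import pathlib
import subprocess

SEEN = pathlib.Path.home() / ".openclaw" / "reviewed-prs.json"  # illustrative path

def open_prs() -> list[int]:
    out = subprocess.run(
        ["gh", "pr", "list", "--state", "open", "--json", "number"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [pr["number"] for pr in json.loads(out)]

def poll() -> None:
    seen = set(json.loads(SEEN.read_text())) if SEEN.exists() else set()
    for number in open_prs():
        if number not in seen:
            review_pr(number)  # the review function sketched earlier
            seen.add(number)
    SEEN.parent.mkdir(parents=True, exist_ok=True)
    SEEN.write_text(json.dumps(sorted(seen)))
```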
The part nobody talks about: context
Most AI code review tools treat each PR in isolation. They look at the diff, maybe the full file, and that's it. They don't know that you refactored the auth module last month, or that there's a known performance issue with the approach this PR uses, or that this is the third attempt at solving this ticket and the first two had subtle bugs.
An AI agent with persistent memory knows all of that. If it reviewed the previous attempts, it remembers what went wrong. If it saw the refactoring PR, it knows the new patterns. This is the actual advantage of an agent over a standalone tool: it accumulates context over time.
On UniClaw, your agent's memory files persist between sessions. When it reviews PR #847, it can recall what it learned from PR #612 three months ago. That's not something a stateless API call gives you.
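In practice that memory can be as plain as a notes file the agent appends to after each review and re-reads before the next one. The path and format here are my own convention, not something UniClaw prescribes:

```python
import datetime
import pathlib

NOTES = pathlib.Path.home() / ".openclaw" / "memory" / "review-notes.md"  # illustrative

def remember(pr_number: int, note: str) -> None:
    """Append a dated note about a PR so future reviews can recall it."""
    NOTES.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.date.today().isoformat()
    with NOTES.open("a") as f:
        f.write(f"- {stamp} PR #{pr_number}: {note}\n")

def recall() -> str:
    """Return past notes to include in the next review's context."""
    return NOTES.read_text() if NOTES.exists() else ""
```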
Real numbers
For a team of 5 developers, each opening maybe 3-4 PRs per day, here's what I've observed:
- Agent review time: 2-5 minutes per PR (most of that is cloning and reading)
- Meaningful comments per week: 8-12 (across all PRs)
- False positives: about 1 per week (agent flags something that's actually fine)
- Production bugs caught before merge: 2-3 per month
- Monthly API cost for reviews: around $8-15 (depends on PR size and model choice)
Those 2-3 caught bugs per month are what make this worth it. A single production incident costs more in engineer time, user trust, and stress than a year of agent hosting.
Getting started
If you already run an OpenClaw agent on UniClaw, adding code review is maybe an hour of setup (a minimal webhook receiver is sketched after these steps):
- Create a GitHub personal access token with repo read + PR comment permissions
- Store it in your agent's environment
- Write a skill that defines your review rules (what to flag, what to ignore, how to format comments)
- Set up either a webhook receiver or a cron job
- Let it run for a week in "comment only" mode and see what it catches
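If you go the webhook route, the receiver doesn't need to be much. Here's a Flask sketch that checks GitHub's X-Hub-Signature-256 header and hands newly opened PRs to the review flow; the secret's environment variable name and the port are placeholders:

```python
import hashlib
import hmac
import os

from flask import Flask, abort, request

app = Flask(__name__)
SECRET = os.environ["WEBHOOK_SECRET"].encode()  # the secret you set on the webhook

@app.route("/github-webhook", methods=["POST"])
def github_webhook():
    # Verify the payload really came from GitHub.
    signature = request.headers.get("X-Hub-Signature-256", "")
    expected = "sha256=" + hmac.new(SECRET, request.data, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        abort(401)

    event = request.get_json()
    if request.headers.get("X-GitHub-Event") == "pull_request" and event.get("action") == "opened":
        review_pr(event["pull_request"]["number"])  # the review function sketched earlier
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)
```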
If you don't have an agent yet, UniClaw gets you running in under 5 minutes. One-click deploy, $12/month, and your agent lives on a dedicated machine with the shell access it needs to clone repos and run reviews.
The best time to add automated review was before your last production incident. The second best time is now.
Ready to deploy your own AI agent?
Get Started with UniClaw