How-to Guide

How to build an AI code review system.

Deploy an automated code review agent that checks every pull request for bugs, style violations, security issues, and performance problems, with human-review integration.

  • Decision: Pilot. Choose one repeated workflow with a visible owner and enough weekly volume to prove the saving.
  • Risk to watch: Faster mistakes. Keep a review queue and scoped credentials until the workflow has survived real production runs.
  • Proof to collect: Time baseline. Measure the manual run time, exception rate, approval time, and weekly hours returned.

TL;DR

An AI code review system reads every pull request and looks for bugs, style problems, security holes, and performance issues before a human reviewer opens it. This guide builds the whole thing using [Claude Sonnet 4.6](https://www.cnbc.com/2026/02/17/anthropic-ai-claude-sonnet-4-6-default-free-pro.html) for the analysis, GitHub Actions to run it automatically, and a review screen so a person stays in charge of what actually gets fixed.

Key takeaways

  • Coverage: Bug detection, style, security, performance, documentation
  • Integration: GitHub Actions + PR comments
  • Model: Claude Sonnet 4.6 for code analysis; GPT-5.5 Instant for quick checks
  • Human-in-loop: AI flags issues; humans approve/reject each finding
  • Speed: Target < 30 seconds for PRs under 500 lines

Analysis

Most engineering teams have the same quiet problem with code review. The reviews that matter get rushed, and the ones that don't matter eat an afternoon. A senior developer ends up skimming a 600-line pull request between meetings, missing the off-by-one error, and waving through the part that ships to production.

The idea here is simple: let a model do the first pass. It reads the diff the moment a pull request opens, flags what looks wrong, and posts its notes as comments before any human has spent a minute on it. By the time a reviewer shows up, the obvious stuff is already circled.

The catch, and the reason this isn't just "let the AI approve everything," is that the model only suggests. A person still decides what's a real bug and what's noise. That split matters, so it's baked into the design from the start.
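
One way to make that split concrete is to give every finding an explicit review state that only a person can change. The sketch below is a minimal illustration, not part of the guide's own code: the names `Finding`, `Verdict`, `decide`, and `actionable` are hypothetical, and it assumes a finding is identified by a filename, a line number, and a message.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    PENDING = "pending"    # flagged by the model, not yet looked at
    APPROVED = "approved"  # a human confirmed it is a real issue
    REJECTED = "rejected"  # a human dismissed it as noise


@dataclass
class Finding:
    """One model suggestion plus the human decision about it (hypothetical shape)."""
    filename: str
    line: int
    message: str
    verdict: Verdict = Verdict.PENDING
    reviewer: Optional[str] = None

    def decide(self, reviewer: str, accept: bool) -> None:
        """Record a human decision; the review agent itself never calls this."""
        self.verdict = Verdict.APPROVED if accept else Verdict.REJECTED
        self.reviewer = reviewer


def actionable(findings: list[Finding]) -> list[Finding]:
    """Return only the findings a person has approved."""
    return [f for f in findings if f.verdict is Verdict.APPROVED]
```

Because every finding starts as `PENDING`, anything downstream that filters through `actionable` ignores the model's output until someone has signed off on it.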

What follows is the build itself: pulling the diff out of GitHub, feeding it to a review agent, and wiring the result back into a pull request. The code below uses Claude Sonnet 4.6 for the heavy analysis and, where you want a faster and cheaper second opinion, GPT-5.5 Instant for quick checks.

Prerequisites

  • GitHub repository
  • GitHub Actions enabled
  • Anthropic API key
  • Python 3.10+

Step-by-Step Framework

Step 1: PR Diff Extraction

First job is getting the changes out of GitHub in a shape you can work with. The List pull request files endpoint hands back one object per changed file, with the filename, status, line counts, and the raw patch. The code below grabs that, skips anything that was deleted, and breaks each patch into hunks so you keep track of which line numbers changed.

# code_review/diff_extractor.py
import requests
import re

def fetch_pr_diff(owner: str, repo: str, pr_number: int, token: str) -> list[dict]:
    """Fetch and parse PR diff into structured file changes."""
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}/files"
    headers = {"Authorization": f"token {token}", "Accept": "application/vnd.github.v3+json"}

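    # The endpoint is paginated (30 files per page by default), so walk every page.
    files = []
    page = 1
    while True:
        response = requests.get(
            url, headers=headers, params={"per_page": 100, "page": page}, timeout=30
        )
        response.raise_for_status()  # fail loudly on auth or rate-limit errors
        batch = response.json()
        files.extend(batch)
        if len(batch) < 100:
            break
        page += 1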

    changes = []
    for f in files:
        if f["status"] == "removed":
            continue

        patch = f.get("patch", "")
        # Parse hunk headers
        hunks = parse_hunks(patch)

        changes.append({
            "filename": f["filename"],
            "status": f["status"],
            "additions": f["additions"],
            "deletions": f["deletions"],
            "patch": patch,
            "hunks": hunks
        })

    return changes

def parse_hunks(patch: str) -> list[dict]:
    """Parse diff patch into hunks with line numbers."""
    hunks = []
    current_hunk = None

    for line in patch.split("\n"):
        if line.startswith("@@"):
            # New hunk: @@ -old_start,old_count +new_start,new_count @@
            # The ",count" part is omitted when a hunk covers a single line.
            match = re.match(r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@", line)
            if match:
                if current_hunk:
                    hunks.append(current_hunk)
                current_hunk = {
                    "old_start": int(match.group(1)),
                    "new_start": int(match.group(3)),
                    "lines": []
                }
        elif current_hunk is not None:
            current_hunk["lines"].append(line)

    if current_hunk:
        hunks.append(current_hunk)

    return hunks
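
To show why the hunk start positions are worth keeping, here is a minimal sketch of turning one parsed hunk into line numbers in the new version of the file. The helper name `added_line_numbers` is hypothetical; it assumes a hunk dict shaped like the ones `parse_hunks` returns, with `new_start` and `lines` keys.

```python
def added_line_numbers(hunk: dict) -> list[tuple[int, str]]:
    """Return (new_file_line_number, text) for every added line in a hunk."""
    added = []
    new_line = hunk["new_start"]
    for line in hunk["lines"]:
        if line.startswith("+"):
            # Added line: it exists only in the new file.
            added.append((new_line, line[1:]))
            new_line += 1
        elif line.startswith("-") or line.startswith("\\"):
            # Removed lines and "\ No newline at end of file" markers
            # are not in the new file, so the counter stays put.
            continue
        else:
            # Context line: present in both versions.
            new_line += 1
    return added


example_hunk = {
    "old_start": 10,
    "new_start": 12,
    "lines": [" a = 1", "-b = 2", "+b = 3", "+c = 4", " d = 5", "+e = 6"],
}
print(added_line_numbers(example_hunk))
# [(13, 'b = 3'), (14, 'c = 4'), (16, 'e = 6')]
```

These are the positions a reviewer sees in the changed file, so they are the natural place to attach a comment.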

Step 2: Code Analysis Agent

Now the part that does the reading. This agent takes one file change at a time and asks the model to review it. Working file by file keeps each prompt small, which is what holds the response time down on big pull requests. The client comes straight from the official Anthropic Python SDK, so there's no custom plumbing to maintain.

# code_review/analyzer.py
from anthropic import Anthropic
import json

class CodeReviewAgent:
    def __init__(self):
        self.client = Anthropic()

    def review_file(self, file_change: dict, repo_context: str = "") -> list[dict]:
        """Review a single file change and return findings."""
        prompt = f"""You are an expert code reviewer. Review this code change carefully.

File: {file_change['filename']}
Status: {file_change['status']}
Lines changed: +{file_change['additions']}/-{file_change['deletions']}

Repository context: {repo_context}

Code diff:
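
Whatever the prompt ends up asking for, the reply has to come back as the list of findings that `review_file` promises. The sketch below is one possible way to do that, under stated assumptions: the prompt asks the model for a JSON array, and each entry carries `line`, `severity`, and `message` keys. Those key names and the helper `parse_findings` are hypothetical, not taken from the guide.

```python
import json


def parse_findings(reply_text: str, filename: str) -> list[dict]:
    """Turn a model reply into a list of finding dicts, or [] if unusable."""
    # Replies often wrap the JSON in prose or a code fence, so cut out
    # the outermost [...] span before parsing.
    start = reply_text.find("[")
    end = reply_text.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        items = json.loads(reply_text[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []

    findings = []
    for item in items:
        # Keep only well-formed entries; "line" and "message" are assumed keys.
        if not isinstance(item, dict):
            continue
        if not isinstance(item.get("line"), int) or not item.get("message"):
            continue
        findings.append({
            "filename": filename,
            "line": item["line"],
            "severity": item.get("severity", "info"),
            "message": str(item["message"]),
        })
    return findings
```

Returning an empty list on malformed output means a bad reply produces no comments on the pull request rather than a crashed run.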

What to do next

  1. Pick one repeated workflow with a clear owner and weekly volume.
  2. Automate the preparation step first, then keep human approval for important actions.
  3. Measure time saved, errors reduced, and response speed for four weeks.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.
