Repository Pattern Done Right in PHP and Go, A Decade of Mistakes Distilled

October 26, 2023 · 9 min read · by Muhammad Amal · programming

TL;DR — A repository is a collection of aggregates; treat it like one / Methods name intentions, not queries / Generic Repository<T> is almost always a code smell

The repository pattern is one of those ideas where everyone agrees on the name and nobody agrees on the shape. Half the Laravel tutorials online use “repository” to mean “a class that wraps an Eloquent model with one method per query,” which is closer to a query object than a repository. Half the Go tutorials use it to mean “an interface that exposes CRUD methods,” which is closer to a generic DAO.

The original Evans definition, from Domain-Driven Design, is narrower and more useful: a repository is an in-memory collection of aggregates. You ask it for things, you put things into it, and the persistence mechanism is invisible. That last clause is what most implementations miss.

Once Eloquent is decoupled from the domain, the repository is your main interface boundary. This post is about the shape that interface should have. It is the same problem in PHP and Go, with mostly the same answer.

The Mental Model: A Collection

A repository is a collection. Forget databases for a moment. If I had every invoice in memory, what would I do with the collection?

Get one by identity: $invoices->ofId($id)
Add a new one or update an existing one: $invoices->save($invoice)
Filter by domain-meaningful criteria: $invoices->overdueFor($customerId, $asOf)
Maybe remove one: $invoices->remove($invoice)

That is the shape. Three to maybe seven methods, all named in domain language, each one corresponding to something the application actually needs to do.

What I would NOT do with an in-memory collection:

$invoices->findByCustomerIdAndStatusAndCreatedAtBetween(...) — that is a query, not a collection operation.
$invoices->paginate(15)->where('status', 'open')->orderBy('due_at') — that is a query builder.
$invoices->update(['status' => 'overdue'], ['due_at' => '<', $now]) — that is a SQL UPDATE.

The first one belongs on a query service (the read side). The second one belongs in infrastructure, hidden inside the repository implementation. The third one belongs in a use case that loads the affected invoices, mutates each one through the entity, and saves it.
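To make the third case concrete, here is a minimal Go sketch of such a use case. The Invoice fields, the MarkOverdue method, the narrow OverdueInvoices interface, and the sliceRepo stand-in are all assumptions made for illustration; none of them come from a real codebase.

```go
package main

import (
	"context"
	"time"
)

// Hypothetical minimal domain types for this sketch.
type TenantID string

type Invoice struct {
	ID     string
	Tenant TenantID
	Status string // "open", "overdue", "paid"
	DueAt  time.Time
}

// MarkOverdue is the domain decision: only an open invoice past its due date changes.
func (i *Invoice) MarkOverdue(asOf time.Time) bool {
	if i.Status != "open" || !i.DueAt.Before(asOf) {
		return false
	}
	i.Status = "overdue"
	return true
}

// OverdueInvoices is the narrow slice of the repository this use case needs.
type OverdueInvoices interface {
	OverdueFor(ctx context.Context, tenant TenantID, asOf time.Time) ([]*Invoice, error)
	Save(ctx context.Context, inv *Invoice) error
}

// FlagOverdue loads the candidates, lets each entity decide, and saves the ones that changed.
func FlagOverdue(ctx context.Context, repo OverdueInvoices, tenant TenantID, asOf time.Time) (int, error) {
	invoices, err := repo.OverdueFor(ctx, tenant, asOf)
	if err != nil {
		return 0, err
	}
	changed := 0
	for _, inv := range invoices {
		if !inv.MarkOverdue(asOf) {
			continue
		}
		if err := repo.Save(ctx, inv); err != nil {
			return changed, err
		}
		changed++
	}
	return changed, nil
}

// sliceRepo is a throwaway in-memory stand-in used only to exercise the use case.
type sliceRepo struct{ items []*Invoice }

func (r *sliceRepo) OverdueFor(_ context.Context, tenant TenantID, asOf time.Time) ([]*Invoice, error) {
	var out []*Invoice
	for _, inv := range r.items {
		if inv.Tenant == tenant && inv.Status == "open" && inv.DueAt.Before(asOf) {
			out = append(out, inv)
		}
	}
	return out, nil
}

// Save has nothing to do here because sliceRepo hands out the stored pointers.
func (r *sliceRepo) Save(_ context.Context, _ *Invoice) error { return nil }

// flagOverdueDemo seeds three invoices and reports how many the use case flagged.
func flagOverdueDemo() int {
	now := time.Date(2023, 10, 26, 0, 0, 0, 0, time.UTC)
	repo := &sliceRepo{items: []*Invoice{
		{ID: "a", Tenant: "t1", Status: "open", DueAt: now.AddDate(0, 0, -3)}, // overdue
		{ID: "b", Tenant: "t1", Status: "open", DueAt: now.AddDate(0, 0, 5)},  // not due yet
		{ID: "c", Tenant: "t2", Status: "open", DueAt: now.AddDate(0, 0, -3)}, // other tenant
	}}
	n, _ := FlagOverdue(context.Background(), repo, "t1", now)
	return n
}
```

The loop costs more round trips than a single UPDATE, but the rule for what counts as overdue now lives on the entity instead of in a WHERE clause.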

The Right Shape in PHP

A repository for invoices:

<?php
declare(strict_types=1);

namespace App\Domain\Billing;

use DateTimeImmutable;

interface InvoiceRepository
{
    public function ofId(InvoiceId $id, TenantId $tenant): Invoice;

    public function save(Invoice $invoice): void;

    public function remove(Invoice $invoice): void;

    /** @return list<Invoice> */
    public function overdueFor(TenantId $tenant, DateTimeImmutable $asOf): array;

    public function nextIdentity(): InvoiceId;
}

Five methods. Each one means something.

nextIdentity() is the underrated one. Aggregates have identity from the moment of creation, not from the database. The repository hands out IDs, the domain constructs the entity with the ID already known, and the save is idempotent. This makes UUIDs natural and lets you write tests without faking auto-increment.

public function ofId(InvoiceId $id, TenantId $tenant): Invoice;
//                                  ^^^^^^^^^^^^^^^^
//                                  tenant always explicit

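Notice tenant is a first-class parameter, not a hidden scope. Multi-tenancy is too important to leave to global state. If the use case asks for an invoice, it has to say which tenant. The method signature enforces it: a call without a TenantId fails immediately, and a static analyser such as PHPStan flags it before the code runs.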

The Right Shape in Go

Same shape, different syntax:

package billing

import (
    "context"
    "time"
)

type InvoiceRepository interface {
    OfID(ctx context.Context, id InvoiceID, tenant TenantID) (*Invoice, error)
    Save(ctx context.Context, inv *Invoice) error
    Remove(ctx context.Context, inv *Invoice) error
    OverdueFor(ctx context.Context, tenant TenantID, asOf time.Time) ([]*Invoice, error)
    NextIdentity() InvoiceID
}

Differences:

context.Context first parameter, always. This is Go convention and gives you cancellation, deadlines, and tracing for free.
error return value, always. Even Save returns error. Even Remove.
NextIdentity does not need context — it is in-memory.
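As a rough illustration of why NextIdentity can stay in-memory, the sketch below generates a random UUID with only the standard library and constructs the invoice with its ID already known. The newInvoiceID helper, the two-field Invoice, and the memoryInvoices store are hypothetical; the pgx adapter in this post uses github.com/google/uuid instead.

```go
package main

import (
	"context"
	"crypto/rand"
	"fmt"
)

type InvoiceID string

// newInvoiceID builds a random (version 4) UUID string using only the standard library.
func newInvoiceID() InvoiceID {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err) // the platform's entropy source is unusable
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant
	return InvoiceID(fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]))
}

// Invoice is a hypothetical two-field stand-in for the real aggregate.
type Invoice struct {
	ID     InvoiceID
	Number string
}

// NewInvoice receives its identity up front; nothing waits for the database to assign one.
func NewInvoice(id InvoiceID, number string) *Invoice {
	return &Invoice{ID: id, Number: number}
}

type memoryInvoices struct{ items map[InvoiceID]Invoice }

// NextIdentity touches no I/O, which is why it takes no context and returns no error.
func (m *memoryInvoices) NextIdentity() InvoiceID { return newInvoiceID() }

// Save is keyed by ID, so saving the same invoice twice leaves exactly one entry.
func (m *memoryInvoices) Save(_ context.Context, inv *Invoice) error {
	m.items[inv.ID] = *inv
	return nil
}

// identityDemo returns the length of the generated ID and the number of stored invoices.
func identityDemo() (idLen int, stored int) {
	ctx := context.Background()
	repo := &memoryInvoices{items: map[InvoiceID]Invoice{}}
	inv := NewInvoice(repo.NextIdentity(), "INV-0001")
	_ = repo.Save(ctx, inv)
	_ = repo.Save(ctx, inv) // a retry after a timeout, for example
	return len(inv.ID), len(repo.items)
}
```

Because the ID exists before the first write, a retried Save is harmless: it overwrites the same key rather than creating a second row.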

The implementation in PostgreSQL with pgx:

package postgres

import (
    "context"
    "errors"
    "time"

    "github.com/jackc/pgx/v5"
    "github.com/jackc/pgx/v5/pgxpool"
    "github.com/google/uuid"

    "example.com/billing/internal/billing"
)

type InvoiceRepository struct {
    pool *pgxpool.Pool
}

func NewInvoiceRepository(pool *pgxpool.Pool) *InvoiceRepository {
    return &InvoiceRepository{pool: pool}
}

func (r *InvoiceRepository) NextIdentity() billing.InvoiceID {
    return billing.InvoiceID(uuid.NewString())
}

func (r *InvoiceRepository) OfID(
    ctx context.Context, id billing.InvoiceID, tenant billing.TenantID,
) (*billing.Invoice, error) {
    var row invoiceRow
    err := r.pool.QueryRow(ctx, `
        SELECT id, tenant_id, customer_id, number, status,
               subtotal_minor, currency, issued_at, due_at, paid_at
        FROM invoices
        WHERE id = $1 AND tenant_id = $2
    `, string(id), string(tenant)).Scan(
        &row.ID, &row.TenantID, &row.CustomerID, &row.Number, &row.Status,
        &row.SubtotalMinor, &row.Currency, &row.IssuedAt, &row.DueAt, &row.PaidAt,
    )
    if errors.Is(err, pgx.ErrNoRows) {
        return nil, billing.ErrInvoiceNotFound
    }
    if err != nil {
        return nil, err
    }
    return rowToInvoice(row), nil
}

func (r *InvoiceRepository) Save(ctx context.Context, inv *billing.Invoice) error {
    row := invoiceToRow(inv)
    _, err := r.pool.Exec(ctx, `
        INSERT INTO invoices (id, tenant_id, customer_id, number, status,
                              subtotal_minor, currency, issued_at, due_at, paid_at)
        VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
        ON CONFLICT (id) DO UPDATE SET
            status = EXCLUDED.status,
            subtotal_minor = EXCLUDED.subtotal_minor,
            paid_at = EXCLUDED.paid_at
    `, row.ID, row.TenantID, row.CustomerID, row.Number, row.Status,
       row.SubtotalMinor, row.Currency, row.IssuedAt, row.DueAt, row.PaidAt)
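    return err
}

// Remove deletes the invoice row. It reuses invoiceToRow to read the ID and tenant,
// so the delete is tenant-scoped like every other statement here.
func (r *InvoiceRepository) Remove(ctx context.Context, inv *billing.Invoice) error {
    row := invoiceToRow(inv)
    _, err := r.pool.Exec(ctx, `
        DELETE FROM invoices
        WHERE id = $1 AND tenant_id = $2
    `, row.ID, row.TenantID)
    return err
}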

func (r *InvoiceRepository) OverdueFor(
    ctx context.Context, tenant billing.TenantID, asOf time.Time,
) ([]*billing.Invoice, error) {
    rows, err := r.pool.Query(ctx, `
        SELECT id, tenant_id, customer_id, number, status,
               subtotal_minor, currency, issued_at, due_at, paid_at
        FROM invoices
        WHERE tenant_id = $1 AND status = 'open' AND due_at < $2
        ORDER BY due_at ASC
    `, string(tenant), asOf)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var out []*billing.Invoice
    for rows.Next() {
        var row invoiceRow
        if err := rows.Scan(
            &row.ID, &row.TenantID, &row.CustomerID, &row.Number, &row.Status,
            &row.SubtotalMinor, &row.Currency, &row.IssuedAt, &row.DueAt, &row.PaidAt,
        ); err != nil {
            return nil, err
        }
        out = append(out, rowToInvoice(row))
    }
    return out, rows.Err()
}

A few production lessons:

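Use UPSERT (INSERT ... ON CONFLICT) for Save, not separate INSERT and UPDATE paths. It is simpler, it is a single atomic statement, and it handles two concurrent first saves of the same ID without a check-then-insert race.
Always defer rows.Close(). I have seen connection-pool exhaustion from forgetting this. Rows that are neither closed nor read to the end keep their pooled connection checked out, and counting on the garbage collector to hand it back is too late under load.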
Scan into intermediate row structs, then map. Do not scan directly into domain types — that couples the domain to the column layout.
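The adapter above calls invoiceRow, rowToInvoice, and invoiceToRow without showing them. One possible shape is sketched below, with one row field per selected column. The Invoice and Money types are assumptions, since the post does not show billing.Invoice, and everything sits in one package here for brevity. With separate billing and postgres packages, the mapping would go through an exported constructor and accessors instead of unexported fields.

```go
package main

import "time"

// Domain side (hypothetical shape; the post does not show billing.Invoice).

type Money struct {
	Minor    int64  // amount in minor units, for example cents
	Currency string // ISO 4217 code
}

type Invoice struct {
	id, tenant, customer, number string
	status                       string
	subtotal                     Money
	issuedAt, dueAt              time.Time
	paidAt                       *time.Time // nil until the invoice is paid
}

// Persistence side: one field per column in the SELECT list, nothing else.

type invoiceRow struct {
	ID, TenantID, CustomerID, Number, Status string
	SubtotalMinor                            int64
	Currency                                 string
	IssuedAt, DueAt                          time.Time
	PaidAt                                   *time.Time // a NULL column maps to nil
}

// rowToInvoice rebuilds the aggregate; two columns fold into one Money value.
func rowToInvoice(r invoiceRow) *Invoice {
	return &Invoice{
		id: r.ID, tenant: r.TenantID, customer: r.CustomerID, number: r.Number,
		status:   r.Status,
		subtotal: Money{Minor: r.SubtotalMinor, Currency: r.Currency},
		issuedAt: r.IssuedAt, dueAt: r.DueAt, paidAt: r.PaidAt,
	}
}

// invoiceToRow flattens the aggregate back into column-shaped fields.
func invoiceToRow(inv *Invoice) invoiceRow {
	return invoiceRow{
		ID: inv.id, TenantID: inv.tenant, CustomerID: inv.customer, Number: inv.number,
		Status:        inv.status,
		SubtotalMinor: inv.subtotal.Minor, Currency: inv.subtotal.Currency,
		IssuedAt: inv.issuedAt, DueAt: inv.dueAt, PaidAt: inv.paidAt,
	}
}

// rowRoundTrips reports whether a row survives the trip to the domain type and back.
func rowRoundTrips() bool {
	issued := time.Date(2023, 10, 1, 0, 0, 0, 0, time.UTC)
	row := invoiceRow{
		ID: "inv-1", TenantID: "t1", CustomerID: "c1", Number: "INV-0001", Status: "open",
		SubtotalMinor: 12500, Currency: "EUR",
		IssuedAt: issued, DueAt: issued.AddDate(0, 0, 30), // PaidAt stays nil: not paid yet
	}
	return invoiceToRow(rowToInvoice(row)) == row
}
```

If a column is renamed or split, only the row struct and the two mapping functions change; the domain type stays as it is.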

Don’t: Generic Repositories

The single most popular mistake. You see this in tutorials all the time:

// DON'T
interface Repository
{
    public function find(int $id);
    public function findAll(): array;
    public function save($entity): void;
    public function delete($entity): void;
}

abstract class BaseRepository implements Repository { /* generic CRUD plumbing */ }

class InvoiceRepository extends BaseRepository {}
class CustomerRepository extends BaseRepository {}

Or in Go:

// DON'T
type Repository[T any] interface {
    FindByID(ctx context.Context, id string) (T, error)
    FindAll(ctx context.Context) ([]T, error)
    Save(ctx context.Context, t T) error
    Delete(ctx context.Context, t T) error
}

Why this is wrong:

It assumes all entities have similar persistence needs. They do not. Invoice needs tenant scoping; Customer needs uniqueness on email; AuditLog is append-only.
It pushes complexity to the caller. FindAll is never the right answer. You always want a filter.
It hides domain language behind generic verbs. find vs ofId is the difference between framework-speak and ubiquitous language.
It actively encourages CRUD thinking. CRUD is what databases do; it is not what domains do.

The right shape is: each aggregate type gets a hand-crafted repository interface, named for the aggregate, with methods named for the actual use cases that drive the design. No inheritance, no generics, no shared base class. Three to seven methods, max.
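A small Go sketch of what "different persistence needs" looks like in practice, using the Customer and AuditLog examples from the list above. All types, method names, and the in-memory adapters are hypothetical. The point is only that the two interfaces share no verbs.

```go
package main

import (
	"context"
	"errors"
	"strings"
)

var (
	ErrEmailTaken       = errors.New("customer: email already registered")
	ErrCustomerNotFound = errors.New("customer: not found")
)

type Customer struct {
	ID    string
	Email string
}

// CustomerRepository talks about email because the Customer aggregate cares about it.
type CustomerRepository interface {
	OfEmail(ctx context.Context, email string) (*Customer, error)
	Save(ctx context.Context, c *Customer) error // ErrEmailTaken on a duplicate email
}

type AuditEntry struct {
	Actor, Action string
}

// AuditLog is append-only: there is deliberately no update or delete to implement.
type AuditLog interface {
	Append(ctx context.Context, e AuditEntry) error
}

type memoryCustomers struct{ byID map[string]Customer }

func (m *memoryCustomers) OfEmail(_ context.Context, email string) (*Customer, error) {
	for _, c := range m.byID {
		if strings.EqualFold(c.Email, email) {
			found := c
			return &found, nil
		}
	}
	return nil, ErrCustomerNotFound
}

func (m *memoryCustomers) Save(_ context.Context, c *Customer) error {
	for id, other := range m.byID {
		if id != c.ID && strings.EqualFold(other.Email, c.Email) {
			return ErrEmailTaken
		}
	}
	m.byID[c.ID] = *c
	return nil
}

type memoryAuditLog struct{ entries []AuditEntry }

func (m *memoryAuditLog) Append(_ context.Context, e AuditEntry) error {
	m.entries = append(m.entries, e)
	return nil
}

// Compile-time checks that the adapters satisfy their interfaces.
var (
	_ CustomerRepository = (*memoryCustomers)(nil)
	_ AuditLog           = (*memoryAuditLog)(nil)
)

// shapesDemo saves two customers with the same email (different case) and logs the rejection.
func shapesDemo() (dupRejected bool, logged int) {
	ctx := context.Background()
	customers := &memoryCustomers{byID: map[string]Customer{}}
	audit := &memoryAuditLog{}
	_ = customers.Save(ctx, &Customer{ID: "c1", Email: "ana@example.com"})
	err := customers.Save(ctx, &Customer{ID: "c2", Email: "ANA@example.com"})
	_ = audit.Append(ctx, AuditEntry{Actor: "system", Action: "customer.rejected"})
	return errors.Is(err, ErrEmailTaken), len(audit.entries)
}
```

A Repository[T] would have to either expose Delete on the audit log or leave the email rule out of the customer contract. Hand-written interfaces avoid both.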

The In-Memory Adapter for Tests

If your repository is honest about being a collection, you can implement it with an actual array. This is the single biggest testing win you get from this pattern:

<?php
declare(strict_types=1);

namespace Tests\Doubles;

use App\Domain\Billing\Invoice;
use App\Domain\Billing\InvoiceId;
use App\Domain\Billing\InvoiceRepository;
use App\Domain\Billing\TenantId;
use App\Domain\Billing\Exceptions\InvoiceNotFound;
use DateTimeImmutable;
use Symfony\Component\Uid\Uuid;

final class InMemoryInvoiceRepository implements InvoiceRepository
{
    /** @var array<string, Invoice> */
    private array $items = [];

    public function nextIdentity(): InvoiceId
    {
        return new InvoiceId(Uuid::v7()->toRfc4122());
    }

    public function ofId(InvoiceId $id, TenantId $tenant): Invoice
    {
        $invoice = $this->items[$id->toString()] ?? null;
        if ($invoice === null || !$invoice->customerId()->belongsTo($tenant)) {
            throw new InvoiceNotFound($id);
        }
        // Hand out a copy so callers cannot change stored state without calling save().
        // clone is shallow; add __clone() if the aggregate holds mutable child objects.
        return clone $invoice;
    }

    public function save(Invoice $invoice): void
    {
        // Store a copy for the same reason: later changes to $invoice must not leak in.
        $this->items[$invoice->id()->toString()] = clone $invoice;
    }

    public function remove(Invoice $invoice): void
    {
        unset($this->items[$invoice->id()->toString()]);
    }

    public function overdueFor(TenantId $tenant, DateTimeImmutable $asOf): array
    {
        $overdue = array_filter(
            $this->items,
            fn (Invoice $i) =>
                $i->customerId()->belongsTo($tenant)
                && $i->isOpen()
                && $i->dueAt() < $asOf,
        );

        return array_values(array_map(fn (Invoice $i) => clone $i, $overdue));
    }
}

Tests using this run in microseconds, do not need a database, and exercise the entire application layer including the handlers and the entities. They miss the SQL-specific behaviour, which is what the contract tests are for.
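The same double is straightforward in Go. The sketch below is one way to write it, assuming a flattened Invoice with exported fields (a hypothetical stand-in for the real aggregate) and sequential IDs instead of UUIDs to keep test output readable. It stores values rather than pointers, so callers never share state with the store.

```go
package main

import (
	"context"
	"errors"
	"sort"
	"strconv"
	"sync"
	"time"
)

type (
	InvoiceID string
	TenantID  string
)

var ErrInvoiceNotFound = errors.New("billing: invoice not found")

// Invoice is a hypothetical, flattened stand-in for the real aggregate.
type Invoice struct {
	ID     InvoiceID
	Tenant TenantID
	Status string
	DueAt  time.Time
}

// InMemoryInvoices keeps copies by value behind a mutex so parallel tests stay safe.
type InMemoryInvoices struct {
	mu    sync.Mutex
	items map[InvoiceID]Invoice
	seq   int
}

func NewInMemoryInvoices() *InMemoryInvoices {
	return &InMemoryInvoices{items: make(map[InvoiceID]Invoice)}
}

func (r *InMemoryInvoices) NextIdentity() InvoiceID {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.seq++
	return InvoiceID("inv-" + strconv.Itoa(r.seq))
}

func (r *InMemoryInvoices) OfID(_ context.Context, id InvoiceID, tenant TenantID) (*Invoice, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	inv, ok := r.items[id]
	if !ok || inv.Tenant != tenant {
		return nil, ErrInvoiceNotFound // a wrong tenant looks exactly like a missing invoice
	}
	return &inv, nil // inv is a copy of the stored value
}

func (r *InMemoryInvoices) Save(_ context.Context, inv *Invoice) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.items[inv.ID] = *inv
	return nil
}

func (r *InMemoryInvoices) Remove(_ context.Context, inv *Invoice) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.items, inv.ID)
	return nil
}

func (r *InMemoryInvoices) OverdueFor(_ context.Context, tenant TenantID, asOf time.Time) ([]*Invoice, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	var out []*Invoice
	for _, inv := range r.items {
		if inv.Tenant == tenant && inv.Status == "open" && inv.DueAt.Before(asOf) {
			c := inv
			out = append(out, &c)
		}
	}
	// Map iteration order is random; sort to match the SQL adapter's ORDER BY due_at ASC.
	sort.Slice(out, func(i, j int) bool { return out[i].DueAt.Before(out[j].DueAt) })
	return out, nil
}

// inMemoryDemo checks tenant isolation, the overdue filter, and copy-on-read behaviour.
func inMemoryDemo() (crossTenantHidden bool, overdue int, isolated bool) {
	ctx := context.Background()
	repo := NewInMemoryInvoices()
	now := time.Date(2023, 10, 26, 0, 0, 0, 0, time.UTC)
	inv := &Invoice{ID: repo.NextIdentity(), Tenant: "t1", Status: "open", DueAt: now.Add(-time.Hour)}
	_ = repo.Save(ctx, inv)
	_, err := repo.OfID(ctx, inv.ID, "t2")
	list, _ := repo.OverdueFor(ctx, "t1", now)
	inv.Status = "paid" // mutate after saving, without calling Save again
	got, _ := repo.OfID(ctx, inv.ID, "t1")
	return errors.Is(err, ErrInvoiceNotFound), len(list), got.Status == "open"
}
```

Storing values means a test that forgets to call Save fails the same way it would against a real database.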

Contract Tests: Keep the Two in Sync

The risk of having two implementations is they drift. The fix is a shared test suite that runs against both:

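use Illuminate\Foundation\Testing\RefreshDatabase;
use Tests\TestCase; // Laravel's base test case; the Eloquent subclass needs $this->app

abstract class InvoiceRepositoryContract extends TestCase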
{
    abstract protected function repository(): InvoiceRepository;

    public function test_can_save_and_retrieve_invoice(): void
    {
        $repo = $this->repository();
        $invoice = InvoiceBuilder::open()->build();

        $repo->save($invoice);
        $retrieved = $repo->ofId($invoice->id(), $invoice->tenantId());

        $this->assertEquals($invoice->id(), $retrieved->id());
        $this->assertSame($invoice->status(), $retrieved->status());
    }

    public function test_overdue_for_returns_only_open_past_due_invoices(): void
    {
        // ...
    }
}

class InMemoryInvoiceRepositoryTest extends InvoiceRepositoryContract
{
    protected function repository(): InvoiceRepository
    {
        return new InMemoryInvoiceRepository();
    }
}

class EloquentInvoiceRepositoryTest extends InvoiceRepositoryContract
{
    use RefreshDatabase;

    protected function repository(): InvoiceRepository
    {
        return $this->app->make(EloquentInvoiceRepository::class);
    }
}

Now every test runs against both. If a behaviour passes in-memory but fails on Postgres, you know exactly which case the in-memory double got wrong. Update the double, re-run. The Symfony framework has done this for years with their interop tests; it works.
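In Go the same idea can be a plain function that takes a repository factory, which each adapter's test then calls. The sketch below trims the interface to two methods and returns an error instead of taking a *testing.T, purely to stay self-contained. CheckInvoiceRepositoryContract, mapRepo, and the ignoreTenant flag are hypothetical names; the flag simulates a double that has drifted.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type (
	InvoiceID string
	TenantID  string
)

var ErrInvoiceNotFound = errors.New("billing: invoice not found")

type Invoice struct {
	ID     InvoiceID
	Tenant TenantID
	Status string
}

// InvoiceRepository is trimmed to two methods to keep the sketch short.
type InvoiceRepository interface {
	OfID(ctx context.Context, id InvoiceID, tenant TenantID) (*Invoice, error)
	Save(ctx context.Context, inv *Invoice) error
}

// CheckInvoiceRepositoryContract runs the shared expectations against a fresh repository.
// Each adapter's own test calls it with its own factory.
func CheckInvoiceRepositoryContract(newRepo func() InvoiceRepository) error {
	ctx := context.Background()
	repo := newRepo()
	inv := &Invoice{ID: "inv-1", Tenant: "t1", Status: "open"}

	if err := repo.Save(ctx, inv); err != nil {
		return fmt.Errorf("save: %w", err)
	}
	got, err := repo.OfID(ctx, inv.ID, inv.Tenant)
	if err != nil {
		return fmt.Errorf("retrieve after save: %w", err)
	}
	if got.ID != inv.ID || got.Status != inv.Status {
		return fmt.Errorf("round trip changed the invoice: got %+v", *got)
	}
	if _, err := repo.OfID(ctx, inv.ID, "t2"); !errors.Is(err, ErrInvoiceNotFound) {
		return fmt.Errorf("other tenant must get ErrInvoiceNotFound, got %v", err)
	}
	return nil
}

// mapRepo is an in-memory adapter; ignoreTenant simulates a double that has drifted.
type mapRepo struct {
	items        map[InvoiceID]Invoice
	ignoreTenant bool
}

func (r *mapRepo) Save(_ context.Context, inv *Invoice) error {
	r.items[inv.ID] = *inv
	return nil
}

func (r *mapRepo) OfID(_ context.Context, id InvoiceID, tenant TenantID) (*Invoice, error) {
	inv, ok := r.items[id]
	if !ok || (!r.ignoreTenant && inv.Tenant != tenant) {
		return nil, ErrInvoiceNotFound
	}
	return &inv, nil
}

// newMapRepo returns a factory, which is the shape the contract function expects.
func newMapRepo(ignoreTenant bool) func() InvoiceRepository {
	return func() InvoiceRepository {
		return &mapRepo{items: map[InvoiceID]Invoice{}, ignoreTenant: ignoreTenant}
	}
}
```

In a real suite the Postgres adapter's test would pass a factory that opens a clean schema, and both tests would report the contract's error through t.Fatal.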

Common Pitfalls

The repository becomes a generic query bus. Already covered above. Split read and write.
Methods that return Builder or QueryBuilder. Defeats the entire abstraction. The caller now constructs queries; you have a thin facade over Eloquent.
saveOrFail, saveQuietly, saveWithoutEvents. All Eloquent-specific. The interface has save. The implementation decides what that means.
Inheritance for code reuse. class CustomerRepository extends BaseEloquentRepository ties every aggregate’s persistence to a base class’s lifecycle. Composition or duplication, not inheritance.
Repositories that call use cases. I have seen InvoiceRepository::markPaidAndSave($invoice). The repository should never invoke domain logic. That is the use case’s job. The repository persists; the entity decides.
Async save semantics. If save() returns before the write has hit disk, you have given up the collection mental model. There are valid CQRS scenarios where this is intentional, but they need to be explicit. Default to synchronous; opt into async per use case.
Cross-aggregate methods. InvoiceRepository::saveInvoiceAndCustomer($i, $c). No. Each repository owns one aggregate. Cross-aggregate consistency goes through the use case, with two repository calls inside a transaction.

The cross-aggregate one matters because it is where consistency boundaries leak. Each repository covers one aggregate. If two aggregates need to change together, the use case is responsible for coordinating. The repository never makes that call.
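A sketch of that coordination in Go, under stated assumptions: Transactor is a hypothetical port (not a library API), the SettleInvoice use case and both entities are invented for illustration, and the in-memory store fakes a transaction by snapshotting its maps with maps.Clone (Go 1.21+) and restoring them on error.

```go
package main

import (
	"context"
	"errors"
	"maps"
)

type Invoice struct{ ID, CustomerID, Status string }

func (i *Invoice) MarkPaid() { i.Status = "paid" }

type Customer struct {
	ID        string
	TotalPaid int64 // minor units
}

func (c *Customer) RecordPayment(amount int64) { c.TotalPaid += amount }

type InvoiceRepository interface {
	Save(ctx context.Context, inv *Invoice) error
}

type CustomerRepository interface {
	Save(ctx context.Context, c *Customer) error
}

// Transactor is a hypothetical port: run fn atomically, roll back if it returns an error.
type Transactor interface {
	WithinTx(ctx context.Context, fn func(ctx context.Context) error) error
}

// SettleInvoice owns the consistency boundary; neither repository knows about the other.
type SettleInvoice struct {
	Tx        Transactor
	Invoices  InvoiceRepository
	Customers CustomerRepository
}

func (uc SettleInvoice) Handle(ctx context.Context, inv *Invoice, c *Customer, amount int64) error {
	inv.MarkPaid()
	c.RecordPayment(amount)
	return uc.Tx.WithinTx(ctx, func(ctx context.Context) error {
		if err := uc.Invoices.Save(ctx, inv); err != nil {
			return err
		}
		return uc.Customers.Save(ctx, c)
	})
}

// memoryStore backs both in-memory repositories and fakes the transaction.
type memoryStore struct {
	invoices      map[string]Invoice
	customers     map[string]Customer
	failCustomers bool // simulates a failing second write
}

func (s *memoryStore) WithinTx(ctx context.Context, fn func(ctx context.Context) error) error {
	invoices, customers := maps.Clone(s.invoices), maps.Clone(s.customers)
	if err := fn(ctx); err != nil {
		s.invoices, s.customers = invoices, customers // roll back to the snapshot
		return err
	}
	return nil
}

type memInvoices struct{ s *memoryStore }

func (r memInvoices) Save(_ context.Context, inv *Invoice) error {
	r.s.invoices[inv.ID] = *inv
	return nil
}

type memCustomers struct{ s *memoryStore }

func (r memCustomers) Save(_ context.Context, c *Customer) error {
	if r.s.failCustomers {
		return errors.New("customers: write failed")
	}
	r.s.customers[c.ID] = *c
	return nil
}

// settleDemo runs the use case once and reports how many invoices ended up stored.
func settleDemo(failCustomers bool) (savedInvoices int, err error) {
	store := &memoryStore{
		invoices: map[string]Invoice{}, customers: map[string]Customer{},
		failCustomers: failCustomers,
	}
	uc := SettleInvoice{Tx: store, Invoices: memInvoices{store}, Customers: memCustomers{store}}
	err = uc.Handle(context.Background(),
		&Invoice{ID: "inv-1", CustomerID: "c1", Status: "open"},
		&Customer{ID: "c1"}, 5000)
	return len(store.invoices), err
}
```

When the customer write fails, the invoice write is undone with it, and neither repository had to know the other exists.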

One implementation detail deserves a mention: repositories must hand out fresh entity instances per ofId, never shared references, or you get the same identity-map problems Doctrine carefully solves. The PHP Manual's section on object cloning is relevant background.

Wrapping Up

The repository pattern is small and easy to overcomplicate. Three to seven domain-named methods, one aggregate per repository, no inheritance, no generics. The in-memory implementation is your test asset; the production implementation is your boundary. Get those two right and the rest follows.

This is the last post in this October series on architecture. The next one I have planned digs into observability — tracing, structured logging, and how the architectural choices in these posts make those things easier or harder. Architecture is what you do at design time; observability is what you do when it breaks at 3am. They are the same conversation, scheduled six months apart.
