Repository Pattern Done Right in PHP and Go: A Decade of Mistakes Distilled
TL;DR: A repository is a collection of aggregates; treat it like one / Methods name intentions, not queries / Generic Repository<T> is almost always a code smell
The repository pattern is one of those ideas where everyone agrees on the name and nobody agrees on the shape. Half the Laravel tutorials online use “repository” to mean “a class that wraps an Eloquent model with one method per query,” which is closer to a query object than a repository. Half the Go tutorials use it to mean “an interface that exposes CRUD methods,” which is closer to a generic DAO.
The original Evans definition, from Domain-Driven Design, is narrower and more useful: a repository is an in-memory collection of aggregates. You ask it for things, you put things into it, and the persistence mechanism is invisible. That last clause is what most implementations miss.
After decoupling Eloquent, the repository is your main interface boundary. This post is about the shape that interface should have. Same problem in PHP and Go; mostly the same answer.
The Mental Model: A Collection
A repository is a collection. Forget databases for a moment. If I had every invoice in memory, what would I do with the collection?
- Get one by identity: $invoices->ofId($id)
- Add a new one or update an existing one: $invoices->save($invoice)
- Filter by domain-meaningful criteria: $invoices->overdueFor($customerId, $asOf)
- Maybe remove one: $invoices->remove($invoice)
That is the shape. Three to maybe seven methods, all named in domain language, each one corresponding to something the application actually needs to do.
What I would NOT do with an in-memory collection:
- $invoices->findByCustomerIdAndStatusAndCreatedAtBetween(...): that is a query, not a collection operation.
- $invoices->paginate(15)->where('status', 'open')->orderBy('due_at'): that is a query builder.
- $invoices->update(['status' => 'overdue'], ['due_at' => '<', $now]): that is a SQL UPDATE.
The first one belongs on a query service (the read side). The second one belongs in infrastructure, hidden inside the repository implementation. The third one belongs in a use case that loads, mutates each, and saves.
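The third case is the one people most often push into the repository, so here is a minimal Go sketch of it as a use case. Everything in it is illustrative: the trimmed-down Invoice, the FlagOverdueInvoices function, and the map-backed memoryRepo are assumptions made to keep the example runnable, not types from a real codebase.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type TenantID string

// Invoice is a deliberately tiny stand-in for the real aggregate.
type Invoice struct {
	ID     string
	Tenant TenantID
	Status string // "open", "overdue", "paid"
	DueAt  time.Time
}

// MarkOverdue is the domain decision; the rule lives on the entity.
func (i *Invoice) MarkOverdue() { i.Status = "overdue" }

// InvoiceRepository is trimmed to the two methods this use case needs.
type InvoiceRepository interface {
	OverdueFor(ctx context.Context, tenant TenantID, asOf time.Time) ([]*Invoice, error)
	Save(ctx context.Context, inv *Invoice) error
}

// FlagOverdueInvoices is the use case: load, mutate each, save each.
// No bulk UPDATE leaks through the repository interface.
func FlagOverdueInvoices(ctx context.Context, repo InvoiceRepository, tenant TenantID, asOf time.Time) (int, error) {
	invoices, err := repo.OverdueFor(ctx, tenant, asOf)
	if err != nil {
		return 0, err
	}
	for _, inv := range invoices {
		inv.MarkOverdue()
		if err := repo.Save(ctx, inv); err != nil {
			return 0, err
		}
	}
	return len(invoices), nil
}

// memoryRepo is a throwaway map-backed implementation for the demo.
type memoryRepo struct{ items map[string]*Invoice }

func (r *memoryRepo) OverdueFor(_ context.Context, tenant TenantID, asOf time.Time) ([]*Invoice, error) {
	var out []*Invoice
	for _, inv := range r.items {
		if inv.Tenant == tenant && inv.Status == "open" && inv.DueAt.Before(asOf) {
			out = append(out, inv)
		}
	}
	return out, nil
}

func (r *memoryRepo) Save(_ context.Context, inv *Invoice) error {
	r.items[inv.ID] = inv
	return nil
}

// runExample seeds two invoices and flags the one that is past due.
func runExample() (flagged int, statuses map[string]string, err error) {
	now := time.Date(2024, 10, 31, 0, 0, 0, 0, time.UTC)
	repo := &memoryRepo{items: map[string]*Invoice{
		"inv-1": {ID: "inv-1", Tenant: "acme", Status: "open", DueAt: now.AddDate(0, 0, -10)},
		"inv-2": {ID: "inv-2", Tenant: "acme", Status: "open", DueAt: now.AddDate(0, 0, 10)},
	}}
	flagged, err = FlagOverdueInvoices(context.Background(), repo, "acme", now)
	statuses = map[string]string{}
	for id, inv := range repo.items {
		statuses[id] = inv.Status
	}
	return flagged, statuses, err
}

func main() {
	flagged, statuses, err := runExample()
	fmt.Println(flagged, statuses["inv-1"], statuses["inv-2"], err)
}
```

The loop costs more round trips than a single UPDATE. That is the trade: the rule stays on the entity, and the repository stays a collection.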
The Right Shape in PHP
A repository for invoices:

<?php

declare(strict_types=1);

namespace App\Domain\Billing;

use DateTimeImmutable;

interface InvoiceRepository
{
    public function ofId(InvoiceId $id, TenantId $tenant): Invoice;

    public function save(Invoice $invoice): void;

    public function remove(Invoice $invoice): void;

    /** @return list<Invoice> */
    public function overdueFor(TenantId $tenant, DateTimeImmutable $asOf): array;

    public function nextIdentity(): InvoiceId;
}
Five methods. Each one means something.
nextIdentity() is the underrated one. Aggregates have identity from the moment of creation, not from the database. The repository hands out IDs, the domain constructs the entity with the ID already known, and the save is idempotent. This makes UUIDs natural and lets you write tests without faking auto-increment.
public function ofId(InvoiceId $id, TenantId $tenant): Invoice;
//                                  ^^^^^^^^^^^^^^^^
//                                  tenant always explicit
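Notice tenant is a first-class parameter, not a hidden scope. Multi-tenancy is too important to leave to global state. If the use case asks for an invoice, it has to say which tenant. The signature enforces it: omit the TenantId and PHP rejects the call at runtime, and a static analyser such as PHPStan or Psalm flags it before that.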
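Both points, identity before persistence and tenant as an explicit parameter, can be sketched in Go, the other language this post covers. This is a sketch under assumptions: memoryRepo and the two-field Invoice are made up for the example, and NextIdentity returns a hex-encoded random value from crypto/rand where a production adapter would more likely hand out a UUID.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
)

type InvoiceID string
type TenantID string

var ErrInvoiceNotFound = errors.New("invoice not found")

type Invoice struct {
	ID     InvoiceID
	Tenant TenantID
}

type memoryRepo struct{ items map[InvoiceID]Invoice }

// NextIdentity returns a random 128-bit identifier, hex-encoded.
// (Assumption: stdlib-only stand-in for a UUID generator.)
func (r *memoryRepo) NextIdentity() InvoiceID {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err) // no sensible recovery if the system RNG fails
	}
	return InvoiceID(hex.EncodeToString(b[:]))
}

// Save is keyed by identity, so repeating it leaves a single entry.
func (r *memoryRepo) Save(_ context.Context, inv *Invoice) error {
	r.items[inv.ID] = *inv
	return nil
}

// OfID reports "wrong tenant" exactly like "does not exist".
func (r *memoryRepo) OfID(_ context.Context, id InvoiceID, tenant TenantID) (*Invoice, error) {
	inv, ok := r.items[id]
	if !ok || inv.Tenant != tenant {
		return nil, ErrInvoiceNotFound
	}
	return &inv, nil
}

func runExample() (stored int, ownerErr, otherErr error) {
	ctx := context.Background()
	repo := &memoryRepo{items: map[InvoiceID]Invoice{}}

	// The ID exists before anything is persisted.
	inv := &Invoice{ID: repo.NextIdentity(), Tenant: "acme"}
	_ = repo.Save(ctx, inv)
	_ = repo.Save(ctx, inv) // a retry does not create a duplicate

	_, ownerErr = repo.OfID(ctx, inv.ID, "acme")
	_, otherErr = repo.OfID(ctx, inv.ID, "globex")
	// repo.OfID(ctx, inv.Tenant, inv.ID) would not compile:
	// TenantID and InvoiceID are distinct types.
	return len(repo.items), ownerErr, otherErr
}

func main() {
	stored, ownerErr, otherErr := runExample()
	fmt.Println(stored, ownerErr, errors.Is(otherErr, ErrInvoiceNotFound))
}
```

In Go the distinct ID types give a genuine compile-time check: swapping the two arguments is a type error, not a runtime surprise.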
The Right Shape in Go
Same shape, different syntax:
package billing

import (
	"context"
	"time"
)

type InvoiceRepository interface {
	OfID(ctx context.Context, id InvoiceID, tenant TenantID) (*Invoice, error)
	Save(ctx context.Context, inv *Invoice) error
	Remove(ctx context.Context, inv *Invoice) error
	OverdueFor(ctx context.Context, tenant TenantID, asOf time.Time) ([]*Invoice, error)
	NextIdentity() InvoiceID
}
Differences:
- context.Context first parameter, always. This is Go convention and gives you cancellation, deadlines, and tracing for free.
- error return value, always. Even Save returns error. Even Remove.
- NextIdentity does not need context: it is in-memory.
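To make the cancellation point concrete, here is a small sketch of a repository method that honours its context. slowRepo and its artificial delay are assumptions standing in for a real database round trip.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

type Invoice struct{ ID string }

// slowRepo pretends each write takes `delay` to complete.
type slowRepo struct{ delay time.Duration }

// Save waits for the simulated write, but gives up as soon as ctx is done
// and reports why through the error return.
func (r *slowRepo) Save(ctx context.Context, inv *Invoice) error {
	select {
	case <-time.After(r.delay):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func runExample() (saved, timedOut bool) {
	inv := &Invoice{ID: "inv-1"}

	// Generous deadline, quick write: succeeds.
	ctx1, cancel1 := context.WithTimeout(context.Background(), time.Second)
	defer cancel1()
	saved = (&slowRepo{delay: time.Millisecond}).Save(ctx1, inv) == nil

	// Tight deadline, slow write: the caller gets control back early.
	ctx2, cancel2 := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel2()
	err := (&slowRepo{delay: time.Second}).Save(ctx2, inv)
	timedOut = errors.Is(err, context.DeadlineExceeded)

	return saved, timedOut
}

func main() {
	saved, timedOut := runExample()
	fmt.Println(saved, timedOut)
}
```

A real adapter rarely needs the select: passing ctx down to the driver call, as the pgx code in this post does, gets the same behaviour.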
The implementation in PostgreSQL with pgx:
package postgres

import (
	"context"
	"errors"
	"time"

	"github.com/google/uuid"
	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"

	"example.com/billing/internal/billing"
)

// InvoiceRepository is the PostgreSQL adapter for billing.InvoiceRepository.
// invoiceRow, rowToInvoice, and invoiceToRow are mapping helpers that live
// alongside it in this package.
type InvoiceRepository struct {
	pool *pgxpool.Pool
}

func NewInvoiceRepository(pool *pgxpool.Pool) *InvoiceRepository {
	return &InvoiceRepository{pool: pool}
}

// NextIdentity generates the ID in memory; no database round trip.
func (r *InvoiceRepository) NextIdentity() billing.InvoiceID {
	return billing.InvoiceID(uuid.NewString())
}

// OfID loads one invoice, always scoped by tenant.
func (r *InvoiceRepository) OfID(
	ctx context.Context, id billing.InvoiceID, tenant billing.TenantID,
) (*billing.Invoice, error) {
	var row invoiceRow
	err := r.pool.QueryRow(ctx, `
		SELECT id, tenant_id, customer_id, number, status,
		       subtotal_minor, currency, issued_at, due_at, paid_at
		FROM invoices
		WHERE id = $1 AND tenant_id = $2
	`, string(id), string(tenant)).Scan(
		&row.ID, &row.TenantID, &row.CustomerID, &row.Number, &row.Status,
		&row.SubtotalMinor, &row.Currency, &row.IssuedAt, &row.DueAt, &row.PaidAt,
	)
	// Translate the driver's "no rows" into the domain's own error.
	if errors.Is(err, pgx.ErrNoRows) {
		return nil, billing.ErrInvoiceNotFound
	}
	if err != nil {
		return nil, err
	}
	return rowToInvoice(row), nil
}

// Save inserts or updates in one statement, so it is safe to repeat.
func (r *InvoiceRepository) Save(ctx context.Context, inv *billing.Invoice) error {
	row := invoiceToRow(inv)
	_, err := r.pool.Exec(ctx, `
		INSERT INTO invoices (id, tenant_id, customer_id, number, status,
		                      subtotal_minor, currency, issued_at, due_at, paid_at)
		VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
		ON CONFLICT (id) DO UPDATE SET
			status = EXCLUDED.status,
			subtotal_minor = EXCLUDED.subtotal_minor,
			paid_at = EXCLUDED.paid_at
	`, row.ID, row.TenantID, row.CustomerID, row.Number, row.Status,
		row.SubtotalMinor, row.Currency, row.IssuedAt, row.DueAt, row.PaidAt)
	return err
}

// Remove deletes by identity, scoped to the tenant like every read.
// Hard delete is an assumption here; a soft delete is a policy choice
// that would live behind the same method.
func (r *InvoiceRepository) Remove(ctx context.Context, inv *billing.Invoice) error {
	row := invoiceToRow(inv)
	_, err := r.pool.Exec(ctx, `
		DELETE FROM invoices WHERE id = $1 AND tenant_id = $2
	`, row.ID, row.TenantID)
	return err
}

// OverdueFor returns the tenant's open invoices due before asOf, oldest first.
func (r *InvoiceRepository) OverdueFor(
	ctx context.Context, tenant billing.TenantID, asOf time.Time,
) ([]*billing.Invoice, error) {
	rows, err := r.pool.Query(ctx, `
		SELECT id, tenant_id, customer_id, number, status,
		       subtotal_minor, currency, issued_at, due_at, paid_at
		FROM invoices
		WHERE tenant_id = $1 AND status = 'open' AND due_at < $2
		ORDER BY due_at ASC
	`, string(tenant), asOf)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var out []*billing.Invoice
	for rows.Next() {
		var row invoiceRow
		if err := rows.Scan(
			&row.ID, &row.TenantID, &row.CustomerID, &row.Number, &row.Status,
			&row.SubtotalMinor, &row.Currency, &row.IssuedAt, &row.DueAt, &row.PaidAt,
		); err != nil {
			return nil, err
		}
		out = append(out, rowToInvoice(row))
	}
	return out, rows.Err()
}
A few production lessons:
- Use UPSERT (INSERT ... ON CONFLICT) for Save, not separate INSERT and UPDATE paths. Simpler, atomic, and handles race conditions.
- Always defer rows.Close(). I have seen connection-pool exhaustion from forgetting this. Rows close themselves once iteration runs to the end, but an early return (a failed Scan, say) leaves the connection checked out of the pool, and hoping something reclaims it “eventually” is too late under load.
- Scan into intermediate row structs, then map. Do not scan directly into domain types: that couples the domain to the column layout.
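The row-struct lesson is easier to see with the mapping written out. The listing above calls invoiceRow, rowToInvoice, and invoiceToRow without showing them, so what follows is a guess at their shape, cut down to a handful of columns; the Money value object and the field names are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// Domain side: unexported fields, a value object for money.
type Money struct {
	Minor    int64
	Currency string
}

type Invoice struct {
	id       string
	status   string
	subtotal Money
	dueAt    time.Time
	paidAt   *time.Time // nil while unpaid
}

// Persistence side: one field per column, nothing else.
type invoiceRow struct {
	ID            string
	Status        string
	SubtotalMinor int64
	Currency      string
	DueAt         time.Time
	PaidAt        *time.Time // NULL in the database maps to nil
}

// rowToInvoice folds the two money columns into one value object.
func rowToInvoice(r invoiceRow) *Invoice {
	return &Invoice{
		id:       r.ID,
		status:   r.Status,
		subtotal: Money{Minor: r.SubtotalMinor, Currency: r.Currency},
		dueAt:    r.DueAt,
		paidAt:   r.PaidAt,
	}
}

// invoiceToRow flattens the aggregate back into columns.
func invoiceToRow(inv *Invoice) invoiceRow {
	return invoiceRow{
		ID:            inv.id,
		Status:        inv.status,
		SubtotalMinor: inv.subtotal.Minor,
		Currency:      inv.subtotal.Currency,
		DueAt:         inv.dueAt,
		PaidAt:        inv.paidAt,
	}
}

// runExample checks that a row survives the round trip unchanged.
func runExample() bool {
	in := invoiceRow{
		ID: "inv-1", Status: "open", SubtotalMinor: 12500, Currency: "EUR",
		DueAt: time.Date(2024, 11, 15, 0, 0, 0, 0, time.UTC),
	}
	out := invoiceToRow(rowToInvoice(in))
	return out.ID == in.ID && out.Status == in.Status &&
		out.SubtotalMinor == in.SubtotalMinor && out.Currency == in.Currency &&
		out.DueAt.Equal(in.DueAt) && out.PaidAt == nil
}

func main() {
	fmt.Println(runExample())
}
```

If a column is renamed or the money representation changes, only the row struct and these two functions move; the Invoice type does not.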
Don’t: Generic Repositories
The single most popular mistake. You see this in tutorials all the time:
// DON'T
interface Repository
{
    public function find(int $id);
    public function findAll(): array;
    public function save($entity): void;
    public function delete($entity): void;
}

class InvoiceRepository extends BaseRepository {}
class CustomerRepository extends BaseRepository {}
Or in Go:
// DON'T
type Repository[T any] interface {
	FindByID(ctx context.Context, id string) (T, error)
	FindAll(ctx context.Context) ([]T, error)
	Save(ctx context.Context, t T) error
	Delete(ctx context.Context, t T) error
}
Why this is wrong:
- It assumes all entities have similar persistence needs. They do not. Invoice needs tenant scoping; Customer needs uniqueness on email; AuditLog is append-only.
- It pushes complexity to the caller. FindAll is never the right answer. You always want a filter.
- It hides domain language behind generic verbs. find vs ofId is the difference between framework-speak and ubiquitous language.
- It actively encourages CRUD thinking. CRUD is what databases do; it is not what domains do.
The right shape is: each aggregate type gets a hand-crafted repository interface, named for the aggregate, with methods named for the actual use cases that drive the design. No inheritance, no generics, no shared base class. Five to seven methods, max.
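As an illustration of how different two hand-crafted interfaces end up, here is a sketch of the Customer and AuditLog cases mentioned above. The method names, ErrEmailTaken, and the in-memory implementations are all assumptions, and context is left out to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
)

type Customer struct {
	ID    string
	Email string
}

var ErrEmailTaken = errors.New("email already registered")

// CustomerRepository: uniqueness on email is part of the contract.
type CustomerRepository interface {
	OfEmail(email string) (*Customer, bool)
	Save(c *Customer) error // ErrEmailTaken if another customer owns the email
}

type AuditEntry struct{ Message string }

// AuditLog is append-only: no Save, no Remove, no update path at all.
type AuditLog interface {
	Append(e AuditEntry)
	Entries() []AuditEntry
}

type memoryCustomers struct{ byEmail map[string]Customer }

func (r *memoryCustomers) OfEmail(email string) (*Customer, bool) {
	c, ok := r.byEmail[email]
	if !ok {
		return nil, false
	}
	return &c, true
}

// Save rejects an email that already belongs to a different customer.
// (Simplification: changing a customer's email is not handled here.)
func (r *memoryCustomers) Save(c *Customer) error {
	if existing, ok := r.byEmail[c.Email]; ok && existing.ID != c.ID {
		return ErrEmailTaken
	}
	r.byEmail[c.Email] = *c
	return nil
}

type memoryAuditLog struct{ entries []AuditEntry }

func (l *memoryAuditLog) Append(e AuditEntry) { l.entries = append(l.entries, e) }

// Entries returns a copy so callers cannot rewrite history.
func (l *memoryAuditLog) Entries() []AuditEntry {
	return append([]AuditEntry(nil), l.entries...)
}

func runExample() (dupRejected bool, logged int) {
	var customers CustomerRepository = &memoryCustomers{byEmail: map[string]Customer{}}
	var audit AuditLog = &memoryAuditLog{}

	_ = customers.Save(&Customer{ID: "c-1", Email: "ada@example.com"})
	audit.Append(AuditEntry{Message: "customer c-1 registered"})

	err := customers.Save(&Customer{ID: "c-2", Email: "ada@example.com"})
	return errors.Is(err, ErrEmailTaken), len(audit.Entries())
}

func main() {
	dupRejected, logged := runExample()
	fmt.Println(dupRejected, logged)
}
```

Neither interface fits a Repository[T] without either dead methods (Delete on an audit log) or missing ones (OfEmail).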
The In-Memory Adapter for Tests
If your repository is honest about being a collection, you can implement it with an actual array. This is the single biggest testing win you get from this pattern:

<?php

declare(strict_types=1);

namespace Tests\Doubles;

use App\Domain\Billing\Invoice;
use App\Domain\Billing\InvoiceId;
use App\Domain\Billing\InvoiceRepository;
use App\Domain\Billing\TenantId;
use App\Domain\Billing\Exceptions\InvoiceNotFound;
use DateTimeImmutable;
use Symfony\Component\Uid\Uuid;

final class InMemoryInvoiceRepository implements InvoiceRepository
{
    /** @var array<string, Invoice> */
    private array $items = [];

    public function nextIdentity(): InvoiceId
    {
        return new InvoiceId(Uuid::v7()->toRfc4122());
    }

    public function ofId(InvoiceId $id, TenantId $tenant): Invoice
    {
        $invoice = $this->items[$id->toString()] ?? null;
        if ($invoice === null || !$invoice->customerId()->belongsTo($tenant)) {
            throw new InvoiceNotFound($id);
        }

        return $invoice;
    }

    public function save(Invoice $invoice): void
    {
        $this->items[$invoice->id()->toString()] = $invoice;
    }

    public function remove(Invoice $invoice): void
    {
        unset($this->items[$invoice->id()->toString()]);
    }

    public function overdueFor(TenantId $tenant, DateTimeImmutable $asOf): array
    {
        return array_values(array_filter(
            $this->items,
            fn (Invoice $i) =>
                $i->customerId()->belongsTo($tenant)
                && $i->isOpen()
                && $i->dueAt() < $asOf,
        ));
    }
}
Tests using this run in microseconds, do not need a database, and exercise the entire application layer including the handlers and the entities. They miss the SQL-specific behaviour, which is what the contract tests are for.
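The Go side gets the same treatment. A minimal map-backed sketch, assuming a flattened Invoice with exported fields; it skips NextIdentity and any locking, so it is only suitable for single-goroutine tests as written.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sort"
	"time"
)

type InvoiceID string
type TenantID string

var ErrInvoiceNotFound = errors.New("invoice not found")

// Invoice is a flattened stand-in for the real aggregate.
type Invoice struct {
	ID     InvoiceID
	Tenant TenantID
	Status string
	DueAt  time.Time
}

// InMemoryInvoiceRepository stores invoice values in a plain map.
type InMemoryInvoiceRepository struct{ items map[InvoiceID]Invoice }

func NewInMemoryInvoiceRepository() *InMemoryInvoiceRepository {
	return &InMemoryInvoiceRepository{items: map[InvoiceID]Invoice{}}
}

func (r *InMemoryInvoiceRepository) OfID(_ context.Context, id InvoiceID, tenant TenantID) (*Invoice, error) {
	inv, ok := r.items[id]
	if !ok || inv.Tenant != tenant {
		return nil, ErrInvoiceNotFound
	}
	return &inv, nil
}

func (r *InMemoryInvoiceRepository) Save(_ context.Context, inv *Invoice) error {
	r.items[inv.ID] = *inv
	return nil
}

func (r *InMemoryInvoiceRepository) Remove(_ context.Context, inv *Invoice) error {
	delete(r.items, inv.ID)
	return nil
}

// OverdueFor mirrors the SQL version: open, past due, oldest due date first.
func (r *InMemoryInvoiceRepository) OverdueFor(_ context.Context, tenant TenantID, asOf time.Time) ([]*Invoice, error) {
	var out []*Invoice
	for _, stored := range r.items {
		if stored.Tenant == tenant && stored.Status == "open" && stored.DueAt.Before(asOf) {
			inv := stored // fresh copy per result
			out = append(out, &inv)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].DueAt.Before(out[j].DueAt) })
	return out, nil
}

func runExample() ([]InvoiceID, error) {
	ctx := context.Background()
	repo := NewInMemoryInvoiceRepository()
	now := time.Date(2024, 10, 31, 0, 0, 0, 0, time.UTC)
	seed := []Invoice{
		{ID: "due-oct-28", Tenant: "acme", Status: "open", DueAt: now.AddDate(0, 0, -3)},
		{ID: "due-oct-01", Tenant: "acme", Status: "open", DueAt: now.AddDate(0, 0, -30)},
		{ID: "paid", Tenant: "acme", Status: "paid", DueAt: now.AddDate(0, 0, -30)},
		{ID: "other-tenant", Tenant: "globex", Status: "open", DueAt: now.AddDate(0, 0, -30)},
	}
	for i := range seed {
		if err := repo.Save(ctx, &seed[i]); err != nil {
			return nil, err
		}
	}
	overdue, err := repo.OverdueFor(ctx, "acme", now)
	if err != nil {
		return nil, err
	}
	ids := make([]InvoiceID, 0, len(overdue))
	for _, inv := range overdue {
		ids = append(ids, inv.ID)
	}
	return ids, nil
}

func main() {
	ids, err := runExample()
	fmt.Println(ids, err)
}
```

The explicit sort matters: Go map iteration order is random, so without it the double would disagree with the ORDER BY in the SQL adapter on some runs and not others.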
Contract Tests: Keep the Two in Sync
The risk of having two implementations is they drift. The fix is a shared test suite that runs against both:
abstract class InvoiceRepositoryContract extends TestCase
{
    abstract protected function repository(): InvoiceRepository;

    public function test_can_save_and_retrieve_invoice(): void
    {
        $repo = $this->repository();
        $invoice = InvoiceBuilder::open()->build();

        $repo->save($invoice);
        $retrieved = $repo->ofId($invoice->id(), $invoice->tenantId());

        $this->assertEquals($invoice->id(), $retrieved->id());
        $this->assertSame($invoice->status(), $retrieved->status());
    }

    public function test_overdue_for_returns_only_open_past_due_invoices(): void
    {
        // ...
    }
}

class InMemoryInvoiceRepositoryTest extends InvoiceRepositoryContract
{
    protected function repository(): InvoiceRepository
    {
        return new InMemoryInvoiceRepository();
    }
}

class EloquentInvoiceRepositoryTest extends InvoiceRepositoryContract
{
    use RefreshDatabase;

    protected function repository(): InvoiceRepository
    {
        return $this->app->make(EloquentInvoiceRepository::class);
    }
}
Now every test runs against both. If a behaviour passes in-memory but fails on Postgres, you know exactly which case the in-memory double got wrong. Update the double, re-run. The Symfony framework has done this for years with their interop tests; it works.
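The same idea in Go is a function that takes a factory and runs identical checks against whatever it returns. In a real project this would take a *testing.T and be called from each adapter's _test.go file; the sketch below returns an error instead so it runs standalone. CheckContract, memoryRepo, and the skipTenantCheck flag (which simulates a double that has drifted) are illustrative names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type InvoiceID string
type TenantID string

var ErrInvoiceNotFound = errors.New("invoice not found")

type Invoice struct {
	ID     InvoiceID
	Tenant TenantID
	Status string
}

type InvoiceRepository interface {
	OfID(ctx context.Context, id InvoiceID, tenant TenantID) (*Invoice, error)
	Save(ctx context.Context, inv *Invoice) error
}

// CheckContract runs the same expectations against any implementation.
// newRepo must return an empty repository each time it is called.
func CheckContract(newRepo func() InvoiceRepository) error {
	ctx := context.Background()
	repo := newRepo()

	// Save followed by OfID returns an equal invoice.
	want := Invoice{ID: "inv-1", Tenant: "acme", Status: "open"}
	if err := repo.Save(ctx, &want); err != nil {
		return fmt.Errorf("save: %w", err)
	}
	got, err := repo.OfID(ctx, want.ID, want.Tenant)
	if err != nil {
		return fmt.Errorf("ofID after save: %w", err)
	}
	if *got != want {
		return fmt.Errorf("round trip: got %+v, want %+v", *got, want)
	}

	// Another tenant must not see it.
	if _, err := repo.OfID(ctx, want.ID, "globex"); !errors.Is(err, ErrInvoiceNotFound) {
		return fmt.Errorf("cross-tenant read: got %v, want ErrInvoiceNotFound", err)
	}
	return nil
}

// memoryRepo is the test double under check.
type memoryRepo struct {
	items           map[InvoiceID]Invoice
	skipTenantCheck bool // deliberately broken variant
}

func (r *memoryRepo) Save(_ context.Context, inv *Invoice) error {
	r.items[inv.ID] = *inv
	return nil
}

func (r *memoryRepo) OfID(_ context.Context, id InvoiceID, tenant TenantID) (*Invoice, error) {
	inv, ok := r.items[id]
	if !ok || (!r.skipTenantCheck && inv.Tenant != tenant) {
		return nil, ErrInvoiceNotFound
	}
	return &inv, nil
}

func runExample() (faithful, drifted error) {
	faithful = CheckContract(func() InvoiceRepository {
		return &memoryRepo{items: map[InvoiceID]Invoice{}}
	})
	drifted = CheckContract(func() InvoiceRepository {
		return &memoryRepo{items: map[InvoiceID]Invoice{}, skipTenantCheck: true}
	})
	return faithful, drifted
}

func main() {
	faithful, drifted := runExample()
	fmt.Println(faithful)
	fmt.Println(drifted)
}
```

The faithful double passes; the drifted one fails on the cross-tenant check, which is exactly the kind of divergence the shared suite exists to catch.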
Common Pitfalls
- The repository becomes a generic query bus. Already covered above. Split read and write.
- Methods that return Builder or QueryBuilder. Defeats the entire abstraction. The caller now constructs queries; you have a thin facade over Eloquent.
- saveOrFail, saveQuietly, saveWithoutEvents. All Eloquent-specific. The interface has save. The implementation decides what that means.
- Inheritance for code reuse. class CustomerRepository extends BaseEloquentRepository ties every aggregate’s persistence to a base class’s lifecycle. Composition or duplication, not inheritance.
- Repositories that call use cases. I have seen InvoiceRepository::markPaidAndSave($invoice). The repository should never invoke domain logic. That is the use case’s job. The repository persists; the entity decides.
- Async save semantics. If save() returns before the write has hit disk, you have given up the collection mental model. There are valid CQRS scenarios where this is intentional, but they need to be explicit. Default to synchronous; opt into async per use case.
- Cross-aggregate methods. InvoiceRepository::saveInvoiceAndCustomer($i, $c). No. Each repository owns one aggregate. Cross-aggregate consistency goes through the use case, with two repository calls inside a transaction.
The cross-aggregate one matters because it is where consistency boundaries leak. Each repository covers one aggregate. If two aggregates need to change together, the use case is responsible for coordinating. The repository never makes that call.
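A sketch of what that coordination can look like in Go. The Transactor port and the SettleInvoice use case are hypothetical names; memoryStore fakes a transaction by snapshotting its maps (maps.Clone needs Go 1.21 or later), which is good enough for a single-goroutine test but is not how a database adapter would do it.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"maps"
)

type Invoice struct{ ID, Status string }

type Customer struct {
	ID      string
	Balance int64 // minor units still owed
}

type InvoiceRepository interface {
	Save(ctx context.Context, inv *Invoice) error
}

type CustomerRepository interface {
	Save(ctx context.Context, c *Customer) error
}

// Transactor is a hypothetical port: run fn atomically, undo on error.
type Transactor interface {
	WithinTx(ctx context.Context, fn func(ctx context.Context) error) error
}

// SettleInvoice is the use case: two aggregates, two repositories,
// one transaction, and the coordination lives here.
func SettleInvoice(ctx context.Context, tx Transactor, invoices InvoiceRepository,
	customers CustomerRepository, inv *Invoice, c *Customer, amount int64) error {
	return tx.WithinTx(ctx, func(ctx context.Context) error {
		inv.Status = "paid"
		if err := invoices.Save(ctx, inv); err != nil {
			return err
		}
		c.Balance -= amount
		return customers.Save(ctx, c)
	})
}

// memoryStore backs both repositories and doubles as the Transactor.
type memoryStore struct {
	invoices      map[string]Invoice
	customers     map[string]Customer
	failCustomers bool // simulate a failing second write
}

func (s *memoryStore) WithinTx(ctx context.Context, fn func(ctx context.Context) error) error {
	inv, cust := maps.Clone(s.invoices), maps.Clone(s.customers)
	if err := fn(ctx); err != nil {
		s.invoices, s.customers = inv, cust // roll back to the snapshot
		return err
	}
	return nil
}

type invoiceRepo struct{ s *memoryStore }

func (r invoiceRepo) Save(_ context.Context, inv *Invoice) error {
	r.s.invoices[inv.ID] = *inv
	return nil
}

type customerRepo struct{ s *memoryStore }

func (r customerRepo) Save(_ context.Context, c *Customer) error {
	if r.s.failCustomers {
		return errors.New("customer save failed")
	}
	r.s.customers[c.ID] = *c
	return nil
}

func runExample() (rolledBack, committed bool) {
	ctx := context.Background()
	s := &memoryStore{invoices: map[string]Invoice{}, customers: map[string]Customer{}}

	// Second write fails: the first one must not survive.
	s.failCustomers = true
	err := SettleInvoice(ctx, s, invoiceRepo{s}, customerRepo{s},
		&Invoice{ID: "inv-1", Status: "open"}, &Customer{ID: "c-1", Balance: 100}, 100)
	_, stored := s.invoices["inv-1"]
	rolledBack = err != nil && !stored

	// Both writes succeed: both are visible.
	s.failCustomers = false
	err = SettleInvoice(ctx, s, invoiceRepo{s}, customerRepo{s},
		&Invoice{ID: "inv-1", Status: "open"}, &Customer{ID: "c-1", Balance: 100}, 100)
	committed = err == nil && s.invoices["inv-1"].Status == "paid" && s.customers["c-1"].Balance == 0
	return rolledBack, committed
}

func main() {
	rolledBack, committed := runExample()
	fmt.Println(rolledBack, committed)
}
```

Neither repository knows the other exists. The only thing they share is the transaction the use case opened.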
For background, the PHP Manual on object cloning is relevant: repositories must hand out fresh entity instances per ofId, never shared references, or you get the same identity-map problems Doctrine carefully solves.
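The difference between handing out shared references and handing out fresh instances fits in a few lines of Go. Both repositories below are illustrative in-memory doubles, not code from a real adapter.

```go
package main

import "fmt"

type Invoice struct {
	ID     string
	Status string
}

// sharingRepo stores and returns the caller's own pointer.
type sharingRepo struct{ items map[string]*Invoice }

func (r *sharingRepo) Save(inv *Invoice)       { r.items[inv.ID] = inv }
func (r *sharingRepo) OfID(id string) *Invoice { return r.items[id] }

// copyingRepo stores a value and returns a fresh copy on every read.
type copyingRepo struct{ items map[string]Invoice }

func (r *copyingRepo) Save(inv *Invoice) { r.items[inv.ID] = *inv }

func (r *copyingRepo) OfID(id string) *Invoice {
	inv := r.items[id]
	return &inv
}

func runExample() (sharedStatus, copiedStatus string) {
	shared := &sharingRepo{items: map[string]*Invoice{}}
	shared.Save(&Invoice{ID: "inv-1", Status: "open"})
	shared.OfID("inv-1").Status = "paid" // mutated, never saved
	sharedStatus = shared.OfID("inv-1").Status

	copied := &copyingRepo{items: map[string]Invoice{}}
	copied.Save(&Invoice{ID: "inv-1", Status: "open"})
	copied.OfID("inv-1").Status = "paid" // mutated, never saved
	copiedStatus = copied.OfID("inv-1").Status

	return sharedStatus, copiedStatus
}

func main() {
	sharedStatus, copiedStatus := runExample()
	fmt.Println(sharedStatus, copiedStatus) // paid open
}
```

With the sharing double, a test passes even when the code under test forgets to call Save, which is a bug the real database would expose. A struct copy is shallow, so an aggregate holding slices or maps needs an explicit deep copy to get the same guarantee.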
Wrapping Up
The repository pattern is small and easy to overcomplicate. Three to seven domain-named methods, one aggregate per repository, no inheritance, no generics. The in-memory implementation is your test asset; the production implementation is your boundary. Get those two right and the rest follows.
This is the last post in this October series on architecture. The next one I have planned digs into observability — tracing, structured logging, and how the architectural choices in these posts make those things easier or harder. Architecture is what you do at design time; observability is what you do when it breaks at 3am. They are the same conversation, scheduled six months apart.