Time Series at Scale with TimescaleDB Hypertables
TL;DR — TimescaleDB 2.18 hypertables handle billion-row time-series workloads cleanly when you size chunks correctly, enable compression on cold data, and use continuous aggregates for dashboards. The defaults are reasonable; the wrong chunk size will hurt you.
Time-series workloads are the single most common reason teams reach for a specialized database. Sensor data, financial ticks, application metrics, IoT telemetry. The volumes are huge, the queries are predictable, and the access patterns are heavily skewed toward recent data. TimescaleDB has been the right answer for a lot of these workloads for years now, and 2.18 (the early-2025 release that’s stabilized) keeps improving on the model.
I’ve been running TimescaleDB for mqtt ingest, financial tick data, and observability backends. The patterns are remarkably consistent across all of them. Most of what goes wrong is sizing-related, not protocol-related. Get the chunk size right, get compression policies right, and the rest is just SQL.
This guide assumes you know what Postgres is. If you’re coming from InfluxDB or Prometheus, I covered the comparison in TimescaleDB vs InfluxDB for sensor storage. Here we’re going deep on operating TimescaleDB at scale.
1. The Hypertable Model
A hypertable looks like a regular Postgres table from SQL. Under the hood, TimescaleDB transparently splits it into chunks, each chunk being a regular Postgres table covering a specific time range. Queries that filter by time only touch the relevant chunks.
+-------------------------------------------------+
| Hypertable: sensor_readings |
| |
| +---------+ +---------+ +---------+ +-------+ |
| | Chunk 1 | | Chunk 2 | | Chunk 3 | | ... | |
| | Jun 1-7 | | Jun 8-14| | Jun 15-21| | |
| +---------+ +---------+ +---------+ +-------+ |
| | | | |
| (heap) (heap) (heap) |
| | | | |
| compressed compressed uncompressed |
+-------------------------------------------------+
Each chunk has its own indexes, its own compression state, and its own statistics. Operations like DROP are nearly instant because they drop a chunk, not delete rows. Compression is per-chunk. Replication, partitioning, and retention all happen at the chunk level.
2. Creating Hypertables Properly
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE TABLE sensor_readings (
time timestamptz NOT NULL,
sensor_id int NOT NULL,
temperature double precision,
humidity double precision,
pressure double precision
);
SELECT create_hypertable('sensor_readings', 'time',
chunk_time_interval => INTERVAL '1 day');
Sizing chunk_time_interval
This is the most important decision. The right chunk size depends on your data rate:
- A chunk should hold roughly 1-25% of the active working set in
shared_buffers. - Each chunk should be at least 1 GB for compression to be effective.
- Each chunk should be at most a few tens of GB for query planning to be fast.
A common starting point: 1 day for high-volume data (millions of rows per day), 7 days for moderate volume, 1 month for low volume.
You can change chunk_time_interval later; it only affects new chunks. But if your existing chunks are wrong-sized, you can’t easily resize them without reinserting data. The pragmatic approach when you realize you got it wrong is to set the new interval, let new chunks be the right size, and accept that old chunks will be awkward until retention drops them naturally.
-- Change for future chunks
SELECT set_chunk_time_interval('sensor_readings', INTERVAL '6 hours');
You can verify chunk sizes with:
SELECT chunk_name,
pg_size_pretty(pg_total_relation_size(format('%I.%I', chunk_schema, chunk_name)::regclass)) AS size,
range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_readings'
ORDER BY range_start DESC
LIMIT 10;
Look for the recent chunks (uncompressed) to be at least a few GB. If they’re hundreds of MB, your interval is too small.
Space partitioning (multi-dimensional)
For workloads with many distinct entities, you can also partition by an entity column:
SELECT create_hypertable('sensor_readings', 'time',
chunk_time_interval => INTERVAL '1 day');
SELECT add_dimension('sensor_readings', 'sensor_id',
number_partitions => 16);
Now each day’s chunk is split into 16 sub-chunks by sensor_id hash. Queries that filter by sensor_id only touch one sub-chunk. This is useful when you have hundreds of thousands of sensors and queries usually target one.
In 2.18 the planner is smarter about pruning sub-chunks. You’ll see this with EXPLAIN ANALYZE:
EXPLAIN ANALYZE
SELECT avg(temperature)
FROM sensor_readings
WHERE sensor_id = 42
AND time > now() - INTERVAL '1 day';
You want to see only one or two chunks scanned, not all of them.
3. Compression For Cold Data
This is where TimescaleDB earns its keep. Compressed chunks use 90% less storage and (counterintuitively) often query faster because IO is reduced.
ALTER TABLE sensor_readings SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'sensor_id',
timescaledb.compress_orderby = 'time DESC'
);
-- Compress chunks older than 7 days
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');
Understanding compress_segmentby and compress_orderby
This is what people get wrong. The settings determine how rows are grouped and ordered inside compressed chunks, which determines compression ratio and query performance.
compress_segmentby: columns that are equality filters in your queries. Rows with the same segmentby value are grouped together.compress_orderby: columns that determine order within a segment. Usually the time column.
If you query like WHERE sensor_id = 42 AND time > X, segment by sensor_id. The planner can skip whole segments that don’t match the equality predicate. Compression ratio also improves because similar values are grouped together.
Compression in 2.18
TimescaleDB 2.18 introduced more flexible compression settings, including the ability to add columns to compressed chunks without decompressing them. This is a huge operational improvement. Previously, schema changes on compressed hypertables required decompressing all chunks first.
-- This now works without decompressing
ALTER TABLE sensor_readings ADD COLUMN battery_pct int;
Check compression status:
SELECT chunk_name,
compression_status,
before_compression_total_bytes,
after_compression_total_bytes,
round(100.0 * after_compression_total_bytes::numeric /
nullif(before_compression_total_bytes, 0), 2) AS pct
FROM chunk_compression_stats('sensor_readings')
ORDER BY chunk_name;
A healthy compression ratio is 5-20%. If you’re seeing 50%+, your segmentby is probably wrong or your data is unusually random.
4. Continuous Aggregates
For dashboards, you don’t want to re-aggregate raw data every time a user opens a chart. Continuous aggregates precompute and incrementally refresh.
CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
sensor_id,
avg(temperature) AS avg_temp,
min(temperature) AS min_temp,
max(temperature) AS max_temp,
count(*) AS reading_count
FROM sensor_readings
GROUP BY bucket, sensor_id;
SELECT add_continuous_aggregate_policy('sensor_hourly',
start_offset => INTERVAL '7 days',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '15 minutes');
The aggregate is incrementally maintained. Every 15 minutes, TimescaleDB looks at what’s changed in the underlying hypertable and updates the aggregate for affected time buckets.
Hierarchical continuous aggregates
You can build aggregates on aggregates:
CREATE MATERIALIZED VIEW sensor_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', bucket) AS day,
sensor_id,
avg(avg_temp) AS avg_temp,
min(min_temp) AS min_temp,
max(max_temp) AS max_temp
FROM sensor_hourly
GROUP BY day, sensor_id;
A daily aggregate computed from the hourly one is much faster than one computed from raw data. Dashboards over months or years should query the daily aggregate.
Querying continuous aggregates with real-time data
By default, queries against the aggregate include real-time data computed on-the-fly for the uncovered time range. This is great for dashboards that want both fast historical lookups and fresh data:
SELECT bucket, avg_temp
FROM sensor_hourly
WHERE sensor_id = 42
AND bucket > now() - INTERVAL '24 hours'
ORDER BY bucket;
This query gets precomputed data for the buckets that are materialized and computes the latest hour on-the-fly from raw data. Disable this if you want pure materialized behavior:
ALTER MATERIALIZED VIEW sensor_hourly
SET (timescaledb.materialized_only = true);
5. Retention And Tiering
Time-series data is mostly cold. You want to drop or move old data automatically.
Retention policy
SELECT add_retention_policy('sensor_readings', INTERVAL '2 years');
After 2 years, chunks are dropped entirely. This is fast (drop table operation) regardless of size.
Tiering to object storage (Timescale Cloud)
TimescaleDB Cloud supports tiering chunks to S3 automatically. Self-hosted doesn’t have this built-in, but you can replicate the pattern manually:
- Compress chunks at 7 days.
- Export chunks to Parquet at 90 days, drop them.
- Query historical data through a federated tool or
pg_parquetwhen needed.
Drop chunks manually
You can also drop chunks by time range:
SELECT drop_chunks('sensor_readings', older_than => INTERVAL '1 year');
This returns the names of dropped chunks. Useful for one-off cleanup.
6. Performance Patterns
Use time_bucket aggressively
TimescaleDB’s query optimizer knows about time_bucket(). Use it for any time-based grouping:
-- Good: planner can prune chunks and use indexes
SELECT time_bucket('5 minutes', time) AS bucket,
avg(temperature)
FROM sensor_readings
WHERE time > now() - INTERVAL '1 day'
GROUP BY bucket
ORDER BY bucket;
-- Worse: regular date_trunc, less optimization
SELECT date_trunc('minute', time) AS minute,
avg(temperature)
FROM sensor_readings
WHERE time > now() - INTERVAL '1 day'
GROUP BY minute;
Use the right index
Hypertables get an automatic index on the time column. For other access patterns:
CREATE INDEX ON sensor_readings (sensor_id, time DESC);
This is a compound index with time DESC because most queries want recent data. The DESC sort matches the typical scan direction.
Last-point queries
A common pattern is “give me the latest reading per sensor.” Naive SQL does this with a window function:
SELECT DISTINCT ON (sensor_id) sensor_id, time, temperature
FROM sensor_readings
WHERE time > now() - INTERVAL '1 hour'
ORDER BY sensor_id, time DESC;
DISTINCT ON with the right index is fast. TimescaleDB 2.18 also has the last() aggregate which makes some patterns cleaner:
SELECT sensor_id, last(temperature, time) AS latest_temp
FROM sensor_readings
WHERE time > now() - INTERVAL '1 hour'
GROUP BY sensor_id;
Common Pitfalls
1. Chunks too small
If chunk_time_interval is too small, you end up with thousands of chunks. Planning each query takes longer than the query itself. Aim for chunks measured in GB, not MB.
2. Chunks too large
If chunks are too large, compression takes forever, vacuum is painful, and queries that span chunks read more data than necessary. Keep chunks under 100 GB uncompressed.
3. Compressing too aggressively
Compressing chunks under 7 days old (typical hot data window) hurts. Compressed chunks can’t be inserted into cheaply. If your “now” data is in a compressed chunk, every write triggers expensive decompression. Always set compression policy to cover only data older than your typical write window.
4. Forgetting to ANALYZE compressed chunks
After compression, statistics need updating. TimescaleDB 2.18 does this automatically for new chunks but older versions don’t. Schedule periodic ANALYZE on the hypertable to keep query plans good.
Troubleshooting
Query is slow on recent data
Probably hitting an uncompressed chunk without the right index. Check:
EXPLAIN ANALYZE SELECT ... FROM sensor_readings WHERE ...;
You want “Index Scan” not “Seq Scan” on the chunks being touched. If it’s seq scanning, add an index that matches your filter.
Compression policy isn’t running
SELECT * FROM timescaledb_information.job_stats
WHERE job_id IN (
SELECT job_id FROM timescaledb_information.jobs
WHERE proc_name = 'policy_compression'
);
If next_start is in the past and total_runs isn’t incrementing, the background worker isn’t running. Check timescaledb.bgw_log_level = info to see logs.
Continuous aggregate has stale data
Refresh manually:
CALL refresh_continuous_aggregate('sensor_hourly',
now() - INTERVAL '2 hours', now());
Then check that the refresh policy is correctly scheduled. The most common cause is end_offset being too aggressive, leaving data unmaterialized.
The PostgreSQL extensions documentation covers how TimescaleDB’s underlying extension mechanism works if you want to understand the integration deeper.
Wrapping Up
TimescaleDB hypertables are the right answer for most time-series workloads on Postgres. They scale comfortably to billions of rows, compress brilliantly when configured right, and integrate cleanly with the rest of your Postgres ecosystem. The work is in the setup: chunk sizing, compression configuration, and aggregate design. Once those are right, operations are smooth.
What’s next is observability. The pg_stat_io deep dive covers how to monitor IO on these big hypertables specifically.