AI runs on fresh, unified data. StreamBricks stores it once — and serves real-time streams to your agents, point-in-time features to your models, and SQL to your analysts. One copy. No ETL. No duplicated lakehouse.
Built on Apache Pulsar & BookKeeper · Arrow · Vortex · DataFusion · founded by the creator of Apache Pulsar
Streaming, analytics, and SQL over a single physical copy — no second lakehouse to sync.
Fresh context for agents, point-in-time features for models, all on live streaming data.
Data is analytics-ready the moment it is written. No Kafka→S3→Iceberg→Spark plumbing.
Extends the streaming platform you trust — your existing Pulsar clients keep working.
Your agents and models need real-time and historical data. Today that means copying the same record across five systems — slow, expensive, and stale by the time AI sees it.
Producer
→ Kafka / Pulsar
→ Object Store (S3 / HDFS)
→ Iceberg / Delta
→ Spark / Flink / Trino
→ Feature Store / Vector DB
5+ copies · ETL pipelines · high cost · stale features · two stacks to operate.
Producer
→ Pulsar
→ BookKeeper Streaming Lakehouse
↳ Pub-Sub · AI Features · SQL · Temporal Joins
1 copy · no ETL · lower cost · fresh by default · one platform.
The same storage layer, exposed through the interfaces your AI, data, and platform teams already use.
Stream live context to your agents and apps the instant it happens.
Build ML features and training sets directly on streaming history.
Query your streams as tables — including native AS OF joins.
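To make the single-copy idea concrete, here is a minimal toy sketch in Python. It is not StreamBricks code and does not reflect its storage format: the `SingleCopyLog` class and its `publish`, `poll`, and `scan_sum` methods are hypothetical names invented for illustration. It only shows how a streaming read and an analytical read can both be served from one stored copy of the records.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SingleCopyLog:
    """Toy append-only log (hypothetical): every reader shares the same record list."""
    records: list[dict[str, Any]] = field(default_factory=list)
    cursors: dict[str, int] = field(default_factory=dict)

    def publish(self, record: dict[str, Any]) -> int:
        """Append a record once and return its offset."""
        self.records.append(record)
        return len(self.records) - 1

    def poll(self, subscription: str, max_records: int = 10) -> list[dict[str, Any]]:
        """Pub-sub style read: return unread records and advance this subscription's cursor."""
        start = self.cursors.get(subscription, 0)
        batch = self.records[start:start + max_records]
        self.cursors[subscription] = start + len(batch)
        return batch

    def scan_sum(self, column: str) -> float:
        """Analytics style read: aggregate one column over the full history."""
        return sum(r[column] for r in self.records)


log = SingleCopyLog()
log.publish({"user": "a", "amount": 10.0})
log.publish({"user": "b", "amount": 5.5})

live = log.poll("agent-1")       # the streaming consumer receives both events
total = log.scan_sum("amount")   # the analytical scan reads the same stored records
log.publish({"user": "a", "amount": 4.5})
later = log.poll("agent-1")      # only the newly published event is delivered
```

Both read paths touch the same list, so there is nothing to copy or keep in sync between them. A real system would add durability, columnar encoding, and concurrency control, none of which this sketch attempts.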
No system today lets one physical copy of streaming data act as a lakehouse and a temporal SQL store at the same time.
Native analytics and temporal joins — instead of exporting to Iceberg and bolting on Spark.
Native streaming and pub-sub — not a batch table you still have to feed.
A true single-copy architecture — no streaming-to-warehouse ETL.
Full pub-sub messaging — not an analytics-only sink.
| Capability | StreamBricks | Kafka + Iceberg + Spark | Apache Fluss | Apache Hudi |
|---|---|---|---|---|
| Pub-sub messaging API | ● Native Pulsar | ● Kafka | — | — |
| Single physical copy | ● | — multiple copies | ● | ◐ multiple views |
| No ETL between stream & lake | ● | — | ● | — |
| Columnar storage | ● Vortex | ● Parquet | ◐ log / table | ● Parquet |
| Point-in-time query (AS OF) | ● | ● time travel | ● | ● |
| Native point-in-time joins | ◐ on roadmap | — external engine | — limited | — external engine |
| Native SQL engine | ◐ DataFusion (on roadmap) | — Spark / Trino | — uses Flink | — Spark / Trino |
| Message replay & consumer groups | ● | ● | — | — |
| Data lake / analytical scans | ● | ● | ● | ● |
● available · ◐ on roadmap · — not supported.
Read the full comparison — Fluss, Hudi, RisingWave & Databricks →
Wherever you keep a streaming copy and an analytics copy of the same data in sync, StreamBricks replaces both.
Point-in-time-correct features without leakage — straight from streaming history.
Fresh, queryable context for agents — real-time and historical, in one place.
Score and serve on live events with full history behind every decision.
Temporal joins answer "what was true at the moment of the transaction".
Stream, store, and query high-volume event data without a separate warehouse.
Unify every signal about a customer into one fresh, joinable timeline.
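For readers new to temporal joins, the sketch below shows what an AS OF (point-in-time) join computes, using plain Python. It is an illustration under stated assumptions, not StreamBricks SQL or its engine: the `as_of_join` helper, the sample `risk_scores` and `transactions` data, and the integer timestamps are all hypothetical. Each transaction is matched with the latest risk score recorded at or before the transaction time, which is what keeps future information from leaking into features.

```python
from bisect import bisect_right

# Hypothetical sample data: (event_time, value), with risk_scores sorted by event_time.
risk_scores = [(100, 0.10), (200, 0.40), (300, 0.90)]
transactions = [(150, "tx-1"), (200, "tx-2"), (50, "tx-3"), (999, "tx-4")]


def as_of_join(left, right):
    """For each left row, attach the latest right value whose time is <= the left row's time."""
    right_times = [t for t, _ in right]
    joined = []
    for ts, key in left:
        idx = bisect_right(right_times, ts) - 1        # last right row at or before ts
        value = right[idx][1] if idx >= 0 else None    # None: nothing was known yet
        joined.append((key, ts, value))
    return joined


result = as_of_join(transactions, risk_scores)
```

Here `tx-1` at time 150 is paired with the score written at time 100, and `tx-3` at time 50 gets `None` because no score existed yet. The score written at time 300 is never attached to an earlier transaction.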
Want to pilot the Streaming Lakehouse on your workload? Tell us about your use case.