StreamBricks — The Streaming Lakehouse for AI Infrastructure

One copy of data

Streaming, analytics, and SQL over a single physical copy — no second lakehouse to sync.

Built for AI

Fresh context for agents, point-in-time features for models, all on live streaming data.

Zero ETL

Data is analytics-ready the moment it is written. No Kafka→S3→Iceberg→Spark plumbing.

On Apache Pulsar

Extends the streaming platform you trust — your existing Pulsar clients keep working.

The problem

AI is starving for fresh, unified data

Your agents and models need both real-time and historical data. Today that means copying the same record across five or more systems, which is slow, expensive, and leaves the data stale by the time your AI sees it.

Today — copies on copies

Producer
→ Kafka / Pulsar
→ Object Store (S3 / HDFS)
→ Iceberg / Delta
→ Spark / Flink / Trino
→ Feature Store / Vector DB

5+ copies · ETL pipelines · high cost · stale features · two stacks to operate.

With StreamBricks — one copy

Producer
→ Pulsar
→ BookKeeper Streaming Lakehouse

↳ Pub-Sub · AI Features · SQL · Temporal Joins

1 copy · no ETL · lower cost · fresh by default · one platform.

The platform

One backbone, every way AI needs your data

The same storage layer, exposed through the interfaces your AI, data, and platform teams already use.

Real-time for Agents

Stream live context to your agents and apps the instant it happens.

Full Apache Pulsar pub-sub APIs
Existing Java / Python / Go clients
Shared & key-shared subscriptions
Low-latency event delivery
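To make the difference between shared and key-shared subscriptions concrete, here is a minimal pure-Python sketch of the two dispatch behaviours. It is an illustration only and does not use the Pulsar client: the function names, the `agent-a` / `agent-b` consumers, and the CRC32 hash are assumptions chosen for readability, and the broker's actual key-to-consumer mapping may differ. In a real deployment you would simply choose the subscription type on your Pulsar consumer.

```python
import itertools
import zlib
from collections import defaultdict


def dispatch_shared(messages, consumers):
    """Shared: spread messages round-robin, so one key may land on several consumers."""
    assignment = defaultdict(list)
    round_robin = itertools.cycle(consumers)
    for key, payload in messages:
        assignment[next(round_robin)].append((key, payload))
    return dict(assignment)


def dispatch_key_shared(messages, consumers):
    """Key-shared: every message with the same key goes to the same consumer."""
    assignment = defaultdict(list)
    for key, payload in messages:
        # CRC32 is a stand-in hash for this sketch, not the broker's real hashing.
        slot = zlib.crc32(key.encode("utf-8")) % len(consumers)
        assignment[consumers[slot]].append((key, payload))
    return dict(assignment)


# Hypothetical events, keyed by user.
events = [
    ("user-1", "login"),
    ("user-1", "add_to_cart"),
    ("user-2", "login"),
    ("user-1", "checkout"),
]
consumers = ["agent-a", "agent-b"]

shared = dispatch_shared(events, consumers)
key_shared = dispatch_key_shared(events, consumers)
```

With shared dispatch, the events for `user-1` are split across both consumers, which maximises parallelism. With key-shared dispatch, each user's events stay on one consumer in their original order, which is usually what an agent tracking per-user context wants.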

Features for Models

Build ML features and training sets directly on streaming history.

Columnar scans with page pruning
Arrow · Columnar Encoder · DataFusion engine
Point-in-time correct features
No copy into a separate store
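Page pruning is easiest to see in a small example. The sketch below is plain Python and makes no claim about the StreamBricks columnar encoder or its on-disk format: the `Page` class, the page size, and the timestamp column are all assumptions. It only shows the general idea that per-page min/max statistics let a scan skip pages that cannot contain matching rows.

```python
from dataclasses import dataclass


@dataclass
class Page:
    min_ts: int   # smallest timestamp stored in this page
    max_ts: int   # largest timestamp stored in this page
    rows: list    # list of (ts, value) tuples


def build_pages(rows, page_size):
    """Group rows into fixed-size pages and record min/max timestamp per page."""
    pages = []
    for start in range(0, len(rows), page_size):
        chunk = rows[start:start + page_size]
        timestamps = [ts for ts, _ in chunk]
        pages.append(Page(min(timestamps), max(timestamps), chunk))
    return pages


def scan(pages, lo, hi):
    """Return rows with lo <= ts <= hi, plus how many pages were actually read."""
    hits, pages_read = [], 0
    for page in pages:
        # Prune: the page's timestamp range does not overlap the query range.
        if page.max_ts < lo or page.min_ts > hi:
            continue
        pages_read += 1
        hits.extend(row for row in page.rows if lo <= row[0] <= hi)
    return hits, pages_read


# Hypothetical data: 100 rows in 10 pages of 10 rows each.
rows = [(ts, ts * 10) for ts in range(100)]
pages = build_pages(rows, page_size=10)
hits, pages_read = scan(pages, lo=42, hi=47)
```

Here the scan reads one page out of ten to return six rows. The same principle applies to any column with useful min/max statistics, not only timestamps.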

SQL & Temporal Joins

Query your streams as tables, with AS OF point-in-time queries today and native AS OF joins on the roadmap.

SQL via Arrow Flight, JDBC, REST
SELECT · JOIN · GROUP BY · WINDOW
Point-in-time (temporal) joins
Sub-second historical lookups
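For readers new to point-in-time joins, the sketch below shows the semantics in plain Python: each left-hand row is matched with the latest right-hand row for the same key whose timestamp is not later than its own. This is an illustration of the concept, not the StreamBricks engine or its SQL syntax. The `as_of_join` helper and the transaction and risk-score data are hypothetical.

```python
from bisect import bisect_right


def as_of_join(left, right):
    """Point-in-time join.

    left:  list of (ts, key, payload)
    right: list of (ts, key, value), in any order
    Returns (ts, key, payload, value) per left row, where value is the latest
    right-hand value for that key with right.ts <= left.ts, or None if absent.
    """
    # Index the right side per key, sorted by timestamp.
    history = {}
    for ts, key, value in sorted(right, key=lambda row: row[0]):
        times, values = history.setdefault(key, ([], []))
        times.append(ts)
        values.append(value)

    joined = []
    for ts, key, payload in left:
        times, values = history.get(key, ([], []))
        # Number of right-hand rows at or before the left-hand timestamp.
        idx = bisect_right(times, ts)
        joined.append((ts, key, payload, values[idx - 1] if idx else None))
    return joined


# Hypothetical data: risk scores change over time; each transaction should
# see only the score that was known when it happened.
risk_scores = [(100, "card-1", 0.10), (200, "card-1", 0.85), (150, "card-2", 0.30)]
transactions = [(120, "card-1", 25.00), (250, "card-1", 900.00), (140, "card-2", 12.50)]

result = as_of_join(transactions, risk_scores)
```

The transaction at time 120 sees the score 0.10, not the later 0.85, and the `card-2` transaction at time 140 sees no score at all because none existed yet. Never looking forward in time is what keeps features free of leakage.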

1 physical copy of your data

0 ETL pipelines to maintain

3 APIs: stream, lakehouse, SQL

∞ history for point-in-time AI

Why it is unique

The only single-copy streaming lakehouse

No system today lets one physical copy of streaming data act as a lakehouse and a temporal SQL store at the same time.

vs Kafka

Native analytics and temporal joins — instead of exporting to Iceberg and bolting on Spark.

vs Iceberg / Delta

Native streaming and pub-sub — not a batch table you still have to feed.

vs Databricks

A true single-copy architecture — no streaming-to-warehouse ETL.

vs Pinot / Druid

Full pub-sub messaging — not an analytics-only sink.

Capability	StreamBricks	Kafka + Iceberg + Spark	Apache Fluss	Apache Hudi
Pub-sub messaging API	● Native Pulsar	● Kafka	—	—
Single physical copy	●	— multiple copies	●	◐ multiple views
No ETL between stream & lake	●	—	●	—
Columnar storage	● Columnar encoder	● Parquet	◐ log / table	● Parquet
Point-in-time query (AS OF)	●	● time travel	●	●
Native point-in-time joins	◐ on roadmap	— external engine	— limited	— external engine
Native SQL engine	◐ DataFusion, roadmap	— Spark / Trino	— uses Flink	— Spark / Trino
Message replay & consumer groups	●	●	—	—
Data lake / analytical scans	●	●	●	●

● available · ◐ partial or on roadmap · — not supported
Read the full comparison — Fluss, Hudi, RisingWave & Databricks →

For AI teams

Infrastructure for AI-native companies

Wherever you keep a streaming copy and an analytics copy of the same data in sync, StreamBricks replaces both with one.

ML Feature Stores

Point-in-time-correct features without leakage — straight from streaming history.

Agent Memory & RAG

Fresh, queryable context for agents — real-time and historical, in one place.

Real-time Personalization

Score and serve on live events with full history behind every decision.

Fraud & Risk

Temporal joins answer "what was true at the moment of the transaction".

Observability & Events

Stream, store, and query high-volume event data without a separate warehouse.

Customer 360

Unify every signal about a customer into one fresh, joinable timeline.

Store it once. Power all of your AI.

Want to pilot the Streaming Lakehouse on your workload? Tell us about your use case.