⚡ AI Data Infrastructure · on Apache Pulsar

The Streaming Lakehouse that powers your AI

AI runs on fresh, unified data. StreamBricks stores it once — and serves real-time streams to your agents, point-in-time features to your models, and SQL to your analysts. One copy. No ETL. No duplicated lakehouse.

Built on Apache Pulsar & BookKeeper · Arrow · Vortex · DataFusion · founded by the creator of Apache Pulsar

Diagram: Producers → single copy (columnar pages) → Pub-Sub (agents · realtime), AI Features (point-in-time), SQL (lakehouse)

One copy of data

Streaming, analytics, and SQL over a single physical copy — no second lakehouse to sync.

Built for AI

Fresh context for agents, point-in-time features for models, all on live streaming data.

Zero ETL

Data is analytics-ready the moment it is written. No Kafka→S3→Iceberg→Spark plumbing.

On Apache Pulsar

Extends the streaming platform you trust — your existing Pulsar clients keep working.

The problem

AI is starving for fresh, unified data

Your agents and models need real-time and historical data. Today that means copying the same record across five systems — slow, expensive, and stale by the time AI sees it.

Today — copies on copies

Producer
Kafka / Pulsar
Object Store (S3 / HDFS)
Iceberg / Delta
Spark / Flink / Trino
Feature Store / Vector DB

5+ copies · ETL pipelines · high cost · stale features · two stacks to operate.

With StreamBricks — one copy

Producer
Pulsar
BookKeeper Streaming Lakehouse

Pub-Sub · AI Features · SQL · Temporal Joins

1 copy · no ETL · lower cost · fresh by default · one platform.

The platform

One backbone, every way AI needs your data

The same storage layer, exposed through the interfaces your AI, data, and platform teams already use.

Real-time for Agents

Stream live context to your agents and apps the instant it happens.

  • Full Apache Pulsar pub-sub APIs
  • Existing Java / Python / Go clients
  • Shared & key-shared subscriptions
  • Low-latency event delivery
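
As a rough illustration of the "existing clients keep working" point, here is a minimal sketch of an agent-side consumer using the standard Apache Pulsar Python client (`pulsar-client`) with a key-shared subscription. The service URL, topic, subscription name, and the JSON payload encoding are all assumptions made for the example; nothing here is specific to StreamBricks.

```python
import json


def decode_event(payload: bytes) -> dict:
    """Decode an event body. JSON encoding is an assumption of this sketch."""
    return json.loads(payload.decode("utf-8"))


def consume_context(service_url: str, topic: str, subscription: str,
                    max_messages: int) -> list:
    """Receive up to max_messages events through a key-shared subscription."""
    # Imported lazily so decode_event() is usable without the client installed.
    import pulsar

    client = pulsar.Client(service_url)
    events = []
    try:
        consumer = client.subscribe(
            topic,
            subscription,
            # Key-shared: messages with the same key go to the same consumer.
            consumer_type=pulsar.ConsumerType.KeyShared,
        )
        for _ in range(max_messages):
            msg = consumer.receive()  # blocks until a message arrives
            events.append(decode_event(msg.data()))
            consumer.acknowledge(msg)
    finally:
        client.close()
    return events
```

Against a running broker, a call might look like `consume_context("pulsar://localhost:6650", "agent-context", "agent-1", 10)`; those three values are placeholders.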

Features for Models

Build ML features and training sets directly on streaming history.

  • Columnar scans with page pruning
  • Arrow · Vortex · DataFusion engine
  • Point-in-time correct features
  • No copy into a separate store
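
To make "page pruning" concrete, here is a small sketch of the general technique: each page keeps min/max statistics, and a scan skips any page whose range cannot overlap the query. The `Page` layout and function names are hypothetical and chosen for illustration; the page does not describe how StreamBricks or Vortex actually lay out pages.

```python
from dataclasses import dataclass


@dataclass
class Page:
    min_ts: int
    max_ts: int
    rows: list  # (timestamp, value) tuples


def build_pages(rows, page_size):
    """Split timestamp-ordered rows into pages and record min/max stats."""
    pages = []
    for i in range(0, len(rows), page_size):
        chunk = rows[i:i + page_size]
        timestamps = [t for t, _ in chunk]
        pages.append(Page(min(timestamps), max(timestamps), chunk))
    return pages


def scan(pages, lo, hi):
    """Return rows with lo <= timestamp <= hi, plus the number of pages read."""
    out, pages_read = [], 0
    for page in pages:
        if page.max_ts < lo or page.min_ts > hi:
            continue  # pruned: the stats prove no row in this page can match
        pages_read += 1
        out.extend((t, v) for t, v in page.rows if lo <= t <= hi)
    return out, pages_read
```

With 100 rows in pages of 10, a query for timestamps 25 through 34 only has to read the two pages that overlap that range.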

SQL & Temporal Joins

Query your streams as tables — including native AS OF joins.

  • SQL via Arrow Flight, JDBC, REST
  • SELECT · JOIN · GROUP BY · WINDOW
  • Point-in-time (temporal) joins
  • Sub-second historical lookups
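
The semantics of an AS OF (temporal) join can be sketched in a few lines of Python: each left-side row is matched with the most recent right-side row for the same key at or before its timestamp. This is only an illustration of the join semantics, not StreamBricks SQL; the comparison on this page lists native point-in-time joins as on the roadmap, so no product syntax is assumed. The tuple layouts are made up for the example.

```python
def asof_join(left, right):
    """AS OF join of two timestamp-sorted lists.

    left:  (ts, key, payload) tuples
    right: (ts, key, value) tuples
    Each left row gets the latest right value with the same key and
    right ts <= left ts, or None if no such row exists.
    """
    latest = {}  # key -> most recent right-side value seen so far
    out = []
    j = 0
    for ts, key, payload in left:
        # Advance the right side up to and including this row's timestamp.
        while j < len(right) and right[j][0] <= ts:
            _, rkey, rval = right[j]
            latest[rkey] = rval
            j += 1
        out.append((ts, key, payload, latest.get(key)))
    return out
```

Because both inputs are consumed in timestamp order, the join makes a single pass over each side and never looks at right-side rows that arrive after the left row's timestamp.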

1 physical copy of your data
0 ETL pipelines to maintain
3 APIs: stream, lakehouse, SQL
history for point-in-time AI

Why it is unique

The only single-copy streaming lakehouse

No system today lets one physical copy of streaming data act as a lakehouse and a temporal SQL store at the same time.

vs Kafka

Native analytics and temporal joins — instead of exporting to Iceberg and bolting on Spark.

vs Iceberg / Delta

Native streaming and pub-sub — not a batch table you still have to feed.

vs Databricks

A true single-copy architecture — no streaming-to-warehouse ETL.

vs Pinot / Druid

Full pub-sub messaging — not an analytics-only sink.

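Capability: StreamBricks · Kafka + Iceberg + Spark · Apache Fluss · Apache Hudi
Pub-sub messaging API: Native Pulsar (StreamBricks) · Kafka (Kafka + Iceberg + Spark)
Single physical copy: multiple copies · multiple views
No ETL between stream & lake
Columnar storage: Vortex (StreamBricks) · Parquet (Kafka + Iceberg + Spark) · log / table (Apache Fluss) · Parquet (Apache Hudi)
Point-in-time query (AS OF): time travel
Native point-in-time joins: on roadmap (StreamBricks) · external engine (Kafka + Iceberg + Spark) · limited (Apache Fluss) · external engine (Apache Hudi)
Native SQL engine: DataFusion, roadmap (StreamBricks) · Spark / Trino (Kafka + Iceberg + Spark) · uses Flink (Apache Fluss) · Spark / Trino (Apache Hudi)
Message replay & consumer groups
Data lake / analytical scans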

● available   ◐ on roadmap   — not supported.
Read the full comparison — Fluss, Hudi, RisingWave & Databricks →

For AI teams

Infrastructure for AI-native companies

Wherever you keep a streaming copy and an analytics copy of the same data in sync, StreamBricks replaces both.

ML Feature Stores

Point-in-time-correct features without leakage — straight from streaming history.
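
"Without leakage" means a training row may only see feature values that existed at the label's timestamp. A minimal sketch of that rule, assuming a hypothetical in-memory `FeatureHistory` class (not a StreamBricks API) and events that arrive in timestamp order per entity:

```python
from bisect import bisect_right


class FeatureHistory:
    """Append-only history of one feature, keyed by entity."""

    def __init__(self):
        self._ts = {}   # entity -> sorted list of event timestamps
        self._val = {}  # entity -> values aligned with _ts

    def append(self, entity, ts, value):
        # Assumes events for an entity arrive in timestamp order.
        self._ts.setdefault(entity, []).append(ts)
        self._val.setdefault(entity, []).append(value)

    def as_of(self, entity, ts):
        """Latest value with event time <= ts, or None if none exists yet."""
        times = self._ts.get(entity, [])
        i = bisect_right(times, ts)
        return self._val[entity][i - 1] if i else None


def build_training_rows(history, labels):
    """Pair each (entity, label_ts, label) with the feature as of label_ts."""
    return [(e, ts, history.as_of(e, ts), y) for e, ts, y in labels]
```

A label at time 25 is paired with the value written at time 20, never the one written at time 30, which is what keeps future information out of the training set.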

Agent Memory & RAG

Fresh, queryable context for agents — real-time and historical, in one place.

Real-time Personalization

Score and serve on live events with full history behind every decision.

Fraud & Risk

Temporal joins answer "what was true at the moment of the transaction".

Observability & Events

Stream, store, and query high-volume event data without a separate warehouse.

Customer 360

Unify every signal about a customer into one fresh, joinable timeline.

Store it once. Power all of your AI.

Want to pilot the Streaming Lakehouse on your workload? Tell us about your use case.

Email us directly