Introducing Materialized Views on Data Streams with the Eventador Platform 2.0

March 18, 2020 in Continuous SQL



Introducing Materialized Views on Data Streams with the Eventador Platform 2.0

With the release of the Eventador Platform v2.0, we are introducing a significant new capability – the creation of materialized views against streams of data. This new capability massively enhances our existing Continuous SQL engine and closes the gap between streaming platforms like Apache Kafka and the developers and data scientists building applications. Now, entire end-to-end data pipelines can be created and the results of Continuous SQL are fully materialized and presented as queryable RESTful endpoints.

Eventador Platform 2.0 replaces the need to create/maintain databases, write complicated consumer code and use rickety half-baked connectors. It also replaces the need to build services that provide for the maintenance, indexing, and storage management of views. It’s all done automatically in the background. You can now directly use streaming data from applications like maps, charts, graphs, notebooks, ML models and in your own programs. It’s designed to be high performance, scalable, and reliable.

Similar to how traditional databases have worked for decades, materialized views are continuously updated and always represent the latest state by a given key. When you create a view you define the creation query in SQL, define a primary key, one or more RESTful endpoints and some basic rules about cleanup, null handling, etc. The resulting view is a picture of events at a given point in time – something typically missing or expensive to create on your own. You can query the view by any key, return one or more results, utilize operators, and pass in parameters with the query – all important features for modern developers.

Why create materialized views on streams?

Your streaming architecture might look something like this:

  • Data is produced to Kafka.
  • Custom consumer/aggregation code is written and deployed as a Microservice.
  • Data is persisted from this service into a database. Custom code or connector deployed/managed for managing changes to the data.
  • Custom reader Microservice code is deployed to query the database using traditional methods and making data available via HTTPS. Load balancing, Microservice management/etc is required.
Kafka --> Consumer/Aggregator --> Persistent Storage <-- Query <-- Microservice/HTTPS   <-- application
(stream)      (Microservice)           (Database)         (SQL)         (webserver)    (javascript/notebook/etc)

With the Eventador Platform 2.0, pipelines are much simpler and entirely managed:

  • Data is produced to Kafka.
  • Continuous SQL queries are run on the stream of data to aggregate, join, filter, normalize, mutate, etc the data and materialized into views.
  • Durable REST endpoints that return JSON data are created for curated query patterns that are utilized directly in applications.
Kafka --> Continuous SQL --> Persistent Storage <-- Query <-- application
(stream)                       (Materialized View)    (REST)  (javascript/notebook/etc)
        ^------------- Eventador Platform 2.0 ---------------^

Building applications with Materialized Views

As we’ve mentioned, Materialized Views allow developers, data engineers, data scientists to consume streaming data easily and quickly. Because you can self-serve you no longer need to engage other teams or write web services yourself. There are just a few steps to creating a Materialized View on Eventador:

  1. Using SQLStreamBuilder, create a SQL statement to represent your view. The FROM clause is the topic you want to read, and associated schema, you define the level of parallelism, number of workers, etc.
  2. Select “Create Materialized View”, name it, and assign various properties as needed. Including NULL handling, and retention. This gets deployed as a distributed job using Continuous SQL to populate your view.
  3. A RESTful endpoint is created, and data is immediately available for queries to pull data into your project or application.

An Example

The easiest way to understand why materialized views on streams are such a game-changer is an example. So let’s take an example where we want to plot locations on a map. In this case, we want to pull locations of aircraft via ADSB (see our blog post). What makes this data stream interesting, and a bit tricky is data doesn’t come in fully formed. Each message is only a fraction of the entire picture – one message may have altitude while another has the GPS location – for the same key (tailnumber). If we just plotted raw messages, our plane icons would blink with a new partial data fragment each time an update to a message showed up, and as humans reading this data, that would be highly annoying. So we must aggregate this data, and create a picture of the current state to make our map. Materialized Views to the rescue!

Our ADSB source data would look something like this for a single tailnumber:

{"icao": "AB2404", "altitude": "37025"},
{"icao": "AB2404", "lat": 37.529572, "lon": -122.265806},
{"icao": "AB2404", "altitude": "37095"},
{"icao": "AB2404", "altitude": "37125"},
{"icao": "AB2404", "lat": 37.530032, "lon": -122.265130},
...
view raw adsb.json hosted with ❤ by GitHub
1. Create the view

The first step is to define a materialized view on the stream so we can see the current state of the aircraft. In the Eventador SQLStreamBuilder interface, we would create a simple query like this:

SELECT icao, altitude, lat, lon
FROM airplanes
WHERE altitude IS NOT NULL
AND speed > 100
AND altitude > 1000;

We would configure a Materialized View on this SQL, in our case we want to ensure we ignore NULL updates (default). We execute the job and it is deployed out to our cluster. It should be noted, this query is running continuously (hence the moniker Continuous SQL). It’s processing the results of the boundless stream of data and keeping the materialized view up-to-date with the latest state of the stream.

2. Query the view from our app

Next, you can Copy/Paste the URL for the RESTful endpoint into your client code. This could be any code – most modern programming languages support REST and JSON to some degree, and it’s especially useful in Javascript applications and data science tools like notebooks.

The materialized view provides the state picture of the stream – We now have the latest position and altitude of the aircraft by tail number. The data is returned as a JSON array of objects via RESTful query:

$>curl -s https://snapper-83a18d57.ae43652a8a6928e.....
[
    {"icao": "AB2404", "altitude": "37125", "lat": 37.530032, "lon": -122.265130},
    ...
]

You can use this information to plot a map using something like Mapbox:

Or maybe do some analysis in a Jupyter notebook:

Where to go from here

We hope this new functionality breaks down barriers for building streaming data applications, allows data scientists to unlock new insights, and helps companies remove much of the brittle complexity in managing. If you want to give it a try sign up for a free trial and give us feedback, we love to hear what folks are cooking up.

Or, if you want to learn more, check out our first Eventador Streams podcast where we dive into why materialized views on streams are critical for businesses, and how v2.0 can unlock that capability.

    1. Bowen,

      Thanks for the comment. The underlaying storage engine is PostgreSQL. We will detail how we use it, and our underlaying MV maintenance service in a deep dive in a follow up post.


Leave a Reply

Your email address will not be published. Required fields are marked *