SQLStreamBuilder Feature Update

July 30, 2019 in Apache Kafka and SQL




We are rapidly adding features and iterating on SQLStreamBuilder, so I thought I would share a couple of improvements and features we recently added. If you aren’t familiar with it, SQLStreamBuilder lets you run streaming SQL jobs against streams of data, initially focused on Apache Kafka. For some background and more details, check out my blog post introducing SQLStreamBuilder.

Confluent Cloud Kafka Support

In short, we added the ability to connect to Confluent Cloud via SASL/SSL. Grab your SASL username and password from the Confluent Cloud console, paste them into SQLStreamBuilder, and you can then create new virtual tables from any Kafka topic. Traffic goes over public interfaces but is encrypted via SSL.
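Behind the scenes, this is just the standard Kafka SASL_SSL client configuration. As a rough sketch, the equivalent Java-client properties look something like the following, with placeholder values for the bootstrap servers and credentials (SQLStreamBuilder itself only asks for the username and password from the console):

# Standard Kafka client settings for Confluent Cloud over SASL/SSL.
# The bootstrap.servers value, API key, and API secret are placeholders.
bootstrap.servers=pkc-XXXXX.us-east-1.aws.confluent.cloud:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="<CLUSTER_API_KEY>" \
  password="<CLUSTER_API_SECRET>";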

Note: This method also works with any cloud provider or Kafka cluster with public interfaces and SASL/SSL support.

SQLStreamBuilder: Rich streaming SQL engine that works with any Kafka provider

AWS Managed Streaming for Kafka (MSK) Support

We also added the ability to connect to AWS MSK. Connecting works slightly differently: MSK doesn’t expose public interfaces, so you must peer with the VPC that hosts your MSK cluster.

Eventador supports two models: one where we own the VPC being peered, and one where we deploy SQLStreamBuilder into your account (or a sub-account) and peer with the VPC that hosts Kafka. Either way, ping support during Early Access and we will get you set up; it’s a one-time thing.

Once peered, you can create virtual tables against MSK topics like any other Kafka cluster.
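For example (the virtual table and column names below are hypothetical), a continuous query against an MSK-backed virtual table looks exactly like one against any other Kafka-backed table:

SELECT sensor_id, reading
FROM msk_sensor_readings
WHERE reading > 100;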

For both Confluent Cloud and AWS MSK, check out the documentation for more details on connecting and running queries directly against these data streams.

Hyper-joins

SQLStreamBuilder has the concept of virtual tables. A virtual table is simply a named link to a data source. Virtual tables can act as a source or a sink for data, and they can live on different systems and be of different types (say Kafka or RDS). When you issue a SQL join, you are joining virtual tables.
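As a quick sketch (the table and column names here are hypothetical), a job reads from a source virtual table and its results flow to whichever sink virtual table the job is configured to write to, whether that sink is another Kafka topic or something like RDS:

-- Source: a Kafka-backed virtual table (hypothetical name).
-- Results flow to the sink virtual table configured for the job.
SELECT userid, auth_amount * 100 AS auth_amount_cents
FROM kafka_payments
WHERE auth_amount > 0;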

SQL joins on streams (Kafka topics, etc.) work like you would expect, but because every table referenced in your SQL statement is a virtual table, joins are hyper-powerful. You can query multiple different sources in one statement, and the results are sent straight to your defined sink virtual table. So imagine:

  • Joining data across two different cloud providers or on-prem clusters, or migrating data between them.
  • Joining two different types of data sources, maybe Kafka and RDS.
  • Distributed ETL – pulling data from many clusters to form a unified picture.

For example, something like this:

SELECT p.auth_amount
FROM payments_from_POS_MQTT p
JOIN payments_bad_actors_from_antifraud_cluster b
ON p.userid = b.userid;

Or:

SELECT a.*, b.*
FROM confluent_cloud a
JOIN on_prem_kafka_cluster b
ON a.sensor_id = b.sensor_id;

Or maybe even:

SELECT us_west.user_score + ap_south.user_score
FROM kafka_in_zone_us_west us_west
FULL OUTER JOIN kafka_in_zone_ap_south ap_south
ON us_west.user_id = ap_south.user_id;

The possibilities are hyper-powerful.

Get started with an interactive streaming SQL engine for Apache Kafka and streamlined Apache Flink management.
