We are rapidly adding features and iterating on SQLStreamBuilder, so I thought I would share a couple of improvements and features we recently added. If you aren't familiar with it, SQLStreamBuilder allows you to run streaming SQL jobs against streams of data, initially focused on Apache Kafka. For some background and more details, check out my blog post introducing SQLStreamBuilder.
Confluent Cloud Kafka Support
In short, we added the ability to connect to Confluent Cloud via SASL/SSL. Grab your SASL username and password from the Confluent Cloud console and paste them into SQLStreamBuilder. You can then create new virtual tables from any Kafka topic. Traffic goes over public interfaces but is encrypted via SSL.
Note: This method also works with any cloud provider or Kafka cluster with public interfaces and SASL/SSL support.
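Under the hood, these are the standard Kafka client settings for SASL/SSL. As a reference sketch (the bootstrap servers, API key, and secret below are placeholders, and SQLStreamBuilder collects these through its UI rather than a properties file):

```properties
# Standard Kafka client configuration for SASL/SSL against Confluent Cloud.
# All values below are placeholders; substitute your own cluster details.
bootstrap.servers=<YOUR_CLUSTER>.confluent.cloud:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="<CLUSTER_API_KEY>" \
  password="<CLUSTER_API_SECRET>";
```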
AWS Managed Streaming for Kafka (MSK) Support
We also added the ability to connect to AWS MSK. Connecting works slightly differently: MSK doesn't expose public interfaces, so in order to connect, you must peer with the VPC that hosts your MSK cluster.
Eventador allows for two different models: one where we own the VPC being peered, and one where we instantiate SQLStreamBuilder into your account (or a sub-account) and peer to the VPC with Kafka. Either way, during Early Access just ping support and we will get you set up. It's a one-time thing.
Once peered, you can create virtual tables against MSK topics like any other Kafka cluster.
For both Confluent Cloud and AWS MSK, check out the documentation for details on connecting and performing queries directly against these data streams.
SQL Joins on Virtual Tables
SQLStreamBuilder has the concept of virtual tables. These are simply named links to a data source. A virtual table can be a source or a sink for data, and virtual tables can live on different systems and be of different types (say Kafka or RDS). When you issue a SQL join, you are joining virtual tables.
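As a minimal sketch, assume a virtual table named clickstream (a hypothetical name for illustration) has already been defined against a Kafka topic. Querying it looks like any other SQL, except the query runs continuously against the stream:

```sql
-- 'clickstream' and its columns are hypothetical; results update
-- continuously as new events arrive on the underlying topic.
SELECT url, COUNT(*) AS hits
FROM clickstream
GROUP BY url;
```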
The SQL join works like you would expect, but because the tables in your SQL statement are virtual tables, joins are hyper-powerful: you can query multiple different sources in one statement, and the results are sent straight to your defined sink virtual table. So imagine:
- Joining data from two different cloud providers, on-prem, or migrating between them.
- Joining two different types of data sources, maybe Kafka and RDS.
- Distributed ETL – pulling data from many clusters to form a unified picture.
For example, something like this:
SELECT p.auth_amount FROM payments_from_POS_MQTT p JOIN payments_bad_actors_from_antifraud_cluster b ON p.userid = b.userid;
SELECT a.*, b.* FROM confluent_cloud a JOIN on_prem_kafka_cluster b ON a.sensor_id = b.sensor_id;
Or maybe even:
SELECT us_west.user_score + ap_south.user_score FROM kafka_in_zone_us_west us_west FULL OUTER JOIN kafka_in_zone_ap_south ap_south ON us_west.user_id = ap_south.user_id;
The possibilities are nearly endless.