SQLStreamBuilder October Feature Update

October 31, 2019 in SQLStreamBuilder




It’s time for some updates! SQLStreamBuilder remains a huge focus of our team as we build out new features and gather learnings from customers.

Our mission and vision have remained clear—to build the best way for you to create and manage stream processing jobs using SQL, so you can work with your Kafka clusters, databases, and processing logic like the databases you know and love.

Try the Eventador Platform on Eventador.cloud

We just released the ability for you to try both SQLStreamBuilder and Runtime for Flink on AWS. Go to https://eventador.cloud/register to sign up. You will be asked for your AWS IAM keys, and assets will be built in your account. Jump over to the documentation to get started. You get $200 of free credits, so you have plenty of time to evaluate and learn what Eventador can do for you.

You will also notice a new user interface and design aimed at making things easier to navigate. You can use SQLStreamBuilder to write and launch streaming SQL jobs, or you can import and run your own Java/Scala jobs using Runtime for Flink.

Of course, if you need help, have questions or suggestions, please ping us!

AWS S3 sink

SQLStreamBuilder now has the ability to use AWS S3 as a sink. Both JSON and CSV formats are supported, and you can configure how the files are chunked right in the UI. This is great for integration into other components and ecosystems.

Amazon S3 Sink with Eventador

You can also save money by pre-aggregating data before you load it into Snowflake. For instance, you can load Snowflake using a COPY statement, or wire it up via Snowpipe.

Say you create an aggregation over 10-minute windows, like:

SELECT userid, sum(clickcount) AS clicks, TUMBLE_END(eventTimestamp, INTERVAL '10' MINUTE) AS ts
FROM clickstream
GROUP BY userid, TUMBLE(eventTimestamp, INTERVAL '10' MINUTE)

Choose the S3 sink and CSV format, and you can load the results into Snowflake. From the Snowflake docs:

copy into mytable
  from s3://mybucket/data/files credentials=(aws_key_id='$AWS_ACCESS_KEY_ID' aws_secret_key='$AWS_SECRET_ACCESS_KEY')
  file_format = (format_name = my_csv_format);
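
Or, if you prefer continuous loading with Snowpipe, the same S3 output can feed a pipe. Here is a minimal sketch, assuming a hypothetical stage and pipe (my_s3_stage, my_snowpipe) over the same bucket, reusing the my_csv_format file format; auto_ingest = true also relies on S3 event notifications being configured for the bucket:

-- external stage over the bucket SQLStreamBuilder writes to
create or replace stage my_s3_stage
  url = 's3://mybucket/data/files'
  credentials = (aws_key_id='$AWS_ACCESS_KEY_ID' aws_secret_key='$AWS_SECRET_ACCESS_KEY');

-- pipe that auto-ingests new files as they land in the stage
create or replace pipe my_snowpipe auto_ingest = true as
  copy into mytable
  from @my_s3_stage
  file_format = (format_name = my_csv_format);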

You can also declaratively route data to different S3 buckets based on values in the data. This allows you to create entire systems for processing data conditionally. For instance:

  • Choose the NA bucket sink and execute:
SELECT userid, sum(clickcount) AS clicks, TUMBLE_END(eventTimestamp, INTERVAL '10' MINUTE) AS ts
FROM clickstream
WHERE continent = 'NORTH_AMERICA'
GROUP BY userid, TUMBLE(eventTimestamp, INTERVAL '10' MINUTE)
  • Then choose the AUS bucket sink and execute:
SELECT userid, sum(clickcount) AS clicks, TUMBLE_END(eventTimestamp, INTERVAL '10' MINUTE) AS ts
FROM clickstream
WHERE continent = 'AUSTRALIA'
GROUP BY userid, TUMBLE(eventTimestamp, INTERVAL '10' MINUTE)

Kafka key management

Partition keys are an important construct in Kafka when you want to use a partitioning scheme other than the default round-robin approach. We have therefore added the ability to specify a key when SQLStreamBuilder sends the output of your query to a Kafka sink. It's easy to do in SQLStreamBuilder:

-- specify ICAO as key on output topic
SELECT icao, flight, lat, lon, icao AS _eventKey,
PLANELOOKUP(icao) as aircraft_type
FROM airplanes

In this case, the producer uses _eventKey as the partition key when sending messages to the Kafka topic; here, _eventKey is the value of the icao column.

Browser only (no sink)

This feature is just what it sounds like: a sink-less sink. Sometimes you just want to reason about your data, make decisions, and iterate on crafting the perfect SQL statement for your use case—without actually putting the results into any sink until you are ready. The Browser Only Sink does just this. Data is sampled to the screen where you can continue to iterate, and when you are ready to spool the results to a sink, select one of your virtual table sinks.
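
For example, while iterating you might sample a quick filter to the screen before picking a sink (a sketch using the clickstream table from the examples above):

-- sample to the browser only; no sink selected yet
SELECT userid, clickcount, continent
FROM clickstream
WHERE clickcount > 100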

We continue to iterate on SQLStreamBuilder and the Eventador Platform, so give these features a try and let us know how you like them!

Get started with an interactive streaming SQL engine for Apache Kafka and streamlined Apache Flink management.
