Podcast: Getting More Out of Streaming SQL with the Blink Planner in Apache Flink

August 6, 2020 in Eventador Streams Podcast




In this episode of the Eventador Streams podcast, Kenny and I chatted about the Blink planner and all the fun and massively useful SQL functions that it unlocks for streaming pipelines.

Learn more about the Blink planner, Continuous SQL, and just how they can help you get more—easier—from your real-time data in this episode of Eventador Streams.

Eventador Streams · Getting More Out of Streaming SQL with the Blink Planner in Apache Flink

Want to make sure you never miss an episode? You can follow Eventador Streams on a variety of platforms including SoundCloud, Apple Podcasts, Google Play, Spotify, and more. Happy listening!

Episode 12 Transcript: Getting More Out of Streaming SQL with the Blink Planner in Apache Flink

Leslie Denson: These days, if you’re talking about stream processing, SQL is pretty much always a part of the conversation. And if you’re talking about Flink and SQL, the Blink Planner is also always pretty top of mind. So, Kenny and I sat down to chat about the Blink Planner and just how much it’s helped move streaming SQL forward during this episode of Eventador Streams, a podcast about all things streaming data.

LD: Hey, everybody, welcome back to another episode of Eventador Streams. Kenny and I are here today, and we are gonna talk about a topic that is very near and dear to our hearts here at Eventador. And if you guys know anything about us, you’ll know that. And also something that is becoming more and more prevalent and near and dear to everybody’s hearts, ’cause they’re using Apache Flink, and that would be SQL, or as we like to call it continuous SQL. So, we’re excited to get started today. How’s it going, Kenny?

Kenny Gorman: Good. How are you?

LD: Good. We are officially recording this one in the morning, so instead of asking if you have a beer in your hand, are you good with coffee?

KG: I am caffeinated, yes. Since we’ve started podcasting, I’ve been through two coffee makers, so hopefully we’re in a good spot now.

LD: Is that because of the podcast or…

KG: Yeah. I don’t know. I’ll leave that up to the listener, I suppose.

LD: Awesome. Well, let’s just dive right in. And I think one of the reasons why this topic came up and we wanted to talk about it is in the conversations that we’ve been having lately. And especially with everything that we’re hearing with Flink and all the stuff that the community is doing, the Blink Planner keeps coming up more and more, and we’re hearing more and more about it. And I’ll be the first one to say that, as much as I understand a lot of things around this, the Blink Planner is just not something I think that I have dove super into. And I’m willing to bet, while many of our listeners know about it, some may not totally understand it. So, why don’t you talk to us a little bit about the Blink Planner and why it matters?

KG: Yeah, you’re right. I could totally imagine someone coming in who’s not neck-deep in the Flink community or stream processing community, maybe they’re interested in SQL or whatever, and we say something like the Blink Planner, they’re gonna… What the hell is that? And where does that fit in?

LD: What is going on?

KG: Yeah. I’m not the authoritative source here, but I’ll give you, maybe our listeners, a little bit of history, a little bit of what is the Blink Planner, where it came from, why it matters. Maybe we’ll talk through that a little bit.

LD: Yeah.

KG: Ultimately, if you’re coming from a SQL world, you can think of the Blink Planner as like a SQL optimization engine, but that’s a really, really broad definition. In the stream processing world, there’s a lot more that goes on that’s a lot different than in a traditional database system. But ultimately, the Blink Planner, the planner itself… There’s a lot of changes with Blink, and I guess I’ll talk about that here in a sec. The Blink Planner component itself is mostly focused on the SQL implementation. So, the way this whole thing came about, I’ll just tell you in story format, is Alibaba was using Flink and ultimately forked it for their needs, for their specific needs around stream processing. And they were using it at scale, obviously, and they needed to do computations. They wanted to use SQL, they needed some performance improvements and various things. Ultimately, they forked it and called it Blink. So, that was essentially a fork of Flink called Blink, and that was in use at Alibaba.

KG: And a couple of things. It had some interesting changes in and of itself, things like the way they deployed clusters was a little bit different and more optimized around performance. Things like incremental checkpoints were a little bit different for recoverability reasons, and just being a little bit more efficient with state management. They did some cool work around Async IO. Going back to database days, obviously Async IO, when that started happening in relational databases, that was a big deal. And that was a long time ago, so it’s interesting to see that come up in the stream processing world. But none of those things are really SQL things. Those are all just part and parcel of why they had… Well, why they forked it and the performance pieces of it.

KG: The Blink Planner was a little bit different. It’s interesting. Looking forward when we think about SQL, it’s so cool to see… Look, SQL is a declarative language, it’s super easy to describe data, it’s super easy to work with it, and SQL is awesome. And it’s cool to see that Alibaba… This is going back to, let’s call it 2016 at this point, was really hip on SQL as well, and realized the value for developers. It’s not even… Some people say, “Oh, it’s for business folks,” or whatever. No, not necessarily. It’s just so convenient, that’s the thing. And if you have to write a stream processing job, it’s just super nice to be able to describe your data, write SQL, and get done with your task. Anyway, that’s part of, I think, where the mindset was around SQL for them as well, it sounds like.

KG: So, that’s a little bit of that background. The SQL improvements and changes that they had focused around things like different UDFs, how to join better, better query optimization, they had this notion of dynamic tables, things like that. And it was all around getting to this point of machine learning. They did real-time machine learning, they wanted to do accounting, summation and various statistics at scale, and very big scale. And so building the components out of Flink made a ton of sense, but SQL needed to have some robust operators and robust capabilities.

KG: I remember back in those days, it was like, “Oh, yeah, the SQL engine that’s just completely on steroids is the Blink SQL engine.” And that was the one that we were kind of tracking. We were like, “Okay, those guys are really deep into SQL, but it’s a fork of Flink.” That created a problem essentially, ’cause if you were on Blink, cool. If you’re on Flink, you’re like, “Well, can I have that stuff?” And the answer was no, until 1.9. Let’s see. So, 1.9 was somewhere… Was that August-ish when that came out?

LD: That sounds about right.

KG: And the story there was, obviously they have to merge the two things. Alibaba bought Data Artisans, we all know that, and then the writing was on the wall. These two things are gonna become one, but how long is it gonna take, and when is that gonna happen?

LD: Right.

KG: And so roughly August of 2019, I suppose, if I’m getting my dates right, 1.9 came out, and 1.9 was huge. 1.9 was unbelievable in the sense that now you had the combination of the Blink benefits with just mainline Flink. And they did some cool things, like if you look into the FLIPs and look at how the architecture changed. They realized that… Okay, in Flink, you have the DataSet API, which is batch, and the DataStream API, which is streaming. And those two APIs weren’t congruent. They worked in different ways. You had to declare which one you were using. And if you wrote it in one way and you wanted to write it… If you wrote a batch and you were like, “Oh, I have to do something in streaming,” you’d have to essentially rewrite code. And while it was awesome to have both capabilities on tap, they weren’t really unified.

KG: Just to point out the 95th percentile use case I think we see everywhere is, “Hey, I have this stream of stuff and I need to join it with this historical table. Can I just do that simply in SQL?” And the answer is, “You know that’s really hard, right?” ‘Cause of state management and retractions. And the idea of taking something that’s a batch-oriented structure that’s updated over a large timeframe, and then take a stream of data and then continuously return results against those two, it’s very hard. That is a hard data management problem. And to do it at scale and to do it with consistency, it’s tip of the spear computer science. That’s crazy, and it’s hard. So, what they’ve done is realize, A, “That’s hard and we’re gonna work towards solving it,” which I think is killer, but that required some changes, like, we can’t really have a DataSet API anymore.

KG: So, what it looks like, the latest and greatest changes, at least at 1.9, was, “Look, let’s just get rid of the DataSet paradigm, it’s all gonna be DataStream.” And so, that required some changes obviously to the underlying code. And so that’s cool. Now, you have this kind of more simplistic architecture. And batch is still there, it’s just you’re accessing it through the DataStream API, and some of the Blink changes make that possible. And at that time, 1.9, you had your choice. You could choose the Flink Planner or the Blink Planner. And if you chose the Blink, you, of course, get the Blink features. If you don’t, you don’t. And that was done for legacy reasons. But that was cool, ’cause you got the richer, built-in functions that we would expect from SQL, like INITCAP and some of these string processing capabilities. These are just off the top of my head, RANK and DENSE_RANK and things like that.

KG: And so things that honestly are… CONCAT, I think, is another one of them. I’m going off the top of my head. Maybe our listeners can correct me if I’m getting some of these wrong. But they’re important, ’cause if you’re a SQL nerd and you’re messing with SQL, then these functions and capabilities are… You’re used to typing them, you use them all the time. I use concatenation all the time to create composite primary keys. Username… I’m just making this up… like username plus last name. Well, Kenny is not super unique, Gorman is not super unique, but Kenny Gorman might be better. If I can concatenate the two and maybe my phone number or something, that’s even better, if I don’t have an email or whatever. That kind of trickery and horseplay happens all the time in SQL. It’s just part of why you’d even wanna use SQL. And having these simple functions to operate, and the strings and comparison operators that are a little bit more robust, and especially set operators like RANK, and I think CUBE’s in there, and a few other things, are killer.
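
To make the composite-key idea concrete, here is a minimal sketch in streaming SQL, assuming a hypothetical `users` stream with `first_name`, `last_name`, and `phone` columns (the table and column names are made up for illustration):

```sql
-- Minimal sketch: build a composite key from several columns.
-- `users`, `first_name`, `last_name`, and `phone` are hypothetical names.
SELECT
  CONCAT(LOWER(last_name), '-', LOWER(first_name), '-', phone) AS composite_key,
  INITCAP(first_name) AS first_name_display
FROM users;
```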

LD: Let me ask this question. I know solely because to some degree I know our product back and forth, I know we use Calcite SQL, so it’s ANSI SQL, it’s the SQL most people know. I know there are other folks out there who also, in their streaming SQL product, use ANSI SQL. There are some who use a more SQL-like language where it’s SQL but there’s definite differences that may trip some people up or once you learn ’em… You just kinda have to learn something new. What is this?

KG: Yeah. Man, that’s a good question. Here’s the thing, here’s the reality of it. We say that, ANSI SQL. Having done SQL for 20 years, there is no real ANSI SQL. I mean, there is an ANSI SQL definition. Everybody says it’s ANSI SQL-compliant, but it’s not really that. It’s that there’s functions that are above and beyond what ANSI SQL specifies. When we say “Calcite,” we mean the streaming part of Calcite that Flink supports, so there’s a subset or superset there. It’s super messy to actually define that, to be honest. In a marketing brochure, of course, we talk about the commonalities and all that stuff, ’cause it’s important. But when you’re writing SQL, the way you write SQL is you’ll probably try… This is just the reality of it. You’ll probably try and write what you know. If you’ve been using MySQL for years, you’ll write MySQL, and it’ll either work or it won’t, and you’ll be like, “Oh, crap, is that how I join? Hold on, let me check the docs.” This is how it’s been done for ages. And you go, “Oh, right. So, when you join in this, you say join on versus a comma between the tables. Okay.”

KG: So, even back in the day, Oracle had different join syntax than MySQL or Postgres, and you just got used to the idea that, like, “Okay, these differ slightly, but the concepts and the way the results are returned are always the same, it’s just the grammar is slightly different.” And so you just learn that over time. You learn that, “Oh, I’m switching to a different system. I’m just gonna muck with this grammar. Okay, what is it for this thing? Damn it. Alright, here we go.” And that’s just normal. That’s just the normal part and parcel for doing SQL.

KG: I think the Blink additions, especially in functions, are well-known. If you’ve used a database for a while, you’ll know what INITCAP is, you’ll know what CONCAT is. Those are well-known… CONCAT just takes two inputs and it concats them together and returns a single string. Things like that. INITCAP is initial capitalization. These have been around for a long time, and that’s kind of just how it is. The Calcite thing is tricky, and you bring that up and you’re like, “Yeah, well, it’s just Calcite.” True. But it is only a subset of Calcite in the part of it that’s streaming. And even then, the best and most authoritative source is really the Flink docs, ’cause the implementation of the Calcite SQL can be different based on the fact that it’s being implemented in Flink, not in some other system.

LD: Okay. Yeah.

KG: That’s a good question, yeah.

LD: That makes a lot of sense. I occasionally have something deep…

KG: I realized that I’m talking for 10 minutes straight, I’m tired now.

LD: You’re fine. In all of that, and with what we’re talking about, the way that we think of SQL is, as people have heard us say, continuous SQL, because it’s not something where you go to a database, you run a query, you get a response, you’re done. This is continuously running. So, with the Blink Planner and the functionality that has come along with it and the updates that have come along with that, what… From a continuous SQL capability, whether it’s us or if there’s anybody running SQL through Flink…

KG: Right.

LD: What kind of capabilities are now unlocked that maybe you couldn’t do before?

KG: Right now, Eventador supports Flink 1.10.1, I believe, as of today. That could change tomorrow. 1.11 is out, so there’s that. In 1.9, like I said, the Blink Planner was optional, you would specify it. In 1.10, it is the default. And so you’re kind of seeing this over time, it will be the… I’m sure it will be the only planner, and I think that’s a good thing. In Eventador, when you use Eventador today, if you create a SQL StreamBuilder cluster today, you get the Blink Planner in 1.10.1, I believe. That’s as of today. That’s the latest and greatest. You can just immediately start writing SQL with Blink SQL, essentially. So, list aggregation, last… Oh, LAST_VALUE. I forget about LAST_VALUE. That’s a big one.

KG: LAST_VALUE is essentially saying, “Hey, take this column and… ” LAST_VALUE is a function. It takes an expression. And you can send it, basically, the result of your query. Like, say, over a window operator, it will return the last value of that particular expression or column, which is super powerful. So, if you’re taking… I’m just gonna make up… I’m just shooting on the fly here, so this is probably a bad example. But if you’re taking airplane altitudes and you’re trying to window them over the last 10 minutes, if you average the altitude over 10 minutes, that would probably be pretty bad. It may go up, it may go down, but average isn’t… That might even not… If you average altitude, it might never have actually been at that altitude. Does that make sense?

LD: Yep.

KG: ‘Cause you’re just averaging… You lose precision.

LD: Right.

KG: LAST_VALUE says, “Hey, I got done with this window. What’s the value right now? What’s the latest value?” If it was descending, it would be the lowest altitude in that window operator or whatever. Super powerful function. This first and last, there’s things like… Like I said, RANK, RANK is cool because RANK basically allows you to take a set, and that set of data then you can score it amongst itself. In a streaming context… You’ll notice in every one of these things I say, window, right? ‘Cause continuous SQL requires time, some notion of time. ‘Cause if you don’t have a notion of time, it never finishes. Right? So, Select Star from table doesn’t ever finish in continuous SQL. It just goes forever. You gotta give it a window operator. You’re gonna say “Select Star from table for,” whatever, “the next five minutes, and then tell me what happened.”
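
As a rough illustration of LAST_VALUE combined with a window operator, here is a minimal sketch, assuming a hypothetical `planes` stream with `icao`, `altitude`, and a rowtime attribute `event_time` (all names are made up for illustration; check the Flink docs for the exact built-ins in your version):

```sql
-- Minimal sketch: last reported altitude per plane per 10-minute window.
-- `planes`, `icao`, `altitude`, and `event_time` are hypothetical names;
-- `event_time` is assumed to be declared as a rowtime attribute.
SELECT
  icao,
  TUMBLE_END(event_time, INTERVAL '10' MINUTE) AS window_end,
  LAST_VALUE(altitude) AS last_altitude,
  AVG(altitude)        AS avg_altitude  -- for comparison: averaging altitude can mislead, as discussed above
FROM planes
GROUP BY icao, TUMBLE(event_time, INTERVAL '10' MINUTE);
```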

KG: So, RANK, you can give it a dataset and you can rank those results. So, if you wanted to rank… If you wanted to say, “Give me the top five scores in this game data feed, this game stream,” that would be perfect. RANK is perfect for that. Things like that. Those are all the new SQL capabilities that… And those are all available in SQL StreamBuilder today. They’re available in Flink 1.9, 1.10, 1.11. The big difference using SQL StreamBuilder versus just Flink is it’s interactive. You can just actually play with that SQL, it’s a console, you can get feedback immediately, you get data back immediately, and you can pick your sources and sinks, and just kind of architect and creatively build your pipeline however you want. So, instead of crafting a low-level job, writing it in Scala but putting SQL into the job, or using something like the Flink SQL terminal, which is cool but relatively rudimentary, you get everything from soup to nuts with Eventador. And so we support the Blink Planner now, and that’s been great.
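
For the top-five-scores example, Flink’s documented Top-N pattern uses a ranking function over a partitioned OVER window. Here is a minimal sketch, assuming a hypothetical `game_scores` stream with `game_id`, `player`, and `score` columns (ROW_NUMBER is shown; RANK and DENSE_RANK follow the same shape):

```sql
-- Minimal sketch: top five scores per game from a stream.
-- `game_scores`, `game_id`, `player`, and `score` are hypothetical names.
SELECT game_id, player, score
FROM (
  SELECT
    game_id, player, score,
    ROW_NUMBER() OVER (PARTITION BY game_id ORDER BY score DESC) AS row_num
  FROM game_scores
)
WHERE row_num <= 5;
```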

KG: For me, the journey was… We use a lot of examples all the time to show this stuff. And we talk about airplane altitudes. We have ADS-B examples out there. We always talk about this fraud example. We have a bunch of these known hypothetical use cases. And so many of those use cases were just so unlocked. The usefulness is just totally unlocked when you can do things like LAST_VALUE; I told you about the airplane example. The CEP framework was really cool for doing stuff in fraud detection and things like that. And that’s all available now. That was available before Blink, but now it’s part of it as well.

KG: And just the simplicity of string functions… Part of the cool thing in SQL is just being able to futz with the data and get the output that you need, and having functions that allow you to mutate Unix time. Just show me a human-readable date from Epoch. It’s like, that just has to be there, right?

LD: Right.

KG: Things like DATE_FORMAT and FLOOR and… Some of these may have been there already. I’m just going off the top of my head. Oh, yeah, like TRIM, LTRIM, RTRIM… All this kind of stuff that just allows you to mess with strings, padding, concatenation, INITCAP, regular expressions, all this stuff. And Flink had some of the stuff. Part of it is… I can’t remember what it did have and what it didn’t have, I’ll just let the listeners look at the docs, and maybe they can comment and tell me how wrong I am. But once you’re able… LAST_VALUE, for sure, was one of the ones that came in, part of the Blink Planner, that was so powerful. That basically unlocks a lot of capabilities in SQL StreamBuilder. You can just build your SQL statements with this stuff obviously, and then create materialized views and you’re off to the races.
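
For the epoch-to-readable-date and string-cleanup cases, something like the following minimal sketch works against the Blink planner, assuming a hypothetical `events` stream with `epoch_seconds` and `raw_payload` columns (FROM_UNIXTIME and LTRIM/RTRIM are Blink built-ins; verify availability in the docs for your Flink version):

```sql
-- Minimal sketch: human-readable time from epoch seconds, plus string cleanup.
-- `events`, `epoch_seconds`, and `raw_payload` are hypothetical names.
SELECT
  FROM_UNIXTIME(epoch_seconds)        AS readable_time,    -- e.g. '2020-08-06 12:34:56'
  TRIM(BOTH ' ' FROM raw_payload)     AS payload_trimmed,
  INITCAP(LTRIM(RTRIM(raw_payload)))  AS payload_clean
FROM events;
```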

KG: What we found a lot of people were having to do is actually create secondary… They would either have to write a Flink job just to do something simple, which is always annoying. It’s like, “I don’t wanna have to have a Java process in prod that just… ” It’s like, “What does that thing do? Oh, it just lowercases everything.” It’s like, “Oh, come on, that’s a lot of work, a lot of scaffolding.” And people do that. They would just write up a Java process, a microservice that would attach to the stream and just do a simple thing to mutate the data. Maybe they would normalize the data in some way, like capitalization and concatenation, and maybe they would ignore values that are bogus, and then pass it along. It’s like, “That’s a lot of work just for a simple filter,” and that’s where SQL is so good. But if you don’t have those primitives, then it’s just hard to actually really do it. And so the Blink Planner unlocks those use cases.
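
That kind of filter-and-normalize microservice can often be expressed as a single statement. Here is a minimal, hypothetical sketch, assuming a `raw_events` stream with `user_name` and `amount` columns:

```sql
-- Minimal sketch: the 'lowercase everything and drop bogus values' microservice as SQL.
-- `raw_events`, `user_name`, and `amount` are hypothetical names.
SELECT
  LOWER(TRIM(BOTH ' ' FROM user_name)) AS user_name,
  amount
FROM raw_events
WHERE user_name IS NOT NULL
  AND user_name <> ''
  AND amount IS NOT NULL
  AND amount > 0;
```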

LD: Which makes sense, I think. We’re hearing more and more about it, and it makes sense as folks in a broader range of job functions, as you mentioned a little bit earlier, get involved. And it’s moving into data science, real-time analytics. Folks who are doing these real-time dashboards need better, faster, and easier access to things like streaming data and SQL, and being able to unlock some of these different things that you’re talking about, I feel, could be incredibly crucial to them being able to do what they need to do.

KG: Yeah, that’s right. That’s what we want people to get, because if… Like in a lot of these organizations, you have a data team, and there are data engineers. And those folks are super talented, they’re under super… They’re super busy, they’re under a lot of stress. The organization… Every organization is a data organization, yeah, and every… The data engineers have to do all the work. And so you have a combination of backend engineers, maybe Java programmers, data engineers who are probably writing Java and Scala jobs, too, but they’re probably also doing a lot of the wiring and DevOps-y stuff. Everybody’s really busy. And for a data scientist to say, “Hey, I know I told you yesterday I needed all the folks from this particular county’s list array in for this voter thing, but today it’s a different list.” And the data engineer is like, “Okay, I’ll turn that around in a week.” The data scientist is like, “Okay, why don’t they take the next week off?” Right?

LD: Right.

KG: ‘Cause they just need that data. So, the SQL… The idea of plugging in SQL is so important, at least to my mind, and that’s where we’re going with the product, is let the analysts, let the data scientists, let the folks who are actually building products out of streaming data, and now increasingly batch data, and joining the two, allow those folks to have the tools they need to build it themselves, and let the data engineering staff build those core building blocks and that core platform, and then let those guys self-serve. That’s the real promise, in my mind, of SQL. Of course, the data engineers can write Java or Scala to build those processes. They have that skill, they’ve been doing it for a long time. And, yes, it’s more convenient and fun to use SQL, and they will for a lot of cases. But in the greater… If you’re the CTO, you’re looking at this, saying, “I’ve got four people who the entire organization is bottlenecked on,” or whatever. “I need to hire more of those people.” The answer might be, “No, I just need to unlock that capability to a wider audience using something like SQL.”

KG: And that promise is becoming… And it’s not just us that is thinking this way. The industry is going this way. I think we’re… I like to think of ourselves as leading this thought process and being thought leaders here, but the truth is that we’re one of a bunch of folks, and I mentioned them in other podcasts. I always think it’s important to mention the other folks. You see KSQL doing a lot of this stuff, although it’s much more about Kafka. But you see folks like Materialize doing this, you see Rocks coming in from a different direction. So, there is a lot of thinking… You see the Flink SQL console getting more energy. And obviously, I just told you, way back in 2016, you have Alibaba totally doubling down on SQL and going this direction.

KG: It’s only gonna be… I think SQL is only, only gonna be more and more important, and really getting the folks to open it up like they… Relational databases took off when you had a SQL prompt, and you could start typing queries into your data. We haven’t really got there with streaming yet. And it almost sounds so primitive and rudimentary, but we’re starting to. And I think pretty soon we’ll be able to just type in a query against a stream, join it with a batch table, and you won’t know the difference. And you’ll have the operators and functions you need, and, bam, now you have really powerful tooling to build all these really cool use cases.

LD: Right. And that leads me into… My next question was gonna be for you, and I think you answered some of it, but I wanna give you the opportunity to build out, if you want, is what do you see is the impact long term on streaming? Do you see a world where there really is no difference between batch and streaming? Because we’ve now made it so easy to, with something like SQL, bring those together. What do you think, in the next… I hate to say five years, ’cause five years seems like an eternity.

KG: Jesus. Right? Yeah.

LD: In the data world. But what do you see in the next six months, a year, two years, so on, is this really starting to lead into?

KG: Right. Right. We’ve seen… I talked a little bit about the Alibaba history and Blink and how we got to where we’re at now as a community. And I think that… The themes, the high-level themes… And I talked some of the details, but the high-level themes are, hey, batch and streaming must be equal citizens. It’s clearly important that a regular old table can be joined with a high throughput stream at scale. That’s gotta be a thing that has to work. And I think today it works, but I think tomorrow, with the idea of some of the Blink features, how they’re doing dynamic tables, how we’re doing user-defined table functions, some of these capabilities are really gonna help in that area. So, I’m not sure we’re 1000% there yet ’cause… Look at it this way. If you think of Spark, so Spark is just… I’m gonna categorize it in broad brushstrokes here, but Spark is batch first, streaming second. And it works… It’s got really good APIs. Spark SQL is well known by a lot of people. Querying streams with Spark and Spark Streaming is, by comparison to things like using Kafka or Pulsar and Flink, Neanderthal, but it works. And so you have it, it’s mostly useful.

KG: But it’s kind of a batch first thing. And I think the cool thing about what’s happening here is the hard work is being done, and hats off to the community, especially the folks from Alibaba who are now core committers, putting in those really, really hard, very clever changes to make SQL work on streams as a first class citizen, and bringing batch in underneath that DataStream API. So, I think, ultimately, yeah, that’s where it’s going. I think, if we’re gonna be successful as a community, we have to make streams just feel like a database. And that is a hard… That is a super hard paradigm. That’s why it’s like there’s all sorts of differences. We’re using jobs to process at scale, first of all, with state and recoverability. And I talked about Async IO, and all that stuff had to be built to make this happen.

KG: On top of that, we had to have all the similar kind of grammar and all the kind of functions. And a lot of them are really hard to do in a streaming context. It’s not like RANK over a data set that has a finite boundary; that’s way easier than doing it in a streaming context and returning results incrementally. The whole idea, when you join against a batch table, you have to retract, you have to keep current with the changes that happen in that batch construct. So, it’s hard. And writing the code and testing the code and making sure it works in a reliable and scalable way, it’s God’s work, man. That’s where I see it going. I see us as a community… Our goal is to bring that to the masses and help folks really build on top of that, and widen the audience and use cases that can actually use this stuff. So, you don’t have to be a programmer, you don’t have to have a job, a console to do it. You could just open VI and start typing SQL, something like that. But anyway, that’s where it’s going.

KG: More and more people are using Flink, and I think Flink, when we started off, we looked… Eric and I would look at each other and go, “I don’t know, is Flink gonna be a thing? It’s super complex. Is that really the right answer?” And the answer is, “Yeah, it was exactly the right answer.” And the original creators of Flink, hats off to them, and we’ve had a bunch of them on the podcast, and it just keeps growing and growing. It’s pretty exciting. I can’t wait to see how things go next.

KG: It’s a thrill to just write a SQL statement, just launch it against the stream and, bam, I guess, get my results continuously forever, and they’re correct. And if the job stops, it picks up where it left off. The window operator doesn’t have to reread the whole window, it picks up right where it left off. That’s intoxicating. The idea that if I need to upgrade my job and change my SQL, I can stop my job with a savepoint. That’s another thing I forgot to mention, you can now stop with savepoint, and change it and then start up again. That’s crazy. That’s really, really cool. So, a lot of neat capabilities, all around trying to make it feel much more like a database.

LD: And it just makes it… I will say this, it makes it a lot easier, I think, also for folks who are outside of the data realm to interact with the streaming data as well. We have recently started using it for a specific marketing function internally. And Kenny went and wrote a SQL job to get data to where I needed it to be, but also said, “Here’s a materialized view. If you wanna just go look at it, here, just go look at it here. Click on the link, it takes you to a place, you can see it, there you go.” And it took no time at all. And for something like that, that we wanna be able to see in a streaming capacity, is really… I’ve never, as a marketing person, been able to have somebody give me something like that where I can look at the data that I need to look at in real time without it being a huge massive company initiative.

KG: Right. Most of this stuff is tribal, right? We were talking about the data scientist who emails his buddy for the database creds, and then gets a login and starts hacking away. It’s like this ridiculous tribal-ness. And I think a lot of it has to do with… That streams are still opaque to the organization. As much as we have done in the last five years around Kafka and all the awareness there and that kind of thing, if you ask a data scientist, “Hey, can you just… Do you wanna do some machine learning on your click stream data?” They don’t know how to get to it. Most of the time, somebody in the data engineering team has written a consumer to put that data into a database, and then they’ve… The data scientist asked his buddy for the login and they get it. That’s really not streaming data. That’s just…

LD: That’s just data.

KG: That’s getting by. Yeah, that’s not… Right. In our case, we obviously gotta eat our own dog food, so the use case you talked about is like, “Hey, if leads are coming in from our partner, those leads need to be processed in real-time, routed in real-time, aggregated in real-time, and then put in the various systems.” In our case, I can’t wait for it to be at mega scale. If we have billions of leads coming in today, that’d be awesome. But it’s not yet. We’ll be there. But today, when those leads come in, they get filtered and aggregated and de-duped and jammed into our CRM, and we do that via webhook. One of the things on Eventador… Just to brag for a sec, one of the things Eventador has is a webhook sink, so you can actually configure a webhook and hit an API. So, message by message… It doesn’t have to be message by message, but in our case it is. You just call a webhook, you do a post, and you can hit an API with an API key and integrate with systems that way. So, super easy, also send email alerts, that kind of thing.

KG: And then also do statistics. We wanna be able to do aggregations and statistics, and then route those to various dashboards and things like that. Building a system like that used to be a ton of backend engineering work. Today it’s three or four different SQL jobs that do various different things and each have a single source and various sinks. And it’s that simple. And I think a lot of organizations should be thinking that way on a variety of levels, even with smaller systems that you plan to grow and wanna use in various new ways.
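
As a rough idea of what one of those aggregation jobs might look like, here is a minimal sketch, assuming a hypothetical `leads` stream with `source`, `email`, and a rowtime attribute `event_time` (the routing and webhook details would live in separate jobs and sinks):

```sql
-- Minimal sketch: unique leads per source per hour, suitable for a dashboard sink.
-- `leads`, `source`, `email`, and `event_time` are hypothetical names;
-- `event_time` is assumed to be declared as a rowtime attribute.
SELECT
  source,
  TUMBLE_END(event_time, INTERVAL '1' HOUR) AS window_end,
  COUNT(DISTINCT email)                     AS unique_leads
FROM leads
GROUP BY source, TUMBLE(event_time, INTERVAL '1' HOUR);
```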

LD: Well, I think it might be time for another cup of coffee, if there’s anything else.

KG: I had three. It probably shows.

LD: Oh. Maybe Kenny doesn’t need another coffee today. But, no, this is… I joke about going to grab coffee, but this is a really… Let’s face it, it’s a topic. SQL on streaming is a topic that is everywhere at this point. And the advances that are being made across the board are really spectacular, and it’s making it such that we kind of… From the marketing perspective you kind of laughingly joke ’cause you do materials and you’re like, “You can do things you’ve never done before,” but that it’s… And it’s actually really… Crazy as it sounds when you say it and it seems very marketing fluff, it’s actually really quite true. And it’s just nice to see. And to your point earlier, a lot of it is thanks to the community as well.

KG: Oh, yeah. I was gonna double down on that. The work that folks have been doing, and I really try to characterize it as tip of the spear computer science, it really is. And the teams at Alibaba, the folks at Data Artisans, and the other contributors that are adding code now, man, hats off. Excellent work. Flink, just like anything, sometimes you curse at it or whatever, but we brag about it all the time here, so it’s amazing. The community, hats off, continue to participate and fund where appropriate, as we can, and continue forward. If you haven’t tried Flink and this sounds cool, you should go download Flink. If nothing else, go download Flink and play with it. Because it’s an awesome piece of software, and the community is growing and growing. It’s a fun and very, very cool and smart community.

LD: Awesome. I would agree with that. All right. Thanks, Kenny.

KG: All right. Thanks, Leslie.

LD: Well, there you have it, folks, the good and the awesome about the Blink Planner. If you’re interested in learning more about Blink, check out some of the presentations at past Flink Forward events. Or as always, you can reach out to us at hello@eventador.io, or on Twitter, @EventadorLabs. Also, if you’re using Apache Flink and you wanna use SQL, and you’re on Microsoft Azure, we’ve got just the simple solution for you. The Eventador platform is now on the Azure marketplace, so head over there to check it out. Happy streaming.
