Podcast: Crushing Apache Flink-based Streaming Systems with special guest Gyula Fóra

August 27, 2020 in Eventador Streams Podcast




In this episode of the Eventador Streams podcast, Kenny and I had the chance to chat with Gyula Fóra, currently a Flink engineer at Cloudera and formerly on the streaming systems team at King, about (candy) crushing Flink-based streaming systems in production.

Learn more about Gyula’s history with Flink and the best practices gleaned from helping develop and manage King’s business-critical streaming systems in this episode of Eventador Streams:

Eventador Streams · Crushing Apache Flink-based Streaming Systems with special guest Gyula Fora

Want to make sure you never miss an episode? You can follow Eventador Streams on a variety of platforms including Soundcloud, Apple Podcasts, Google Play, Spotify, and more. Happy listening!

Episode 13 Transcript: Crushing Apache Flink-based Streaming Systems with Gyula Fóra

Leslie Denson: Think about all the people that you know who play Candy Crush. Now, think about all the millions, or more realistically, billions of events that they create that King ingests every day. Kenny and I had the chance to chat with Gyula Fóra, formerly on the streaming platform team at King and now currently one of Cloudera’s Flink engineers, about the evolution of Flink and how it really works in production environments, during this episode of Eventador Streams, a podcast about all things streaming data.

LD: Hey everybody, welcome back to another episode of Eventador Streams. Today, Kenny and I are super excited to be joined by a guest that has what I think is an incredibly fun background in streaming. We’ve got Gyula Fóra on the line with us who is currently a Flink engineer working on Cloudera’s Flink product, but as we’ll get into, he has some really fun experience being on the user side as well. So Gyula, thank you so much for joining us.

Gyula Fóra: Hi Leslie. Hi Kenny. Thank you very much for hosting me today.

Kenny Gorman: Yeah, welcome.

LD: As I mentioned, you are currently at Cloudera working on their streaming product using Flink, but you have a really fun and cool background using Flink and streaming systems as well. So why don’t you tell us a little bit about your history with Flink and streaming data in general?

GF: All right, so my career in distributed systems and in programming in general started in 2014. I had a university background in mathematics, so I wasn’t really prepared to be a programmer and I didn’t really have anything to do with Big Data systems back then, but I had a chance encounter with Márton Balassi at a university exam. Basically, we were waiting for the teacher and we just struck up a conversation, and he told me that he was working with Big Data systems. I didn’t know anything about that at that point. I was mostly interested in machine learning and statistics. But long story short, we went out for some beers, he told me all about Hadoop and company. So he got me hooked on the tech side of things. Yeah.

KG: Wait, wait, so he got you drunk and convinced you to change your career. That’s what I heard.

GF: Well, something like that.

KG: Perfect.

GF: Yeah.

LD: That’s the best way to do it.

GF: Yeah, if you put it that way, it doesn’t sound very good, but I think it turned out okay.

KG: We know Márton so…

GF: Yeah, so I joined Márton in a Big Data systems research group at the university and we got pretty lucky there because they were just starting up this cool new project called Stratosphere Streaming. You probably heard that Stratosphere was the project that later became Flink.

KG: Right, yeah.

GF: Yeah. So it was that point in time when the whole Stratosphere Streaming project was just getting bootstrapped, so it was our group that got this project, so we kind of started digging into streaming systems and researching what’s out there, like Storm, Spark Streaming and all the tooling and tech around it. So that’s basically where it all started.

KG: You know, we’ve had Stephan on the podcast, we’ve had Márton, and now yourself. And what’s interesting about that to me is the circumstance that brought together all your brains to create something at that time. It was such an interesting confluence of timing, to me, where it was the right people at the right time with the right focus, right on correctness and state management and the things that Flink really brings to the table. I know you guys were thinking about that early on. Can you tell me kind of where your head was at early on, especially when you and Márton were talking?

GF: Yeah, I agree with you. It’s such a lucky thing that everybody who was involved with Flink was just there, at the right place at the right time.

KG: Right.

GF: Yeah. So basically, we didn’t really know where we were gonna go with the streaming project. All we had at that point were the Stratosphere batch processing capabilities. It was a really strong runtime base and already had some core features for streaming, but we really had to start building the streaming API and the missing capabilities from the ground up. So what we did at first was look at systems that we knew already worked well for the companies that were using streaming technologies, like Storm. I think initially Storm was the biggest source…

KG: The gold standard, yeah.

GF: Yeah, exactly, the gold standard and the source of all our ideas, basically. So we first tried to copy Storm to some extent, just to get an API that works similar to Storm, but on the Flink runtime and once we had that going on, we started improving on that, making sure that it fit well with other things in Stratosphere and Flink, so that it gives you a nice coherent story. And this was all before we started even to look at the stateful capabilities that came a little bit later, I would say a few months later.

LD: We’ve had a couple of people on the podcast that have worked at places like Lyft, like Max Michels, and you think about Lyft and you go, “Wow, they have a ton of data, I can only imagine what their streaming system is like.” But you may have an even cooler story around that, which is that you were at King working with their streaming systems, and if everybody I know who plays Candy Crush is any indication… [chuckle] What do you have to say about it?

KG: Leslie is a fan.

LD: That must have been wild and crazy. So talk to us a little bit about that experience, ’cause I’m sure coming from helping create Flink and being one of the early innovators and now to actually putting it into practice is probably super interesting and a really cool experience.

KG: Yeah. The two things that we keep coming back to, the topics that people really like, are, number one, what did you do in production, in a real production, how did that work with Flink? And then the other one is, what are the worst stories? What didn’t work and what totally screwed up? Those are the two top things that people talk about a lot.

GF: So maybe before I dive into what happened at King, I’ll just take two steps back and tell you how I actually ended up at King.

KG: Sure. That would be interesting.

GF: So basically, after I started working on the Stratosphere Streaming project, which became Flink after some time, I ended up working on Flink development for about two years. In the second year, we started focusing more on the stateful capabilities, the fault tolerance story, and we built the checkpointing algorithm and then the savepoint algorithm that is now one of the most defining features of Flink, and I think it’s a great success story.
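
For readers who haven’t worked with this feature, here is a minimal sketch of how a job opts into the checkpointing Gyula describes, using the Flink 1.x DataStream API. The interval, timeout, and backend path are illustrative assumptions, not recommendations.

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Snapshot all operator state every 10 seconds with exactly-once guarantees.
        env.enableCheckpointing(10_000, CheckpointingMode.EXACTLY_ONCE);

        // Where snapshots are written; the path here is an illustrative assumption.
        env.setStateBackend(new FsStateBackend("file:///tmp/flink-checkpoints"));

        CheckpointConfig cfg = env.getCheckpointConfig();
        cfg.setMinPauseBetweenCheckpoints(5_000); // breathing room between snapshots
        cfg.setCheckpointTimeout(60_000);         // give up on a slow snapshot after a minute
        // Keep the latest checkpoint if the job is cancelled, so it can be restored later.
        cfg.enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // ... define sources, transformations, and sinks here, then:
        // env.execute("checkpointed-job");
    }
}
```

Savepoints, the manually triggered snapshots Gyula mentions, are taken with the "flink savepoint <jobId>" CLI command and restored with "flink run -s <savepointPath>", which is what makes upgrading a job without losing its state possible.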

KG: Yeah, agreed.

GF: Yeah, so I worked about two years on Flink development in the research context, first in Budapest, then later I moved to Stockholm. And during that time when I was in Stockholm, we ended up going over to King. King has one of their bigger European offices in Stockholm. That’s actually one of their main offices. We visited them in 2015 and gave them a presentation on Flink, together with Stephan and Kostas, because they were also investigating some streaming tech. They really liked what they’d seen, so I agreed to join them for an internship, to just try to put things into production and try out how it would work. So that’s where the whole King story began for me.

KG: Gotcha.

GF: So I actually stayed after my internship, which is, I think, pretty typical. It was really fun for me because up to that point, we had heard from customers and we’d seen the systems, so we kind of had a good understanding of what is interesting feature-wise and how it should work, but I was really eager to actually get my hands dirty on some real-world problems, put this to a real, proper test and see if what we’d built was actually useful or just a research project. It was actually really useful. Yes. So at the point where I joined King, they already had some basic real-time systems in place, mostly for game monitoring, so it was, I would say, a typical monitoring system. They consumed the game events that come from the phones as people play the games; those are pushed to Kafka, and the streaming system was basically a Kafka consumer application, just a monolithic application that consumed the event streams, did some basic aggregation in memory, and pushed it to a MySQL database. So that’s what they had in place, and actually that was a super important part of the day-to-day operations, because it’s very important for a company like King to be able to immediately see if something is wrong with the games. If the revenue streams change, they have to be able to react within minutes and get on top of the situation.
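
To make that starting point concrete, a pre-Flink monitoring job of the kind Gyula describes would look roughly like the sketch below: a plain Kafka consumer that aggregates in memory and periodically flushes to MySQL. The topic, key, and table names are purely illustrative (King’s actual code is not public), and note that an in-memory aggregation like this loses its counters whenever the process crashes, which is part of what a Flink-based replacement fixes.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MonolithicGameMonitor {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "game-monitoring");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Map<String, Long> countsPerGame = new HashMap<>(); // in-memory aggregation, lost on crash
        long lastFlush = System.currentTimeMillis();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection db = DriverManager.getConnection("jdbc:mysql://localhost/monitoring")) {

            consumer.subscribe(List.of("game-events")); // hypothetical topic name

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // key = game id, value = raw event payload (parsing omitted)
                    countsPerGame.merge(record.key(), 1L, Long::sum);
                }

                // Flush the in-memory aggregates to MySQL roughly once a minute.
                if (System.currentTimeMillis() - lastFlush > 60_000) {
                    try (PreparedStatement stmt = db.prepareStatement(
                            "INSERT INTO game_event_counts (game, cnt) VALUES (?, ?)")) {
                        for (Map.Entry<String, Long> e : countsPerGame.entrySet()) {
                            stmt.setString(1, e.getKey());
                            stmt.setLong(2, e.getValue());
                            stmt.addBatch();
                        }
                        stmt.executeBatch();
                    }
                    countsPerGame.clear();
                    lastFlush = System.currentTimeMillis();
                }
            }
        }
    }
}
```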

GF: So this was the point where I joined, and basically what they wanted was to improve the current streaming system, make it more extensible so that people could use it for different use cases as well. Data scientists were eager to tap into the real-time event streams and kind of see what’s going on, but the monolithic system in place wasn’t really flexible enough for them to extend, so that was the really big problem we were trying to solve. We had a great team and we came up with this nice vision of a system where data scientists could deploy streaming applications very easily, and they could get these dashboards up and running within a minute or so. This is where Flink comes into the picture. So first, we built a few pilot applications, but we quickly saw that it’s not so simple to build streaming applications; you can’t just give Flink to a data scientist and expect them to learn Java and learn streaming systems. I’m sure you’ve had the same experience with your customers as well. It’s not so straightforward, right?

KG: That’s true. Yeah. And Leslie’s only cackling ’cause we talk about this seven times a day, but yeah, it’s an omnipresent theme around how do you democratize streaming data and how do you let the entire organization benefit from streaming data? That’s the macro problem, right?

GF: Yeah, exactly. So this was the same exact problem that we had at King. Most data scientists know how to write SQL and maybe some Python, but they’re not really Java programmers. Still, they’re interested in getting into the real-time data, so we ended up building a system on top of Flink, and what we did was build a stream processing platform inside a Flink job. I would say that’s the easiest way to put it. I know it sounds a bit strange, but what it means to have a streaming platform inside a Flink job is that we had this very fancy, complicated Flink application that would consume all the data streams that came from the games, and it would also consume scripts that were written in our custom DSL. They were also streamed to this application. And the application would actually dynamically reconfigure itself to execute all these scripts that were deployed by the data scientists.
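
King’s platform itself isn’t open source, but the pattern Gyula is describing maps naturally onto the broadcast stream idiom in today’s Flink DataStream API (which, for context, arrived after the union-state approach he mentions later): game events are keyed, deployed scripts are broadcast to every parallel instance, and each instance re-registers scripts as they stream in. A minimal sketch, with a hypothetical Script interface standing in for the real DSL runtime:

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

// Hypothetical stand-ins for King's event model and DSL runtime.
class GameEvent { String userId; String type; }
interface Script extends java.io.Serializable {
    String name();
    void onEvent(GameEvent event, Collector<String> out);
}

public class ScriptEnginePipeline {

    static final MapStateDescriptor<String, Script> SCRIPTS =
            new MapStateDescriptor<>("scripts", Types.STRING, Types.GENERIC(Script.class));

    public static DataStream<String> build(DataStream<GameEvent> events, DataStream<Script> scripts) {
        BroadcastStream<Script> broadcastScripts = scripts.broadcast(SCRIPTS);

        return events
                .keyBy(e -> e.userId)
                .connect(broadcastScripts)
                .process(new KeyedBroadcastProcessFunction<String, GameEvent, Script, String>() {

                    @Override
                    public void processBroadcastElement(Script script, Context ctx, Collector<String> out) throws Exception {
                        // A newly deployed (or updated) script arrives: register it on every parallel instance.
                        ctx.getBroadcastState(SCRIPTS).put(script.name(), script);
                    }

                    @Override
                    public void processElement(GameEvent event, ReadOnlyContext ctx, Collector<String> out) throws Exception {
                        // Run every currently deployed script against the incoming game event.
                        for (java.util.Map.Entry<String, Script> entry :
                                ctx.getBroadcastState(SCRIPTS).immutableEntries()) {
                            entry.getValue().onEvent(event, out);
                        }
                    }
                });
    }
}
```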

KG: And what did they write? Was there a language that they were scripting in, or was it just a simple DSL?

GF: There was some evolution on the DSL side of it as well. First, we started off with Groovy, because we felt at that point that Groovy might be the closest thing to Python when you come from the Java side.

KG: Right.

GF: We started with Groovy. And then we kind of moved back to Java, and just built this very nice annotation-based DSL, which was easy to program. It was more like a trigger-based API, so you could have a method that would receive game start events and could increment counters and track user state, and then you could react to, I don’t know, purchase events, and do all kinds of things.
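
None of King’s DSL is public, so the snippet below is a purely hypothetical illustration of what a trigger-style, annotation-driven script of the kind Gyula describes might look like. Every annotation and type name here is invented for the sketch, and the minimal definitions are included only so it is self-contained.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Invented annotation: the (hypothetical) platform scans scripts for it and wires
// each annotated method to the matching event type in the underlying Flink job.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface OnEvent { String value(); }

// Minimal stand-in event and state types so the sketch compiles.
class GameStartEvent { String userId; int level; }
class PurchaseEvent  { String userId; double amount; }
interface Counter    { void increment(); }

// What a data scientist's script might look like: one method per event type.
public class LevelStartScript {

    // Injected by the (hypothetical) platform and backed by Flink state.
    private Counter startsAtCurrentLevel;

    @OnEvent("game_start")
    public void onGameStart(GameStartEvent event) {
        // e.g. feeds a per-level, one-minute-window dashboard downstream
        startsAtCurrentLevel.increment();
    }

    @OnEvent("purchase")
    public void onPurchase(PurchaseEvent event) {
        // react to purchases: update the player profile, emit to a revenue chart, etc.
    }
}
```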

KG: That’s cool.

GF: And yeah, this actually became quite sophisticated. It wasn’t just a simple DSL where you could do simple dashboards; I think we covered most of the core functionality of Flink with it. You could use state, use timers, and create different window aggregates. And yeah, so I think we built a lot of cool things in this platform.

KG: And how did they deploy the dashboard? Was that like a notebook based thing or what was the framework for the actual dashboards?

GF: So the way it worked is that the back end of this platform was running on YARN. And then we had a nice UI in front of it, where people could develop their scripts. We had this script repository so people could look at other people’s scripts, modify them, and make their own. And once they deployed something, it came up in a list of running jobs, and if their jobs were producing any aggregates, they could click on a dashboard icon which would bring up their graphs. The data itself was stored in sharded MySQL databases, which is, I think, quite okay for most of the companies out there.

KG: Sure, tried and true.

GF: And the data communication was mostly built around Kafka. So if they wanted to produce some output or some ETL, they could also write to Kafka, and they could also just click and see it on the UI. So it was a very nice experience, and it was good because people didn’t need any distributed systems knowledge. There was no operational overhead for the data scientists; it was fully managed, basically.

KG: Yeah, that’s an interesting point, because that was a while ago that you were doing that. Our sense of Flink even early on was, “Wow, this is a lot of power, but it’s gonna be hard for every… ” Like I was saying earlier, for a huge swath of people in an organization to actually write, because writing a Flink job… It’s fairly simple, the APIs are well documented, they’re easy to use. But ultimately, it’s still streaming data. Even if you just took a great Java programmer, they’re not instantly a great streaming Java programmer; there are things to think about, things to learn, understanding watermarking for instance, or something like that. And APIs and UIs and things that layer on top of Flink make it more powerful and really like a force multiplier, or whatever term you wanna use, for getting value out of Flink, at least in our minds.

KG: And that’s kind of always been our mindset is like, wow, this is an incredible engine and a ton of brainpower went into it. It’s a hard distributed systems problem, they solved some of the things that were our pet peeves with Storm like state management and checkpointing, fault-tolerance and these things that coming from a database background are just unacceptable not to have. And I think the Stratosphere project even being steeped in its roots in batch, probably led you guys… It sounds like it led you guys down that path of thinking about correctness and thinking about state as a first class citizen. And then here we are, a number of years later, where those are still core principles and super important, but also, hey, not every person in your organization is gonna need to know those details.

GF: Yeah, I agree with you. If you take just pure Java engineers, even for them, this is… I would say it’s a very difficult paradigm shift to start thinking in terms of streaming systems compared to what they’re used to doing. Maybe in certain organizations where you have, I don’t know, hundreds of very good data engineers with a background in Big Data systems, you can get away with just running Flink, like the vanilla Flink DataStream API. But I think in most other companies, where you have a good mix of different expertise, you need something more high level. This is the same experience that we had at King as well.

KG: Interesting. Yeah, ’cause we’ve talked to folks about when they hire engineers to write Flink jobs. They don’t say, “We wanna hire Java engineers, and oh, by the way, you’re gonna be writing Flink jobs.” What they say is, “We’re going to hire Flink engineers.” They use that as the bar: “Do you have experience with Flink?” And that’s the most desirable trait. It’s not like, “Hey, anybody can do this.” I don’t think that’s a detraction of Flink; someone’s reaction might be, “Well, that’s ’cause Flink is hard or whatever.” But I would caution someone in that opinion, because I think distributed systems are hard. Streaming is hard. And because of that, having an API on top of it that gives you the controls you need to manipulate it is both powerful and also a responsibility. I think that responsibility is what they mean when they say “Flink engineer.”

GF: Yeah, I think it’s not specific to Flink. I mean Java engineers would equally struggle with any other framework. It’s just because you’re processing data in a certain way, so I think the same applies to Flink, Spark, Storm and of course, the APIs evolved over time, but still, you have to learn the general principles of data processing, and once you know how to write Spark or Storm programs, it’s quite easy to migrate to Flink. But just getting into this in the beginning is quite challenging.

KG: Yeah, yeah, you said it better than I did.

LD: Throughout your experience, whether it was initially starting with Stratosphere and turning it into Flink, at King, and even now with what you’re doing at Cloudera, which we’ll get to in a little bit… As Kenny mentioned, we always love… ‘Cause I know we see them every day, but we always love hearing the war stories about, “Oh, this is something that we never envisioned was gonna happen, and then it happened and we had to figure it out.” Because, to your point, Flink is… It’s not necessarily hard, but it’s also not the easiest thing out there to do, so people do run into going, “Oh goodness, help me. I’ve run into something… “

KG: Yeah, like the production gotchas.

GF: Yeah.

LD: Exactly. What are some of those learnings that you have, what are some of the war stories, worst stories that maybe would help somebody out there who’s trying to build their system now?

GF: Let’s see. I would say we hit almost all the really bad bugs over time. We used Flink for four years at King, and we basically had one application that we tried to keep going forever, like migrating between Flink versions and changing state formats without actually losing state. We hit a couple of really nasty problems over time with memory leaks and state leaks. Yeah, let me just think of my favorite or whatever…

GF: Yeah, I think my… I don’t know, it’s weird to say favorite bug, but…

LD: But there is something to be said for figuring it out and being like, “Oh, I’ve got this… “

GF: Yeah, exactly.

LD: There is absolutely something to be said for that.

KG: Well, if you’re an engineer, you do have a favorite, you have a favorite one, it’s always amusing or at least a problem to think about, right? So you wanna put your brain on that problem.

GF: Yeah. We had this weird issue when we were using broadcast or union states quite heavily; these are states in a Flink job that are kind of replicated across all operators in the job. We were actually using this to store the scripts that people deployed. This was really a small state, actually, probably just a few megabytes’ worth of data. But one day what we noticed is that after restarting the streaming jobs, they almost immediately crashed with out-of-memory errors, within seconds of restarting them. We had a really hard time debugging this because it crashed everything so quickly, all over the place, all the time. After some hours of hitting our heads, we realized that the problem was that we had tried to remove a state from a Flink application by renaming it to something else. But we didn’t know at that time that you can’t just remove state from Flink like that. What actually ended up happening, because of how these union or broadcast states were implemented, is that the state started to replicate itself exponentially. So every time the job was restarted, the state was basically multiplied by the number of operators.

GF: Yeah, so the first two, three times it was fine because from a few megabytes it maybe grew to a few gigs, but after we hit the magic number of restarts, it was impossible to restart the job again because it would just blow up instantly.
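
For readers who haven’t touched operator state: union list state is declared through Flink’s CheckpointedFunction interface, and on restore every parallel subtask receives the full union of what all subtasks snapshotted. A minimal sketch of the pattern is below; the script-store use case is paraphrased from the episode, and the exact behaviour Gyula hit when renaming the state was specific to an older Flink version.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

// Sketch of an operator that keeps the deployed scripts in union list state.
public class ScriptStore implements MapFunction<String, String>, CheckpointedFunction {

    private transient ListState<String> scriptState;          // snapshot/restore handle
    private final List<String> scripts = new ArrayList<>();   // local working copy

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        // Union redistribution: on restore, EVERY parallel subtask receives the full
        // union of what all subtasks wrote. If each subtask then naively snapshots
        // everything it was handed back, the state grows by roughly a factor of the
        // parallelism on every restart, which is the blow-up described above.
        scriptState = context.getOperatorStateStore().getUnionListState(
                new ListStateDescriptor<>("deployed-scripts", String.class));
        for (String script : scriptState.get()) {
            scripts.add(script);
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // In a real job, only one designated subtask (or a de-duplicated subset) should
        // write here; writing the full restored list from every subtask is exactly
        // what multiplies the state.
        scriptState.update(scripts);
    }

    @Override
    public String map(String deployedScript) {
        scripts.add(deployedScript);
        return deployedScript;
    }
}
```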

KG: I see.

GF: Yeah, these are the kinds of issues that taught us a lot of good techniques for debugging these streaming systems. We ended up adding a lot of logging all across the stack, playing around with some JVM flags to make sure that if task managers or other processes crashed, we could always look at what happened, and adding in a lot of profiling. Some other things that were very interesting: we had a lot of problems with state in general, because that was the biggest problem with the streaming system at King. We had so many users, and we wanted to store a lot of state for every user, because we wanted to build up a real-time player profile for every player that played. So, just to remember the match history, past purchases, how they did on each level, and stuff like that.

GF: So it was quite a bit of state for each user, but the problem was that there were billions of user IDs active at King across the different games. Billions of keys is a lot of state no matter what you store.
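
As a rough illustration of what per-player state looks like in the DataStream API (the event and profile fields here are invented; King’s real schema is not public): every key, that is every user ID, gets its own ValueState entry, which is why billions of active keys translate directly into a very large state backend.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Hypothetical event and profile types, invented for illustration.
class PlayEvent { String userId; int level; double spend; }
class PlayerProfile implements java.io.Serializable {
    int gamesPlayed;
    int highestLevel;
    double totalSpend;
}

// Keyed by userId upstream: events.keyBy(e -> e.userId).process(new ProfileBuilder())
public class ProfileBuilder extends KeyedProcessFunction<String, PlayEvent, PlayerProfile> {

    private transient ValueState<PlayerProfile> profileState;

    @Override
    public void open(Configuration parameters) {
        // One PlayerProfile per key (user ID). With billions of active user IDs this is
        // exactly the scale described above, which is why a disk-backed state backend
        // like RocksDB and good metrics matter so much.
        profileState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("player-profile", PlayerProfile.class));
    }

    @Override
    public void processElement(PlayEvent event, Context ctx, Collector<PlayerProfile> out) throws Exception {
        PlayerProfile profile = profileState.value();
        if (profile == null) {
            profile = new PlayerProfile();
        }
        profile.gamesPlayed++;
        profile.highestLevel = Math.max(profile.highestLevel, event.level);
        profile.totalSpend += event.spend;
        profileState.update(profile);
        out.collect(profile); // downstream: dashboards, enrichment, etc.
    }
}
```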

KG: Right, right.

LD: Just a little. Yeah.

KG: Yeah, it makes sense.

GF: Yeah. So over time, we had a lot of scalability issues even with state backends like RocksDB. We ended up relying heavily on the Flink metric system, and we had very, very detailed metrics in every part of our streaming pipelines, so that we had good visibility into what was happening in the production backend. I think the metric system is not a very emphasized feature when people talk about streaming application development, but I think it’s one of the most important ones when we talk about moving a POC to production: that people integrate with the metric system, set up alerting, and just get a good understanding of how the job works at an operator level.

KG: Do you mean, stuff that’s emitted via JMX or do you mean something more?

GF: No, I mean, the stuff emitted by JMX is of course always useful, but what I’m talking about here is the Flink metric system, the custom Flink metrics. You can add counters, gauges, histograms; it’s a very rich API for defining custom metrics, and you can hook it up to systems like Grafana or a TSDB very easily.
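
Concretely, the custom metrics Gyula is talking about are registered from inside an operator through the runtime context. Here is a minimal sketch with an invented counter and gauge; the reporter (Graphite, Prometheus, JMX, and so on) is configured separately in flink-conf.yaml.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.Gauge;

// A map operator that exposes a counter and a gauge via Flink's metric system.
public class InstrumentedParser extends RichMapFunction<String, String> {

    private transient Counter eventsParsed;
    private transient long lastEventTimestamp;

    @Override
    public void open(Configuration parameters) {
        eventsParsed = getRuntimeContext()
                .getMetricGroup()
                .counter("eventsParsed");

        getRuntimeContext()
                .getMetricGroup()
                .gauge("lastEventTimestamp", (Gauge<Long>) () -> lastEventTimestamp);
    }

    @Override
    public String map(String rawEvent) {
        eventsParsed.inc();
        lastEventTimestamp = System.currentTimeMillis();
        return rawEvent.trim(); // stand-in for real parsing
    }
}
```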

KG: Right.

GF: Yeah. I think most organizations have the infrastructure to make good use of this.

KG: Right, right. Good point.

LD: You were at King, and now you are at Cloudera, working on their Flink product. When we had Márton on the podcast, it was right around the last Flink Forward, and Kenny and I said we love seeing Cloudera throw their hat in the ring and put their weight behind Flink, because obviously we really love it as a technology. So it’s great to see that as well. So talk to us a little bit about your transition over to Cloudera, what you’re working on there, and what you’ve been hearing from those customers around Flink as well.

GF: I joined Cloudera last year. It was a great opportunity to be able to reunite with Márton.

KG: Was the beer involved this time or…

LD: Yeah.

GF: No, this was over the Internet, a kind of reunion. But anyways it was… I think it was a great opportunity that we got here, so we took our chances, so here we are working at Cloudera. So, so far I think we’ve done a lot of pretty nice work, as far as I can tell from the inside. We had two, three releases already, some on the data center and some on the public cloud. So as far as our customers, I think a lot of Cloudera customers are super excited about Flink, especially customers who have used other streaming technologies and also customers who want to enter the streaming space, so to say. Yeah. So there’s always a lot of buzz around Flink, at least in the last couple of years. I’m sure you see the same patterns that everybody is talking about Flink, this is the cool streaming system now, so this is the same story that we hear from the customers.

KG: And then Cloudera just went preview or what’s the state of the offering? Give us a little… You can do a little plug here, tell us where Cloudera is at?

GF: We have two proper releases on the data center side, and we just came out with a preview release on the public cloud. I think that’s really interesting for many of our users, because now they can easily bootstrap a Flink cluster, and just with a few clicks and within 30 minutes or so, they get a nice, fully-functional cluster that is integrated with all the other services that Cloudera has to offer. So it’s really easy to get started with that.

KG: One of the things, when I step back and think about where we’re going as an industry and as a community, one of the things that’s come back into the fold has been the idea that, “Hey, streaming is amazing, but streaming plus batch, being able to enrich streams, being able to join with legacy sources of data temporally… ” Those types of workloads are becoming more interesting, I think, and creating more value in companies, because that data is locked away in some enterprise database, and how do I join it with the very latest, I don’t know, clickstream or something. And Cloudera must have an opinion on this, because Cloudera obviously has been a batch-oriented company for so long, with such deep roots there, and is now adding data in motion in a big way. Maybe you can talk a little bit about that. I think that might be interesting for folks to hear about.

GF: Yeah, definitely. I think this unification, or bringing together of batch and streaming, is one of the core points in Cloudera’s data strategy when it comes to streaming systems, because they have so many customers who are heavy batch users already. And when they want to go and start using streaming systems, it’s almost always in the context of some application that they previously had running on batch, so it’s very likely that they want to integrate their new streaming applications with what they already have. Maybe for data enrichment, maybe for whatever else, but it’s something that comes up all the time.

KG: Yeah.

GF: So, yeah. I think Cloudera is all in on that vision as well. And really, this has been one of Flink’s core visions over the years: that this is an engine that brings streaming and batch together. Even since the beginning, Flink was promoting that it unifies both batch and streaming, which wasn’t really true in the early years, because the APIs were just so disconnected. I think the actual unification really started to happen with the Table API and SQL. And as we go on today, with the Blink planner and its merge into Apache Flink, this nice vision of a unified system that actually works well and is efficient is starting to actually happen, so it’s super nice to see that.

KG: Yeah, I talked about that in the last podcast, about 1.9 and the Blink Planner coming into the fold, and the unification around the Table API. Exactly, I think that’s one of the more exciting things that’s happened in recent times, for sure.

LD: Well, we’ve talked a lot about streaming over the last five or six years leading up to this point. But Gyula, what has you most excited about streaming moving forward? I go back to what I said at the beginning: you have what I find to be an incredibly interesting experience with this, coming from helping develop it, to working with it so deeply in production, and now working with a vendor that’s helping customers use it. That, I think, brings a breadth of experience that’s really cool. So with that, what has you excited for the next two, three, four, five years, about what’s happening in the streaming industry?

GF: I’m most excited about fully, or at least partly, managed streaming platforms and streaming solutions, where people can get away from the operational complexity of running applications and focus on development. I think the APIs have gotten to a point where they’re very nicely developed, and we have a wide variety of APIs available for application developers to build streaming applications, but the operational complexity is still, I think, a bit too high. So I’m very excited to see all the managed solutions coming up, from Amazon Web Services, from Ververica. And this is also where Cloudera is going: to provide a nice managed experience that customers can have in the cloud, or even in their data center. And this also brings me to Stream SQL, because this is a big part of the story. I’m not claiming that it’s the only way to run a managed streaming service, but I think if we focus on Stream SQL, that actually makes it a bit easier, at least from the back-end perspective, to run a service.

KG: Yeah, obviously, we agree with the SQL story overall. And it’s interesting, we were talking to Jeff Bean from Confluent, and they have a similar vision. SQL is a very important part of this ecosystem going forward, and it’s early days; we’re just getting started, and there’s a ton to be done. And the thing I continually bring up is schema evolution, schema management, versioning… Unstructured data is important in streams; it’s not like the Oracle database of old. Unstructured data plays a key role, and we see, I don’t know, 90% or some high percentage of data being unstructured, mostly JSON. Building schemas around that, dealing with lineage, democratizing that schema, and providing tools to aggregate and filter, and do all those things that one would do with a streaming system, is still challenging, and SQL is a big part of that, but hey, SQL requires a schema. So that impedance mismatch, or designing something elegant around handling that situation for folks, is an area that the community, and for sure we, and I know others, are working on, and that’s a big part of going forward as well.

GF: Yeah, there’s a lot of unsolved problems with Stream SQL, as you said. Schema evolution is one problem, and the other is evolving streaming pipelines, how they change over time. One of the key benefits of Flink, is that you can… If you write the data stream program, you can carry the state over to new versions of your application, you can change the whole pipeline and still keep the state. And this is something that has to reach SQL at one point, if it’s going to reach widespread production, or at least it has to reach SQL, if it’s going to be used for everything. To be honest, for a number of years, I’ve been trying to avoid touching Flink SQL, maybe because I didn’t have too much SQL experience to begin with. But then, I came around last year and started to play around with it, and I actually found that it’s really nice, and it’s very powerful, and this is where the technology is going. I think, in a few years, probably the Table API and SQL is gonna be the most important piece of the API.

KG: Yeah, we feel that way too. Erik, my co-founder, and I, when we started the company, we were using PipelineDB, and those guys were subsequently bought by Confluent. It was a good product, it was built on Postgres, and we loved it, because we could do a group by… Not a super scalable group by, mind you, but we could do a group by, in SQL, against a stream, right? It would integrate with Kafka, and we could just do a group by, and then materialize that output as something. And those two core design principles, being able to do aggregation in SQL, a declarative language, and then being able to do something with those results, materialized into a table real quick for an app of some sort, like a map or whatever, were like the “a-ha moment” for us.

KG: And we were like, “This is so cool! We gotta do more of this!” I mean, of course, PipelineDB doesn’t scale very well, but it was very cool technology, and those guys… Very smart guys, did a good job. I don’t wanna take anything away from PipelineDB, but obviously, that led us to a journey to find Flink, and then we’re like, “Okay, this is the thing!” So, that’s how we kind of had our evolution, and it was always in our minds, it was always like, aggregation is important. And then, stream processing is great, in an ETL sense, but what if you’re trying to just take that data and then do something like that. Like you said earlier, the data scientist wants to build a model, or even do a dashboard, or do something in Pandas, right? They need some way to actually grab that snapshot of data and then use it in their project, and SQL is such a great way to do that.

GF: Yeah, a good UI, or a user-friendly API like SQL, goes a long way toward getting people addicted to streaming systems. We had the same experience at King when we built our DSL, where it was just like, “Yeah, on a game start, group by current level, increment this counter on a one-minute window,” and that’s it. Like, fire away, dashboard.
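
For comparison, here is roughly what that kind of script looks like in Flink SQL today, submitted through the Java Table API. The table and column names are invented, and the sketch assumes a game_events table has already been registered (for example over a Kafka topic with an event-time attribute).

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class GameStartDashboardQuery {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Count game starts per level over one-minute tumbling windows.
        Table perLevelStarts = tEnv.sqlQuery(
                "SELECT current_level, " +
                "       TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start, " +
                "       COUNT(*) AS game_starts " +
                "FROM game_events " +
                "WHERE event_type = 'game_start' " +
                "GROUP BY current_level, TUMBLE(event_time, INTERVAL '1' MINUTE)");

        // From here the result would be written to a sink table (MySQL, Kafka, ...) and
        // charted, which is essentially the 'fire away, dashboard' experience described above.
    }
}
```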

KG: Right. But against billions of user IDs, which is the fascinating part.

GF: Yeah, exactly, and once people can just click on the dashboard a few seconds after they’ve started their job, and they can actually see the data visually, they’re hooked. There’s no going back.

KG: Yeah. Right. Good way to put it.

LD: Well, Gyula, thank you so much for joining us today. I enjoyed the conversation, and I’m sure Kenny did as well. We hope you did also, it’s been super interesting, I’m sure our listeners will also enjoy it. So, thank you for taking the time, we appreciate it.

GF: Thank you very much for the opportunity. It was fun to talk with you guys.

KG: Yeah, Gyula, great chat. Thank you.

LD: And there’s not much I enjoy more than hearing some good battle tales about building and managing production streaming systems, and Gyula’s certainly had some good ones to share. You can always check out more of his talks, either at Flink Forward or other conferences, on YouTube, and hear more about those stories, and all the other ways that he’s worked with Flink. And head on over to Cloudera, to see more about what he’s working on now. Or, if you wanna learn more about the Eventador Platform, with runtime for Flink and SQLStreamBuilder, you can head on over to eventador.io, or reach out to us at hello@eventador.io. Happy streaming.
