Podcast: The Fun of Stateful Functions and Apache Flink with Stephan Ewen

June 23, 2020 in Eventador Streams Podcast




In this episode of the Eventador Streams podcast, Kenny and I chatted with Stephan Ewen, Co-founder & CTO of Ververica, about the beginnings and growth of Apache Flink as well as the inception and future of Stateful Functions.

Hear more about Flink’s statefulness, the need for and ability to process batch and streaming data concurrently, and the development of Stateful Functions in this episode of Eventador Streams.


Want to make sure you never miss an episode? You can follow Eventador Streams on a variety of platforms including Soundcloud, Apple Podcasts, Google Play, Spotify, and more. Happy listening!

Episode 10 Transcript: The Fun of Stateful Functions & Apache Flink with Stephan Ewen

Leslie Denson: Today, we take our discussions about Apache Flink one step further with an opportunity to dive into not only another invaluable perspective on its history, but also an intro to its newest API, Stateful Functions, with Flink co-creator and Ververica CTO Stephan Ewen, on this episode of Eventador Streams, a podcast about all things streaming data.

LD: Hey everybody, welcome back to another episode of Eventador Streams. Today, Kenny and I are joined by a person who really needs no introduction to this audience, but we will introduce him anyway. Stephan Ewen of Ververica, and also one of the original creators of Apache Flink, is on the line with us. Stephan, how are you?

Stephan Ewen: Hi, thanks for having me. I’m well, how are you?

LD: Doing well. Doing well. Thank you so much for joining us today.

Kenny Gorman: Excited to have you.

SE: Thank you very much. I’m excited to be here too.

LD: Awesome. Super excited. Our listeners obviously are very interested in Flink. A lot of people are using it or thinking about using it, and they would be really excited to hear from you.

LD: Why don’t you tell us a little bit about yourself, your history, and how you got started with what was then called, if I remember correctly, Stratosphere, and how that became Apache Flink? I know that’s a very broad topic, but let’s hit some of the high points and then we can dive in from there.

SE: The whole thing started probably, I think you can go back pretty far, actually more than 10 years, if you wish. So during my studies, I had worked a lot on database technology. So most of my internships were in that area.

SE: I started doing a PhD at the university in Berlin, and my advisor also came from a database background. It was the era when Hadoop was just starting to become big. The topic, broadly, was: let’s try to build something that’s a hybrid of MapReduce and database technology. That’s how we started out. That was the origin of the Stratosphere project.

SE: You can actually see that a lot of the stuff we developed back then, parts of which are still in Flink, was very much inspired by all the database work we did before. The concepts for memory management, for example, still form the basis of a lot of the batch processing operations.

SE: If you look at the DataSet API in Flink, it has the concept of a lightweight optimizer and so on in there. That’s all very much database technology style. That was our research project at the university, basically trying to figure out how these things could go together well.

SE: After this was all done, we had this interesting system that could do some impressive use cases. The system, Stratosphere, was obviously nowhere near production usable, partly because it was a university research prototype. But it was just too interesting to simply drop it and move on.

SE: We thought, let’s build something real out of it. We founded a company, tried to get some startup seed funding, and set out to see where we could take it. The initial use cases we actually had in mind were very much batch processing use cases. A bit of machine learning, graph processing, and so on. But a lot of batch data analytics actually, because that was what everybody was doing back then.

SE: I think a few months into this, we started discovering the stream processing use cases. Partly through a collaboration with Márton and Gyula, who we had already worked with together at the university. They were looking into stream processing use cases. Interestingly, Stratosphere turned out to be a decent match for that, because the underlying engine already did a lot of streaming. Or maybe we should say pipelining, in contrast to other data processors.

SE: So it was, interestingly, a surprisingly well-matching foundation to try and do streaming data analytics. It had some very interesting properties that you couldn’t get from systems like Storm in those days. I mean, there was obviously a lot missing. It didn’t have any idea of fault tolerance for streaming, no checkpointing, and so on.

SE: But after we saw that this technology we had built was a surprisingly good match, we decided we should put more emphasis on it. I think that’s how it came to be. Once we saw that there’s really a niche that hasn’t yet been conquered, that’s exactly what you’re looking for as a startup. A really good stream processor didn’t exist.

KG: That’s interesting about checkpointing and the database foundation for that. Because that was one of the things that when we looked at it, obviously being from a database background, that computed for us. We thought, oh, checkpoints. It looked more like a recovery log for a traditional database than it looked like a stream processing system at that point.

KG: We thought, oh, I totally get this and I understand why checkpoints are important. Boy, wouldn’t it be cool if we could savepoint too? Oh, you can. That kind of thing. So it’s interesting to hear the history behind those bits, being steeped in database engineering ourselves.

SE: I would say 50/50, to be honest. I mean, this notion of checkpoints as a way to store the materialized view over the transaction log, and then being able to truncate the transaction log, that’s a pretty close analogy for what Flink is doing. Think of it with Kafka: Kafka, being the transaction log, is the input, and then Flink computes the tables and so on over that, and then it checkpoints so it can acknowledge back and discard the data from the transaction log. That’s what it does.

SE: Interestingly, that was not the way we first came up with the idea. The way we actually came up with this idea originally, or at least from my side, was very much inspired by work on another system I was studying while doing my PhD. It was a machine learning system that used to be kind of famous. I don’t think it really exists much anymore. It was called GraphLab, Distributed GraphLab; I think it was a Berkeley research project.

SE: So they were doing distributed machine learning based on, as the name says, graph representations of the data. They could do some very interesting things, and they were very, very fast. There was one paper where they said, “Okay, we’ll try to do distributed fault tolerance here, and we’re picking Chandy–Lamport snapshots to implement that.”

SE: I mean, they came to the conclusion that restarts were faster than fault tolerance for them in their project. But that was the first time I actually looked at Chandy–Lamport snapshots for systems that do asynchronous messaging. I was trying to implement a version of iterations in Flink’s batch API that did completely asynchronous processing, because that’s actually quite efficient for certain algorithms.

SE: We were trying to look at ways to make that fault tolerant and recoverable, or even to figure out when such a distributed asynchronous algorithm has terminated. I was trying to apply the snapshot algorithm, Chandy–Lamport snapshots, to that problem. Then, when we had this discussion about how we could actually make stream processing fault tolerant, that was the first thing I was reminded of.

SE: I had just been looking at this interesting approach to distributed asynchronous iterations. Couldn’t that be applied here as well, for streaming fault tolerance? Then later we found it’s a really good match, because it’s exactly that compaction of transaction log and materialized data. That’s how it came together from these two dimensions, I would say.
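
To make the mechanics concrete, here is a toy, single-process sketch of the barrier-based variant of that snapshotting idea. This is not Flink code and all names are invented: an operator takes a consistent snapshot once the checkpoint barrier has arrived on all of its input channels, briefly buffering records on channels that are already past the barrier.

```java
// Toy, single-process illustration of barrier-aligned snapshotting.
// Not Flink code; all names are invented.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SnapshotOperator {
    private final int numInputs;
    private final Set<Integer> barrierSeen = new HashSet<>();
    private final Map<Integer, List<String>> buffered = new HashMap<>();
    private long state = 0; // stands in for real operator state

    SnapshotOperator(int numInputs) {
        this.numInputs = numInputs;
    }

    void onRecord(int channel, String record) {
        if (barrierSeen.contains(channel)) {
            // Channel is already past the barrier: hold its records back so the
            // snapshot stays a consistent cut (the "alignment" phase).
            buffered.get(channel).add(record);
        } else {
            process(record);
        }
    }

    void onBarrier(int channel, long checkpointId) {
        barrierSeen.add(channel);
        buffered.putIfAbsent(channel, new ArrayList<>());
        if (barrierSeen.size() == numInputs) {
            // Barrier seen on every input: state now reflects exactly the records
            // before the barrier, so it can be snapshotted in the background.
            System.out.println("checkpoint " + checkpointId + ": state=" + state);
            // Release the buffered records and resume normal processing.
            buffered.values().forEach(records -> records.forEach(this::process));
            barrierSeen.clear();
            buffered.clear();
        }
    }

    private void process(String record) {
        state += record.length();
    }
}
```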

KG: That’s very cool. Especially early on, it was one of the massively distinguishing features of Flink. It really worked pretty well, even early on, especially in a production sense. We were ingesting Twitter feeds, and we were doing that at scale. For whatever reason, mostly our fault, we kept having problems. The recoverability and restartability of Flink was super important for us at that point.

KG: Just because we’d get behind so bad. Just being able to kind of catch up and get back to the processing task at hand, I mean, that’s something we couldn’t really seem to get working on Storm the same way. So that was an early epiphany around Flink, at least for us. Interesting that we’re understanding kind of where it came from.

SE: I think there’s something beautiful about that idea of taking these asynchronous snapshots, doing this lightweight way of, if you wish, observing the data as it flows. Then understanding the right point in time at which to draw a consistent background copy of the state.

SE: It’s a surprisingly simple thing once you understand it. There’s a few elements you need to grok. But once you have that, it’s a surprisingly straightforward way to think about it. I mean, of course, it has its subtleties in the implementation here and there. You need to make sure that you call into RocksDB the right way to not add or lose one record here or there. But there’s been very few foundational issues around that algorithm. Actually, we’ve been very happy with the choice as well.

KG: RocksDB has been great too. I mean, the notion that RocksDB kind of went through this rise, that even went back, I want to say … I don’t know, even 2010, 11, 12. Maybe my memory is mis-serving me, but RocksDB was starting to kind of come up then. I remember in the Mongo world, RocksDB Engine was kind of becoming a thing. The folks at Facebook were-

KG: Yeah, exactly. It just became this very, very good multipurpose, very simplistic API and reliable. It seemed to fit into a lot of use cases like that.

SE: Exactly. It is a surprisingly good match for this checkpointing algorithm, because what RocksDB does underneath the hood is actually very similar. It has a transaction log that it appends to. Then it has the memtable, which is basically a buffer for the materialized representation, which it then persists, and then it truncates the log again.

SE: On the Flink side, we basically don’t need the log, because the surrounding system replaces that. But there’s this notion of materializing snapshot views of the data up to a certain point, which is a notion RocksDB has through the concept of the log-structured merge tree.

KG: Right. I was just going to say LSM trees. Exactly.

SE: Yeah. It’s a property of these LSM trees. It just fits very well with the checkpoint algorithm. These notions of snapshots and checkpoints that Flink has, RocksDB has almost a matching equivalent of both of them.
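
From the user’s side, the pairing looks roughly like this in the Flink 1.10/1.11-era APIs discussed here; a minimal sketch, assuming a local checkpoint path:

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbCheckpointingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Draw a consistent snapshot of all operator state every 10 seconds.
        env.enableCheckpointing(10_000);

        // RocksDB keeps state on disk as an LSM tree; `true` enables incremental
        // checkpoints, which piggyback on RocksDB's own snapshot files.
        env.setStateBackend(new RocksDBStateBackend("file:///tmp/flink-checkpoints", true));

        // ... sources, keyed operations, sinks would go here ...
        env.execute("rocksdb-checkpointing-sketch");
    }
}
```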

KG: That came from LevelDB, I think, right? Way back when.

SE: Exactly. Yeah.

KG: Interesting.

LD: Let’s dive in a little bit on Flink as it is today, with some of the cool things I know you guys have in 1.10, which is the latest stable release, and in 1.11, which I know is coming up.

LD: There’s been a lot of really great work, especially in the last few years, that’s gone into these releases, with some really cool features and API updates and all of those things. So talk to us a little bit about Flink as it stands today and some of the things that you’re excited about with it.

SE: I could even take a step back and look a bit at what trends Flink was going through each year. I think for a while after the whole streaming concept had been introduced, after the first version of checkpoints had been introduced, there was a lot of focus on just making streaming use cases work and building the right API tools and so on around it.

SE: That was when we incorporated a lot of the concepts from the Dataflow model, like event time, watermarks, different types of state, timers, asynchronous process functions, and so on. Just the tools that you need to develop most of the use cases. At some point there was a shift more towards operational aspects: metrics, security. State evolution actually took quite a bit of time.

SE: It still is something that adds overhead in each release, like making sure we keep snapshots of most of the savepoint and operator state formats from previous releases. We have tests to see that we can resume them in new releases, so that there is always a forward upgrade path. That became a big focus.

SE: Then, I think just when we were hitting the point where we thought, okay, now we actually have streaming more or less stable and figured out, we have the most important things in place, came the point when Alibaba acquired our company, back then called data Artisans. What they brought in was this massive work on the batch processing and SQL side. Flink was doing SQL before, but not in the same capacity, and it was way more focused on the streaming side.

SE: Alibaba really brought in a lot of stuff on the batch processing side and also vastly expanded the scope of streaming SQL. That, I would say, was a very big part of the 2019 work. There was all this amazing technology, but it was in a fork, and we were trying to make it available to the public by contributing it to open source.

SE: It basically resulted in rewriting big parts of the scheduler, rewriting big parts of the Table API, and adding a completely new SQL query engine. You can almost say this part is only now complete, in the 1.11 release. I think the 1.11 release finally has the Blink SQL query engine as the default one. It includes the new type inference system, has all the catalog integrations, and has the data format integrations that we wanted it to have.

SE: It has this notion of being able to consume change data capture streams, and all of that on the SQL side. So that was a very big part that I would say was fairly dominant throughout 2019. Then, towards the end of 2019, when we saw that certain parts of the community were pretty much done with the batch processing work, we also went back to the stream processing side of things.
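
As a hedged illustration of that CDC capability in 1.11: you can declare a table over a Kafka topic carrying Debezium change events and query it like any other table. The topic, fields, and broker address below are invented.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CdcTableSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // A changelog source: Debezium inserts/updates/deletes become upserts
        // and retractions that downstream SQL understands.
        tEnv.executeSql(
            "CREATE TABLE users (" +
            "  id BIGINT, " +
            "  name STRING " +
            ") WITH (" +
            "  'connector' = 'kafka', " +
            "  'topic' = 'users', " +
            "  'properties.bootstrap.servers' = 'localhost:9092', " +
            "  'format' = 'debezium-json'" +
            ")");

        // Continuously maintained result over the change stream.
        tEnv.executeSql("SELECT name, COUNT(*) FROM users GROUP BY name");
    }
}
```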

SE: One part of this is, you could say, the stateful functions work. The other part is basically the next level of streaming fault tolerance, which we started to work on. You can see the first part in the 1.11 release in the form of unaligned checkpoints, which I think, on the streaming side, might be the biggest change since we introduced the whole idea of checkpoints.
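
Enabling the new unaligned checkpoints is a single switch on the checkpoint configuration; a minimal sketch, assuming otherwise default settings:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnalignedCheckpointsSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10_000);

        // Barriers may overtake in-flight records; the records they skip over are
        // persisted as part of the checkpoint, so alignment no longer stalls
        // under backpressure.
        env.getCheckpointConfig().enableUnalignedCheckpoints();

        // ... job definition would go here ...
        env.execute("unaligned-checkpoints-sketch");
    }
}
```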

KG: Stephan, when you guys were acquired, did you kind of see the train coming? Were you just, okay, I can see that the next step is going to be integrating the fork from Alibaba? How much work was that? You just touched on it briefly, and it sounds like it was a lot of work, especially around SQL. Was that something you guys knew going in when you were discussing things with them? Were you like, oh boy, this is going to be a lot of work, but it should be? Was that something that you guys were excited about?

SE: I mean, excited about, definitely. It was, I think, one of the reasons for deciding to do it. It’s not an easy decision to think about selling. But the idea of saying, okay, this is really a way to strengthen the whole open source project, to give it access to all this work that has been done, to gain a lot more corporate support for it, that definitely helped make this decision, big time.

SE: Honestly, like always, you underestimate the effort that these things take. Initially we thought, sure, we built an abstraction here. We can swap this query engine for that query engine and just plug it in.

KG: It’ll be easy.

SE: Of course, it always turns out quite different. You could actually see that a lot of stuff in the Blink fork was written, let’s say, particularly for the specific environment that Alibaba Cloud and their internal production setups have. They just had certain assumptions they could make that allowed them to make certain opinionated choices in the implementation, choices you usually can’t make when you’re developing an open source system that’s supposed to run on a hundred different setups and so on.

KG: It wasn’t abstract enough to be used by the general public. It was specific to their use cases for the most part, is what you’re saying?

SE: Some of the stuff, yeah. So there was also a lot of redeveloping of features, not just merging features. That part also took a big amount of time.

KG: They’ve been a great sponsor of it so far. I mean, as far as we can tell, it looks like they’ve really doubled down on Flink and the community is thriving and the contributions are up. It seems like really that was a big influx of energy into the project when the acquisition happened and kind of their continued support.

SE: I think that’s true. Yeah. So it did turn out well all in all, I think we can say that in hindsight. Yeah.

KG: That’s cool.

SE: Absolutely.

LD: Something that I know we want to dive into is stateful functions. We’ve had the wonderful opportunity to talk a lot about Flink with some of the guests we’ve had on the podcast, but we haven’t yet had somebody who can really talk about stateful functions: what they are, why they were developed, and kind of what the trajectory for them is going to be.

KG: Yeah. I think the audience would love to hear where your head was at when you decided that stateful functions needed to be a thing. Compare and contrast that with Flink and help us, take us with you through that period where your thinking evolved, and you finally ended up saying, we got to go build stateful functions. What does that look like from your perspective?

SE: I think, like all things, it wasn’t a super straight path. It wasn’t a master plan that we hatched out over two years. It all started with the realization that there’s a big overlap between the way stream processing works for near-line data processing, and what reactive, event-driven applications do. There’s quite a bit of commonality between those two.

SE: In some sense, I always thought you can think about stream processing as a very extreme form of reactive processing, or of reactive applications. I think you can see this in different parts of the ecosystem. You can see it in the fact that, for example, Akka, the actor system for event-driven applications, also layers a streaming API on top of that. You can see it in the fact that there are actually users that used Flink to power microservices that back a social network, and so on.

SE: So you can see, on one side, event-driven applications starting to use stream processing, and stream processors trying to be the backends for event-driven microservices. There’s obviously something there. These two ecosystems have an overlap.

SE: In some sense, stateful functions was an attempt to build something for this place in between the two. Or maybe even a way to make stream processing technology more accessible for the broader spectrum of event-driven applications. That was kind of the original thinking.

SE: What it turned out to be in the end is a bit of a bold call for us: trying to change two fundamental assumptions that are pretty baked into stream processing, not only in Flink, but that, in my opinion, don’t always have to be like that. It’s a project where we’re trying out what actually happens if you relax those two assumptions.

SE: The first one is that stream processing is this predefined dataflow graph. In Storm, you called it the topology. In Flink, you call it the job graph. I don’t actually know what Kafka Streams or Spark call it. But they all have the same idea of a directed acyclic graph that represents the data flow to some extent.

SE: There are lots of good reasons to do that. It defines a documented flow of the data, and the acyclic nature is actually really important for a simple watermark model and so on. But it also sometimes gets in the way, especially once you start to develop these use cases that are more like dynamic, event-driven services.

SE: So we wanted to explore an API that’s a lot more dynamic: no predefined DAG, more dynamic messaging. Think of it as two parallel instances of a process function being able to talk to each other, not just send events downstream, but send events right and left as well.

SE: That was one part. That was basically Stateful Functions 1.0, where we tried to relax that restriction. It bears a lot of similarities to actors, because that’s a powerful model for event-driven microservices. At the same time, it adds quite a bit on top, namely the very powerful state maintenance and state consistency around dealing with state that stream processing, and Flink in particular, offers.
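
To give a flavor of that model, here is a rough sketch using the embedded Stateful Functions Java SDK; the function type, IDs, and message types are invented for illustration. Any function instance can message any other by logical address (function type plus ID), rather than only emitting downstream:

```java
import org.apache.flink.statefun.sdk.Context;
import org.apache.flink.statefun.sdk.FunctionType;
import org.apache.flink.statefun.sdk.StatefulFunction;
import org.apache.flink.statefun.sdk.annotations.Persisted;
import org.apache.flink.statefun.sdk.state.PersistedValue;

public class GreetCounterFn implements StatefulFunction {
    // Logical address namespace/name; instances are addressed by (type, id).
    public static final FunctionType TYPE = new FunctionType("example", "greet-counter");

    // Durable, consistently checkpointed state, scoped to this function instance.
    @Persisted
    private final PersistedValue<Integer> seen = PersistedValue.of("seen", Integer.class);

    @Override
    public void invoke(Context context, Object input) {
        int count = seen.getOrDefault(0) + 1;
        seen.set(count);
        // Message an arbitrary peer instance, not just "downstream" in a fixed DAG.
        context.send(TYPE, "another-instance-id", "hello #" + count);
    }
}
```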

KG: Do you see that as kind of colliding with Lambda from AWS? Tell me about the overlap there?

SE: So I would say the first version of stateful functions didn’t really have much overlap or even much intersection with that. It was stateful, unlike Lambda, but it was not very dynamic, unlike Lambda.

KG: Right, okay. Got it.

SE: Almost the exact opposite of it. The interesting part came, I think, with Stateful Functions 2.0, where the thought was, okay, let’s try to relax this compute and state co-location, this physically co-locating them in the same process, which has been one of the absolute core ideas of stream processing.

SE: It’s basically the reason why stream processing is consistent. You have state co-located with compute, you have the single-writer abstraction. All state updates go strictly together with the computation on that particular bit of state, that particular key, and so on.

SE: Because they’re both in the same process, in some cases it’s just an embedded in-memory hash table. That’s also how you get really high performance. But of course this comes at a price, and the price is that you now have to manage both of them together. I think state is, per definition, not very elastic. You sometimes don’t see this if the state is in a database, but that database, per se, is not terribly elastic. It’s much, much harder to make state elastic than, let’s say, a compute layer.

SE: Lambda is a perfect example of an extremely well-built, very elastic compute layer. So what we thought is, can we actually relax the assumption that state and compute strictly have to be in the same process, and just play together with these technologies, like Lambda, that have built this highly elastic compute layer with really cool operational features? That’s basically what Stateful Functions 2.0 is. It’s an approach to disaggregate the stream processor into a stateless, super elastic compute part and a stateful, less elastic storage part.

KG: Production implementation, in your mind, is that … Kubernetes and containers are really a big part of that now. It sounds like that it’s really designed to have sort of that super distributed microservices kind of feel to it. Is that true? Is that kind of where your guys’ heads are at in terms of deployment and production readiness? And how someone would actually use it in real life, stateful functions in real life?

SE: Yeah. I think there are different ways to do that. One way is definitely to deploy the Flink part on something like EKS, on that container engine, and run your compute on Lambda. Then you don’t have to manage much yourself.

SE: If you want to stay on the Kubernetes side, you would build different deployments for the Flink part versus the state part, which basically replaces the database processes in this case. Then maybe different deployments for the different services.

SE: You can think of this as a way to go super-microservices, but I actually have a slightly different philosophy in my head there. I think it’s actually an interesting way to get out of the microservice hell, to be honest.

SE: Because you can get into a tricky situation if you overdo it with microservices. I think this is a sentiment that is bit by bit being realized; microservices have crossed the peak of the hype cycle. There are a lot of folks these days talking about how very microservice-y infrastructure gets completely unwieldy over time.

KG: Everything’s a snowflake. Everything’s a unique snowflake with its own run book. It’s impossible to-

SE: Yeah, something like that.

KG: Right.

SE: Yeah. I think this stateful functions business is actually an interesting piece in the middle. Because you don’t have things that are completely decoupled, where you have to manage them completely independently, worry about all these different protocols and assumptions, and, if something happens, make a big effort tracing what happened where.

SE: Instead, there’s a more coherent philosophy and model for how these different functions from different modules interact with each other. But at the same time, it’s still quite dynamic, and it’s still quite good at separating different parts, so different teams can write and deploy different modules and so on. I think it’s an interesting player trying to find-

KG: I do too.

SE: Finding something between the distributed monolith and the microservices, that is something it could actually help with.

KG: A semi-standardized, discrete data processor. That’s how it is in my head, if you wanted to frame it that way. It has interesting ramifications, especially for things like ML. Being able to deploy something kind of standardized … because that vertical, the data science vertical, is hurting in a lot of ways around standardization and productionalization and things like that.

KG: Do you guys have machine learning use cases in your head for stateful functions? Is that an area you think is kind of a match made in heaven? Or where’s your head at there? What are your thoughts there?

SE: I think this is one of the use cases where it’s a pretty good match, where it can bring something to the table. The demo we showed at Flink Forward, for example, was exactly around that use case. It’s not on the machine learning training side; for training, I think you do want different abstractions. You do want more high-level pipelines, APIs, and so on.

SE: But where this is actually quite powerful, and where we see a lot of good use, is the side of serving and applying the model. An event comes in, and you need to enrich the event with different statistics. You also want to update your distributed statistics based on that particular event. You want to hold certain aggregates together to form a feature vector.

SE: Then you want to pipe it through one or more models, which typically reside behind different services and so on, and compute output based on that. I think for that, it’s actually really convenient.
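
Sketched with the same embedded Stateful Functions Java SDK, the serving pattern Stephan describes might look roughly like this; the model address, state, and feature layout are all made up for illustration:

```java
import org.apache.flink.statefun.sdk.Context;
import org.apache.flink.statefun.sdk.FunctionType;
import org.apache.flink.statefun.sdk.StatefulFunction;
import org.apache.flink.statefun.sdk.annotations.Persisted;
import org.apache.flink.statefun.sdk.state.PersistedValue;

public class FeatureAggregatorFn implements StatefulFunction {
    // Hypothetical address of a model-serving function.
    static final FunctionType MODEL = new FunctionType("ml", "scorer");

    @Persisted
    private final PersistedValue<Double> runningSum =
            PersistedValue.of("running-sum", Double.class);

    @Override
    public void invoke(Context context, Object input) {
        double value = (Double) input;                      // incoming event
        double sum = runningSum.getOrDefault(0.0) + value;  // update a distributed statistic
        runningSum.set(sum);

        double[] features = {value, sum};                   // assemble a feature vector
        // Pipe the enriched event onward to the model function for scoring.
        context.send(MODEL, context.self().id(), features);
    }
}
```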

KG: Yeah. I could see that for sure. Interesting.

SE: Yeah. I think one part that adds to that is something I didn’t mention before. This disaggregated architecture inherently gives you multi-language portability. As soon as state and compute don’t run in the same process anymore, but talk through a standardized protocol, you can pretty much use it with any language you want. It’s the same thing.

SE: Nobody ever talked about MySQL being multi-language either. You can talk SQL to it from Python, from Go, from Java, from whatever you want. That’s the same philosophy we were looking for in stateful functions. The SDKs that we’re building are really meant as easy helpers or client libraries, like, I don’t know, JDBC driver libraries, rather than something big and complex that is an absolute must-have for every language.

KG: Yeah. Yeah. That makes tons of sense.

LD: So talk to me a little bit about, and I always ask this and I always love the answers I get, but what is it that you are looking forward to next? Knowing what’s coming up with Flink, knowing what you guys have with stateful functions, or even knowing what other technologies are out there.

LD: We’ve talked about a lot of different things out there on the podcast. What, Stephan, is it that you are excited about? Either something you know is coming down the pipe, or a future state where you’re just like, “We’ve got to get there. I’m really excited to get to that point.”

SE: There are a few things. In some sense, when we started out with Flink and we started adopting the concepts from the Dataflow model, I always had it in my head: wouldn’t it be cool to actually build a truly unified system? One that has this unified batch-streaming API and this really strong streaming runtime, and at the same time a really competitive batch runtime. We had all these ideas for how to do that.

SE: I think still to this day, there’s not a system out there that really does it. There are some systems that go for a unified API, like Beam, but they need different systems underneath the hood. I think even if you run Beam on Google Cloud, you actually have different systems for batch and streaming.

SE: I think there’s no open, real stream processor that can do the critical streaming stuff and is also a really good batch processor. I always had it in my head that it would be cool to actually build that, and I think we’re getting damn close to doing it. The streaming side, I think, has gotten very strong. There are a few things still to do.

SE: Actually, I’m very excited that we finally get unaligned checkpoints in there. It was one of the last Achilles heels that I wanted to solve in Flink, or get rid of. There’s been a lot of work on the runtime such that it can really excel at batch processing as well. The SQL side has actually proposed a really great unified model. There’s more work coming up on having gradual semantics between batch and streaming, where you don’t choose whether this is streaming SQL or batch SQL.

SE: You just designate it as a SQL query, and you only tell the system how you’re interested in your results. Do I want results all at the end? Do I want results incrementally, record by record? Do I want results every 10 minutes? Do I want results at the watermark? And so on.
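
As a rough sketch of the “results every 10 minutes” case with today’s SQL: a tumbling event-time window determines when results materialize, while the same query over bounded input would simply emit everything at the end. The `orders` table, its columns, and its watermark are assumed for illustration.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class EmissionBySqlSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Assumes a registered table orders(user_id BIGINT, amount DOUBLE, ts TIMESTAMP(3))
        // with a watermark declared on ts. The window boundary, not batch-vs-stream,
        // decides when each result row is emitted.
        tEnv.executeSql(
            "SELECT user_id, " +
            "       TUMBLE_END(ts, INTERVAL '10' MINUTE) AS window_end, " +
            "       SUM(amount) AS total " +
            "FROM orders " +
            "GROUP BY user_id, TUMBLE(ts, INTERVAL '10' MINUTE)");
    }
}
```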

SE: On the DataStream API side, we’re also finally starting to change it such that we can get rid of the DataSet API. My hope is that by the end of the year, we’ll have come to a point where we can say: this vision that we set out for maybe five years ago, we finally made it. This is now a system that has really achieved that.

SE: It has true batch-streaming unification, on the SQL layer and on the DataStream API layer. The engine underneath the hood is, on the streaming side, very efficient, resilient, low latency. It achieves basically best-in-class streaming. At the same time, it’s actually competitive on the batch side. So it’s getting within reach, and I’m getting excited.

KG: That’s an interesting one that you brought up because I think that the more we go forward into this as data practitioners, the more batch and streaming come together. I think early on it was, oh, streaming’s completely different.

SE: Yeah. But it’s not, right?

KG: Right. But now we start to see that streaming is made stronger by being able to maybe enrich something with a static data source. Or in real life, people probably should stream something, but it’s in a database because that’s just the way the project was implemented for now.

KG: Or maybe that’s a continuum of going from batch to streaming. They want to stream the data from Kafka or something like this. Or Pulsar or something later on. But right now, hey, it’s in a database. Can I still make use of Flink? Can we still build our systems on it? The answer is, yes, you can.

KG: I think that’s been one of the things that maybe if there was something that Flink didn’t tout enough or put more marketing dollars or energy around, it would have been like, hey, you can do batch and streaming and that’s a really big deal.

SE: Yeah, that’s actually true. These things go together more and more, the longer you actually work with them. It’s a big deal, and not just because it’s beautiful from a technology perspective. Building your ML feature pipeline once, and then being able to run it over your historic data using the exact same thing as your near-line pipeline, that’s extremely powerful.

SE: It doesn’t only save you tons of time, it also makes sure that both things actually do the same thing, which is much harder to achieve in practice than it sounds initially. That definitely is a big deal. I agree.

LD: Well, Stephan, thank you so much for joining us today. This has been, as we said, a fantastic conversation. We obviously are huge fans of Flink. Our listeners are huge-

SE: Thank you.

LD: Yeah. Our listeners are huge fans of Flink. We are loving this because, like I said, at this point we’ve talked to Marton, we’ve talked to Max Michels, and now you. We’ve had this really great kind of history lesson behind Flink. This has been really awesome.

KG: That’s for sure. That’s for sure.

LD: It’s been fantastic. So thank you so much for joining us. We appreciate it.

SE: Yeah. Thanks for having me. It was an interesting trip down memory lane. Or how do you say it?

KG: That’s it. You got it.

LD: That’s exactly right. Yeah. You got it. All right.

KG: Cool.

SE: All right. Thank you so much.

LD: Big thanks to Stephan for joining us today, and we can’t wait to have him back on the show to talk about Flink, Stateful Functions, and all the other great things this community has going on. If you’re interested in learning more about Stateful Functions, check out Stephan’s keynote from the spring Flink Forward on YouTube; it’s a great talk. Really, all the talks at Flink Forward are fantastic, and the call for papers for this fall’s event ends June 28th, so don’t forget to get your session proposals in!

And as always, if you’re interested in learning more about Eventador, you can find us at eventador.io or get started today with a 14-day free trial at eventador.cloud/register. Happy streaming!
