Podcast: Building a Streaming Data Engine for Killer Apps with Erik Beebe

May 13, 2020 in Eventador Streams Podcast




In this episode of the Eventador Streams podcast, and to celebrate Eventador’s fourth birthday (yay!), Kenny and I are joined by Erik Beebe to take a walk down memory lane, discussing how and why Eventador came to be and how the company has evolved to the point it’s at today with the Eventador Platform. No topic was off-limits: we chat about the early years of Kenny and Erik (oh so long ago at eBay) and how that started a journey that led us to where we all are today.

Learn more about how and why the platform is architected the way it is (Apache Flink vs Apache Storm, the Kubernetes conversation, and just why building a Continuous SQL and materialized view engine for streaming data was important to the company vision from the very beginning) in this episode of the Eventador Streams podcast.

Want to make sure you never miss an episode? You can follow Eventador Streams on a variety of platforms including Soundcloud, Apple Podcasts, Google Play, Spotify, and more. Happy listening!

Episode 06: Building a Streaming Data Engine for Killer Apps with Erik Beebe

 

Leslie Denson: You folks are in luck and are in for quite the conversation today. In honor of Eventador’s fourth birthday, Erik Beebe, our co-founder and CTO, joined Kenny and me to talk everything from the Sun E10Ks of the past to the Apache Storm versus Apache Flink discussions of a few years ago and the technology and platform decisions like, “Is Kubernetes really the answer?” that are still debated today. Learn more about the history of Eventador and why these two are so passionate about streaming data in this episode of Eventador Streams, a podcast about all things streaming data.

LD: Hey, everybody. Welcome back to another episode of the Eventador Streams podcast. Kenny and I are here today with a guest that we’re really excited about, and we think this is gonna be a lot of fun. I think you guys are in for a big treat when we get the two of these guys together, but we are here with our other co-founder and our CTO, Erik Beebe. Erik, welcome. How’s it going?

Erik Beebe: Good, happy to be here.

LD: Good. Glad we were finally able to rope you into this. I know a little bit about your background has come out in some of the other podcasts that we’ve done, and I’m sure people who know Eventador know a little bit about it but tell us a little bit about yourself from your perspective.

EB: I’ve known Kenny for probably 20 years now, and that is largely how I find myself here today.

Kenny Gorman: Sorry about that.

EB: I kinda started out my career early on at eBay and PayPal doing infrastructure work, and it was a great place to kinda grow into those things. It was… There was a lot of… We get to play with everything from enterprise storage to making databases fast and dealing with crazy scalability problems. I kinda took that and moved on to a number of startups. We really got to kinda dig into all sorts of early growing pain sort of problems around infrastructure and all the normal things like moving into a cloud and virtualization and kinda non-traditional data stores, really those were the kinda things that led me to where we are now with Eventador.

LD: Well, I’m glad that it led the two of you to Eventador ’cause then that led me here, and I’m happy with that.

KG: So, I wanna hear about, Erik, the story that you tell just right off the bat ’cause you said eBay, tell us a little bit about, ’cause I know the listeners would like to hear, about when you first came to eBay, what your thoughts were about playing with E10Ks and all that. Tell a little bit of that color ’cause that’s a killer story.

EB: It was actually, it was super exciting. The first real job I had in any sort of technology was in a small ISP back in Virginia named InfiNet. I went from there through kind of a fortuitous meeting with some other folks with a couple of friends and moved out to California to start at eBay and that was in 2000 so eBay was, it was still very early at eBay. The kind of traffic that we saw then compared to what they do now was, eBay was still more or less a startup, although a big one. I kinda went from playing with small DEC Alphas and things to Sun E10Ks literally overnight, I guess over a weekend. It was my first exposure to computers that were taller than I was and also the…

KG: Right.

EB: To managing databases that literally had rows of… At the time there were Sun A3000s, A3500s SCSI arrays. They were super unreliable at the time. It was a real big challenge. We had all sorts of growing pains that made it real fun to… That was really our first experience with sharding before sharding was popular, realizing that you would scale the database to the largest logical conclusion. You’re on a machine that in 2000 had 64 CPUs and 64 Gigs of RAM, you would scale the Oracle as far as it was gonna go, I guess Oracle…

KG: We literally asked them, right? We said, “What’s the biggest computer you have? Send that over and send 10 more just like it.”

EB: Right. And at the time, that was the biggest computer they had.

EB: We brought you that computer. So, yeah, at that point, we realized we needed to find a way to run this on more smaller computers and at the same time find a better storage solution. So that was our first foray into early storage attached… Network SANs, with kinda like the first generation of Brocade switches and… All the growing pains of beta testing everyone else’s HBAs for them and… Right. Yeah, a lot of fun.
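As a rough illustration of the sharding idea Erik describes (spreading one logical database across more, smaller machines), here is a minimal sketch in Python. It is not how eBay did it; the `shard_for` function, the `ShardedStore` class, and the in-memory dictionaries standing in for databases are all hypothetical names chosen for the example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash.

    hashlib is used instead of the built-in hash(), which is salted
    per process for strings and would route keys inconsistently.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store that spreads keys over several in-memory 'databases'."""

    def __init__(self, num_shards: int):
        # Each dict is a stand-in (assumption) for a separate database host.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for user_id in ("user-1", "user-2", "user-3", "user-4", "user-5"):
    store.put(user_id, {"id": user_id})
```

Every read and write for a given key lands on the same shard, so no single machine has to hold the whole data set. The catch with plain modulo routing like this is that changing the shard count moves most keys, which is one reason resharding is painful in practice.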

KG: Old school but awesome.

LD: So jumping forward with all of the knowledge that you guys have into what we’re doing now, which is streaming systems, Kafka, Flink, continuous SQL, etcetera, etcetera, excuse me. Let’s talk a little bit about Eventador and the evolution of Eventador over the last few years because when I joined the company a year and a half ago now, you guys talked a lot about the last mile problem and what you were trying to solve and it having been kind of a vision of what you wanted to do from the outset. And I am really proud and I know you guys are too that we’re kind of there, and we see places to go, but you guys have done a great job of marching the team forward and executing on that. So let’s talk a little bit about the beginnings of Eventador and what you guys were seeing and what you wanted to do and what you thought was cool and what you may look back on now and go, “Oh, why on earth did we do this this way?”

KG: So many things, so many things. [chuckle]

LD: So many things.

EB: All the things that we would do different, that is definitely open-ended. It is so many things, right? I mean, it’s… And I guess that’s… I could probably take any… The companies that we started, notwithstanding, I could imagine any technology that we engaged in, work done for a couple of years, and then look back, and I can’t ever think of a situation where with the enlightenment that comes from that, you wouldn’t say, “Okay, I have pages of things that I would do differently now that we’ve had the experience of scaling it and making it work for actual people, that depended on this in production.” And this is definitely no different.

KG: I think some of the things that, if you think about road bumps, and things that maybe threw us a curveball early on, is like first of all, we started off as a managed Flink or a managed Kafka company, and… ’Cause that was something that was clearly a need when we first started this four years ago. And I think our anniversary was yesterday, right? So it’s been a full four years, yay, a full four years of us messing around in the streaming space. And I’d like to think kind of as, Leslie, you put it. We’ve probably gotten more wrong than we’ve gotten right.

LD: I never said that. That was not what I said. We’re gonna interrupt for the listeners out there…

LD: That is not what I said. What I said was that we’ve executed really well on the vision that you guys had, what would you change, and what are you guys really excited about? Don’t put words in my mouth.

KG: Like Leslie had said…

KG: No, but that’s indicative of the entire space. And I don’t think that it was that this was an easy or well-understood… I think Confluent, and Jay, and Neha, and those folks were out carrying the flag and leading the charge in Kafka, and that was groundbreaking stuff. And we were genuinely interested in it and understood the value. Always were thinking like, “How do we… ” Well, you mentioned the last mile earlier, “How do we actually help folks make sense of this data?” Because there is a mindset or a mind shift around trying to consume data from streams, and use it in apps, versus just like, “Hey, what is the core infrastructure I need to… ” Sort of like, “Okay, I got Kafka, now what? And how do I make sense of that?” And I don’t know, Erik, if you wanna add some color there, but we went through a lot of iterations of thinking through stream processors like Storm and Flink. And there was a whole evolution there.

EB: Yeah, for sure. Looking back now to where we are now, versus even four years ago, the streaming space was super nascent then. I mean, everyone has been doing it to a certain extent, right? I mean, people have been writing streaming systems, frameworks, maybe more purpose-built ones for their own organization for a couple of decades now.

KG: Right.

EB: But the idea of building general purpose stream processors that were super robust and would handle things like checkpointing for example, robust crash recovery, that kind of thing, like you would expect in a database, is not super… More mature today but it certainly wasn’t four or five years ago. And the big players in the space then, Storm was super… At least by five years ago standards was super advanced. Flink was new in the streaming space and brought a lot of really interesting features along with it, like checkpointing and savepointing, and really rich APIs that at the time the other streaming systems didn’t have.

KG: And we started off with Storm. Lead us through our thinking there. I talked about it a little bit in the last podcast, but I glossed over some of the details. We started off with Storm, because obviously we wanted a processor. I think we were doing this stuff for Hurricane Harvey back then.

EB: Yeah.

KG: I don’t remember the details of the whole timing there, ’cause my memory is terrible, but…

EB: I think we went through a couple of iterations, right? So I mean, one relevant point I guess is we were kind of coming into this knowing that we wanted to build a streaming system, a general purpose streaming system that would make it as easy to plug streams of data into a streaming system and then build useful applications around it as you could with those NoSQL systems of the day. We had kind of come into this having done a startup before based largely around MongoDB. I think you probably talked about it in previous podcasts some too. One of the things that led us to really diving into Mongo was that it was a huge paradigm shift in terms of usability for developers. Having written things against relational databases for years, moving into a document-based system that was based around largely something that looked like JSON, a query language that was kind of document first. SQL is super robust and super powerful, but I don’t know that anyone would necessarily call it, building applications around SQL, fun. MongoDB definitely changed that, and having…

KG: It’s lots of fun when there’s no schema.

EB: It is. But I think we realized it’s lots of fun if you’re not constrained by a schema until you remember why you had a schema in the first place.

KG: Right.

EB: And the indexes still matter for…

KG: Right.

EB: NoSQL databases. And that was kind of the problem with customers too, right? They were like, “This is great. I can write whatever data I want to Mongo and not worry about anything.” But until they did, we were working on solving those problems of, “Hey, how do you… I’ve got a collection with 15 billion documents in it now, and now I know I wanna access it differently. How do I index that?” It’s like, “Well, alright, we should talk about that.”

KG: Right.

EB: And I think that probably drove our interest in, okay, the real solution to problems like this isn’t to build more indexes, it’s to make a streaming system that is as easily accessible to developers as MongoDB or something like MongoDB, any of those NoSQL databases of the day. But that is also powerful enough that you can get all the value you’d expect out of a database with it. And streaming systems kind of had one or the other. There’s loads of documentation. I mean, CQRS for example, or Event Sourcing patterns aren’t super new, they’ve always had the difficulty of managing them in production associated with it. It was like in many cases they were… CQRS makes for great academic papers for building scalable systems, but it’s pretty daunting for smaller users. I think it’s probably a long-winded response to, “Hey, how do we end up at using a stream processor for solving this problem?” But I think those learnings are probably what got us there.
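To make the Event Sourcing and CQRS idea Erik mentions a little more concrete, here is a minimal sketch in Python. It assumes a single in-memory, append-only log; the `EventLog` class, the `build_view` helper, and the sample purchase events are hypothetical and only illustrate the pattern, not anything Eventador shipped.

```python
from collections import defaultdict


class EventLog:
    """Append-only log of events; the log, not any view, is the source of truth."""

    def __init__(self):
        self._events = []

    def append(self, event: dict) -> None:
        self._events.append(event)

    def replay(self):
        """Iterate over every event in the order it was written."""
        return iter(self._events)


def build_view(log: EventLog, key_fn, value_fn) -> dict:
    """Fold the whole log into a read-side view: key -> running total."""
    view = defaultdict(int)
    for event in log.replay():
        view[key_fn(event)] += value_fn(event)
    return dict(view)


log = EventLog()
log.append({"user": "ann", "item": "book", "amount": 12})
log.append({"user": "bob", "item": "book", "amount": 12})
log.append({"user": "ann", "item": "lamp", "amount": 30})

# The first read model: total spend per user.
spend_by_user = build_view(log, lambda e: e["user"], lambda e: e["amount"])

# A new access pattern later on is another replay of the same log,
# rather than a new index bolted onto the write-side store.
spend_by_item = build_view(log, lambda e: e["item"], lambda e: e["amount"])
```

The write side only ever appends, and each read side is derived from the log. The hard part in production, as Erik notes, is operating this at scale: keeping views up to date incrementally and recovering them after failures rather than replaying from scratch as this toy does.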

KG: But we see it all the time, right? We see folks who are still trying to, “Hey, I’ve heard about the streaming thing. I wanna adopt it.” And if you’re in the streaming space, you understand that Kafka and Flink and the other players in the market. You understand who the vendors are and what their positions are, and you could kind of understand the whole thing, and how it’s different from maybe something like Hadoop and Spark and maybe you understand how it’s definitely different than databases, even SQL ones, but there’s still a ton of people out there who are like, “Hey I’ve heard about the streaming thing and I wanna get booted up and using it, and I’ve heard it’s cool and I heard I need it, but I don’t really know how or why.”

KG: And it seems to me that there’s these gigantic bags of tooling. There’s Kafka, there’s Flink, there’s Beam, there’s all sorts of components and obviously many more. Wiring them together and making an actual reliable production system that doesn’t go down, and that services the data constituents, whoever it is, the data scientist or analytics teams or even application teams, and making that thing work and robust and day in and day out, and not keeping you awake with pages and all that kind of stuff, and capacity planning and being elastic during periods of heavy workload and stuff, when I think back about what you said about eBay, it’s like 10 times harder now. This stuff is supposed to be getting easier, not harder. I think maybe it’s 10 times harder now with these different systems for ops teams and groups of engineers to actually build and deploy these things.

EB: Yeah, yeah, for sure. If you think back to the first iteration where we adopted Storm, I think if we had to state a high level goal, what was the thing we were trying to accomplish for customers? Well, it was going back to that Mongo huge collection analogy. The thing that we really wanted to provide was the ability for a user to provide a stream of data from application, not really care too much about the schema or structure or anything, and then be able to build applications on top of it, in a way where they could evolve the application without having to rebuild the entire streaming system or the entire data store every time.

EB: And Storm was an interesting start there because it did a couple of things really well, reliability and scalability were kind of paramount for it. If you remember the first iteration that we built with that, Storm is obviously based around the JVM. The JVM provides great scalability, great introspection, but at the time, having to write Java code in order to define the pipeline, really alienated a lot of users that we wanted to engage with. So we built the first system around a fork that we had made of streamparse, the Python interface to Storm that Parse.ly released, which, even today, is a super neat piece of software. Being able to write jobs in Python on top of Storm and get the value out of Storm is really cool. We did have, I think, a lot of success with that. I think ultimately, the problem for us there was that we kinda took that to its logical conclusion that, “You know what, this is great. It’s super accessible, if we want users to be able to express things in a simpler language on top of this.”

EB: But ultimately, the goal isn’t to expose programming language to the user quite so much as specific functionality. And we gave up a lot by abstracting away the JVM. Storm is a great piece of software. I think we had a lot of success with it, but I think that was probably one of the things that led us toward looking for another solution and ultimately towards Flink as the underpinning for this.

KG: Yeah, yeah, and I remember those were pretty big days. Yeah, I think I mentioned in another podcast we went, it was you, me and Jmo, and we were at lunch, and it was like a three-beer lunch or whatever, and we’re like, “You know what? F it. Let’s go use Flink. We need better state management.” And we moved pretty quickly once we… Well, once we had three beers, that helped. And then, second of all, it was, all the things that Flink brought to the table were super exciting and we kinda knew we had to jump on and start to build around it.

EB: Yeah, for sure. Yeah, the APIs there really make sense. And where they’re insufficient, they’re easy to… In general, they’re easy to extend. One of the killer things about Flink, obviously, is the whole checkpoint, savepoint system, which we had kinda gone down the path of inventing our own for Storm and streamparse.

KG: Right.

EB: It’s… Every database on earth now has written some version of this, and it’s always complex. That was a problem, that others had already worked hard to solve, had done so in a really good way.
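For readers who have not worked with checkpointing, the idea Erik is describing can be sketched in a few lines of Python. This is a single-process toy, not Flink's actual mechanism (Flink coordinates distributed snapshots across operators); the `CountingOperator` class, the `run` function, and the `store` dictionary standing in for durable storage are all assumptions made for the example.

```python
import copy


class CountingOperator:
    """Toy stateful operator: counts records per key and can snapshot its state."""

    def __init__(self):
        self.counts = {}
        self.offset = 0  # index of the next input record to read

    def process(self, record: str) -> None:
        self.counts[record] = self.counts.get(record, 0) + 1
        self.offset += 1

    def checkpoint(self) -> dict:
        # State and input position are captured together, so a restore
        # resumes exactly where the snapshot was taken.
        return {"offset": self.offset, "counts": copy.deepcopy(self.counts)}

    def restore(self, snapshot: dict) -> None:
        self.offset = snapshot["offset"]
        self.counts = copy.deepcopy(snapshot["counts"])


def run(operator, source, store, checkpoint_every=2, fail_at=None):
    """Process records from operator.offset onward, checkpointing periodically."""
    while operator.offset < len(source):
        if operator.offset == fail_at:
            raise RuntimeError("simulated crash")
        operator.process(source[operator.offset])
        if operator.offset % checkpoint_every == 0:
            store["latest"] = operator.checkpoint()


source = ["a", "b", "a", "c", "a", "b"]
store = {}  # stand-in (assumption) for durable checkpoint storage

first_attempt = CountingOperator()
try:
    run(first_attempt, source, store, fail_at=5)
except RuntimeError:
    pass  # work done since the last checkpoint is lost

recovered = CountingOperator()
recovered.restore(store["latest"])
run(recovered, source, store)
```

After the simulated crash, the recovered operator rewinds to the last checkpointed offset and reprocesses from there, so each record is counted exactly once in the final state. Getting the same guarantee across many machines, with state too large to copy naively, is the complex part that Flink had already solved.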

KG: Yeah, that was…

EB: And in the years since then, it’s gotten even significantly… I think we started with Flink 1.2 and we’re at 1.10 now.

KG: Yeah, it was hard for a while because we’d write a job in 1.2 that would be completely useless in 1.3, and by the time we’re at 1.6, nothing worked the same way as it did before. And we just kept re-writing jobs to be with the latest API. And it’s like this, it’s like you’re excited about it, ’cause it’s got a lot of new functionalities, just nothing you wrote before was useful at all.

EB: Yeah, now that the APIs are more stable…

KG: Stabilized, yeah.

EB: It’s gotten a lot easier. But that was a lot of work for a while.

LD: So, I think that the decision on Storm versus Flink, obviously, was a great one in hindsight. So those are really good beers that you guys had, we should go and do that again sometime. There were also… And I still hear them bandied about and debated on a daily basis for different things, but there were also some other decisions that you guys had to make that I think others probably out there are… Will either listen to you and go, “Yep, we had that same conversation.” Or, “Hmm, we’re having that conversation now, this is good to know” on how to pull this whole thing together. For instance, “Do we use Kubernetes? What do we do?”

KG: Oh, there you did. You said it. Here we go.

LD: I went for it. I went for it. Talk to us a little bit about some of those decisions, and how we got there, based on experience and what you knew at the time and etcetera.

EB: I knew there had to be a question that would be a good catalyst for a fight at some point…

KG: They can’t all be softballs.

EB: Yeah, I mean, I’ll take that one. And I’ll add some… I guess some nuance to it too. We kinda talk about where we started four years ago, we initially built the Eventador platform around, largely around the AWS ecosystem. We built a control plane that largely used Boto for manipulating things. We would build purpose… Specific AMIs for the functionality that we provided. That was our release model. We used Packer to put them together. It actually worked fairly well, but as everyone eventually gets to, it’s pretty inflexible and it’s not easy to manage. Well, there are a lot of trade-offs I guess, and that’s one of the things, I guess, that draws you to the container ecosystem.

EB: Looking back, I’m thinking of the whole progression that led us there, I used Xen and KVM for years for virtualization. They’re both great. A significant portion of the world still runs on those technologies, but containers eventually became mature enough that we built our previous company around a lesser known containerization technology now called OpenVZ, which I loved, it was great. The tooling was good, it was just enough abstraction to make it so they were easy to package, an OpenVZ machine was basically just your root file system all tar-ed up. So it was very easy to change. The downside was there was no mainline kernel support for it so you had to run an OpenVZ kernel. If you were using third party drivers like we were using Fusion-io at the time, so you had to make your Fusion-io drivers work with the version of the OpenVZ kernel that you had, which was always kind of out of date. The biggest problem was it had no real API. You kinda had to wrap the OpenVZ tool set, which we did and it worked out pretty well, but it was pretty clunky and hard to maintain.

KG: But ultimately it got the job done. I mean, we scaled that to billions of documents, billions and billions of documents running in Mongo on those things.

EB: Right. Ultimately, it got the job done. And it had a lot of great tools too for introspection. It had its own metrics collection, and it… I really liked it. It was…

KG: We like it now. Back then we cursed it. But now we love it, right? It’s one of those things.

LD: Rose-colored glasses and all that.

KG: Yeah.

EB: Right, right. So kind of fast forwarding, we were initially pretty AWS-centric here, so we built the first prototype around just AMIs and running on kinda EC2 bare metal and as the platform matured and we added more and more microservices that we depended on to run bits of the platform, that became both expensive and a little bit harder to manage, so it was clear that we needed to move towards a containerization system.

KG: Right.

EB: By that point, it was 2017 probably that we were moving into a containerized platform, and at that point, Kubernetes had clearly won. We used LXC and LXD for a while, we experimented with that, we even… We ran some multitenant componentry around it for a while. It was alright, but, and it’s great ’cause LXC support is in the mainline kernel so it was easy to plug in. The APIs were somewhere between Kubernetes and where OpenVZ was. Ultimately, Kubernetes and its Docker underpinnings were kind of the clear winners in the space, and so we moved in that direction. Everything else I’ll say about this I’ll color with, I have a love-hate relationship with Kubernetes.

KG: I did. I admitted the other day. On the last podcast I admitted I can’t even get into the containers anymore. It’s like kubectl, what? Like, I just… I don’t… Someone help me.

EB: And it’s… I think if you sampled our team, I think discussions around containers and preferences are somewhat like the emacs versus vi discussion today.

KG: Right.

EB: People really love or hate any one thing. Kubernetes is just kinda… It is a de facto solution for things. Every cloud provider has their own Kubernetes solution now. They’ve built APIs around it. Kubernetes itself has a pretty rich API and great libraries for every programming language you can think of. kubectl, I don’t know if I’d say great tools, but it is a tool and it’s adequate for most things. It was the natural fit for what we wanted to build, not just because of the simple like, “Hey, wanna make machines more dense, we wanna be able to run more containers on a single machine.” In many ways, the Dockerfiles, the Docker system for packaging releases makes a lot of sense even if you’re not really getting a lot of value out of being able to run a bunch of containers on a single piece of bare metal, and in many cases, we’re running databases or Kafka or these things that have persistent storage requirements that frequently also use all the resources on a machine, it still makes a lot of sense just from the packaging aspect. Being able to use things like Jenkins for release management and Dockerfiles for builds, you can build a pretty reliable CI/CD pipeline for all of your infrastructure, package it up, put it into a repository and Kubernetes makes provisioning it, upgrading it fairly straightforward.

KG: And just so people know, we don’t deploy Kubernetes globally and then jam 100 customers on there or anything, when we deploy, we deploy a Kubernetes cluster per customer.

EB: Right. Important distinction about containers versus traditional virtualization, we provision on bare metal today, bare metal runs Kubernetes, Kubernetes runs the containers. And it’s great because all that kinda fits like the AWS and VPC pipeline pretty well and…

KG: Right. But you get your own Kubernetes, so to speak, to be pretty vanilla about it, you get your own Kubernetes. It’s not like you get jammed into a Kubernetes cluster with a bunch of other people.

EB: Right. Important distinction there. And we kinda took it a step further where we specifically make it so it’s easy to mint a Kubernetes cluster for whatever you wanna do. So what was kind of important here was that, if you wanna physically separate out security concerns, you can, say, spin up a new environment just for QA or dev or whatever, that gets its own VPC with its own Kubernetes cluster or clusters. And so we kinda run these mini clusters that run the streaming pipelines. And Kubernetes lends itself to that reasonably well. Kubernetes is fairly lightweight in that regard. It gives you a good set of APIs to kind of declaratively do everything from spinning up Kubernetes, to bootstrapping the environment, to testing it, to running jobs against it. It makes it easy to add or remove an atomic set of resources.

KG: Or scale Flink. I remember that one night, we were at the Oracle OpenWorld and we’re at the hotel and we were bored. We were messing with the Oracle Cloud back then, we had just joined the Accelerator, and we had gotten Flink set up on their Kubernetes offering, and I remember we were like, “Hey, let’s just start this Flink job,” and it’s got like three task managers. It’s like, “What if it had 300?” ‘Cause at that point we had basically free cloud resources, and it worked. I mean, it was an exercise in like, “Well, just add another zero on the end of what you would normally scale. Did it actually work?” And it did actually work. We scaled Flink while it was running to 300 task managers or whatever it was, some very large number, and I mean, it took a few seconds or whatever. But then I mean Flink is relatively good at that. The jobs picked up, and things started processing, and voila, we were off to the races. It was awesome.

EB: Yeah, the combination of Kubernetes and completely free cloud services is neat.

LD: And it also sounds like the combination of you guys being at a conference with probably a dinner that had some beers ahead of it led to…

KG: Maybe. Maybe.

LD: I think there’s a common theme here.

KG: But the question was, the big question was like what should the value be? Should it be 30 or should it be 300 or 3000? We didn’t wanna get in trouble or anything. So we didn’t launch 20,000 task managers or something. But…

LD: Being worried about getting in trouble has never stopped you before, so unclear on why it didn’t.

KG: That’s all Erik. That’s not me.

EB: Still answering the question, how fast can we make this with unlimited computers, is always fun.

KG: Well, we did have the question. We didn’t know how big the Oracle Cloud was at that point, and we thought “Well, I mean, maybe we should just find the end of it, we should just keep scaling Flink until they say no.”

EB: Right. “Do you think this is the whole data center? I don’t know. I guess we’ll find out.” Look what we dealt with.

LD: That would have been in an email we printed and hung on the wall, I’m pretty sure. Well, looking back on where we are now to where we started from, if you, both of you, I want both of you to answer this question: If you could go back and tell oh so young Kenny and Erik, four years ago, something that you know now that you wish you’d known then, what would it be?

KG: Look, it’s still very early for Eventador. We’re still a small team. We’re a small passionate team, and I think we’re, in terms of the craft of streaming systems and understanding, that has been something that’s been important to us since day one. And I think it was with our previous company, and maybe it’s held us back from a growth perspective, in just really trying to dig deep, roll up our sleeves, and understand the craft of building systems, streaming systems, understanding customer problems. Ultimately, running them in production, and that’s no small feat, being an extension of the Ops teams for folks. That’s kind of the core values of our company. And I think going back, would I change that, would I go for broke or try and grow faster or whatever? It’s always been hard because you have an investor community that wants you to grow fast, you have customer demands, and technology things that actually have to be worked on and solved, and you kinda can’t cheat that. You have to be authentic to what’s required to do the hard work of building a system like this.

KG: And we’ve done a ton of work around SQLStreamBuilder, and we’ve done a ton of work around materialized views. And we’ve done a ton of work around operationalizing and making Flink and Kafka work together in a really great way and being able to support it long-term and at scale. And I think from those standpoints, I feel relatively good. I think, in hindsight, sitting here where we’re at right now, we’re starting to see this whole ecosystem mature. And I think the big thing I would have done personally, the big mistake I think I would have… That I’ve made, and that I would change is I would have pitched harder and I would have worked harder to get us to this point without going through kind of the managed service realm. Because I think ultimately we knew that Confluent and Amazon and everybody was gonna come to the Kafka game.

KG: I’m just being super transparent here. We knew everybody was gonna come to that game because we could see it as being awesome already. That’s why we started in this field, this is why we were excited about it, we knew that streaming systems were gonna be awesome and we’re excited by it. And so we moved in that direction, but I think we took baby steps instead of taking giant leaps, and I think that was because maybe we weren’t totally sure of ourselves or weren’t totally sure of the space, and I think now in hindsight, “Hell, yeah! It’s game on.” And Flink is growing crazy. Kafka is already… Confluent is huge. They’re already doing great and good for them. And we’re a small part of that, but we’re a small passionate part of that.

KG: And I think ultimately bad on me for not… And I’ll take… This rests on my shoulders, I suspect, is just driving harder towards that vision. And I think from here on out we’re gonna. And that’s something that we’ve learned, and that’s something that we’re going to and so you’re gonna see… Customers are gonna see us drive towards solutions around their pain points and helping them make sense of data that’s locked away in Kafka and tough to get to without writing Java or Scala code. And that’s our mission in life. And we’re gonna go harder at it even during these tough times. And that’s kind of been my learning through it all. Was that too honest?

LD: No. I was more thinking like, Oh, we’d use Terraform instead of CloudFormation, but… I’ll take that too. I’ll take that too. I will rebut a little bit of what you just said and say, one of the things that we’ve talked about internally was having the managed service, and I think it’s important…

KG: We’ve learned a lot.

LD: Is having the managed service piece of it. We learned so much that informed the platform as it stands today, that has made it so much better.

KG: That’s true.

LD: So, while hindsight is always 20-20, I don’t know that the product would be what it is today, which is awesome without having that in our back pocket. So, alright, Erik.

KG: Erik, what say you, man?

EB: Looking back, I think our priorities have changed a little bit. I think if I have one opportunity to tell myself something from some number of years ago, it would have been start stockpiling paper towels for 2020 right now.

EB: But pandemic aside…

EB: I think yours was more around where is this kind of marketspace going. The first thing that came to mind for me was just kind of thinking about the evolution of data systems in general. If I had the ability to summarize what’s important to me today and tell my previous self this, I can think of lots of occasions over the last 20 years where had I been able to tell myself, “You can build systems without relational databases, that’s okay. You can move beyond these core building blocks.” And I have specific thoughts around, instead of building a system that would be fairly simple and would solve a problem, instead trying to build better middleware in front of Oracle or a better ORM to store non-structured data inside a relational database you already have, or trying to shoehorn… I have specific bad memories around using HornetQ and RabbitMQ to glue pieces of infrastructure together, knowing where the norm for data systems has arrived in 2020…

EB: And ultimately, this is still pretty early I think, there’s still so much growth and innovation happening in streaming systems and databases. And really, the way that containers and container systems and the scalability that comes from them relates to the growth of streaming data. Today, in 2020, there’s lots of great tools, but still no one ultimately, no one would confuse a streaming system with a database today. There’s still a pretty big divide between those things. The simplicity of a database hasn’t met the streaming vertical quite yet, and ultimately the systems of the future, there are really interesting things that are gonna happen going forward, especially as the amount of data that everyone, even small organizations generate, grows massively.

EB: Having these more mature systems and building the simplicity into it so that everyone can use this and everyone can leverage streaming systems to build applications on top of, is gonna make a huge difference. So kind of telling my former self, giving my former self cues about, “Hey, when you’re building this, these are the things to think about.” I think it’s a lot of don’t be constrained by what you see as the core building blocks today. This is all gonna change, and it’s okay, it’s alright to leave some of the relational paradigms behind and start thinking about things like de-normalization more and more where it really makes sense. That might have been a long-winded, a roundabout answer to that, I’m not…

LD: No.

KG: No, I think that’s true, I think that’s good. Yeah.

LD: That’s a great one.

KG: Very true.

LD: Very true. And now I’m gonna hit you guys with one that I did not let either of you prep for, because I didn’t want to, ’cause I’m evil, and I can do that. We talked a little, or I mentioned a little bit and then Kenny talked a little bit about at the beginning, the idea of the last mile, which I think everybody to some degree understands that term. For us, it was really delivering what we’ve got with the platform in 2.0 with SQLStreamBuilder having the materialized views really giving folks a place to have a really nice experience with their streaming data and be able to use it in a great way, and we’re all incredibly, incredibly proud of that. But I wanna know now that we… As Kenny mentioned, we hit our four-year anniversary yesterday. Now that we’re four years down the line, what are each of you incredibly proud of with Eventador and what we’ve done? What puts a smile on your face when you think about the last four years?

KG: Does Erik have to go first this time?

LD: Yes.

KG: Perfect.

LD: I’m evil. I know, I didn’t let either of you know I was gonna ask that question.

EB: What am I really proud of? That is a good question. Let me pick one thing here. ‘Cause ultimately, I think a lot of things. I could talk for hours about the things that I think we got wrong, just during the normal course of scientific discovery around how do we solve these problems. But in terms of things that I think we got right, the idea that the problem that we’ve always called the “last mile problem” is the real hard problem to solve in streaming. We spent a lot of time thinking around that, and I think we’ve ultimately arrived at a good solution there. The real challenge to streaming was always that you were taking these huge data flows of disparate data and loosely structured data and data that required enrichment and data that you ultimately had to find a way to partition and scale in order to effectively use. I think that’s been a pretty obvious thing about streaming for a long time, and I think a lot of people have gotten that right.

EB: The thing that was always important to us was how do you use this, if you’re an application developer, this is usually where the story starts to become a lot more complex. If you’re Netflix, or if you’re Uber, you have huge data teams that can take these pipelines, do something with it, and provide it to application developers in a way that makes sense. But if you’re a smaller organization, you’re trying to get value out of a stream of data, it’s very hard to do, because there is this impedance mismatch between the concept of a boundless stream of data on one side and application developers that still expect the semantics of a database, being able to ask it a question and getting a point-in-time set of data back. And ultimately doing that in a way that doesn’t really compromise the fidelity of the data but is still usable and fun for application developers, has turned out to be a pretty challenging problem, and we’ve put a lot of work into that. And today I think we solved that problem in a pretty good way and continue to evolve that, as more use cases come up, and more interesting customer demand from it. So of all the things that I think pop into my mind for that, that’s probably at the top of the list.

KG: Yeah. I kind of dovetail in there. I think the way just for background, I think what we call the last mile may not be known to everybody, but ultimately just to kind of put a bounding box around it, the last mile to us means how are people able to use Kafka data in their applications, whether it be just kind of a reporting like a data analyst type use case, or data science and machine learning that kind of thing, or even just using the data in applications. Like you’re plotting positions on a map or whatever that might be. And so, that last mile is really getting those people way out on the edge, if you wanna call it that, that development person, the capability to easily dovetail and use Kafka data, streaming data. It doesn’t have to be Kafka but for the sake of argument we’ll call it Kafka. Kafka data in their applications.

KG: And that, like Erik pointed out, that there’s an impedance mismatch there, and that’s still not solved. And I think the last mile, I think we’re a quarter mile from the finish of the last mile. I think there’s a lot of work to be done, and it’s getting harder to do it right from a just a computer science standpoint, and reliability and scalability standpoint. There’s still a ton of work for us to do as Eventador to achieve that, where kind of our heads are at. And the finish line will move. ‘Cause this industry, and data science and the requirements and capabilities are only growing from companies. So that’s getting bigger too. And I’d echo that, I think our best work has been figuring out that, that’s a big problem, and writing a solution around that, but I’d also say the thing I’m most proud of, the thing that puts a smile on my face is ultimately, like I said earlier, we’re a small team, a passionate team of experts. I think at this point we know a ton about Kafka, we know a ton about Flink. Are we the end all be all experts in everything? No. But we do live this day in and day out, and from an operations perspective, we do wake up in the middle of the night and fix Flink jobs. We do get pages for partitioning problems, and we fix those and scale systems and help people capacity-plan and do all sorts of things.

KG: So, we do live this day in and day out. This is our life now. And frankly, what puts a smile on my face is kind of the idea that SQL can have, from my background obviously, being a DBA in database engineering and Oracle and all those things for many years, the idea, this kind of beauty in the idea that SQL can have a place in a modern application stack. This last mile almost requires it, it’s a declarative language, it makes it very easy to address data in many forms. It’s been modernized in the form of continuous SQL, so that it works in the streaming paradigm, and it has the promise of bringing this capability of streams and streaming data this very new-school kinda systems to the folks that can make apps out of it and make decisions based on it, and build models on it. That to me is the most exciting part from a technology stack perspective.

KG: And then having a kick-ass team behind it, and those are the things that I think I’m most proud of. And I think from here going forward, there’s gonna be a lot of work to be done, but it’s the most exciting work possible, because we’re in a small field of folks that kinda believe this way. There’s Materialize, and you see ksqlDB in there, and there’s a few other folks that kinda get it too. As this smaller community of folks that kinda gets that grows, I think that’s gonna be exciting and that’s… I wake up every day going, “Hell, yeah, let’s make the next cool SQL statement on a stream” or “Hell, yeah, let’s… The customer has this stream of JSON data and it’s a mess.” And “Look at this magical statement that we helped them craft to unwind that data and join it with some other database. And now their business folks are off to the races,” or whatever. So that to me is the reason to wake up and give it hell every day and with this great team that we have now, like I said, damn experts at what they’re doing. They humble me every day. I’m excited. That is an exciting prospect.

LD: Awesome, I’ll take it. I think those answers were satisfactory.

KG: Alright.

LD: No, they were really good. And I would echo what you guys said it’s… Like I said, I joined a year and a half ago, which seems crazy now, and where we are now, versus what we wanted to be and where we wanted to be when I joined a year and a half ago, it’s really awesome to have seen it all come to fruition. And we’ve got some rock stars on the team that I just don’t know how they do the things that they do. It’s super good, it’s awesome. One last thing, what are you guys excited about moving forward, both with just kind of streaming in general, getting something out to our customers so that they can do even more with their streaming data? Just the field is wide open for you to answer this one, but what’s got you guys excited about this industry and what we’re doing from here on out?

KG: That one’s an easy one for me, my entire professional career has been built around data. I’ve never really done anything else other than data stuff. Whether it was early on with Sybase [chuckle] and SQL Anywhere, to Oracle DBA land and PayPal and eBay, to Mongo days of early Mongo adopter and writing tools for that, and ultimately founding a company with Erik, around that. And then here we are in the stream ecosphere. I don’t really have anything else to do other than just wake up and care about this a lot, and I’ve been doing it my entire professional life. And we’ve evolved with the state of the art, and I think what I’m excited about is that the data sphere if you wanna call it that just continues to grow. Like, there’s just more demand from companies to use data in new and interesting ways. Data science wasn’t a thing back in the day, people just did GroupBys.

KG: Today, you have very sophisticated machine learning models, predictive analysis, and this is all just normal stuff now that people are trying to build into pipelines for their business. That is the coolest thing. And to be a small part of that and to see the open source community grow, to see the Flink community grow, I’m excited about, obviously, the Kafka communities. We don’t need to be happy for them, they’re doing great. I think that this is just gonna continue to grow and be more awesome. Having our slice of the pie, and building a cool company around a really hard piece of this is intoxicatingly cool, and I’m just excited to continue to evolve and grow and hopefully challenge the norms in a couple of ways. Hopefully, people are looking at Eventador saying, “Wow, these guys really believe in this last mile concept in being able to query streams of data as non-programming users and being able to build applications quicker and have them scale and be amazing.” And that kind of thinking, hopefully if we’re pushing the boundary there, then that’s great. And I would be very proud if someone ever said that.

KG: So I think that’s what I’m excited about is to continue to push the boundary there, and grow based on this whole notion of SQL queries and materialized views and ultimately, like I said, getting those end users to be able to use streams of data in their day-to-day work.

EB: The thing that I’m probably most excited for is the next thing that we don’t know yet. The one thing that I’ve learned over the last almost exactly four years now, is that the ecosystem is moving fast. There is a next new thing that will come along quickly, either through customer demand or however we happen to encounter these things, but there’s always a… By virtue of the fact that many of these technologies are still kind of new and feature sets are still evolving, the way people are using them is still evolving, amount of data generated just in terms of volume and diversity is huge and increasing every day. There’s always a next new thing, and this is the really fun part about building a platform around streaming data is the challenges are huge and pretty constant. So, for someone who’s been a data nerd for a long time, looking at the next unknown and looking to solve that, it’s always exciting.

LD: Well guys, I guess, we’ll have Erik back on these at some point. I guess he passed the test. I guess he gets to come back.

EB: This was fun. This is my first ever podcast, but I appreciate the opportunity, it’s been a lot of fun.

LD: Of course, I mean, you’ll have many more opportunities where you will get sick of seeing my name on your calendar. But, I will. Gonna happen. Alright, thanks, you guys. Have a good one.

KG: Thank you. Awesome.

EB: Yeah.

LD: Well folks, there you have it. Now you have a little taste of the fun conversations we like to have on a daily basis. If you just can’t get enough of hearing Kenny and Erik chat about streaming data or any of the other topics that come to mind, you can always drop us a line at hello@eventador.io or connect with us on Twitter at @EventadorLabs. Or to learn a little bit more about us, the company as a whole, and the Eventador platform, visit eventador.io. And as always, if you want to go ahead and try it out for free, you can get started at eventador.cloud/register. Happy streaming!
