GOTO - Today, Tomorrow and the Future

CockroachDB: The Definitive Guide • Ben Darnell & Guy Harrison

August 19, 2022 Ben Darnell, Guy Harrison & GOTO Season 2 Episode 32

This interview was recorded for the GOTO Book Club.
gotopia.tech/bookclub

Ben Darnell - Co-Author of "CockroachDB: The Definitive Guide" and CTO at Cockroach Labs 
Guy Harrison - Co-Author of "CockroachDB: The Definitive Guide" and CEO at alwaysNFT.cloud, CTO at ProvenDB 

DESCRIPTION
How do modern data platforms integrate into today’s world? Join Guy Harrison and Ben Darnell, the authors of "CockroachDB: The Definitive Guide", to learn about the different use cases and unique functions of CockroachDB. Take a deep dive into the migration to the cloud and the different requirements for analytical and transactional data platforms.

The interview is based on Ben & Guy's book "CockroachDB: The Definitive Guide".

RECOMMENDED BOOKS
Darnell, Harrison & Seldess • CockroachDB: The Definitive Guide
Guy Harrison • Next Generation Databases
Guy Harrison & Steven Feuerstein • MySQL Stored Procedure Programming
Guy Harrison & Michael Harrison • MongoDB Performance Tuning
Kishen Das Kondabagilu Rajanna • Getting Started with CockroachDB
Regina Obe & Leo Hsu • PostgreSQL
Simon Riggs & Gianni Ciolli • PostgreSQL 14 Administration Cookbook

Looking for a unique learning experience?
Attend the next GOTO conference near you! Get your ticket at gotopia.tech

SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted almost daily.

Intro

Guy Harrison: Hi, everyone, welcome to the GOTO Book Club. My name is Guy Harrison. I'm a database author and professional. Over the past year, I've had the pleasure of working with Ben and Jesse on the book "CockroachDB: The Definitive Guide" from O'Reilly, which you should be seeing on your screens now. It's available for pre-order as we record this and it should be available for purchase in April. Today, I've got the pleasure of interviewing Ben Darnell, the CTO and founder of Cockroach Labs, or co-founder, I guess, to be strictly accurate. We're just gonna talk a little bit about CockroachDB and about the book, and give you a bit of a teaser of what CockroachDB is about and why you might wanna buy the book. So, Ben, hi.

Ben Darnell: Hi.

Guy Harrison: Good to see you as always.

Ben Darnell: Good to see you.

Guy Harrison: Ben's in New York, by the way, and I'm in Melbourne, Australia. So this is an international call. So Ben, why don't we start with a bit of a life story from you to understand how you got into the industry and what you've been doing before founding CockroachDB.

Ben Darnell: Sure. So I started my career about 20 years ago now at Google. That's also where I happened to meet Spencer Kimball and Peter Mattis, who would go on 15 years later to be my co-founders at CockroachDB. At Google, I worked on a lot of different infrastructure projects, including writing an ORM for Google's sharded database that powered their advertising system at the time. Then I spent a good chunk of my time there on Google Reader, where I was on the founding team and built the original backend for that system. I left Google in 2009 and went on to a number of startups, including FriendFeed, which was acquired by Facebook, and then Brizzly, Dropbox, and a startup called Viewfinder, where I got back together with Spencer and Peter. That startup then got acquired by Square.

So all of those different companies that I named had one thing in common, which was that they were all using sharded MySQL as one of their primary data stores. So I got to see the limitations of that approach, as did Spencer and Peter. While we were building Viewfinder, we found ourselves kind of dissatisfied with all of the database options that we were seeing out there, having been kind of spoiled by Google's GFS (Google File System), Bigtable, and other highly scalable and automated data storage systems.

We were kicking around this idea for something that Spencer decided to call Cockroach, which would be a scalable and indestructible data store. We didn't start working on it then, because we had our hands full building what we hoped was going to be the next big mobile photo-sharing app. But when that didn't pan out, and we found ourselves at Square facing a lot of the same difficulties around sharded databases, we decided that it was time to dust off that old design document and actually build it, and subsequently build a company around it and bring it to the market.

The story behind CockroachDB

Guy Harrison: That's fascinating. That's not the first time I've heard someone say they were building an application and realized that they needed a better database, and in the end sort of pivoted around to making the database the product, not the application. In fact, that's the MongoDB story in a way. Not that we necessarily still love MongoDB at CockroachDB, but they also were building a sort of platform and then decided that they would spin off the database side of it. And by the way, Ben, I've never mentioned this to you, but I was an avid Google Reader user. I almost cried when it was deprecated. Every day, I'd use Google Reader to keep up with things. That was a sad day for me, but people won't remember that now. This was an RSS reader that collated everything from the web that you might be interested in. Even pre-Twitter, I guess it was, right? It was a...

Ben Darnell: Yes. Google Reader was launched in 2005. Twitter came a couple of years after. So yeah, Reader never had the scale of fan base to really hold the attention of a behemoth like Google, but it does have its very devoted fans. I do hear from time to time, even today, from people who say they miss it now that it's gone.

Google Spanner: The precursor of CockroachDB

Guy Harrison: Yeah, so you were saying that at Google, obviously, Google pioneered a lot of stuff that led to open-source and other database products that sort of flowed out of some work at Google. But you were inspired more by Google's Spanner database. Was it called Spanner at the time? Or were we talking about something sort of pre-Spanner?

Ben Darnell: That's right. So I wanna make a distinction: what I actually used at Google was Bigtable. Then towards the end of my time at Google, they were starting up this new project called Spanner. In 2012, they published a paper about a lot of the details of Spanner, and that really became a big inspiration for CockroachDB. So in Spanner, you see a lot of the same design elements around using data that's been broken up into different segments and doing distributed transactions across those segments to give you a very high level of consistency in a very broadly distributed system. Then subsequent additions to Spanner added SQL capabilities. And that was kind of our first guiding light as we were building CockroachDB: trying to build Spanner for the rest of the world.

Guy Harrison: Yes, and correct me if you think I'm wrong about this, but really the key idea behind Spanner was that it was possible to keep consistency in a distributed system. Round about 2008, 2009, there were a lot of databases that threw consistency away. So we got the term eventual consistency: it was just too difficult to maintain everything in sync in an always-up, globally distributed network like the internet. And so Amazon pioneered Dynamo, and the concepts in Dynamo came to databases like Cassandra, and the idea was: it doesn't matter if things aren't right, as long as they're fast and available. And I guess Google's rebuttal to that with Spanner was, we'll just make it so available that it won't really matter. Yes, it still technically might fall over in order to preserve consistency, but we'll make that so unlikely that no one will care. Do you think that's a fair summary of what the idea was?

Ben Darnell: Yes, I think that's exactly right. Looking at another couple of Google storage systems: Bigtable, which I've already mentioned, was an eventually consistent system. And one thing that a lot of teams at Google found is that sometimes you do need a high level of consistency. So there was a kind of add-on called Megastore that ran on top of Bigtable and gave you transactions, but because it was this layer on top, it was pretty inefficient. It was something where you had to really think about whether it was going to be worth it to use it. But when Spanner came along, I think the key thing that both Spanner and CockroachDB bring to the table is the idea that if you embrace the need for strong consistency and distributed transactions from the beginning, and build them directly into the database, you can do a much better job of it than if you start with an eventually consistent system and then try to bolt some sort of lightweight transaction mechanism on top of it, as a lot of the recent NoSQL systems have been learning.

And so, speaking for CockroachDB, we have our full distributed transaction implementation providing serializable isolation that is competitive with the read committed isolation levels that you see in a lot of other products, because we really made that a key design goal: best-in-class consistency, isolation, and other transactional properties. We made that a priority from the beginning.
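One practical consequence of serializable isolation is that a client should be prepared to retry a transaction that loses a conflict. CockroachDB reports such retryable conflicts with SQLSTATE 40001 (serialization failure). The sketch below shows the general shape of a client-side retry loop. It is only an illustration under stated assumptions: `TxnError`, `run_with_retries`, and the backoff values are hypothetical names and numbers, not part of any driver or of the book.

```python
import time

SERIALIZATION_FAILURE = "40001"  # SQLSTATE for a retryable serialization conflict


class TxnError(Exception):
    """Hypothetical stand-in for a driver exception that carries a SQLSTATE."""

    def __init__(self, sqlstate):
        super().__init__(f"transaction failed with SQLSTATE {sqlstate}")
        self.sqlstate = sqlstate


def run_with_retries(txn_fn, max_attempts=5, base_delay=0.01):
    """Call txn_fn until it succeeds, retrying only on serialization failures.

    txn_fn is assumed to run one complete transaction (begin, statements,
    commit) and to raise TxnError when the database rejects it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except TxnError as err:
            # Re-raise anything that is not retryable, or the final failure.
            if err.sqlstate != SERIALIZATION_FAILURE or attempt == max_attempts:
                raise
            # Exponential backoff before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
```

With a real PostgreSQL driver, the same pattern would inspect the SQLSTATE exposed by that driver's own exception type instead of the hypothetical `TxnError`.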

Guy Harrison: Right. So when you kicked off in 2016, is it fair to say that no one was offering a distributed sort of consistent database, you know, other than inside of Google? Were you sort of like unique at that point?

Ben Darnell: No, we were not totally unique. I think the closest peer competitor to what we were trying to do at the time was FoundationDB. And I believe in 2017 Apple did us a huge favor by acquiring FoundationDB, which both gave the idea that we were pursuing a lot of credibility, because it showed that this kind of distributed transactional database was a really valuable thing, and removed what could have been one of our bigger competitors on the market. So thanks, Apple, for that little move.

Guy Harrison: Yes, there's a lot of luck in software startups, right? You can be completely right but at the wrong time, or some little event like that will remove a barrier that otherwise might have gotten in the way. I can't imagine how ambitious you need to be to launch a database company, given products like Oracle have been around for 40 years, more than that, really. Oracle's probably coming up to its 50th anniversary of version one or something like that. Is that right? Yes, it is. It was '78, maybe? Anyway, a long time, right? And they've got layers, and layers, and layers.

Ben Darnell: Certainly it is 44 years ago now. So getting close.

Early adoption of CockroachDB

Guy Harrison: So you seem to have hit the ground running. I guess that's because you started with a simple concept and then layered your way to where you are now? You can't come out of the gate running with all the features that you need. So how difficult was it to launch and get early adoption and so forth?

Ben Darnell: As you say, databases are hugely complex products. The incumbents in the marketplace have had decades of experience building and refining their systems, and people have really high expectations, especially when you're aiming to be a transactional and operational database that's really in the most mission-critical parts of your system. So it's been a difficult hill to climb. But I think that from the beginning we were able to offer some compelling and unique features, from the self-healing and enhanced administrative capabilities to, for example, the ability to run in a geographically distributed fashion, which is really one of our more unique capabilities.

We were also inspired by the history of the NoSQL databases. They showed that you can strip your feature set way down and still get something that a lot of people will be able to make use of. So early on, we just had to be really transparent about what our limitations were. Early versions of CockroachDB, for example, had warnings in the documentation about how you probably shouldn't use joins, which are a core capability of SQL. That was because early on we didn't have an optimizer that was able to execute joins efficiently, so we had to warn people to stay away from them when performance mattered. It's not ideal, but if you value the other parts of the product enough, then it's something you can live with and work around. NoSQL showed that, because most of the NoSQL systems didn't have anything like joins, at least not at first, and you can work with those. It's the product management adage: if you're not embarrassed by your version one, you shipped too late. You just have to ship something basic that ticks all the boxes for one type of user and then expand from there.

CockroachDB use cases

Guy Harrison: Yes, that's fascinating. So in a way, if you had tried to launch before the NoSQL databases had come out, you might have been laughed out of town. But when you launched, you were comparing yourself against, I guess, in 2016, MongoDB, which had no joins, and Cassandra, which still doesn't have joins. So you were in a position where you could do what you did: you basically started out as a NoSQL database, since you didn't really have a SQL language implementation. So you were able to grow into that niche, whereas I think, prior to the 2010s, you really had to have a complete SQL implementation from the beginning to get any adoption.

But anyway, we're dwelling a bit on ancient history. Five years later, CockroachDB has an absolutely complete SQL implementation. In fact, it's as feature-complete as any database I've worked with; there are no obvious gaps in what you can do with it. And you guys are powering along with adoption. I'm not a member of the CockroachDB company, but I think most outside observers would agree that CockroachDB is one of the leading database innovators around today. So when we look at CockroachDB today, a distributed SQL-based operational database with very high availability and a complete SQL language implementation, when do you think developers starting out should be thinking, "This is a job for CockroachDB" versus, I don't know, "This is a job for PostgreSQL," for instance? Is there some class of application that you're particularly focused on? Or do you feel like it's a general-purpose database that can be used for anything?

Ben Darnell: So my engineering answer is that it's a general-purpose database that can be used for anything, but I know that that's not a very useful answer to a lot of people. So to talk about some of the types of customers that we have the most success with: I think our emphasis on strong consistency makes us a natural fit for financial services, anything where data inconsistencies can have real financial effects. But what we see is that banks and other big financial services especially are very heavily regulated, they're very risk-averse, and they're very slow to adopt new technologies. What you do see is very strong adoption in adjacent fields. One area that has been a surprise for us is gambling applications, which makes sense when you think about it, because what a gambling application does is broadly very similar to a lot of things that a bank would have to do, in the sense that it's maintaining ledgers and moving money from one account to another. It's very important to get all the bits of data right and not have money disappear or appear out of thin air.

They're regulated very heavily, but they're regulated in a very different way. Banks are regulated to be very risk-averse. So if you're a bank, you can't ever go down, because that's obviously very bad for your customers. The regulations around gambling are much less risk-averse in that sense. There are still regulations around data accuracy and that sort of thing, but in some ways it's a looser environment. We've been able to get a foot in the door in this industry, and use it to hone the features of the product that are appealing and necessary for banking and other bigger-name financial applications and companies. That's been a really interesting pathway: we kind of stumbled across this segment of the industry and it's been very fruitful for us.

Guy Harrison: What about general web apps? Are people using CockroachDB just as a general-purpose database? If I'm building a web application, I can pretty much build it on anything. And what we haven't mentioned, for those who don't know, is that CockroachDB is PostgreSQL wire protocol compatible, which means that you can use all of the Postgres drivers. There's no specific CockroachDB application stack that you need to use; it's broadly compatible with a lot of stacks that support PostgreSQL. So it's pretty easy to use: I can just install it on my laptop, or I can get a cloud service and start building apps on it, just as I could with Postgres itself, or MongoDB, or MySQL, or what have you. Is there any reason not to use Cockroach in those scenarios? It's got a lot of features, but are those features holding you back when you just want a simple database service that supports SQL?
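To make the wire-compatibility point concrete, here is a minimal sketch of how an application might point an ordinary PostgreSQL driver at a CockroachDB cluster. The only CockroachDB-specific detail is the default SQL port, 26257. The host, user, and database names are placeholders, `build_dsn` and `connect` are hypothetical helpers, and `psycopg2` is just one example of a standard PostgreSQL driver.

```python
from urllib.parse import quote, urlencode

COCKROACH_DEFAULT_SQL_PORT = 26257  # CockroachDB's default SQL port


def build_dsn(user, host, database,
              port=COCKROACH_DEFAULT_SQL_PORT, sslmode="verify-full"):
    """Return a PostgreSQL-style connection URL for a CockroachDB cluster."""
    query = urlencode({"sslmode": sslmode})
    return f"postgresql://{quote(user)}@{host}:{port}/{quote(database)}?{query}"


def connect(dsn):
    """Open a connection with psycopg2, an ordinary PostgreSQL driver."""
    # Third-party dependency, imported lazily so build_dsn works without it.
    import psycopg2
    return psycopg2.connect(dsn)


if __name__ == "__main__":
    # Placeholder values; a real cluster supplies its own host and credentials.
    print(build_dsn("app_user", "localhost", "bank"))
```

The URL uses the same `postgresql://` scheme an application would use for Postgres itself, which is what lets existing drivers and many frameworks work unchanged.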

Ben Darnell: No, I think that one of the great things about Cockroach is that it's a database that can be a good fit for you on day one, and can scale out with you even as your application or business expands worldwide. I think from day one, setting up CockroachDB can be a lot easier. It may not be as easy as just doing an apt-get install of Postgres to get a basic single-node Postgres server running. But by the time you add in replication and backups, and all the sorts of things that you need to do to make a Postgres instance really production-ready, all of those things are a lot simpler when you're dealing with CockroachDB, because we have replication built in as a core first-class concept. And you can do auto-scaling, rebalancing, and things like that automatically.

So even from the very beginning, it can simplify your operations. We don't scale down quite as small, on minimal levels of hardware, as a lot of the traditional monolithic databases do, and we're certainly not competing in the same space as SQLite or something like that. But when you look at our cloud offering with CockroachDB Serverless, it can be an incredibly affordable way to go even for very small applications. CockroachDB Serverless has a fairly generous free tier, where you can get a cluster set up for free and just start using it. Then once you exceed the limit of that free tier, you just put in your credit card number and get billed based on your actual usage. So that can be a very affordable option in comparison to a lot of options that would have you running a dedicated virtual machine 24/7 and paying for those costs.

Guy Harrison: Yes. I'm building an application now that started off using MongoDB as the backend, but after working on the book with you, I thought, "Well, I might as well just try Serverless CockroachDB." And honestly, it was the least complicated thing about setting up the environment for the application. I just had to sign up, download a certificate, and connect to the server. I don't notice the database; I just send these SQL requests and I get data back. I don't have to worry too much about all of the complexities that we've been talking about. But I know that if I needed to scale globally, I'm in a position to do so.

The Cloud and CockroachDB

Guy Harrison: I think that, when you think about cloud databases, a lot of the latency is gonna be in the network anyway. So you might have a database at the backend in the cloud that's got a little bit of extra overhead, because it's doing so much more in terms of replication and availability and so forth. But it's going to be minimal compared with the amount of time it takes to get data across the network. So you're not going to notice it; at least that's been my experience. It's performed great. Okay, so, the move to the cloud. I guess when you guys started, like most vendors five years ago, your offering was on-prem. Now there are two flavors of cloud offering. Do you wanna talk a little bit about the cloud strategy for CockroachDB?

Ben Darnell: Sure. So first, just to lay out the product lineup explicitly, there are four basic ways to get CockroachDB. First is CockroachDB core, which is essentially our open-source edition, in which you download the source code or the binary and just run it yourself. This is always free. We're actually open-source with an asterisk: we do have a license restriction that prevents the use of CockroachDB core in a commercial database-as-a-service. But for any other purpose, you can download it and use it for free as much as you want, or you can modify it, recompile it, whatever. We also have CockroachDB enterprise, which adds additional features for a licensing fee. This includes features like geo-partitioning and other features that are especially useful in geographically distributed deployments, Change Data Capture for streaming changes from the database out to Apache Kafka and from there into other parts of your company's data systems, and also integrations with single sign-on and other things that may be important in corporate environments.

So those are our two self-hosted options for CockroachDB. Then in the cloud, we also have two options. We have CockroachDB Dedicated, which is essentially equivalent to what you would get in the self-hosted product, except we run it: you get dedicated VMs in a totally isolated environment running CockroachDB enterprise just for you. So that's kind of our premium cloud offering, since it gets you the best experience you could have from hosting it yourself, except we do it for you. And then we have, as I mentioned earlier, CockroachDB Serverless, which is the entry-level product. It's currently in beta, but it lets you run on a small scale and then grow out as you need to. This is implemented as one or more large CockroachDB clusters run by Cockroach Labs, and then a separate front-end process per customer cluster, which gives you access to a slice of the resources on the backend cluster.

And so this is how we're able to keep the operating costs low enough that we can offer a generous free tier in the serverless product, and then you can scale up and pay as you go from there. So, sorry... I went on a long feature rundown and I actually forgot what the question was.

Guy Harrison: I guess the interesting thing, from the evolution of databases, is that if you've got your single monolithic database running on your own hardware, which is where I started in the industry, and probably you too, you're responsible for every aspect of that operation. I was a DBA for a long time. Then incrementally, you moved into the cloud, where you're a little less responsible for the hardware, and then you get to a cloud system in which the scaling is managed for you. So a lot of your DBA tasks and backup tasks are gone. Then you get to serverless, where, amazingly, just about all the tasks are gone. As long as the implementation is good at the backend, it will scale for me as my demand grows, and I don't even have to lift a finger. I guess I have to lift a credit card, but that's all. And it's a very attractive option, I think.

And you guys have pulled it off, as far as I can tell, very well, right? I haven't run a service at scale yet. But there are a few cases, I imagine, where you're a very large organization and you'd want a dedicated cluster, so you can be sure that you have control over exactly how many resources you are applying and that you can be really prepared for spikes in load and stuff like that. But for your average application, I can't imagine that serverless isn't the future way that almost everyone will consume database services, because it just eliminates so much of the human cost of using a database. I'm not really asking a question. I'm saying something I'm sure you agree with. But it's a very exciting move. CockroachDB is one of a couple of companies that are offering truly serverless options for consuming databases. And it's well worth checking out.

I'll just say, in my personal opinion, if you're gonna get started with CockroachDB as a developer, you can either do an install if you're on a MacBook, or something similar if you're on Linux. But the serverless option makes a lot of sense, and it's just as easy to get started with. That's the way I prefer to work with it at the moment, having gone through the wringer with you writing the book and experienced all of the overhead of running a self-hosted cluster. There's a lot of fun in setting up clusters if your mind works that way, but it's not productive to be trying to replicate the expertise that the Cockroach team has at the backend.
Sorry, I know, that's a little bit of a sort of like an ad for serverless, no charge for that, Ben.

Analytical vs transactional databases

Guy Harrison: So one other thing I'm interested in, as someone who hangs around databases a lot, is the perennial split between the operational databases and the analytic databases. For a while, we sort of had one-size-fits-all databases. Up until about 12 or 15 years ago, there were some specialized data warehouses, but there were also people running data warehouses on Oracle and SQL Server, and Oracle and SQL Server were used as operational systems as well.

We had this period where SQL went out of fashion, and now SQL is definitely back in fashion. But we've still got this split between operational and analytic, and in the modern world, I'd say that split is best illustrated by Snowflake, which just recently had one of the largest software IPOs of all time. I think it actually is technically the largest software IPO of all time. It's an analytic database running in the cloud that fully supports SQL. And then we've got operational, distributed SQL databases, with CockroachDB being probably the flagship representative of that class of database. But they've split. We have different workloads that each one is optimized for. They can both do a bit of each other's sort of processing, but they're clearly optimized for one or the other. Do you think those two worlds are gonna pull back together any time soon? Or do you think that, for the time being, you want to choose a different platform for when you're analyzing data than for when you're doing high-speed transactional processing of data?

Ben Darnell: I think that this division is going to continue to evolve. I mean, already it's not a bright line between the two classes of applications. But this is something that we saw early on when we were developing CockroachDB. We wanted to be very explicit, telling people how things are. Early versions of CockroachDB had warnings not to do too much with joins. Even after that limitation was lifted, we continued to tell people, "Well, we're not an analytics database, don't expect us to deliver best-in-class performance on these analytics types of queries."

To a database vendor, transactional databases and analytics databases are different categories. To a customer, it's not that clear. Some customers are very rigorous about separating their analytics off into separate systems; others are not, and they'll just run their analytic-style queries on their main database. So there's still an expectation that those things will work reasonably well. We've done our best to handle that kind of query as well as we can. But when it comes down to the kinds of technologies involved, what you want for analytics processing is very different from what you want for transactional processing. For transactional processing you want, for example, row-oriented storage, while analytics tends to want columnar storage.

Now that said, even though the technologies may be different, you may have a single database product that provides both. I think that there's definitely a possible future in which that happens. CockroachDB replicates all of your data in multiple ways anyway. If Cockroach could keep some of those replicas in columnar format and some in row format, and then send queries to one or the other depending on what that query needs, that would be a way to get a best-of-both-worlds, under-one-roof kind of situation. So I think there's definitely room for evolution like that. But the history of this industry is littered with companies that have said, "Okay, we're going to be the only database solution that you need, we're going to be good at everything, and therefore we don't need to be good about integrating with other kinds of database systems." That has just been demonstrated, time and time again, to be the height of hubris, and that leads to a fall. So I don't wanna say that CockroachDB is going to be the ultimate unification of transactional and analytic workloads. But I do think there's going to be a continual blurring of the lines, and evolving to handle more and more kinds of specialized workloads.
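The row-versus-column distinction Ben describes can be illustrated with a toy in-memory example. This is only a sketch of the two layouts, not a description of how CockroachDB or any other product stores data. The table contents and the helper names `to_columns`, `fetch_order`, and `total_amount` are made up for illustration.

```python
# Row-oriented layout: each record is stored together, which suits
# transactional work that reads or writes one whole row at a time.
ROWS = [
    {"id": 1, "customer": "ann", "amount": 120},
    {"id": 2, "customer": "bob", "amount": 75},
    {"id": 3, "customer": "ann", "amount": 30},
]


def to_columns(rows):
    """Pivot row records into a columnar layout: one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def fetch_order(rows, order_id):
    """Transactional-style access: return one complete row by id, or None."""
    for row in rows:
        if row["id"] == order_id:
            return row
    return None


def total_amount(columns):
    """Analytic-style access: scan a single column and aggregate it."""
    return sum(columns["amount"])


if __name__ == "__main__":
    print(fetch_order(ROWS, 2))
    print(total_amount(to_columns(ROWS)))
```

The point lookup touches every field of one record, while the aggregate touches one field of every record. That difference in access pattern is why the two workloads tend to favor different storage formats.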

Guy Harrison: Yes. So there are a couple of companies that are trying to do a sort of hybrid job. And it seems to me that they have to double up on their engineering effort to be good at both, and even then, they're usually not the best at either of those two things, so they tend to lose out in competition. There's usually one of these two requirements which is the primary one, the decisive requirement, and they fail at that decisive requirement. It's all very well... Let's say you've got to focus in software engineering. You can't afford to have too many threads of effort, because you'll just end up failing at some of them and...

Ben Darnell: Yes, so you have to focus on both sides of the divide. As the database vendor, we chose to focus on operational, transactional needs. But also as a customer of a database, you have to choose what areas of your data you're going to try and optimize and where you're going to essentially spend your operational budget. So when I said some companies are just fine running their lightweight analytical queries on CockroachDB, look at it from that company's perspective: if you're a little startup and you're running CockroachDB, are you going to wanna set up Kafka and Snowflake and deal with integrating these three systems?

I mean, sure, you'll get a better analytics experience with Snowflake than you will with CockroachDB today. But it's a much higher operational cost to just have to deal with that many different moving parts. And so that's another factor: I think you have to kind of pick your battles and decide when you're going to spend that operational complexity. And I think that when you look at the amount of money that some companies, your larger enterprises, spend on data of all kinds, yeah, it makes sense there to really optimize and make sure that you're getting best-in-class performance for every kind of subcategory of data. If you're not operating at that scale, then that's when the more hybrid solutions are appealing, even if they're not individually best in class at any particular task.

Guy Harrison: That's a good point. You know, data is never just there to be processed; there's always some analytic reporting or similar function. And in CockroachDB, for people who aren't familiar with it, one thing I learned while participating in the book is that when you want to read large amounts of data from Cockroach, you can use the AS OF SYSTEM TIME clause to read it from a snapshot that avoids all of the transactional consistency overhead. And that results in a much better outcome, because otherwise CockroachDB will...actually, I might be able to get back to MongoDB, but CockroachDB will try to give you an absolutely consistent view of the database as it is right now. And if there is concurrent work going on, then CockroachDB may continuously sort of bump that statement to try and get the most recent data. But if you say AS OF SYSTEM TIME five seconds ago, it can just sort of ignore all that and just go with the MVCC snapshots that are consistent, but not absolutely up to date. That's one of the interesting things about CockroachDB: the use of that clause to optimize your processing is something that's fairly unique to Cockroach, at least something I haven't seen in other databases, but it's an important sort of technique. It's a bit nuanced, but it's an important way to sort of optimize processing.
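
The snapshot read Guy describes can be illustrated with a toy model. The Python sketch below is not CockroachDB's implementation: `MVCCStore`, `write`, and `read_as_of` are hypothetical names invented for illustration, and timestamps are plain numbers. It only shows the idea that each write keeps its own timestamped version, so a read pinned to a slightly older timestamp gets a consistent value without chasing writes that are still arriving. In CockroachDB itself the equivalent is a SQL query carrying a clause such as AS OF SYSTEM TIME '-5s'.

```python
class MVCCStore:
    """Toy multi-version store (hypothetical): every write keeps its own version."""

    def __init__(self):
        # key -> list of (timestamp, value) pairs, one per write
        self._versions = {}

    def write(self, key, value, ts):
        # Writes never overwrite; they add a new version at timestamp ts.
        self._versions.setdefault(key, []).append((ts, value))

    def read_as_of(self, key, ts):
        """Return the newest value written at or before ts, or None if none exists."""
        visible = [(t, v) for t, v in self._versions.get(key, []) if t <= ts]
        if not visible:
            return None
        return max(visible, key=lambda tv: tv[0])[1]


store = MVCCStore()
store.write("balance", 100, ts=10)
store.write("balance", 80, ts=20)
store.write("balance", 50, ts=30)

now = 30
latest = store.read_as_of("balance", now)        # 50: the most recent version
snapshot = store.read_as_of("balance", now - 5)  # 80: the state five ticks ago
```

The historical read never needs to know about the write at timestamp 30, which is the sense in which it sidesteps contention with concurrent work.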

Ben Darnell: Yes, this particular technique is the AS OF SYSTEM TIME clause, which we informally call the time travel clause, to let you run a query as of, you know, 5 seconds or 10 seconds ago. That particular feature, as far as I know, is unique to CockroachDB. But a lot of what you can do with it is similar to what you can do in other systems by marking a transaction as read-only. So it's common in a lot of systems: if you mark a transaction as read-only, maybe that sends it to a read-only replica instead of to the primary. And that is kind of functionally equivalent, or functionally similar, to what you do with the AS OF SYSTEM TIME clause. The nice thing about the AS OF SYSTEM TIME clause is that...especially with the new feature that we just introduced called Bounded Staleness, where you can say, "Give me the latest data you have, as long as it's no more than 10 seconds old," for example, that would let it send your query to the physically nearest replica to you to get whatever data it has. But if it seems that that replica happens to be really far behind, for some reason, then it will fall back to getting fresher data from a remote replica. So it gives you a way to fine-tune exactly how much stale data you're willing to tolerate for your application.
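
The routing decision Ben describes for Bounded Staleness can be sketched as a small function. This is a simplified illustration, not CockroachDB's actual logic: the `Replica` fields, the `choose_replica` helper, and the latency and timestamp numbers are all assumptions made up for the example. In CockroachDB SQL the bound is expressed with AS OF SYSTEM TIME with_max_staleness('10s').

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    latency_ms: int        # hypothetical network distance from the client
    applied_up_to: float   # newest timestamp this replica can serve
    is_leaseholder: bool = False


def choose_replica(replicas, now, max_staleness):
    """Use the nearest replica if it is fresh enough, else the leaseholder."""
    nearest = min(replicas, key=lambda r: r.latency_ms)
    if now - nearest.applied_up_to <= max_staleness:
        return nearest
    # Nearest replica is too far behind: fall back to the (possibly remote)
    # replica holding the freshest data. Assumes exactly one leaseholder exists.
    return next(r for r in replicas if r.is_leaseholder)


healthy = [
    Replica("sydney", latency_ms=2, applied_up_to=95.0),
    Replica("virginia", latency_ms=180, applied_up_to=100.0, is_leaseholder=True),
]
lagging = [
    Replica("sydney", latency_ms=2, applied_up_to=60.0),
    Replica("virginia", latency_ms=180, applied_up_to=100.0, is_leaseholder=True),
]

local_choice = choose_replica(healthy, now=100.0, max_staleness=10.0)
fallback_choice = choose_replica(lagging, now=100.0, max_staleness=10.0)
```

With a 10-second bound, the first call stays local because the nearby replica is only 5 seconds behind, while the second goes remote because the nearby replica is 40 seconds behind.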

Guy Harrison: Yes, AS OF SYSTEM TIME is awesome too because it lets you do sort of like a Get Out of Jail Free card type thing for screw-ups. You can delete all the rows out of a table, and, "Oh, whoops," and then you can just use AS OF SYSTEM TIME to get them back. You can even do a backup of the database after you've ruined it. I thought, if I'd had that when I was a sort of a young DBA, how much easier my life would have been. Yeah, that's another story.

Ben Darnell: Yes, this is one of the features that blow people's minds when they first see it.

What developers should know about CockroachDB

Guy Harrison: Yes. What are the other things in CockroachDB that developers should know about? Or if you sort of want to let people know a few things that they might not know about that would help them in their life, what would they be?

Ben Darnell: Well, the thing that comes to mind right now, talking about AS OF SYSTEM TIME as a mind-blowing feature, the way that...So we have these multi-region features for running a database distributed across multiple physical locations. As a part of that feature set, you have the ability to mark tables as either global or regional. The way that global tables work is actually really, really neat. There's a lot of magic under the hood. But when you write to a global table, it actually...it effectively writes to that table kind of as of system time in the future, instead of the normal use of AS OF SYSTEM TIME to read in the past. So the write goes into the future, and then it kind of takes effect when real time catches up with when the write was scheduled to happen. So as we mentioned, AS OF SYSTEM TIME is one way to make read queries not interfere with writes. This is another way: instead of pushing your reads into the past, you can let your reads stay at the present and push your writes into the future.
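
The idea of pushing writes into the future can also be shown with a toy model. The sketch below is an assumption-laden illustration rather than how global tables are implemented: `GlobalTable`, its `lead` parameter, and the explicit `now` arguments are hypothetical, and real systems have to handle clock skew and writer waiting, which are ignored here.

```python
class GlobalTable:
    """Toy model (hypothetical): writes are scheduled slightly in the future."""

    def __init__(self, lead):
        self.lead = lead        # how far ahead of "now" each write is stamped
        self._versions = {}     # key -> list of (commit_ts, value)

    def write(self, key, value, now):
        # Stamp the write in the future instead of at the current time.
        commit_ts = now + self.lead
        self._versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts

    def read(self, key, now):
        # Present-time reads only see versions whose timestamp has arrived,
        # so they never have to wait on a write that is still in flight.
        visible = [(t, v) for t, v in self._versions.get(key, []) if t <= now]
        if not visible:
            return None
        return max(visible, key=lambda tv: tv[0])[1]


table = GlobalTable(lead=0.5)
table.write("feature_flag", "off", now=0.0)
scheduled_at = table.write("feature_flag", "on", now=10.0)  # stamped at 10.5

before = table.read("feature_flag", now=10.2)  # "off": new value not yet in effect
after = table.read("feature_flag", now=10.5)   # "on": real time has caught up
```

The trade-off in this model is that the writer pays the delay (its change is not visible until `scheduled_at`), while readers at the present time proceed without coordination.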

How is the book different from the documentation?

Guy Harrison: Wow, that's fantastic. So I guess we're coming towards the last part of the conversation, maybe we should talk a little bit about the book. For people who have read O'Reilly's Definitive Guides, I don't think you'd be surprised at the kind of layout of the book. It tells you everything that you need to know to run CockroachDB locally and in the Cloud, with a sort of reference to the SQL language and programming practices. A lot of stuff for administering if you're administering on-prem, and for that matter if you're administering in the Cloud. But there's, as we talked before, there's a lot less. Ben, what do you think the book sort of offers that isn't there in the documentation? Because the documentation for Cockroach is pretty good. So what's the role of technical books in today's sort of online age?

Ben Darnell: That's a tricky question because...

Guy Harrison: It's a tough question because there is like one and then there is no one.

Ben Darnell: Our docs are really good and they're always getting better. So it is a tricky question to answer. But I think the big thing that a book has, that most online documentation doesn't, is just a kind of linear flow. Like, you can't very well go to the CockroachDB website and read through all of our docs in order. But you can get the book, and you can identify the section that you want, whether that's kind of the introduction or the deep dive into CockroachDB architecture, or, we have a run of chapters for the application developer and a separate run of chapters for a database administrator. You can pick up the section that kind of fits your needs, read that straight through, and get pretty comprehensive coverage, which is a little harder to do in most online documentation, where it's harder to figure out which parts you need to read and when you can be sort of done. You know, that's a big thing that's missing: the online documentation never ends, you can always find more. But with a book, you have, you know, at least a reasonable estimation of what the important part is. And then when you've finished that much, then you're in a pretty good place.

Guy Harrison: I'd agree. I think this is more for when you want to learn the core techniques. And we tried hard to future-proof it by not bogging down in the latest syntax. But it sort of runs you through what you need to know in a well-organized manner.

I've bought a lot of technical books. And when I started, the technical book was there because there was no internet. And so I had Unix in a Nutshell, so I could look up Unix syntax commands. That's not the role of books today. It's more, I think, that you want to sort of fast-track yourself into a new technology. And the book has all the information collated, curated, and presented in a nice way for you to master the topic in the minimum amount of time, and I think that's why I'd recommend people get it. So Ben, have you got any questions you'd like to ask me as co-author, or, as we're just wrapping up, anything you'd like to say that we haven't mentioned?

Ben Darnell: I come to this as the founder of the CockroachDB project and company. And so I have very much an insider's view on how all of this works. But I'm curious to get your perspective as an outsider coming to CockroachDB for the first time, for the purposes of this project, do you have any comments about CockroachDB from that perspective, anything that surprised you, whether it was good or bad?

Guy Harrison: Yes. So just to sort of put that in a bit of context, I've been working with databases since the '80s. And, you know, it's been a long time with Oracle, MySQL. Firmly tied with MongoDB, over the past few years, and I'm a real sort of fan of databases. I don't really care what sort of model they have, or, whether they're simple or complex. I'm generally sort of like locked into what's good and sort of try and gloss over what's not so good. I was really happy to see CockroachDB emerge as a leader because it was pretty obvious that the SQL language had been thrown out of products like Cassandra and others. Not because there was anything wrong with it. But because it was associated with strict transactions, which, the founders thought needed to be discarded in order to achieve some availability and performance goals.

I was happy to see SQL make a resurgence. And I was even happier to see that I can come back to a modern database that was respecting consistency and transactional integrity, because anyone who's worked with non-transactional systems knows you can get things done, but you have to write an enormous amount of code just to protect yourself against inconsistencies. You don't know for sure that you're gonna read your own write, or you don't know for sure that something that you've written is even going to persist in the database. And that makes for a lot of fragility in code. So you have to have extra testing, redundancy, and so forth. That sort of stuff is stripped away with CockroachDB. I was very impressed with how easy it was to get started.

It is very sophisticated. So I was impressed with two things. One, I was impressed with what you guys have achieved over a relatively short period of time in terms of having a sort of architecture that's best in class for distributed systems. And at the same time, I was impressed with how easy it was for me to get started. So the fact that it was Postgres compatible meant that I didn't have to learn really any new programming paradigms; there are very few anyway. The SQL language was complete, there weren't big missing chunks of SQL that I had to sort of work around.

So I could hit the ground running and develop an application. And at the same time, I had this sort of very sophisticated, scalable thing behind me that was ready to go anywhere. I know it sounds a little bit like congratulating myself on having collaborated with you on the book, but I am an outsider, I've worked with all the different databases or a lot of different database platforms. And I think CockroachDB is a fantastic achievement, you should be very proud of it. And I think that that's only going to get better. I'd like to think we did a good job with the book. If you're wanting to get started with CockroachDB, check out the book; you can have a look at it on the O'Reilly website. I'm not sure when this podcast will drop, it'll probably drop around about the time the book is fully available. But if not, you can pre-order. And you can look at the early adopter version, which contains pretty much the first half of the book, which is the developer stuff, getting started installing, you know, SQL language features and things like that. So probably, you know, by the time you've consumed that early adopter bit, the rest of the book will be available.

Outro

You can visit the Cockroach Labs website to get a free CockroachDB Serverless account if you wanna play around, or you can download it using the sort of typical download things like Brew or apt-get on Linux. And so yeah, I'm sorry I rambled away from your question into a sort of a wrap-up. But I've really enjoyed working on the book. You know, writing books is not necessarily a sort of a lucrative financial thing. But it is a great way to learn about new technology. And I'm all about learning. So I've really enjoyed working with you on the book, Ben. And you know, I recommend it to anyone who's interested in CockroachDB. It's a pleasure working with you.

Ben Darnell: Thank you, it's been a pleasure working with you as well. I also wanna recognize, take a moment to recognize here, our third co-author, Jesse Seldess, who's not with us on the podcast today, but he also worked very hard with us on this book over the course of the last year.

Guy Harrison: Absolutely. And Jesse is responsible for the excellent online documentation and educational content on the Cockroach Labs website. We couldn't have done it without him, that's for sure. Okay, well, that's a wrap, I guess. Thanks, Ben. Enjoy the rest of your day. And thank you, guys, for tuning in to the podcast, and let us know what you think. Bye.

Ben Darnell: Thanks, Guy.
