GOTO - Today, Tomorrow and the Future

Erlang, the Hidden Gem: Solving Problems at Scale for 30+ Years • Francesco Cesarini & Preben Thorø

February 04, 2022 Francesco Cesarini, Preben Thorø & GOTO Season 2 Episode 4

This interview was recorded for GOTO Unscripted 2021.
https://gotopia.tech

Read the full transcription of this interview here:
https://gotopia.tech/articles/erlang-solving-scaling-30-years

Francesco Cesarini - Founder & Technical Director at Erlang Solutions
Preben Thorø - CTO at Trifork Switzerland

DESCRIPTION
There is an entire language ecosystem behind Erlang programming, and Francesco Cesarini, founder and technical director at Erlang Solutions, has been using it to solve problems at scale for more than 30 years. Find out how you can leverage Erlang to your own benefit.

RECOMMENDED BOOKS
Francesco Cesarini & Steve Vinoski • Designing for Scalability with Erlang/OTP • https://amzn.to/3uCB43V
Francesco Cesarini & Simon Thompson • Erlang Programming • https://amzn.to/3FEko1F
Saša Jurić • Elixir in Action • https://amzn.to/2RZh5eN
Joe Armstrong • Programming Erlang • https://amzn.to/3fzY53g
Dave Thomas • Programming Elixir ≥ 1.6: Functional • https://amzn.to/34Dw3O5
Simon St. Laurent • Introducing Erlang • https://amzn.to/3pbIni6
Logan, Merritt & Carlsson • Erlang and OTP in Action • https://amzn.to/3pjZqP7
McCord, Tate & Valim • Programming Phoenix 1.4 • https://amzn.to/3zcUqj4

https://twitter.com/GOTOcon
https://www.linkedin.com/company/goto-
https://www.facebook.com/GOTOConferences

Looking for a unique learning experience?
Attend the next GOTO conference near you! Get your ticket at https://gotopia.tech

SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted almost daily.
https://www.youtube.com/user/GotoConferences/?sub_confirmation=1

Intro

Preben Thorø: Maybe even before I introduce you, we're using Riverside here. Is Riverside running on Erlang?

Francesco Cesarini: I'm not aware of [that]. But I think there are quite a few streaming platforms and frameworks written in Erlang and Elixir which can be used... I think we were working with Oval, which was number two after Skype many years ago. All of the connections of the video stream were set up by Erlang. Cisco and Ericsson's video systems are all Erlang-based. There is Membrane, which is a video streaming framework that can be integrated and is written in Elixir. There are many, many others out there as well. So, at the end of the day, with video streaming, all you're doing is, instead of connecting phone calls, you're connecting a few video streams. So, you know, the business logic is very much the same.

Preben Thorø: That makes sense. Before we get too far here, may I ask you to introduce yourself?

Francesco Cesarini: So I'm Francesco Cesarini, the founder and technical director at Erlang Solutions. I've been working with Erlang, you know, since the '90s, the mid-90s. And I'm very fortunate to have seen a programming language become an ecosystem of languages. If you had asked me back in '95 whether I would still be working with Erlang in 2022, I would have said probably not, but I still am, and we're still kind of solving problems that were relevant then and are probably even more relevant today.

Erlang solving problems since 1995

Preben Thorø: Yes. Well, welcome to our Little Unscripted series. You said '95, the mid-90s, Erlang is way older than that, isn't it?

Francesco Cesarini: Well, work on Erlang, the language itself, started in the late '80s. What the Computer Science Laboratory was trying to do was figure out how to program the next generation of telecom switches. It took them a few years. I think the first real fast virtual machine was ready in '91. Then in 1992 they started developing the first product, which was then released in '94. So I'd say '94, '95 is when it was ready to be used outside of the lab, and it started becoming mainstream and being used within some of the major projects within Ericsson.

Preben Thorø: Ok. I thought it started in the '80s, but I was wrong. Is it a coincidence that Erlang I suppose, has something to do with the Ericsson language… is it a pure coincidence that there was a Danish professor, I think his name was Agner Krarup Erlang or something like that, who invented some queueing theory? Is there a connection there?

Francesco Cesarini: There's a connection. Erlang was named after Agner Krarup Erlang, the Danish mathematician. So for those of you who don't know him, he was a founder of teletraffic theory, the telephony theory. He created the Erlang formula, which is the formula used to figure out how likely it is that, you know, all of the lines within a particular call center are busy at any point in time. But as Ericsson management was paying for the development of Erlang, they made Ericsson management believe that it was named after Ericsson. So Ericsson Language, Erlang, you know. So, management thinks it was named after Ericsson. Those on the inside know it was named after the mathematician.
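The formula referred to here is commonly taken to be the Erlang B formula, which gives the probability that a new call finds every line busy. Below is a minimal Python sketch of it (Python is used purely for illustration; the function name `erlang_b` and its parameters are hypothetical and not from the interview). It assumes the standard recurrence B(E, 0) = 1 and B(E, m) = E·B(E, m-1) / (m + E·B(E, m-1)), where E is the offered traffic in erlangs and m is the number of lines.

```python
def erlang_b(offered_load: float, lines: int) -> float:
    """Return the Erlang B blocking probability.

    offered_load: offered traffic in erlangs (call arrival rate times
                  average call duration).
    lines:        number of lines (servers) available.
    """
    if offered_load < 0 or lines < 0:
        raise ValueError("offered_load and lines must be non-negative")

    # With zero lines every call is blocked, so B(E, 0) = 1.
    blocking = 1.0
    # Build up one line at a time; this avoids large factorials.
    for m in range(1, lines + 1):
        blocking = offered_load * blocking / (m + offered_load * blocking)
    return blocking


if __name__ == "__main__":
    # Two erlangs of traffic offered to two lines.
    print(erlang_b(2.0, 2))
```

The recurrence is used instead of the closed form with factorials only because it stays numerically stable for large numbers of lines.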

The deep secrets of the Erlang language

Preben Thorø: Interesting. Interesting. So Erlang, Ericsson language, that's more like marketing. Interesting. How does it work? Actually, what are the deep secrets of the language?

Francesco Cesarini: Well Erlang itself is just a programming language. I think there are three things, which when put together give you the secret sauce. One of them is the BEAM virtual machine.

It's a virtual machine that is highly optimized for large-scale concurrency. It's been optimized to scale on multicore architectures. And recently they've added the just-in-time (JIT) compiler. So that's one-third, I think, of the power. The other third is something we call OTP. OTP is middleware, a way of abstracting from the concurrency models, which increases the programmer's productivity. But on top of increasing programmer productivity, it also hides all of the tricky parts of dealing with fault tolerance and with concurrency. So by using OTP and by using its programming principles, your systems will scale and, by default, be resilient.

Then the third is, I would not even say Erlang itself, but the semantics of the Erlang programming language. These are semantics which most languages running on the BEAM today, including Elixir, inherit by default.

Those three together, that's when you get the real power of the ecosystem. And just to quote Joe Armstrong, you can copy the libraries, which is what's happened with OTP on the JVM or in .NET, and many others... Let me say, I've seen it being copied in Java and many other programming language ecosystems. So you can copy the libraries, but if it doesn't run on the BEAM, you cannot emulate the semantics. It's the three put together which give you the full power. And the semantics of the language have a very tight one-to-one mapping with the operations of the virtual machine. Then OTP is built on top of that to facilitate and hide complexity from the programmer.

The BEAM Languages

Preben Thorø: So the idea that Elixir is the new generation of Erlang, that's not true. It's another language running on the same VM.

Francesco Cesarini: That is correct. Well, Elixir compiles to Erlang. That was a choice, I think, which José Valim made consciously, to be able to utilize all of the tooling and libraries which existed in the Erlang ecosystem when he went in and created Elixir. And so, I would almost call Elixir a new version of Erlang with a slightly different syntax, different tooling and a different development approach to what we're used to in the Erlang world. And by doing this, by improving the tooling, by providing a framework which was specific to certain types of problems, he opened up, you know, the power of Erlang to a wide range of developers for whom, you know, it wouldn't have been accessible otherwise.

Preben Thorø: Yes, that's true because I have a feeling that Elixir, as you say, it's addressing a completely new audience as compared to let's call it the original Erlang.

Francesco Cesarini: Correct. Correct. Absolutely. You're perfectly right there. He did a fantastic job. I always ask programming language inventors, why did you invent language X, Y, or Z? And when asked that question, his answer was, I wanted to open up the power of Erlang and the Erlang virtual machine, so the BEAM, to a much wider range of programmers. And more specifically, I think the first time I asked that question, his focus was on web developers. So how do I bring the power of Erlang to the web development world?

Web developers and Erlang developers, it's telecom versus web, it's two completely different problems we were solving. These two different problems require completely different approaches. So, they require different tooling, different libraries, different frameworks. That also explains why our attempts at trying to bring Erlang to the web failed back in the mid-2000s. You know, there were a lot of web frameworks written in Erlang (Erlang Web, and I think there were web servers), but none of them addressed the requirements of the web developers at the time. Instead, what they did is they addressed the requirements of those developing telco infrastructure.

Fault tolerance in OTP

Preben Thorø: How does fault tolerance work in OTP?

Francesco Cesarini: Yes, so more than OTP, I think fault tolerance comes from a very simple notion: you know, you've got processes, and processes do not share state. They do not share memory. So what that means is you can have many processes running at the same time, and if something goes wrong in a process, so if there's a bug in the code the process is running or the data gets corrupted, you just terminate that particular process. By terminating that process, all the other processes around it, which are not dependent on it, are not affected.

So imagine that you've got thousands of phone calls going through your system, each phone call is a process. And if something goes wrong with one particular phone call, you lose that phone call, you lose that connection, the other phone calls aren't affected. So that's a core principle of processes and processes not sharing state.

We then take these processes and group them into what we call supervision trees. A supervisor is a process whose only task is to supervise other processes. When supervising these other processes, if a process fails, the supervisor is immediately notified of it and can react. It can decide how to go in and deal with that failure. Do we try to restart that process and reconnect that phone call, or do we just ignore it, or are all of the other processes somehow... I mean, maybe it was a group call.

It was the host process that terminated, and it goes in and decides maybe we should terminate all of the other connections, you know, and then restart them. By doing that, what you're doing is you're removing failure and error handling from the hands of the programmer, and you're generalizing it.
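The restart decisions described above can be sketched in a few lines of Python. This is a toy, single-threaded simulation and not how OTP is implemented: the `Call` and `Supervisor` classes, the `tick` method and the simple restart counter are all hypothetical names invented for illustration. The strategy names `one_for_one` (restart only the failed child) and `one_for_all` (restart every child) are borrowed from OTP's supervisor behaviour; real supervisors also limit restarts within a time window, which is simplified here to a plain counter.

```python
from typing import Callable, Dict, Optional


class Call:
    """Hypothetical worker: one phone call holding only private state."""

    def __init__(self, name: str, fail_at: Optional[int] = None) -> None:
        self.name = name
        self.fail_at = fail_at  # tick at which this call crashes, if any
        self.ticks = 0

    def step(self) -> None:
        self.ticks += 1
        if self.ticks == self.fail_at:
            raise RuntimeError(f"{self.name} crashed")


class Supervisor:
    """Toy supervisor: its only job is to watch children and restart them."""

    def __init__(self, specs: Dict[str, Callable[[], Call]],
                 strategy: str = "one_for_one", max_restarts: int = 3) -> None:
        self.specs = specs              # name -> factory building a fresh child
        self.strategy = strategy
        self.max_restarts = max_restarts
        self.restart_count = 0
        self.children = {name: make() for name, make in specs.items()}

    def tick(self) -> None:
        """Let every child do one unit of work; handle any crash generically."""
        for name in list(self.children):
            try:
                self.children[name].step()
            except Exception:
                self._handle_failure(name)

    def _handle_failure(self, failed: str) -> None:
        self.restart_count += 1
        if self.restart_count > self.max_restarts:
            # Too many failures: give up and escalate to whoever runs us.
            raise RuntimeError("restart limit exceeded; escalating")
        if self.strategy == "one_for_one":
            targets = [failed]
        else:  # "one_for_all"
            targets = list(self.specs)
        for name in targets:
            # A restart means a brand new child with fresh state.
            self.children[name] = self.specs[name]()


if __name__ == "__main__":
    sup = Supervisor({"a": lambda: Call("a"),
                      "b": lambda: Call("b", fail_at=2)})
    for _ in range(3):
        sup.tick()
    print(sup.children["a"].ticks, sup.children["b"].ticks, sup.restart_count)
```

Note that the worker code contains no error handling at all; the decision about what to do after a crash lives entirely in the supervisor.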

So, you might have heard of the whole "let it crash" approach. That's what we refer to in the Erlang world. When we let processes crash, we don't mean that we ignore a failure or that we encourage it. It's just that we handle these errors in a slightly different way. And by handling them more generically, that's how we create this fault tolerance. We isolate failure and then we escalate it only when necessary, and we control it centrally, in a generic way. This greatly simplifies the codebase. There were comparisons of Erlang and C++ code, where they went in and implemented the same problem in Erlang and in C++. Well, in the C++ codebase, about 25 percent of the codebase was error handling and fault tolerance. The equivalent in the Erlang codebase was about 1 percent. So, there's a huge difference in the codebase. So just by going down the Erlang route, your system becomes fault tolerant, but you'll also reduce your codebase by around 25 percent. I don't know if that makes sense but...

Preben Thorø: Well, it does if that's like exception handling is just being propagated out to the bookkeeper.

Francesco Cesarini: That is correct. Exactly. So we just push all of the exception handling to the supervisor. The supervisor handles it in a standardized way, instead of letting the programmer deal with exceptions, because again, if you have an exception and you don't know why you got that exception, how do you deal with it? You don't know how to deal with it, because if you knew, it wouldn't be there in the first place. So by generalizing how exception handling is managed, yeah, you get rid of exceptions or they become a very, very rare occurrence.

Preben Thorø: Akka, like Akka.NET or whatever frameworks there are.

Francesco Cesarini: Yes.

Preben Thorø: They're very much inspired by this, right? That is like coming directly out of the Erlang world even.

Francesco Cesarini: That is correct. I mean, Jonas Bonér...

Preben Thorø: So that is replicating... Go ahead.

Francesco Cesarini: That is correct. So Jonas Bonér started implementing Akka when he was working as a consultant on a customer project, and the customer wouldn't allow him to use Erlang to solve a particular problem. So he got so frustrated that he took OTP and the whole error handling in OTP and started porting it to the JVM. I think he did an amazing job at bringing it to the JVM. It's not for the faint of heart because the JVM wasn't built for... The JVM was built for parallelism. And what he did is he brought lightweight concurrency, in the form of green threads, which used to exist in Java but, you know, got removed early on, to the JVM.

It's almost like, you know, when I was reading the original Java white paper, I had a sense of déjà vu: a virtual machine, a concurrency model, built-in memory management and a garbage collector. So this was the JVM, and I was working on the Erlang virtual machine at the time. But I think there's still a big difference between the Java Virtual Machine and the BEAM today because, to bring Akka to the JVM, Jonas had to emulate a lot of the semantics and a lot of the functionality which exists in the BEAM, which the BEAM is highly optimized for, and which doesn't exist in the JVM.

Preben Thorø: Yes. All the protection spaces around processes need to be replicated into a thread model instead.

Francesco Cesarini: Exactly. So you had to create all of that layer. And even there, I think he wasn't able to fully emulate the semantics, because the Akka actors have to yield. You're putting that in the hands of the programmers, versus Erlang, where your processes are given a certain number of operations they are allowed to execute, after which they automatically get suspended and the next process gets to execute. So you run the risk of an actor in Akka starving all of the other processes, all of the other actors. And that's not a risk you have on the BEAM, because, again, they've removed that from the hands of the programmer. The programmer doesn't even know how the processes are being scheduled and managed. They shouldn't know. They should just program, thinking concurrently, and then the rest is all abstracted away from them.
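The idea of giving each process a fixed budget of operations before suspending it can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: Python generators are themselves cooperative, so each `yield` here merely stands in for one operation that the virtual machine would count on the process's behalf. The names `worker`, `run` and `REDUCTION_BUDGET`, and the tiny budget value, are hypothetical and chosen for illustration.

```python
from collections import deque
from typing import Deque, Generator, Iterable, List

# Hypothetical budget, kept tiny so the interleaving is easy to see.
REDUCTION_BUDGET = 3


def worker(name: str, operations: int) -> Generator[str, None, None]:
    """A simulated process; every yield stands for one counted operation."""
    for i in range(operations):
        yield f"{name}:{i}"


def run(processes: Iterable[Generator[str, None, None]],
        budget: int = REDUCTION_BUDGET) -> List[str]:
    """Run processes round-robin, suspending each once its budget is spent."""
    queue: Deque[Generator[str, None, None]] = deque(processes)
    trace: List[str] = []
    while queue:
        proc = queue.popleft()
        for _ in range(budget):
            try:
                trace.append(next(proc))
            except StopIteration:
                break  # process finished; do not requeue it
        else:
            # Budget exhausted without finishing: suspend and requeue.
            queue.append(proc)
    return trace


if __name__ == "__main__":
    # The long-running process cannot starve the short one.
    print(run([worker("long", 7), worker("short", 2)]))
```

Because the scheduler, not the worker, decides when a turn ends, a long-running worker cannot hold up the others, which is the property contrasted with actors that must yield voluntarily.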

Preben Thorø: So now, it becomes really nerdy and really interesting, because these processes in the BEAM, are they processes supported by the hardware, by the CPU, or is it some middle thing between, like, a real process and a thread?

Francesco Cesarini: Yes. So what happens is, when you start the BEAM, it will start a scheduler in a separate thread for every core. So assume you're running on a quad-core machine, the BEAM will start four threads. Each thread will run a scheduler. Each scheduler will have its fair share of processes. So assuming you've got 400 processes, each scheduler will have about 100 processes. And then there's migration logic, which ensures that the different schedulers remain fairly balanced. Your processes might be migrated from one scheduler to another, from one thread to another, if most of the processes on a particular scheduler terminate.
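The arithmetic in this answer (400 processes over four schedulers, then rebalancing when one scheduler empties) can be sketched as follows. This is a deliberately naive Python illustration, not the BEAM's actual migration logic, which the interview does not detail; the functions `distribute` and `rebalance` are hypothetical names.

```python
from typing import List


def distribute(process_count: int, cores: int) -> List[List[int]]:
    """Spread process ids round-robin over one run queue per core."""
    queues: List[List[int]] = [[] for _ in range(cores)]
    for pid in range(process_count):
        queues[pid % cores].append(pid)
    return queues


def rebalance(queues: List[List[int]]) -> None:
    """Move processes from the longest queue to the shortest, in place,
    until no two queues differ by more than one process."""
    while True:
        longest = max(queues, key=len)
        shortest = min(queues, key=len)
        if len(longest) - len(shortest) <= 1:
            return
        shortest.append(longest.pop())


if __name__ == "__main__":
    queues = distribute(400, 4)
    print([len(q) for q in queues])   # about 100 per scheduler

    queues[0].clear()                 # every process on scheduler 0 terminates
    rebalance(queues)
    print([len(q) for q in queues])   # remaining processes spread out again
```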

Erlang on iOS

Preben Thorø: I don't know if it still exists, but there used to be a sub-project, some library framework called Lua. Does that still exist? This is something for the audience here. That back then at least allowed you to run Erlang on an iPhone. Does that still exist?

Francesco Cesarini: So yes, they still exist. It never really made it into production. I'm not aware of any Erlang or Elixir development on iPhones. I think you're right, that's the subject which I think was handled very well by these technologies. For the same reason as to why we failed to get Erlang into the web development space, at least us historically, we were all server-side back-end systems. Those are the types of problems we solved. So, even though you could develop Erlang on the phones, the types of toolings and frameworks you needed were very, very different.

Preben Thorø: I guess it goes very much against the idea of having a phone but that's another discussion. It's like a client and another server. But anyway, that's great.

Francesco Cesarini: We're seeing a lot happening now with Elixir making its way into the embedded space through Nerves. You know, there are graphical packages which can run on handheld computers. And also, I think we were running Erlang, and controlling the CAN bus with Erlang, in cars probably 15 years ago, yeah, almost 15 years ago. But it's becoming mainstream now where, you know, we're collecting more and more data in the cars themselves and, well, in all of the IoT devices themselves. And it's no longer feasible to go out and push this data, you know, to the Edge network and the cloud because of the large volumes of it. So you start analyzing it in the devices themselves or, in some cases, also in the Edge network, where feasible.

So we're seeing Erlang and Elixir being used in those spaces more and more these days. And I think with the work which has been done on the JIT compiler, which has brought huge performance increases, and the work which has happened with Numerical Elixir, which is then enabling the whole Axon framework, which is very similar to, say, PyTorch, I suspect you'll be seeing machine learning now moving onto the devices, onto the embedded devices, in cars, in IoT devices, and to a certain degree also in the radio base stations and Edge networks. So I think it's still early days, but I think there are a lot of exciting things, you know, happening in that space. And all the components are being put in place for it to become a viable, alternative technology for machine learning.

Erlang’s recent evolution

Preben Thorø: Yes, how much is the Erlang universe evolving right now? If we could isolate plain Erlang from Elixir, how much does the original plain thing evolve for the time being?

Francesco Cesarini: Very little. Very, very little. So, in Erlang itself, at least the programming language, there are very few changes happening. Most of the work I think is done around the libraries, the frameworks, but also on the BEAM virtual machine. That's where I think a lot of the effort is going in today.

Preben Thorø: Yes

Francesco Cesarini: But the language itself, I mean, even though Erlang is open source, Ericsson is the benevolent dictator. They've always been very conservative about introducing new changes, for two reasons. A, they've got millions and millions of lines of code in production. So any, you know, non-backward-compatible changes would have a huge impact on all the code they've got in production. But B, you know, if they start pushing out new features, they need to support and maintain them in their products. So yes, they're very, very careful over what gets released.

But a lot of the work and a lot of the focus is on really making the BEAM scale on multicore architectures, making it fast, making it lock-free. You're seeing it with every release. In some cases, you don't even need to recompile your BEAM code, you just need to rerun it. In some cases, you might have to recompile it. Back in the day, we used to joke that, you know, if your program wasn't fast enough, wait 15 months and the new computer you buy is gonna run it twice as fast. Now, yes, it will run faster the more cores you throw at the problem, but all you need to do these days is just wait for a new version of the BEAM and recompile your Erlang code, and it's gonna run faster.

Preben Thorø: Thank you. It's been fascinating talking about this and I think we could go on all day.
