On November 29, 2018 at AWS re:Invent, the Software Engineering Daily team spoke with AWS technologists Adrian Cockcroft, VP of Cloud Architecture Strategy, Deepak Singh, Director of Compute Services, and Abby Fuller, Technical Evangelist, about the latest news at AWS.
CHANGES FOR AWS SINCE RE:INVENT 2017
During the last re:Invent, AWS launched SageMaker, a fully managed machine learning service. SageMaker made AI available to developers and data scientists. Much of that AI work sits within the realm of IT infrastructure. One of the main projects Adrian Cockcroft has been involved with over the past year is the RoboMaker project and self-driving car technology. SageMaker and RoboMaker have now come together: AWS trains a model with SageMaker, simulates a robot running it, and people can then deploy it into the car. AWS is now looking at bringing the technology into high schools and universities and using it for teaching.
“I think that’s one of the themes we return to repeatedly: taking something that’s hard and complicated and making it easy enough that almost anybody can use it. What’s something that people are finding hard, and how can we make that easy?” – Adrian Cockcroft
Deepak Singh reflected that a theme that started at the last re:Invent and extended into 2018 is an emphasis on serverless computing and serverless technologies in general. Lambda announcements showed AWS investing in low-level technology such as Firecracker to make serverless computing more effective. There are also technologies like DynamoDB on-demand tables, which effectively make DynamoDB truly serverless: capacity management is automated for users.
“I think those are trends that will continue, and a whole lot more focus on continued innovation on the machine learning side. And of course on robotics, both with RoboMaker and the car, so I think there’s a lot of interesting and continued growth in those areas,” said Singh.
Abby Fuller noted, “I think, from an abstract side, more focus on what lets developers just develop, and less on ‘how can I manage instances or compute?’ It’s just ‘how can I build the code?’ That’s what powers customers’ actual businesses.” AWS strives to build efficient and fast solutions for developers.
Wide adoption of AWS Lambda is influencing AWS product strategy. More AWS services now emit events that can trigger Lambda functions when something happens. For example, EC2 has been around for years, but occasionally an EC2 instance is decommissioned and some cleanup is required. As Adrian Cockcroft summarizes the challenge, “Where do you run the code to clean up after a resource that has disappeared?” One way to handle this is to trigger a Lambda function which can tidy up EBS volumes, IP addresses, or whatever it needs to, triggered by the fact that something went away. AWS is building integrations between Lambda and the other services.
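A cleanup handler like the one Cockcroft describes can be sketched as a small Lambda function reacting to an EC2 state-change event. The event shape below follows the CloudWatch Events “EC2 Instance State-change Notification” format; the cleanup actions themselves are hypothetical placeholders.

```python
# Sketch of a cleanup Lambda triggered by an EC2 state-change event.
# The cleanup steps are hypothetical placeholders, not real API calls.

def handler(event, context):
    """Return the cleanup actions for a terminated instance."""
    detail = event.get("detail", {})
    if detail.get("state") != "terminated":
        return []  # only act when an instance has gone away
    instance_id = detail["instance-id"]
    # In a real function these would be boto3 calls, e.g. releasing
    # Elastic IPs or deleting leftover EBS volumes tagged to the instance.
    return [
        f"release-elastic-ip:{instance_id}",
        f"delete-orphaned-volumes:{instance_id}",
    ]
```

In a real function, each action would be a boto3 call made under an IAM role scoped to the resources being cleaned up.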
At re:Invent, AWS announced new service integrations for Step Functions, so people can now build entire applications using Step Functions: a visual way to glue together the pieces that Lambda and other services provide, with a more stateful backend.
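As a sketch of what those integrations enable, the state machine below (Amazon States Language, built as a Python dict) writes directly to a DynamoDB table from a Task state, with no Lambda glue code in between. The table and field names are made up for illustration.

```python
import json

# Hypothetical state machine using a direct service integration:
# a Task state calls DynamoDB putItem without any Lambda in between.
definition = {
    "Comment": "Glue services together with Step Functions",
    "StartAt": "RecordOrder",
    "States": {
        "RecordOrder": {
            "Type": "Task",
            # Service-integration ARN: Step Functions calls DynamoDB directly.
            "Resource": "arn:aws:states:::dynamodb:putItem",
            "Parameters": {
                "TableName": "Orders",                    # placeholder table
                "Item": {"OrderId": {"S.$": "$.orderId"}},
            },
            "End": True,
        }
    },
}

# The JSON string is what you would hand to create_state_machine.
asl = json.dumps(definition)
```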
“AWS is driven by customers. Our customers wanted certain capabilities in Lambda. What they’re asking us to do with Lambda and Fargate are great catalysts. How can we provide more tooling to allow them to be more effective?” Singh poses.
Fargate has improved development alongside Lambda, with a lot of advancement in software development tooling, such as Cloud Development Kit (CDK) deployment support for Fargate. AWS has created more tools that help customers move faster and focus on the code in their applications. A couple of examples are Cloud Map and App Mesh. As customers run more Lambda functions and services, they want more visibility and control over how the services interact with each other. Explained Singh, “I think those are services that make it much, much easier for customers to wrap their arms around these microservices-based architectures with these small things that are loosely coupled, and I think it’s a great trend that helps our customers, but it’s there because that’s what our customers have asked for.”
Some people encounter difficulties when trying to do large-scale data processing with Lambda, even trying to implement MapReduce on top of Lambda. However, one example of a group successfully using Lambda for large-scale data processing is the Financial Industry Regulatory Authority (FINRA). They operate at a very large scale and find workflow simplicity in Lambda and other serverless compute offerings. MapReduce can be run with a heavyweight management system, but nimbleness is lost. All the data is in an S3 data lake, which AWS Lake Formation helped build. The idea is that the data is there and users focus on writing the functions that process and aggregate it. That’s a change in the programming model. There’s no cluster to run. There’s no file system to worry about. Users don’t have to wonder, ‘How do I replicate my data?’ The outcome is a simpler process with the same result.
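The change in programming model can be illustrated with a minimal sketch: a Lambda handler acting as the “map” step of an aggregation, tallying object sizes per bucket from an S3 event. A real pipeline would persist the totals to S3 or DynamoDB; this sketch just returns them.

```python
# Minimal "map" step of a serverless aggregation pipeline: a Lambda
# handler that receives an S3 event and tallies bytes per bucket.
# There is no cluster and no file system; the function is the pipeline.

def handler(event, context):
    totals = {}
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        totals[bucket] = totals.get(bucket, 0) + s3["object"]["size"]
    return totals
```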
EKS, THE MANAGED KUBERNETES SERVICE
In 2017 AWS launched EKS, the managed Kubernetes service. About six months later, EKS went to GA, so it’s running at a large scale for large customers and is rolling out in more regions around the world. New versions of Kubernetes are released all the time, so AWS is tracking the new versions and adding support for things like Istio, so features emerging in the Kubernetes ecosystem can be run on top of EKS.
AWS also ended up hiring one of the core committers from one of the projects that is a key piece of Kubernetes. With that engineer, AWS has sponsored that project into the CNCF. So it’s a combination of working with the community to develop different pieces of Kubernetes, working to integrate Kubernetes with other parts of the AWS service ecosystem, and then learning to run it all at an increasingly large scale with large clusters and customers depending on this at high volume as AWS works on the rollout.
Firecracker is a micro-virtualization system. It builds on the KVM virtualization that most systems currently use to provide a lightweight way to create a virtualized container in a few milliseconds. That means you can use one for every Lambda function or every container you want to start. Containers on their own don’t provide strong isolation between tenants, which is something to keep in mind. What Firecracker lets users do is put all the customers on the same underlying machine with the isolation and security you’d need to do that, because each workload runs in its own virtual machine. AWS has managed to shrink virtual machines down to individual functions.
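To make that concrete: Firecracker is driven through a REST API served on a local Unix socket. The sketch below builds the minimal JSON payloads to configure and boot a microVM; the kernel and rootfs paths are placeholders, and the exact fields should be checked against the Firecracker API docs.

```python
import json

# Firecracker microVMs are configured over a REST API on a Unix socket.
# Minimal payloads to size, boot, and start a microVM; the image paths
# below are placeholders.
machine_config = {"vcpu_count": 1, "mem_size_mib": 128}
boot_source = {
    "kernel_image_path": "/images/vmlinux",         # placeholder path
    "boot_args": "console=ttyS0 reboot=k panic=1",
}
rootfs = {
    "drive_id": "rootfs",
    "path_on_host": "/images/rootfs.ext4",          # placeholder path
    "is_root_device": True,
    "is_read_only": False,
}
start = {"action_type": "InstanceStart"}

# Each payload is PUT to the API socket, e.g. with:
#   curl --unix-socket /tmp/firecracker.sock -X PUT \
#        http://localhost/machine-config -d '<json>'
api_calls = {
    "/machine-config": json.dumps(machine_config),
    "/boot-source": json.dumps(boot_source),
    "/drives/rootfs": json.dumps(rootfs),
    "/actions": json.dumps(start),
}
```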
Firecracker is deployed on a bare metal instance, on AWS or on-prem. The underlying component, the runtime piece, has been abstracted out of Docker: containerd. This runtime piece is shared by ECS and Kubernetes and is one of the Cloud Native Computing Foundation (CNCF) projects. AWS is working with the containerd project to plug Firecracker into it. Wherever you’re running containers, if you’re running containerd, you now have the option of using Firecracker underneath to create those containers.
Firecracker is in production use at AWS and was open-sourced at the start of re:Invent. There are a number of tools in the broader container and function community which could benefit from this kind of isolation model and the efficiency gains that Firecracker gives users. AWS wants to work with the community to make sure that customers are able to run their applications without making those kinds of compromises.
During KubeCon, AWS planned to continue the conversations in the community ecosystem surrounding containerd to further the implementation. containerd is not something people will interact with directly, but it will be working under the surface to make container and serverless workloads more efficient.
AWS developed Firecracker as a way to deliver Lambda and Fargate to customers in a way that’s secure but also efficient; before Firecracker, users had to trade off between the two. As a lightweight VMM, it’s optimized and purpose-built for those kinds of architectures. Singh summarizes, “Firecracker allows us to provide you the level of isolation that we believe meets the bar that customers should be at and meets their expectations, while allowing us to be more efficient and provide fast boots. You can launch hundreds of them in one go.”
It may be best explained by how Fargate ran before Firecracker. Every Fargate task ran in its own EC2 virtual machine. Instances take time to boot and often waste space, but AWS made that choice because they never wanted to run two tasks on the same instance: they made the VM the hard isolation boundary. Firecracker takes that one step further because it’s a smaller VMM. Since it has a smaller surface area, it gives you the best of both worlds: the isolation of VMs and the speed and efficiency of containers. The programming model from a customer standpoint using Lambda is that they don’t have to think about it. They never touch Firecracker.
This is what enables AWS to give customers a better service as people incorporate Firecracker into upstream container technologies, because it’s open source. It’s written in Rust. For containerd, which is a container runtime, there is a Go SDK and an initial integration of Firecracker with containerd. “I’ve already seen the pull requests; as issues start coming in, people are super excited about making those things happen. And so hopefully we’ll see a lot of engagement, and we look forward to working with more people,” said Singh.
“Everyone always has the same goal, right? It’s how can I be more and more secure but also more and more performant without giving up one or the other. I think that things like Firecracker, we put them underneath Lambda or Fargate, but once people see that it’s possible, then everyone wants the same thing, right?” — Abby Fuller
For deploying containers, users have several options, including Kubernetes, functions-as-a-service, and container services like Fargate. The selection depends on the use cases and whether users want to control the nodes the containers run on. If a specific configuration is desired, or various kinds of monitoring or security need to be built into the base instance, then use ECS or Kubernetes to run those services. But to run a container and have AWS handle everything else, Fargate is just less work: run whatever size containers you want. To run just a few containers, it’s overkill to have a complete Kubernetes cluster with multiple nodes; instead you can use Fargate to run the things you want to run. It’s much more about the layers of automation.
Fargate is really an extension of ECS and tightly integrated with it, so you can use a combination if you want to. Customers deploying microservices in containers often have existing code and packaging systems they want to keep, though not necessarily the architectures, and may already have some experience with container orchestration like ECS, Kubernetes, or EKS. For those customers, Fargate is a great way to stop thinking about the classic questions: ‘Which apps live in which cluster?’ or ‘What types of instances should the cluster have?’ You don’t have to think about that with Fargate; you just package your application and deploy it, and the service does the rest.
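The “just package and deploy” model can be sketched with the ECS `run_task` API using the Fargate launch type. The cluster, task definition, and subnet names below are placeholders, and the actual AWS call is kept behind a main guard so the sketch stands on its own.

```python
# Sketch of launching a container on Fargate: no instances to choose,
# just a task definition and a network to run it in.

def fargate_run_task_params(cluster, task_def, subnets):
    """Build the parameters for ecs.run_task with the Fargate launch type."""
    return {
        "cluster": cluster,
        "taskDefinition": task_def,
        "launchType": "FARGATE",
        "count": 1,
        # Fargate tasks use awsvpc networking: each task gets its own ENI.
        "networkConfiguration": {
            "awsvpcConfiguration": {"subnets": subnets,
                                    "assignPublicIp": "ENABLED"}
        },
    }

if __name__ == "__main__":
    import boto3  # only needed when actually calling AWS
    ecs = boto3.client("ecs")
    ecs.run_task(**fargate_run_task_params(
        "demo-cluster", "web-app:3", ["subnet-0abc123"]))  # placeholder IDs
```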
There are customers like Turner, running hundreds of services on Fargate, or KPMG, which chose Fargate because it gives them a lot of immutability in how they run their infrastructure. With Lambda, users need to rethink how they program and architect applications. With Fargate, it’s a more familiar architectural model, but in the end the value is the same: there’s no infrastructure to manage and you can move quickly. So there are customers of all types, from Cox Automotive to large enterprise companies to smaller startups, all adopting serverless computing quite aggressively.
APP MESH, A SERVICE MESH FOR AWS
Service mesh is important because if you’ve got a microservices model with lots of services calling each other, you need a common way to instrument what’s happening. People are finding that they’re building microservices with lots of different languages and runtimes.
As Cockcroft explained, “Back when I was at Netflix, we did have a service mesh, but we were doing everything in Java and everything was open-sourced as libraries. The concept of a service mesh is that these libraries have all the instrumentation and traffic routing, etc. as part of that service mesh.” The registration and discovery of services and the traffic management are all part of the system. When you write code, you just link to this library and it works. Linkerd and Envoy came along, and since services are now written in a bunch of other languages, AWS wanted to abstract that piece out into its own codebase.
With App Mesh, AWS took Envoy and used it as the data plane where all the traffic flows through. Envoy is called, figures out what to do with the traffic, and passes that packet, connection, or request on to another Envoy somewhere else, which then calls into the application you were trying to reach. With Envoy, all of the instrumentation is common, so you can see what’s going on. In App Mesh, the service mesh has two parts:
- The data plane, like Envoy or Linkerd, which connects everything together
- The control plane, which actually stitches everything together and sets it up
App Mesh is related in some ways to Istio, but with Istio an Istio control plane is required per Kubernetes cluster, and it’s a singleton, so it only knows how to manage that one cluster. AWS builds things that are horizontally scaled, like EC2 and ECS, where there is one control plane that everyone can use; it’s much more efficient to run that way. The result is the same data-plane behavior that Istio has, but you can use it on EC2, ECS, or Kubernetes with the same horizontal control plane. App Mesh was created to solve this different need and combine the two approaches.
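As an illustration of the kind of traffic control a mesh enables, the sketch below builds a weighted HTTP route spec in the shape App Mesh’s API uses, shifting a slice of traffic to a canary version of a service. The virtual node names are hypothetical, and the spec shape is an assumption based on the preview-era API.

```python
# Hypothetical weighted-routing spec for an App Mesh HTTP route:
# send canary_weight percent of traffic to a new service version.
# Node names are placeholders; check the spec shape against the
# App Mesh API reference before using.

def canary_route_spec(stable_node, canary_node, canary_weight):
    stable_weight = 100 - canary_weight
    return {
        "httpRoute": {
            "match": {"prefix": "/"},  # route all HTTP traffic
            "action": {
                "weightedTargets": [
                    {"virtualNode": stable_node, "weight": stable_weight},
                    {"virtualNode": canary_node, "weight": canary_weight},
                ]
            },
        }
    }
```

The Envoy data plane enforces the split; the control plane just distributes this spec to every proxy in the mesh.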
Service mesh is useful in any size application where you want some consistency in how you measure, monitor, and control things. For lots of small applications, it is just as useful as for one big application. The decision is more about your development practices and whether you want to standardize how you make things observable. For any application that matters, you can see how it’s working.
Future applications of service mesh could include intelligent routing between applications, including version-aware routing, and trace-level instrumentation so you can see the flows through the system.
“A lot of the value of Kubernetes is the projects that have been built on top of it that everyone is sharing, building more and more things on top. And one of the difficulties of Kubernetes is it’s hard to figure out which of these projects are science projects or half-built things and which are mainstream.” – Adrian Cockcroft
While there are many unsolved problems in serverless computing, several were solved at this year’s re:Invent. Customers had been telling AWS how they had to package all dependencies into each Lambda function: if there was code that every application used, it still had to be packaged with each individual application. With the launch of Lambda Layers, suddenly you don’t have to do that anymore.
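A minimal sketch of publishing such a shared layer with boto3 might look like the following; the layer name and zip path are placeholders, and the AWS call is kept behind a main guard.

```python
# Sketch of publishing a shared-dependency Lambda layer. For Python,
# the zip puts packages under python/ so Lambda adds them to sys.path.
# The layer name and zip path are placeholders.

def publish_layer_params(zip_bytes):
    return {
        "LayerName": "shared-deps",  # hypothetical name
        "Description": "Dependencies shared across functions",
        "Content": {"ZipFile": zip_bytes},
        "CompatibleRuntimes": ["python3.7"],
    }

if __name__ == "__main__":
    import boto3
    lam = boto3.client("lambda")
    with open("layer.zip", "rb") as f:  # placeholder path
        lam.publish_layer_version(**publish_layer_params(f.read()))
```

Functions then reference the published layer version ARN instead of bundling the code themselves.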
Customers also asked: How can I move more quickly? How can I use Lambda in any language, with a custom runtime? So AWS added language support for Rust and Ruby, built on new custom runtime technology, and now partners are implementing Erlang and more, increasing the surface area so more people can use Lambda. There’s also more efficiency through sharing. Singh reflected, “Those are some of the biggest asks our customers had that we’ve addressed, and now the next set of asks will come in and we’ll see what those are.”
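A custom runtime is essentially a `bootstrap` program that loops over the Lambda runtime API: fetch the next event, invoke the handler, post the result. The sketch below shows that loop in Python with a placeholder handler; a real runtime would also report errors back to the API.

```python
# Sketch of the core loop of a Lambda custom-runtime bootstrap.
import json
import os
import urllib.request

def handle(event):
    return {"echo": event}  # placeholder handler

def main():
    # The runtime API endpoint is supplied to the bootstrap via env var.
    api = os.environ["AWS_LAMBDA_RUNTIME_API"]
    base = f"http://{api}/2018-06-01/runtime/invocation"
    while True:
        # Long-poll for the next invocation event.
        with urllib.request.urlopen(f"{base}/next") as resp:
            request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
            event = json.loads(resp.read())
        # Run the handler and POST the result back.
        body = json.dumps(handle(event)).encode()
        urllib.request.urlopen(
            urllib.request.Request(f"{base}/{request_id}/response", data=body))

if __name__ == "__main__":
    main()
```

Any language that can speak HTTP can implement this loop, which is what makes runtimes like Erlang possible.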
Another solved issue was application load balancers being able to use different kinds of target groups. Once people had some familiarity with Lambda and containers, they wanted the same functionality AWS had for EC2 instances. People started asking, ‘How can I use a Lambda function as just another target group?’
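Wiring that up can be sketched in three calls: create a target group with the `lambda` target type, grant the load balancer permission to invoke the function, and register the function as the target. The ARNs below are placeholders, and the AWS calls are kept behind a main guard.

```python
# Sketch of registering a Lambda function as an ALB target group.
FN_ARN = "arn:aws:lambda:us-east-1:123456789012:function:hello"  # placeholder

def target_group_params(name):
    # TargetType="lambda" makes the group route to a function, not instances,
    # so no protocol or port is needed.
    return {"Name": name, "TargetType": "lambda"}

if __name__ == "__main__":
    import boto3
    elbv2 = boto3.client("elbv2")
    lam = boto3.client("lambda")
    tg = elbv2.create_target_group(**target_group_params("hello-tg"))
    tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]
    # The ALB service must be allowed to invoke the function.
    lam.add_permission(
        FunctionName=FN_ARN, StatementId="alb-invoke",
        Action="lambda:InvokeFunction",
        Principal="elasticloadbalancing.amazonaws.com", SourceArn=tg_arn)
    elbv2.register_targets(TargetGroupArn=tg_arn, Targets=[{"Id": FN_ARN}])
```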
AWS also built App Mesh to solve some big problems in serverless. Systems are now so big, with many functions, containers, and services: how do you control them and get more visibility into those systems? App Mesh is just in preview now, but AWS expects to hear a lot of feature requests in the next few months as customers start using it. Customers are still figuring out how to apply service mesh concepts; AWS sees a lot of interest and anticipates learning a lot from customers as they apply different use cases and share their feedback.
ADRIAN COCKCROFT ON SCHEDULING
When Cockcroft was on Software Engineering Daily, we interviewed him about scheduling. “On that particular podcast I tried to cover the history and theory behind scheduling, so it’s much more of an analysis of scheduling over the years and different types of scheduling.” While the applications and implementations change, the types of scheduling remain the same. Typical optimizations include working for better latency, deadline scheduling, cost, or trying to separate things out so you don’t get noisy-neighbor effects. There are all kinds of scheduling, including deadline, batch, and container scheduling. There isn’t a deadline scheduler in Kubernetes just yet, but having a deadline can help optimize the whole system.
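As a toy illustration of what a deadline scheduler optimizes for, the sketch below orders tasks earliest-deadline-first (EDF); real cluster schedulers also weigh bin-packing, affinity, and noisy-neighbor isolation.

```python
# Toy earliest-deadline-first (EDF) ordering: run the most urgent
# task first. A real deadline scheduler would also handle preemption
# and admission control.

def edf_order(tasks):
    """tasks: list of (name, deadline) tuples; returns names by urgency."""
    return [name for name, _ in sorted(tasks, key=lambda t: t[1])]
```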
THIS IS THE EDGE: EDGE COMPUTING
IoT is a very specific thing from an AWS perspective. There was a set of announcements at re:Invent in the industrial IoT area: machine sensors, how you collect telemetry and data from them, and how companies can use the data to make better product decisions, learn more about how their products behave, and provide the infrastructure for analytics and insights.
Edge computing could involve something like a Snowball installed in an imaging system where data is collected. The initial model was to collect the data and then ship it back to AWS. But now there are Snowball Edge devices with compute and GPU, so the data can be processed right there with technologies like Greengrass, which AWS developed for more edge processing. Users suddenly have the ability to do more reasoning in a smaller footprint, still sending data back for large-scale analysis and deeper insights, but also acting right there if that’s what they want to do. There’s a subtle difference between the two, and both will evolve as new devices and new form factors emerge. That’s why AWS is thinking about both use cases quite a bit.
There are many applications of edge computing being created with AWS. There is a lot of interest in autonomous vehicles. People are collecting data from cars, from test vehicles, and sending it back into the cloud for more detailed analysis across all the container services, and across AWS Batch, which is a batch processing service.
“Autonomous vehicles, autonomous anything, has become very popular. And obviously the industry is thinking hard about that space but also requires tons and tons of data and you have to process it. So especially GPUs and applying machine learning models. AI, machine learning, autonomous vehicles are all evolving at the same time. And that’s one area of it — all the sensor data that people are collecting — people have to act on it. And that’s an area where we see a lot of progress,” said Singh.
During re:Invent, AWS opened up its Machine Learning University courses. It’s an internal curriculum started at Amazon with the goal that everyone should be able to become a machine learning engineer. Last year at re:Invent AWS released a camera, and this year they created a car. Singh stated, “People are having fun with programming and teaching the car to be an autonomous vehicle and doing races. Making it fun is kind of cool, and those are things we are doing. Obviously there’s a lot more to do.”
Amazon is a master of long-term thinking. It’s easy for technologists to get wrapped up in day-to-day requests for products, and things that are just around the corner, and a little bit harder to take a step back and prepare a long-term strategy.
“I tend to look for things which are trends that you can extrapolate out. The thing about predictions is that you can predict what or when, but not both. I have one personal prediction that came true this week. In 2008, I did a conference keynote where I said, ‘These Arm chips are getting really powerful; we should put them in the datacenter and run general-purpose things on them.’ And we finally announced them.” – Adrian Cockcroft
In the next five to ten years, many maintainers of legacy hardware and software will be retiring. This is one of the reasons people are looking at re-architecting mainframe applications; as a long-term prediction, there will be a lot of work around mainframe migration and the kinds of applications people run on mainframes. For example, Korean Air has a plan to move all of its data centers to AWS within three years. There will also be lots of banking and financial systems doing these cloud migrations.
For Abby Fuller’s work as a technical evangelist, she talks to a lot of developers. She said it’s not just about listening to the request that people want immediately (‘Can you support the language that I like writing in?’), but also things that give them long-term impact down the road. “No one would know to ask for Firecracker, but what they know they want is, ‘How can you always make this more and more and more performant?’ So I think being able to have the flexibility to work on things like that, to distill from all the little requests: what are they really looking for?” The container community is a vibrant ecosystem and AWS wants to both listen and have a voice in the community and also deeply understand what the community is interested in.
The customers of AWS evolve as more capabilities and services are provided. Deepak Singh said that his favorite part of being at re:Invent is talking to customers and getting exposed to new things they have built with AWS services that are not how the creators originally imagined people would use them. He emphasized the importance of keeping an open mind and being receptive: “Remember that what your initial ideas were may not be where the customers take you. Keeping an open mind has always been very effective and helps us reprioritize, and is one of the nice things about the AWS model.”
AWS MOVING FAST: STAY TUNED
It was a big year for many new announcements at AWS re:Invent. What will happen in the year ahead? Software Engineering Daily will feature more episodes on serverless, machine learning, cloud computing, and IoT. Check out our previous podcast episodes with Adrian Cockcroft and Deepak Singh. Keep your finger on the pulse at aws.amazon.com and follow @AmazonNews. Click here to watch videos of talks from re:Invent.