How incorporating AI into DevOps opens up the future – Gigaom

Stay up to date with enterprise technology trends

Get updates that impact your industry from our GigaOm Research Community

Join the community!

So often, tech practitioners can feel short-changed – we talk about how development and operations can be automated, yet the possibilities at our disposal are often imperfect, fragmented, and complex, and for a long time the tools have fallen short of the vision.

So, is there hope for the future? I recently spoke with Eran Kinsbruner, Chief Evangelist at Perfecto, and Justin Reock, Chief Architect, OpenLogic at Perforce Software, about DevOps and AI, and about how DevOps processes will change over the next five years thanks to automation. Eran has just released a book, Accelerating Software Quality: Machine Learning and Artificial Intelligence in the Age of DevOps (available from all good bookstores), so he should have a few answers.

What did I learn? First, the importance of focusing on value in DevOps; second, the role of AI and ML in accelerating DevOps; and third, the opportunities that exist today for AI-based innovation. Our chat has been edited for clarity; here are the key points:

Jon Collins: What does DevOps really mean to you and what makes it work?

Eran Kinsbruner: DevOps is not a closed term that everyone understands perfectly. I like Microsoft’s definition of DevOps: it’s a combination of people, processes, and products that delivers continuous value to customers. Fantastic, but still pretty vague.

So what do I do? I have people, I have feature teams, I have technology. I build features in short iterations. But what is the value? How do I know if I’m really adding value for my customers?

Perhaps speed of execution is important to them? In my view, value isn’t just about speed of execution; beyond that, you need to listen to your end users. What do they really want to get out of your product? Sometimes the developer doesn’t even know how their feature will end up being used.

Jon: Yes, I agree – you have to ask: what does value mean to your customers? Suddenly you have a conversation: how do we deliver value? What benefits are our customers getting? And what are they prepared to pay for those benefits? It becomes a higher-level conversation that can direct everything else.

Without that high-level conversation you just pump everything out without a clue. It’s like making cars: here’s another one, here’s another one, here’s another one. Does anyone drive them? I have no idea! So how does that relate to quality, in your mind? What role does quality play throughout the life cycle?

Eran: So I’m looking at DevOps from the end-user point of view. Will end users use my product? What do they think about my product? And how do I understand all the feedback so I can improve and create more value for these users?

So quality is not just about functionality: putting something in, getting something out. You should treat value as equal to quality, by definition. As you map your costs onto value delivery, you’ll learn what quality really means: functionality, performance, response time, availability.

You discover it by testing what’s relevant from an end-user perspective, because if something isn’t what your customers are dealing with, then you are neither testing nor providing quality for what matters. So in terms of both development and quality assurance, you need to be very focused.

What do I need to cover? What should I test? On which platform? In what scenario? Which are the most relevant features, the ones someone actually touched in the previous code commit, and so on? That’s where you find the result: features or products that are valuable to your end users.

Jon: Great, so value comes first. But how do I map this onto the DevOps process, from a pipeline perspective? And how can AI help?

Justin Reock: When I think of DevOps, I go back to the Theory of Constraints and the idea of reducing the friction involved in converting inventory into throughput. For me, that’s the essence of DevOps, at least from a business perspective. We are doing everything we can to reduce the “backlog” inventory, i.e., code that has not yet been converted into money.

The more we can do to reduce the friction between inventory, throughput, and operating expense, the faster every line of code a developer commits to the source control repository becomes throughput, or money. And if you distill DevOps back to that essence, then I think the place for AI becomes very clear.

The ideal DevOps pipeline is a completely friction-free one: a developer commits code, and it runs in production five seconds later, as soon as it passes a battery of automated tests, without a human being involved. A customer buys something, and you’ve converted that code into throughput in just a few seconds. It’s brilliant, beautiful, and elegant; that’s the goal of DevOps and software, as well as AI.
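A friction-free pipeline like this is, in essence, an automated gate: if every check is green, the change goes straight to production with no human in the loop. A minimal, hypothetical sketch, where the `run_tests` and `deploy` callables stand in for a real test suite and deployment step:

```python
def zero_touch_gate(run_tests, deploy):
    """If every automated check passes, deploy with no human in the loop;
    otherwise block the change and surface the failing checks.

    run_tests: callable returning {check_name: passed} for the whole suite.
    deploy:    callable performing the (hypothetical) production rollout.
    """
    failures = [name for name, passed in run_tests().items() if not passed]
    if failures:
        return ("blocked", failures)
    deploy()
    return ("deployed", [])
```

Usage might look like `zero_touch_gate(lambda: {"unit": True, "integration": True}, rollout_v2)`, which deploys only when the whole dictionary of checks is green.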

Jon: Let’s get down to practicalities – can we see an example?

Justin: Sure – take software testing, for example. There are many points where we can eliminate not only the slowness that comes from having humans in the process but, if we do it right, more and more of the testers’ bias in that system, which means we miss less and less. In many ways, we still brute-force our way through that problem. We do A/B testing and canary releases in case we haven’t thought of every viable path.

But we still have a goal here: DevOps is all about continuous feedback loops. You have to get feedback on your product, integrate it into new features, and of course fix the bugs. The more we can mitigate those problems and prevent them from ever seeing daylight, through things like fuzzing and AI, the faster we can get that code out of the door.
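The canary releases Justin mentions can themselves be automated as a feedback loop: promote the new version only if its observed error rate is no meaningfully worse than the stable baseline’s. A minimal sketch; the threshold and the request counts below are illustrative assumptions, not a standard:

```python
def canary_verdict(baseline_errors, baseline_total,
                   canary_errors, canary_total,
                   max_relative_increase=0.1):
    """Compare error rates between the stable baseline and the canary.

    Promote the canary only if its error rate is within
    `max_relative_increase` (here, 10%) of the baseline's rate;
    otherwise roll it back automatically.
    """
    base_rate = baseline_errors / baseline_total
    canary_rate = canary_errors / canary_total
    if canary_rate <= base_rate * (1 + max_relative_increase):
        return "promote"
    return "roll back"
```

A real system would also require a minimum sample size and a proper statistical test before trusting the comparison; this shows only the shape of the decision.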

It’s all linked together. A world of connected software opens the door to surrounding services, self-driving cars, or fully automated retail locations. It helps create a fully realized augmented reality where everything is a digital asset and scarcity is proven through blockchain, but that blockchain only matters if quality can be enforced.

Jon: Gosh. That’s a huge leap!

Justin: Yes, you’re right, but I don’t think people really grasp the scale at which software is about to blossom, thanks to AI in the DevOps process. Reducing friction in the pipeline is the great enabler, and it opens up all sorts of opportunities.

Jon: Okay, let’s dig into this – what’s the lowest-hanging fruit? What will change in DevOps over the next few years because of AI and ML?

Eran: Let’s go back to the feedback loops. Sometimes DevOps developers and managers think they did the right thing and are doing it well, but then a machine learning algorithm comes along and surprises them, providing feedback that differs from what they expected. ML can help provide unbiased feedback; it doesn’t really review a product roadmap or anything like that, but it does look at the end user, which is pretty revealing.

Then, when you combine that with your product decisions and software delivery cycle, you’ll probably get something more solid and more relevant to your customers. That’s what I see as the biggest opportunity right now.

Jon: That all sounds great in theory. But what should I actually do to act on this feedback?

Eran: That’s a good question. You don’t have to throw everything away, and in reality AI can’t solve everything straight away. But we do need this augmentation of software quality: noise reduction, prioritization. We could apply these across the whole process, but let’s focus on testing.

The most unreliable test cases are a good example. We call them flaky tests. They show up red in your CI/CD pipeline and you do nothing with them, because you don’t know why they failed. AI can look at these failures and categorize them into different groups. All of a sudden we can see that 80% of these failures are not real defects; they are only due to the poor coding practices of a test engineer. Now we can zoom in on the 20% that are real errors, the problems that can really affect value for my clients. Now I have something I can prioritize. I know where my developers need to focus.
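A very simple version of this kind of failure triage can be sketched in a few lines: group failures by a normalized error signature, and flag as "likely flaky" any test that both passed and failed in the same run set. This heuristic is only an illustration of the idea, not the ML approach Eran describes:

```python
import re
from collections import defaultdict

def triage_failures(runs):
    """Triage CI test results.

    runs: iterable of (test_name, passed, error_message) tuples, possibly
          including several reruns of the same test on the same build.
    Returns {test_name: (verdict, sorted error signatures)} where verdict is
    'likely flaky' if the test both passed and failed, else 'real failure'.
    """
    outcomes = defaultdict(set)      # test -> set of observed pass/fail results
    signatures = defaultdict(set)    # test -> normalized failure messages
    for test, passed, message in runs:
        outcomes[test].add(passed)
        if not passed:
            # Strip volatile details (timestamps, ports, ids) so that
            # equivalent failures collapse into one signature.
            signatures[test].add(re.sub(r"\d+", "N", message))
    report = {}
    for test, msgs in signatures.items():
        flaky = {True, False} <= outcomes[test]
        report[test] = ("likely flaky" if flaky else "real failure",
                        sorted(msgs))
    return report
```

With this, a test that timed out once but passed on rerun is separated from a test that fails deterministically, which is exactly the 80/20 noise split described above, only done by a crude rule instead of a trained model.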

So reducing noise and prioritizing testing can accelerate software delivery. As you apply that to your existing processes, you can move forward much faster.

Jon: Great, thank you! So, AI and ML can unlock enormous value in the digital world. The key right now is to look for direct opportunities to remove friction, in testing and across the process as a whole. Eran and Justin, thank you so much for your time!

Mapping the route from ITOps to AIOps – Gigaom


IT operations (ITOps) has always been rooted in data collection and analysis. Artificial intelligence (AI) and machine learning (ML) are now being applied to create a new class of Ops tools that actually learn and improve from the data they collect. These advances are not coming a moment too soon, as the IT crisis created by the COVID-19 pandemic has forced organizations to stand up widely distributed applications and infrastructure. The emerging layer of ITOps tooling called AIOps promises a solution to this sudden complexity.
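In its simplest form, the "learning from data" that AIOps tools promise starts with anomaly detection on collected metrics. A minimal illustrative sketch, flagging points that deviate sharply from a trailing baseline (the window size and threshold here are arbitrary assumptions, not tuned values):

```python
from statistics import mean, stdev

def anomalies(series, window=20, threshold=3.0):
    """Return the indices of points that deviate more than `threshold`
    standard deviations from the mean of the trailing `window` points.

    series: list of numeric metric samples (e.g. latency, error count).
    """
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip flat histories (sigma == 0) to avoid division-free false alarms.
        if sigma and abs(series[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged
```

Production AIOps platforms use far richer models (seasonality, correlation across signals), but the core loop is the same: build a baseline from collected data, then surface what doesn’t fit it.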

David Linthicum, in his recently published report “Best Practices in Moving from ITOps to AIOps”, explores the journey IT organizations face as they seek to leverage ML and autonomous systems to speed diagnostics, reduce downtime, optimize infrastructure, and predict challenges.

Linthicum divides this journey into four phases: ITOps, Emerging AIOps, Advanced AIOps, and Future AIOps. The process begins with a traditional approach built around IT monitoring, scripted automation, and manual operations processes, and ends with business-driven, predictive, and automated workflows.

“Note that we shift from the limitations of the traditional approach to using emerging AIOps,” writes Linthicum in the report, citing the infographic in Figure 1. “This has some core attributes, such as the ability to monitor systems using correlated data, automation of manual processes, an AI engine that learns from the data, and the ability to deliver the bulk of this functionality on demand, according to the needs of the Ops groups.”

Figure 1: Stages of applying AIOps

Ultimately, the goal is to apply the concepts of autonomic computing, which concern the self-managing properties of distributed computing resources and their ability to adapt to unpredictable changes while hiding complexity from both operators and users. In other words, as Linthicum notes, “the ability to remove humans from the basic operational complexity.”

In the report, Linthicum offers best practices to help IT organizations embark on the AIOps journey. The guidance begins with planning and measurement: reviewing the business problems, mapping the course to AIOps, and putting value measurement in place. From there, he explores the move to advanced concepts such as predictive analytics and self-healing, while implementing a continual improvement process for AIOps and ensuring integration with other Ops tools. Finally, he urges introspection: evaluating value, grading performance, and adopting a continuous improvement cycle as the AIOps effort proceeds.

As Linthicum observes, the adoption of operational automation tooling is a foregone conclusion, but that doesn’t mean it will come in time to address the soaring complexity of IT infrastructure. He urges organizations to map their AIOps journey early as a way to avoid being caught by surprise.

Learn more: Best Practices in Moving from ITOps to AIOps

Hardware acceleration and the future of software-defined storage – Gigaom


One of the most exciting announcements at VMworld this year was Project Monterey. In a nutshell, VMware wants to offload some demanding operations to smart network interface cards (SmartNICs) and other types of accelerators. That way, the virtualization stack becomes more efficient and feature-rich, while also allowing for resource disaggregation and aggregation. This description may seem a bit vague, but it has a number of real-world benefits for users, including increased security and efficiency, as well as better resource management. And all of this happens while simultaneously improving the cluster’s overall performance!

The idea is quite simple, and similar approaches have been used many times in the past. Today’s general-purpose CPUs are complex and can do many things in parallel, but like it or not, there are limitations and some shared resources. The more diverse the tasks a CPU is asked to perform, the more context switching it has to do, causing cache misses that slow down processing and decrease system performance.

From a storage perspective, encryption, compression, protocol translation, and all the math around data protection put a strain on the CPU, leading to poorer overall system efficiency and performance. In this regard, Project Monterey will offload these tasks to specialized accelerators and hardware, allowing the system CPU to focus its capacity on running applications.

Unfortunately, Project Monterey isn’t here yet, and we’ll need to wait a few months to see a production version of it. On the other hand, the technology is already available, and several vendors in different sectors are working on exactly the same model as Project Monterey.

Software + hardware-defined storage

Software is great, but software that can get the most out of hardware is better. For many years, we had hardware-based storage systems powered by CPUs and purpose-built ASICs; this was the only way to deliver the power needed to make everything fast enough to work properly. The storage array’s operating system was specifically designed to work with the hardware and exploit every bit of it. Over time, thanks to the growing power of CPUs and network components, ASICs (and other esoteric accelerators) became practically unnecessary, and more and more system architectures began to focus on standard hardware. This brought to market an increasing number of “software-defined” solutions.

Everything worked pretty well in the hard-drive era, until flash memory, NVMe, and storage-class memory appeared, in that order. It didn’t happen quickly; in fact, flash adoption was quite slow at first because of its price, but things have changed completely over the past few years.

We now have 100Gb/s Ethernet (if not more!), NVMe and NVMe-oF (shorter and massively parallel data paths), and memory faster than ever that can be configured to look like a RAM extension (Intel Optane). The amount of data these devices can manage is enormous. To keep the storage system balanced and efficient, we need to ensure that every component can sustain a high data flow without congestion. It’s a classic example of history repeating itself, some would say.

A software-defined storage system built on standard hardware (i.e., with no acceleration) could get away with general-purpose CPUs because:

  • Hard drives were slow (hundreds of IOPS)
  • Flash was faster but still manageable (up to tens of thousands of IOPS)
  • Ethernet was relatively slow (10Gb/s)
  • Protocols were designed to handle hard drives in a serial fashion (SCSI, SATA, etc.)

The day we unleashed the power of next-generation storage media thanks to NVMe and faster networks (100Gb/s or more), general-purpose CPUs became the bottleneck. At the same time, scaling the storage system up demanded ever more powerful, more expensive CPUs. At the end of the day, everyone wants to go faster, but no one wants to give up data protection, data services, security, or data footprint optimization. Would you?
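A back-of-the-envelope calculation shows why the bottleneck moves: the CPU cycle budget available per I/O collapses by orders of magnitude when disks are swapped for NVMe. The figures below (clock speed, core count, per-device IOPS) are illustrative assumptions only:

```python
def cycles_per_io(cpu_ghz, cores, iops):
    """Rough CPU cycle budget available to process each I/O:
    total cycles per second across all cores, divided by the I/O rate."""
    return cpu_ghz * 1e9 * cores / iops

# Hypothetical 16-core, 2.5 GHz storage controller:
# 24 hard drives at ~200 IOPS each vs. 24 NVMe drives at ~500k IOPS each.
hdd_budget  = cycles_per_io(2.5, 16, 24 * 200)      # ~8.3 million cycles/I/O
nvme_budget = cycles_per_io(2.5, 16, 24 * 500_000)  # ~3,300 cycles/I/O
```

A few thousand cycles is barely enough to move the data, let alone run compression, encryption, and parity math on it, which is exactly the work the accelerators take over.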

Some SDS vendors grasped this quickly and started building a next generation of systems that leverage accelerators to do more (and better) with less (power and space).

Software + hardware optimization

Let me give you an example: Lightbits Labs. Lightbits’ LightOS is an innovative NVMe-based disaggregation software solution that pools NVMe devices across storage nodes and exposes NVMe/TCP as its front-end protocol. It combines the low latency and high performance of NVMe-oF (NVMe over Fabrics) storage with data services over standard Ethernet TCP/IP networks.

The company recently announced a partnership with Intel to take advantage of several of the latest hardware technologies from the chip giant:

  • Intel Optane, for fast, non-volatile handling of write buffers and metadata
  • Intel Ethernet 800 Series NICs, for optimized low-latency NVMe/TCP
  • Intel QLC 3D NAND SSDs, for better $/GB
  • And more…

Lightbits has already shown impressive performance on commodity hardware. By adding these technologies to its solution, it can further optimize performance while improving efficiency and overall system cost. In particular, Lightbits offloads a set of tasks to Intel SmartNICs (with specific optimizations enabled by ADQ technology) while leveraging the latest storage media for better performance, capacity, and cost than other solutions. For users, that means better performance, more capacity, and higher overall efficiency, along with a reduced data center footprint, adding up to a better TCO.

It’s worth noting that these accelerators can be considered specialized hardware, but they’re not custom hardware. In fact, we are talking about off-the-shelf components, not ASICs designed by Lightbits. This is particularly important, and it gives Lightbits a huge long-term advantage, as it can focus on software development instead of managing an ASIC design. It also benefits Lightbits customers, who gain additional options: they can choose between software-defined (fast, efficient, cost-effective) and software-defined with hardware acceleration (faster, more efficient, TCO-focused).

Closing the circle

If I’ve said it once, I’ve said it a million times: the modern data center is no longer x86-only. More and more advanced infrastructure today relies on dedicated hardware and accelerators such as GPUs, TPUs, FPGAs, SmartNICs, and so on.

Software that takes advantage of these components follows, and Lightbits is a great example. Its solution can run on general-purpose hardware, but it works better with these components and, at the end of the day, provides a better TCO and a faster return on investment.

From a user perspective, hardware acceleration (for lack of a better term) is just software-defined on steroids, and it offers additional options for designing solutions that better meet business needs.

Disclaimer: Lightbits Labs is a customer of Gigaom