
Data centers may be in the midst of a flash revolution, but managing hard disk drives (HDDs) is still paramount. According to IDC, stored data will increase 17.8% by 2024, with HDD as the main storage technology.

At Google Cloud, we know first-hand how critical it is to manage HDDs in operations and preemptively identify potential failures. We are responsible for running some of the largest data centers in the world, and any miss in identifying these failures at the right time can potentially cause serious outages across our many products and services.

In the past, when a disk was flagged for a problem, the main option was to repair the problem on site using software. But this procedure was expensive and time-consuming. It required draining the data from the drive, isolating the drive, running diagnostics, and then re-introducing it to traffic.

That's why we teamed up with Seagate, our HDD original equipment manufacturer (OEM) partner for Google's data centers, to find a way to predict frequent HDD problems. Together, we developed a machine learning (ML) system, built on top of Google Cloud, to forecast the probability of a recurring failing disk: a disk that fails or has experienced three or more problems in 30 days.

Managing disks by the millions is hard work

There are millions of disks deployed in operation that generate terabytes (TBs) of raw telemetry data. This includes billions of rows of hourly SMART (Self-Monitoring, Analysis and Reporting Technology) data and host metadata, such as repair logs, Online Vendor Diagnostics (OVD) or Field Accessible Reliability Metrics (FARM) logs, and manufacturing data about each disk drive. That's hundreds of parameters and factors that must be tracked and monitored across every single HDD. When you consider the number of drives in an enterprise data center today, it's practically impossible to monitor all these devices by human effort alone.

Reducing risk and costs with a predictive maintenance system

To help solve this issue, we created a machine learning system to predict HDD health in our data centers. Our Google Cloud AI Services team (Professional Services), along with Accenture, helped Seagate build a proof of concept based on the two most common drive types.

The ML system was built on the following Google Cloud products and services:
