In the last post I set up the problem of predicting the remaining useful life (RUL) of turbofans from the number of operational cycles and snapshot sensor measurements. In this post I will do some exploratory analysis of the simplest dataset: one operating mode and one failure mode.
First up was some preprocessing. The data are recorded by cycle number, which increments by one for each cycle a turbofan is operated. For example, turbofan 1 has operating settings and sensor measurements for cycle 1, 2, 3, … up to its final recorded cycle.
I wanted (and needed) every turbofan in the training dataset to count down the number of cycles left in its useful life. I wanted it because it makes some of the exploratory data analysis easier to interpret, and I needed it because it is the target variable. A small function does the trick. As a side benefit, this function demonstrates a common feature-engineering pattern: group the data (here, by unit number), calculate a metric for each group (here, the maximum number of cycles completed), and then join (merge) that metric back to the original frame as a new feature. This particular feature took a couple of extra steps, but the overall pattern is common.
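The group, aggregate, and merge pattern can be sketched in pandas roughly as follows. This is a minimal illustration, not the function from the project repository: the column names `UnitNumber` and `Cycle` are assumptions, and `add_remaining_useful_life` is a hypothetical name. The "extra steps" are renaming the per-unit maximum and subtracting the current cycle from it, so the final cycle of each unit gets an RUL of 0.

```python
import pandas as pd


def add_remaining_useful_life(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add a RemainingUsefulLife countdown per unit.

    Assumes columns named "UnitNumber" and "Cycle" (illustrative names).
    """
    # 1. Group by unit and compute a metric per group: the last cycle completed.
    max_cycles = (
        df.groupby("UnitNumber", as_index=False)["Cycle"]
        .max()
        .rename(columns={"Cycle": "MaxCycle"})
    )
    # 2. Join (merge) the per-unit metric back onto every row of the original frame.
    out = df.merge(max_cycles, on="UnitNumber", how="left")
    # 3. Extra step: turn the metric into a countdown (0 on the final cycle).
    out["RemainingUsefulLife"] = out["MaxCycle"] - out["Cycle"]
    return out.drop(columns="MaxCycle")


if __name__ == "__main__":
    toy = pd.DataFrame({"UnitNumber": [1, 1, 1, 2, 2], "Cycle": [1, 2, 3, 1, 2]})
    print(add_remaining_useful_life(toy))
```

In pandas, `df.groupby("UnitNumber")["Cycle"].transform("max")` collapses steps 1 and 2 into one call; the explicit merge is shown here because it makes the group, aggregate, join pattern visible.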
Finally, for this post, we will look at how a few of the sensor measures evolve over cycles for a handful of machines. Here I took a sample of 10 turbofan units, moved them from H2O into a Pandas data frame, and used Seaborn for the plotting.
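A rough sketch of the sampling and plotting step is below, under stated assumptions: the frame has already been pulled out of H2O (for example with `H2OFrame.as_data_frame()`), the unit column is called `UnitNumber`, and the RUL column from the earlier step exists. The helper names are hypothetical, and the sampling is done in pandas rather than in H2O. Seaborn is imported inside the plotting function so the data-preparation helpers work without it.

```python
import numpy as np
import pandas as pd


def sample_units(df: pd.DataFrame, n_units: int = 10, seed: int = 42) -> pd.DataFrame:
    """Keep all cycles for a random subset of units (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    units = df["UnitNumber"].unique()
    chosen = rng.choice(units, size=min(n_units, len(units)), replace=False)
    return df[df["UnitNumber"].isin(chosen)]


def to_long(df: pd.DataFrame, sensors: list) -> pd.DataFrame:
    """Reshape selected sensor columns to long format for faceted plotting."""
    return df.melt(
        id_vars=["UnitNumber", "RemainingUsefulLife"],
        value_vars=sensors,
        var_name="Sensor",
        value_name="Value",
    )


def plot_sensors(df: pd.DataFrame, sensors: list):
    """One panel per sensor, one line per unit, x-axis = remaining useful life."""
    import seaborn as sns  # imported lazily so the helpers above don't need it

    long = to_long(df, sensors)
    long["Unit"] = long["UnitNumber"].astype(str)  # categorical hue, one color per unit
    return sns.relplot(
        data=long,
        x="RemainingUsefulLife",
        y="Value",
        hue="Unit",
        col="Sensor",
        col_wrap=3,
        kind="line",
        facet_kws={"sharey": False},  # sensors live on very different scales
    )
```

For example, `plot_sensors(sample_units(train_pd), ["SensorMeasure4", "SensorMeasure9"])`, where `train_pd` is a hypothetical name for the converted training frame. Since x is the remaining useful life, reading each panel from right to left moves toward failure.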
Some sensors trend over time (e.g., SensorMeasure4), some are constant (e.g., SensorMeasure1), some trend over operating cycles but along different trajectories (e.g., SensorMeasure9), and others change but show no discernible trend (e.g., SensorMeasure6). Other measures, such as SensorMeasure17, take on discrete values yet still show a trend over cycles (see below).
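To put rough numbers on these visual impressions, one option is to count distinct values per sensor (constant sensors have one) and compute a rank (Spearman) correlation with RemainingUsefulLife, which also handles discrete-valued sensors. This is an illustrative check, not part of the original analysis; the function name is hypothetical, and Spearman correlation is computed here as the Pearson correlation of ranks to avoid a SciPy dependency.

```python
import pandas as pd


def summarize_sensor_trends(df: pd.DataFrame, sensors: list,
                            target: str = "RemainingUsefulLife") -> pd.DataFrame:
    """Hypothetical summary: distinct values and rank correlation with the target."""
    target_ranks = df[target].rank()
    rows = []
    for sensor in sensors:
        col = df[sensor]
        n_unique = col.nunique()
        if n_unique <= 1:
            corr = float("nan")  # constant sensor: correlation is undefined
        else:
            # Spearman correlation = Pearson correlation of (average) ranks
            corr = col.rank().corr(target_ranks)
        rows.append({"sensor": sensor, "n_unique": n_unique, "spearman_vs_rul": corr})
    return pd.DataFrame(rows).set_index("sensor")
```

A strongly negative value suggests a sensor that rises as failure approaches. A value near zero is consistent with either no trend or trends that differ in direction across units, so the plots are still worth checking.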
The good news is that several sensors trend with the target variable, RemainingUsefulLife. This gives us some hope that we can build a model to predict each turbofan's RUL, that is, how many cycles it has left. That will be the subject of the next post, along with tips on cross-validating the models.
All of the code for this work is available in my GitHub repository for this project. I have previously presented this material at MLConf Atlanta and the Unstructured Data Science Pop-Up Seattle, with the support of H2O.ai.