Leveraging Machine Learning to Predict Test Coverage

Summary:
Test coverage is an important metric within test management, and as technology evolves, we’re able to leverage new trends to predict coverage. Weka, an open source suite of machine learning software, can take your test management beyond spreadsheets to the latest AI technologies, letting you predict your test coverage earlier with greater accuracy.

Testers should be involved in the requirements collection phase of the software development lifecycle because it benefits both the QA and business teams to understand the requirements better. In test management, we analyze those requirements, prepare test cases, execute test cases, do bug tracking, and get QA to sign off on test coverage.

At my company, a digital commerce agency, we work on e-commerce technologies and serve enterprise clients. Test coverage is an essential metric for measuring the quality of our project deliveries. Our test management was initially handled in spreadsheets, but we ended up with multiple versions of different files, many of them underused. Maintaining these spreadsheets and extracting value from them became complex, and the data quickly became obsolete.

Next, we tried a shared Google spreadsheet to work in a collaborative environment. However, this led to work products being accidentally modified or deleted. Preparing reports and presenting projects to stakeholders consumed more and more of the test leads’ time.

We decided to explore the world of test management tools. These tools have evolved over the years, and they resolved some of our problems with features like a centralized repository of test work products, role-based access, reuse of work products, monitoring test activities, and tailor-made reports. They helped us maintain best practices and helped our test leaders manage more effectively at a higher level.

However, they still lacked the ability to predict things such as test estimation, test coverage of a release, and quality issues. Test managers still spent too much time manually combining the reports from these test management tools with statistics to produce predictions. We had to find another technology to monitor and track progress against the project’s delivery timeline.

We discovered Weka: the Waikato Environment for Knowledge Analysis. Developed by the University of Waikato, New Zealand, Weka is an open source collection of machine learning algorithms for data mining tasks. It comes with an easy-to-use GUI, and for a beginner in machine learning and AI, the library is simpler to learn than other tools. The algorithms can either be applied directly to a dataset or called from our own Java code.

There are four main tools in the Weka GUI Chooser:

  • Explorer is the primary graphical user interface that gives you access to most of the functionality
  • Knowledge Flow allows you to process, view, and visualize your data
  • The Experimenter helps you answer questions, like whether one classifier is better than another on a particular dataset
  • The Workbench is the unified UI for Weka

Weka offers multiple options for generating predictions from many kinds of data. Here’s how we use its libraries to predict test coverage.

Test Coverage Prediction in Weka

The Weka Explorer is an easy-to-use GUI that harnesses the power of the Weka machine learning software. Each of the major Weka packages—Filters, Classifiers, Clusterers, Associations, and Attribute Selection—is represented in the Explorer, along with a Visualization tool that allows datasets and the predictions of Classifiers and Clusterers to be visualized in two dimensions.

The flow of test coverage prediction in Weka Explorer is shown in the figure below.

[Figure: Workflow of test coverage prediction in Weka Explorer]

Our historical test coverage metrics live in Excel spreadsheets. To evaluate the data, we use a .ARFF file as the data source, so we first prepare a .ARFF file from the existing data of our projects. After loading the file, we run the classification. This process is known as training, and it uses two data sets. We analyze the training results and track the accuracy. Once the system is trained, we prepare a second .ARFF file with the data that needs to be predicted.
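To make the data preparation step concrete, here is a minimal sketch of turning spreadsheet rows into ARFF text in plain Java. The column names (test_cases_planned, test_cases_executed, defects_found, coverage_level), the three coverage levels, and the sample rows are hypothetical and are not our actual project data. Only the @relation, @attribute, and @data layout, and the use of "?" for a value that is not yet known, come from the ARFF format itself.

```java
import java.util.List;

/** Builds ARFF text from spreadsheet rows. Column names are hypothetical. */
final class CoverageArffBuilder {

    /** One historical release; every field here is an illustrative assumption. */
    static final class Row {
        final int planned;
        final int executed;
        final int defects;
        final String coverageLevel; // null when the value is still to be predicted

        Row(int planned, int executed, int defects, String coverageLevel) {
            this.planned = planned;
            this.executed = executed;
            this.defects = defects;
            this.coverageLevel = coverageLevel;
        }
    }

    static String toArff(String relation, List<Row> rows) {
        StringBuilder sb = new StringBuilder();
        // Header section: the relation name, then one @attribute line per column.
        sb.append("@relation ").append(relation).append("\n\n");
        sb.append("@attribute test_cases_planned numeric\n");
        sb.append("@attribute test_cases_executed numeric\n");
        sb.append("@attribute defects_found numeric\n");
        // Nominal class attribute: the value the model learns to predict.
        sb.append("@attribute coverage_level {low,medium,high}\n\n");
        // Data section: one comma-separated line per row, "?" for unknown values.
        sb.append("@data\n");
        for (Row r : rows) {
            sb.append(r.planned).append(',')
              .append(r.executed).append(',')
              .append(r.defects).append(',')
              .append(r.coverageLevel == null ? "?" : r.coverageLevel)
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row(120, 110, 14, "high"), // historical row with a known label
            new Row(80, 40, 9, null));     // new row whose label is to be predicted
        System.out.print(toArff("test_coverage", rows));
    }
}
```

The same builder can produce both files described above: rows with a known coverage level for training, and rows with "?" in the last column for the data to be predicted.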

In the initial run, the accuracy was poor: 10 percent or so. We discussed and analyzed the results, then repeated the training with multiple data sets, which improved the accuracy of the test coverage predictions to as much as 70 percent.
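As a rough illustration of what such a figure means, the sketch below assumes accuracy is the percentage of rows whose predicted label matches the known label, which is how Weka's "Correctly Classified Instances" figure is defined. The class name and the labels in the usage note are hypothetical.

```java
/** Computes classification accuracy as a percentage (illustrative helper). */
final class PredictionAccuracy {

    /** Returns the share of predictions that equal the known labels, from 0 to 100. */
    static double percentCorrect(String[] actual, String[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i].equals(predicted[i])) {
                correct++; // count a hit only on an exact label match
            }
        }
        return 100.0 * correct / actual.length;
    }
}
```

With ten releases of which seven are labeled correctly, percentCorrect returns 70.0; with one of ten correct, it returns 10.0.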

At this point we save the model so we can reuse it, which saves time. When a saved model is available, we load it and then supply the .ARFF test set file to be predicted.

We can load saved models in Weka and reuse them with new datasets to produce updated predictions. This helps us present the data to the project’s stakeholders earlier in the schedule, and we can also make predictions around project estimation and defects with the same accuracy.
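We do this through the Explorer GUI, but the same train, save, load, and predict cycle can be sketched in Java against the Weka library. This is a minimal sketch under stated assumptions, not our production setup: it needs weka.jar on the classpath, the J48 decision tree is picked only for illustration (the choice of algorithm is yours), and the class name, method names, and file paths are hypothetical. It also assumes the class to predict is the last attribute in the .ARFF file.

```java
import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

/** Train once, save the model, then reuse it on new data (illustrative sketch). */
final class CoverageModelReuse {

    /** Loads an .ARFF file and marks the last attribute as the class to predict. */
    static Instances load(String arffPath) throws Exception {
        Instances data = DataSource.read(arffPath);
        data.setClassIndex(data.numAttributes() - 1); // assumption: class is last
        return data;
    }

    /** Trains a classifier on historical data and saves it to disk. */
    static void trainAndSave(Instances training, String modelPath) throws Exception {
        Classifier model = new J48(); // assumption: algorithm chosen for illustration
        model.buildClassifier(training);
        SerializationHelper.write(modelPath, model);
    }

    /** Loads a saved model and returns one predicted label per row. */
    static String[] predict(String modelPath, Instances unlabeled) throws Exception {
        Classifier model = (Classifier) SerializationHelper.read(modelPath);
        String[] labels = new String[unlabeled.numInstances()];
        for (int i = 0; i < labels.length; i++) {
            // classifyInstance returns the index of the predicted nominal value.
            double classIndex = model.classifyInstance(unlabeled.instance(i));
            labels[i] = unlabeled.classAttribute().value((int) classIndex);
        }
        return labels;
    }
}
```

A typical call sequence would be trainAndSave(load("history.arff"), "coverage.model") once, followed by predict("coverage.model", load("new_release.arff")) whenever new data arrives. The new file must declare the same attributes as the training file.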

If you’re struggling with your test management and spending too much time compiling test coverage information, download and explore Weka. Being able to predict your coverage earlier in the software development lifecycle allows you to better prepare for your project delivery.

User Comments

1 comment

Bhavani, I am trying to understand the meaning of "predicting test coverage". Different organizations use terms differently, so I may not be understanding things the way you mean them. Please clarify.

Thanks.

January 10, 2019 - 7:23am

CMCrossroads is a TechWell community.