Sky Solutions

Smarter Claims Processing with Machine Learning

Presenting the second idea in our Sky AI Hackathon series.  

As we discussed previously, during our 2024 Sky Solutions AI Hackathon, our teams explored how artificial intelligence (AI) could address real-world challenges faced by federal agencies. This led to the inception of our Talk to Me About the Data initiative, which develops AI-driven technology to make government data easier for journalists, researchers, decision-makers and other members of the public to use. 

Driving Budget-Boosting Efficiency in Claims Processing 

Another exciting program our teams developed during the Hackathon focuses on transforming the Medicare claims review process. Built around a machine learning (ML) model, it uses intelligent automation to bring greater efficiency and accuracy to an often slow, manual, yet mission-critical workflow.  

The Challenge: Tackling Inefficiencies in Claims Processing 
The claims review process for federal programs such as Medicare is typically data-intensive and complex. Multiple review cycles, missing documentation and inconsistent decision-making often result in delays and strain resources for both reviewers and applicants. To address these issues, the team set out to design a solution that improves accuracy, shortens turnaround times and eases the burden on human reviewers. 

The Concept: A Machine Learning–Powered Decision Assistant 
The model proposed by our team equips reviewers with intelligent analytical tools that draw on historical data from similar claims, empowering them to make better-informed decisions quickly. 

By quickly identifying missing documentation, flagging incomplete submissions and predicting likely outcomes based on past patterns, the system proposed by Sky’s team acts as a real-time guide throughout the review process. The advanced, ML–driven assistant offers data-backed insights to streamline the review process, minimizing errors and reducing the need for reevaluation. 

The Approach: From Raw Data to Actionable Insights 
Our team’s proof of concept used a publicly available Medicare dataset from Data.gov, comprising over 33,000 records covering outpatient claims, disease prevalence, demographic attributes and regional data. After cleaning and filtering the data, the team selected key variables (such as year, location, disease type and ethnicity) to feed into the model. 

Using Python and Google Colab, the team prepared the data through a multi-step process: 

  • Cleaning: Null and duplicate records were removed to ensure data quality 
  • Normalization: Outcome values were standardized into percentage units for consistency 
  • Encoding: Label encoders were applied to transform categorical data such as location or year into numerical values easily understood by ML algorithms 
  • Feature Selection: To optimize performance, we identified the most impactful variables using a feature importance filter 
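The preparation steps above can be sketched in a few lines of Python. This is a minimal, hypothetical example: the miniature DataFrame and its column names are illustrative stand-ins, not the actual Data.gov schema, and the feature-importance filter is shown here using scikit-learn's `SelectFromModel` as one plausible implementation.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectFromModel

# Hypothetical miniature stand-in for the Medicare dataset
df = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020, 2021, 2021, 2021, None],
    "location": ["CA", "TX", "CA", "TX", "CA", "TX", "CA", "TX"],
    "disease_type": ["diabetes", "copd", "diabetes", "copd",
                     "diabetes", "copd", "copd", "diabetes"],
    "outcome": [0.42, 0.37, 0.45, 0.39, 0.47, 0.40, 0.41, 0.50],
})

# Cleaning: remove null and duplicate records
df = df.dropna().drop_duplicates()

# Normalization: standardize outcome values into percentage units
df["outcome_pct"] = df["outcome"] * 100

# Encoding: map categorical columns to numeric labels
for col in ["location", "disease_type"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# Feature selection: keep only the most impactful variables,
# as ranked by a tree-based importance filter
X, y = df[["year", "location", "disease_type"]], df["outcome_pct"]
selector = SelectFromModel(GradientBoostingRegressor(random_state=0)).fit(X, y)
selected = X.columns[selector.get_support()].tolist()
```

In practice the real dataset would be loaded from a CSV export rather than constructed inline, but the same four-step pipeline applies.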

The team tested various regression algorithms, including linear regression, random forest and gradient boosting. The gradient boosting model emerged as the most accurate, making it the preferred choice for this application. 
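A model comparison like the one described can be sketched with scikit-learn's cross-validation utilities. The data below is synthetic (randomly generated as a stand-in for the prepared claims features), so the scores are illustrative only; on the team's actual dataset, gradient boosting was the top performer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Synthetic features/target standing in for the prepared claims data
rng = np.random.default_rng(42)
X = rng.uniform(size=(300, 4))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=300)

candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Compare mean cross-validated R^2 for each candidate algorithm
scores = {name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
```

Scoring all candidates under the same cross-validation split keeps the comparison fair and makes the "preferred choice" a measured result rather than a guess.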

Visual Analysis: Making the Model Transparent 
In parallel with model development, the team created visualizations such as box plots and line graphs to track trends across years, regions and demographic groups. These visuals helped validate that the model reflected real-world behaviors, demonstrating both transparency and reliability.  
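Visualizations along these lines can be produced with standard plotting tools. The sketch below uses matplotlib and pandas with hypothetical yearly outcome rates per region (the values and region names are invented for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for non-interactive environments
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical yearly outcome rates per region (illustrative values only)
df = pd.DataFrame({
    "year":   [2019, 2020, 2021] * 2,
    "region": ["West"] * 3 + ["South"] * 3,
    "rate":   [42.0, 45.0, 47.0, 37.0, 39.0, 40.0],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Line graph: trend across years, one line per region
for region, grp in df.groupby("region"):
    ax1.plot(grp["year"], grp["rate"], marker="o", label=region)
ax1.set(title="Trend by year", xlabel="Year", ylabel="Rate (%)")
ax1.legend()

# Box plot: distribution of rates by region
df.boxplot(column="rate", by="region", ax=ax2)
ax2.set(title="Distribution by region", xlabel="Region")

fig.savefig("claims_trends.png")
```

Plots like these let reviewers check the model's inputs and predictions against visible real-world trends, which is what makes the transparency claim credible.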

The Impact: Aligning Technology with Mission Outcomes 
This new solution could support federal agencies in expediting claims processing, minimizing manual checks and improving decision consistency. By anticipating the information that will be needed and suggesting documentation based on historical patterns, it has the potential to reduce both time and effort in claims processing. 

Moreover, its modular architecture will allow it to be deployed across multiple domains. From healthcare and benefits processing to immigration services, it can bring speed and accuracy to any scenario where evaluating claims is a critical challenge. 

What’s Next? 
The model is now moving through the proof-of-concept stage, where it will be tested using simulated claims and practical workflows. Future enhancements could include natural language processing to handle fast-growing volumes of unstructured text, along with support for other data types such as images, video and sensor readings. 
