Anomaly Detection in Retail Data
To ensure optimal data quality, we developed a model for our retail client that automatically detects and corrects unusual data points in sales data.

Challenge
Well-prepared data is the foundation for every question in analytics, reporting, and data science, and data often requires extensive preparation and cleaning before the actual analysis can begin. Our client, an international retail company, wanted to verify and clean the sales data from its connected stores automatically every day so that errors do not reach its reporting systems.
Approach
Working with the client, we used the sales time series from the past two years (approximately 500 million data points) to develop a statistical model. For each product-store combination, the model compares the actual value of each KPI with that KPI's empirically observed distribution and automatically flags unusual data points. It can also smooth an anomaly to its expected value, so the observation does not have to be deleted entirely. The algorithm was developed entirely in R and deployed within a database on the existing analytics server.
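The idea of comparing each value with the empirical distribution of its product-store combination can be illustrated with a minimal sketch. The production model was written in R and its details are not described here, so the Python code below is only an assumption-laden illustration: the function names, the 1st/99th percentile band, and the use of the median as the "expected value" are all hypothetical choices, not the client model.

```python
from collections import defaultdict
from statistics import median, quantiles


def fit_empirical_bands(history, lower_pct=1, upper_pct=99):
    """Build a (low, expected, high) band per product-store combination.

    history: iterable of (store, product, value) tuples for one KPI.
    The percentile cut-offs and the median as expected value are
    illustrative assumptions, not the thresholds of the real model.
    """
    grouped = defaultdict(list)
    for store, product, value in history:
        grouped[(store, product)].append(value)

    bands = {}
    for key, values in grouped.items():
        if len(values) < 2:
            continue  # statistics.quantiles needs at least two data points
        # 99 cut points: cuts[0] is the 1st percentile, cuts[98] the 99th
        cuts = quantiles(values, n=100, method="inclusive")
        bands[key] = (cuts[lower_pct - 1], median(values), cuts[upper_pct - 1])
    return bands


def clean(observations, bands):
    """Flag values outside their band and replace them with the expected value.

    Returns (store, product, cleaned_value, is_anomaly) tuples. Combinations
    without a fitted band are passed through unchanged and not flagged.
    """
    cleaned = []
    for store, product, value in observations:
        band = bands.get((store, product))
        if band is None:
            cleaned.append((store, product, value, False))
            continue
        low, expected, high = band
        is_anomaly = value < low or value > high
        cleaned.append((store, product, expected if is_anomaly else value, is_anomaly))
    return cleaned


if __name__ == "__main__":
    # Hypothetical history: daily unit sales between 90 and 110
    history = [("S1", "P1", v) for v in range(90, 111)]
    bands = fit_empirical_bands(history)
    for row in clean([("S1", "P1", 500), ("S1", "P1", 101)], bands):
        print(row)
```

Replacing a flagged value with the expected value, rather than dropping the row, keeps the time series complete for downstream reporting, which mirrors the smoothing behaviour described above.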
Result
Since deployment, the model has automatically checked the daily store data deliveries for anomalies and unusual data points. Data preparation and cleaning of these deliveries are now automated and provide reliable, stable results. Additionally, because R is open-source software, its use incurs no licensing costs.