Observability should be inexpensive and purposeful. With countless open-source tools available, it's easy to drown in telemetry data. The key is knowing what you actually need: send only essential, business-critical data and build intelligence around it. Focus on metrics aligned with SLAs, operational costs, and technical requirements, and keep adapting them so they support both incident prevention and incident response across your entire operation.
We design CI/CD pipelines that deliver software quickly, cost-effectively, and safely, with easy rollback. Our approach optimizes for execution time and reliability through comprehensive testing, applies 12-factor principles for dynamic configuration across the organization, and leverages proven open-source tools to create pipelines aligned with production-tested processes.
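As a rough illustration of the 12-factor configuration idea, a pipeline step can read its settings from the environment instead of hard-coding them, so the same code runs unchanged in every stage. The sketch below is a minimal Python example; the variable names (DEPLOY_ENV, DEPLOY_TIMEOUT_S, AUTO_ROLLBACK), their defaults, and the PipelineConfig class are hypothetical and not taken from any specific pipeline.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Settings for one pipeline run (hypothetical fields for illustration)."""
    environment: str
    deploy_timeout_s: int
    auto_rollback: bool


def load_config(env=None):
    """Build the config from environment variables, falling back to defaults.

    `env` defaults to os.environ; passing a plain dict makes the function
    easy to exercise in tests.
    """
    env = os.environ if env is None else env
    return PipelineConfig(
        environment=env.get("DEPLOY_ENV", "staging"),
        deploy_timeout_s=int(env.get("DEPLOY_TIMEOUT_S", "300")),
        auto_rollback=env.get("AUTO_ROLLBACK", "true").lower() in {"1", "true", "yes"},
    )
```

Calling load_config() with no arguments reads the live environment, so switching a stage from staging to production becomes a matter of changing variables rather than code.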
This research explores using Apache Flink as an alternative to overloaded OpenTelemetry processors for telemetry data processing. The project demonstrates real-time processing at scale, including sensitive data removal, ML-based anomaly detection, metric aggregation, advanced filtering, and analytics. It offloads complex processing logic from OpenTelemetry collectors to a dedicated streaming platform built for high-throughput data transformation.
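To make two of these steps concrete, the sketch below shows sensitive data removal and metric aggregation as plain Python functions. It deliberately does not use the Flink API; it only illustrates the kind of per-record logic that could be moved out of a collector and into a streaming job. The record shape, the attribute names in SENSITIVE_KEYS, and the simplified email pattern are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical attribute keys treated as sensitive; adjust to your schema.
SENSITIVE_KEYS = {"user.email", "http.request.header.authorization"}
# Deliberately simple email pattern, good enough for a sketch.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(record):
    """Return a copy of a telemetry record with sensitive values masked."""
    attrs = {}
    for key, value in record["attributes"].items():
        if key in SENSITIVE_KEYS:
            attrs[key] = "[REDACTED]"
        elif isinstance(value, str):
            # Mask email addresses embedded in free-text values.
            attrs[key] = EMAIL_RE.sub("[REDACTED]", value)
        else:
            attrs[key] = value
    return {**record, "attributes": attrs}


def aggregate_latency(records):
    """Compute the average duration_ms per service over a batch of records."""
    totals = defaultdict(lambda: [0.0, 0])  # service -> [sum, count]
    for record in records:
        bucket = totals[record["service"]]
        bucket[0] += record["duration_ms"]
        bucket[1] += 1
    return {service: total / count for service, (total, count) in totals.items()}
```

In a streaming job, redact would typically run as a per-record map step and aggregate_latency as a windowed aggregation keyed by service; here both operate on in-memory records to keep the example self-contained.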
This research explores building a natural language interface for telemetry data analysis, enabling users to query their systems and applications through conversational interactions. Instead of writing complex queries or navigating dashboards, users can ask questions in plain language about system health, performance, and behavior, receiving intelligent reports generated directly from telemetry data. This approach democratizes observability insights across technical and non-technical teams.