There’s a famous saying in engineering: "Fast, Cheap, or Reliable. Pick two." But when you enter a 48-hour hackathon to build a distributed incident management system, you are forced to pick all three.
This is the story of how my team and I built SmartPager from scratch, moving from a blank IDE to a production-grade alerting system in a single weekend.
The Problem with Incident Management
Incident management isn't just about sending an email when a server goes down. It's about handling concurrent events, escalating in real time, and making sure alerts fire in under a second. If the alerting system itself fails, it's useless, so we needed a system that could handle failure scenarios gracefully.
Why Microservices?
The easiest route in a hackathon is a monolith. But we wanted to build something that mirrored real-world production environments. We chose a microservices architecture using Spring Boot, sitting behind an Nginx reverse proxy, backed by PostgreSQL, with a React frontend.
- Incident Service: Ingests and processes incoming simulated incidents.
- Notification Service: Handles the real-time routing and escalation of alerts.
- Auth & Gateway: Handles security and load distribution.
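The post doesn't say which load-distribution strategy the gateway used. Nginx's `upstream` module defaults to round-robin, so here is a minimal Java sketch of that idea, assuming two hypothetical Incident Service instances. The class name and host addresses are illustrative, not taken from the project.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSketch {
    // Hands out backend instances in rotation; safe to call from many request threads.
    static final class RoundRobinBalancer {
        private final List<String> instances;
        private final AtomicInteger next = new AtomicInteger();

        RoundRobinBalancer(List<String> instances) {
            if (instances.isEmpty()) throw new IllegalArgumentException("need at least one instance");
            this.instances = List.copyOf(instances);
        }

        String pick() {
            // floorMod keeps the index non-negative even after the counter overflows.
            int i = Math.floorMod(next.getAndIncrement(), instances.size());
            return instances.get(i);
        }
    }

    public static void main(String[] args) {
        // Hypothetical hosts; the real deployment's addresses are not documented.
        RoundRobinBalancer lb = new RoundRobinBalancer(
                List.of("incident-service-1:8081", "incident-service-2:8081"));
        for (int k = 0; k < 4; k++) System.out.println(lb.pick());
    }
}
```

An atomic counter avoids locking on the hot path. A production gateway would also need health checks so it stops routing to a dead instance, which plain round-robin does not provide.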
Engineering Under Pressure: The Trade-offs
Senior engineering is about understanding trade-offs. With only 48 hours, we didn't have the luxury of spinning up an entire Kafka cluster for event streaming.
Instead, we built an event-driven escalation system using lightweight Spring Boot event listeners, with optimized PostgreSQL indexing to track incident state quickly. We prioritized low-latency alerting over strong consistency, so when a simulated incident fired, the on-call engineer was notified within milliseconds.
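The escalation flow described above can be sketched roughly as follows. This is plain Java rather than Spring: the small `EventBus` stands in for Spring's `ApplicationEventPublisher` and `@EventListener`. The `IncidentCreated` event, the primary/secondary on-call tiers, and the acknowledgement timeout are all assumptions for illustration, not the project's actual design.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class EscalationSketch {
    // Hypothetical event type; the project's real event classes are not shown in the post.
    record IncidentCreated(String incidentId) {}

    // Minimal synchronous event bus, standing in for Spring's ApplicationEventPublisher + @EventListener.
    static final class EventBus {
        private final List<Consumer<IncidentCreated>> listeners = new CopyOnWriteArrayList<>();
        void subscribe(Consumer<IncidentCreated> listener) { listeners.add(listener); }
        void publish(IncidentCreated event) { listeners.forEach(l -> l.accept(event)); }
    }

    static final class EscalationService {
        private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Incidents paged but not yet acknowledged. Atomic removal from this set decides
        // whether the acknowledgement or the escalation timer "wins" for each incident.
        private final Set<String> awaitingAck = ConcurrentHashMap.newKeySet();
        final List<String> pages = new CopyOnWriteArrayList<>();
        private final long ackTimeoutMillis;

        EscalationService(long ackTimeoutMillis) { this.ackTimeoutMillis = ackTimeoutMillis; }

        void onIncident(IncidentCreated e) {
            String id = e.incidentId();
            awaitingAck.add(id);
            pages.add("primary:" + id);               // page the primary on-call immediately
            timer.schedule(() -> {
                if (awaitingAck.remove(id)) {         // still unacknowledged after the timeout
                    pages.add("secondary:" + id);     // escalate to the next tier
                }
            }, ackTimeoutMillis, TimeUnit.MILLISECONDS);
        }

        void acknowledge(String incidentId) { awaitingAck.remove(incidentId); }

        // shutdownNow drops any escalation timers that have not fired yet.
        void shutdown() { timer.shutdownNow(); }
    }

    public static void main(String[] args) throws InterruptedException {
        EscalationService svc = new EscalationService(100);
        EventBus bus = new EventBus();
        bus.subscribe(svc::onIncident);
        bus.publish(new IncidentCreated("INC-1"));
        bus.publish(new IncidentCreated("INC-2"));
        svc.acknowledge("INC-1");                     // INC-1 is acknowledged in time; INC-2 is not
        Thread.sleep(300);
        System.out.println(svc.pages);
        svc.shutdown();
    }
}
```

Keeping escalation state in memory makes alerting fast, but it is lost if the process restarts. That is one concrete form of the latency-over-consistency trade-off. A durable version would also record each incident's state in PostgreSQL.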
The Outcome
When the judging phase arrived, we didn't just show them a PowerPoint. We bombarded the system with 100+ concurrent simulated incidents.
SmartPager didn't flinch. The distributed nodes handled the ingestion, the event-driven escalation triggered perfectly, and we achieved sub-second alert latency.
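The post doesn't show how the concurrent load was generated. A harness along these lines is one way to fire 100+ simulated incidents at once and check for sub-second latency. It assumes a pluggable `alertPath` callback, stubbed here with a short sleep. In practice that callback would be a real client call to the Incident Service, and it measures client-side call latency, which is only a proxy for end-to-end paging latency.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class LoadTestSketch {
    // Fires `count` incidents concurrently through `alertPath`; returns each call's latency in ms.
    static List<Long> fireConcurrently(int count, Consumer<String> alertPath) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(count);
        CountDownLatch startGate = new CountDownLatch(1);
        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String id = "SIM-" + i;                   // hypothetical incident ID format
            results.add(pool.submit(() -> {
                startGate.await();                    // hold every task so they all start together
                long t0 = System.nanoTime();
                alertPath.accept(id);
                return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0);
            }));
        }
        startGate.countDown();                        // release all tasks at once
        List<Long> latencies = new ArrayList<>();
        for (Future<Long> f : results) latencies.add(f.get());
        pool.shutdown();
        return latencies;
    }

    public static void main(String[] args) throws Exception {
        // Stub alert path that sleeps 5 ms to simulate work; replace with a real client call.
        List<Long> lat = fireConcurrently(120, id -> {
            try { Thread.sleep(5); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        long max = Collections.max(lat);
        System.out.println("incidents=" + lat.size() + " maxLatencyMs=" + max + " subSecond=" + (max < 1000));
    }
}
```

The start gate matters: without it, tasks trickle in as the pool spins up, and the test measures something closer to sequential load than true concurrency.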
Conclusion
Building SmartPager taught me that system resilience isn't something you add at the end of a project; it's a feature you have to architect from minute one. You don't need infinite time to build distributed systems—you just need a solid architecture and the discipline to stick to it.
You can check out the source code for SmartPager here: https://github.com/mohamedmabrouk09/incident-microservices