Building Microservices at Scale: Lessons from Production

After building microservices at Agoda and CP Axtra that handle millions of transactions daily, I’ve learned valuable lessons about what works and what doesn’t when scaling distributed systems.

The Challenge

When you’re processing millions of payment transactions or managing inventory across thousands of retail locations, every millisecond counts. Here are the key challenges we faced:

  • Service Communication Overhead: Network latency between services
  • Data Consistency: Maintaining consistency across distributed databases
  • Service Discovery: Dynamic service registration and discovery
  • Fault Tolerance: Graceful degradation when services fail

Key Architectural Patterns

1. Event-Driven Architecture with Apache Kafka

import java.time.Instant;

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class PaymentEventPublisher {

    private final KafkaTemplate<String, PaymentEvent> kafkaTemplate;

    public PaymentEventPublisher(KafkaTemplate<String, PaymentEvent> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void publishPaymentCompleted(Payment payment) {
        PaymentEvent event = PaymentEvent.builder()
            .paymentId(payment.getId())
            .amount(payment.getAmount())
            .status(PaymentStatus.COMPLETED)
            .timestamp(Instant.now())
            .build();

        // Key by payment ID so all events for the same payment land on the same
        // partition and are consumed in order
        kafkaTemplate.send("payment-events", String.valueOf(payment.getId()), event);
    }
}
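
Publishing is only half of the pattern: downstream services subscribe to the topic and react asynchronously. Here’s a minimal consumer sketch using Spring Kafka’s @KafkaListener; the InventoryService, its reserveStock method, and the group id are placeholders, not code from our system:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class PaymentEventListener {

    private final InventoryService inventoryService; // placeholder downstream dependency

    public PaymentEventListener(InventoryService inventoryService) {
        this.inventoryService = inventoryService;
    }

    // Consumes from the same topic the publisher writes to
    @KafkaListener(topics = "payment-events", groupId = "inventory-service")
    public void onPaymentEvent(PaymentEvent event) {
        if (event.getStatus() == PaymentStatus.COMPLETED) {
            inventoryService.reserveStock(event.getPaymentId());
        }
    }
}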

2. Circuit Breaker Pattern

import java.util.concurrent.CompletableFuture;

import org.springframework.stereotype.Component;

import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.retry.annotation.Retry;
import io.github.resilience4j.timelimiter.annotation.TimeLimiter;

@Component
public class PaymentServiceClient {

    // Downstream client that actually calls the payment provider
    // (e.g. a Feign or WebClient wrapper returning a CompletableFuture)
    private final PaymentGateway paymentGateway;

    public PaymentServiceClient(PaymentGateway paymentGateway) {
        this.paymentGateway = paymentGateway;
    }

    @CircuitBreaker(name = "payment-service", fallbackMethod = "fallbackPayment")
    @Retry(name = "payment-service")
    @TimeLimiter(name = "payment-service")
    public CompletableFuture<PaymentResponse> processPayment(PaymentRequest request) {
        return paymentGateway.process(request);
    }

    // The fallback must mirror the guarded method's parameters, with the exception last
    public CompletableFuture<PaymentResponse> fallbackPayment(PaymentRequest request, Exception ex) {
        return CompletableFuture.completedFuture(
            PaymentResponse.builder()
                .status(PaymentStatus.PENDING)
                .message("Payment queued for retry")
                .build()
        );
    }
}
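
The annotations above only reference the "payment-service" instance by name; the actual thresholds come from configuration. Our production values aren’t included here, but the sketch below shows the main knobs via Resilience4j’s programmatic CircuitBreakerConfig, with purely illustrative numbers (with the Spring Boot starter, the equivalent usually lives in application.yml under resilience4j.circuitbreaker.instances.payment-service):

import java.time.Duration;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;

public class PaymentResilienceConfig {

    public static CircuitBreaker paymentCircuitBreaker() {
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
            .slidingWindowSize(100)                          // judge health over the last 100 calls
            .failureRateThreshold(50)                        // open the breaker at a 50% failure rate
            .waitDurationInOpenState(Duration.ofSeconds(30)) // stay open before probing again
            .permittedNumberOfCallsInHalfOpenState(10)       // trial calls allowed while half-open
            .build();

        return CircuitBreakerRegistry.of(config).circuitBreaker("payment-service");
    }
}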

Performance Optimizations

Database Sharding Strategy

We implemented a sharding strategy based on customer segments, with a simplified routing sketch below:

  • High-volume customers: Dedicated database shards
  • Regular customers: Shared shards with load balancing
  • Geographical sharding: Asia-Pacific, Europe, Americas
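
The real routing logic is more involved, but the core of it is a small function that maps a customer to a shard key. This is a simplified sketch; the Customer record, segment check, and shard naming are all illustrative:

record Customer(String id, String region, boolean highVolume) {}

public class ShardRouter {

    private static final int SHARED_SHARD_COUNT = 16; // illustrative; real count depends on capacity

    public String resolveShard(Customer customer) {
        // High-volume customers get a dedicated shard keyed by their ID
        if (customer.highVolume()) {
            return "dedicated-" + customer.id();
        }
        // Everyone else is spread across shared shards, partitioned by region first
        int bucket = Math.floorMod(customer.id().hashCode(), SHARED_SHARD_COUNT);
        return customer.region() + "-shared-" + bucket; // e.g. "apac-shared-7"
    }
}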

Caching Strategy

// Reads go through the cache; Spring Data's findById returns an Optional, so unwrap it
@Cacheable(value = "customer-profiles", key = "#customerId")
public CustomerProfile getCustomerProfile(String customerId) {
    return customerRepository.findById(customerId).orElse(null);
}

// Writes evict the stale entry so the next read repopulates it from the database
@CacheEvict(value = "customer-profiles", key = "#customerId")
public void updateCustomerProfile(String customerId, CustomerProfile profile) {
    customerRepository.save(profile);
}
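
These annotations only take effect when caching is enabled and a CacheManager bean exists. The provider isn’t the point here; as one plausible setup, the sketch below wires an in-process Caffeine cache with illustrative sizing and TTL (a distributed cache such as Redis plugs into the same Spring cache abstraction):

import java.util.concurrent.TimeUnit;

import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.caffeine.CaffeineCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import com.github.benmanes.caffeine.cache.Caffeine;

@Configuration
@EnableCaching
public class CacheConfig {

    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager cacheManager = new CaffeineCacheManager("customer-profiles");
        // Bound the cache and expire entries so stale profiles eventually age out
        // even if an evict is missed on another instance
        cacheManager.setCaffeine(Caffeine.newBuilder()
            .maximumSize(100_000)
            .expireAfterWrite(10, TimeUnit.MINUTES));
        return cacheManager;
    }
}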

Results

After implementing these patterns:

  • 30% reduction in response times
  • 99.9% uptime across all services
  • 50% reduction in database load
  • Zero data loss during peak traffic periods

Key Takeaways

  1. Start with a monolith, then extract services based on business domains
  2. Invest in observability from day one - logging, metrics, tracing
  3. Design for failure - assume services will fail and plan accordingly
  4. Automate everything - deployment, monitoring, scaling, recovery
  5. Team ownership - each team owns their services end-to-end

What’s Next?

In my next post, I’ll dive deep into our Apache Kafka implementation and how we handle millions of events per day with zero message loss.


Have questions about microservices architecture? Feel free to reach out on LinkedIn or email me.