Payment System Optimization: A 30% Performance Case Study

Payment systems are the heartbeat of any e-commerce platform. At Agoda, processing millions of transactions daily, every millisecond of latency directly impacts revenue and customer experience. Here’s how we achieved a 30% performance improvement in our payment processing pipeline.

The Challenge

Initial State

  • Average processing time: 850ms per transaction
  • Peak load: 15,000 transactions/minute
  • Database connections: Constantly maxed out
  • Customer complaints: Payment timeouts during peak hours

Business Impact

  • Revenue loss: $2M annually due to payment timeouts
  • Customer experience: 15% abandonment rate during checkout
  • Operational cost: High infrastructure scaling costs

Performance Analysis

Identifying Bottlenecks

We used a combination of tools to identify performance bottlenecks:

// Application performance monitoring with Micrometer annotations.
// Micrometer's @Timed and @Counted take the metric name as "value".
@Timed(value = "payment.processing.time", description = "Payment processing duration")
@Counted(value = "payment.processing.count", description = "Payment processing count")
public PaymentResponse processPayment(PaymentRequest request) {
    // ... processing logic
}

Database Query Analysis

-- Slow query identified
SELECT p.*, pp.provider_name, pm.method_name, c.currency_code
FROM payments p
JOIN payment_providers pp ON p.provider_id = pp.id
JOIN payment_methods pm ON p.method_id = pm.id  
JOIN currencies c ON p.currency_id = c.id
WHERE p.customer_id = ? AND p.status = 'PENDING'
ORDER BY p.created_at DESC;

-- Execution time: 450ms average
-- Called: 15,000 times/minute
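
To see why these two numbers alone can exhaust a connection pool, Little's Law (average concurrency = arrival rate × average time per call) gives a rough estimate. The sketch below is back-of-the-envelope arithmetic only: the class and method names are hypothetical, and it assumes calls arrive evenly across the minute.

```java
/** Back-of-the-envelope concurrency estimate using Little's Law (L = lambda * W). */
final class QueryLoadEstimate {

    /** Average number of queries in flight at the same time. */
    static double concurrentQueries(double callsPerMinute, double avgQueryMillis) {
        double callsPerSecond = callsPerMinute / 60.0;    // arrival rate (lambda)
        double secondsPerCall = avgQueryMillis / 1000.0;  // time per call (W)
        return callsPerSecond * secondsPerCall;
    }

    public static void main(String[] args) {
        // Slow query from the analysis: 15,000 calls/minute at 450ms each.
        System.out.println(concurrentQueries(15_000, 450));
        // Same call volume if the query took 45ms instead.
        System.out.println(concurrentQueries(15_000, 45));
    }
}
```

At 450ms this works out to roughly 112 queries in flight on average at peak, which is consistent with the pool being maxed out. At 45ms the same traffic would need about 11.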

Optimization Strategies

1. Database Optimization

Index Optimization

-- Added composite index
CREATE INDEX idx_payments_customer_status_created 
ON payments(customer_id, status, created_at DESC);

-- Result: Query time reduced from 450ms to 45ms
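
The column order of the index matters: equality columns come first, then the sort column, so the database can seek to one customer's pending payments and read them already ordered newest first. The following is a minimal in-memory sketch of that idea using a sorted set. It is an illustration, not how a database engine stores indexes, and `CompositeIndexSketch` and its members are hypothetical names.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** In-memory stand-in for an index on (customer_id, status, created_at DESC). */
final class CompositeIndexSketch {

    /** One index entry: the three indexed columns plus the row id they point to. */
    static final class Entry {
        final String customerId;
        final String status;
        final Instant createdAt;
        final long rowId;

        Entry(String customerId, String status, Instant createdAt, long rowId) {
            this.customerId = customerId;
            this.status = status;
            this.createdAt = createdAt;
            this.rowId = rowId;
        }
    }

    // Entries stay sorted in the same column order as the index definition.
    private static final Comparator<Entry> INDEX_ORDER = (a, b) -> {
        int c = a.customerId.compareTo(b.customerId);
        if (c != 0) return c;
        c = a.status.compareTo(b.status);
        if (c != 0) return c;
        c = b.createdAt.compareTo(a.createdAt); // reversed: newest first
        if (c != 0) return c;
        return Long.compare(a.rowId, b.rowId);  // tie-breaker keeps entries unique
    };

    private final TreeSet<Entry> index = new TreeSet<>(INDEX_ORDER);

    void add(Entry entry) {
        index.add(entry);
    }

    /** Newest-first row ids for one customer and status, read with a single range scan. */
    List<Long> lookup(String customerId, String status, int limit) {
        // Seek key that sorts before every real entry sharing this prefix.
        Entry seek = new Entry(customerId, status, Instant.MAX, Long.MIN_VALUE);
        List<Long> rowIds = new ArrayList<>();
        for (Entry e : index.tailSet(seek, true)) {
            boolean samePrefix = e.customerId.equals(customerId) && e.status.equals(status);
            if (!samePrefix || rowIds.size() >= limit) break; // left the range or hit the limit
            rowIds.add(e.rowId);
        }
        return rowIds;
    }
}
```

The scan never sorts and never touches other customers' entries, which is the property the composite index gives the slow query.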

Query Optimization

-- Optimized query: explicit column list, 30-day window, and a row limit
SELECT p.id, p.amount, p.status, p.created_at,
       pp.provider_name, pm.method_name, c.currency_code
FROM payments p
JOIN payment_providers pp ON p.provider_id = pp.id
JOIN payment_methods pm ON p.method_id = pm.id
JOIN currencies c ON p.currency_id = c.id
WHERE p.customer_id = ? 
  AND p.status = 'PENDING'
  AND p.created_at > DATE_SUB(NOW(), INTERVAL 30 DAY)
ORDER BY p.created_at DESC
LIMIT 50;

2. Caching Strategy

Redis Implementation

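@Configuration
@EnableCaching // needed for @Cacheable to take effect, unless already enabled elsewhere
public class PaymentCacheConfig {

    // Spring injects whichever connection factory (Jedis or Lettuce) the application configures.
    @Bean
    public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory connectionFactory) {
        RedisTemplate<String, Object> template = new RedisTemplate<>();
        template.setConnectionFactory(connectionFactory);
        // Store values as JSON so cached entries stay readable across deployments
        template.setDefaultSerializer(new GenericJackson2JsonRedisSerializer());
        return template;
    }
}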

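@Service
public class PaymentCacheService {

    // Collaborators used below. The type names are assumed from the field names.
    // @Cacheable goes through Spring's cache abstraction, so no RedisTemplate is needed here.
    private final PaymentMethodRepository paymentMethodRepository;
    private final ExchangeRateService exchangeRateService;

    public PaymentCacheService(PaymentMethodRepository paymentMethodRepository,
                               ExchangeRateService exchangeRateService) {
        this.paymentMethodRepository = paymentMethodRepository;
        this.exchangeRateService = exchangeRateService;
    }

    // Cached per customer: repeat calls skip the database until the entry is evicted
    @Cacheable(value = "payment-methods", key = "#customerId")
    public List<PaymentMethod> getCustomerPaymentMethods(String customerId) {
        return paymentMethodRepository.findByCustomerId(customerId);
    }

    // Cached per currency pair, for example "USD:THB"
    @Cacheable(value = "exchange-rates", key = "#fromCurrency + ':' + #toCurrency")
    public ExchangeRate getExchangeRate(String fromCurrency, String toCurrency) {
        return exchangeRateService.getLatestRate(fromCurrency, toCurrency);
    }
}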
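
The annotations hide the read-through logic, so here is a minimal plain-Java sketch of what a cached lookup does conceptually: return the stored value while it is fresh, otherwise call the loader and store the result. The TTL, the injectable clock, and the `ReadThroughCache` name are assumptions for illustration. This is not Spring's or Redis's implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Minimal read-through cache with a fixed time-to-live per entry. */
final class ReadThroughCache<K, V> {

    private static final class Slot<T> {
        final T value;
        final long expiresAtMillis;

        Slot(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Slot<V>> slots = new ConcurrentHashMap<>();
    private final Function<K, V> loader;  // the slow call, e.g. a repository lookup
    private final long ttlMillis;
    private final LongSupplier clock;     // injectable so expiry can be tested

    ReadThroughCache(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, loading it first if it is missing or expired. */
    V get(K key) {
        long now = clock.getAsLong();
        Slot<V> slot = slots.compute(key, (k, current) -> {
            if (current != null && current.expiresAtMillis > now) {
                return current;                                    // cache hit
            }
            return new Slot<V>(loader.apply(k), now + ttlMillis);  // miss or expired
        });
        return slot.value;
    }
}
```

In production the clock would typically be `System::currentTimeMillis`. Note that this sketch runs the loader while holding the map's per-key lock, which keeps it simple but would not suit very slow loaders.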

Cache Warming Strategy

@Scheduled(fixedRate = 300_000) // every 5 minutes
public void warmCache() {
    // Pre-load payment methods for recently active customers.
    // Calls go through the @Cacheable proxy, so only missing entries hit the database.
    List<String> activeCustomers = getActiveCustomers();

    // Note: parallelStream() runs on the JVM-wide common ForkJoinPool.
    activeCustomers.parallelStream().forEach(
        paymentCacheService::getCustomerPaymentMethods);

    // Pre-load exchange rates for the configured currency pairs
    currencyPairs.forEach(pair ->
        paymentCacheService.getExchangeRate(pair.getFrom(), pair.getTo()));
}
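
Because `parallelStream()` shares the common ForkJoinPool with the rest of the JVM, blocking lookups during warming can compete with other work. One alternative, sketched below under the assumption that the cache can be modelled as a `ConcurrentMap`, is to warm with a small dedicated pool and a deadline. `CacheWarmer` and its parameters are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

final class CacheWarmer {

    /**
     * Loads every key into the cache using at most `parallelism` threads.
     * Returns true if all loads finished before the timeout.
     */
    static <K, V> boolean warm(List<K> keys, Function<K, V> loader,
                               ConcurrentMap<K, V> cache,
                               int parallelism, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        for (K key : keys) {
            // computeIfAbsent skips keys that are already cached.
            pool.execute(() -> cache.computeIfAbsent(key, loader));
        }
        pool.shutdown(); // accept no new work; queued loads still run
        try {
            boolean finished = pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
            if (!finished) {
                pool.shutdownNow(); // give up on the remaining loads
            }
            return finished;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The parallelism cap bounds how many database or network calls the warming job can have in flight, independent of what the request threads are doing.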

3. Connection Pool Optimization

# HikariCP configuration
spring:
  datasource:
    hikari:
      maximum-pool-size: 50
      minimum-idle: 10
      connection-timeout: 20000
      idle-timeout: 300000
      max-lifetime: 1200000
      leak-detection-threshold: 60000
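
Two of these settings interact directly under load: `maximum-pool-size` caps how many connections can be in use at once, and `connection-timeout` is how long a caller waits for a free one before giving up (20 seconds here). The sketch below models only that interaction with a semaphore. It is an analogy for the two settings, not how HikariCP is implemented, and `BoundedPoolSketch` is a hypothetical name.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Models a pool with a fixed number of slots and a bounded wait for a free slot. */
final class BoundedPoolSketch {

    private final Semaphore slots;
    private final long connectionTimeoutMillis;

    BoundedPoolSketch(int maximumPoolSize, long connectionTimeoutMillis) {
        this.slots = new Semaphore(maximumPoolSize, true); // fair: waiters served in arrival order
        this.connectionTimeoutMillis = connectionTimeoutMillis;
    }

    /** Returns true if a slot was obtained before the timeout elapsed. */
    boolean tryBorrow() {
        try {
            return slots.tryAcquire(connectionTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns a slot to the pool. Call once per successful tryBorrow(). */
    void giveBack() {
        slots.release();
    }

    int idleSlots() {
        return slots.availablePermits();
    }
}
```

When every slot is busy, callers queue up and start failing once the timeout passes, which is the pattern behind payment timeouts during peak hours when the pool is saturated.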

4. Asynchronous Processing

@Service
public class AsyncPaymentProcessor {

    // Runs on the "paymentTaskExecutor" pool, off the request thread.
    // A retry re-invokes the whole method, so all three calls should be idempotent.
    @Async("paymentTaskExecutor")
    @Retryable(value = {Exception.class}, maxAttempts = 3)
    public CompletableFuture<Void> processPaymentNotification(Payment payment) {
        // Non-critical side effects: confirmation, audit trail, analytics
        notificationService.sendPaymentConfirmation(payment);
        auditService.logPaymentEvent(payment);
        analyticsService.trackPaymentMetrics(payment);

        return CompletableFuture.completedFuture(null);
    }
}

@Configuration
@EnableAsync
public class AsyncConfig {
    
    @Bean(name = "paymentTaskExecutor")
    public TaskExecutor paymentTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(20);
        executor.setMaxPoolSize(50);
        executor.setQueueCapacity(500);
        executor.setThreadNamePrefix("Payment-");
        executor.initialize();
        return executor;
    }
}
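
One detail of this sizing is easy to miss: the underlying `java.util.concurrent.ThreadPoolExecutor` starts threads beyond the core size only once the queue is full. With the settings above, that means the 21st thread starts only after 500 tasks are already waiting. The sketch below demonstrates the rule with small numbers. `PoolGrowthDemo` is a hypothetical name, and the demo assumes `tasks` does not exceed max plus queue capacity (beyond that, the default policy rejects the submission).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolGrowthDemo {

    /** Submits `tasks` blocking jobs and reports how many threads the pool started. */
    static int threadsStarted(int core, int max, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            core, max, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(queueCapacity));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until the demo ends
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown(); // let all tasks finish
            pool.shutdown();
        }
    }
}
```

For example, with core 2, max 4, and a queue of 3, five concurrent tasks still run on two threads, and a third thread appears only with the sixth task. If side effects must start promptly under bursts, a smaller queue or a larger core size may be worth testing.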

5. Circuit Breaker Implementation

@Component
public class PaymentGatewayClient {

    // Resilience4j: thresholds, timeouts, and retry counts come from the
    // "payment-gateway" instance configuration.
    @CircuitBreaker(name = "payment-gateway", fallbackMethod = "fallbackPayment")
    @TimeLimiter(name = "payment-gateway")
    @Retry(name = "payment-gateway")
    public CompletableFuture<PaymentResponse> processPayment(PaymentRequest request) {
        return CompletableFuture.supplyAsync(() -> paymentGatewayService.process(request));
    }

    // Fallback: same parameters as processPayment plus the triggering exception
    public CompletableFuture<PaymentResponse> fallbackPayment(PaymentRequest request, Throwable ex) {
        return CompletableFuture.completedFuture(
            PaymentResponse.builder()
                .status(PaymentStatus.PENDING)
                .message("Payment queued for processing")
                .build()
        );
    }
}
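
For readers new to the pattern, the state machine behind the annotation can be sketched in a few lines of plain Java. This is a deliberately simplified model: it counts consecutive failures, whereas Resilience4j evaluates a failure rate over a sliding window, and it serializes calls with `synchronized`. `SimpleCircuitBreaker`, its thresholds, and the injectable clock are all hypothetical.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class SimpleCircuitBreaker {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock;
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    /** Runs the action unless the breaker is open; returns the fallback on failure. */
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return fallback.get();       // fail fast without calling the gateway
            }
            state = State.HALF_OPEN;         // wait elapsed: allow one trial call
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;          // trip, or re-trip after a failed trial
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }
}
```

The important behaviour is the open state: while the gateway is unhealthy, requests return the fallback immediately instead of holding threads and connections until a timeout.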

Results

Performance Improvements

Metric                 Before      After       Improvement
Average Response Time  850ms       595ms       30% faster
95th Percentile        1,200ms     800ms       33% faster
Database Connections   90% usage   45% usage   50% reduction
Cache Hit Rate         0%          85%         New capability
Error Rate             2.5%        0.8%        68% reduction
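
The improvement column follows directly from the before and after values. As a quick check, the arithmetic can be written as a one-line helper (`ImprovementMath` is a hypothetical name used only for this illustration):

```java
final class ImprovementMath {

    /** Percentage reduction from `before` to `after`. */
    static double percentReduction(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        System.out.printf("Average response time: %.0f%%%n", percentReduction(850, 595));
        System.out.printf("95th percentile:       %.0f%%%n", percentReduction(1200, 800));
        System.out.printf("Database connections:  %.0f%%%n", percentReduction(90, 45));
        System.out.printf("Error rate:            %.0f%%%n", percentReduction(2.5, 0.8));
    }
}
```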

Business Impact

  • Revenue increase: $3.2M annually due to reduced payment failures
  • Payment success rate: 92% (up from 85%)
  • Infrastructure cost: 40% reduction in database resources needed
  • Checkout abandonment: Reduced from 15% to 8%

Monitoring and Alerting

Key Metrics Dashboard

@Component
public class PaymentMetrics {

    private final MeterRegistry meterRegistry;

    public PaymentMetrics(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    // Records an already-measured duration against the payment timer
    public void recordPaymentProcessingTime(Duration duration) {
        Timer.builder("payment.processing.time")
             .register(meterRegistry)
             .record(duration);
    }

    // Counts processed payments, tagged by outcome and payment method
    public void incrementPaymentCounter(String status, String method) {
        Counter.builder("payment.processed")
               .tag("status", status)
               .tag("method", method)
               .register(meterRegistry)
               .increment();
    }
}
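
Since the dashboard and the results both lean on the 95th percentile, it helps to be precise about what that number means. The sketch below computes an exact nearest-rank percentile from raw samples. Monitoring systems typically approximate this from histograms instead of sorting every sample, so treat it as a definition, not as how the metrics backend works. `PercentileSketch` is a hypothetical name.

```java
import java.util.Arrays;

final class PercentileSketch {

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

A p95 of 800ms therefore means that 95% of payments completed in 800ms or less, while the slowest 5% took longer. That is why it tracks peak-hour pain better than the average does.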

Alert Configuration

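# High payment processing time
# Assumes payment_processing_time_p95 is exposed in milliseconds (for example via a recording rule)
- alert: HighPaymentProcessingTime
  expr: payment_processing_time_p95 > 1000
  for: 5m
  labels:
    severity: warning

# High payment error rate: failed payments as a fraction of all processed payments (2%)
# The exported metric name may differ depending on the registry's naming convention
- alert: HighPaymentErrorRate
  expr: sum(rate(payment_processed{status="error"}[5m])) / sum(rate(payment_processed[5m])) > 0.02
  for: 2m
  labels:
    severity: critical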
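
An error-rate threshold such as 0.02 is easiest to reason about as a ratio of failed to processed payments, checked repeatedly and required to hold for a period before the alert fires. The sketch below models that logic in plain Java. It only illustrates the idea and is not how the alerting system evaluates rules; `ErrorRateAlertSketch` and its parameters are hypothetical.

```java
final class ErrorRateAlertSketch {

    /** Fraction of processed payments that failed; 0 when nothing was processed. */
    static double errorRatio(long failed, long processed) {
        return processed == 0 ? 0.0 : (double) failed / processed;
    }

    /**
     * True only if the ratio exceeded the threshold in each of the last
     * `requiredConsecutive` evaluations, similar in spirit to a "for:" clause.
     */
    static boolean shouldFire(double[] ratiosOldestFirst, double threshold,
                              int requiredConsecutive) {
        if (ratiosOldestFirst.length < requiredConsecutive) {
            return false; // not enough history yet
        }
        int start = ratiosOldestFirst.length - requiredConsecutive;
        for (int i = start; i < ratiosOldestFirst.length; i++) {
            if (ratiosOldestFirst[i] <= threshold) {
                return false; // a single healthy evaluation resets the alert
            }
        }
        return true;
    }
}
```

With the numbers from the results table, 25 failures per 1,000 payments (2.5%) would exceed a 2% threshold, while 8 per 1,000 (0.8%) would not.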

Lessons Learned

1. Measure Everything

  • Implement comprehensive monitoring before optimization
  • Use APM tools to identify real bottlenecks
  • Set up alerting for key business metrics

2. Cache Strategically

  • Cache frequently accessed, rarely changing data
  • Implement cache warming for predictable load
  • Monitor cache hit rates and adjust TTL accordingly

3. Optimize the Database Last

  • Application-level optimizations often yield better results
  • Index optimization can provide immediate wins
  • Connection pooling is critical for high-concurrency systems

4. Async Where Possible

  • Move non-critical operations to background processing
  • Use message queues for decoupling
  • Implement proper error handling for async operations
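
For fire-and-forget side effects such as notifications, audit logging, and analytics, one way to handle errors is to run each step independently, so a failure in one neither blocks the others nor disappears silently. The sketch below shows that shape with `CompletableFuture`. `AsyncSideEffects`, the step names, and the choice to collect failed step names are assumptions for illustration, not our production code.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;

final class AsyncSideEffects {

    /**
     * Runs every step on the executor and completes with the names of the
     * steps that failed, instead of failing the whole batch.
     */
    static CompletableFuture<List<String>> runAll(Map<String, Runnable> steps,
                                                  Executor executor) {
        List<String> failed = new CopyOnWriteArrayList<>();
        CompletableFuture<?>[] futures = steps.entrySet().stream()
            .map(step -> CompletableFuture.runAsync(step.getValue(), executor)
                .exceptionally(error -> {
                    failed.add(step.getKey()); // record the failure instead of losing it
                    return null;
                }))
            .toArray(CompletableFuture[]::new);
        // allOf never fails here because each future already handled its own error.
        return CompletableFuture.allOf(futures).thenApply(done -> failed);
    }
}
```

The caller can then retry or alert on just the failed steps. In practice the failure handler would also log the exception, and only the failed step would be retried, so that a retry does not send duplicate notifications.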

What’s Next?

Our next optimization phase focuses on:

  • Machine learning for fraud detection optimization
  • GraphQL implementation for flexible API responses
  • Kubernetes deployment for better resource utilization

Working on payment system optimization? I’d love to hear about your challenges and share experiences. Connect with me on LinkedIn.