- Published on
Data Loading Strategies: Architectural Patterns for High-Performance Systems
- Authors
- Name
- Gary Huynh
- @gary_atruedev
Data loading strategies fundamentally impact system performance, scalability, and user experience. While the lazy vs eager loading debate often focuses on ORM configuration, architects must consider these patterns in the broader context of distributed systems, API design, and microservices architecture.
Understanding Loading Strategies at Scale
The N+1 Problem in Distributed Systems
The classic N+1 problem extends beyond ORMs into API design and microservices:
// Anti-pattern: N+1 in microservices
@RestController
public class OrderController {
@Autowired private OrderService orderService;
@Autowired private CustomerService customerService;
@Autowired private ProductService productService;
@GetMapping("/orders")
public List<OrderDTO> getOrders() {
List<Order> orders = orderService.findAll(); // 1 query
return orders.stream().map(order -> {
// N queries to customer service
Customer customer = customerService.getCustomer(order.getCustomerId());
// M queries to product service per order
List<Product> products = order.getItems().stream()
.map(item -> productService.getProduct(item.getProductId()))
.collect(Collectors.toList());
return OrderDTO.from(order, customer, products);
}).collect(Collectors.toList());
}
}
// Solution: Batch loading with data loaders
@Component
public class CustomerDataLoader {
@Autowired private CustomerService customerService;
private final BatchLoader<Long, Customer> batchLoader;
public CustomerDataLoader() {
this.batchLoader = new BatchLoader<>((ids) ->
CompletableFuture.supplyAsync(() -> {
Map<Long, Customer> customers = customerService.getCustomersByIds(ids);
return ids.stream()
.map(customers::get)
.collect(Collectors.toList());
})
);
}
public CompletableFuture<Customer> load(Long customerId) {
return batchLoader.load(customerId);
}
}
Advanced Loading Patterns
1. Projection-Based Loading
// Define loading strategies through projections
public interface LoadingStrategy<T> {
T load(LoadingContext context);
}
@Component
public class OrderLoadingStrategy {
// Minimal projection for list views
public interface OrderSummary {
Long getId();
String getOrderNumber();
BigDecimal getTotal();
@Value("#{target.customer.name}")
String getCustomerName();
}
// Full projection for detail views
public interface OrderDetail extends OrderSummary {
List<OrderItemDetail> getItems();
CustomerDetail getCustomer();
ShippingDetail getShipping();
}
// GraphQL-style field selection
@Query("""
SELECT o FROM Order o
LEFT JOIN FETCH o.customer c
LEFT JOIN FETCH o.items i
LEFT JOIN FETCH i.product p
WHERE o.id = :id
AND (:includeCustomer = false OR c IS NOT NULL)
AND (:includeItems = false OR i IS NOT NULL)
""")
Order findWithSelectedAssociations(
@Param("id") Long id,
@Param("includeCustomer") boolean includeCustomer,
@Param("includeItems") boolean includeItems
);
}
2. Predictive Loading
@Component
public class PredictiveLoader {
@Autowired private AccessPatternAnalyzer analyzer;
@Autowired private CacheManager cacheManager;
public <T> T loadWithPrediction(Long entityId, Class<T> entityClass) {
// Analyze historical access patterns
AccessPattern pattern = analyzer.getPattern(entityId, entityClass);
// Predict which associations will be accessed
Set<String> predictedAssociations = pattern.getPredictedAssociations();
// Build optimized query
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<T> query = cb.createQuery(entityClass);
Root<T> root = query.from(entityClass);
// Add fetch joins for predicted associations
for (String association : predictedAssociations) {
root.fetch(association, JoinType.LEFT);
}
query.where(cb.equal(root.get("id"), entityId));
T result = entityManager.createQuery(query).getSingleResult();
// Pre-warm caches for predicted next accesses
pattern.getPredictedNextAccesses().forEach(nextId ->
CompletableFuture.runAsync(() ->
cacheManager.preload(nextId, entityClass)
)
);
return result;
}
}
3. Reactive Loading Strategies
@Service
public class ReactiveDataLoader {
private final WebClient.Builder webClientBuilder;
public Mono<OrderDetailsDTO> loadOrderWithDependencies(Long orderId) {
// Core order data
Mono<Order> orderMono = loadOrder(orderId).cache();
// Parallel loading of associations
Mono<Customer> customerMono = orderMono
.map(Order::getCustomerId)
.flatMap(this::loadCustomer)
.cache();
Mono<List<Product>> productsMono = orderMono
.map(order -> order.getItems().stream()
.map(OrderItem::getProductId)
.collect(Collectors.toList()))
.flatMap(productIds -> loadProducts(productIds))
.cache();
Mono<ShippingInfo> shippingMono = orderMono
.map(Order::getShippingId)
.flatMap(this::loadShipping)
.cache();
// Combine all data
return Mono.zip(orderMono, customerMono, productsMono, shippingMono)
.map(tuple -> OrderDetailsDTO.builder()
.order(tuple.getT1())
.customer(tuple.getT2())
.products(tuple.getT3())
.shipping(tuple.getT4())
.build()
);
}
private Mono<List<Product>> loadProducts(List<Long> productIds) {
// Batch load products
return webClientBuilder.build()
.post()
.uri("/products/batch")
.bodyValue(productIds)
.retrieve()
.bodyToFlux(Product.class)
.collectList();
}
}
Performance Optimization Strategies
1. Smart Fetching with Entity Graphs
@NamedEntityGraphs({
@NamedEntityGraph(
name = "Order.summary",
attributeNodes = {
@NamedAttributeNode("customer"),
@NamedAttributeNode("total")
}
),
@NamedEntityGraph(
name = "Order.detail",
attributeNodes = {
@NamedAttributeNode("customer"),
@NamedAttributeNode(value = "items", subgraph = "items")
},
subgraphs = {
@NamedSubgraph(
name = "items",
attributeNodes = {
@NamedAttributeNode("product"),
@NamedAttributeNode("price")
}
)
}
)
})
@Entity
public class Order {
// Entity definition
}
@Repository
public class OptimizedOrderRepository {
@PersistenceContext
private EntityManager em;
public List<Order> findOrdersOptimized(OrderCriteria criteria) {
// Dynamic graph selection based on use case
String graphName = criteria.isDetailedView() ? "Order.detail" : "Order.summary";
return em.createQuery("SELECT o FROM Order o WHERE o.status = :status", Order.class)
.setParameter("status", criteria.getStatus())
.setHint("javax.persistence.loadgraph", em.getEntityGraph(graphName))
.getResultList();
}
}
2. Batch Loading and Connection Pooling
@Configuration
public class DataLoadingOptimization {
@Bean
public HikariDataSource dataSource() {
HikariConfig config = new HikariConfig();
config.setMaximumPoolSize(50);
config.setConnectionTimeout(3000);
// Optimize for batch operations
config.addDataSourceProperty("rewriteBatchedStatements", "true");
config.addDataSourceProperty("cachePrepStmts", "true");
config.addDataSourceProperty("prepStmtCacheSize", "250");
config.addDataSourceProperty("prepStmtCacheSqlLimit", "2048");
return new HikariDataSource(config);
}
@Bean
public BatchLoadingExecutor batchLoadingExecutor() {
return new BatchLoadingExecutor(
Executors.newFixedThreadPool(10),
1000, // batch size
100 // batch timeout ms
);
}
}
@Component
public class BatchLoadingService {
@Autowired private BatchLoadingExecutor executor;
public CompletableFuture<Map<Long, Customer>> loadCustomers(Set<Long> customerIds) {
return executor.batch(customerIds, (List<Long> batch) -> {
// Execute batch query
return jdbcTemplate.query(
"SELECT * FROM customers WHERE id IN (?)",
ps -> ps.setArray(1, connection.createArrayOf("BIGINT", batch.toArray())),
(rs, rowNum) -> mapCustomer(rs)
).stream()
.collect(Collectors.toMap(Customer::getId, Function.identity()));
});
}
}
3. Query Result Streaming
@Component
public class StreamingDataLoader {
@Transactional(readOnly = true)
public void processLargeDataSet(Consumer<Order> processor) {
// Use cursor-based streaming for large result sets
try (Stream<Order> orderStream = em.createQuery(
"SELECT o FROM Order o WHERE o.createdAt > :date", Order.class)
.setParameter("date", LocalDate.now().minusDays(30))
.setHint("org.hibernate.fetchSize", 1000)
.setHint("org.hibernate.readOnly", true)
.getResultStream()) {
orderStream
.parallel()
.forEach(order -> {
try {
processor.accept(order);
} finally {
// Detach to prevent memory accumulation
em.detach(order);
}
});
}
}
// Reactive streaming for real-time processing
public Flux<Order> streamOrders(OrderCriteria criteria) {
return Flux.create(sink -> {
ScrollableResults results = session.createQuery(
"SELECT o FROM Order o WHERE o.status = :status")
.setParameter("status", criteria.getStatus())
.setFetchSize(100)
.scroll(ScrollMode.FORWARD_ONLY);
while (results.next()) {
Order order = (Order) results.get(0);
sink.next(order);
session.evict(order); // Prevent memory leak
}
sink.complete();
}, FluxSink.OverflowStrategy.BUFFER);
}
}
Microservices Data Loading Patterns
1. API Gateway Aggregation
@RestController
public class ApiGatewayController {
@Autowired private WebClient.Builder webClientBuilder;
@GetMapping("/api/orders/{id}")
public Mono<OrderComposite> getOrderComposite(@PathVariable Long id) {
// Parallel service calls with circuit breakers
Mono<Order> orderMono = callOrderService(id);
Mono<Customer> customerMono = orderMono
.map(Order::getCustomerId)
.flatMap(this::callCustomerService);
Mono<List<Product>> productsMono = orderMono
.flatMapMany(order -> Flux.fromIterable(order.getProductIds()))
.flatMap(this::callProductService)
.collectList();
return Mono.zip(orderMono, customerMono, productsMono)
.map(tuple -> new OrderComposite(tuple.getT1(), tuple.getT2(), tuple.getT3()))
.timeout(Duration.ofSeconds(5))
.onErrorReturn(OrderComposite.empty());
}
}
2. Event-Driven Data Loading
@Component
public class EventDrivenDataLoader {
@Autowired private KafkaTemplate<String, Event> kafka;
@EventListener
public void handleOrderCreated(OrderCreatedEvent event) {
// Trigger async data loading
CompletableFuture.allOf(
preloadCustomerData(event.getCustomerId()),
preloadProductData(event.getProductIds()),
warmCaches(event.getOrderId())
).thenRun(() ->
kafka.send("data-ready", new DataReadyEvent(event.getOrderId()))
);
}
}
Key Architectural Decisions
- Loading Strategy Selection: Base on access patterns, not assumptions
- Consistency Requirements: Strong consistency vs eventual consistency
- Performance SLAs: Define acceptable latency for different operations
- Resource Constraints: Memory vs network vs CPU trade-offs
- Monitoring: Track loading patterns to optimize strategies
Conclusion
Effective data loading in modern architectures requires thinking beyond simple lazy vs eager loading. Architects must consider distributed system challenges, API design, caching strategies, and user experience. The key is to choose loading strategies based on actual usage patterns, performance requirements, and system constraints. Remember that the best loading strategy is often a combination of techniques, adapted to specific use cases and continuously optimized based on production metrics.