Working with large datasets in Java often involves batch processing, whether you're making service calls, persisting data, or transforming records. But how you batch that data can have a huge impact on performance, memory usage, and code maintainability.
In this post, we'll explore two popular strategies:
- The traditional subList() loop method
- A more modern, lazy batching approach using Java Streams
Let's dive in and see which fits your use case better.
Traditional Approach: subList() Batching
The most straightforward way to batch a list is a simple loop with List.subList():
int batchSize = 1000;
for (int i = 0; i < largeList.size(); i += batchSize) {
    // Clamp the end index so the final batch can be smaller than batchSize
    List<Item> batch = largeList.subList(i, Math.min(i + batchSize, largeList.size()));
    processBatch(batch);
}
Pros
- Easy to understand and implement
- No third-party libraries or helpers needed
Cons
- Requires the entire list in memory
- Not friendly to functional or stream-based pipelines
- No built-in support for parallel processing
Lazy Batching with Java Streams
When you need a more stream-friendly, composable, and potentially parallel approach, consider using lazy batching via Java Streams.
Utility Method
Here's a helper method to lazily partition a list into batches:
public static <T> Stream<List<T>> partitionList(List<T> source, int batchSize) {
    if (source == null || source.isEmpty() || batchSize <= 0) {
        return Stream.empty();
    }
    int totalSize = source.size();
    // Three-argument IntStream.iterate(seed, hasNext, next) requires Java 9+
    return IntStream.iterate(0, i -> i < totalSize, i -> i + batchSize)
            // Each batch is a subList view over the source, not a copy
            .mapToObj(start -> source.subList(start, Math.min(start + batchSize, totalSize)));
}
Usage Example
partitionList(largeList, 1000).forEach(batch -> {
    processBatch(batch);
});
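Because the result is an ordinary Stream, a parallel variant is a one-line change. Here's a minimal sketch, assuming the batches are independent and processBatch is thread-safe:

partitionList(largeList, 1000)
        .parallel() // batches are processed concurrently on the common ForkJoinPool
        .forEach(batch -> processBatch(batch));

Streams built on IntStream.iterate don't split as evenly as streams over sized collections, so it's worth measuring before relying on this for throughput.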
Pros
- Functional and stream/lambda-friendly
- Supports lazy evaluation (see the sketch below)
- Easier to adapt for parallel processing
Cons
- Slightly more code overhead (requires utility method)
- Still assumes the full list is in memory, so it's not suitable for truly massive datasets
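To see the laziness in action, here's a small sketch: because the indices come from IntStream.iterate, batches are only created as the stream is consumed, so short-circuiting operations stop the work early.

// Only the first three batch views are ever created; the rest of
// the index range is never visited.
partitionList(largeList, 1000)
        .limit(3)
        .forEach(batch -> processBatch(batch));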
Quick Comparison: subList() vs. Lazy Streams
| Feature | subList() Loop | partitionList() Stream |
|---|---|---|
| Simplicity | Simple | Requires a helper method |
| Memory Efficiency | Full list in memory | Same, unless paged |
| Lazy Evaluation Support | No | Yes |
| Functional Style | Imperative | Functional, composable |
| Parallelism | Manual effort | Built-in with .parallel() |
Dealing with Truly Massive Lists?
If you're working with millions of records or database-scale datasets, neither approach is ideal. Instead:
- Use paging queries (e.g., OFFSET/LIMIT, cursors) from your database or API
- Avoid loading everything into memory; consider streaming APIs or reactive programming instead
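As an illustration of the paging idea, here's a minimal keyset-style JDBC sketch. The items table, its id and payload columns, the Item constructor, and the dataSource are all hypothetical, and the LIMIT syntax assumes a database such as PostgreSQL or MySQL:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

static void processInPages(DataSource dataSource, int pageSize) throws SQLException {
    long lastId = 0;
    try (Connection conn = dataSource.getConnection()) {
        while (true) {
            List<Item> page = new ArrayList<>();
            // Keyset pagination: resume after the last id seen instead of using OFFSET,
            // so each query stays cheap even deep into the table
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT id, payload FROM items WHERE id > ? ORDER BY id LIMIT ?")) {
                ps.setLong(1, lastId);
                ps.setInt(2, pageSize);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastId = rs.getLong("id");
                        page.add(new Item(lastId, rs.getString("payload")));
                    }
                }
            }
            if (page.isEmpty()) {
                break; // no more rows
            }
            processBatch(page);
        }
    }
}

Only one page lives in memory at a time, which is exactly the property the in-memory approaches can't give you.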
Best Practice Tips
- Use subList() for small-to-medium lists where simplicity matters.
- Use partitionList() with Streams when you need composability, cleaner code, or parallelism.
- For large-scale systems, paginate from the data source itself; don't rely on in-memory lists.
Conclusion
Choosing the right batching strategy in Java can significantly impact your application's scalability and performance. Whether you're modernizing legacy code or designing new data pipelines, understanding these techniques will help you process large datasets more efficiently.
Happy coding!
Keywords: Java list batching, subList example, Java lazy streams, batch processing in Java, memory-efficient Java code, parallel stream processing
Posted on May 6, 2025