Working with large datasets in Java often involves batch processing, whether you're making service calls, persisting data, or transforming records. But how you batch that data can have a huge impact on performance, memory usage, and code maintainability.

In this post, we'll explore two popular strategies:

  • 🔁 The traditional subList() loop method
  • 🌊 A more modern, lazy batching approach using Java Streams

Let's dive in and see which fits your use case better.


πŸ” Traditional Approach: subList() Batching

The most straightforward way to batch a list is using a simple loop with List.subList():

    int batchSize = 1000;
    for (int i = 0; i < largeList.size(); i += batchSize) {
        // Math.min clamps the end index, so the final batch may be smaller than batchSize
        List<Item> batch = largeList.subList(i, Math.min(i + batchSize, largeList.size()));
        processBatch(batch);
    }
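To make the pattern concrete, here is a minimal runnable sketch of the same loop, with a list of integers standing in for the hypothetical Item type and the batches collected so the boundaries are visible:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SubListBatchDemo {
    public static void main(String[] args) {
        // 0..9 stands in for a large list of domain objects
        List<Integer> largeList = IntStream.range(0, 10).boxed().collect(Collectors.toList());

        int batchSize = 4;
        List<List<Integer>> batches = new ArrayList<>();
        for (int i = 0; i < largeList.size(); i += batchSize) {
            batches.add(largeList.subList(i, Math.min(i + batchSize, largeList.size())));
        }

        // 10 elements in batches of 4 -> sizes 4, 4, 2
        System.out.println(batches); // prints [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
    }
}
```

One thing to keep in mind: subList() returns a view backed by the original list, not a copy, so structurally modifying the original list while holding these views can throw ConcurrentModificationException.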

✅ Pros

  • Easy to understand and implement
  • No third-party libraries or helpers needed

❌ Cons

  • Requires the entire list in memory
  • Not friendly to functional or stream-based pipelines
  • No built-in support for parallel processing

🌊 Lazy Batching with Java Streams

When you need a more stream-friendly, composable, and potentially parallel approach, consider using lazy batching via Java Streams.

🛠️ Utility Method

Here's a helper method to lazily partition a list into batches (the three-argument IntStream.iterate overload used below requires Java 9 or later):

    // Requires: java.util.List, java.util.stream.IntStream, java.util.stream.Stream
    public static <T> Stream<List<T>> partitionList(List<T> source, int batchSize) {
        if (source == null || source.isEmpty() || batchSize <= 0) {
            return Stream.empty();
        }

        int totalSize = source.size();
        // Lazily generate start indices 0, batchSize, 2*batchSize, ... and map each
        // to a subList view of at most batchSize elements (no copying involved)
        return IntStream.iterate(0, i -> i < totalSize, i -> i + batchSize)
                        .mapToObj(start -> source.subList(start, Math.min(start + batchSize, totalSize)));
    }

📦 Usage Example

    partitionList(largeList, 1000).forEach(batch -> processBatch(batch));
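The laziness is observable: batches are only materialized as the terminal operation consumes them. A small self-contained sketch (repeating the helper for completeness) demonstrates this by short-circuiting with limit():

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class LazyBatchDemo {
    public static <T> Stream<List<T>> partitionList(List<T> source, int batchSize) {
        if (source == null || source.isEmpty() || batchSize <= 0) {
            return Stream.empty();
        }
        int totalSize = source.size();
        return IntStream.iterate(0, i -> i < totalSize, i -> i + batchSize)
                        .mapToObj(start -> source.subList(start, Math.min(start + batchSize, totalSize)));
    }

    public static void main(String[] args) {
        List<Integer> largeList = IntStream.range(0, 10_000).boxed().collect(Collectors.toList());

        AtomicInteger materialized = new AtomicInteger();
        partitionList(largeList, 1000)
                .peek(b -> materialized.incrementAndGet()) // counts batches actually produced
                .limit(3)                                  // short-circuits after three batches
                .forEach(batch -> { /* processBatch(batch) would go here */ });

        // Only 3 of the 10 possible batches were ever created
        System.out.println(materialized.get()); // prints 3
    }
}
```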

✅ Pros

  • Functional and stream/lambda-friendly
  • Supports lazy evaluation
  • Easier to adapt for parallel processing

❌ Cons

  • Slightly more code overhead (requires a utility method)
  • Still assumes the full list is in memory, so not suitable for truly massive datasets
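Because partitionList() returns an ordinary Stream, the parallel adaptation mentioned above is a one-line change. A minimal sketch (the helper is repeated for completeness, and batch processing is simulated by a thread-safe counter):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class ParallelBatchDemo {
    public static <T> Stream<List<T>> partitionList(List<T> source, int batchSize) {
        if (source == null || source.isEmpty() || batchSize <= 0) {
            return Stream.empty();
        }
        int totalSize = source.size();
        return IntStream.iterate(0, i -> i < totalSize, i -> i + batchSize)
                        .mapToObj(start -> source.subList(start, Math.min(start + batchSize, totalSize)));
    }

    public static void main(String[] args) {
        List<Integer> largeList = IntStream.range(0, 10_000).boxed().collect(Collectors.toList());

        // AtomicInteger because batches may now be processed on different threads
        AtomicInteger processed = new AtomicInteger();
        partitionList(largeList, 1000)
                .parallel() // the only change versus the sequential version
                .forEach(batch -> processed.incrementAndGet());

        System.out.println(processed.get()); // prints 10 (10,000 elements / 1,000 per batch)
    }
}
```

One caveat: streams built on IntStream.iterate split less evenly across threads than those built on IntStream.range, so for CPU-bound workloads it can be worth deriving the batch start indices from a range instead.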

βš”οΈ Quick Comparison: subList() vs Lazy Streams

| Feature                 | subList() Loop         | partitionList() Stream       |
|-------------------------|------------------------|------------------------------|
| Simplicity              | ✅ Simple              | ⚠️ Requires helper method    |
| Memory Efficiency       | ❌ Full list in memory | ⚠️ Same unless paged         |
| Lazy Evaluation Support | ❌ No                  | ✅ Yes                       |
| Functional Style        | ❌ Imperative          | ✅ Functional, composable    |
| Parallelism             | ❌ Manual effort       | ✅ Built-in with .parallel() |

🚨 Dealing with Truly Massive Lists?

If you're working with millions of records or database-scale datasets, neither approach is ideal. Instead:

  • Use paging queries (e.g., OFFSET/LIMIT, cursors) from your database or API
  • Avoid loading everything into memory; consider using streaming APIs or reactive programming
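To make the paging idea concrete without tying it to a particular database, here is a sketch that drives processing from a paged fetch. The fetchPage method is a hypothetical stand-in for an OFFSET/LIMIT query or a paged API call, simulated here with an in-memory list purely for illustration:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PagedFetchSketch {
    // Hypothetical stand-in for a paged data source, e.g. "SELECT ... LIMIT ? OFFSET ?"
    // or a paged REST endpoint. A real implementation would issue a query per call;
    // this one just slices an in-memory list to keep the sketch runnable.
    static List<Integer> fetchPage(List<Integer> source, int offset, int limit) {
        if (offset >= source.size()) return List.of();
        return source.subList(offset, Math.min(offset + limit, source.size()));
    }

    public static void main(String[] args) {
        List<Integer> remoteData = IntStream.range(0, 2_500).boxed().collect(Collectors.toList());

        int pageSize = 1000;
        int offset = 0;
        long total = 0;
        while (true) {
            List<Integer> page = fetchPage(remoteData, offset, pageSize);
            if (page.isEmpty()) break;   // no more pages
            total += page.size();        // stand-in for processBatch(page)
            offset += pageSize;
        }

        // Pages of 1000, 1000, 500 -> all 2,500 rows processed, one page in memory at a time
        System.out.println(total); // prints 2500
    }
}
```

With a real database, cursor- or keyset-based pagination (e.g. WHERE id > ? ORDER BY id LIMIT ?) usually scales better than large OFFSET values, since the database does not have to skip over previously read rows.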

💡 Best Practice Tips

  • Use subList() for small-to-medium lists where simplicity matters.
  • Use partitionList() with Streams when you need composability, cleaner code, or parallelism.
  • For large-scale systems, paginate from the data source itself; don't rely on in-memory lists.

🔚 Conclusion

Choosing the right batching strategy in Java can significantly impact your application's scalability and performance. Whether you're modernizing legacy code or designing new data pipelines, understanding these techniques will help you process large datasets more efficiently.

Happy coding! 👨‍💻🚀


Keywords: Java list batching, subList example, Java lazy streams, batch processing in Java, memory-efficient Java code, parallel stream processing


Posted on May 6, 2025