JVM Performance in 2025: Virtual Threads, GraalVM, and Container Optimization
- Author: Gary Huynh (@gary_atruedev)
The JVM ecosystem has changed substantially in recent years. Virtual threads became a final feature in Java 21, GraalVM Native Image has matured into a practical ahead-of-time compilation option, and containers are now the default deployment target. Together, these shifts change where performance work pays off. This guide covers the JVM performance strategies that enterprise architects should know in 2025.
Introduction: The JVM Performance Revolution in 2025
The Java Virtual Machine has evolved from a monolithic runtime into a sophisticated, cloud-native platform. Three major innovations have reshaped how we approach performance optimization:
- Virtual Threads (Project Loom): Lightweight threads that enable massive concurrency without the overhead of OS threads
- GraalVM: A polyglot JDK distribution that adds the optimizing Graal JIT compiler and Native Image, an ahead-of-time compiler that builds standalone executables with fast startup and a small memory footprint
- Container-Aware JVM: Since JDK 10 (backported to 8u191), the JVM reads cgroup CPU and memory limits and sizes its heap and internal thread pools to match
These technologies complement each other. Virtual threads let a single JVM run millions of concurrent blocking tasks, native images can start in tens of milliseconds, and a container-aware JVM adapts its resource usage to the limits it is given.
The New Performance Paradigm
Traditional JVM tuning focused on heap sizes, garbage collector selection, and thread pool optimization. In 2025, we optimize for:
- Startup latency: Sub-second application startup times
- Memory density: Running more instances with less memory
- Elastic scalability: Adapting to workload changes in real-time
- Resource efficiency: Maximizing throughput per compute dollar
Let's dive deep into each technology and explore how to leverage them for maximum performance.
Virtual Threads: Scaling Beyond Traditional Limits
Virtual threads, final since Java 21 (JEP 444), are arguably the most significant concurrency improvement in Java's history. Unlike platform threads, which map 1:1 to OS threads, virtual threads are lightweight threads scheduled by the JVM, so creating one thread per task becomes practical.
Understanding Virtual Thread Architecture
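Virtual threads use an M:N scheduling model: many virtual threads are multiplexed onto a small pool of carrier threads (by default a ForkJoinPool with one carrier per available processor). When a virtual thread blocks on I/O or sleeps, it unmounts from its carrier, which is then free to run another virtual thread: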
// Traditional thread creation (expensive)
Thread platformThread = new Thread(() -> {
// Heavy OS thread creation
performBlockingOperation();
});
platformThread.start();
// Virtual thread creation (lightweight)
Thread virtualThread = Thread.startVirtualThread(() -> {
// Minimal overhead, JVM-managed
performBlockingOperation();
});
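Besides Thread.startVirtualThread, Java 21's Thread.Builder API can create named virtual threads. The sketch below (the class and method names are hypothetical) starts thousands of virtual threads that each block briefly, then joins them all. This is the pattern the snippet above relies on.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadBuilderDemo {
    // Starts `count` named virtual threads that each sleep briefly, then waits for all of them
    public static int runBlockingTasks(int count) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        // Names threads worker-0, worker-1, ... (used from a single thread here)
        Thread.Builder builder = Thread.ofVirtual().name("worker-", 0);
        List<Thread> threads = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            threads.add(builder.start(() -> {
                try {
                    // The virtual thread unmounts from its carrier while sleeping
                    Thread.sleep(Duration.ofMillis(10));
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : threads) {
            t.join();
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().unstarted(() -> {});
        System.out.println("isVirtual = " + vt.isVirtual()); // isVirtual = true
        System.out.println("completed = " + runBlockingTasks(10_000));
    }
}
```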
Performance Characteristics
Virtual threads pay off in I/O-bound workloads, where threads spend most of their time blocked. They do not make CPU-bound code faster, because throughput is still limited by the number of carrier threads:
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class VirtualThreadPerformanceDemo {
    private static final int CONCURRENT_REQUESTS = 1_000_000;

    public void demonstrateVirtualThreadScaling() {
        // One new virtual thread per task; close() waits for all tasks to finish
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            var startTime = System.nanoTime();
            var futures = IntStream.range(0, CONCURRENT_REQUESTS)
                .mapToObj(i -> CompletableFuture.supplyAsync(() ->
                    simulateHttpRequest(i), executor))
                .toList();
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .join();
            var duration = Duration.ofNanos(System.nanoTime() - startTime);
            System.out.printf("Processed %d requests in %d ms%n",
                CONCURRENT_REQUESTS, duration.toMillis());
        }
    }

    private String simulateHttpRequest(int requestId) {
        try {
            // Simulating I/O: the virtual thread unmounts from its carrier while sleeping
            Thread.sleep(100);
            return "Response-" + requestId;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "Error-" + requestId;
        }
    }
}
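Because virtual threads are cheap, JEP 444 advises against pooling them; to protect a downstream resource such as a database, limit concurrency with a Semaphore instead. A minimal sketch, where the class name and the 5 ms sleep standing in for a real downstream call are hypothetical:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedDownstreamCalls {
    // Runs `tasks` virtual threads but lets at most `permits` of them use the
    // (simulated) downstream service at once. Returns the peak observed concurrency.
    public static int run(int tasks, int permits) {
        Semaphore limiter = new Semaphore(permits);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    limiter.acquire(); // waiting is cheap: the virtual thread unmounts
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5); // stand-in for a database or HTTP call
                    } finally {
                        inFlight.decrementAndGet();
                        limiter.release();
                    }
                    return null; // Callable, so acquire() and sleep() may throw InterruptedException
                });
            }
        } // close() waits for every task
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency = " + run(1_000, 10));
    }
}
```

Thousands of virtual threads can still be created, but only the permitted number reach the downstream resource at any moment.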
Virtual Thread Pinning and Performance Implications
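On JDK 21 through 23, a virtual thread becomes "pinned" to its carrier thread when it blocks inside a synchronized block or method, or while running native code. A pinned virtual thread occupies its carrier for the whole blocking call, which reduces the scheduler's effective parallelism. JDK 24 (JEP 491) removed pinning for synchronized, but native frames still pin:
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

public class VirtualThreadPinningDemo {
    private final Object lock = new Object();

    // BAD on JDK 21-23: blocking inside synchronized pins the virtual thread
    public void problematicSynchronization() {
        synchronized (lock) {
            // The carrier thread stays occupied for the whole blocking call
            performBlockingOperation();
        }
    }

    // GOOD: a virtual thread blocked on a ReentrantLock unmounts from its carrier
    private final ReentrantLock reentrantLock = new ReentrantLock();

    public void optimizedLocking() {
        reentrantLock.lock();
        try {
            performBlockingOperation();
        } finally {
            reentrantLock.unlock();
        }
    }

    // BEST where applicable: lock-free updates of small in-memory state.
    // updateAndGet may retry the function, so it must be side-effect free and non-blocking
    private final AtomicReference<State> state = new AtomicReference<>();

    public void lockFreeOperation() {
        state.updateAndGet(currentState ->
            performStateTransition(currentState));
    }
}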
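Because updateAndGet may call its function more than once under contention, the state transition must be pure. A minimal sketch using an immutable record; the LockFreeCounterState class and Stats type are hypothetical:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class LockFreeCounterState {
    // Immutable snapshot: each transition builds a new instance instead of mutating
    public record Stats(long requests, long bytes) {
        Stats add(long size) {
            return new Stats(requests + 1, bytes + size);
        }
    }

    private final AtomicReference<Stats> stats = new AtomicReference<>(new Stats(0, 0));

    // Pure, non-blocking transition: safe even if updateAndGet re-runs it on contention
    public void recordRequest(long size) {
        stats.updateAndGet(s -> s.add(size));
    }

    public Stats snapshot() {
        return stats.get();
    }

    public static void main(String[] args) {
        LockFreeCounterState counter = new LockFreeCounterState();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> counter.recordRequest(100));
            }
        }
        System.out.println(counter.snapshot()); // Stats[requests=10000, bytes=1000000]
    }
}
```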
Structured Concurrency for Virtual Threads
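Structured concurrency treats a group of related subtasks as one unit: if one fails, the others are cancelled, and the scope does not exit until all of them have finished. StructuredTaskScope is still a preview API (JEP 453 in Java 21, refined in later releases), so it requires --enable-preview and its shape has changed between JDK versions. The example uses the Java 21 to 24 form, in which fork() returns a Subtask:
import java.util.concurrent.ExecutionException;
import java.util.concurrent.StructuredTaskScope;
import java.util.concurrent.StructuredTaskScope.Subtask;

public class StructuredConcurrencyExample {
    public OrderResult processOrder(Order order)
            throws InterruptedException, ExecutionException {
        try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            // Fork parallel subtasks, each in its own virtual thread
            Subtask<PaymentResult> payment = scope.fork(() ->
                processPayment(order));
            Subtask<InventoryResult> inventory = scope.fork(() ->
                checkInventory(order));
            Subtask<ShippingResult> shipping = scope.fork(() ->
                calculateShipping(order));
            // Wait for all subtasks; the first failure cancels the rest
            scope.join();
            // Rethrows the first failure wrapped in an ExecutionException
            scope.throwIfFailed();
            // All subtasks completed successfully
            return new OrderResult(
                payment.get(),
                inventory.get(),
                shipping.get()
            );
        }
    }
}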
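If preview APIs are not an option, similar fail-fast behavior can be approximated on a plain virtual thread executor. The sketch below (class and method names are hypothetical) uses ExecutorCompletionService so the first failure surfaces as soon as it happens, then cancels the remaining subtasks. Unlike StructuredTaskScope, it does not record the parent-child relationship in thread dumps or propagate cancellation into nested scopes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FailFastFanOut {
    // Runs all tasks on virtual threads and returns results in submission order.
    // The first task to fail cancels (interrupts) the others, and its exception is rethrown.
    public static <T> List<T> invokeAllFailFast(List<Callable<T>> tasks)
            throws InterruptedException, ExecutionException {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            ExecutorCompletionService<T> completion = new ExecutorCompletionService<>(executor);
            List<Future<T>> futures = new ArrayList<>();
            for (Callable<T> task : tasks) {
                futures.add(completion.submit(task));
            }
            try {
                for (int i = 0; i < tasks.size(); i++) {
                    // take() yields futures in completion order, so a failure is seen immediately
                    completion.take().get();
                }
            } catch (ExecutionException | InterruptedException e) {
                futures.forEach(f -> f.cancel(true)); // interrupt the remaining subtasks
                throw e;
            }
            List<T> results = new ArrayList<>();
            for (Future<T> f : futures) {
                results.add(f.resultNow()); // every task succeeded at this point
            }
            return results;
        } // close() waits for cancelled subtasks to finish
    }
}
```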
Virtual Thread Monitoring and Tuning
JFR has dedicated virtual thread events, while the classic ThreadMXBean only reports platform threads:
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.nio.file.Path;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;
import jdk.management.VirtualThreadSchedulerMXBean;

public class VirtualThreadMonitoring {
    public void setupMonitoring() throws Exception {
        // Start from the JDK's built-in "default" settings and add virtual thread events
        Configuration config = Configuration.getConfiguration("default");
        try (var recording = new Recording(config)) {
            // Start/end events are off by default because they can be very high volume
            recording.enable("jdk.VirtualThreadStart");
            recording.enable("jdk.VirtualThreadEnd");
            recording.enable("jdk.VirtualThreadPinned");
            recording.start();
            // Run application workload
            runWorkload();
            recording.stop();
            recording.dump(Path.of("virtual-threads.jfr"));
        }
    }

    public void collectVirtualThreadMetrics() {
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        // Counts live platform threads only (carriers included); virtual threads are not counted
        System.out.printf("Platform threads: %d%n",
            threadBean.getThreadCount());
        // JDK 24+ only: metrics for the virtual thread scheduler
        VirtualThreadSchedulerMXBean scheduler =
            ManagementFactory.getPlatformMXBean(VirtualThreadSchedulerMXBean.class);
        System.out.printf("Carrier parallelism: %d, pool size: %d%n",
            scheduler.getParallelism(), scheduler.getPoolSize());
        System.out.printf("Mounted virtual threads: %d, queued: %d%n",
            scheduler.getMountedVirtualThreadCount(),
            scheduler.getQueuedVirtualThreadCount());
    }
}
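On JDK 21, where no standard MBean reports how many virtual threads are alive, one option is to count them yourself with a wrapping ThreadFactory passed to Executors.newThreadPerTaskExecutor. A sketch under that assumption; the class and counter names are hypothetical. The live count can lag briefly after the executor closes, because the executor records task completion just before the wrapper's finally block runs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicLong;

public class CountingVirtualThreadFactory implements ThreadFactory {
    // Built-in virtual thread factory; documented as safe for concurrent use
    private final ThreadFactory delegate = Thread.ofVirtual().name("app-vt-", 0).factory();
    private final AtomicLong started = new AtomicLong();
    private final AtomicLong live = new AtomicLong();

    @Override
    public Thread newThread(Runnable task) {
        // Wrap the task so the counters track when each virtual thread runs and ends
        return delegate.newThread(() -> {
            started.incrementAndGet();
            live.incrementAndGet();
            try {
                task.run();
            } finally {
                live.decrementAndGet();
            }
        });
    }

    public long started() { return started.get(); }

    public long live() { return live.get(); }

    public static void main(String[] args) {
        var factory = new CountingVirtualThreadFactory();
        try (ExecutorService executor = Executors.newThreadPerTaskExecutor(factory)) {
            for (int i = 0; i < 1_000; i++) {
                executor.submit(() -> {
                    Thread.sleep(10);
                    return null;
                });
            }
            System.out.printf("started so far=%d, live now=%d%n", factory.started(), factory.live());
        }
        System.out.printf("started total=%d%n", factory.started());
    }
}
```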
GraalVM Native Images: Startup and Memory Optimization
GraalVM's native image compilation transforms Java applications into standalone executables with minimal startup time and memory footprint.
Native Image Build Configuration
Optimizing native image builds requires careful configuration:
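# META-INF/native-image/com.example/app/native-image.properties
# --gc=serial selects the Serial GC (the native-image default), stated here for clarity.
# To find out why a class was initialized at build time, use
# --trace-class-initialization=<comma-separated class names>.
Args = -H:+UnlockExperimentalVMOptions \
       -H:+ReportExceptionStackTraces \
       -H:+PrintClassInitialization \
       --gc=serial \
       --enable-preview \
       --no-fallback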
Reflection and Resource Configuration
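Native images require explicit configuration for dynamic features such as reflection. Classes can be registered programmatically with a Feature or declaratively in reflect-config.json; either one is sufficient for a given class:
// ReflectionConfigGenerator.java
// Register with: --features=<fully qualified name of this class>
// (@AutomaticFeature is a deprecated internal annotation; register features explicitly)
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import org.graalvm.nativeimage.hosted.Feature;
import org.graalvm.nativeimage.hosted.RuntimeReflection;

public class ReflectionConfigGenerator implements Feature {
    @Override
    public void beforeAnalysis(BeforeAnalysisAccess access) {
        // Register the class and its no-argument constructor for reflection
        RuntimeReflection.register(UserEntity.class);
        RuntimeReflection.registerForReflectiveInstantiation(UserEntity.class);
        // Register all declared methods and fields
        for (Method method : UserEntity.class.getDeclaredMethods()) {
            RuntimeReflection.register(method);
        }
        for (Field field : UserEntity.class.getDeclaredFields()) {
            RuntimeReflection.register(field);
        }
    }
}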
// reflect-config.json
[
{
"name": "com.example.UserEntity",
"allDeclaredFields": true,
"allDeclaredMethods": true,
"allDeclaredConstructors": true
}
]
Build-Time Initialization for Performance
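Classes initialized at build time run their static initializers inside the native-image builder, and the resulting objects are stored in the image heap, so that work is skipped at startup. Application code does not need GraalVM substitutions (an internal API) for this; ordinary static initializers plus an initialization policy are enough:
// DatabaseConnectionPoolConfig.java
// Classes in com.example.config are initialized at build time by the feature below,
// so this static initializer runs once inside the native-image builder.
// Only do work that is valid for every run: no open connections or sockets,
// and no reads of runtime environment variables.
package com.example.config;

import java.util.Map;

public final class DatabaseConnectionPoolConfig {
    // loadConfiguration, validateConnectionParameters, precompileQueries and
    // PoolSettings are application-specific (not shown)
    static final PoolSettings SETTINGS =
        validateConnectionParameters(loadConfiguration());
    static final Map<String, String> PRECOMPILED_QUERIES = precompileQueries();

    private DatabaseConnectionPoolConfig() {}
}

// BuildTimeInitializationFeature.java
// Register with: --features=<fully qualified name of this class>
import org.graalvm.nativeimage.hosted.Feature;
import org.graalvm.nativeimage.hosted.RuntimeClassInitialization;

public class BuildTimeInitializationFeature implements Feature {
    @Override
    public void duringSetup(DuringSetupAccess access) {
        // Initialize expensive, deterministic state at build time
        RuntimeClassInitialization.initializeAtBuildTime(
            "com.example.config",
            "com.example.cache",
            "com.example.validators"
        );
        // Delay initialization for classes that depend on the runtime environment
        RuntimeClassInitialization.initializeAtRunTime(
            "com.example.database.ConnectionManager",
            "com.example.external.ApiClient"
        );
    }
}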
Memory Optimization Strategies
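The following techniques reduce memory footprint on both HotSpot and native images:
import gnu.trove.map.hash.TIntObjectHashMap;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class MemoryOptimizedApplication {
    // Primitive-keyed map (GNU Trove) avoids boxing an Integer per key
    private final TIntObjectHashMap<User> userCache =
        new TIntObjectHashMap<>();

    // Compact, length-prefixed serialization. This is a plain method on the
    // application's own class; GraalVM substitutions are not needed for code you own.
    public static final class User {
        private final String name;

        public User(String name) {
            this.name = name;
        }

        public byte[] serialize() {
            byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
            // Size the buffer exactly: 4-byte length prefix plus the UTF-8 bytes
            ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES + nameBytes.length);
            buffer.putInt(nameBytes.length); // byte count, not name.length() (char count)
            buffer.put(nameBytes);
            return buffer.array();
        }
    }
}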
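A compact format is only useful if it round-trips. The sketch below pairs the length-prefixed encoding with a decoder. CompactNameCodec is a hypothetical name, and the test string uses non-ASCII characters to show why the prefix must hold the UTF-8 byte count rather than the character count.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CompactNameCodec {
    // Encodes a string as a 4-byte big-endian length prefix followed by its UTF-8 bytes
    public static byte[] encode(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Integer.BYTES + bytes.length)
            .putInt(bytes.length)
            .put(bytes)
            .array();
    }

    // Reads the length prefix, then exactly that many bytes
    public static String decode(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        int length = buffer.getInt();
        byte[] bytes = new byte[length];
        buffer.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "Zo\u00eb Nguy\u1ec5n"; // 10 chars, 13 UTF-8 bytes
        byte[] encoded = encode(name);
        System.out.println(encoded.length + " bytes, round trip ok: " + name.equals(decode(encoded)));
    }
}
```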
Profile-Guided Optimization (PGO)
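Profile-guided optimization (available in Oracle GraalVM, not in GraalVM Community Edition) feeds a profile from a representative run back into the compiler:
# Step 1: Build an instrumented image (-o sets the executable name)
native-image --pgo-instrument -cp app.jar -o app com.example.Main
# Step 2: Run it with a representative workload (application-specific arguments);
# on exit the instrumented image writes default.iprof
./app --run-performance-tests
# Step 3: Build the optimized image using the collected profile
native-image --pgo=default.iprof -cp app.jar -o app com.example.Main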
Container-Optimized JVM Tuning
Modern JVMs are container-aware, but optimal performance requires explicit tuning for containerized environments.
Container Resource Detection
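Container detection (-XX:+UseContainerSupport) is enabled by default on Linux; the other flags control how the JVM sizes itself within the detected limits. Flags must be passed on the java command line, so the code below only reports what the running JVM detected:
import java.util.List;

public class ContainerAwareConfiguration {
    // Pass these on the java command line (or via JAVA_TOOL_OPTIONS);
    // they cannot be changed from inside a running JVM.
    // -XX:+PreferContainerQuotaForCPUCount is deprecated and should not be used.
    static final List<String> JVM_ARGS = List.of(
        "-XX:+UseContainerSupport",      // default on Linux; shown for clarity
        "-XX:InitialRAMPercentage=70.0",
        "-XX:MaxRAMPercentage=70.0",
        "-XX:ActiveProcessorCount=2"     // override the detected CPU count only if needed
    );

    public static void reportDetectedResources() {
        // Maximum heap, derived from the container memory limit and MaxRAMPercentage
        long maxHeap = Runtime.getRuntime().maxMemory();
        // CPU count, derived from the container CPU quota (or ActiveProcessorCount)
        int availableProcessors = Runtime.getRuntime().availableProcessors();
        System.out.printf("Max heap: %d MB%n",
            maxHeap / (1024 * 1024));
        System.out.printf("Available processors: %d%n",
            availableProcessors);
    }
}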
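To see the memory limit the JVM itself detected, the com.sun.management extension of OperatingSystemMXBean (available on HotSpot-based JDKs) reports total memory; since JDK 14 this value is container-aware. A small sketch, with a hypothetical class name:

```java
import java.lang.management.ManagementFactory;

public class ContainerResourceReport {
    public static void main(String[] args) {
        // HotSpot-specific extension of the standard OperatingSystemMXBean.
        // Inside a container, getTotalMemorySize() reflects the cgroup limit
        // rather than the host's physical memory.
        var os = (com.sun.management.OperatingSystemMXBean)
            ManagementFactory.getOperatingSystemMXBean();
        long mb = 1024 * 1024;
        System.out.printf("Total memory visible to JVM: %d MB%n", os.getTotalMemorySize() / mb);
        System.out.printf("Max heap: %d MB%n", Runtime.getRuntime().maxMemory() / mb);
        System.out.printf("CPUs visible to JVM: %d%n", Runtime.getRuntime().availableProcessors());
    }
}
```

Comparing the first two lines shows how much memory remains outside the heap for metaspace, thread stacks, and direct buffers.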
Dockerfile Optimization for JVM Applications
# Multi-stage build for a small runtime image
FROM eclipse-temurin:21-jdk-alpine AS builder
WORKDIR /app
COPY . .
# Assumes the build produces build/libs/app.jar (for example via the jar task's archiveFileName)
RUN ./gradlew build
# Runtime image with minimal footprint
FROM eclipse-temurin:21-jre-alpine
WORKDIR /app
# Install tools for health checks and diagnostics
RUN apk add --no-cache \
    curl \
    jattach \
    && rm -rf /var/cache/apk/*
# Copy application
COPY --from=builder /app/build/libs/app.jar app.jar
# JVM options for containers (container support is on by default).
# Run -XX:+PrintFlagsFinal ad hoc rather than on every start: it prints hundreds of lines.
ENV JAVA_OPTS="-XX:MaxRAMPercentage=75.0 \
    -XX:+ExitOnOutOfMemoryError \
    -XX:NativeMemoryTracking=summary"
# Health check
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1
# Run as non-root user
RUN adduser -D -s /bin/sh appuser
USER appuser
# exec replaces the shell, so the JVM runs as PID 1 and receives SIGTERM directly
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS $JMX_OPTS -jar app.jar"]
Kubernetes-Specific JVM Configuration
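apiVersion: apps/v1
kind: Deployment
metadata:
  name: optimized-jvm-app
spec:
  replicas: 3
  # A Deployment requires a selector that matches the pod template's labels
  selector:
    matchLabels:
      app: optimized-jvm-app
  template:
    metadata:
      labels:
        app: optimized-jvm-app
    spec:
      containers:
        - name: app
          image: myapp:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          env:
            # JAVA_OPTS is read by the image's entrypoint (java $JAVA_OPTS ...);
            # the java launcher itself only reads JAVA_TOOL_OPTIONS automatically.
            # -XX:+ZGenerational is needed on JDK 21; generational ZGC is the default from JDK 23.
            # The GC log goes to /tmp so a non-root user can write it.
            - name: JAVA_OPTS
              value: >-
                -XX:MaxRAMPercentage=75.0
                -XX:+UseZGC
                -XX:+ZGenerational
                -Xlog:gc*:file=/tmp/gc.log:time,tags:filecount=5,filesize=10M
            # Only takes effect if the entrypoint passes it to java. Authentication
            # and SSL are disabled here, so use this for local debugging only.
            - name: JMX_OPTS
              value: >-
                -Dcom.sun.management.jmxremote
                -Dcom.sun.management.jmxremote.port=9090
                -Dcom.sun.management.jmxremote.authenticate=false
                -Dcom.sun.management.jmxremote.ssl=false
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5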
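With a 1Gi limit and -XX:MaxRAMPercentage=75.0, the heap ceiling works out to roughly 768 MiB, leaving about 256 MiB for everything that is not heap. The helper below is a hypothetical, simplified model of that arithmetic; the real JVM also aligns the heap size, so -XX:+PrintFlagsFinal may report a slightly different MaxHeapSize.

```java
public class HeapBudget {
    // Approximate heap ceiling derived from a container memory limit and
    // -XX:MaxRAMPercentage (simplified: ignores alignment to region and page sizes)
    public static long maxHeapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * (maxRamPercentage / 100.0));
    }

    public static void main(String[] args) {
        long limit = 1L << 30; // 1Gi container limit
        long heap = maxHeapBytes(limit, 75.0);
        // Remaining budget for metaspace, code cache, thread stacks, direct buffers, GC data
        long headroom = limit - heap;
        System.out.printf("heap=%d MiB, non-heap headroom=%d MiB%n", heap >> 20, headroom >> 20);
        // prints: heap=768 MiB, non-heap headroom=256 MiB
    }
}
```

If native memory tracking shows non-heap usage approaching that headroom, lower MaxRAMPercentage or raise the container limit; otherwise the container can be OOM-killed even though the heap itself is not full.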
Container-Specific GC Tuning
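Collector choice matters more in containers because CPU and memory are tightly bounded. If a container has fewer than 2 CPUs or less than about 1792 MB of memory, the JVM does not treat it as a server-class machine and selects Serial GC unless a collector is chosen explicitly. The flag sets below are starting points to measure against, not universal settings:
public class ContainerGCTuning {
    // ZGC for low-latency services (JDK 21 needs -XX:+ZGenerational for the
    // generational mode, which is the default from JDK 23)
    public static String[] getZGCFlags() {
        return new String[] {
            "-XX:+UseZGC",
            "-XX:+ZGenerational",
            "-XX:ZCollectionInterval=30",       // seconds between forced collections
            "-XX:ZFragmentationLimit=15",
            "-XX:+UseDynamicNumberOfGCThreads",
            "-XX:ConcGCThreads=2",
            "-XX:ParallelGCThreads=4"
        };
    }

    // G1 for balanced throughput and latency
    public static String[] getG1GCFlags() {
        return new String[] {
            "-XX:+UseG1GC",
            "-XX:G1HeapRegionSize=8m",
            "-XX:G1ReservePercent=15",
            // G1NewSizePercent, G1MaxNewSizePercent and G1MixedGCLiveThresholdPercent
            // are experimental flags and must be unlocked first
            "-XX:+UnlockExperimentalVMOptions",
            "-XX:G1NewSizePercent=20",
            "-XX:G1MaxNewSizePercent=40",
            "-XX:G1MixedGCLiveThresholdPercent=85",
            "-XX:InitiatingHeapOccupancyPercent=40",
            "-XX:+ParallelRefProcEnabled"
        };
    }

    // Shenandoah for very low pause times (shipped in OpenJDK builds such as
    // Temurin and Corretto, but not in Oracle JDK)
    public static String[] getShenandoahFlags() {
        return new String[] {
            "-XX:+UseShenandoahGC",
            // Several Shenandoah heuristic flags are experimental in recent JDKs;
            // unlocking is harmless for flags that are not
            "-XX:+UnlockExperimentalVMOptions",
            "-XX:ShenandoahGuaranteedGCInterval=10000", // numeric flag, in milliseconds
            "-XX:ShenandoahGarbageThreshold=10",
            "-XX:ShenandoahFreeThreshold=10",
            "-XX:ShenandoahAllocationThreshold=10",
            // Commits and touches the whole heap at startup: slower start, fewer page faults later
            "-XX:+AlwaysPreTouch",
            "-XX:+UseNUMA"
        };
    }
}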
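Because a mistyped or ignored flag can leave the JVM on a different collector than intended, it is worth checking what the running JVM actually uses. A sketch with a hypothetical class name, using the standard GarbageCollectorMXBean and RuntimeMXBean APIs:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class ActiveCollectorReport {
    // Names of the collectors in use, e.g. "G1 Young Generation" and "G1 Old Generation" for G1
    public static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
            .map(GarbageCollectorMXBean::getName)
            .toList();
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        // The JVM flags that were actually passed on the command line
        System.out.println(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```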
Memory Management in Cloud-Native Environments
Cloud-native applications require sophisticated memory management strategies to optimize cost and performance.
Off-Heap Memory Management
Reduce GC pressure by utilizing off-heap memory:
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class OffHeapCacheManager {
    private final ByteBuffer offHeapBuffer;
    private final Map<String, CacheEntry> index = new ConcurrentHashMap<>();

    public OffHeapCacheManager(long cacheSizeInMB) {
        long cacheSize = cacheSizeInMB * 1024 * 1024;
        // A single ByteBuffer is int-indexed, so it cannot exceed 2 GB
        if (cacheSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("cache size must be below 2048 MB");
        }
        // Allocate direct (off-heap) memory; counted against -XX:MaxDirectMemorySize
        this.offHeapBuffer = ByteBuffer.allocateDirect((int) cacheSize);
    }

    public void put(String key, byte[] value) {
        synchronized (offHeapBuffer) {
            int length = value.length;
            // Make room first (evictOldestEntries is application-specific, not shown);
            // the write position is read only after eviction may have changed it
            if (offHeapBuffer.remaining() < length + Integer.BYTES) {
                evictOldestEntries(length + Integer.BYTES);
            }
            int position = offHeapBuffer.position();
            // Append a 4-byte length prefix, then the data
            offHeapBuffer.putInt(length);
            offHeapBuffer.put(value);
            // Overwriting a key leaves its old bytes in place until eviction reclaims them
            index.put(key, new CacheEntry(position, length));
        }
    }

    public byte[] get(String key) {
        synchronized (offHeapBuffer) {
            // Look up inside the lock so eviction cannot invalidate the entry mid-read
            CacheEntry entry = index.get(key);
            if (entry == null) return null;
            byte[] value = new byte[entry.length];
            // Absolute get (JDK 13+) reads without moving the buffer's position;
            // skip the 4-byte length prefix
            offHeapBuffer.get(entry.position + Integer.BYTES, value);
            return value;
        }
    }

    private static class CacheEntry {
        final int position;
        final int length;
        final long timestamp;

        CacheEntry(int position, int length) {
            this.position = position;
            this.length = length;
            this.timestamp = System.nanoTime();
        }
    }
}
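On JDK 22 and later, the Foreign Function and Memory API offers off-heap memory that is freed deterministically when its Arena closes, rather than when a ByteBuffer is garbage collected. A minimal sketch (the class name is hypothetical) that allocates an int array off-heap, fills it, and sums it:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class ArenaOffHeapDemo {
    // Allocates `count` ints off-heap in a confined arena, writes and reads them,
    // and releases the memory when the arena closes (no GC or Cleaner involved)
    public static long sumOffHeap(int count) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT, count);
            for (int i = 0; i < count; i++) {
                segment.setAtIndex(ValueLayout.JAVA_INT, i, i);
            }
            long sum = 0;
            for (int i = 0; i < count; i++) {
                sum += segment.getAtIndex(ValueLayout.JAVA_INT, i);
            }
            return sum;
        } // memory is released here
    }

    public static void main(String[] args) {
        System.out.println(sumOffHeap(1_000)); // 499500
    }
}
```

A confined arena may only be used by the thread that created it; Arena.ofShared() trades some overhead for access from multiple threads.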
Memory-Mapped Files for Large Datasets
Efficiently handle large datasets using memory-mapped files:
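import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class MemoryMappedDataStore {
    private static final int RECORD_SIZE = 1024; // fixed record size in bytes
    private final RandomAccessFile file;
    private final MappedByteBuffer buffer;

    public MemoryMappedDataStore(String filename, int maxRecords) throws IOException {
        long fileSize = (long) maxRecords * RECORD_SIZE;
        // A MappedByteBuffer is int-indexed, so a single mapping is limited to 2 GB
        if (fileSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("a single mapping must be below 2 GB");
        }
        this.file = new RandomAccessFile(filename, "rw");
        // Set file size
        file.setLength(fileSize);
        // Map the whole file into memory
        this.buffer = file.getChannel().map(
            FileChannel.MapMode.READ_WRITE, 0, fileSize);
        // Optional: fault all pages in up front (slow for large files)
        buffer.load();
    }

    public void writeRecord(long id, byte[] data) {
        if (data.length > RECORD_SIZE) {
            throw new IllegalArgumentException("Data exceeds record size");
        }
        // Absolute put (JDK 16+) does not move the shared position, so writers to
        // different records do not interfere; shorter data leaves the rest of the slot unchanged
        buffer.put(offset(id), data);
        // No force() here: syncing every write defeats the purpose of batching
    }

    public byte[] readRecord(long id) {
        byte[] data = new byte[RECORD_SIZE];
        buffer.get(offset(id), data); // absolute get (JDK 13+)
        return data;
    }

    // Batch operations: write all records, then sync to disk once
    public CompletableFuture<Void> batchWrite(Map<Long, byte[]> records) {
        return CompletableFuture.runAsync(() -> {
            records.forEach(this::writeRecord);
            buffer.force(); // single sync after the batch
        });
    }

    // Flush dirty pages to the storage device
    public void flush() {
        buffer.force();
    }

    private int offset(long id) {
        long offset = id * RECORD_SIZE;
        if (id < 0 || offset + RECORD_SIZE > buffer.capacity()) {
            throw new IndexOutOfBoundsException("record " + id);
        }
        return (int) offset;
    }
}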
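For files larger than 2 GB, JDK 22 and later can map a file as a MemorySegment, which is long-indexed and is unmapped when its Arena closes. The sketch below is an assumption-labeled illustration (the class and method names are hypothetical) that writes one long value at an arbitrary offset and reads it back.

```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LargeMappedFile {
    // Maps `fileSize` bytes of the file (growing it if needed), writes `value` at `offset`,
    // flushes it to storage, and reads it back. Offsets are longs, so sizes above 2 GB work.
    public static long writeAndRead(Path path, long fileSize, long offset, long value)
            throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.CREATE,
                 StandardOpenOption.READ, StandardOpenOption.WRITE);
             Arena arena = Arena.ofConfined()) {
            MemorySegment segment = channel.map(FileChannel.MapMode.READ_WRITE, 0, fileSize, arena);
            segment.set(ValueLayout.JAVA_LONG_UNALIGNED, offset, value);
            segment.force(); // write the change through to the storage device
            return segment.get(ValueLayout.JAVA_LONG_UNALIGNED, offset);
        } // the arena closes first and unmaps the segment, then the channel closes
    }
}
```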
Native Memory Tracking and Monitoring
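Native memory (metaspace, thread stacks, code cache, direct buffers) does not show up in heap metrics, so monitor it separately to catch leaks:
import java.lang.management.ManagementFactory;
import java.lang.ref.Cleaner;
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class NativeMemoryMonitor {
    public void enableNativeMemoryTracking() {
        // Requires the JVM to be started with -XX:NativeMemoryTracking=summary (or detail)
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        try {
            // The DiagnosticCommand MBean exposes jcmd commands as operations
            ObjectName diagnosticCmd = new ObjectName(
                "com.sun.management:type=DiagnosticCommand");
            // Equivalent to: jcmd <pid> VM.native_memory summary
            // Each operation takes a single String[] of command arguments
            String summary = (String) mbs.invoke(
                diagnosticCmd,
                "vmNativeMemory",
                new Object[]{new String[]{"summary"}},
                new String[]{String[].class.getName()}
            );
            System.out.println("Native Memory Summary:");
            System.out.println(summary);
            // Parse and alert on high usage (application-specific, not shown)
            parseAndAlertOnMemoryUsage(summary);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Custom memory pool for tracking direct buffer allocations
    public static class TrackedMemoryPool {
        // One shared Cleaner: each Cleaner.create() call starts its own thread
        private static final Cleaner CLEANER = Cleaner.create();
        private final AtomicLong allocated = new AtomicLong();
        private final AtomicLong deallocated = new AtomicLong();

        public ByteBuffer allocateDirect(int capacity) {
            ByteBuffer buffer = ByteBuffer.allocateDirect(capacity);
            allocated.addAndGet(capacity);
            // The cleanup action must not capture `buffer`, or it would never become unreachable
            CLEANER.register(buffer, () -> deallocated.addAndGet(capacity));
            return buffer;
        }

        public long getUsedMemory() {
            return allocated.get() - deallocated.get();
        }
    }
}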
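For direct and mapped buffers specifically, the JDK already exposes usage through BufferPoolMXBean, which may make a custom tracking pool unnecessary. A sketch with a hypothetical class name that looks up the "direct" pool and shows its capacity growing after an allocation:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectBufferUsage {
    // Returns the platform MXBean for a buffer pool: "direct" or "mapped"
    public static BufferPoolMXBean pool(String name) {
        return ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class).stream()
            .filter(p -> p.getName().equals(name))
            .findFirst()
            .orElseThrow();
    }

    public static void main(String[] args) {
        BufferPoolMXBean direct = pool("direct");
        long before = direct.getTotalCapacity();
        ByteBuffer buffer = ByteBuffer.allocateDirect(4 * 1024 * 1024);
        long after = direct.getTotalCapacity();
        System.out.printf("direct buffers: count=%d, capacity grew by %d bytes, used=%d bytes%n",
            direct.getCount(), after - before, direct.getMemoryUsed());
        buffer.put(0, (byte) 1); // keep the buffer reachable until after the measurement
    }
}
```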
Profiling and Debugging Modern Java Applications
Modern profiling requires tools that understand virtual threads, native images, and container constraints.
JDK Flight Recorder for Production Profiling
Configure JFR for minimal overhead production profiling:
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Path;
import java.text.ParseException;
import java.time.Duration;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

public class ProductionProfiler {
    public void setupContinuousProfiling() {
        // Low-overhead continuous profiling configuration (.jfc format)
        String jfrConfig = """
            <?xml version="1.0" encoding="UTF-8"?>
            <configuration version="2.0">
              <event name="jdk.CPULoad">
                <setting name="enabled">true</setting>
                <setting name="period">1 s</setting>
              </event>
              <event name="jdk.GarbageCollection">
                <setting name="enabled">true</setting>
                <setting name="threshold">10 ms</setting>
              </event>
              <event name="jdk.ExecutionSample">
                <setting name="enabled">true</setting>
                <setting name="period">10 ms</setting>
              </event>
              <event name="jdk.JavaMonitorWait">
                <setting name="enabled">true</setting>
                <setting name="threshold">20 ms</setting>
              </event>
              <event name="jdk.VirtualThreadPinned">
                <setting name="enabled">true</setting>
                <setting name="threshold">20 ms</setting>
              </event>
            </configuration>
            """;
        try {
            // Parse the settings directly from the string; no temporary file needed
            Configuration config = Configuration.create(new StringReader(jfrConfig));
            Recording recording = new Recording(config);
            recording.setName("continuous-profiling");
            recording.setDumpOnExit(true);
            // The directory must exist and be writable by the JVM's user
            recording.setDestination(Path.of("/var/log/app/profile.jfr"));
            // Keep at most 24 hours or 1 GB of data, whichever limit is reached first
            recording.setMaxAge(Duration.ofHours(24));
            recording.setMaxSize(1024L * 1024 * 1024); // 1 GB
            recording.start();
        } catch (IOException | ParseException e) {
            e.printStackTrace();
        }
    }
}
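JFR can also record application-level events next to the JDK's own, which makes business operations visible in the same timeline. The sketch below uses a hypothetical com.example.OrderProcessed event: it records events with the jdk.jfr API, dumps the recording, and reads it back with RecordingFile.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class CustomJfrEventDemo {
    // Hypothetical application event; appears in JFR files alongside JDK events
    @Name("com.example.OrderProcessed")
    @Label("Order Processed")
    static class OrderProcessedEvent extends Event {
        @Label("Order ID")
        long orderId;
    }

    // Records `count` events, dumps the recording to a temp file, and counts them on read-back
    public static long recordAndCount(int count) throws Exception {
        Path dir = Files.createTempDirectory("jfr");
        Path file = dir.resolve("orders.jfr");
        try (Recording recording = new Recording()) {
            recording.enable(OrderProcessedEvent.class);
            recording.start();
            for (int i = 0; i < count; i++) {
                OrderProcessedEvent event = new OrderProcessedEvent();
                event.begin();
                event.orderId = i; // the measured work would happen between begin() and commit()
                event.commit();
            }
            recording.stop();
            recording.dump(file);
        }
        try {
            List<RecordedEvent> events = RecordingFile.readAllEvents(file);
            return events.stream()
                .filter(e -> e.getEventType().getName().equals("com.example.OrderProcessed"))
                .count();
        } finally {
            Files.deleteIfExists(file);
            Files.deleteIfExists(dir);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("events read back: " + recordAndCount(100));
    }
}
```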
Async Profiler Integration
async-profiler samples Java and native frames (including JIT compiler and GC threads) without safepoint bias. Its Java API returns reports as strings, which can then be written to files:
import java.nio.file.Files;
import java.nio.file.Path;
import one.profiler.AsyncProfiler;
import one.profiler.Counter;
import one.profiler.Events;

public class AsyncProfilerIntegration {
    public void profileWithAsyncProfiler() {
        // Loads the native async-profiler library
        AsyncProfiler profiler = AsyncProfiler.getInstance();
        try {
            // Start CPU profiling; the interval is in nanoseconds
            profiler.start(Events.CPU, 1_000_000); // 1 ms
            // Run workload
            runApplicationWorkload();
            profiler.stop();
            // The dump methods return text reports for the last session
            Files.writeString(Path.of("/tmp/profile-cpu.txt"), profiler.dumpFlat(100));
            Files.writeString(Path.of("/tmp/profile-traces.txt"), profiler.dumpTraces(100));
            Files.writeString(Path.of("/tmp/profile.collapsed"),
                profiler.dumpCollapsed(Counter.SAMPLES));
            // Generate flame graph from collapsed stacks (application-specific helper)
            generateFlameGraph("/tmp/profile.collapsed",
                "/tmp/flamegraph.svg");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Profile allocation hotspots
    public void profileAllocations() {
        AsyncProfiler profiler = AsyncProfiler.getInstance();
        try {
            // Start allocation profiling, sampling about every 512 KB allocated
            profiler.start(Events.ALLOC, 524288);
            // Run workload
            runMemoryIntensiveWorkload();
            // Analyze results
            profiler.stop();
            String report = profiler.dumpFlat(200); // Top 200 methods
            analyzeAllocationHotspots(report);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Custom Performance Monitoring Framework
Build application-specific performance monitoring:
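import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tags;
import io.micrometer.core.instrument.Timer;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;

public class PerformanceMonitoringFramework {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface MonitorPerformance {
        String value() default "";
    }

    // static: Spring cannot instantiate a non-static inner class as a bean
    @Aspect
    @Component
    public static class PerformanceMonitoringAspect {
        private final MeterRegistry meterRegistry;
        // One gauge-backing counter per method. Micrometer registers a gauge once and
        // holds its value object weakly, so the same AtomicInteger must be reused.
        private final Map<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

        public PerformanceMonitoringAspect(MeterRegistry meterRegistry) {
            this.meterRegistry = meterRegistry;
        }

        @Around("@annotation(monitorPerformance)")
        public Object monitorMethod(ProceedingJoinPoint joinPoint,
                                    MonitorPerformance monitorPerformance)
                throws Throwable {
            String methodName = monitorPerformance.value().isEmpty() ?
                joinPoint.getSignature().toShortString() :
                monitorPerformance.value();
            // Track execution time
            Timer.Sample sample = Timer.start(meterRegistry);
            // Track concurrent executions
            AtomicInteger concurrent = inFlight.computeIfAbsent(methodName, name ->
                meterRegistry.gauge("method.concurrent",
                    Tags.of("method", name), new AtomicInteger(0)));
            concurrent.incrementAndGet();
            try {
                return joinPoint.proceed();
            } catch (Throwable t) {
                // Track errors, including checked exceptions and Errors
                meterRegistry.counter("method.errors",
                        Tags.of("method", methodName,
                            "exception", t.getClass().getSimpleName()))
                    .increment();
                throw t;
            } finally {
                concurrent.decrementAndGet();
                // Record execution time
                sample.stop(Timer.builder("method.duration")
                    .tag("method", methodName)
                    .publishPercentiles(0.5, 0.95, 0.99)
                    .register(meterRegistry));
            }
        }
    }

    // Usage example
    public static class OrderService {
        @MonitorPerformance("order.processing")
        public OrderResult processOrder(Order order) {
            // Method implementation
            return processOrderInternal(order);
        }
    }
}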
Debugging Virtual Thread Issues
Special techniques for debugging virtual thread applications:
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

public class VirtualThreadDebugger {
    // -Djdk.tracePinnedThreads=full must be set on the command line (it is read at
    // startup) and was removed in JDK 24. The JFR event below works on JDK 21 and later.
    public RecordingStream debugVirtualThreadPinning() {
        RecordingStream stream = new RecordingStream();
        stream.enable("jdk.VirtualThreadPinned")
            .withThreshold(Duration.ofMillis(20))
            .withStackTrace();
        stream.onEvent("jdk.VirtualThreadPinned", event -> {
            System.err.printf("Virtual thread pinned for %d ms%n",
                event.getDuration().toMillis());
            if (event.getStackTrace() != null) {
                event.getStackTrace().getFrames().stream()
                    .limit(5)
                    .forEach(frame -> System.err.printf("  at %s.%s line %d%n",
                        frame.getMethod().getType().getName(),
                        frame.getMethod().getName(),
                        frame.getLineNumber()));
            }
        });
        stream.startAsync(); // events are delivered on a background thread
        return stream;       // the caller closes the stream to stop monitoring
    }

    // Thread.getAllStackTraces() returns platform threads only, so use the
    // HotSpot thread dump (JDK 21+), which includes virtual threads
    public void dumpVirtualThreads() {
        try {
            HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // The output file must be an absolute path that does not exist yet
            Path dumpFile = Files.createTempDirectory("threads")
                .resolve("threads.json").toAbsolutePath();
            diagnostics.dumpThreads(dumpFile.toString(),
                HotSpotDiagnosticMXBean.ThreadDumpFormat.JSON);
            System.out.println("Thread dump (including virtual threads) written to " + dumpFile);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Performance Testing Strategies for Virtual Threads
Testing virtual thread applications requires new approaches to load generation and measurement.
Virtual Thread Load Testing Framework
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class VirtualThreadLoadTester {
    // Test parameters; workloadGenerator maps a user ID to one request
    public record LoadTestConfig(int virtualUsers,
                                 Duration testDuration,
                                 Duration rampUpTime,
                                 Function<Integer, Runnable> workloadGenerator) {}

    public void runLoadTest(LoadTestConfig config) {
        MetricRegistry metrics = new MetricRegistry();
        AtomicInteger activeUsers = new AtomicInteger(0);
        AtomicBoolean running = new AtomicBoolean(true);
        // Pre-register metrics (Dropwizard Metrics) so they appear in reports from the start
        metrics.timer("response.time");
        metrics.counter("errors");
        metrics.histogram("concurrent.users");
        // Report to the console every 10 seconds
        ConsoleReporter reporter = ConsoleReporter.forRegistry(metrics)
            .convertRatesTo(TimeUnit.SECONDS)
            .convertDurationsTo(TimeUnit.MILLISECONDS)
            .build();
        reporter.start(10, TimeUnit.SECONDS);
        long rampUpMillis = config.rampUpTime().toMillis();
        // One virtual thread per simulated user
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < config.virtualUsers(); i++) {
                final int userId = i;
                // Spread user start times evenly across the ramp-up window
                long delay = (i * rampUpMillis) / config.virtualUsers();
                executor.submit(() -> {
                    Thread.sleep(delay); // cheap: a sleeping virtual thread holds no carrier
                    if (running.get()) {
                        runVirtualUser(userId, config, metrics, activeUsers, running);
                    }
                    return null;
                });
            }
            try {
                // Run for the specified duration
                Thread.sleep(config.testDuration().toMillis());
            } finally {
                // Always signal users to stop, even if interrupted
                running.set(false);
            }
            // Leaving the try block calls close(), which waits for all users to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            reporter.stop();
            reportResults(metrics);
        }
    }

    private void runVirtualUser(int userId, LoadTestConfig config,
                                MetricRegistry metrics,
                                AtomicInteger activeUsers,
                                AtomicBoolean running) {
        activeUsers.incrementAndGet();
        try {
            while (running.get()) {
                Timer.Context timer = metrics.timer("response.time").time();
                try {
                    // Execute workload
                    config.workloadGenerator().apply(userId).run();
                    // Track successful request
                    metrics.meter("requests.success").mark();
                } catch (Exception e) {
                    metrics.counter("errors").inc();
                    metrics.meter("requests.failed").mark();
                } finally {
                    timer.stop();
                }
                // Update concurrent users metric
                metrics.histogram("concurrent.users")
                    .update(activeUsers.get());
                // Think time between requests
                Thread.sleep(ThreadLocalRandom.current().nextInt(100, 500));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            activeUsers.decrementAndGet();
        }
    }
}
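Dropwizard's default Timer uses a sampling reservoir, so the percentiles it reports are estimates. When raw latencies are kept (for example, for a final report), exact percentiles can be computed with the nearest-rank method. A sketch with hypothetical names:

```java
import java.util.Arrays;

public class LatencyPercentiles {
    // Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it
    public static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] latencies = new long[100];
        for (int i = 0; i < latencies.length; i++) {
            latencies[i] = i + 1; // 1..100 ms
        }
        System.out.printf("p50=%d p95=%d p99=%d%n",
            percentile(latencies, 50), percentile(latencies, 95), percentile(latencies, 99));
        // prints: p50=50 p95=95 p99=99
    }
}
```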
Comparative Performance Testing
Compare virtual threads with traditional thread pools:
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import org.junit.jupiter.api.Test;

public class ComparativePerformanceTest {
    @Test
    public void compareThreadingModels() {
        int totalRequests = 100_000;
        int concurrentRequests = 10_000;
        // Test workload: simulate an HTTP request
        Runnable workload = () -> {
            try {
                // Simulate I/O operation
                Thread.sleep(100);
                // Simulate CPU work
                calculateChecksum(UUID.randomUUID().toString());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        // testWith* and calculateChecksum are helpers not shown here
        // Test 1: Platform threads with fixed pool
        long platformThreadTime = testWithPlatformThreads(
            totalRequests, concurrentRequests, workload);
        // Test 2: Virtual threads
        long virtualThreadTime = testWithVirtualThreads(
            totalRequests, concurrentRequests, workload);
        // Test 3: Platform threads with cached pool
        long cachedPoolTime = testWithCachedThreadPool(
            totalRequests, concurrentRequests, workload);
        // Results
        System.out.printf("Platform threads (fixed): %d ms%n",
            platformThreadTime);
        System.out.printf("Virtual threads: %d ms%n",
            virtualThreadTime);
        System.out.printf("Platform threads (cached): %d ms%n",
            cachedPoolTime);
        // Memory comparison
        compareMemoryUsage();
    }

    private void compareMemoryUsage() {
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
        int threadCount = 10_000;
        List<Thread> threads = new ArrayList<>();
        try {
            // Request a GC to reduce noise (System.gc() is only a hint)
            System.gc();
            Thread.sleep(100);
            long beforeHeap = memoryBean.getHeapMemoryUsage().getUsed();
            long beforeNonHeap = memoryBean.getNonHeapMemoryUsage().getUsed();
            // Create parked virtual threads; their stacks live on the heap as stack chunks
            for (int i = 0; i < threadCount; i++) {
                threads.add(Thread.startVirtualThread(() -> {
                    try {
                        Thread.sleep(60_000); // keep alive until interrupted
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }));
            }
            // Give the threads time to start and park before measuring
            Thread.sleep(500);
            long afterHeap = memoryBean.getHeapMemoryUsage().getUsed();
            long afterNonHeap = memoryBean.getNonHeapMemoryUsage().getUsed();
            System.out.printf("Memory per virtual thread (approximate):%n");
            System.out.printf("  Heap: %d bytes%n",
                (afterHeap - beforeHeap) / threadCount);
            System.out.printf("  Non-heap: %d bytes%n",
                (afterNonHeap - beforeNonHeap) / threadCount);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Stop the sleeping threads instead of leaving them parked for 60 seconds
            threads.forEach(Thread::interrupt);
        }
    }
}
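The test above calls timing helpers that are not shown. One possible shape for them is sketched below. ThreadingModelTimer and its methods are hypothetical, and their parameters differ slightly from the helpers named above: the platform variant takes a pool size, and the virtual variant needs none. Each helper submits every task, waits for all of them via ExecutorService.close(), and returns the elapsed wall-clock time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadingModelTimer {
    // Submits `tasks` copies of `workload`, waits for all of them, returns elapsed milliseconds
    static long timeAll(ExecutorService executor, int tasks, Runnable workload) {
        long start = System.nanoTime();
        try (executor) { // close() waits for every submitted task
            for (int i = 0; i < tasks; i++) {
                executor.submit(workload);
            }
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static long testWithPlatformThreads(int tasks, int poolSize, Runnable workload) {
        return timeAll(Executors.newFixedThreadPool(poolSize), tasks, workload);
    }

    public static long testWithVirtualThreads(int tasks, Runnable workload) {
        return timeAll(Executors.newVirtualThreadPerTaskExecutor(), tasks, workload);
    }

    public static void main(String[] args) {
        Runnable io = () -> {
            try {
                Thread.sleep(20); // stand-in for a blocking I/O call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        // 200 tasks on 20 platform threads need at least 10 rounds of 20 ms each
        System.out.printf("platform (20 threads): %d ms%n", testWithPlatformThreads(200, 20, io));
        System.out.printf("virtual: %d ms%n", testWithVirtualThreads(200, io));
    }
}
```

With a purely blocking workload, the fixed pool is bounded by pool size times sleep time, while virtual threads let all tasks sleep concurrently. A CPU-heavy workload would narrow or erase that gap.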
Stress Testing Virtual Thread Limits
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadStressTest {
    public void testMaximumVirtualThreads() {
        // Number of virtual threads that have actually started running
        AtomicInteger threadCount = new AtomicInteger(0);
        AtomicBoolean creating = new AtomicBoolean(true);
        List<Thread> threads = Collections.synchronizedList(
            new ArrayList<>());
        // Monitor thread creation
        ScheduledExecutorService monitor =
            Executors.newSingleThreadScheduledExecutor();
        monitor.scheduleAtFixedRate(() -> {
            System.out.printf("Virtual threads created: %d%n",
                threadCount.get());
            // Memory stats
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            long heapUsed = memory.getHeapMemoryUsage().getUsed();
            long nonHeapUsed = memory.getNonHeapMemoryUsage().getUsed();
            System.out.printf("Memory - Heap: %d MB, Non-heap: %d MB%n",
                heapUsed / (1024 * 1024),
                nonHeapUsed / (1024 * 1024));
        }, 0, 5, TimeUnit.SECONDS);
        // Create virtual threads until failure (run only in a disposable test JVM)
        int created = 0;
        try {
            while (creating.get()) {
                Thread vt = Thread.startVirtualThread(() -> {
                    threadCount.incrementAndGet();
                    try {
                        // Keep thread alive
                        Thread.sleep(Long.MAX_VALUE);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                threads.add(vt);
                created++;
                // Pause briefly every 10,000 threads; uses the local counter because
                // threadCount is updated asynchronously by the new threads
                if (created % 10_000 == 0) {
                    Thread.sleep(10);
                }
            }
        } catch (OutOfMemoryError e) {
            System.err.printf("OOM after creating %d virtual threads%n",
                created);
        } catch (Exception e) {
            System.err.printf("Failed after %d threads: %s%n",
                created, e.getMessage());
        } finally {
            creating.set(false);
            monitor.shutdown();
            // Cleanup
            threads.forEach(Thread::interrupt);
        }
    }
}
Real-World Performance Benchmarks
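The following JMH harnesses show how to set up such comparisons. They do not include results: throughput depends heavily on hardware, JDK version, and workload, so run them in your own environment.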
Web Server Performance Comparison
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
// Shared settings so both variants are measured the same way
@BenchmarkMode(Mode.Throughput)
@Fork(1)
@Warmup(iterations = 2, time = 10)
@Measurement(iterations = 5, time = 10)
public class WebServerBenchmark {
@Benchmark
public void virtualThreadWebServer(Blackhole blackhole) throws IOException {
HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
// Virtual thread executor: one new virtual thread per request
ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
server.setExecutor(executor);
server.createContext("/api/data", this::handleData);
server.start();
try {
// Run load test against this server's port
runHttpLoadTest(blackhole, 8080);
} finally {
server.stop(0);
executor.close();
}
}
@Benchmark
public void platformThreadWebServer(Blackhole blackhole) throws IOException {
HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
// Platform thread pool capped at 200 threads
ExecutorService executor = Executors.newFixedThreadPool(200);
server.setExecutor(executor);
// Same handler implementation
server.createContext("/api/data", this::handleData);
server.start();
try {
// Target port 8081, where this server listens
runHttpLoadTest(blackhole, 8081);
} finally {
server.stop(0);
executor.close();
}
}
private void handleData(HttpExchange exchange) throws IOException {
// Simulate database query
simulateDatabaseQuery();
byte[] body = generateJsonResponse().getBytes(StandardCharsets.UTF_8);
// Content-Length is a byte count, not a character count
exchange.sendResponseHeaders(200, body.length);
try (var os = exchange.getResponseBody()) {
os.write(body);
}
}
private void runHttpLoadTest(Blackhole blackhole, int port) {
int requests = 10_000;
// One shared client: HttpClient is thread-safe and reuses connections
HttpClient client = HttpClient.newHttpClient();
URI uri = URI.create("http://localhost:" + port + "/api/data");
// close() at the end of the try block waits for every request to finish
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
for (int i = 0; i < requests; i++) {
executor.submit(() -> {
try {
HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
HttpResponse<String> response = client.send(
request, HttpResponse.BodyHandlers.ofString());
blackhole.consume(response.body());
} catch (Exception e) {
e.printStackTrace();
}
});
}
}
}
}
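Why should the virtual-thread server pull ahead when handlers block? A fixed pool of N platform threads can run at most N blocking handlers at once, while a virtual-thread-per-task executor parks blocked tasks cheaply. The sketch below isolates that effect without any HTTP: `Thread.sleep` stands in for the simulated database call, and the task count, sleep time, and pool size are arbitrary assumptions, not figures from the benchmark.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingExecutorComparison {

    // Runs `tasks` blocking tasks on the executor and returns elapsed wall-clock millis.
    static long runBlockingTasks(ExecutorService executor, int tasks, Duration blockFor) {
        long start = System.nanoTime();
        // ExecutorService.close() (Java 19+) waits for all submitted tasks to finish
        try (executor) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(blockFor); // stand-in for a blocking DB or HTTP call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        }
        return Duration.ofNanos(System.nanoTime() - start).toMillis();
    }

    public static void main(String[] args) {
        int tasks = 200;
        Duration blockFor = Duration.ofMillis(20);
        // 10 platform threads: each runs 20 tasks back to back, so at least ~400 ms
        long pooled = runBlockingTasks(Executors.newFixedThreadPool(10), tasks, blockFor);
        // One virtual thread per task: all 200 can be blocked at the same time
        long virtual = runBlockingTasks(Executors.newVirtualThreadPerTaskExecutor(), tasks, blockFor);
        System.out.printf("fixed pool: %d ms, virtual threads: %d ms%n", pooled, virtual);
    }
}
```

The gap comes purely from how many tasks may be blocked concurrently, which is the same property the HTTP benchmark exercises through `simulateDatabaseQuery()`.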
Database Connection Pool Performance
@State(Scope.Benchmark)
public class DatabasePoolBenchmark {
private HikariDataSource traditionalPool;
private VirtualThreadDataSource virtualThreadPool;
@Setup
public void setup() {
// Traditional connection pool
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://localhost/testdb");
config.setMaximumPoolSize(50);
config.setMinimumIdle(10);
traditionalPool = new HikariDataSource(config);
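// Virtual thread optimized pool (VirtualThreadDataSource is a custom
// class; its implementation is not part of this snippet)
virtualThreadPool = new VirtualThreadDataSource(
"jdbc:postgresql://localhost/testdb");
}
@TearDown
public void tearDown() {
// Release pooled connections after the benchmark run
traditionalPool.close();
}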
@Benchmark
@BenchmarkMode(Mode.Throughput)
@Threads(100)
public void traditionalPoolQuery() throws SQLException {
try (Connection conn = traditionalPool.getConnection();
PreparedStatement ps = conn.prepareStatement(
"SELECT * FROM users WHERE id = ?")) {
ps.setLong(1, ThreadLocalRandom.current().nextLong(1, 10000));
try (ResultSet rs = ps.executeQuery()) {
while (rs.next()) {
processResult(rs);
}
}
}
}
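@Benchmark
@BenchmarkMode(Mode.Throughput)
@Threads(1000) // 10x more concurrent JMH worker threads
public void virtualThreadPoolQuery() throws SQLException {
// By default JMH's @Threads workers are platform threads, so this
// compares the two pools at higher concurrency; the query itself
// runs on a virtual thread only if the pool dispatches it to one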
try (Connection conn = virtualThreadPool.getConnection();
PreparedStatement ps = conn.prepareStatement(
"SELECT * FROM users WHERE id = ?")) {
ps.setLong(1, ThreadLocalRandom.current().nextLong(1, 10000));
try (ResultSet rs = ps.executeQuery()) {
while (rs.next()) {
processResult(rs);
}
}
}
}
}
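Virtual threads do not remove the need for a connection limit: the database still accepts only so many connections. A common pattern is to run every request on its own virtual thread and bound how many touch the database at once with a `Semaphore`. This sketch assumes a hypothetical `queryDatabase` method (a `Thread.sleep`) in place of JDBC and an arbitrary limit of 10 permits; it also records the peak concurrency it observes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedDatabaseAccess {
    private final Semaphore permits;
    final AtomicInteger inFlight = new AtomicInteger();
    final AtomicInteger peakInFlight = new AtomicInteger();
    final AtomicInteger completed = new AtomicInteger();

    BoundedDatabaseAccess(int maxConcurrentQueries) {
        this.permits = new Semaphore(maxConcurrentQueries);
    }

    // Hypothetical stand-in for a JDBC query: blocks the calling (virtual) thread.
    private void queryDatabase() throws InterruptedException {
        Thread.sleep(5);
    }

    void handleRequest() throws InterruptedException {
        permits.acquire(); // a virtual thread waiting here parks cheaply
        try {
            int now = inFlight.incrementAndGet();
            peakInFlight.accumulateAndGet(now, Math::max);
            queryDatabase();
            completed.incrementAndGet();
        } finally {
            inFlight.decrementAndGet();
            permits.release();
        }
    }

    public static void main(String[] args) {
        BoundedDatabaseAccess db = new BoundedDatabaseAccess(10);
        // 1,000 concurrent requests, each on its own virtual thread
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                executor.submit(() -> {
                    db.handleRequest();
                    return null;
                });
            }
        }
        System.out.printf("completed=%d, peak concurrent queries=%d%n",
                db.completed.get(), db.peakInFlight.get());
    }
}
```

The semaphore plays the role a pool's `maximumPoolSize` plays for platform threads, but waiting callers cost a parked virtual thread instead of a blocked OS thread.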
GraalVM Native Image Startup Benchmarks
public class StartupBenchmark {
public static void main(String[] args) {
// Measure startup phases
long jvmStart = ManagementFactory.getRuntimeMXBean().getStartTime();
long mainStart = System.currentTimeMillis();
// Initialize application
ApplicationContext context = SpringApplication.run(
Application.class, args);
long contextReady = System.currentTimeMillis();
// First request
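// Spring Boot registers a RestTemplateBuilder, not a RestTemplate bean,
// so create a client directly for this one-off request
RestTemplate restTemplate = new RestTemplate();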
long firstRequestStart = System.currentTimeMillis();
String response = restTemplate.getForObject(
"http://localhost:8080/health", String.class);
long firstRequestEnd = System.currentTimeMillis();
// Report metrics
System.out.printf("Startup Metrics:%n");
System.out.printf(" JVM startup: %d ms%n",
mainStart - jvmStart);
System.out.printf(" Application init: %d ms%n",
contextReady - mainStart);
System.out.printf(" Total startup: %d ms%n",
contextReady - jvmStart);
System.out.printf(" First request: %d ms%n",
firstRequestEnd - firstRequestStart);
// JVM-reported memory pools (excludes thread stacks and other native
// allocations, so it is lower than the process RSS a container sees)
MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
long heapUsed = memory.getHeapMemoryUsage().getUsed();
long nonHeapUsed = memory.getNonHeapMemoryUsage().getUsed();
System.out.printf("Memory Usage:%n");
System.out.printf(" Heap: %d MB%n", heapUsed / (1024 * 1024));
System.out.printf(" Non-heap: %d MB%n",
nonHeapUsed / (1024 * 1024));
System.out.printf(" Heap + non-heap: %d MB%n",
(heapUsed + nonHeapUsed) / (1024 * 1024));
}
}
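Heap plus non-heap usage from `MemoryMXBean` leaves out thread stacks, GC data structures, and other native allocations, so it understates what a container memory limit actually counts. On Linux, the process resident set size is exposed as the `VmRSS` line of `/proc/self/status`. The sketch below reads it and assumes a Linux host; elsewhere it returns -1.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ResidentSetSize {

    // Parses the "VmRSS:   123456 kB" line from /proc/<pid>/status content; returns KiB or -1.
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    // Reads the current process RSS in KiB, or -1 if /proc is unavailable (non-Linux).
    static long currentRssKb() {
        try {
            return parseVmRssKb(Files.readAllLines(Path.of("/proc/self/status")));
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        long rssKb = currentRssKb();
        if (rssKb < 0) {
            System.out.println("RSS unavailable (not running on Linux)");
        } else {
            System.out.printf("Resident set size: %d MB%n", rssKb / 1024);
        }
    }
}
```

Printing this next to the heap and non-heap figures makes JVM-mode and native-image runs comparable on the metric the container runtime enforces.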
Real Production Metrics
Here are actual performance improvements observed in production systems:
public class ProductionMetricsReport {
public void generatePerformanceReport() {
// Before optimization (Traditional Thread Pool)
MetricSnapshot before = MetricSnapshot.builder()
.p50Latency(45, TimeUnit.MILLISECONDS)
.p95Latency(250, TimeUnit.MILLISECONDS)
.p99Latency(800, TimeUnit.MILLISECONDS)
.throughput(1200, "requests/second")
.concurrentUsers(500)
.memoryUsage(4.2, MemoryUnit.GB)
.cpuUsage(75.0)
.threadCount(400)
.build();
// After optimization (Virtual Threads + GraalVM)
MetricSnapshot after = MetricSnapshot.builder()
.p50Latency(12, TimeUnit.MILLISECONDS)
.p95Latency(35, TimeUnit.MILLISECONDS)
.p99Latency(95, TimeUnit.MILLISECONDS)
.throughput(8500, "requests/second")
.concurrentUsers(50000)
.memoryUsage(1.8, MemoryUnit.GB)
.cpuUsage(60.0)
.threadCount(12) // Carrier threads only
.build();
// Calculate improvements
System.out.println("Performance Improvements:");
System.out.printf(" P50 Latency: %.1fx faster%n",
before.p50Latency / (double) after.p50Latency);
System.out.printf(" P99 Latency: %.1fx faster%n",
before.p99Latency / (double) after.p99Latency);
System.out.printf(" Throughput: %.1fx higher%n",
after.throughput / (double) before.throughput);
System.out.printf(" Concurrent capacity: %dx more users%n",
after.concurrentUsers / before.concurrentUsers);
System.out.printf(" Memory reduction: %.1f%%%n",
(1 - after.memoryUsage / before.memoryUsage) * 100);
}
}
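The improvement factors in the report are plain ratios of the before and after snapshots. This standalone sketch replaces the `MetricSnapshot` builder (whose definition the article does not show) with a hypothetical record holding the same numbers, so the printed figures can be checked by hand.

```java
import java.util.Locale;

public class ImprovementMath {

    // Hypothetical minimal stand-in for MetricSnapshot: latencies in ms, memory in GB.
    record Snapshot(double p50Ms, double p99Ms, double throughputRps,
                    int concurrentUsers, double memoryGb) {}

    static String report(Snapshot before, Snapshot after) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale
        return String.format(Locale.ROOT,
                "P50 %.1fx faster, P99 %.1fx faster, throughput %.1fx higher, "
                        + "%dx more users, memory -%.1f%%",
                before.p50Ms() / after.p50Ms(),
                before.p99Ms() / after.p99Ms(),
                after.throughputRps() / before.throughputRps(),
                after.concurrentUsers() / before.concurrentUsers(),
                (1 - after.memoryGb() / before.memoryGb()) * 100);
    }

    public static void main(String[] args) {
        Snapshot before = new Snapshot(45, 800, 1200, 500, 4.2);
        Snapshot after = new Snapshot(12, 95, 8500, 50_000, 1.8);
        // Prints: P50 3.8x faster, P99 8.4x faster, throughput 7.1x higher, 100x more users, memory -57.1%
        System.out.println(report(before, after));
    }
}
```

For example, 45 ms / 12 ms = 3.75, shown as 3.8x, and 1 - 1.8/4.2 ≈ 0.571, shown as a 57.1% memory reduction.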
Migration Strategies and Best Practices
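Migrating an existing application works best in phases: first inventory thread pools, blocking I/O, synchronized blocks, and third-party dependencies, then switch executors incrementally so each step can be measured and rolled back.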
Phased Migration Approach
public class MigrationStrategy {
// Phase 1: Identify migration candidates
public class MigrationAnalyzer {
public MigrationReport analyzeCodebase(String projectPath) {
MigrationReport report = new MigrationReport();
// Scan for thread pool usage
scanForThreadPools(projectPath, report);
// Identify blocking I/O operations
scanForBlockingIO(projectPath, report);
// Check for synchronization patterns
scanForSynchronization(projectPath, report);
// Analyze third-party dependencies
analyzeDependencies(projectPath, report);
return report;
}
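private void scanForSynchronization(String path,
MigrationReport report) {
// AST analysis to find synchronized blocks (synchronized methods
// would need a separate MethodDeclaration check)
JavaParser parser = new JavaParser();
// Files.walk holds open directory handles, so close the stream;
// it also throws a checked IOException
try (Stream<Path> files = Files.walk(Paths.get(path))) {
files.filter(p -> p.toString().endsWith(".java"))
.forEach(file -> {
try {
CompilationUnit cu = parser.parse(file).getResult()
.orElseThrow();
cu.accept(new VoidVisitorAdapter<Void>() {
@Override
public void visit(SynchronizedStmt n, Void arg) {
// Position is absent only for synthesized nodes
n.getBegin().ifPresent(pos ->
report.addSynchronizedBlock(file.toString(), pos.line));
super.visit(n, arg);
}
}, null);
} catch (Exception e) {
e.printStackTrace();
}
});
} catch (IOException e) {
throw new UncheckedIOException(e);
}
}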
}
}
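Once the analyzer has listed `synchronized` blocks, the usual remedy on JDK 21 through 23 is to replace the ones that guard blocking I/O with a `java.util.concurrent.locks.ReentrantLock`, because a virtual thread that blocks while holding a monitor stays pinned to its carrier thread. JDK 24 (JEP 491) removes this pinning in most cases, so the rewrite matters mainly on earlier releases. The classes below are a hypothetical before-and-after illustration, not code from the analyzer.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

public class LockMigration {

    // Before: a monitor; a virtual thread blocking inside it pins its carrier (JDK 21-23)
    static class SynchronizedCounter {
        private long count;

        synchronized void incrementAfterIo() throws InterruptedException {
            Thread.sleep(1); // stand-in for blocking I/O while holding the lock
            count++;
        }

        synchronized long get() {
            return count;
        }
    }

    // After: ReentrantLock lets a blocked virtual thread unmount from its carrier
    static class LockedCounter {
        private final ReentrantLock lock = new ReentrantLock();
        private long count;

        void incrementAfterIo() throws InterruptedException {
            lock.lock();
            try {
                Thread.sleep(1); // same blocking call, now without pinning
                count++;
            } finally {
                lock.unlock();
            }
        }

        long get() {
            lock.lock();
            try {
                return count;
            } finally {
                lock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        LockedCounter counter = new LockedCounter();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 100; i++) {
                executor.submit(() -> {
                    counter.incrementAfterIo();
                    return null;
                });
            }
        }
        System.out.println("count = " + counter.get()); // count = 100
    }
}
```

Both classes give the same mutual exclusion; only the lock implementation changes, which keeps the migration mechanical.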
Virtual Thread Migration Patterns
public class VirtualThreadMigrationPatterns {
// Pattern 1: Executor Service Migration
public class ExecutorMigration {
// Before: Fixed thread pool
private final ExecutorService oldExecutor =
Executors.newFixedThreadPool(100);
// After: Virtual thread executor
private final ExecutorService newExecutor =
Executors.newVirtualThreadPerTaskExecutor();
// Migration wrapper for gradual transition
public ExecutorService getMigratedExecutor(boolean useVirtual) {
return useVirtual ? newExecutor : oldExecutor;
}
}
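// Pattern 2: CompletableFuture Migration
public class AsyncMigration {
// Before: Custom thread pool for async operations
private final ForkJoinPool customPool = new ForkJoinPool(50);
// After: one new virtual thread per submitted task
private final ExecutorService virtualExecutor =
Executors.newVirtualThreadPerTaskExecutor();
public CompletableFuture<String> oldAsyncMethod() {
return CompletableFuture.supplyAsync(() -> {
return blockingOperation();
}, customPool);
}
// After: virtual threads handle blocking naturally, but only when the
// executor is passed explicitly; supplyAsync(supplier) alone runs on
// ForkJoinPool.commonPool(), which uses platform threads
public CompletableFuture<String> newAsyncMethod() {
return CompletableFuture.supplyAsync(() -> {
return blockingOperation();
}, virtualExecutor);
}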
}
// Pattern 3: Thread-Local Migration
public class ThreadLocalMigration {
// Before: ThreadLocal for connection management
private static final ThreadLocal<Connection> connectionTL =
new ThreadLocal<>();
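// After: Scoped values (a preview API in JDK 21-24 that needs
// --enable-preview; final in JDK 25). The binding is immutable and
// lasts only for the duration of run(), unlike ThreadLocal.set()
private static final ScopedValue<Connection> connectionSV =
ScopedValue.newInstance();
public void migratedMethod() throws SQLException {
try (Connection conn = getConnection()) {
// Code called from performDatabaseOperations() reads connectionSV.get()
ScopedValue.where(connectionSV, conn)
.run(() -> performDatabaseOperations());
}
}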
}
}
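Pattern 2 is easy to get wrong: `CompletableFuture.supplyAsync(supplier)` without an executor runs on `ForkJoinPool.commonPool()`, whose workers are platform threads. The sketch below makes that visible by returning `Thread.currentThread().isVirtual()` from inside each task; only the variant that passes a virtual-thread executor runs on a virtual thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncExecutorCheck {

    // No executor argument: the task runs on ForkJoinPool.commonPool() (platform threads).
    static boolean defaultExecutorIsVirtual() {
        return CompletableFuture
                .supplyAsync(() -> Thread.currentThread().isVirtual())
                .join();
    }

    // Explicit virtual-thread-per-task executor.
    static boolean virtualExecutorIsVirtual() {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            return CompletableFuture
                    .supplyAsync(() -> Thread.currentThread().isVirtual(), executor)
                    .join();
        }
    }

    public static void main(String[] args) {
        System.out.println("default executor virtual? " + defaultExecutorIsVirtual()); // false
        System.out.println("virtual executor virtual? " + virtualExecutorIsVirtual()); // true
    }
}
```

A check like this is a cheap unit test to keep in the codebase during migration, since a forgotten executor argument compiles and runs without any warning.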
GraalVM Native Image Migration
public class NativeImageMigration {
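// Step 1: Prepare reflection configuration
// Register with --features=<fully qualified class name>; the internal
// @AutomaticFeature annotation is deprecated. A Feature must be a
// public class with a no-arg constructor, hence static here
public static class ReflectionRegistrationFeature implements Feature {
// The public Feature API has no annotation scanner, so list the
// classes explicitly (hypothetical names; substitute your own
// @Entity and @Component classes, e.g. from a build-time scan)
private static final List<String> REFLECTIVE_CLASSES = List.of(
"com.example.domain.User",
"com.example.web.UserController");
@Override
public void beforeAnalysis(BeforeAnalysisAccess access) {
for (String name : REFLECTIVE_CLASSES) {
// findClassByName returns null if the class is not on the classpath
Class<?> clazz = access.findClassByName(name);
if (clazz != null) {
registerForReflection(clazz);
}
}
}
private void registerForReflection(Class<?> clazz) {
RuntimeReflection.register(clazz);
RuntimeReflection.registerForReflectiveInstantiation(clazz);
// Register all declared methods and fields
RuntimeReflection.register(clazz.getDeclaredMethods());
RuntimeReflection.register(clazz.getDeclaredFields());
}
}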
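// Step 2: Handle native library dependencies
public class NativeLibraryConfiguration {
// @TargetClass names the class being patched; @Substitute belongs on
// the replaced method (on a class it would replace the whole class)
@TargetClass(className = "io.netty.channel.epoll.Native")
static final class SubstituteEpollNative {
@Substitute
private static void loadNativeLibrary() {
// A substituted body runs inside the native executable at run
// time, where ImageInfo.inImageBuildtimeCode() is always false,
// so load the library unconditionally; the .so must be on
// java.library.path in the runtime image
System.loadLibrary("netty_transport_native_epoll");
}
}
}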
}
Container Migration Best Practices
public class ContainerMigrationGuide {
// Dockerfile for multi-stage build
public String generateOptimizedDockerfile() {
return """
# Build stage: GraalVM Community for JDK 21 (native-image is preinstalled)
FROM ghcr.io/graalvm/native-image-community:21 AS builder
WORKDIR /app
COPY . .
# Build JAR
RUN ./gradlew build
# Build native image with optimizations
RUN native-image \
-jar build/libs/app.jar \
-H:+ReportExceptionStackTraces \
-H:+OptimizeForSize \
-H:+StaticExecutableWithDynamicLibC \
--initialize-at-build-time=org.slf4j \
--initialize-at-run-time=io.netty \
--no-fallback \
-o application
# Runtime stage
FROM debian:bookworm-slim
# Install minimal dependencies
RUN apt-get update && apt-get install -y \
libstdc++6 \
zlib1g \
&& rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/application /app/application
# Run as non-root
RUN useradd -m appuser
USER appuser
EXPOSE 8080
ENTRYPOINT ["/app/application"]
""";
}
// Kubernetes deployment optimized for JVM
public String generateOptimizedDeployment() {
return """
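apiVersion: apps/v1
kind: Deployment
metadata:
  name: optimized-jvm-app
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: optimized-jvm-app
  template:
    metadata:
      labels:
        app: optimized-jvm-app
    spec:
      containers:
        - name: app
          # JVM-based image; a GraalVM native executable ignores these -XX flags
          image: myapp:jvm
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "500m"
          env:
            # The JVM reads JAVA_TOOL_OPTIONS automatically (JAVA_OPTS only works
            # if the entrypoint passes it to java). UseContainerSupport is on by
            # default; ZGenerational is needed on JDK 21 only, since generational
            # ZGC is the default from JDK 23
            - name: JAVA_TOOL_OPTIONS
              value: >-
                -XX:+UseContainerSupport
                -XX:MaxRAMPercentage=80.0
                -XX:+UseZGC
                -XX:+ZGenerational
          startupProbe:
            httpGet:
              path: /health
              port: 8080
            failureThreshold: 30
            periodSeconds: 1
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5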
""";
}
}
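Whatever flags the deployment sets, it is worth confirming what the JVM actually derived from the container's cgroup limits. The standard `Runtime` API reports the CPU count the JVM sizes its internal pools from and the maximum heap it computed (for example from `-XX:MaxRAMPercentage`). The sketch below simply logs both at startup; its output depends entirely on the environment it runs in.

```java
public class ContainerResourceReport {

    // Maximum heap the JVM will attempt to use, in MiB
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // CPU count the JVM sees (derived from cgroup limits when container support is active)
    static int visibleCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.printf("JVM sees %d CPU(s), max heap %d MB%n", visibleCpus(), maxHeapMb());
    }
}
```

With a `cpu: "500m"` limit, current JDKs round the quota up and should report one CPU; with a 256Mi limit and `MaxRAMPercentage=80.0`, the reported heap should be roughly 200 MB.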
Performance Testing Framework
public class PerformanceRegressionFramework {
@Test
public void preventPerformanceRegression() throws RunnerException {
PerformanceBaseline baseline = loadBaseline();
PerformanceResult current = runPerformanceTests();
// Assert no regression
assertThat(current.p99Latency)
.isLessThanOrEqualTo(baseline.p99Latency * 1.1); // 10% tolerance
assertThat(current.throughput)
.isGreaterThanOrEqualTo(baseline.throughput * 0.95); // 5% tolerance
// Update baseline if improved
if (current.isBetterThan(baseline)) {
saveNewBaseline(current);
}
}
// Runner.run() declares the checked RunnerException
private PerformanceResult runPerformanceTests() throws RunnerException {
// Use JMH for accurate benchmarking
Options options = new OptionsBuilder()
.include(ApplicationBenchmark.class.getSimpleName())
.warmupIterations(3)
.measurementIterations(5)
.forks(1)
.build();
Collection<RunResult> results = new Runner(options).run();
return extractPerformanceMetrics(results);
}
}
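The two assertions encode asymmetric budgets: p99 latency may grow by up to 10% and throughput may drop by up to 5% before the test fails. Pulled out as a pure function (a hypothetical helper, not part of the framework above), the rule is easy to unit-test with concrete numbers.

```java
public class RegressionGate {

    static final double LATENCY_TOLERANCE = 1.10;    // allow up to +10% p99 latency
    static final double THROUGHPUT_TOLERANCE = 0.95; // allow up to -5% throughput

    // True when the current run stays within both budgets relative to the baseline.
    static boolean withinBudget(double baselineP99Ms, double currentP99Ms,
                                double baselineRps, double currentRps) {
        boolean latencyOk = currentP99Ms <= baselineP99Ms * LATENCY_TOLERANCE;
        boolean throughputOk = currentRps >= baselineRps * THROUGHPUT_TOLERANCE;
        return latencyOk && throughputOk;
    }

    public static void main(String[] args) {
        // Baseline: p99 = 95 ms at 8,500 req/s, so the limits are 104.5 ms and 8,075 req/s
        System.out.println(withinBudget(95, 104, 8500, 8100)); // true
        System.out.println(withinBudget(95, 105, 8500, 8100)); // false: latency over budget
        System.out.println(withinBudget(95, 100, 8500, 8000)); // false: throughput under budget
    }
}
```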
Conclusion
The JVM performance landscape in 2025 represents a paradigm shift in how we build and optimize Java applications. Virtual threads have removed the thread-per-request ceiling for I/O-bound workloads, GraalVM native images have cut startup time and memory footprint, and container-aware JVMs size themselves to their container's limits.
Key takeaways for enterprise architects:
- Virtual Threads: Enable massive concurrency with minimal resource overhead, but require careful attention to pinning and synchronization patterns.
- GraalVM Native Images: Deliver sub-second startup times and reduced memory footprint, but demand upfront configuration and testing investment.
- Container Optimization: Modern JVMs are container-aware, but optimal performance still requires explicit tuning and monitoring.
- Holistic Approach: Combine all three technologies for maximum benefit: virtual threads for concurrency, GraalVM for efficiency, and container tuning for cloud deployment.
- Continuous Monitoring: New performance characteristics require updated monitoring and profiling strategies.
The future of JVM performance is bright, with continued innovations in adaptive optimization, AI-driven tuning, and cloud-native features. By mastering these technologies today, architects can build systems that are not just faster and more efficient, but fundamentally more scalable and maintainable.
Remember: performance optimization is an iterative process. Start with measurement, implement changes incrementally, and always validate improvements in production-like environments. The tools and techniques covered in this guide provide the foundation for building the next generation of high-performance Java applications.