JITServer tuning and practical considerations

Server caches

Client-session caches

Multiple client JVMs can be connected at the same time to a single JIT server. For each client, the server maintains a client-session cache with information about the environment the client is running in (Java classes, class hierarchy, profiling information, JVM options, etc.). Typically, the information in these caches is kept separately, per client. However, if you specify the -XX:+JITServerShareROMClasses option, the read-only part of the Java classes (ROMClasses in OpenJ9 parlance) is shared between the different clients. This option can generate memory savings at the server when the connected clients run identical or similar Java applications.

The client-session caches are deleted when the clients terminate, but this can happen only if the clients are shut down gracefully, giving them the opportunity to send a termination message to the server. To address the scenario of clients ending abruptly, the server also deletes the cache for a client that hasn’t issued a compilation request for 1000 minutes, or 5 minutes under memory pressure. If needed, you can change these values with the following options:

-Xjit:oldAge=<time-in-ms>,oldAgeUnderLowMemory=<time-in-ms>
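Because the defaults are quoted in minutes while the options take milliseconds, a small conversion helper can prevent unit mistakes. The following is a minimal sketch; the function name purge_option is hypothetical and only the option names come from the text above.

```python
def purge_option(old_age_min: float, low_memory_age_min: float) -> str:
    """Build the -Xjit purge option from idle times given in minutes."""
    ms_per_minute = 60 * 1000
    old_age_ms = int(old_age_min * ms_per_minute)
    low_memory_ms = int(low_memory_age_min * ms_per_minute)
    return f"-Xjit:oldAge={old_age_ms},oldAgeUnderLowMemory={low_memory_ms}"


# The documented defaults: 1000 minutes, or 5 minutes under memory pressure.
print(purge_option(1000, 5))
# prints -Xjit:oldAge=60000000,oldAgeUnderLowMemory=300000
```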

JITServer AOT cache

The JITServer technology can cache AOT compiled methods at the server. The JITServer can, therefore, avoid carrying out an AOT compilation when a compatible AOT method body already exists in the cache, thereby saving CPU resource and improving remote compilation latency. This mechanism works in conjunction with the dynamic AOT technology at the client and therefore the client needs to have the shared class cache (SCC) enabled (the SCC is the repository for the AOT code).

When the JITServer receives an AOT compilation request, it checks its AOT cache for a compatible compiled method body. If one is not found, the server performs the AOT compilation, sends the response to the client JVM, then serializes the compiled method and stores it in its local AOT cache, for future use. If a compatible compiled method is found, the server sends the client the serialized compiled method from its cache, thus avoiding a compilation. The client deserializes the response, stores the result in its local SCC, and loads the compiled method as a regular dynamic AOT code.
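The request flow described above can be modelled in a few lines. This is a toy sketch, not OpenJ9 code: the class AotCacheModel, the string method key, and the compile callback are all assumptions made for illustration, and the real compatibility check is more involved than a dictionary lookup.

```python
from typing import Callable, Dict, Tuple


class AotCacheModel:
    """Toy model of the server-side AOT cache lookup (illustrative only)."""

    def __init__(self, compile_fn: Callable[[str], bytes]) -> None:
        self._compile = compile_fn          # stands in for an AOT compilation
        self._cache: Dict[str, bytes] = {}  # in-memory, lost when the server ends
        self.compilations = 0

    def handle_request(self, method_key: str) -> Tuple[bytes, bool]:
        """Return (serialized method body, served_from_cache)."""
        cached = self._cache.get(method_key)
        if cached is not None:
            # Cache hit: send the stored body and skip the compilation.
            return cached, True
        # Cache miss: compile, store the serialized result for future requests.
        body = self._compile(method_key)
        self.compilations += 1
        self._cache[method_key] = body
        return body, False


server = AotCacheModel(lambda key: f"code for {key}".encode())
_, first_hit = server.handle_request("Foo.bar()V")
_, second_hit = server.handle_request("Foo.bar()V")
print(first_hit, second_hit, server.compilations)  # prints False True 1
```

In this model the second client to request the same method receives the cached body, so only one compilation is ever performed for it.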

To enable this feature, specify the -XX:+JITServerUseAOTCache command line option, both at the server and at the client JVM.

A JITServer instance can have several AOT caches, each with its own name. This addresses the situation where client JVMs with significantly different execution profiles use the same JITServer instance. A client JVM can indicate the specific AOT cache that it wants to use by providing the cache name with the following command line option: -XX:JITServerAOTCacheName=<cache_name>. The default is to use a nameless cache.

Current limitations:

  • The amount of memory that an AOT cache can consume at the server is not limited. The number of caches that a JITServer can hold is also not limited.
  • The AOT cache is a non-persistent in-memory cache. If the JITServer instance ends, the cache content is lost.
  • AOT cache entries are not shared between different JITServer instances.
  • Caching works only for AOT compilation requests. For this reason, when JITServer AOT caching is enabled, the client JVM will attempt to generate as many AOT requests as possible.

Number of concurrent clients

The amount of CPU and memory resources consumed by the server is expected to increase with the number of connected clients. Finding the appropriate number of clients to connect to a server is a tricky proposition that depends on many factors: the number of methods that need to be compiled by the clients, the optimization levels for these compilations, how clients are started (staggered or not), how clients are shut down (gracefully or not), and so on.

As a rule of thumb, you should have 10-20 JVMs simultaneously connected to a server with 1-2 GB of memory. With respect to CPU resources, in Kubernetes you might want to set a low "request" value at the server (1-2 vCPUs) and a larger "limit" value (4-8 vCPUs) so that the server can soak up idle cycles. It is possible to connect even more clients to one server instance if memory and CPU resources are increased, but in general, two medium-sized server instances placed on different nodes are better than a single, larger server.
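As a rough planning aid, the rule of thumb can be turned into a quick estimate of how many server instances a fleet of clients needs. This is only a sketch: the function servers_needed is hypothetical, and the per-server figure should be adjusted to your own measurements.

```python
def servers_needed(clients: int, clients_per_server: int = 20) -> int:
    """Estimate the number of JITServer instances for a given client count."""
    if clients < 0 or clients_per_server <= 0:
        raise ValueError("clients must be >= 0 and clients_per_server > 0")
    # Ceiling division without floating point.
    return -(-clients // clients_per_server)


# 45 clients at the optimistic (20) and conservative (10) ends of the range.
print(servers_needed(45, 20), servers_needed(45, 10))  # prints 3 5
```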

Alleviating CPU congestion at the server

When too many clients connect to the server, the server can become flooded with compilation requests, leading to increased compilation times and slower start-up/ramp-up for applications. It should be noted that a client JVM issues most of its compilation requests during the start-up phase and ramp-up phase of an application, when load is first applied to it. Thus, from the CPU consumption point of view what matters is the number of clients that start-up or ramp-up concurrently. To alleviate the CPU strain on the server, you can start the client JVMs in a staggered fashion, rather than all at the same time. Sometimes the staggering happens naturally; for instance, when using Kubernetes horizontal pod auto-scaling, additional application instances are launched gradually as the load increases.

Another idea is to use the -Xjit:enableJITServerHeuristics command line option at the clients. When this option is present, the client JVMs share some of the compilation burden by performing the cheap compilations locally and sending only expensive compilations to the server. What constitutes a cheap compilation is determined by JIT heuristics that look at the method size, the optimization level, and the amount of CPU and memory available to the JVM.

Avoiding memory shortages at the server

Roughly speaking, the server uses two types of memory:

  1. "Scratch" memory. This is allocated during a compilation (for JIT internal data structures) and released to the operating system at the end of the compilation.
  2. "Persistent" memory. This is used for client-session caches and gets deleted only when a client terminates gracefully (or when the JITServer purging mechanism is triggered).

The total amount of scratch memory at any given moment depends on how many compilations are in progress and how expensive those compilations are. To reduce this amount, you can start the clients in a staggered fashion as suggested previously, or reduce the number of compilation threads per client. Note that the latter already happens automatically: when the server senses that it is about to run out of memory, it provides feedback to the connected clients to reduce their number of active compilation threads.

To reduce the amount of persistent memory, you can use the techniques described in section Server caches.

Traffic encryption

Enabling network encryption can increase the CPU overhead, both at the client and at the server. For this reason, you should turn on encryption only if needed. Note that some technologies, such as Istio, Weave, Linkerd, Calico, and Cilium, already encrypt all network traffic, so using JITServer encryption might be redundant.

Minimizing application stalls

Usually, the compilation threads in an OpenJ9 JVM execute in parallel with Java application threads. However, for correctness reasons a small number of compilations are performed synchronously, meaning that Java application threads have to wait for the compilation result before being allowed to execute the method being compiled. Since remote compilations typically take longer to complete due to network latency, application stalls caused by synchronous compilations can be more severe in a JITServer setting. If this becomes a problem, you should add the following command line option at the client:

-XX:+JITServerLocalSyncCompiles

This option instructs the client JVM to perform the synchronous compilations locally, at a low optimization level (so the compilation is relatively quick), and to follow up with remote asynchronous recompilations at a higher optimization level to avoid any performance loss.

Session affinity

For technical reasons, a client JVM must use a single JITServer at a time. In a Kubernetes environment, where a JITServer service can be backed by several server instances, you can satisfy this requirement by using session affinity. Note that if a server crashes (or gets terminated by the Kubernetes controller) the clients can connect to another server instance. This scenario imposes some performance penalty because the client-session caches that the server maintains need to be built anew. Following is an example of a Kubernetes service definition that uses sessionAffinity:

apiVersion: v1
kind: Service
metadata:
  name: jitserver
spec:
  type: ClusterIP
  selector:
    app: jitserver
  ports:
    - protocol: TCP
      port: 38400
      targetPort: 38400
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 86400

Resilience

If the client JVM does not find a compatible server to connect to, compilations are performed locally, by the client itself. To account for the case where the server is temporarily unavailable (for example, server crash followed by Kubernetes launching another server instance), from time to time the client retries to connect to a server at the indicated address and port. The retry mechanism uses an exponential back-off where the retry interval is doubled with each unsuccessful attempt.
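The doubling behaviour of the retry interval can be illustrated with a short generator. This is a sketch of exponential back-off in general, not the client's actual implementation: the initial interval and the upper bound are assumptions, because the text above states only that the interval doubles after each unsuccessful attempt.

```python
from itertools import islice
from typing import Iterator


def retry_intervals(initial_s: float, max_s: float) -> Iterator[float]:
    """Yield retry intervals that double after every failed attempt.

    The cap (max_s) is an assumption added so the interval cannot grow
    without bound; the source text does not mention one.
    """
    interval = initial_s
    while True:
        yield interval
        interval = min(interval * 2, max_s)


# Hypothetical values: start at 1 second, never wait more than 60 seconds.
print(list(islice(retry_intervals(1.0, 60.0), 8)))
# prints [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```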

Monitoring

Performance metrics

You can enable the provision of performance metrics by specifying the -XX:+JITServerMetrics command line option. After enabling this option, you can use a monitoring tool that follows the OpenMetrics standard, such as Prometheus, to collect the data by issuing an HTTP GET request to the following URL: http://<jitserveraddress>:<port>/metrics.

Note: There is a limit of four concurrent GET requests at any given time.

For more information, including the types of metrics that are provided, see the -XX:[+|-]JITServerMetrics topic.
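For a quick check without a full monitoring stack, the endpoint can also be polled from a script. The following is a minimal sketch that uses only the Python standard library; the function names are hypothetical, and the parser handles only simple, unlabeled "name value" sample lines, which is an assumption about the shape of the output.

```python
from typing import Dict
from urllib.request import urlopen


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse 'name value' sample lines, skipping comments such as # HELP."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            try:
                samples[parts[0]] = float(parts[1])
            except ValueError:
                continue  # ignore lines this simple parser cannot handle
    return samples


def fetch_metrics(address: str, port: int, timeout_s: float = 5.0) -> Dict[str, float]:
    """Issue a single HTTP GET request to the /metrics endpoint."""
    url = f"http://{address}:{port}/metrics"
    with urlopen(url, timeout=timeout_s) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Because the server accepts only a small number of concurrent GET requests, a script like this should poll sequentially rather than from several threads at once.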

Verbose logging

You can inspect the behavior of a JITServer instance by using the OpenJ9 verbose logging facility. Note that if the name of the verbose log is not specified, the relevant information is printed to stderr. When you use the -XX:+JITServerLogConnections command line option, the server prints a message to the verbose log every time a new client JVM connects to it or disconnects from it. This is an easy way to determine that the clients are able to reach the server. Example of output:

#JITServer: t= 74232 A new client (clientUID=14692403771747196083) connected. Server allocated a new client session.
#JITServer: t= 74282 A new client (clientUID=2599593246759846167) connected. Server allocated a new client session.
#JITServer: t= 86281 Client (clientUID=14692403771747196083) disconnected. Client session deleted
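Connection messages in this format are easy to post-process. The sketch below tracks which clients are still connected; the regular expressions are derived only from the sample output above, so they are an assumption about the exact message wording in other releases.

```python
import re
from typing import Iterable, Set

# Patterns inferred from the sample log lines above.
CONNECTED = re.compile(r"A new client \(clientUID=(\d+)\) connected")
DISCONNECTED = re.compile(r"Client \(clientUID=(\d+)\) disconnected")


def connected_clients(lines: Iterable[str]) -> Set[str]:
    """Return the UIDs of clients that connected and have not disconnected."""
    active: Set[str] = set()
    for line in lines:
        match = CONNECTED.search(line)
        if match:
            active.add(match.group(1))
            continue
        match = DISCONNECTED.search(line)
        if match:
            active.discard(match.group(1))
    return active


log = [
    "#JITServer: t= 74232 A new client (clientUID=14692403771747196083) connected. Server allocated a new client session.",
    "#JITServer: t= 74282 A new client (clientUID=2599593246759846167) connected. Server allocated a new client session.",
    "#JITServer: t= 86281 Client (clientUID=14692403771747196083) disconnected. Client session deleted",
]
print(connected_clients(log))  # prints {'2599593246759846167'}
```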

The server has a heart-beat thread that periodically prints to the verbose log information related to the number of clients connected, the number of active compilation threads, the amount of CPU used, the amount of available memory and the number of times the internal server caches have been cleared. This last bit of information is important for diagnosing performance problems. The heart-beat information is enabled with the following option:

-Xjit:statisticsFrequency=<period-in-ms>

Example of output:

#JITServer: CurrentTime: Aug 06 17:25:15 2021
#JITServer: Compilation Queue Size: 0
#JITServer: Number of clients : 2
#JITServer: Total compilation threads : 63
#JITServer: Active compilation threads : 2
#JITServer: Physical memory available: 14299 MB
#JITServer: CpuLoad 206% (AvgUsage 25%) JvmCpu 113%
...

A value greater than 0 for the Compilation Queue Size is a sign that the server is overloaded. Compilation requests that wait in the compilation queue face greater delays and run the risk of exceeding network timeouts. To avoid this scenario, you can reduce the number of connected clients, use the techniques described in section Alleviating CPU congestion at the server, or increase the number of compilation threads at the server by using the -XcompilationThreads option.
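The overload check described above can be automated by parsing the heart-beat output. This is a minimal sketch: the field patterns are taken from the sample output shown earlier, and the function names are hypothetical.

```python
import re
from typing import Dict

# Field patterns inferred from the sample heart-beat output.
FIELDS = {
    "queue_size": re.compile(r"Compilation Queue Size\s*:\s*(\d+)"),
    "clients": re.compile(r"Number of clients\s*:\s*(\d+)"),
    "active_threads": re.compile(r"Active compilation threads\s*:\s*(\d+)"),
}


def parse_heartbeat(text: str) -> Dict[str, int]:
    """Extract selected integer fields from one heart-beat report."""
    values: Dict[str, int] = {}
    for name, pattern in FIELDS.items():
        match = pattern.search(text)
        if match:
            values[name] = int(match.group(1))
    return values


def is_overloaded(stats: Dict[str, int]) -> bool:
    """A non-empty compilation queue is treated as a sign of overload."""
    return stats.get("queue_size", 0) > 0


report = """#JITServer: Compilation Queue Size: 0
#JITServer: Number of clients : 2
#JITServer: Total compilation threads : 63
#JITServer: Active compilation threads : 2
"""
stats = parse_heartbeat(report)
print(stats, is_overloaded(stats))
# prints {'queue_size': 0, 'clients': 2, 'active_threads': 2} False
```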

Increasing the maximum number of client threads can improve performance in high network latency settings because there can be more in-progress concurrent compilation requests. Increasing the number of threads at the server can improve performance if the server has many CPU cores available and serves a large number of clients concurrently.

More detailed diagnostics can be obtained with the option -Xjit:verbose={JITServer},verbose={compilePerformance}, which is typically used for debugging server behavior.