How to Speed Up OpenFOAM Simulations Using Parallel Processing
Wiratama
3/7/2026 · 3 min read
OpenFOAM simulations can become computationally expensive, especially when dealing with large meshes, complex geometries, or transient flow problems. A single simulation may take hours or even days to complete if it runs on only one processor. One of the most effective ways to reduce computation time is by using parallel processing. Parallel processing allows OpenFOAM to divide the computational domain into smaller subdomains that can be solved simultaneously by multiple CPU cores.


In a parallel simulation, the mesh is partitioned into several regions, and each processor is responsible for solving the governing equations in one of those regions. During the simulation, processors communicate with each other to exchange information along the boundaries between subdomains. This approach allows the workload to be distributed across multiple cores, significantly reducing total simulation time compared to a serial run.
The first step in running a parallel simulation in OpenFOAM is preparing the case for domain decomposition. This is done by configuring the decomposeParDict file located inside the system directory of the case folder. This file defines how the computational domain will be divided and, through its numberOfSubdomains entry, how many subdomains will be created. That number should equal the number of processes the solver is later launched with, and it is usually chosen to match the number of physical CPU cores available for the simulation.
Inside the decomposeParDict file, users can select different decomposition methods. Some common methods include simple, hierarchical, and scotch. The simple method splits the domain geometrically into a specified number of pieces in each coordinate direction. The hierarchical method works the same way but also lets the user set the order in which the directions are split. The scotch method is often preferred for complex geometries because it requires no geometric input and partitions the mesh automatically, aiming to balance the cell count per processor while keeping the number of faces shared between processors low.
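To make the settings above concrete, here is a minimal Python sketch that renders the text of a decomposeParDict. The function name render_decompose_par_dict and its arguments are made up for this illustration; only the dictionary keywords (numberOfSubdomains, method, simpleCoeffs, hierarchicalCoeffs, n, delta, order) come from OpenFOAM. Treat it as a starting point under those assumptions and check the output against the documentation of your OpenFOAM version.

```python
from math import prod

# Standard OpenFOAM dictionary header for system/decomposeParDict.
HEADER = """FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}
"""


def render_decompose_par_dict(n_subdomains, method="scotch", n=None, order="xyz"):
    """Return the text of a minimal decomposeParDict (illustrative helper)."""
    lines = [
        HEADER,
        f"numberOfSubdomains {n_subdomains};",
        "",
        f"method          {method};",
    ]
    if method in ("simple", "hierarchical"):
        # The splits per direction must multiply out to the subdomain count.
        if n is None or prod(n) != n_subdomains:
            raise ValueError("n must multiply out to numberOfSubdomains")
        coeffs = [
            f"    n           ({n[0]} {n[1]} {n[2]});",
            "    delta       0.001;",
        ]
        if method == "hierarchical":
            # Order in which the coordinate directions are split.
            coeffs.append(f"    order       {order};")
        lines += ["", f"{method}Coeffs", "{", *coeffs, "}"]
    return "\n".join(lines) + "\n"


# Example: four subdomains, split 2 x 2 x 1 with the simple method.
print(render_decompose_par_dict(4, "simple", n=(2, 2, 1)))
```

With the default method (scotch) no coefficient block is written, which reflects why that method is convenient for complex geometries: only the number of subdomains has to be chosen.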
Once the decomposition configuration is defined, the next step is to execute the decomposePar command from the case folder. This command reads the mesh and splits it into subdomains according to the settings in decomposeParDict. After running this command, one directory per subdomain is created inside the case folder, named processor0, processor1, and so on. Each processor directory contains a portion of the mesh and the associated field data.
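A quick way to confirm that the decomposition produced what was requested is to count the processor directories. The sketch below assumes the default layout with one processorN directory per subdomain (the collated file format uses a different directory name and is not covered here). The helper names processor_indices and decomposition_matches are hypothetical.

```python
import re
import tempfile
from pathlib import Path


def processor_indices(case_dir):
    """Return the sorted indices N of all processorN directories in a case."""
    pattern = re.compile(r"processor(\d+)")
    indices = []
    for entry in Path(case_dir).iterdir():
        match = pattern.fullmatch(entry.name)
        if entry.is_dir() and match:
            indices.append(int(match.group(1)))
    return sorted(indices)


def decomposition_matches(case_dir, n_subdomains):
    """True if processor0 .. processor(n-1) all exist and nothing else does."""
    return processor_indices(case_dir) == list(range(n_subdomains))


# Demo on a throwaway directory that mimics a case decomposed for 4 cores.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("system", "constant", "processor0", "processor1",
                 "processor2", "processor3"):
        (Path(tmp) / name).mkdir()
    print(decomposition_matches(tmp, 4))
```

Running the same check against a real case folder only requires replacing the temporary directory with the path to that case.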
After the domain has been decomposed, the simulation can be executed in parallel. This is typically done by launching the selected OpenFOAM solver through mpirun with the -parallel option, for example mpirun -np 4 simpleFoam -parallel for a case decomposed into four subdomains. The value given to -np sets how many processes are started and must match the number of subdomains in decomposeParDict. Each processor performs calculations on its assigned subdomain while communicating with neighboring processors when necessary.
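When the launch is scripted, it helps to build the command from the same core count that was used for decomposition so the two cannot drift apart. The following is a small sketch of that idea; parallel_command is a hypothetical helper, and simpleFoam is only an example solver, so substitute whichever solver the case actually uses.

```python
import shlex


def parallel_command(solver, n_procs):
    """Build the argument list for a parallel OpenFOAM solver run."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    # -np sets the number of MPI processes; -parallel tells the solver
    # to read the decomposed case from the processor directories.
    return ["mpirun", "-np", str(n_procs), solver, "-parallel"]


# Print the command as it would be typed in a terminal.
print(shlex.join(parallel_command("simpleFoam", 4)))
```

The list form can be passed directly to subprocess.run, while shlex.join gives a readable string for logs or job scripts.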
During the simulation, OpenFOAM automatically handles the exchange of information between subdomains. For example, velocity and pressure values along the boundaries between partitions must be shared so that the solution remains physically consistent. Efficient communication between processors is an important factor that influences the overall speed of parallel simulations.
Once the simulation finishes, the results are still stored separately inside the processor directories. To combine them into a single dataset, the reconstructPar command is used. This command gathers the results from all processors and reconstructs the full computational domain. After reconstruction, the results can be visualized normally using tools such as ParaView.
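Taken together, the three steps (decompose, solve, reconstruct) form a fixed sequence that is easy to script. The sketch below is one possible way to do it in Python. The function names, the dry_run switch, and the example solver simpleFoam are assumptions for illustration; with dry_run=True nothing is executed and the commands are only printed, so the script can be inspected safely before running it in an environment where OpenFOAM is loaded.

```python
import shlex
import subprocess


def workflow_commands(solver, n_procs):
    """Return the decompose / solve / reconstruct commands in order."""
    return [
        ["decomposePar"],
        ["mpirun", "-np", str(n_procs), solver, "-parallel"],
        ["reconstructPar"],
    ]


def run_workflow(case_dir, solver, n_procs, dry_run=True):
    """Print the commands (dry run) or execute them inside case_dir."""
    handled = []
    for cmd in workflow_commands(solver, n_procs):
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops the sequence as soon as one step fails.
            subprocess.run(cmd, cwd=case_dir, check=True)
        handled.append(cmd)
    return handled


# Dry run: shows what would be executed for a 4-core simpleFoam case.
run_workflow(".", "simpleFoam", 4)
```

Stopping on the first failure matters here: running the solver after a failed decomposition, or reconstructing after a crashed solver, would only produce misleading results.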
Parallel processing can significantly reduce simulation time, but its efficiency depends on several factors. One important factor is the size of the mesh relative to the number of processors. If each processor receives too few cells, the time spent on communication between processors can outweigh the time saved on computation. In such cases, adding more processors may not improve performance and can even slow the simulation down.
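A rough sanity check before choosing a core count is to look at how many cells each core would receive. The sketch below does that arithmetic. The threshold min_cells_per_core is deliberately left as an input: a sensible value depends on the solver, the case, and the hardware, and the 50,000 used in the example is only a placeholder for illustration, not a recommendation.

```python
def cells_per_core(total_cells, core_counts):
    """Map each candidate core count to the average cells per core."""
    return {n: total_cells // n for n in core_counts}


def max_useful_cores(total_cells, min_cells_per_core):
    """Largest core count that keeps every core above the chosen threshold."""
    return max(1, total_cells // min_cells_per_core)


# Example with a 2-million-cell mesh and a placeholder threshold.
total = 2_000_000
for cores, cells in cells_per_core(total, [4, 16, 64, 256]).items():
    print(f"{cores:4d} cores: about {cells} cells per core")
print("upper bound on cores:", max_useful_cores(total, 50_000))
```

The only reliable way to find the real limit is to time a few short runs at different core counts, but this estimate helps rule out obviously oversized requests.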
Another factor is load balancing. Ideally, each processor should receive roughly the same amount of computational work. If some processors receive much larger portions of the mesh than others, those processors will become bottlenecks while others remain idle. Choosing an appropriate decomposition method helps ensure balanced workloads.
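Load balance can be quantified from the cell counts that decomposePar reports for each processor. The sketch below assumes the log contains one line of the form "Number of cells = N" per processor, which is how decomposePar output commonly looks; the sample text is invented for the example, and the helper names are hypothetical. The imbalance is expressed as the largest subdomain divided by the average, so 1.0 means perfectly balanced.

```python
import re


def cells_per_processor(log_text):
    """Extract the per-processor cell counts from decomposePar output."""
    return [int(m) for m in re.findall(r"Number of cells = (\d+)", log_text)]


def load_imbalance(cell_counts):
    """Ratio of the largest subdomain to the average subdomain size."""
    if not cell_counts:
        raise ValueError("no cell counts found")
    average = sum(cell_counts) / len(cell_counts)
    return max(cell_counts) / average


# Invented sample that mimics the relevant lines of a decomposePar log.
sample_log = """
Processor 0
    Number of cells = 2500
Processor 1
    Number of cells = 2400
Processor 2
    Number of cells = 2600
Processor 3
    Number of cells = 2500
"""

counts = cells_per_processor(sample_log)
print(counts, round(load_imbalance(counts), 3))
```

Because every processor has to wait for the slowest one at each step, the imbalance ratio gives a rough indication of how much time the less loaded processors spend idle.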
Hardware configuration also plays a role in parallel performance. Simulations that run on multi-core workstations may experience faster communication between processors because they share the same memory system. In contrast, simulations distributed across multiple nodes in a cluster rely on network communication, which may introduce additional latency.
Efficient parallel simulations also require careful monitoring of solver performance. Checking solver logs, monitoring residuals, and evaluating CPU usage can help determine whether the simulation is scaling efficiently. If performance gains become small when adding more processors, it may indicate that communication overhead is limiting further speed improvements.
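Scaling can be judged with two simple numbers: speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by the number of processors). The sketch below reads the last "ExecutionTime = ... s" entry that OpenFOAM solvers write to their logs and computes both quantities. The function names are hypothetical and the timings in the example are made-up round numbers chosen only to show the calculation, not measurements.

```python
import re


def last_execution_time(log_text):
    """Return the final ExecutionTime value (seconds) found in a solver log."""
    times = re.findall(r"ExecutionTime = ([\d.]+) s", log_text)
    if not times:
        raise ValueError("no ExecutionTime entries found")
    return float(times[-1])


def scaling(serial_time, parallel_times):
    """Map each core count to (speedup, efficiency) relative to a serial run."""
    result = {}
    for n_procs, seconds in sorted(parallel_times.items()):
        speedup = serial_time / seconds
        result[n_procs] = (speedup, speedup / n_procs)
    return result


# Made-up timings in seconds, for illustration only.
serial = last_execution_time("ExecutionTime = 1000 s  ClockTime = 1002 s")
for n, (s, e) in scaling(serial, {2: 500.0, 4: 250.0, 8: 200.0}).items():
    print(f"{n} cores: speedup {s:.2f}, efficiency {e:.2f}")
```

When the efficiency drops sharply from one core count to the next, as it does at eight cores in this invented example, that is the point where communication overhead has started to dominate.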
Using parallel processing effectively allows engineers to tackle much larger CFD problems than would otherwise be possible. Large meshes, transient simulations, and high-resolution models become manageable when computational workloads are distributed across multiple processors. By understanding how domain decomposition, solver execution, and result reconstruction work together, OpenFOAM users can significantly accelerate their simulations while maintaining reliable results.
cfdcourse.com
© 2026. All rights reserved.
