Naive PINNs are known to struggle with physical processes that are sensitive to small changes in the input and that demand high accuracy to capture their dynamics. Examples include multi-scale problems and singular perturbation problems, which are highly relevant to domains such as fluid dynamics and climate modeling.
It turns out that other machine learning algorithms face the same issue, and a promising remedy is the “Gradient Boosting” method. A natural question therefore arises: can we mimic the gradient boosting algorithm to train PINNs? The paper answers in the affirmative.
Boosting is a general machine learning algorithm that can be succinctly expressed in the following iterative form:
fₘ(x) = fₘ₋₁(x) + ρₘ hₘ(x)
At each boosting round m, an incremental model hₘ(•) is derived and added, scaled by a learning rate ρₘ, on top of the predictor from the previous iteration fₘ₋₁(•), so that the accuracy of fₘ(•) is “boosted”.
Now, if we take fₘ₋₁(•), fₘ(•), and hₘ(•) to be physics-informed neural networks, we can train PINNs with the boosting algorithm. A diagram to showcase the training process is given below:
In the paper’s implementation, the architecture and the hyperparameters of each additive PINN model hₘ(•) are pre-determined. This differs from the original gradient-boosting algorithm, which chooses each hₘ(•) by gradient descent in function space, that is, by fitting it to the negative gradient of the loss. However, the authors argue that pre-selected hₘ(•)s can still mimic the behavior of the boosting algorithm, at significantly reduced computational complexity.
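The additive update can be sketched on a toy problem. The example below is a minimal illustration under stated assumptions, not the paper's implementation: the test equation u''(x) = g(x), the `make_learner` helper, the `bc_weight` value, and the learning rates are all hypothetical, and each hₘ is a fixed random-feature model trained by linear least squares rather than a neural network trained by gradient descent. What it keeps from the method is the structure: each round has a pre-determined model form, and hₘ is trained so that fₘ₋₁ + ρₘ hₘ minimizes the physics loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (an assumption, not from the paper): u''(x) = g(x) on (0, 1),
# u(0) = u(1) = 0, with g chosen so that the exact solution is sin(pi * x).
x_col = np.linspace(0.0, 1.0, 200)   # collocation points for the ODE residual
x_bc = np.array([0.0, 1.0])          # boundary points
g = -np.pi**2 * np.sin(np.pi * x_col)
bc_weight = 10.0                     # hypothetical weight on the boundary term


def make_learner(n_feat, scale):
    """Fixed random tanh features; only the output weights w are trained."""
    a = rng.normal(0.0, scale, n_feat)
    b = rng.uniform(-scale, scale, n_feat)

    def phi(x):      # feature values, shape (len(x), n_feat)
        return np.tanh(np.outer(x, a) + b)

    def phi_xx(x):   # exact second derivative of each feature w.r.t. x
        t = np.tanh(np.outer(x, a) + b)
        return -2.0 * t * (1.0 - t**2) * a**2

    return phi, phi_xx


def stacked(r_pde, r_bc):
    """Stack ODE and weighted boundary residuals into one vector."""
    return np.concatenate([r_pde, bc_weight * r_bc])


# f_0 = 0, so the initial residuals are -g (ODE) and 0 (boundary).
r_pde, r_bc = -g, np.zeros(2)
losses = [np.sum(stacked(r_pde, r_bc) ** 2)]
stages = []

rhos = [1.0, 0.5, 0.25]              # hypothetical learning rates, one per round
for m, rho in enumerate(rhos):
    # The form of h_m (size, feature scale) is fixed before training.
    phi, phi_xx = make_learner(n_feat=20, scale=2.0 * (m + 1))
    A_pde, A_bc = rho * phi_xx(x_col), rho * phi(x_bc)
    # Train h_m so that f_{m-1} + rho * h_m minimizes the physics loss.
    A = np.vstack([A_pde, bc_weight * A_bc])
    w, *_ = np.linalg.lstsq(A, -stacked(r_pde, r_bc), rcond=None)
    r_pde, r_bc = r_pde + A_pde @ w, r_bc + A_bc @ w
    stages.append((rho, phi, w))
    losses.append(np.sum(stacked(r_pde, r_bc) ** 2))


def predict(x):
    """Evaluate f_M(x) = sum over rounds of rho_m * h_m(x)."""
    return sum(rho * phi(x) @ w for rho, phi, w in stages)


print(losses)
```

Because every round can fall back to hₘ = 0, the physics loss in this sketch cannot increase from one round to the next. Note that in this linear toy setting ρₘ only rescales the fitted weights; with gradient-trained networks it would also affect the optimization itself.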
According to the numerical experiments conducted in the paper, three to five PINNs are usually enough to deliver satisfactory results. For the learning rate ρₘ, the suggested approach is to start with ρ = 1 and decay it exponentially as m increases.
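An exponentially decaying schedule of this kind could look like the sketch below. The function name `learning_rates` and the decay factor of 0.5 are assumptions for illustration; the post only states the initial value of 1 and that the decay is exponential. Rounds are indexed from 0 here, so the first round uses ρ = 1.

```python
def learning_rates(n_rounds, rho0=1.0, decay=0.5):
    """Return rho0 * decay**m for m = 0, ..., n_rounds - 1.

    decay=0.5 is a hypothetical choice, not a value taken from the paper.
    """
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must lie in (0, 1]")
    return [rho0 * decay**m for m in range(n_rounds)]


print(learning_rates(4))  # [1.0, 0.5, 0.25, 0.125]
```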
2.3 Why the solution might work
As the proposed solution mimics the mechanism of traditional “Gradient Boosting”, it inherits the main benefit of that approach: by sequentially adding weak models, each new model can correct the mistakes made by the previous ones, thus iteratively improving the overall performance. This makes the approach especially effective for challenging problems such as multi-scale or singular perturbation problems.
Meanwhile, with boosting, a “strong” model can still be obtained even if the component model at each boosting stage is relatively “weak”. This property makes the overall PINN model less sensitive to hyperparameter settings.
2.4 Benchmark
The paper benchmarked the performance of the proposed strategy on four diverse problems, each representing a distinct mathematical challenge:
- 1D singular perturbation problem: singular perturbation problems are special cases where certain terms in the equations become disproportionately small or large, leading to different behaviors that are challenging to model. These problems occur in many areas of science and engineering, such as fluid dynamics, electrical circuits, and control systems.
- 2D convection-dominated diffusion equation: this equation models physical phenomena where the convection effect (transport due to bulk motion) is much stronger than the diffusion effect (transport due to concentration gradients). These types of problems occur in various areas like meteorology (where wind disperses pollutants) and oceanography (where ocean currents transport heat).
- 2D convection-dominated diffusion problem (featuring curved streamlines and an interior boundary layer): this is a more complex variant of the previous problem where the flow pattern is curved, and there is a significant boundary layer within the problem domain. These complications require a more sophisticated numerical approach and make the problem more representative of real-world challenges.
- 2D nonlinear reaction-diffusion equation (time-dependent): this equation models reactions combined with the diffusion of substances, but it’s also nonlinear and changes over time. These types of problems are common in fields like biology and chemistry, where substances interact and spread in a medium, and the reaction rates can change over time.
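To make the difficulty of the first problem class concrete, the sketch below evaluates the exact solution of a standard textbook singular perturbation problem, -ε u'' + u' = 1 with u(0) = u(1) = 0. This specific equation is an assumption chosen for illustration and is not necessarily the one used in the paper. As ε shrinks, the solution develops a thin boundary layer near x = 1 whose slope grows roughly like 1/ε, which is the kind of sharp feature a single PINN tends to miss.

```python
import numpy as np


def exact_solution(x, eps):
    """Exact solution of -eps * u'' + u' = 1 with u(0) = u(1) = 0."""
    layer = np.exp(-(1.0 - x) / eps)
    return x - (layer - np.exp(-1.0 / eps)) / (1.0 - np.exp(-1.0 / eps))


def exact_slope(x, eps):
    """First derivative u'(x) of the exact solution."""
    layer = np.exp(-(1.0 - x) / eps)
    return 1.0 - layer / (eps * (1.0 - np.exp(-1.0 / eps)))


# The slope at the right boundary steepens as eps decreases.
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, abs(exact_slope(1.0, eps)))
```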
The benchmark studies showed that:
- the proposed algorithm showed substantial accuracy improvements across all test cases compared with naive PINNs;
- the proposed algorithm showed robustness, with little sensitivity to the hyperparameter choices.
2.5 Strengths and Weaknesses
Strengths
- Significantly improved accuracy compared to a single PINN.
- Robust against the choice of network structure and arrangement.
- Less effort is required for fine-tuning hyperparameters.
- Flexible and can be easily integrated with other PINN techniques.
Weaknesses
- Not suitable for solving conservation laws with derivative blow-ups (e.g., inviscid Burgers’ equation, Sod shock tube problem, etc.), because the solutions of these equations are insensitive to the PDE loss.
- Limited scalability, as training multiple neural networks in sequence may require more computational resources and time.
Since this is the first paper to introduce the boosting algorithm to the PINN domain, there is currently no directly comparable work.
Nevertheless, in terms of enhancing the PINN’s capability of modeling challenging physical processes, the paper specifically mentions the work of Krishnapriyan et al. There, the strategy is to divide the time domain into sub-intervals and to build PINNs progressively, one for each sub-interval (similar to the idea covered in the previous blog).
The current paper compared Krishnapriyan’s approach with the newly proposed one in the last benchmark case study (section 2.4 above). The results showed that the proposed boosting approach achieves an error that is 4 times lower.
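For readers who want to run this kind of comparison themselves, the sketch below computes a relative L2 error and the ratio between two methods. The post does not state which error metric underlies the factor of 4, so the choice of relative L2 error (common in the PINN literature) is an assumption, and the arrays are made-up numbers, not the paper's data.

```python
import numpy as np


def relative_l2_error(u_pred, u_true):
    """Return ||u_pred - u_true||_2 / ||u_true||_2."""
    u_pred = np.asarray(u_pred, dtype=float)
    u_true = np.asarray(u_true, dtype=float)
    return np.linalg.norm(u_pred - u_true) / np.linalg.norm(u_true)


# Made-up values for illustration only.
u_true = np.array([3.0, 4.0])
u_method_a = np.array([3.0, 4.4])   # hypothetical baseline prediction
u_method_b = np.array([3.0, 4.1])   # hypothetical improved prediction

err_a = relative_l2_error(u_method_a, u_true)
err_b = relative_l2_error(u_method_b, u_true)
print(err_a / err_b)  # how many times lower the second error is
```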