## 2.2 Solution 💡

The key idea here is to **re-formulate the PINN loss function**.

Specifically, we can introduce a dynamic weighting scheme to account for the different contributions of the PDE residual loss evaluated at different temporal locations. Let's break it down with illustrations.

For simplicity, let's assume the collocation points are uniformly sampled in the spatio-temporal domain of our simulation, as illustrated in the figure below:

To proceed with one step of gradient descent, we must first calculate the cumulative PDE residual loss across all collocation points. One way to do that is to first calculate the losses associated with the collocation points sampled at individual time instances, and then perform a "simple sum" to obtain the total loss. A gradient descent step can then be taken on this total loss to optimize the PINN weights.
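This "group by time, then sum" computation can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: `pde_residual`, the grid sizes, and the zero-valued predictions are all hypothetical stand-ins.

```python
import numpy as np

def pde_residual(u_pred, x, t):
    # Hypothetical stand-in: in a real PINN this would apply automatic
    # differentiation to the network output to form the PDE residual.
    return u_pred - np.sin(np.pi * x) * np.exp(-t)

# Uniformly sampled collocation points on a space-time grid
xs = np.linspace(0.0, 1.0, 8)        # spatial locations
ts = np.linspace(0.0, 1.0, 5)        # time instances t_1, ..., t_5
u = np.zeros((len(ts), len(xs)))     # stand-in for the network predictions

# Temporal residual loss L_r(t_i): mean squared residual at time t_i
L_t = np.array([np.mean(pde_residual(u[i], xs, ts[i]) ** 2)
                for i in range(len(ts))])

# Vanilla PINN: a "simple sum" (equal weights) over all time instances
total_loss = L_t.sum()
```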

Of course, the exact order of summation over collocation points doesn't influence the total loss; all orderings yield the same result. However, grouping the loss calculations by temporal order is deliberate: it emphasizes the element of "temporality", a concept that is crucial for understanding the proposed causal training strategy.

In this process, the PDE residual losses evaluated at different temporal locations are treated equally, meaning that all temporal residual losses are minimized simultaneously.

This approach, however, risks the PINN violating temporal causality, as it doesn't enforce a chronological order for minimizing the temporal residual losses at successive time intervals.

So, how can we coax the PINN into adhering to temporal precedence during training?

The secret lies in **selectively weighting individual temporal residual losses**. For instance, suppose that at the current iteration, we want the PINN to focus on approximating the solution at time instance *t*₁. Then we can simply put a higher weight on Lᵣ(*t*₁), the temporal residual loss at *t*₁. This way, Lᵣ(*t*₁) becomes the dominant component of the total loss, so the optimization algorithm will prioritize minimizing Lᵣ(*t*₁), which aligns with our goal of approximating the solution at *t*₁ first.

In the subsequent iteration, we shift our focus to the solution at time instance *t*₂. By increasing the weight on Lᵣ(*t*₂), it becomes the main contributor to the total loss, so the optimization algorithm is directed towards minimizing Lᵣ(*t*₂), improving the prediction accuracy at *t*₂.

As this walk-through shows, varying the weights assigned to the temporal residual losses at different time instances lets us direct the PINN to approximate the solution at time instances of our choosing.
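To make this concrete, here is a small numerical illustration of how boosting a single weight makes one time instance dominate the total loss. The loss values and the weight magnitude are made up for illustration.

```python
import numpy as np

# Hypothetical per-time residual losses L_r(t_i) for five time instances
L_t = np.array([0.9, 1.1, 1.0, 1.2, 0.8])

weights = np.ones_like(L_t)
weights[0] = 100.0                    # emphasize t_1 at the current iteration

total_loss = np.sum(weights * L_t)    # weighted total loss fed to the optimizer

# The t_1 term now dominates, so gradient descent mostly reduces L_r(t_1)
share_t1 = weights[0] * L_t[0] / total_loss
```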

So how does this help incorporate a causal structure into PINN training? It turns out we can design a causal training algorithm (as proposed in the paper) such that **the weight for the temporal residual loss at time *t*, i.e., Lᵣ(*t*), is significant only when the losses before *t* (Lᵣ(*t*-1), Lᵣ(*t*-2), etc.) are sufficiently small**. This effectively means the neural network begins minimizing Lᵣ(*t*) only once it has achieved satisfactory approximation accuracy at the preceding steps.

To determine the weight, the paper proposes a simple formula: the weight ωᵢ is set to decay exponentially with the magnitude of the cumulative temporal residual loss over all previous time instances. This ensures that ωᵢ becomes active (i.e., takes a sufficiently large value) only when the cumulative loss from all previous time instances is small, i.e., when the PINN can already accurately approximate the solutions at previous time steps. This is how *temporal causality* is reflected in PINN training.
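This rule can be written as ωᵢ = exp(−ε · Σₖ₌₁^(i−1) Lᵣ(tₖ)), where ε > 0 controls how strictly causality is enforced. Here is a small NumPy sketch of the weighting; the loss values are purely illustrative.

```python
import numpy as np

def causal_weights(L_t, eps):
    # omega_i = exp(-eps * sum of L_r(t_k) for k < i); omega_1 = 1
    cum_prev = np.concatenate(([0.0], np.cumsum(L_t)[:-1]))
    return np.exp(-eps * cum_prev)

# Early in training: large residuals at early times keep later weights inactive
w_early = causal_weights(np.array([5.0, 4.0, 6.0, 5.0]), eps=1.0)

# Later in training: once earlier losses are small, later weights "switch on"
w_late = causal_weights(np.array([0.01, 0.02, 0.01, 0.03]), eps=1.0)
```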

With all components explained, we can piece together the full causal training algorithm as follows:
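The structure of that loop can be sketched in a runnable toy example. Everything here is a stand-in, not the paper's implementation: `temporal_losses` replaces the auto-differentiated per-time PDE residual losses with a simple quadratic, and the gradient is written analytically. Note that the weights are treated as constants, so no gradient flows through them.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=4)                 # stand-in for the PINN parameters
targets = np.array([0.5, -0.3, 0.8, 0.1])  # hypothetical per-time "solutions"

def temporal_losses(theta):
    # Toy stand-in for L_r(t_i): one scalar loss per time instance
    return (theta - targets) ** 2

eps, lr, delta = 1.0, 0.1, 0.99
for step in range(500):
    L_t = temporal_losses(theta)
    # Causal weights, treated as constants (no gradient flows through them)
    cum_prev = np.concatenate(([0.0], np.cumsum(L_t)[:-1]))
    w = np.exp(-eps * cum_prev)
    # Gradient of sum_i w_i * L_r(t_i) w.r.t. theta (analytic for the toy loss)
    grad = w * 2.0 * (theta - targets)
    theta -= lr * grad
    if np.min(w) > delta:                  # proposed stopping criterion
        break
```

In a real PINN, `L_t` would come from auto-differentiated PDE residuals and `grad` from back-propagation through the weighted loss, with a stop-gradient applied to `w`.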

Before we conclude this section, there are two remarks worth mentioning:

- The paper suggests using the magnitudes of the ωᵢ's as a stopping criterion for PINN training. Specifically, when all ωᵢ's exceed a pre-defined threshold δ, the training may be deemed complete. The recommended value for δ is 0.99.
- Selecting a proper value for ε is important. Although this value can be tuned via conventional hyperparameter tuning, the paper recommends an annealing strategy for adjusting ε. Details can be found in the original paper (Section 3).
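To see why ε matters, compare the weights produced by different ε values for the same (made-up) per-time losses: a tiny ε recovers near-uniform weighting, as in vanilla PINN training, while a huge ε freezes all but the earliest time instance, which can slow training; hence the annealing strategy.

```python
import numpy as np

L_t = np.array([0.5, 0.5, 0.5, 0.5])   # identical per-time losses (made up)
cum_prev = np.concatenate(([0.0], np.cumsum(L_t)[:-1]))

for eps in (0.01, 1.0, 100.0):
    w = np.exp(-eps * cum_prev)
    print(f"eps={eps:>6}: weights = {np.round(w, 6)}")
```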

## 2.3 Why the solution might work 🛠️

By dynamically weighting temporal residual losses evaluated at different time instances, the proposed algorithm is able to steer the PINN training to first approximate PDE solutions at earlier times before even trying to resolve the solution at later times.

This property facilitates the explicit incorporation of temporal causality into the PINN training and constitutes the key factor in potentially more accurate simulations of physical systems.

## 2.4 Benchmark ⏱️

The paper considered a total of three benchmark equations. All problems are forward problems, where the PINN is used to solve the PDEs.

- Lorenz system: these equations arise in studies of convection and instability in planetary atmospheres. The Lorenz system exhibits strong sensitivity to its initial conditions and is known to be challenging for vanilla PINNs.

- Kuramoto-Sivashinsky equation: this equation describes the dynamics of various wave-like patterns, such as flames, chemical reactions, and surface waves. It is known to exhibit a wealth of spatiotemporal chaotic behavior.

- Navier-Stokes equations: this set of partial differential equations describes the motion of fluids and constitutes the fundamental equations of fluid mechanics. The paper considered a classical two-dimensional decaying-turbulence example in a square domain with periodic boundary conditions.

The benchmark studies showed that:

- The proposed causal training algorithm achieved 10-100x improvements in accuracy compared to the vanilla PINN training scheme.
- PINNs equipped with the causal training algorithm can successfully simulate highly nonlinear, multi-scale, and chaotic systems.

## 2.5 Strengths and Weaknesses ⚡

**Strengths** 💪

- Respects the causality principle and makes PINN training more transparent.
- Introduces significant accuracy improvements, allowing PINNs to tackle problems that have previously remained elusive.
- Provides a practical quantitative criterion for assessing the training convergence of PINNs.
- Negligible added computational cost compared to the vanilla PINN training strategy: the only extra work is computing the ωᵢ's, which is cheap compared to the auto-diff operations.

**Weaknesses** 📉

- Introduces a new hyperparameter ε, which controls the scheduling of the weights for the temporal residual losses (although the authors proposed an annealing strategy to avoid tedious hyperparameter tuning).
- Complicates the PINN training workflow. Special attention should be given to the temporal weights ωᵢ, as they are now functions of the network's trainable parameters (e.g., layer weights and biases), and the gradient associated with the computation of ωᵢ should not be back-propagated.

## 2.6 Alternatives 🔄

There are a couple of alternative methods that address the same issue as the "causal training algorithm":

- Adaptive time sampling strategy (Wight et al.): instead of weighting the collocation points at different time instances, this strategy modifies the sampling density of the collocation points. This has a similar effect of shifting the optimizer's focus toward minimizing the temporal losses at different time instances.
- "Time-marching"/"curriculum training" strategy (e.g., Krishnapriyan et al.): temporal causality is respected by learning the solution sequentially within separate time windows.

However, compared to those alternatives, the "causal training algorithm" puts temporal causality front and center, is more adaptable to a variety of problems, and enjoys a low added computational cost.

There are several possibilities to further improve the proposed strategy:

- Incorporating more sophisticated data sampling strategies, such as adaptive- and residual-based sampling methods, to further improve the training efficiency and accuracy.

To learn more about how to optimize the residual points distribution, check out this blog in the PINN design pattern series.

- Extend to inverse problem settings. Ensuring causality when point sources of information (i.e., observational data) are available would require an extension of the currently proposed training strategy.

In this blog, we looked at how to bring causality to PINN training with a reformulation of the training objectives. Here are the highlights of the design pattern proposed in the paper:

- [Problem]: How can we make PINNs respect the causality principle underpinning physical systems?
- [Solution]: **re-formulating the PINN training objective**, where a dynamic weighting scheme is introduced to gradually shift the training focus from earlier time steps to later ones.
- [Potential benefits]: 1. Significantly improved PINN accuracy. 2. Expanded applicability of PINNs to complex problems.

Here is the PINN design card to summarize the takeaways:

I hope you found this blog useful! To learn more about PINN design patterns, feel free to check out previous posts:

Looking forward to sharing more insights with you in the upcoming blogs!

[1] Wang et al., Respecting causality is all you need for training physics-informed neural networks, arXiv, 2022.