Model-Free Reinforcement Learning for Chemical Process Development | by Georgi Tancev | Jul, 2023


Towards Universal Chemical Process Operators

Photo by Alex Kondratiev on Unsplash

Process development, design, optimization, and control are some of the main duties within chemical and process engineering. In concrete terms, the goal is to find an optimal recipe or a suitable configuration of equipment and process parameters (via laboratory experiments) so that certain objectives (e.g., yield or throughput) are maximized while constraints (e.g., input concentrations, flow rates, reactor volumes, or boiling points of solvents) are respected. Automating these tasks, e.g., through laboratory robots, could save a great deal of manual labor.

Recent progress in reinforcement learning (RL) has made it clear that agents can master complex tasks and play a variety of games, or even discover more efficient mathematical procedures, e.g., for matrix operations. Given kinetic parameters, either from experiments or numerical simulations, agents may find optimal configurations and synthesis recipes. In contrast to convex optimization, however, the trained algorithm/model can be used directly for process control. The experiments needed for training can take place either on the computer or directly in the laboratory, depending on the sample efficiency of the method. In the long term, this would (partially) automate process development. This article illustrates the idea using the synthesis of paracetamol as an example, with proximal policy optimization (PPO) as the learning algorithm.

We have a computer program, a so-called agent, which we call here a universal chemical process operator. This operator finds itself in an environment in which it can perform chemical operations, i.e., actions. Such actions include dosing component A, increasing/decreasing input/output flow, increasing/decreasing temperature, and so on. As the agent performs actions in certain states, such as the concentrations of certain components, it transitions into new states.
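The interaction between operator and environment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `ToyReactorEnv`, its three actions, the step sizes, and the units are hypothetical and not taken from the article, and a random policy stands in for a trained agent.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyReactorEnv:
    """Hypothetical environment: the state is one concentration and one temperature."""

    concentration: float = 0.0  # mol/L of component A (illustrative units)
    temperature: float = 300.0  # K (illustrative starting value)

    # Class-level tuple of the actions the operator may choose from (assumed set).
    ACTIONS = ("dose_A", "heat", "cool")

    def state(self):
        """Return the current state as a (concentration, temperature) tuple."""
        return (self.concentration, self.temperature)

    def step(self, action):
        """Apply one action and return the new state."""
        if action == "dose_A":
            self.concentration += 0.1  # assumed dosing increment
        elif action == "heat":
            self.temperature += 5.0  # assumed heating increment
        elif action == "cool":
            self.temperature -= 5.0  # assumed cooling increment
        else:
            raise ValueError(f"unknown action: {action!r}")
        return self.state()


def rollout(env, n_steps, seed=0):
    """Let a random policy act for n_steps and record every visited state."""
    rng = random.Random(seed)
    trajectory = [env.state()]
    for _ in range(n_steps):
        action = rng.choice(env.ACTIONS)
        trajectory.append(env.step(action))
    return trajectory


if __name__ == "__main__":
    for visited_state in rollout(ToyReactorEnv(), n_steps=5):
        print(visited_state)
```

In an actual RL setup, the random choice would be replaced by a learned policy, and each step would additionally return a reward reflecting the objective (e.g., yield).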

Paracetamol (PC) is synthesized from p-aminophenol (AP) and acetic anhydride (AA), as shown in Fig. 1a. With known kinetics, this process can be modeled, and the model then serves as the environment, e.g., in a continuous stirred-tank reactor (CSTR) as shown in Fig…
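To make the idea of a kinetic model acting as the environment more concrete, here is a minimal sketch of an isothermal, constant-volume CSTR with the reaction AP + AA → PC + acetic acid. Everything numerical is an assumption for illustration: the second-order rate law r = k·c_AP·c_AA, the value of `k`, the feed concentrations, and the explicit Euler integration are not taken from the article, and the function names are hypothetical.

```python
def cstr_step(state, feed, q, volume, k, dt):
    """Advance the CSTR mass balances by one explicit Euler step.

    state  : (c_AP, c_AA, c_PC) concentrations in the reactor, mol/L
    feed   : (c_AP_in, c_AA_in) concentrations in the inlet stream, mol/L
    q      : volumetric flow rate, L/min (inlet equals outlet, constant volume)
    volume : reactor volume, L (must be positive)
    k      : assumed second-order rate constant, L/(mol*min)
    dt     : time step, min
    """
    c_ap, c_aa, c_pc = state
    c_ap_in, c_aa_in = feed
    dilution = q / volume  # inverse residence time, 1/min
    rate = k * c_ap * c_aa  # assumed rate law, mol/(L*min)

    # Component balances: inflow minus outflow, plus or minus reaction.
    d_ap = dilution * (c_ap_in - c_ap) - rate
    d_aa = dilution * (c_aa_in - c_aa) - rate
    d_pc = -dilution * c_pc + rate

    return (c_ap + dt * d_ap, c_aa + dt * d_aa, c_pc + dt * d_pc)


def simulate(state, feed, q, volume, k, dt, n_steps):
    """Repeatedly apply cstr_step and return the final state."""
    for _ in range(n_steps):
        state = cstr_step(state, feed, q, volume, k, dt)
    return state


if __name__ == "__main__":
    # Illustrative values only: reactor starts empty of reactants and product.
    final = simulate(
        state=(0.0, 0.0, 0.0),
        feed=(1.0, 1.2),
        q=0.5,
        volume=10.0,
        k=0.8,
        dt=0.01,
        n_steps=20_000,
    )
    print("c_AP, c_AA, c_PC after 200 min:", final)
```

An agent could then manipulate quantities such as `q` or the feed concentrations as actions and observe the resulting concentrations as states. For stiff kinetics or larger time steps, a proper ODE solver would be a safer choice than explicit Euler.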


