물리정보 신경망을 사용하여 PDE 해 구하기

다음 제품이 필요합니다.

이 예제에서는 편미분 방정식(PDE)의 해를 예측하는 물리정보 신경망(PINN)을 훈련시키는 방법을 보여줍니다.

물리정보 신경망(PINN)[1]은 자체 구조와 훈련 과정에 물리 법칙을 통합시킨 신경망입니다. 예를 들어 물리 시스템을 정의하는 PDE의 해를 출력하는 신경망을 훈련시킬 수 있습니다.

이 예에서는 샘플 $(x, t)$ 를 입력값으로 취하고 $u (x, t)$ 를 출력하는 PINN을 훈련시킵니다. 여기서 $u$ 는 다음 버거스 방정식의 해이고,

$\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} - \frac{0.01}{π} \frac{\partial^{2} u}{\partial x^{2}} = 0,$

$u (x, 0) = - s i n (π x)$ 를 초기 조건으로, $u (- 1, t) = 0$ 및 $u (1, t) = 0$ 을 경계 조건으로 갖습니다.

다음 도식은 PINN을 통과하는 데이터 흐름을 보여줍니다.

Diagram of the data flow of the neural network. The input is x_1, x_2, ... x_N, and t. The output is the PDE solution u(x,t).

모델 훈련을 위해 데이터를 미리 수집할 필요가 없습니다. PDE와 제약 조건에 대한 정의를 사용해서 데이터를 생성할 수 있습니다.

훈련 데이터 생성하기

모델을 훈련시키려면 경계 조건과 초기 조건을 적용하고 버거스 방정식을 충족하는 연결 점(collocation point)들로 구성된 데이터 세트가 필요합니다.

각각의 경계 조건 $u (- 1, t) = 0$ 및 $u (1, t) = 0$ 을 적용하기 위한 균일한 간격의 시점 25개를 선택합니다.

numBoundaryConditionPoints = [25 25];

x0BC1 = -1*ones(1,numBoundaryConditionPoints(1));
x0BC2 = ones(1,numBoundaryConditionPoints(2));

t0BC1 = linspace(0,1,numBoundaryConditionPoints(1));
t0BC2 = linspace(0,1,numBoundaryConditionPoints(2));

u0BC1 = zeros(1,numBoundaryConditionPoints(1));
u0BC2 = zeros(1,numBoundaryConditionPoints(2));

초기 조건 $u (x, 0) = - s i n (π x)$ 를 적용하기 위한 균일한 간격의 공간 점 50개를 선택합니다.

numInitialConditionPoints  = 50;

x0IC = linspace(-1,1,numInitialConditionPoints);
t0IC = zeros(1,numInitialConditionPoints);
u0IC = -sin(pi*x0IC);

초기 조건 데이터와 경계 조건 데이터를 함께 그룹화합니다.

X0 = [x0IC x0BC1 x0BC2];
T0 = [t0IC t0BC1 t0BC2];
U0 = [u0IC u0BC1 u0BC2];

신경망 출력값이 버거스 방정식을 충족하도록 10,000개의 점 $(t, x) \in (0, 1) \times (- 1, 1)$ 을 균일하게 샘플링합니다.

numInternalCollocationPoints = 10000;

points = rand(numInternalCollocationPoints,2);

dataX = 2*points(:,1)-1;
dataT = points(:,2);

신경망 아키텍처 정의하기

다층 퍼셉트론 신경망 아키텍처를 정의합니다.

Diagram of the neual network architecture. The layers are connected in series. The input is passed through 8 fully connected layers that are each proceeded by a tanh layer. The output of the final tanh layer is connected to a fully connected layer, which outputs the network predictions.

반복되는 완전 연결 계층에는 출력 크기 20을 지정합니다.
마지막 완전 연결 계층에는 출력 크기 1을 지정합니다.

신경망 하이퍼파라미터를 지정합니다.

numBlocks = 8;
fcOutputSize = 20;

신경망을 생성합니다. 계층 블록이 반복되는 신경망을 손쉽게 생성하려면 repmat 함수를 사용하여 계층 블록을 반복할 수 있습니다.

fcBlock = [
    fullyConnectedLayer(fcOutputSize)
    tanhLayer];

layers = [
    featureInputLayer(2)
    repmat(fcBlock,[numBlocks 1])
    fullyConnectedLayer(1)]

layers = 
  18×1 Layer array with layers:

     1   ''   Feature Input     2 features
     2   ''   Fully Connected   20 fully connected layer
     3   ''   Tanh              Hyperbolic tangent
     4   ''   Fully Connected   20 fully connected layer
     5   ''   Tanh              Hyperbolic tangent
     6   ''   Fully Connected   20 fully connected layer
     7   ''   Tanh              Hyperbolic tangent
     8   ''   Fully Connected   20 fully connected layer
     9   ''   Tanh              Hyperbolic tangent
    10   ''   Fully Connected   20 fully connected layer
    11   ''   Tanh              Hyperbolic tangent
    12   ''   Fully Connected   20 fully connected layer
    13   ''   Tanh              Hyperbolic tangent
    14   ''   Fully Connected   20 fully connected layer
    15   ''   Tanh              Hyperbolic tangent
    16   ''   Fully Connected   20 fully connected layer
    17   ''   Tanh              Hyperbolic tangent
    18   ''   Fully Connected   1 fully connected layer

계층 배열을 dlnetwork 객체로 변환합니다.

net = dlnetwork(layers)

net = 
  dlnetwork with properties:

         Layers: [18×1 nnet.cnn.layer.Layer]
    Connections: [17×2 table]
     Learnables: [18×3 table]
          State: [0×3 table]
     InputNames: {'input'}
    OutputNames: {'fc_9'}
    Initialized: 1

  View summary with summary.

PINN을 훈련시킬 때 학습 가능 파라미터가 double형이면 정확도가 더 높아질 수 있습니다. dlupdate 함수를 사용하여 신경망의 학습 가능한 파라미터를 double형으로 변환합니다. 참고로, 모든 신경망이 double형의 학습 가능 파라미터를 지원하지는 않습니다. 예를 들어 single형 학습 가능 파라미터에 의존하는 GPU 최적화를 사용하는 신경망이 이에 해당합니다.

net = dlupdate(@double,net);

모델 손실 함수 정의하기

함수 modelLoss를 만듭니다. 이 함수는 신경망, 신경망 입력, 초기 조건, 경계 조건을 입력값으로 받고, 학습 가능한 파라미터에 대한 손실의 기울기와 해당 손실을 반환합니다.

모델은 입력값 $(x, t)$ 가 주어졌을 때 신경망의 출력값 $u (x, t)$ 가 버거스 방정식과 경계 조건, 초기 조건을 충족하도록 훈련됩니다. 특히 다음의 두 수량이 손실 최소화에 기여합니다.

$loss = {MSE}_{f} + {MSE}_{u}$ ,

여기서 ${MSE}_{f} = \frac{1}{N_{f}} \sum_{i = 1}^{N_{f}} {| f (x_{f}^{i}, t_{f}^{i}) |}^{2}$ 및 ${MSE}_{u} = \frac{1}{N_{u}} \sum_{i = 1}^{N_{u}} {| u (x_{u}^{i}, t_{u}^{i}) - u^{i} |}^{2}$ 이며, 함수 $f$ 는 버거스 방정식 좌변이고, ${x_{u}^{i}, t_{u}^{i}}_{i = 1}^{N_{u}}$ 는 계산 영역 경계에 있는 연결 점에 해당하며 경계 조건과 초기 조건을 모두 반영합니다. ${x_{f}^{i}, t_{f}^{i}}_{i = 1}^{N_{f}}$ 는 계산 영역 내부에 있는 점입니다.

function [loss,gradients] = modelLoss(net,X,T,X0,T0,U0)

% Make predictions with the initial conditions.
XT = cat(1,X,T);
U = forward(net,XT);

% Calculate derivatives with respect to X and T.
X = stripdims(X);
T = stripdims(T);
U = stripdims(U);
Ux = dljacobian(U,X,1);
Ut = dljacobian(U,T,1);

% Calculate second-order derivatives with respect to X.
Uxx = dldivergence(Ux,X,1);

% Calculate mseF. Enforce Burger's equation.
f = Ut + U.*Ux - (0.01./pi).*Uxx;
mseF = mean(f.^2);

% Calculate mseU. Enforce initial and boundary conditions.
XT0 = cat(1,X0,T0);
U0Pred = forward(net,XT0);
mseU = l2loss(U0Pred,U0);

% Calculated loss to be minimized by combining errors.
loss = mseF + mseU;

% Calculate gradients with respect to the learnable parameters.
gradients = dlgradient(loss,net.Learnables);

end

훈련 옵션 지정하기

다음과 같이 훈련 옵션을 지정합니다.

L-BFGS 솔버를 사용하여 훈련시키고, L-BFGS 솔버 상태에 대해 디폴트 옵션을 사용합니다.
훈련을 1500번 수행합니다.
기울기 또는 스텝의 노름이 $1 0^{- 5}$ 보다 작은 경우 훈련을 조기에 중지합니다.

solverState = lbfgsState;
maxIterations = 1500;
gradientTolerance = 1e-5;
stepTolerance = 1e-5;

신경망 훈련시키기

훈련 데이터를 dlarray 객체로 변환합니다. 입력값 X와 T의 형식이 "BC"(배치, 채널)이고 초기 조건의 형식이 "CB"(채널, 배치)임을 지정합니다.

X = dlarray(dataX,"BC");
T = dlarray(dataT,"BC");
X0 = dlarray(X0,"CB");
T0 = dlarray(T0,"CB");
U0 = dlarray(U0,"CB");

dlaccelerate 함수를 사용하여 손실 함수를 가속화합니다. 함수 가속화에 대한 자세한 내용은 Deep Learning Function Acceleration for Custom Training Loops 항목을 참조하십시오.

accfun = dlaccelerate(@modelLoss);

L-BFGS 업데이트 스텝에 대한 손실 함수를 포함하는 함수 핸들을 만듭니다. 자동 미분을 사용하여 modelLoss 함수 내에서 dlgradient 함수를 실행하기 위해 dlfeval 함수를 사용합니다.

lossFcn = @(net) dlfeval(accfun,net,X,T,X0,T0,U0);

TrainingProgressMonitor 객체를 초기화합니다. 각 반복마다 손실을 플로팅하고 기울기와 스텝의 노름을 모니터링합니다. monitor 객체를 생성할 때 타이머가 시작되므로 이 객체를 훈련 루프와 가깝게 생성해야 합니다.

monitor = trainingProgressMonitor( ...
    Metrics="TrainingLoss", ...
    Info=["Iteration" "GradientsNorm" "StepNorm"], ...
    XLabel="Iteration");

각 반복마다 전체 데이터 세트를 사용하는 사용자 지정 훈련 루프를 사용하여 신경망을 훈련시킵니다. 각 반복에 대해 다음을 수행합니다.

lbfgsupdate 함수를 사용하여 신경망의 학습 가능 파라미터와 솔버 상태를 업데이트합니다.
훈련 진행 상황 모니터를 업데이트합니다.
기울기 노름이 지정된 허용오차보다 작거나 직선 탐색이 실패하는 경우 훈련을 조기에 중지합니다.

iteration = 0;
while iteration < maxIterations && ~monitor.Stop
    iteration = iteration + 1;
    [net, solverState] = lbfgsupdate(net,lossFcn,solverState);

    updateInfo(monitor, ...
        Iteration=iteration, ...
        GradientsNorm=solverState.GradientsNorm, ...
        StepNorm=solverState.StepNorm);

    recordMetrics(monitor,iteration,TrainingLoss=solverState.Loss);

    monitor.Progress = 100*iteration/maxIterations;

    if solverState.GradientsNorm < gradientTolerance || ...
            solverState.StepNorm < stepTolerance || ...
            solverState.LineSearchStatus == "failed"
        break
    end

end

모델 정확도 평가하기

$t$ 값 0.25, 0.5, 0.75, 1에서, 딥러닝 모델의 예측 값을 버거스 방정식의 실제 해와 비교합니다.

모델을 테스트할 목표 시간을 설정합니다. 각 시간마다, [-1,1] 범위의 간격이 균일한 1001개 점에서 해를 계산합니다.

tTest = [0.25 0.5 0.75 1];
numObservationsTest = numel(tTest);

szXTest = 1001;
XTest = linspace(-1,1,szXTest);
XTest = dlarray(XTest,"CB");

모델을 테스트합니다. 각 테스트 입력값에 대해 PINN을 사용하여 PDE 해를 예측하고, 그 해를 solveBurgers 함수(예제의 버거스 방정식 함수의 해 구하기 섹션에 나와 있음)에 의해 주어진 해와 비교합니다. 이 함수에 액세스하려면 예제를 라이브 스크립트로 여십시오. 예측값과 목표값 사이의 상대 오차를 계산하여 정확도를 평가합니다.

UPred = zeros(numObservationsTest,szXTest);
UTest = zeros(numObservationsTest,szXTest);

for i = 1:numObservationsTest
    t = tTest(i);

    TTest = repmat(t,[1 szXTest]);
    TTest = dlarray(TTest,"CB");
    XTTest = cat(1,XTest,TTest);

    UPred(i,:) = forward(net,XTTest);
    UTest(i,:) = solveBurgers(extractdata(XTest),t,0.01/pi);
end

err = norm(UPred - UTest) / norm(UTest)

err = 
0.0237

테스트 예측값을 플롯으로 시각화합니다.

figure
tiledlayout("flow")

for i = 1:numel(tTest)
    nexttile
    
    plot(XTest,UPred(i,:),"-",LineWidth=2);

    hold on
    plot(XTest, UTest(i,:),"--",LineWidth=2)
    hold off

    ylim([-1.1, 1.1])
    xlabel("x")
    ylabel("u(x," + t + ")")
end

legend(["Prediction" "Target"])

버거스 방정식 함수의 해 구하기

solveBurgers 함수는 [2]에 명시된 대로 시간 t에서 버거스 방정식의 실제 해를 반환합니다.

function U = solveBurgers(X,t,nu)

% Define functions.
f = @(y) exp(-cos(pi*y)/(2*pi*nu));
g = @(y) exp(-(y.^2)/(4*nu*t));

% Initialize solutions.
U = zeros(size(X));

% Loop over x values.
for i = 1:numel(X)
    x = X(i);

    % Calculate the solutions using the integral function. The boundary
    % conditions in x = -1 and x = 1 are known, so leave 0 as they are
    % given by initialization of U.
    if abs(x) ~= 1
        fun = @(eta) sin(pi*(x-eta)) .* f(x-eta) .* g(eta);
        uxt = -integral(fun,-inf,inf);
        fun = @(eta) f(x-eta) .* g(eta);
        U(i) = uxt / integral(fun,-inf,inf);
    end
end

end

참고 문헌

Maziar Raissi, Paris Perdikaris, and George Em Karniadakis, Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations https://arxiv.org/abs/1711.10561
C. Basdevant, M. Deville, P. Haldenwang, J. Lacroix, J. Ouazzani, R. Peyret, P. Orlandi, A. Patera, Spectral and finite difference solutions of the Burgers equation, Computers & fluids 14 (1986) 23–41.

참고 항목

dlarray | dlfeval | dlgradient