Constrained stochastic dynamic programming does not converge

Xiaoyu Xu

18 Nov 2017

This is my first time using MATLAB to solve a dynamic programming problem numerically. I have been stuck for a while, so I would really appreciate any help. The distance between the value functions P0 and P1 decreases at first, but later it bounces around and does not converge to zero.

The problem has 2 state variables and 21 control variables, plus 2 inequality constraints (which can be written as one equality constraint). I am using fmincon as the solver.
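To illustrate how a single equality constraint is passed to fmincon, here is a minimal sketch on a toy problem. It is not the model in this post: `obj`, `nonlconEq`, and the numbers are hypothetical and chosen only so the example runs on its own (it assumes the Optimization Toolbox is available).

```matlab
% Toy problem (hypothetical, not the model in this post):
% minimize (x1 - 1)^2 + (x2 - 2)^2  subject to  x1 + x2 = 1.
obj = @(x) (x(1) - 1)^2 + (x(2) - 2)^2;

% fmincon expects the nonlinear constraint function to return [c, ceq],
% where c(x) <= 0 are inequalities and ceq(x) = 0 are equalities.
% Equality form: no inequalities, one equality.
nonlconEq = @(x) deal([], x(1) + x(2) - 1);

% The same restriction written as a pair of inequalities
% (shown for comparison only, not solved here):
% nonlconIneq = @(x) deal([x(1) + x(2) - 1; 1 - x(1) - x(2)], []);

options = optimoptions(@fmincon, 'Algorithm', 'sqp', 'Display', 'off');
x0 = [0; 0];   % starting point (infeasible on purpose)

% Third output is the exit flag; positive values indicate convergence.
[xOpt, fOpt, exitflag] = fmincon(obj, x0, [], [], [], [], [], [], ...
                                 nonlconEq, options);

fprintf('x = (%.4f, %.4f), f = %.4f, exitflag = %d\n', ...
        xOpt(1), xOpt(2), fOpt, exitflag);
```

Requesting `exitflag` as a third output is a cheap way to see whether an individual solve actually converged.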

My main code is below, and the two .m files it calls are attached.

    % parameter values
    beta = .95;
    S = 20;
    gamma = 2;
    lambda = .95;
    % probability distribution vector: Prob
    Prob = zeros(S,1); 
    % productivity realization vector: wbar 
    wbar = zeros(S,1);
    for s = 1:S
    Prob(s) = (1-lambda) / (1 - lambda^S) * lambda^(s-1);
    wbar(s) = s + 5;
    end
    % walking away continuation value
    vspot = 1 / (1 - beta) * Prob' * wbar.^(1 - gamma) / (1 - gamma);
    % space of the state variable: vspace
    vmin = -30;
    vmax = 50;
    ngrid = 99;
    grid = (vmax - vmin) / ngrid;
    vspace = (vmin:grid:vmax)'; 
    %% value function iteration
    % initials
    P0 = 30 * ones(ngrid + 1, S);          % initial value function
    P1 = zeros(ngrid + 1, S);
    ctrl = NaN(ngrid+1,S, S+1);           
    iterate = 0;
    maxits = 1000;
    critical = 0.1;
    dif = 1000;                        % begin with a large sup norm
    h = waitbar(0,'Value function iterating... Good luck this time!');
    % solver options, bounds and initial guess do not change across iterations
    options = optimoptions(@fmincon,'Algorithm','sqp','Display','off');
    lb = vmin * ones(S+1,1);           % lower bounds on the S+1 controls
    ub = vmax * ones(S+1,1);           % upper bounds on the S+1 controls
    x0 = (2:2:42)';                    % initial guess (S+1 = 21 entries)
    while dif > critical && iterate < maxits
        tic;
        iterate = iterate + 1;
        for s = 1:S
            for i = 1:ngrid+1
                % objective and nonlinear constraint at grid point i, state s
                y = @(x) objfun(x,s,grid,ngrid,vmin,vspace,S,wbar,beta,Prob,P0);
                z = @(x) confuneq(x,i,s,vspace,wbar,beta,gamma,Prob,vspot);
                [x,fval] = fmincon(y,x0,[],[],[],[],lb,ub,z,options);
                ctrl(i,s,1) = x(1);          % assign maximizers to control variables
                ctrl(i,s,2:end) = x(2:end);
                P1(i,s) = -fval;             % assign maximized value to value function
            end
        end
        % sup norm of the update; norm(P1 - P0) on a matrix is the spectral norm
        dif = max(abs(P1(:) - P0(:)));
        P0 = P1;                         % replace value function with new matrix
        msg = ['Current sup norm of P1-P0 = ', num2str(dif), ' at iteration ', num2str(iterate), ' of at most ', num2str(maxits)];
        waitbar(iterate / maxits, h);
        disp(msg);
        toc;
    end
    close(h);                            % close the progress bar
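For comparison, here is a minimal sketch of value function iteration on a toy problem where the maximization step is exact: a discrete max over a hypothetical random reward matrix `R`, not the model above. Under the usual assumptions (bounded rewards, discount factor below one), an exact Bellman update is a contraction in the sup norm, so the recorded distances should shrink by at least a factor `beta` per iteration. All names here (`R`, `V0`, `V1`, `difHist`, `tol`) are illustrative.

```matlab
rng(0);                      % reproducible toy rewards
beta   = 0.95;               % discount factor
n      = 50;                 % number of grid points (states = choices)
R      = rand(n, n);         % hypothetical reward for moving from state i to j
V0     = zeros(n, 1);        % initial value function
tol    = 1e-6;
maxits = 1000;
difHist = zeros(maxits, 1);  % sup-norm distance at each iteration

for it = 1:maxits
    % Exact Bellman update: V1(i) = max_j { R(i,j) + beta * V0(j) }
    V1 = max(bsxfun(@plus, R, beta * V0'), [], 2);
    difHist(it) = max(abs(V1 - V0));   % sup norm of the update
    V0 = V1;
    if difHist(it) < tol
        break;
    end
end
difHist = difHist(1:it);

% Ratio of successive distances; should not exceed beta for an exact update
ratio = difHist(2:end) ./ difHist(1:end-1);
fprintf('iterations: %d, largest ratio: %.4f\n', it, max(ratio));
```

If the distances in the main loop do not shrink like this, one thing that may be worth checking is whether each fmincon call returns a converged maximizer, for example by requesting its `exitflag` output.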