Using multiple datasets to fit parameters simultaneously in SimBiology

I want to fit a PK model with multiple datasets; every dataset has concentration time courses for different species in the model - how do I do this? The time points in each dataset are not consistent, if that matters. I'm using MATLAB R2024b and the SimBiology Model Analyzer app.
My model has multiple compartments, and Compartment1 has two species called "RNA" and "PROTEIN".
The datasets look something like this:
Dataset1 which corresponds to RNA values in the plasma:
Dataset2 which corresponds to protein levels in the plasma:
I want to fit the model parameters to both datasets, mapping PLASMA_RNA from dataset1 to "RNA" and SERUM_PROTEIN from dataset2 to "PROTEIN".
Arthur Goldsipe on 15 Mar 2025
Are the model's initial conditions the same for both experiments? In other words, once you fit your model, would you need to do a single simulation or two separate simulations to predict these two concentrations?
Mukti on 17 Mar 2025
The initial conditions are the same for both experiments - I would just do one single simulation to predict these two concentrations.


Accepted Answer

Arthur Goldsipe on 15 Mar 2025
Edited: Arthur Goldsipe on 17 Mar 2025
You first need to decide whether these two concentration profiles should be treated as part of the same experiment/simulation.
If so, then you need to merge them into a single time course, using NaN to indicate missing measurements (presumably the same way you're using . at time 0). If you want to do that programmatically, you can use MATLAB's join operations. Here's what the merged data might look like using the first 4 rows of your datasets:
rna = table([0;0.08;0.24;0.49], [nan;17.11;8.22;18.6], VariableNames=["Time", "Plasma_RNA"] );
protein = table([0;0.24;1.91;3.1], [nan;10;97.1;90.1], VariableNames=["Time", "Serium_protein"]);
joinedData = outerjoin(rna,protein,Keys="Time",MergeKeys=true)
joinedData = 6x3 table
    Time    Plasma_RNA    Serium_protein
    ____    __________    ______________
       0         NaN              NaN
    0.08       17.11              NaN
    0.24        8.22               10
    0.49        18.6              NaN
    1.91         NaN             97.1
     3.1         NaN             90.1
If they're different experiments, you just need to stack them and add a grouping variable to indicate which measurement belongs to which experiment. Here's what that would look like using the first 4 rows of your datasets:
rna_id = [table(repmat(1,height(rna), 1), VariableNames="ID"), rna ];
protein_id = [table(repmat(2,height(protein),1), VariableNames="ID"), protein];
stackedData = outerjoin(rna_id,protein_id,Keys=["ID","Time"],MergeKeys=true)
stackedData = 8x4 table
    ID    Time    Plasma_RNA    Serium_protein
    __    ____    __________    ______________
    1        0         NaN              NaN
    1     0.08       17.11              NaN
    1     0.24        8.22              NaN
    1     0.49        18.6              NaN
    2        0         NaN              NaN
    2     0.24         NaN               10
    2     1.91         NaN             97.1
    2      3.1         NaN             90.1
Once you have the data in one of these forms, you can perform the fit in SimBiology using sbiofit or the Model Analyzer app.
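Below is a minimal sketch of the programmatic (sbiofit) route for the merged, single-experiment form. It assumes SimBiology is installed. The toy model (its reactions, initial amounts, and the parameter names kdeg, ktl, and kel) is hypothetical and only stands in for your actual PK model. groupedData, estimatedInfo, and sbiofit are SimBiology functions; fminsearch is chosen only because it ships with base MATLAB.

```matlab
% --- Hypothetical toy model (stand-in for the real PK model) ---
m = sbiomodel("toyPK");
comp = addcompartment(m, "Compartment1");
addspecies(comp, "RNA", 20);      % assumed initial RNA amount
addspecies(comp, "PROTEIN", 0);   % assumed initial PROTEIN amount
addparameter(m, "kdeg", 0.5);     % assumed RNA degradation rate constant
addparameter(m, "ktl", 1);        % assumed translation rate constant
addparameter(m, "kel", 0.1);      % assumed protein elimination rate constant

r = addreaction(m, "Compartment1.RNA -> null");
r.ReactionRate = 'kdeg*Compartment1.RNA';
r = addreaction(m, "null -> Compartment1.PROTEIN");
r.ReactionRate = 'ktl*Compartment1.RNA';      % production driven by RNA
r = addreaction(m, "Compartment1.PROTEIN -> null");
r.ReactionRate = 'kel*Compartment1.PROTEIN';

% --- Merged data (first 4 rows of each dataset); NaN = not measured ---
rna = table([0;0.08;0.24;0.49], [nan;17.11;8.22;18.6], VariableNames=["Time","Plasma_RNA"]);
protein = table([0;0.24;1.91;3.1], [nan;10;97.1;90.1], VariableNames=["Time","Serium_protein"]);
joinedData = outerjoin(rna, protein, Keys="Time", MergeKeys=true);

% No group column, so all rows form one group (one simulation)
gData = groupedData(joinedData);
gData.Properties.IndependentVariableName = "Time";

% Map each data column to the species it measures
responseMap = ["Compartment1.RNA = Plasma_RNA", ...
               "Compartment1.PROTEIN = Serium_protein"];

% Estimate the (hypothetical) rate constants on a log scale
estimInfo = estimatedInfo(["log(kdeg)","log(ktl)","log(kel)"], ...
                          InitialValue=[0.5 1 0.1]);

% No doses ([]); fminsearch as the estimation function
fitResults = sbiofit(m, gData, responseMap, estimInfo, [], 'fminsearch');
disp(fitResults.ParameterEstimates)
```

For the stacked form, you would presumably also set gData.Properties.GroupVariableName = "ID" and, if the parameters should be shared by both experiments, request a pooled fit with the Pooled name-value argument; the sbiofit documentation describes how to supply per-group dosing.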

More Answers (2)

Arthur Goldsipe on 14 Mar 2025
SimBiology users typically do this by merging the multiple datasets into a single dataset and then constructing an appropriate fit problem.
If you need more guidance on that, take a look at previous similar questions:
If you still have questions, I suggest you create a new MATLAB Answers question that provides more details. Ideally, share sample code (data and model) that illustrates your situation. Also, please clarify which version of MATLAB you're using and whether you are working in the SimBiology Model Analyzer app or writing your own MATLAB code.

Image Analyst on 15 Mar 2025
Maybe I'm misunderstanding what you want to do, but why don't you combine both time vectors into a single time vector and use it to interpolate the missing times in each set, using something like interp1? Then you will have serum and plasma values at the same, common time points. Then, if you want to do the "mapping PLASMA_RNA from dataset1 to "RNA" and SERUM_PROTEIN from dataset2 to "PROTEIN"", you can use polyfit, fitnlm, or some other fitting algorithm (see the Regression Learner app on the Apps tab of the toolstrip) to build a transform/model relating serum to plasma.
Arthur Goldsipe on 15 Mar 2025
SimBiology doesn't require measurements at the same times for all responses/species. You can just put NaN (not-a-number) in any place where you don't have a measurement.
Alternatively, SimBiology allows you to treat them as two separate time courses (requiring two different model simulations, potentially with different initial conditions or dosing). If they are different conditions, the two time courses just need to be "stacked" on top of each other, and another variable needs to be added to the data to identify each time course. (I'll add a more complete answer for this shortly.)
Moreover, I strongly discourage interpolating values for at least two reasons:
First, interpolating could produce values that are not consistent with the underlying biology. Biological measurements are often quite noisy and highly nonlinear, so standard interpolation techniques are quite risky.
Second, adding interpolated "measurements" can bias the fit and produce incorrect statistics in the results. For example, many statistical calculations require the degrees of freedom (dfe), which is the number of observations minus the number of estimated parameters. Artificially inflating the number of observations changes the dfe, potentially leading to very different parameter estimates, standard errors, and so forth.
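To make the degrees-of-freedom point concrete, here is a small sketch using only the first four rows of each dataset from the accepted answer. The number of estimated parameters (3) is a hypothetical value chosen for illustration, and linear interp1 without extrapolation stands in for the interpolation approach suggested in the other answer. Even on this tiny subset, interpolation adds one "observation" that was never measured and raises dfe from 3 to 4; with the full datasets the inflation would presumably be larger.

```matlab
% First four rows of each dataset (NaN = not measured, as at time 0)
rna = table([0;0.08;0.24;0.49], [nan;17.11;8.22;18.6], VariableNames=["Time","Plasma_RNA"]);
protein = table([0;0.24;1.91;3.1], [nan;10;97.1;90.1], VariableNames=["Time","Serium_protein"]);
merged = outerjoin(rna, protein, Keys="Time", MergeKeys=true);

nParams = 3;  % hypothetical number of estimated parameters

% Only real measurements count as observations
nObsMeasured = nnz(~isnan(merged.Plasma_RNA)) + nnz(~isnan(merged.Serium_protein));
dfeMeasured = nObsMeasured - nParams;

% Linear interpolation of each response onto the combined time grid.
% The time-0 placeholders are dropped; grid points outside a dataset's
% measured time range return NaN because no extrapolation is requested.
rnaInterp = interp1(rna.Time(2:end), rna.Plasma_RNA(2:end), merged.Time);
proteinInterp = interp1(protein.Time(2:end), protein.Serium_protein(2:end), merged.Time);
nObsInterp = nnz(~isnan(rnaInterp)) + nnz(~isnan(proteinInterp));
dfeInterp = nObsInterp - nParams;

fprintf("Measured: %d observations, dfe = %d\n", nObsMeasured, dfeMeasured);
% prints "Measured: 6 observations, dfe = 3"
fprintf("Interpolated: %d observations, dfe = %d\n", nObsInterp, dfeInterp);
% prints "Interpolated: 7 observations, dfe = 4"
```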

Release

R2024b
