Output score which is nearest to image date

AV, 2 September 2019
I have 2 tables - a source table and a score table.
My source table contains an ID number with an image at a certain date.
My score table contains a matching ID number with multiple scores with an associated date.
How do I find the score for the correct ID that is nearest in time to my Image_date?
How do I also output, in a second table, the distance in months between each Score_date and the Image_date for that ID?
Source table:
ID Image_date
4270 17/11/2011
4999 02/04/2014
Score table:
ID Score_date Score
4270 21/09/2011 30
4270 01/08/2012 29
4270 15/03/2014 27
4999 01/01/2011 24
4999 01/01/2014 20
Desired source and score table:
ID Image_date Nearest_score Months_from_Image
4270 17/11/2011
4999 02/04/2014
Desired distance table:
ID Score_date Score Months_from_image
4270 21/09/2011 30
4270 01/08/2012 29
4270 15/03/2014 27
4999 01/01/2011 24
4999 01/01/2014 20
Any help would be most welcome, if you have the time.

Accepted Answer

Guillaume, 2 September 2019
The second output table is easy:
%demo data:
source_table = table([4270;4999], datetime({'17/11/2011';'02/04/2014'}, 'InputFormat', 'dd/MM/yyyy', 'Format', 'dd/MM/yyyy'), 'VariableNames', {'ID', 'Image_date'})
score_table = table([4270;4270;4270;4999;4999], datetime({'21/09/2011';'01/08/2012';'15/03/2014';'01/01/2011';'01/01/2014'}, 'InputFormat', 'dd/MM/yyyy', 'Format', 'dd/MM/yyyy'), [30;29;27;24;20], 'VariableNames', {'ID', 'Score_date', 'Score'})
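%actual processing
%join on the shared key ID, so every score row gains the matching Image_date
distance_table = join(score_table, source_table)
%calendar difference (a calendarDuration) from each score date to the image date
distance_table.Months_from_image = between(distance_table.Score_date, distance_table.Image_date)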
If you want just the number of months without the days, then:
distance_table.Months_from_image = calmonths(distance_table.Months_from_image)
However, you lose the number of days (which may be important if two entries for the same ID have the same number of months).
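If the days matter, one option is to keep both parts as numeric columns with split rather than discarding the days. The following is only a sketch on two of the question's dates for ID 4270; the variable names score_date, image_date, n_months and n_days are made up for the illustration.

```matlab
% Two score dates for ID 4270 and the matching image date (from the question).
score_date = datetime({'21/09/2011'; '01/08/2012'}, 'InputFormat', 'dd/MM/yyyy');
image_date = datetime({'17/11/2011'; '17/11/2011'}, 'InputFormat', 'dd/MM/yyyy');

% Calendar difference from each score date to the image date.
cal_diff = between(score_date, image_date);

% split returns the whole months and the leftover days as numeric arrays.
[n_months, n_days] = split(cal_diff, {'months', 'days'});
```

For the first row this gives 1 month and 27 days. Values are negative when the score date falls after the image date, as in the second row.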
For your first output, there are several ways to do this, either with rowfun, splitapply or groupsummary. First, you need an aggregation function. For splitapply I'd use:
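function rowindex = nearest(rows, daysdiff) %using difference in days instead of months
    %rows: row numbers of one ID group, daysdiff: signed day offsets for those rows
    [~, idx] = min(abs(daysdiff)); %position of the smallest absolute offset
    rowindex = rows(idx); %map back to a row number of the full table
end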
then:
group = findgroups(distance_table.ID);
rows = (1:height(distance_table))';
selectedrows = splitapply(@nearest, rows, days(distance_table.Score_date - distance_table.Image_date), group);
result = distance_table(selectedrows, :)
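If you would rather avoid a separate function file, a different technique (not the splitapply approach above) is to sort by absolute distance within each ID and keep the first row per ID. This is a minimal sketch on the demo data; the column name Abs_days is an assumption introduced for the example.

```matlab
% Demo data from the question.
fmt = 'dd/MM/yyyy';
source_table = table([4270; 4999], ...
    datetime({'17/11/2011'; '02/04/2014'}, 'InputFormat', fmt, 'Format', fmt), ...
    'VariableNames', {'ID', 'Image_date'});
score_table = table([4270; 4270; 4270; 4999; 4999], ...
    datetime({'21/09/2011'; '01/08/2012'; '15/03/2014'; '01/01/2011'; '01/01/2014'}, ...
        'InputFormat', fmt, 'Format', fmt), ...
    [30; 29; 27; 24; 20], ...
    'VariableNames', {'ID', 'Score_date', 'Score'});

% Attach the image date to every score row, then measure the gap in days.
distance_table = join(score_table, source_table);
distance_table.Abs_days = abs(days(distance_table.Score_date - distance_table.Image_date));

% Sort so that the nearest score comes first within each ID,
% then keep the first row of each ID.
sorted = sortrows(distance_table, {'ID', 'Abs_days'});
[~, first_idx] = unique(sorted.ID, 'first');
result = sorted(first_idx, :);
```

On the demo data this keeps the score of 30 for ID 4270 (57 days from the image) and 20 for ID 4999 (91 days from the image).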
Note that with datetime, you use between to get a calendarDuration (which can be expressed in months), but normal subtraction to get a duration (which can be expressed in days).
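A small sketch of that difference, using the first score date for ID 4270 and its image date (the variable names are illustrative only):

```matlab
score_date = datetime('21/09/2011', 'InputFormat', 'dd/MM/yyyy');
image_date = datetime('17/11/2011', 'InputFormat', 'dd/MM/yyyy');

% between works in calendar units and returns a calendarDuration.
cal_diff = between(score_date, image_date);
cal_class = class(cal_diff);          % 'calendarDuration'

% Subtraction works in elapsed time and returns a duration.
elapsed = image_date - score_date;
elapsed_class = class(elapsed);       % 'duration'
n_days = days(elapsed);               % 57
```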
Comments: 1
AV, 2 September 2019
Thank you very much Guillaume - that's really helpful. It worked well, and I managed to scale this up over a much larger dataset quickly. Have a nice day!
