How do I import Velocity 3.2.0 CSV DVH data into MATLAB 9.1 (R2016b)?

Question

Daniel Bridges on 5 Jan 2017

Direct link to this question:

https://kr.mathworks.com/matlabcentral/answers/319120-how-do-i-import-velocity-3-2-0-csv-dvh-data-into-matlab-9-1-r2016b

Commented: Walter Roberson on 7 Jan 2017

sample.csv

How do I import radiation oncology software Velocity 3.2.0's dose-volume histogram (DVH) data in a comma-separated value file (CSV, sample file attached) into MATLAB 9.1 (R2016b)? Using Velocity one can create DVH data for multiple tissues displayed in a single graph, and export this data as a sequential two-column CSV.

csvread requires that "the file must contain only numeric values", whereas the CSV is two columns of data sets that begin with header text and end with an empty row.

It appears that for this reason a simple call to importdata is insufficient, because the command stops after importing only the first data set:

   test = importdata('filename.csv')
   test = 
     struct with fields:
           data: [1024×2 double]
       textdata: {2×1 cell}

whereas the file actually contains additional data sets (e.g. copying from row 1026):

   58.1704  0.00692086
  
   Prostate  
   GY   (CC)
   55.2304  0.0046139
   55.2333  0.00230695

What do we use to import data in CSV that is formatted as follows? (The following describes what is seen using Excel 2016.)

header text in Column 1
header text in Columns 1 and 2
numerical data in columns 1 and 2 in multiple rows
empty row
(repeat for next data set for multiple data sets of various length)

Walter Roberson requested a sample data file and provided a solution below using fopen, fgetl, feof, and textscan.

4 Comments

Daniel Bridges on 5 Jan 2017

Edited: Daniel Bridges on 5 Jan 2017

I am now seeking to answer this question.

It seems a counterproductive workaround to import the entire file into a string and then write a script to parse its contents. Or to put it another way, I expect MathWorks to have a more elegant solution already prepared that I merely need to find.

One workaround is for Velocity: Instead of creating the "full" multiple-tissue DVH one wishes to export, one must save to multiple files a separate DVH for each organ of interest, so that there is only one data set per CSV. This is not ideal, but it seems faster than continuing to search for additional ideas.

Edit: Walter, I thought it was not uncommon for data to be written sequentially (i.e. appended end-to-end); old magnetic tape comes to mind. Because I thought MathWorks had prepared for common data files, I thought there was a command or option I was simply unaware of. I am sorry if this expectation was incorrect, but I don't see why it was unrealistic. I have attached a sample data file to the original post.

Walter Roberson on 5 Jan 2017

Velocity appears to be from Varian. Varian advertises,

https://www.varian.com/oncology/products/software/image-management-informatics/velocity?cat=store

"Velocity provides a vendor-neutral platform that integrates image, structure, plan and dose data to create a unified patient dataset." Unfortunately their documentation is a bit sparse as to what that format is. Except they mention DICOM, and they mention RT Plan software. Someone has written software to read DICOM RT Plan data in MATLAB; see https://github.com/ulrikls/dicomrt2matlab

It sounds like your data is not DICOM based.

As I poke around, the information I am finding about DVH suggests that the most common formats are not what you are describing your file as having. But it is difficult to tell, as you have not given an example file.

Walter Roberson on 5 Jan 2017

There are millions of file formats. People invent their own more often than they use standard formats, and they modify the file format over time, often without considering backward compatibility. There is no practical way for MathWorks to already support them all.

Mag tape was always written in records, often fixed-length binary records. Variable-length records did exist, but when it came time to start a new data structure, typically a new record was written. Not inevitably though: packing multiple structures into one tape record did happen. Remember though that memory was typically not large, and mag tape has to be read a complete record at a time (no positioning by bytes), so variable-length records did not pack in long continuous streams the way that later became common with disk files.

Answer 1 (Accepted Answer)

Walter Roberson on 5 Jan 2017

Direct link to this answer:

https://kr.mathworks.com/matlabcentral/answers/319120-how-do-i-import-velocity-3-2-0-csv-dvh-data-into-matlab-9-1-r2016b#answer_249448

There is no pre-written MathWorks routine to read that file format. It is, however, not difficult to write code for it.

   num = 0;
   fid = fopen('sample.csv','rt');
   while true
     H1 = fgetl(fid);   % first header line (organ name)
     if feof(fid); break; end
     H2 = fgetl(fid);   % second header line
     if feof(fid); break; end
     datacell = textscan(fid, '%f%f', 'delimiter', ',', 'collectoutput', true);
     if isempty(datacell) || isempty(datacell{1}); break; end
     num = num + 1;
     headers(num) = {H1, H2};
     data(num) = datacell;
     fgetl(fid);  % the empty line between organs
   end
   fclose(fid);

This will create two cell arrays, one of headers and the other of corresponding numeric values. You might want to do some processing on H1 (organ name) and H2 (not sure what that line is for) before storing that information.
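As a rough illustration of that header processing, here is a minimal sketch. It assumes the raw lines look like 'Prostate,' and 'GY,(CC)' (hypothetical strings, guessed from the excerpt in the question), and it assumes the second line holds column unit labels, which the answer itself does not confirm. The variable names organ and units are made up for the example.

```matlab
% Hypothetical raw header lines, as fgetl might return them (assumed format).
H1 = 'Prostate,';
H2 = 'GY,(CC)';

% Organ name: the text before the first comma, with surrounding blanks removed.
organ = strtrim(strtok(H1, ','));

% Column labels: split on commas, trim blanks, and strip the parentheses.
units = strtrim(strsplit(H2, ','));
units = regexprep(units, '[()]', '');
```

Storing organ and units rather than the raw lines would make later labelling simpler, but whether the second line really carries units is a guess that should be checked against the actual file.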

7 Comments

Daniel Bridges on 6 Jan 2017

Edited: Daniel Bridges on 6 Jan 2017

The headers line causes the error:

headers(num) = {H1,H2};

It is fixed by allowing for columns, enabling the creation of a 3x2 cell array in this case:

headers(num,:) = {H1,H2};

To get the headers to read correctly, I've had to omit the last line:

fgetl(fid); %the empty line between organs

This command was actually skipping the first header of the next data section, causing the first row of data to be stored as the second header. With it removed, the headers are stored correctly, but the empty row is being stored at the end of the numerical data as NaN in each column.

I'd like to accept this answer once I can remove the NaN from the end of the imported data. I've been writing a script to plot the data, and while the NaN may not negatively affect it since it's at the end of the vectors, for the sake of propriety it seems better to remove it.

I plan to return to this problem in about 10 hours, and try to post a solution myself unless someone does so first.

Daniel Bridges on 7 Jan 2017

Edited: Daniel Bridges on 7 Jan 2017

Is it not more legible and memory-efficient to put it immediately after textscan's cell array creation?

     datacell = textscan(fid,'%f%f','delimiter',',','collectoutput',true); 
     if isempty(datacell) || isempty(datacell{1}); break; end 
     if any(isnan(datacell{1}(end,:))); datacell{1}(end,:) = []; end

Walter Roberson on 7 Jan 2017

No, it is the same efficiency. But it certainly does not hurt to have it closer to where datacell is created.
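For reference, the fixes discussed in this thread (row-wise header storage, no extra fgetl after textscan, and NaN row removal right after textscan) can be combined into one loop. The following is only a sketch under assumptions: it writes a tiny made-up two-organ file so that it runs on its own, the header strings and numbers are invented, and the blank-line check is an added safeguard rather than something taken from the posts above.

```matlab
% Write a small made-up file in the layout described in the question
% (name line, units line, numeric rows, empty row), so the sketch is runnable.
fname = [tempname '.csv'];
fid = fopen(fname, 'wt');
fprintf(fid, 'Bladder,\nGY,(CC)\n1.5,10\n2.5,5\n\nProstate,\nGY,(CC)\n55.25,0.5\n55.5,0.25\n56,0.125\n\n');
fclose(fid);

headers = cell(0, 2);   % one row per data set: {name line, units line}
data = cell(0, 1);      % one N-by-2 numeric matrix per data set
fid = fopen(fname, 'rt');
while true
    H1 = fgetl(fid);                      % organ name line
    if ~ischar(H1); break; end            % fgetl returns -1 at end of file
    % Skip separator lines that hold only commas or blanks (added safeguard).
    if isempty(strrep(strtrim(H1), ',', '')); continue; end
    H2 = fgetl(fid);                      % second header line
    if ~ischar(H2); break; end
    datacell = textscan(fid, '%f%f', 'Delimiter', ',', 'CollectOutput', true);
    block = datacell{1};
    if isempty(block); break; end
    block(any(isnan(block), 2), :) = [];  % drop rows read from the empty line
    headers(end+1, :) = {H1, H2};
    data{end+1, 1} = block;
end
fclose(fid);
delete(fname);                            % remove the temporary sample file
```

Replacing fname with the path to the real export, and deleting the lines that create and remove the temporary file, would adapt this to actual data. Note that an organ name that textscan can parse as a number (for example one beginning with "Inf" or "NaN") could confuse the numeric read, so the result is worth checking against the file.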
