
Reading all data of streams from an ADTF DAT file with a large ItemCount is very slow

Hello,
Basically, I need to read the stream data from an ADTF DAT file and perform some preprocessing related to coordinate frame transformations. To do this, I am trying to read all data from a particular selected stream, which has an ItemCount of 18000, and store it in a CSV file. The read(streamData) call alone takes around 10-12 minutes, and even longer if the stream contains more structured data.
Can someone suggest a way to make this reading process faster?

Accepted Answer

Shubham on 22 Feb 2024
Hi Shikha,
It seems that you are trying to read stream data from an ADTF DAT file. You can try to speed up the reading process by chunking the input data and using Parallel Computing Toolbox to reduce the time taken to read the data.
You can read data in chunks with “adtfFileReader” by calling the “select” function with a time range or an index range as arguments. For more information, please refer to the following documentation: https://www.mathworks.com/help/driving/ug/read-data-from-adtf-dat-files.html#ReadDataFromADTFDATFilesExample-7
You can try testing it out using the following example as well:
openExample('driving/ExtractVideoStreamFromADTFDATFileExample');
Once you have split the file into chunks, you can read them in parallel. Here is a simple example of reading a file using parfor:
% Create dummy data and write it to a file
data = (1:1e8)';
numLines = length(data);
% Uncomment the following lines when creating the dummy data for the first time
% fileID = fopen('dummy_data.txt', 'w');
% fprintf(fileID, '%d\n', data);
% fclose(fileID);

% Read data from a file in parallel and store it in an ordered array
% Start a parallel pool if one is not already running
if isempty(gcp('nocreate'))
    parpool;
end
% Define the maximum number of workers and the number of chunks
numWorkers = 6;
chunks = 10;
chunkSize = ceil(numLines / chunks);
% Preallocate a cell array to hold the data for each chunk
dataCellArray = cell(chunks, 1);
% Read the file in parallel using parfor
parfor (curChunk = 1:chunks, numWorkers)
    startLine = (curChunk - 1) * chunkSize + 1;
    endLine = min(curChunk * chunkSize, numLines);
    dataCellArray{curChunk} = readChunk(startLine, endLine, 'dummy_data.txt');
    disp("done for " + curChunk);
end
% Concatenate the chunks in order to form the complete array
dataArray = vertcat(dataCellArray{:});

function dataChunk = readChunk(startLine, endLine, filename)
    % Skip the lines before startLine, then read endLine-startLine+1 values
    fileID = fopen(filename, 'r');
    values = textscan(fileID, '%d', endLine-startLine+1, 'HeaderLines', startLine-1);
    fclose(fileID);
    dataChunk = values{1};  % textscan returns a cell array; unwrap the column
end
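One detail of the text-file example that is easy to miss: textscan returns a cell array with one cell per format specifier, so the numeric column has to be taken out with {1} before it can be used as an ordinary array. The sketch below is a minimal, serial illustration on a small temporary file. The temporary file and the line range 6 to 10 are made up for this illustration and are not part of the original answer.

```matlab
% Write 20 integers, one per line, to a temporary file (hypothetical test data)
filename = [tempname '.txt'];
fileID = fopen(filename, 'w');
fprintf(fileID, '%d\n', 1:20);
fclose(fileID);

% Read only lines 6 to 10: skip the first 5 lines, then read 5 values
startLine = 6;
endLine = 10;
fileID = fopen(filename, 'r');
values = textscan(fileID, '%d', endLine - startLine + 1, 'HeaderLines', startLine - 1);
fclose(fileID);

% textscan returns a 1-by-1 cell here; the column vector is inside it
chunk = values{1};
disp(chunk');

% Remove the temporary file
delete(filename);
```

Note that every call with 'HeaderLines' still scans the file from the beginning to skip the earlier lines, so later chunks cost more than earlier ones in this text-file approach.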
You can modify the above code snippet to work with “adtfFileReader” as well. Please refer to the following code snippet:
numWorkers = 6;
chunkSize = 10;
itemcount = 149;  % ItemCount of the selected stream (149 frames in this example)
numChunks = ceil(itemcount/chunkSize);  % round up so the last partial chunk is read
dataCellArray = cell(numChunks, 1);
parfor (curChunk = 1:numChunks, numWorkers)
    startIndex = (curChunk-1)*chunkSize+1;
    endIndex = min(startIndex+chunkSize-1, itemcount);
    dataCellArray{curChunk} = readChunk(startIndex, endIndex);
    disp("done for " + curChunk);
end
dataArray = vertcat(dataCellArray{:})

function dataChunk = readChunk(startIndex, endIndex)
    % Each worker opens its own reader and reads only its index range
    dataFolder = fullfile(tempdir, 'adtf-video', filesep);
    datFileName = fullfile(dataFolder, "sample_can_video.dat");
    file_reader = adtfFileReader(datFileName);
    stream_index = 2;
    stream_reader = select(file_reader, stream_index, IndexRange=[startIndex endIndex]);
    dataChunk = read(stream_reader);
end
I have tested the code snippet on the example mentioned above, using chunks of 10 frames each (149 frames are present in total). The result is stored in “dataArray”, and the first 10 frames of the video are stored in its first element.
[Screenshots of “dataArray” and of its first element are not reproduced here.]
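Because 149 is not a multiple of 10, the number of chunks has to be rounded up, otherwise the last 9 frames are never read. The following sketch works through the index arithmetic for this example on its own, without any toolbox calls. The variable names are illustrative.

```matlab
itemCount = 149;   % total frames in the example stream
chunkSize = 10;    % frames per chunk

% Round up so that the final, shorter chunk is included
numChunks = ceil(itemCount / chunkSize);

% First and last item index of every chunk, one row per chunk
startIdx = (0:numChunks-1)' * chunkSize + 1;
endIdx = min(startIdx + chunkSize - 1, itemCount);
ranges = [startIdx endIdx];

disp(numChunks);        % 15 chunks
disp(ranges(1, :));     % first chunk covers items 1 to 10
disp(ranges(end, :));   % last chunk covers items 141 to 149
```

Each row of ranges can be passed as the IndexRange value in the select call shown in the answer.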
I would also suggest profiling your code and performing the tasks asynchronously.
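As a starting point for profiling, a simple timing harness with tic and toc can show how long a single read takes before and after chunking. This is only a sketch: readOnce below is a hypothetical stand-in workload, and in practice it would wrap the actual call being measured, for example the read on the selected stream. It covers timing only, not the asynchronous part of the suggestion.

```matlab
% Hypothetical stand-in for the expensive call being measured;
% replace the body with the real read when profiling actual data
readOnce = @() sum(rand(1000));

% Time one call
timer = tic;
result = readOnce();
elapsedSeconds = toc(timer);

fprintf('One call took %.4f seconds\n', elapsedSeconds);

% For a line-by-line breakdown, MATLAB's built-in profiler can be used
% around the same call:
%   profile on
%   result = readOnce();
%   profile viewer
```

Timing the read for a few different chunk sizes in this way gives a rough idea of where the time goes before adding more workers.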
I hope this helps!
Comments: 3
Shubham on 22 Feb 2024
Hi Shikha,
The runtime should still be reduced when reading a stream containing a structure of structures with multiple workers. However, if you think you still need additional help, you can start a new thread and include your data files and code snippets.
Thanks
Shikha on 22 Feb 2024
Hi Shubham,
You are right about the relative improvement in runtime when using multiple workers. I will certainly open a new thread if I need more help.
Thanks a lot for your help!!

Release

R2023b
