Hi, I'm trying to import data from a large tab delimited CSV with headers. (0.8 GB)
I don't want to import everything, just a number of specific columns. I.e. I would like to create unique column vectors for:
1. Cells D19:D234568
then
2. Cells F19:F234568
and so on.
Currently I'm doing this one-by-one as, even with 12GB ram I'm running out of memory.
There must be a simple way of doing this quickly, no? Once the vectors are loaded and saved as .mat they load up in seconds.
Cheers,
Alasdair

Comments: 3

dpb on 6 Aug 2016
"Currently I'm doing this one-by-one..."
How are you doing this?
You talk of "Cells D19:D234568"; is the data in a spreadsheet or are you just using the nomenclature to describe the issue?
"There must be a simple way of doing this quickly, no?"
Depends on whether the data are in a simple text file or some other form. There isn't any particularly efficient way to read a sequential text file other than, well, sequentially. IF (the proverbial "big if") it had fixed-length records and fixed-width columns, there are some tricks one can play by reading the file image as a character array and selecting subsets, but if it is csv and the fields are variable width, that doesn't work either.
You could, of course, process the file line by line using several techniques. Probably the best solution would be to use textscan and process the file in sizable "chunks" in a loop, selecting the columns of choice in each block.
If it were a spreadsheet, then xlsread can select ranges; I forget whether it can do disjoint ranges in a single call; it could be done with COM/ActiveX if not.
Hi,
Sorry for the confusion - to clarify, when I view it in the MATLAB import app those are the cells I highlight manually before hitting "Import Selection". It's a tab delimited text file.
"tab delimited CSV with headers. (0.8 GB) .... even with 12GB ram I'm running out of memory"
I find it hard to believe that a 0.8GB text file should cause an out-of-memory error.
Is it numerical, text or mixed data?
Did you try something like this?
frm = '%*s%*s%*s%f%*s%f%*[^\n]';
cac = textscan( fid, frm, (234568-18), 'Headerlines',18, 'Delimiter','\t')


Accepted Answer

dpb on 7 Aug 2016
Edited: dpb on 7 Aug 2016


"... in ... import app those are the cells ... tab delimited..."
In that case, use textscan and a format string set up to read the desired columns. This isn't particularly difficult to automate depending on the columns wanted...
cols={'D','F'}; % the list of wanted columns, in ascending order
fmt=[]; % empty string to build format string into
prev=0; % index of the last column consumed so far
for i=1:length(cols) % over the number of columns to read
c=cols{i}-'A'+1; % column letter -> 1-based column index
fmt=[fmt repmat('%*f',1,c-prev-1) '%f']; % skip the columns in between, read 1
prev=c; % remember where this read left off
end
fmt=[fmt '%*[^\n]']; % and then skip rest of line
fid=fopen('filename','r');
data=cell2mat(textscan(fid,fmt,'delimiter','\t', ...
'headerlines', 18, ...
'collectoutput',1)); % and read the file
fid=fclose(fid);
The doc for textscan links to a section, Large Text Files, that describes how to read a file in blocks, in case this still errors out on memory. If the import tool can read the file, though, the above will likely work, as I'd venture that's what the tool does as a first try anyway...
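For what it's worth, block-wise reading might look roughly like the sketch below. It is only an illustration under assumptions, not the import tool's actual method: the demo file, nHeader and blockSize are made up so the snippet runs on its own, and every column is assumed to be numeric. For the real file you would point fname at it, set nHeader to 18 and pick a much larger blockSize.

```matlab
% Build a tiny tab-delimited demo file (2 header lines, 6 numeric columns)
% so the sketch runs as-is; this stands in for the real 0.8 GB file.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'header line 1\nA\tB\tC\tD\tE\tF\n');
fprintf(fid, '%d\t%d\t%d\t%d\t%d\t%d\n', reshape(1:30, 6, 5));
fclose(fid);

nHeader   = 2;                    % assumption for the demo; 18 for the real file
blockSize = 2;                    % rows per textscan call; use e.g. 50000 in practice
fmt = '%*f%*f%*f%f%*f%f%*[^\n]';  % skip A-C, read D, skip E, read F, skip rest

fid = fopen(fname, 'r');
if fid < 0, error('Could not open %s', fname); end
blocks = {};                      % one N-by-2 numeric block per iteration
first  = true;
while ~feof(fid)
    if first                      % skip the header lines on the first call only
        c = textscan(fid, fmt, blockSize, 'Delimiter', '\t', ...
                     'HeaderLines', nHeader, 'CollectOutput', true);
        first = false;
    else
        c = textscan(fid, fmt, blockSize, 'Delimiter', '\t', ...
                     'CollectOutput', true);
    end
    if isempty(c{1}), break; end  % nothing parsed: stop instead of looping forever
    blocks{end+1} = c{1};         % keep only the two wanted columns
end
fclose(fid);
delete(fname);                    % remove the demo file

data = vertcat(blocks{:});        % stack the blocks into one matrix
colD = data(:,1);                 % column D as a vector
colF = data(:,2);                 % column F as a vector
```

Only the two wanted columns are ever held in memory, one block at a time. Once colD and colF exist, something like save('columns.mat','colD','colF') would give the fast-loading .mat file mentioned in the question.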

Asked: 5 Aug 2016
Edited: dpb on 7 Aug 2016
