
Fast Export Method

Brian on 11 Jun 2012
I have always used the export command to write a dataset to a comma-delimited text file. Once the files get larger (50-100 MB), export runs very slowly. Are there other functions that are much faster than the dataset export method?
My dataset is simple, just large: column 1 is text and columns 2:4 are numeric.
MyDS = dataset(MyData(:,1),MyData(:,2),MyData(:,3),MyData(:,4));
export(MyDS,'file','R:\Equity_Quant\BrianB\Factor Rotation\BulkData.txt','Delimiter',',');
Thanks much, Brian
  Comments: 3
Walter Roberson on 12 Jun 2012
export() is a method of the dataset class.
http://www.mathworks.com/help/toolbox/stats/dataset.export.html
per isakson on 12 Jun 2012
Walter, thanks!


Accepted Answer

per isakson on 12 Jun 2012
The functions save, load, and whos are an alternative:
save( 'my_datasets.mat', 'MyDS' )
save( 'my_datasets.mat', 'MyDS_2', '-append' )
or even faster
save( 'my_datasets.mat', 'MyDS', '-v6' )
No guarantee, though. (As far as I can see, dataset does not overload save.)
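To make the save/whos/load suggestion concrete, here is a minimal round-trip sketch. The file name and the plain matrix standing in for the dataset are assumptions for illustration, not part of the original post. Note that a MAT-file is binary, so this route only helps when MATLAB itself reads the data back.

```matlab
% Minimal sketch: round-trip a variable through a MAT-file.
% The file name and the stand-in matrix are hypothetical.
matFile = fullfile( tempdir, 'my_datasets_demo.mat' );
MyDS = randn( 5, 3 );                 % stand-in for the dataset object

save( matFile, 'MyDS', '-v6' );       % -v6 writes the file without compression
info = whos( '-file', matFile );      % list stored variables without loading them
S = load( matFile, 'MyDS' );          % load returns a struct, one field per variable

isSame = isequal( S.MyDS, MyDS );     % true if the round trip preserved the data
delete( matFile );                    % clean up the temporary file
```

Whether '-v6' is actually faster than the default format will depend on the data and the disk, so it is worth timing both with tic/toc on a realistic file.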
--- Faster method to export to text file ---
Some data. I use three columns to avoid word wrap.
N = 1e1;
MyData = randn( N, 3 );
The variant with dataset
tic
ds = dataset( MyData(:,1), MyData(:,2), MyData(:,3) );
export( ds, 'file','c:\temp\test_ds.txt', 'Delimiter', ',' )
toc
produces this output
Var1,Var2,Var3
0.490621219889722,-0.64500446971637,1.6174852368957
0.0660146644635964,-0.408559384693175,0.445251540387096
...
A faster variant
tic
ms = permute( MyData, [ 2, 1 ] );
fid = fopen( 'c:\temp\test_fp.txt', 'w' );
fprintf( fid, '%s,%s,%s\n', 'Var1', 'Var2', 'Var3' );
fprintf( fid, '%f,%f,%f\n', ms );
fclose( fid );
toc
produces this output
Var1,Var2,Var3
0.490621,-0.645004,1.617485
0.066015,-0.408559,0.445252
...
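The permute (transpose) step in the faster variant is there because fprintf walks through a numeric array in column-major order. A small sketch with a made-up 2-by-3 matrix, using sprintf so the result can be inspected as a string:

```matlab
% Small matrix so the element order is easy to follow.
A = [ 1 2 3
      4 5 6 ];

% Passing A directly walks down the columns: 1,4,2 and then 5,3,6.
wrong = sprintf( '%d,%d,%d\n', A );

% Passing the transpose makes each row of A one line of text.
right = sprintf( '%d,%d,%d\n', A.' );
```

Here wrong holds the lines 1,4,2 and 5,3,6, while right holds 1,2,3 and 4,5,6, which is the row layout a delimited file needs.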
I've run these two variants with different values of N. The fprintf variant is at least an order of magnitude faster.
With N=5e6 on my three-year-old vanilla desktop I get "Elapsed time is 16.478986 seconds." with the faster variant. That is roughly 8 MB/s.
How many decimals do you need in the text file?
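The number of decimals is set by the format specifier. As a sketch using the first row of the sample output above: plain %f keeps six digits after the decimal point, whereas %.15g keeps up to 15 significant digits, close to what export wrote. The variable names are only for illustration.

```matlab
% First row of the sample data shown above.
x = [ 0.490621219889722, -0.64500446971637, 1.6174852368957 ];

% Default %f: six digits after the decimal point.
sixDecimals = sprintf( '%f,%f,%f\n', x );

% %.15g: up to 15 significant digits, trailing zeros dropped.
fullDigits = sprintf( '%.15g,%.15g,%.15g\n', x );
```

More digits mean a larger file, so the shorter format is likely to write faster; I have not timed that difference here.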
--- fprintf is hard to exceed ---
I added this test
tic
dlmwrite( 'c:\temp\test_dlm.txt', MyData )
toc
With N=5e6 I got the following elapsed times:
  1. fprintf (the faster variant): Elapsed time is 16.507707 seconds.
  2. dlmwrite: Elapsed time is 202.905913 seconds. (without header)
  3. dataset.export: Elapsed time is 819.649789 seconds.
With plain MATLAB I don't think there is a faster alternative. It may be possible to do something faster with a MEX-function.
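The timings above use purely numeric data, whereas the question has a text column followed by three numeric columns. One way to keep the single-call fprintf approach for that layout is to hold the rows in a cell array, transpose it, and expand it as a comma-separated list. The sketch below assumes that layout; the sample values and output path are hypothetical, and I have not timed it on large data.

```matlab
% Hypothetical stand-in for the asker's data: column 1 text, columns 2:4 numeric.
MyData = { 'AAA', 1.5, 2.25, 3
           'BBB', 4.5, 5.25, 6 };

outFile = fullfile( tempdir, 'bulk_demo.txt' );   % hypothetical output path

C = MyData.';                 % transpose so that C{:} lists the cells row by row
fid = fopen( outFile, 'w' );
if fid == -1
    error( 'Could not open %s for writing.', outFile );
end
fprintf( fid, '%s,%g,%g,%g\n', C{:} );   % one call; the format repeats for each row
fclose( fid );

written = fileread( outFile );   % read the file back to inspect the result
delete( outFile );               % clean up the temporary file
```

Expanding C{:} passes every cell as a separate argument, so memory use grows with the number of rows; for very large data it may be better to write in chunks. Text values that contain commas would also need quoting before a bulk import.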
  Comments: 3
per isakson on 13 Jun 2012
That's correct. I didn't realize the purpose. Maybe writing directly to the database is an option.
Brian on 13 Jun 2012
Since the database is on a server separate from my local machine, my job runs some calculations in MATLAB and exports the large files to the database server so that I can do a bulk import via SQL Server. Hence the need to export the data. Would fprintf be my fastest option for writing to a text file?
Thanks a lot,
Brian


More Answers (0)
