How can I improve MATLAB performance?

Views: 1 (last 30 days)
Daniel T on 5 Sep 2012
When I import a large amount of data (using the wizard) it takes a very long time to complete. For example, importing a 3.6 GB text file takes more than 5 minutes. Then when I try to plot some of that data it takes an additional few minutes. If I try to 'probe' the data using the Data Cursor, it once again takes minutes for the data value to appear!
I need to reduce this lag from minutes to seconds, as it's making the use of MATLAB unbearable.
Relevant info: Windows 7 Professional 64-bit; Intel Core i5-2400 CPU (3.4 GHz with turbo, 4 cores, 4 logical processors); 8 GB RAM; 500 GB HDD, 7200 RPM; MATLAB R2011b.
While the data was being imported I noticed that about 25% of the CPU was in use on average (one core maxed out), just over 4 GB of RAM was used, and the HDD read at 5-10 MB/s.
What do I need to do to improve performance? Is there an option I'm unaware of inside MATLAB? Should I buy an SSD? Perhaps upgrade to 16 GB?

Answers (1)

per isakson on 5 Sep 2012
Edited: per isakson on 5 Sep 2012
Reading the text files
  1. Windows Task Manager and Resource Monitor: does "Physical Memory, Free" decrease to zero during reading? The Windows file cache is hard to understand, IMO.
  2. I guess (based on googling) that an SSD will improve reading speed by a factor of about three. However, that does not apply to your text files: reading them is CPU-bound (the parsing, not the disk, is the bottleneck).
  3. 16 GB: it is a serial read, so it is no problem that old "chunks" are removed from the file cache. I guess more RAM won't help much for reading.
What does your use case look like?
Spontaneously, I would say:
  1. Transfer the data of the text file to a binary file in an unattended batch job (see the sketches below).
  2. Make a test with HDF5 and the high-level API. The low-level API is too low-level.
  3. Make a test with saving the data to a version 7.3 mat-file and accessing it with the function matfile.
  4. SQLite might be an alternative; see http://www.sqlite.org/ and http://mksqlite.berlios.de/.
  5. Reading binary files from an SSD will be fast.
However, I don't know what kind of data you have.
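To make items 1 and 5 concrete, here is a rough sketch, assuming a plain delimited text file with numeric columns (the file names and the column count are made up):

    % One-off, unattended conversion: parse the text once, store flat binary
    nCols = 5;                                 % assumed number of columns
    fin   = fopen('data.txt', 'r');
    C     = textscan(fin, repmat('%f', 1, nCols), 'CollectOutput', true);
    fclose(fin);
    M     = C{1};                              % nRows-by-nCols double matrix
    fout  = fopen('data.bin', 'w');
    fwrite(fout, M.', 'double');               % write row by row
    fclose(fout);

    % In later sessions, reading the binary file back is quick
    fid = fopen('data.bin', 'r');
    M2  = fread(fid, [nCols, Inf], 'double').';
    fclose(fid);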
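For items 2 and 3, minimal sketches with the high-level HDF5 functions and with matfile (again assuming the matrix M from the sketch above; dataset and file names are placeholders). Both let you read a slice without loading the whole file into memory:

    % HDF5 via the high-level API
    h5create('data.h5', '/signals', size(M));
    h5write('data.h5', '/signals', M);
    part = h5read('data.h5', '/signals', [1 1], [1e5 5]);   % a slice only

    % Version 7.3 mat-file, partial access with matfile
    save('data.mat', 'M', '-v7.3');
    mf    = matfile('data.mat');
    slice = mf.M(1:1e5, :);                    % loads just these rows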
You say "Then when I try to plot some of that data it takes an additional few minutes." How many points do you try to plot? Which plotting function do you use? What does the function profile say?
Data Cursor shouldn't take that long. How many points do you have?
.
Conclusions:
  1. Convert the text files to binary
  2. Study your code with the function profile (a minimal example follows)
.
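A minimal way to use profile (the data and the plot call are only placeholders for whatever is slow in your case):

    t = linspace(0, 10, 5e6);                  % large dummy vectors
    v = sin(t) + 0.1*randn(size(t));
    profile on
    plot(t, v)                                 % the operation under test
    profile viewer                             % inspect where the time goes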
--- In response to comments 1 to 5 ---
[...] I'm partially too lazy to do so.
In my comments below I'll try to remember that you want to focus on your domain data rather than clever Matlab code.
I collect 5 values (time, voltage, etc.) every 20 ms for 10 days, so approximately 5 million points.
  1. I still think you should convert the text files to binary. With 5 long time series, fwrite/fread is an alternative. SQLite is not appropriate, and HDF5 might be overkill. How many such 3.6 GB files will you have during the next year? If many, will you need to revisit old data sets?
  2. I bet nobody at The MathWorks ever thought anybody would try to plot time series with 5e6 elements :). Thousands of elements per pixel on the screen; nobody would want that. However, several years ago I made a tool to "browse" time series data. With typically 5e4 elements I had real problems with the response times. My solution in short: i) use the graphics functions to show a "time-window"; ii) keep the full data outside the Handle Graphics objects; iii) update the data on display with set( line_handle, 'Xdata', X(time_window), ... ). This approach works very well and I still use my Databrowser (see the sketch after this list). I don't know if the difference is as large with recent releases of Matlab. As a side effect the Datacursor becomes quicker. Con: it certainly takes some effort to develop such a tool.
  3. Second thought: would the accuracy suffer if you down-sample the signals? "filtfilt, Zero-phase digital filtering" or something similar, in an unattended batch job. Why not: read the text, down-sample, and save a few different versions of the data to binary files? (A down-sampling sketch also follows this list.)
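A bare-bones sketch of the "time-window" idea in item 2. This is not my Databrowser code, just an illustration with made-up names and dummy data:

    % Full data stays in ordinary variables, outside the graphics objects
    t = linspace(0, 864000, 5e6);              % dummy 10-day time vector
    v = sin(2*pi*t/3600) + 0.1*randn(size(t)); % dummy signal
    win   = 1:5e4;                             % indices of the visible window
    hLine = plot(t(win), v(win));

    % Pan: update the existing line instead of re-plotting everything
    win = win + 5e4;
    set(hLine, 'XData', t(win), 'YData', v(win));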
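And a sketch of the down-sampling in item 3, using t and v from the sketch above. The factor is a guess, and filtfilt needs the Signal Processing Toolbox (decimate, from the same toolbox, wraps filtering and picking in one call):

    r       = 100;                             % down-sampling factor (a guess)
    vSmooth = filtfilt(ones(1, r)/r, 1, v);    % zero-phase moving average
    vDown   = vSmooth(1:r:end);                % keep every r-th sample
    tDown   = t(1:r:end);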
but the Data Cursor (and zoom, highlight, etc.) delay is not something I can allow to endure.
  1. I don't think the standard Zoom, Datacursor, etc. can handle your use case.
  2. In my Databrowser I have "replaced" Zoom by changing "time_window". I also have a search function, which allows me to jump to the next "time_window" in which some condition is true (e.g. find_peak).
upgrading the video/graphics
  1. Ten years ago the graphics card certainly mattered, even for 2D.
  2. Does Matlab throw "5 million points" at the graphics card? Sounds scary, but I have no idea.
used the Data Cursor I saw the "Physical Memory Free" drop to nearly zero, then jump back to 8 GB
  1. That tells me the Datacursor is not designed to work on this kind of time series.
  2. Did the graph in Physical Memory Usage History increase gradually and then drop quickly? Speculation: if so, does that mean the Datacursor creates and releases a temporary variable of that size?
  3. I once customized the Datacursor to work on one year of hourly data. I never succeeded in making it responsive.
Star Strider: "use the findpeaks function to identify peaks"
  1. This is a good point.
  2. On the File Exchange there are a number of find-peak contributions, which might be alternatives to the Signal Processing Toolbox. Some of them include a GUI and plotting.
  3. However, one should not forget to inspect the data.
Comments (6)
Star Strider on 5 Sep 2012
If you have the Signal Processing Toolbox, use the findpeaks function to identify peaks and, with another line or two of code, the valleys as well. The function has a number of options to deal with noisy data and other constraints.
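For example (a sketch; the signal name v and the MinPeakDistance value are placeholders):

    [pks,  locP] = findpeaks(v,  'MinPeakDistance', 50);   % peaks
    [vals, locV] = findpeaks(-v, 'MinPeakDistance', 50);   % valleys, via -v
    vals = -vals;                                          % undo the sign flip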
per isakson on 5 Sep 2012
Edited: per isakson on 5 Sep 2012
Bump: See my response above.
