Reading the text files
- Check Windows Task Manager and Resource Monitor. Does "Physical Memory, Free" decrease to zero during reading? The Windows file cache is hard to understand - IMO.
- I guess (based on googling) that an SSD will improve reading speed by a factor of roughly three. However, that does not apply to your text files - reading them is CPU bound (parsing the text dominates, not the disk).
- 16GB. It is a serial read, thus it is no problem that old "chunks" are removed from the file cache. I guess more RAM won't help much for reading.
What does your use case look like?
Spontaneously, I would say:
- Transfer the data from the text file to a binary file in an unattended batch job.
- Make a test with HDF5 and the high-level API. The low-level API is too low-level.
- Make a test with saving the data to a version 7.3 mat-file and accessing it with the function, matfile (see the sketch below this list).
- SQLite might be an alternative, see http://www.sqlite.org/ and http://mksqlite.berlios.de/.
- Reading binary files from an SSD will be fast.
However, I don't know what kind of data you have.
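A minimal sketch of the HDF5 and mat-file ideas. The variable and file names (t, voltage, data.h5, data.mat) are just assumptions for illustration:

```matlab
% Assumed: column vectors t and voltage already parsed from the text file.
N = numel(voltage);

% --- HDF5, high-level API ---
h5create('data.h5', '/t',       [N 1]);
h5create('data.h5', '/voltage', [N 1]);
h5write('data.h5', '/t',       t);
h5write('data.h5', '/voltage', voltage);
v_part = h5read('data.h5', '/voltage', [1 1], [1e5 1]);  % read only the first 1e5 samples

% --- Version 7.3 mat-file and matfile ---
save('data.mat', 't', 'voltage', '-v7.3');
m = matfile('data.mat');
v_part = m.voltage(1:1e5, 1);   % partial read, without loading the whole file
```
The point of both variants is the last line in each: you can read a slice of the data without pulling 3.6GB into memory.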
You say "Then when I try to plot some of that data it takes an additional few minutes." How many points do you try to plot. Which plotting function do you use? What does the function, profile, say?
Data Cursor shouldn't take that long. How many points do you have?
Conclusions:
- Convert the text files to binary
- Study your code with the function, profile (see the snippet below)
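For example (plot_my_data is a placeholder for your own script):

```matlab
profile on
plot_my_data      % run the code you want to analyse
profile viewer    % opens the report; look at the self-time column
```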
--- In response to comments 1 to 5 ---
[...] I'm partially too lazy to do so.
In my comments below I'll try to remember that you want to focus on your domain data rather than clever Matlab code.
I collect 5 values (time, voltage, etc) every 20ms for 10 days, so approximately 5 million points.
- I still think you should convert the text files to binary. With 5 long time series, fwrite/fread is an alternative. SQLite is not appropriate and HDF5 might be overkill. How many such 3.6GB files will you have during the next year? If many, will you need to revisit old data sets?
- I bet nobody at The MathWorks ever thought anybody would try to plot time series with 5e6 elements:). Thousands of elements per pixel on the screen - nobody would want to do that. However, several years ago I made a tool to "browse" time series data. With typically 5e4 elements I had real problems with the response times. My solution in short: i) use the graphics functions to show a "time window", ii) keep the full data outside the handle graphics objects, iii) update the data on display with set( line_handle, 'XData', X(time_window), ... ). This approach works very well and I still use my Databrowser (a minimal sketch of the idea follows after the Zoom comment further down). I don't know if the difference is as large with recent releases of Matlab. As a side effect, Datacursor will become quicker. Con: it certainly takes some effort to develop a tool.
- Second thought: would the accuracy suffer if you down-sample the signals? "filtfilt, Zero-phase digital filtering" or something similar in an unattended batch job. Why not: read text, down-sample, and save a few different versions of the data to binary files? (See the sketch after this list.)
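A sketch of such an unattended batch job. It assumes the text file (here measurements.txt) holds five numeric columns with time first; the file names and the decimation factor are assumptions, and readmatrix can be replaced by textscan in older releases:

```matlab
% Read the text file once (slow) ...
M = readmatrix('measurements.txt');           % N-by-5: time, voltage, ...

% ... down-sample, e.g. by a factor of 20 (decimate low-pass filters first) ...
R  = 20;
Md = zeros(ceil(size(M,1)/R), size(M,2));
Md(:,1) = M(1:R:end, 1);                      % keep every R:th time stamp
for jj = 2:size(M,2)
    Md(:,jj) = decimate(M(:,jj), R);          % Signal Processing Toolbox
end

% ... and save both versions as flat binary files.
fid = fopen('measurements_full.bin', 'w');
fwrite(fid, M, 'double');
fclose(fid);
fid = fopen('measurements_ds.bin', 'w');
fwrite(fid, Md, 'double');
fclose(fid);

% Reading the binary file back is fast:
fid = fopen('measurements_ds.bin', 'r');
Md2 = fread(fid, [size(Md,1), 5], 'double');  % caveat: you must know/store the number of rows
fclose(fid);
```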
but the Data Cursor (and zoom, highlight, etc) delay is not something I can allow to endure.
- I don't think the standard Zoom, Datacursor, etc. can handle your use case.
- In my Databrowser I have "replaced" Zoom by changing "time_window". I also have a search function, which allows me to jump to the next "time_window" in which some condition is true (e.g. find_peak).
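A minimal sketch of that time-window idea (the names X, Y, win are assumptions, and the data here is dummy data):

```matlab
% Full data is kept in ordinary variables, outside the graphics objects.
X = linspace(0, 864000, 5e6).';      % e.g. time in seconds over ten days
Y = randn(5e6, 1);

win = X >= 1000 & X < 2000;          % initial time window, 1000 s wide
lh  = plot(X(win), Y(win));          % only the window goes into the line object

% "Zooming"/panning is done by changing the window and updating the line:
win = X >= 2000 & X < 3000;
set(lh, 'XData', X(win), 'YData', Y(win));
```
The graphics system only ever sees a window's worth of points, which is what keeps the response times acceptable.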
upgrading the video/graphics
- Ten years ago the graphics card certainly did matter, even for 2D.
- Does Matlab throw "5 million points" at the graphics card? Sounds scary, but I have no idea.
used the Data Cursor I saw the "Physical Memory Free" drop to nearly zero then jumped to 8GB
- That tells me the Datacursor is not designed to work on this kind of time series.
- Did the graph in Physical Memory Usage History increase gradually and then drop quickly? Speculation: if so, does that mean that Datacursor creates and releases a temporary variable of that size?
- I once customized Datacursor to work on one year of hourly data. I never succeeded in making it responsive (a sketch of such a customization follows below).
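For reference, "customizing" Datacursor means installing your own UpdateFcn; a minimal sketch (my_datatip_text is of course a made-up name):

```matlab
dcm = datacursormode(gcf);
set(dcm, 'UpdateFcn', @my_datatip_text);
```

```matlab
function txt = my_datatip_text(~, event_obj)
% Put this in its own file, my_datatip_text.m.
% Builds the text shown in the datatip from the clicked position.
pos = get(event_obj, 'Position');
txt = {sprintf('time: %.3f s', pos(1)), sprintf('value: %.4g', pos(2))};
end
```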
Star Strider: "use the findpeaks function to identify peaks"
- This is a good point.
- In the File Exchange there are a number of find-peak contributions, which might be alternatives to the Signal Processing Toolbox. Some of them include a GUI and plotting.
- However, one should not forget to inspect the data (see the example below).
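A small findpeaks example, which also plots the result so the data can be inspected. The signal y, the sample rate and the thresholds are assumptions:

```matlab
% Assumed: column vector y with the measured signal, sampled every 20 ms.
fs = 50;                               % 20 ms sampling -> 50 Hz
t  = (0:numel(y)-1).' / fs;
[pks, locs] = findpeaks(y, ...
    'MinPeakHeight',   0.5, ...        % tune to your signal
    'MinPeakDistance', round(0.2*fs)); % at least 0.2 s between peaks
plot(t, y, t(locs), pks, 'rv')         % inspect: does it pick the right peaks?
```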