Binary, ASCII and Compression Algorithms
I have a large number (> 1E6) of ASCII files (myFile.txt) which contain time series data, all in the same format: timestamp, field 1, field 2,...,field 20. Each data entry is one row, tab separated. Each of the fields 2-20 is a double. The timestamp is a string (HH:MM:SS.FFF). The files are each about 5 GB in size.
I wish to reduce the hard disk storage required. How can I do this?
My thoughts so far are:
1. Convert the files to binary format. How can I do this? Is it by applying dec2bin.m? However this function seems to only take scalars. What would this look like?
2. Compress each file. Each file may be used independently of the others, thus I wish to compress individually. I know that differing approaches to compression work differently for different data structures. Given my data structure above, which is the best one to apply?
Given the importance of this, I would be happy to call code written in other languages (e.g. C++) from inside MATLAB. Are there any standard libraries or third-party tools that can be recommended?
3. Any other suggestions?
Finally, an important point is that I wish the user to be able to quickly load and access the data in each file, i.e. the bin2dec() call must be quick, as must the decompression.
Thank you!
Comments: 3
José-Luis
14 July 2014
Edited: José-Luis
14 July 2014
I would use a database. Which one is mostly down to personal preference and constraints. I like MySQL because it's free.
Depending on what your data looks like, you could use the NetCDF format, which can be read and written in MATLAB. The same is true for HDF5. These are sort of lightweight databases, though.
IMO, I/O through a database would be faster than wading through the mountain of files you have, unless you plan on hard-coding file paths. I haven't tested it, though, so that's not definite.
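As a rough illustration of the database route, here is a minimal Python sketch. It uses SQLite from the Python standard library as a stand-in for MySQL, purely so the example runs without a server. The table name `samples`, the column names (`source`, `ts`, `field1`...`field20`) and the helper functions are hypothetical, and the sketch assumes all 20 fields are numeric. Whether this actually saves disk space compared with compressed files is not tested here.

```python
import sqlite3

N_FIELDS = 20  # assumption: 20 numeric fields per row, as in the question


def create_store(db_path):
    """Create one table for all files; table and column names are hypothetical."""
    cols = ", ".join(f"field{i} REAL" for i in range(1, N_FIELDS + 1))
    conn = sqlite3.connect(db_path)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS samples (source TEXT, ts TEXT, {cols})"
    )
    # Index on (source, ts) so one file's time window can be found quickly.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_source_ts ON samples (source, ts)"
    )
    return conn


def insert_rows(conn, source, rows):
    """rows: iterable of (timestamp_string, field1, ..., field20) tuples."""
    placeholders = ", ".join(["?"] * (N_FIELDS + 2))
    conn.executemany(
        f"INSERT INTO samples VALUES ({placeholders})",
        ((source, *row) for row in rows),
    )
    conn.commit()


def query_window(conn, source, start, end):
    """Rows of one source between two zero-padded 'HH:MM:SS.FFF' strings."""
    cur = conn.execute(
        "SELECT * FROM samples "
        "WHERE source = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (source, start, end),
    )
    return cur.fetchall()
```

Because the timestamps are zero-padded and fixed width, comparing them as strings gives the same order as comparing them as times, which is what lets `BETWEEN` work on the text column here.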
Answers (1)
Star Strider
11 July 2014
I would read them in as text files, save them as ‘.mat’ files (in the default binary format), then delete the text files. Since the ‘.mat’ files have a different suffix/extension, the prefix name can be the same as for the text file. See the documentation for save and load for details.
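For readers who want to see what "text in, binary out" amounts to outside MATLAB, here is a minimal Python sketch using only the standard library. It is not the .mat format. It assumes a made-up fixed-width record layout: the timestamp stored as milliseconds since midnight in an unsigned 32-bit integer, followed by 20 float64 values. It also assumes all 20 fields are numeric, and the function names are hypothetical. On the question's point 1: dec2bin returns a character string of '0' and '1' characters, so it would make the data larger, not smaller. Binary storage here means writing the raw bytes of each number.

```python
import struct

# Assumed layout (not from the original post): little-endian, no padding,
# uint32 milliseconds since midnight followed by 20 float64 fields.
RECORD = struct.Struct("<I20d")


def parse_timestamp_ms(text):
    """Convert 'HH:MM:SS.FFF' to integer milliseconds since midnight."""
    hours, minutes, seconds = text.split(":")
    whole, _, frac = seconds.partition(".")
    millis = int((frac + "000")[:3])  # tolerate fewer than 3 fraction digits
    return ((int(hours) * 60 + int(minutes)) * 60 + int(whole)) * 1000 + millis


def text_to_binary(txt_path, bin_path):
    """Stream a tab-separated text file into fixed-width binary records."""
    count = 0
    with open(txt_path, "r") as src, open(bin_path, "wb") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            parts = line.rstrip("\r\n").split("\t")
            values = [float(p) for p in parts[1:]]
            dst.write(RECORD.pack(parse_timestamp_ms(parts[0]), *values))
            count += 1
    return count


def read_record(bin_path, index):
    """Random access: seek straight to record `index` and unpack it."""
    with open(bin_path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))
```

With this layout every row occupies the same 164 bytes, so any row can be reached with a single seek instead of parsing the text that precedes it. The file is processed one line at a time, so a 5 GB input never has to fit in memory.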
Comments: 2
Star Strider
11 July 2014
Edited: Star Strider
11 July 2014
- If you want to access the files from other applications, your best option would be to go with something other than .mat files, since to the best of my knowledge, those are MATLAB-specific. I’m not familiar with the file types Python and R can read and write, so you would need to find a common, space-efficient file format for all three applications.
- Compressing them would help. You probably have to go that route anyway, considering the sizes of the files.
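As a minimal sketch of per-file compression, the Python snippet below gzips each file independently using only the standard library. gzip is chosen here only because it is widely readable from other tools; it is not claimed to be the best codec for this data, and the function names are hypothetical. The files are streamed in chunks, so a 5 GB file does not need to fit in memory.

```python
import gzip
import shutil

CHUNK = 1024 * 1024  # copy in 1 MiB pieces


def compress_file(src_path, dst_path, level=6):
    """Gzip one file on its own, streaming it chunk by chunk."""
    with open(src_path, "rb") as src, \
            gzip.open(dst_path, "wb", compresslevel=level) as dst:
        shutil.copyfileobj(src, dst, CHUNK)


def decompress_file(src_path, dst_path):
    """Restore the original bytes of a gzipped file."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, CHUNK)


def iter_lines(gz_path):
    """Read a gzipped text file line by line without unpacking it to disk."""
    with gzip.open(gz_path, "rt") as f:
        for line in f:
            yield line.rstrip("\n")
```

The `level` argument trades speed for size (1 is fastest, 9 is smallest). Since fast loading matters here, it is worth timing a few levels on a representative file before converting everything.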