Large streaming data direct to file
Hi,
I would like to set up a system to log months' worth of financial JSON WebSocket data to a file.
- The JSON data coming in looks like this: {"this": "that", "foo": [1,2,3], "bar": ["a", "b", "c"]}, and there are about 20 messages per second.
- I tested writing directly to a .txt file with FPRINTF. That works, but the files get really big, about 2 GB per day, because there is no compression.
- I tested different SAVE formats ('-v7' being by far the best) to save a new variable inside a .mat file every 10 minutes. This was a little too slow to keep up with the incoming stream: each save took almost a second, and processing would not be ideal if I had to load a ton of different variables. The file size looked very good, though. (http://undocumentedmatlab.com/blog/improving-save-performance)
- I tried MATFILE to write directly to the file, but I could only append to the end of the file with '-v7.3' .mat files, which makes the file a lot bigger than '-v7' and still takes a little too long.
- I would like a file format with good compression that I can append a new message to quickly. Maybe the HDF5 file format?
I believe I need to serialize the incoming data and save it directly to a file in some compressed form, but I'm not exactly sure how to do that.
- I read through this article and don't quite understand how to implement it (https://undocumentedmatlab.com/blog/serializing-deserializing-matlab-data). Since it is an older article, is there a more up-to-date way?
- Do I use something like "h5write"? "getByteStreamFromArray"?
- After the file has been created with months of data, how do I pull out each message, one by one, to process it?
- Is "Fast serialize/deserialize" on the File Exchange the right path? I can't figure out how to use it.
Thank you!
Joe
Answers (1)
You can create the text as a char vector with sprintf instead of fprintf and compress it in RAM before writing it to disk: https://www.mathworks.com/matlabcentral/fileexchange/69388-mkzip . This should avoid the overhead of compressed MAT files.
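As a rough illustration of the "compress in RAM, then write" idea, here is a minimal sketch. It does not use the File Exchange submission linked above; it assumes the java.util.zip classes of the JVM that ships with MATLAB are available (so not a -nojvm session) and that jsondecode and newline exist (R2016b or later). The file name, the batch sizes, and the length-prefixed block layout are assumptions made for this example, not an established format.

```matlab
% Sketch: batch messages, deflate each batch in RAM, append length-prefixed
% blocks to a binary file, then read the messages back one by one.
logFile = fullfile(tempdir, 'ws_log_demo.bin');    % hypothetical file name
if exist(logFile, 'file'), delete(logFile); end

% Stand-in for the websocket feed: two batches, one JSON message per line.
msg = '{"this": "that", "foo": [1,2,3], "bar": ["a", "b", "c"]}';
batches = {repmat({msg}, 1, 200), repmat({msg}, 1, 100)};

% --- Writing -------------------------------------------------------------
fid = fopen(logFile, 'a');
assert(fid > 0, 'Could not open log file');
for k = 1:numel(batches)
    txt = sprintf('%s\n', batches{k}{:});          % char vector built in RAM
    raw = unicode2native(txt, 'UTF-8');            % uint8 row vector
    bos  = java.io.ByteArrayOutputStream();
    defl = java.util.zip.DeflaterOutputStream(bos);
    defl.write(raw, 0, numel(raw));                % zlib-compress in memory
    defl.close();
    packed = typecast(bos.toByteArray(), 'uint8'); % Java bytes arrive as int8
    fwrite(fid, numel(packed), 'uint32');          % block length prefix
    fwrite(fid, packed, 'uint8');                  % compressed block
end
fclose(fid);

% --- Reading back, message by message --------------------------------------
fid = fopen(logFile, 'r');
assert(fid > 0, 'Could not reopen log file');
nMsg = 0;
while true
    n = fread(fid, 1, 'uint32');
    if isempty(n), break; end                      % end of file reached
    packed = fread(fid, n, '*uint8')';             % one compressed block
    bos  = java.io.ByteArrayOutputStream();
    infl = java.util.zip.InflaterOutputStream(bos);
    infl.write(packed, 0, numel(packed));          % decompress in memory
    infl.close();
    txt  = native2unicode(typecast(bos.toByteArray(), 'uint8')', 'UTF-8');
    msgs = strsplit(strtrim(txt), newline);        % one cell per message
    for i = 1:numel(msgs)
        s = jsondecode(msgs{i});                   % struct for one message
        nMsg = nMsg + 1;                           % process s here
    end
end
fclose(fid);
delete(logFile);
fprintf('Recovered %d messages\n', nMsg);
```

Larger batches generally compress better but leave more data unwritten if the session crashes, so the batch size is a trade-off to tune against the real feed.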
Maybe it is just the disk access that slows down the processing. In that case, try using an SSD instead.
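One way to check whether disk access is the limiting factor is to time the writes directly. The following is only a sketch: the test file path and the message count are placeholders, and the message is the short sample from the question rather than real data. Note that fprintf output may be buffered, so single calls can look faster than the underlying disk.

```matlab
% Sketch: time plain appends and compare with the per-message time budget.
testFile = fullfile(tempdir, 'ws_write_timing.txt');  % hypothetical path
msg  = '{"this": "that", "foo": [1,2,3], "bar": ["a", "b", "c"]}';
nMsg = 1200;                       % about one minute of data at 20 messages/s

fid = fopen(testFile, 'w');
assert(fid > 0, 'Could not open test file');
t = zeros(nMsg, 1);
for k = 1:nMsg
    t0 = tic;
    fprintf(fid, '%s\n', msg);     % same call as in the plain-text test
    t(k) = toc(t0);                % seconds spent in this write
end
fclose(fid);
delete(testFile);

budget = 1/20;                     % seconds available per message
fprintf('median %.3g ms, worst %.3g ms, budget %.0f ms\n', ...
        1e3*median(t), 1e3*max(t), 1e3*budget);
```

If the worst-case write time stays far below the budget, the disk is unlikely to be the bottleneck and the time is probably going into serialization or compression instead.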
Comments: 1
Joe Davison
17 Nov 2018
Edited: Joe Davison
29 Nov 2018