How to deal with a matrix of size 5000000*8760

Chaoyang Jiang, 30 March 2018
Edited: Chaoyang Jiang, 31 March 2018
How can I deal with a matrix of size 5000000*8760? Here 5000000 is the number of vehicles, and the 8760 columns represent the hourly charging status of each vehicle over one year (8760 hours). Non-zero elements account for about 10% of the entries, so I tried a sparse matrix. However, it doesn't work. I have no idea how to generate, save, and load such a big matrix. Thank you!
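% 1 generate weekly charge status (vehicles x 168 hours)
chargestatusweek = zeros(5000000, 168, 'single');
% 2 generate yearly charge status from the weekly matrix:
%   52 full weeks cover hours 1..8736, the last 24 hours repeat day 1
chargestatusyear = zeros(5000000, 8760, 'single');
% note: repmat builds a second full-size temporary array here
chargestatusyear(:, 1:8736) = repmat(chargestatusweek, [1, 52]);
chargestatusyear(:, 8737:end) = chargestatusweek(:, 1:24);
% 3 save this matrix and load it in another script
% 4 scan the 8760 hours to update the charging status in chargestatusyear
for t = 1:8760
    % illustrative updates at (vehicle, hour) positions
    chargestatusyear(4, 13) = 0;
    chargestatusyear(344, 2300) = 0;
    chargestatusyear(3459, 5600) = 0;
    % ... further updates ...
end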
I have tried using only the weekly data (168 hours) and mapping every other hour onto the first week. However, I just found that this makes the update identical for t=169 and t=337, because both times map to the 1st hour of the 1st week. The correct updates for t=169 and t=337 should differ. That is why I am now looking for ways to generate yearly data.
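To make the collision concrete, here is a small sketch of the hour-of-year to hour-of-week mapping, assuming hour 1 of the year is hour 1 of the first week (the exact conversion used is not shown in the question, so the formula is an assumption):

```matlab
% Map a yearly hour index t (1..8760) onto an hour-of-week index (1..168).
% Assumption: hour 1 of the year is hour 1 of the first week.
hoursPerWeek = 168;
toWeekHour = @(t) mod(t - 1, hoursPerWeek) + 1;

t = [1 169 337 8737 8760];
weekHour = toWeekHour(t);
disp([t; weekHour])
```

Hours 1, 169 and 337 all land in weekly column 1, so anything written there is shared by all three times; only a year-long (or block-wise) index keeps them apart. The last 24 hours (8737 to 8760) map to weekly columns 1 to 24, matching the `8737:end` line in the code above.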
  Comments (2)
James Tursa, 31 March 2018
Can you show us some of your code, and tell us how you are getting the data into this variable and what your downstream processing of this data will be?
Chaoyang Jiang, 31 March 2018
I have edited my question accordingly. Thank you!


Answers (1)

John D'Errico, 31 March 2018
Edited: John D'Errico, 31 March 2018
I still do not see where you say what you are doing with the matrix. "Downstream processing" is not sufficient information.
There may be good reasons why you need it as a matrix. Or perhaps there are not.
Remember that this matrix is huge. Even in single precision, it will require something like
5000000*8760*4/2^30
ans =
163.17
163 GIGABYTES of memory to store that matrix.
Even if you store it in sparse form, since it is 90% zero, you hit another problem: sparse is not supported for single precision. (At least not in R2017b. I still need to download R2018a, but the R2018a release notes do not indicate support for single sparse arrays.) Therefore you would need to store the matrix as a sparse double precision array.
The memory required for a sparse double of that size would still be over 30 gigabytes for the non-zero values alone, and roughly twice that (about 65 GB) once the 8-byte row index stored with each non-zero is counted. In order to use it in any way, depending on what you would do with it, MATLAB might even be forced to make copies of the array. While that might be possible, you would need a lot of RAM and a fast hard drive. An SSD would be useful, because your computer will be doing a lot of memory shuffling.
Next, while you said that you TRIED a sparse matrix, we don't know how you tried to create it. My guess is that you did not use sparse correctly, nor did you create the matrix properly. No matter what, it will require a LOT of memory just to create the list of non-zero elements and their positions in the final sparse matrix: as three double vectors (rows, columns, values) that is 24 bytes per non-zero, roughly 100 GB here. Then to make the matrix itself, you will create a copy of all that information. So you could end up needing well over 100 GIGABYTES of RAM to create the sparse matrix. Again, a lot of memory.
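The memory figures in this answer can be checked with a few lines of arithmetic. This sketch assumes 64-bit MATLAB's sparse layout (one value plus an 8-byte row index per non-zero, plus 8-byte column pointers), exactly 10% density, and 1 GB = 2^30 bytes as in the 163.17 calculation:

```matlab
% Rough memory estimates for a 5000000-by-8760 charging-status matrix.
% Assumptions: 64-bit MATLAB, exactly 10% non-zeros, 1 GB = 2^30 bytes.
nVeh    = 5e6;
nHours  = 8760;
density = 0.10;
GB      = 2^30;

nEl    = nVeh * nHours;       % total number of elements
nnzEst = density * nEl;       % expected number of non-zeros

fullSingleGB = nEl * 4 / GB;  % full single: 4 bytes per element
fullInt8GB   = nEl * 1 / GB;  % full int8: 1 byte per element

% Sparse storage: one value and one 8-byte row index per non-zero,
% plus (ncols + 1) 8-byte column pointers.
sparseValuesGB  = nnzEst * 8 / GB;                            % double values only
sparseDoubleGB  = (nnzEst * (8 + 8) + (nHours + 1) * 8) / GB;
sparseLogicalGB = (nnzEst * (1 + 8) + (nHours + 1) * 8) / GB;

% sparse(i, j, v) built from three double vectors needs 24 bytes per
% non-zero for the inputs alone, before the result is allocated.
tripletGB = nnzEst * 3 * 8 / GB;

fprintf('full single          %6.1f GB\n', fullSingleGB);
fprintf('full int8            %6.1f GB\n', fullInt8GB);
fprintf('sparse double values %6.1f GB\n', sparseValuesGB);
fprintf('sparse double total  %6.1f GB\n', sparseDoubleGB);
fprintf('sparse logical total %6.1f GB\n', sparseLogicalGB);
fprintf('i, j, v triplets     %6.1f GB\n', tripletGB);
```

Under these assumptions the figures come out at about 163 GB (full single), 41 GB (full int8), 33 GB (sparse values only), 65 GB (sparse double in total), 37 GB (sparse logical) and 98 GB (triplets). At this density a full int8 array is already smaller than a sparse double; sparse logical only applies if the status is strictly on/off, which is an assumption about the data.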
You might want to read this link carefully:
https://www.mathworks.com/help/matlab/matlab_prog/strategies-for-efficient-use-of-memory.html
In the end, I would suggest that you are trying to process too large an amount of data at once for the capabilities you have, both in terms of your memory management skills and in terms of what your current computer can store. Just because you were able to process weekly data like this does not mean that you should jump to processing yearly data all at once. Of course, even if that were easily done, you might then decide that to get good accuracy, what you really need is to process 5 or 10 years of data at a time. This is how things work. "I need MORE DATA" is the common refrain. But can you work more efficiently instead?
So I would strongly suggest that you consider reformulating how you process things.
Perhaps you might generate the array in blocks. For example, you could generate blocks that are 4 weeks in size and save each one to disk, as many blocks as you wish, in separate files. Then read them in as you need them, replacing the previous block of data in memory. Yes, this will require fast disk access, so a large SSD will be useful.
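A minimal sketch of the block-wise idea, assuming small stand-in sizes (1000 vehicles instead of 5000000) and hypothetical file names `chargeblock_NN.mat` holding a variable called `block`. Each block is its own copy of the weekly pattern, so hours such as 169 and 337 get separate columns:

```matlab
% Sketch: build the yearly charge status in 4-week blocks and save each block
% to its own MAT-file, so only one block is ever in memory.
% Assumptions: small stand-in sizes; file and variable names are hypothetical.
nVeh         = 1000;                        % stand-in for 5000000
hoursPerWeek = 168;
nHours       = 8760;
blockHours   = 4 * hoursPerWeek;            % 672 hours per block
nBlocks      = ceil(nHours / blockHours);   % 14 blocks; the last one is shorter

chargestatusweek = zeros(nVeh, hoursPerWeek, 'single');   % weekly pattern
outDir = tempdir;

for b = 1:nBlocks
    hourIdx  = (b-1)*blockHours + 1 : min(b*blockHours, nHours);  % global hours
    weekHour = mod(hourIdx - 1, hoursPerWeek) + 1;                % hour-of-week
    block = chargestatusweek(:, weekHour);  % independent copy for these hours
    % For real sizes (over 2 GB per variable) add the '-v7.3' flag to save.
    save(fullfile(outDir, sprintf('chargeblock_%02d.mat', b)), 'block');
end

% Locate the block file and the local column for any global hour t.
blockOf = @(t) floor((t - 1) / blockHours) + 1;
colOf   = @(t) mod(t - 1, blockHours) + 1;
```

Updating hour t then means loading block `blockOf(t)`, changing column `colOf(t)`, and saving it back. Hours 169 and 337 both fall in block 1, but in columns 169 and 337, so their updates no longer collide. The 14th block holds the final 24 hours (8737 to 8760).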
Perhaps a better way to store this data would be to use a DATASTORE.
https://www.mathworks.com/help/matlab/datastore.html
https://www.mathworks.com/help/matlab/import_export/what-is-a-datastore.html
This will help MATLAB to do some of the memory management work for you. Again, I don't know what you will do with this array after you create it, as you never told us that.
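A datastore does not have to start from CSV files: `fileDatastore` takes a custom read function, so it can walk through per-block MAT-files like the 4-week blocks suggested above. This is a sketch assuming R2017a or later, with a hypothetical folder, file names and variable name `block`, and small stand-in sizes:

```matlab
% Sketch: read per-block MAT-files one at a time through a fileDatastore.
% Assumptions: fileDatastore (R2017a or later); folder, file names and the
% variable name 'block' are hypothetical; sizes are small stand-ins.
nVeh = 1000;  blockHours = 672;  nBlocks = 3;
outDir = fullfile(tempdir, 'chargeblocks');
if exist(outDir, 'dir'), rmdir(outDir, 's'); end   % start from an empty folder
mkdir(outDir);
for b = 1:nBlocks
    block = zeros(nVeh, blockHours, 'single');
    save(fullfile(outDir, sprintf('chargeblock_%02d.mat', b)), 'block');
end

% ReadFcn receives one file name and returns that file's data.
readBlock = @(f) getfield(load(f, 'block'), 'block');
fds = fileDatastore(outDir, 'ReadFcn', readBlock, 'FileExtensions', '.mat');

nRead = 0;
while hasdata(fds)
    block = read(fds);      % only this block is in memory
    % process the block here
    nRead = nRead + 1;
end
```

The datastore only tracks file names; each `read` loads one block, which is the kind of memory management the answer refers to.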
  Comments (2)
Walter Roberson, 31 March 2018
You might be able to use tall arrays. But not with those repmat calls written the way they are, I suspect.
Chaoyang Jiang, 31 March 2018
Edited: Chaoyang Jiang, 31 March 2018
Thank you very much for your answer. The way I generate the sparse matrix is:
chargestatusweeknew = sparse(chargestatusweek);
The resulting chargestatusweeknew uses more memory than the same data stored as a full int8 matrix without the sparse operation.
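That observation is expected at 10% density. A small sketch, assuming 64-bit MATLAB and a stand-in 1000-by-168 matrix with exactly every 10th element non-zero, compares the storage reported by `whos`:

```matlab
% Sketch: compare storage of a full int8 matrix with sparse versions of the
% same data at 10% density (small stand-in size; 64-bit MATLAB assumed).
nVeh = 1000;  nHours = 168;
A = zeros(nVeh, nHours, 'int8');
A(1:10:end) = 1;                 % every 10th element non-zero -> 10% density

Sd = sparse(double(A));          % sparse double: 8-byte value + 8-byte row index
Sl = sparse(A ~= 0);             % sparse logical: 1-byte value + 8-byte row index

info = whos('A', 'Sd', 'Sl');
for k = 1:numel(info)
    fprintf('%-3s %8d bytes\n', info(k).name, info(k).bytes);
end
```

Per element, a sparse double costs about 0.1 x 16 = 1.6 bytes against 1 byte for full int8, so the sparse version is larger. A sparse logical (about 0.9 bytes per element) is only marginally smaller, and only fits if the status is strictly on/off, which is an assumption.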
For datastore/tall arrays, do I still need to generate the chargestatusyear=zeros(5000000,8760,'single') data? From the help documents, it seems that you basically need a .csv (or text) file before you can use a datastore. So I am wondering how to save the big chargestatusyear.mat file before using the datastore.
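One option not mentioned in the thread is MATLAB's `matfile` function, which reads and writes parts of a variable in a v7.3 MAT-file, so chargestatusyear can be filled block by block without ever calling zeros(5000000,8760,...) in memory. This is a sketch with small stand-in sizes, a hypothetical file name and a random stand-in weekly pattern:

```matlab
% Sketch: fill a yearly matrix in a MAT-file one block of columns at a time
% with matfile, so the full array never has to exist in memory.
% Assumptions: small stand-in sizes; file name and weekly pattern hypothetical.
nVeh = 1000;  hoursPerWeek = 168;  nHours = 8760;
blockHours = 4 * hoursPerWeek;
chargestatusweek = single(rand(nVeh, hoursPerWeek) < 0.1);  % ~10% ones

fname = fullfile(tempdir, 'chargestatusyear.mat');
if exist(fname, 'file'), delete(fname); end
m = matfile(fname, 'Writable', true);  % a new file created this way is v7.3

m.chargestatusyear(nVeh, nHours) = single(0);  % create full-size variable on disk
for c0 = 1:blockHours:nHours
    cols = c0 : min(c0 + blockHours - 1, nHours);
    weekHour = mod(cols - 1, hoursPerWeek) + 1;
    m.chargestatusyear(1:nVeh, cols) = chargestatusweek(:, weekHour);
end

% Later, possibly in another script: read only the hours you need.
m2 = matfile(fname);
hour337 = m2.chargestatusyear(:, 337);
```

Only one block of columns is in memory during the write, and any later script can load a single hour or a range of hours. Writing column blocks to a 5000000-row variable will still be disk-bound, so the SSD advice above applies here too.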
