Alternative to csv and parquet for arrays
I have pretty massive csv files. It's a pain for both transfer and also in terms of read time.
I have been using parquet but this is only for tables and my functions only work with double arrays. So whenever I load the file, I have to use table2array to create the proper variables. This takes some extra time. Still much better than using csv, but I am wondering if there are any light and efficient alternatives to csv for arrays....
Comments: 14
Jan
13 Mar 2022
The question is not clear. Why do you use a text file to store large data sets? Text files are useful if they are read and edited by humans. The conversion of floating point numbers to strings and back to numbers can cause rounding effects. Therefore a binary format is recommended and is much more efficient.
How do you work with parquet in Matlab?
Pelajar UM
14 Mar 2022
Thanks guys.
@Jan The data is prepared using another program (written in Python). It is mesh data with various values assigned to each of the elements (all numerical values). The MATLAB code is used to visualize this data. The file does not have to be human readable (parquet is not human readable).
Example:
[filename, folder] = uigetfile({'*.parquet'});   % file picker filtered to .parquet files
if ~ischar(filename); return; end                % user cancelled
filename = fullfile(folder, filename);
tbl = parquetread(filename);                     % renamed from "input", which shadows a built-in function
app.UITable2.Data = table2array(tbl);
nodes = app.UITable2.Data(:,1:3);                % columns 1-3: node coordinates
elements = app.UITable2.Data(:,4:7);             % columns 4-7: node indices of each element
elements = rmmissing(elements);                  % drop rows that contain missing values
TR = triangulation(elements,nodes);              % generate the mesh
[F,P] = freeBoundary(TR);                        % extract the surface
@Walter Roberson It's written only once (in Python), and every time you open a new session of MATLAB, you have to load the data. So I would say reading time is more important than writing time.
Walter Roberson
14 Mar 2022
I suggest writing it as a binary file, with a header indicating the size. You might want to make it compatible with https://www.mathworks.com/help/matlab/ref/multibandread.html
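A minimal sketch of what such a file could look like, for illustration only. The layout (two uint32 values for rows and columns, followed by the doubles in column-major order), the little-endian byte order, and the temporary file name are assumptions, not something fixed by this thread. The write step is done in MATLAB here only to keep the example self-contained; a Python writer would have to emit the values in the same column-major order (with NumPy, which writes row-major by default, that typically means writing the transpose).

```matlab
% Assumed layout: [nRows nCols] as two uint32 values, then nRows*nCols
% doubles stored column by column (MATLAB's native order).
A = rand(5, 3);                        % stand-in for the real mesh data
fname = [tempname '.bin'];             % hypothetical file name

% Write step (in the thread this would happen on the Python side)
fid = fopen(fname, 'w', 'ieee-le');
fwrite(fid, size(A), 'uint32');        % header: number of rows, number of columns
fwrite(fid, A, 'double');              % payload, written column by column
fclose(fid);

% Read step
fid = fopen(fname, 'r', 'ieee-le');
sz = fread(fid, [1 2], 'uint32');      % read the header
B  = fread(fid, sz, 'double');         % read the payload straight into a double matrix
fclose(fid);
delete(fname);                         % remove the temporary file
```

Because fread returns a double matrix directly, no table2array step is needed after loading.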
Walter Roberson
14 Mar 2022
That hints to me that your data might perhaps only justify single precision but that you are using double precision.
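One way to judge whether single precision would be acceptable is to compare a converted copy against the original. The sketch below uses random stand-in data (an assumption; the real mesh values may behave differently) and reports the worst relative error and the memory saving.

```matlab
A = rand(1000, 23);                    % stand-in data, values in (0,1)
S = single(A);                         % same values stored as single precision

% Largest relative error introduced by the conversion
relErr = max(abs(double(S(:)) - A(:)) ./ abs(A(:)));

% Memory footprint of both variables, in bytes
infoA = whos('A');
infoS = whos('S');
fprintf('max relative error: %.3g\n', relErr);
fprintf('double: %d bytes, single: %d bytes\n', infoA.bytes, infoS.bytes);
```

If that error is tolerable for visualization, the file could store singles and the reader could still return doubles by passing 'single=>double' as the precision argument of fread.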
Sarah Gilmore
14 Mar 2022
Do you mind answering a few questions that may help narrow down the issue?
- Do you know if it's parquetread or table2array that's taking the most time? You can use the performance profiler to determine which lines are causing the issue.
- How wide is the table in the Parquet file? Is it just 7 columns?
- Which version of MATLAB are you running?
Thanks,
Sarah
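For a quick check without opening the profiler, the two steps can also be timed separately with tic/toc. This is only a sketch: the table is random stand-in data written to a temporary Parquet file, and its size and the file name are assumptions.

```matlab
T = array2table(rand(1000, 23));       % stand-in table of doubles
fname = [tempname '.parquet'];         % hypothetical file name
parquetwrite(fname, T);

tic; Tin = parquetread(fname); tRead = toc;   % time spent reading the file
tic; A = table2array(Tin);     tConv = toc;   % time spent converting table to array

fprintf('parquetread: %.4f s, table2array: %.4f s\n', tRead, tConv);
delete(fname);                         % remove the temporary file
```

Single tic/toc measurements are noisy, so it is worth repeating the run a few times before drawing conclusions.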
Pelajar UM
15 Mar 2022
Hi Sarah
- parquetread takes 0.11 s and table2array 0.034 s. No issues per se and relatively speaking, not long at all. But it is just an extra step and I was wondering if it could be avoided.
- This particular dataset is 23 columns wide with ~114,000 rows.
- R2021a
Walter Roberson
15 Mar 2022
If the original data is 23 columns but you only need 7 of them, then you can improve reading speed by writing in binary one column at a time with a header indicating how many rows are present; then by knowing the size of the header and the number of rows, you can fseek() directly to the beginning of any particular column. Or in your case, since you are using the first 7 columns, just ask to fread [nrow 7] after you have positioned past the header, leaving the other 16 columns unread.
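The partial read described above could look roughly like this. It is a sketch under assumptions: an 8-byte header holding [nRows nCols] as two uint32 values, doubles stored column by column, little-endian byte order, and a temporary file created here only so the example runs on its own.

```matlab
A = rand(6, 23);                       % stand-in for the 23-column data
fname = [tempname '.bin'];             % hypothetical file name

fid = fopen(fname, 'w', 'ieee-le');
fwrite(fid, size(A), 'uint32');        % 8-byte header: rows, columns
fwrite(fid, A, 'double');              % column 1, then column 2, and so on
fclose(fid);

headerBytes   = 8;                     % two uint32 values
bytesPerValue = 8;                     % one double

fid = fopen(fname, 'r', 'ieee-le');
sz = fread(fid, [1 2], 'uint32');
nRows = sz(1);

% First 7 columns: read directly after the header, leave the rest unread
first7 = fread(fid, [nRows 7], 'double');

% Any single column: jump to its byte offset from the beginning of the file
col = 12;
fseek(fid, headerBytes + (col - 1) * nRows * bytesPerValue, 'bof');
c12 = fread(fid, [nRows 1], 'double');

fclose(fid);
delete(fname);                         % remove the temporary file
```

The offset arithmetic only works because every column occupies the same number of bytes, which is why the row count has to be stored in the header.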
Walter Roberson
15 Mar 2022
If those file sizes are a problem, then have the Python code write each column into a separate binary file and zip the set of files together. Transfer the zip. Unzip at the destination. Open and read only the files corresponding to the columns you want to read.
Pelajar UM
15 Mar 2022
@Walter Roberson Thanks. I am actually using all 23 columns. That was just a small part of the code.
Walter Roberson
15 Mar 2022
OK, then write the file in binary and zip it to reduce file size for transfer.
Pelajar UM
15 Mar 2022
Zipping does a wonderful job reducing the size (down to 4 MB), but I don't think MATLAB can unzip the file or otherwise read the zipped file, right?
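For reference, MATLAB does ship with zip and unzip functions (and gunzip for .gz files), so the archive can be extracted from within the same script before reading. A sketch of the round trip, assuming the header layout used for illustration here (two uint32 values, then column-major doubles) and hypothetical file names:

```matlab
workDir = tempname;                    % scratch folder for the example
mkdir(workDir);
A = rand(4, 23);                       % stand-in data

% Create the binary file and zip it (in the thread, Python would do this part)
binFile = fullfile(workDir, 'mesh.bin');        % hypothetical name
fid = fopen(binFile, 'w', 'ieee-le');
fwrite(fid, size(A), 'uint32');
fwrite(fid, A, 'double');
fclose(fid);
zipFile = fullfile(workDir, 'mesh.zip');        % hypothetical name
zip(zipFile, 'mesh.bin', workDir);     % store mesh.bin relative to workDir

% Extract into a separate folder and read the extracted file
outDir = fullfile(workDir, 'extracted');
mkdir(outDir);
names = unzip(zipFile, outDir);        % cell array with the extracted file names
fid = fopen(names{1}, 'r', 'ieee-le');
sz = fread(fid, [1 2], 'uint32');
B  = fread(fid, sz, 'double');
fclose(fid);

rmdir(workDir, 's');                   % clean up the scratch folder
```

The extraction adds its own time to every load, so whether this beats reading the uncompressed file depends on how transfer time and load time trade off in practice.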
Answers (0)