how can I append a Parquet file?

조회 수: 12 (최근 30일)
8eodosis
8eodosis 2020년 5월 13일
답변: Kevin Gurney 2020년 9월 10일
Hello,
I have a Parquet file that I wish to append. I looked at the documentation of parquetwrite but doesnt provide any info on appending. It looks like this was an option in the old interface setting the option 'AppendData' to true:

채택된 답변

Kevin Gurney
Kevin Gurney 2020년 9월 10일
The version of parquetwrite introduced in R2019a does not currently support appending to preexisting Parquet files on disk.
The "AppendData" name-value pair that you referenced in the Parquet Support Package does not append to a preexisting file, but rather incrementally writes chunks of data to an open Parquet file output stream.
The Support Package uses a "stateful Writer object" in conjunction with multiple write() calls to achieve this. The Parquet file output stream is closed when a call to finish() is made.
There is currently no equivalent ParquetWriter object shipping in MATLAB.
----------
An alternative workflow to appending chunks of data to a preexisting Parquet file, would be to write out new Parquet files and then "emulate" the behavior of having one contiguous Parquet file using parquetDatastore.
If you write multiple Parquet files to disk in sequence (one for each chunk), which have consecutive numeric suffixes (e.g. data_01.parquet, data_02.parquet, ..., data_0N.parquet), you can use parquetDatastore to order these files as though they were one contiguous Parquet file. With this approach, you can call readall(parquetDatastore) to read the entire sequence of Parquet file "chunks" in one function call.
An example:
% Assuming the current directory contains data_01.parquet, data_02.parquet, ..., data_0N.parquet.
>> data = readall(parquetDatastore("data*.parquet"));

추가 답변 (0개)

카테고리

Help CenterFile Exchange에서 Data Import and Analysis에 대해 자세히 알아보기

태그

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by