How to dynamically name a large number of tall tables?

Question

Hayley Rogers 2022년 11월 1일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/1840983-how-to-dynamically-name-a-large-number-of-tall-tables

편집: Hayley Rogers 2022년 11월 2일

Before anyone jumps in and says it, I've read through all the responses that say why you shouldn't dynamically name tables in Matlab.

I have about 2,000 csv files, each with over 1m records, and I need to break out the records from each file into individuals tables based on their categorical name in a column of each csv file (there are about 6,000 different categorical names). I would just like to name the table based on the categorical name.

If I try breaking them up into structure fields, the structure quickly becomes overloaded, and it literally takes hours to load / save the structure, and ages to work with the structure, which is not realistic when I really just want to work with each category at a time.

I wanted to try to use tall tables, because once I organize these files, some tables should have a couple million rows, while others may have a couple thousand.

Please let me know if you have any suggestions on efficiently breaking up the data into individual tables and dynamically naming those tables, and I'm going to ignore any comments about how that route is just for beginners that don't know anything about coding.

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

David Hill 2022년 11월 1일

Are all the names of the csv files similar? Provide and example. Are you just using readtable() to import the csv files?

Hayley Rogers 2022년 11월 1일

options_20221031 Sample.csv

Yes, the csv files are all similar, and I've written code to import them into Matlab, but then I just need a way to break them up by ticker name.

Specifically, I have raw option data over the last 20 years, and I attached a sample of the csv field that includes just three underlying symbols and is just an end-of-day snapshot for 10/31/2022. However, each of these daily snapshots include options data from almost 6,000 tickers, and I have one of these daily raw files going back to 2000.

I want to create individual tables with the option history for each ticker, and name the table by the Underlying Symbol name.

This is also why I was thinking I would need to use tall tables, because like the raw option history for the SPY will be insanely big going back over 20 years....

Thank you!!

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

David Hill 2022년 11월 1일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/1840983-how-to-dynamically-name-a-large-number-of-tall-tables#answer_1088978

MATLAB Online에서 열기

This should read all tables one at a time, combine the same categorical names, and store new tables as the categorical names.

d=dir('*.csv');%assuming all files are together in the same folder with nothing else
s=size(d,1);
for k=1:s
    r=readtable(d(k).name);
    catnams=r.Properties.VariableNames;
    for m=1:length(catnams)
        try
           R=readtable([catnams{m},'.csv']);
           R=[R(:,1);r(:,m)];
           writetable(R,[catnams{m},'csv']);%not sure if you want to save as .mat files or .csv
        catch
           R=r(:,m);
           writetable(R,[catnams{m},'csv']);%not sure if you want to save as .mat files or .csv
        end
    end
end

댓글 수: 2
없음 표시없음 숨기기

Hayley Rogers 2022년 11월 1일

Thank you for this!! I tried this out on my similar daily stock data (much smaller) as a test, and it looks like your code turned raw files like this:

Into individual CSV files like this:

So the code created individual files for each column vector, aggregated for all symbols. I want something similar, but like aggregate all stock data (and eventually options data once I work out the code) for each symbol across all dates. So like I would have one file called A.csv and another called AA.csv, that would have all fields above (symbol, date, high, low, open, close, adjusted close, volume) for all dates, for a single ticker.

How do I modify your code to accomplish that?

Hayley Rogers 2022년 11월 2일

편집: Hayley Rogers 2022년 11월 2일

@David Hill - I rewrote your code and this worked for what I needed! Thank you!!!

cd 'D:\Historical Options and Stock Data\CSV\Stock'

opts1 = delimitedTextImportOptions("NumVariables", 8);

opts1.DataLines = [2, Inf];

opts1.Delimiter = ",";

opts1.VariableNames = ["UnderlyingSymbol", "DataDate", "Open", "High", "Low", "Close", "Volume", "Adj_Close"];

opts1.VariableTypes = ["categorical", "datetime", "single", "single", "single", "single", "single", "single"];

opts1.ExtraColumnsRule = "ignore";

opts1.EmptyLineRule = "read";

opts1 = setvaropts(opts1, "UnderlyingSymbol", "EmptyFieldRule", "auto");

opts1 = setvaropts(opts1, "DataDate", "InputFormat", "MM/dd/yyyy");

stock_folder=dir('*.csv');%assuming all files are together in the same folder with nothing else

stock_folder_size=size(stock_folder,1);

for k=1:stock_folder_size

tic

table_daily=readtable(stock_folder(k).name,opts1);

catnames= unique(table_daily(:,'UnderlyingSymbol'),'sorted');

parfor m=1:height(catnames)

table_symbol_rows = (table_daily.UnderlyingSymbol == catnames{m,1});

table_symbol_new = table_daily(table_symbol_rows,:);

filename_symbol = sprintf('%s_raw_stock.csv',catnames{m,1});

filename_symbol_mat = sprintf('%s_raw_stock',catnames{m,1});

try

table_symbol=OVM_Read_Table(filename_symbol,opts1);

table_symbol=vertcat(table_symbol,table_symbol_new);

OVM_Write_Table(table_symbol,filename_symbol);

catch

table_symbol=table_symbol_new;

OVM_Write_Table(table_symbol,filename_symbol);

end

toc

end

댓글을 달려면 로그인하십시오.

How to dynamically name a large number of tall tables?

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

채택된 답변

댓글 수: 2
없음 표시없음 숨기기

추가 답변 (0개)

참고 항목

카테고리

태그

Community Treasure Hunt

How to dynamically name a large number of tall tables?

댓글 수: 3 이전 댓글 1개 표시이전 댓글 1개 숨기기

채택된 답변

댓글 수: 2 없음 표시없음 숨기기

추가 답변 (0개)

참고 항목

카테고리

태그

Community Treasure Hunt

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

댓글 수: 2
없음 표시없음 숨기기