Populating matrix with data from a table

조회 수: 5 (최근 30일)
Mihai Milea
Mihai Milea 2020년 11월 27일
댓글: dpb 2020년 11월 30일
Hello .Sorry for this questions but I am not very familiar with the table function.I am attaching a csv file .I am reading this file which basically contains some stock prices for some days .I am trying to populate a matrix with the data .For some reason it looks really cumbersome to use the table function but it might be my stupidity.Before I was using textread,except I do not know in general how many names I have so is hard (or impossible) to do that.I have read the documentation for table but nothing seems to work!
so I do something like this
opts=detectImportOptions('ii.csv');
opts.VariableNamingRule = "preserve";
P=readtable('ii.csv',opts);
nDates=length(P(:,1);
nBlIds=length(P(1,:);
closes=nan*ones(nDates,nBlIds);
% here I write the date in different way
trDates=P(:,1);tradingDates=table2array(trDates);
formatOut='yyyymmdd';
Dates=str2num(datestr(tradingDates,formatOut));
No clue how to populate the matrix using P.... All my attempts failed .Maybe is hard maybe is not but if not so obvious I fail to understand the purpose of this function if you cannot create a simple numerical matrix .Maybe useful for complicated files with weird types of variables .But in this case I just need to put the data(numerical) in some matrix .In Perl or Python this is a breeze ,unfortunately I need to use it with Matlab in this context .Any suggestion ,maybe using some other function will save me a lot of time
  댓글 수: 9
Mihai Milea
Mihai Milea 2020년 11월 29일
편집: dpb 2020년 11월 30일
Thank you so much for the answers .I did this way ,not very elegant but suggested by these answers
Table=readtable('ii.csv',opts);
nDates=size(Table,1);
nIds=size(Table,2);
%% Convert to doubles
for i =1:nDates
for j=1:nIds
if isa(Table{i,j},'double')
closes(i,j)=Table{i,j};
end;
end;
end;
dpb
dpb 2020년 11월 30일
Almost certainly didn't need to have done that.
You don't show what your next step is going to be, but to duplicate all the data you already have and to do so in a tight double loop is about as inefficient a use of MATLAB as is possible.

댓글을 달려면 로그인하십시오.

채택된 답변

dpb
dpb 2020년 11월 28일
편집: dpb 2020년 11월 28일
Since have expanded somewhat, will add additional Answer to bring to forefront -- and illustrate couple other things...
Above we ended up with something like --
opt=detectImportOptions('ii.csv'); % create base import options object
opt.VariableTypes=strrep(opt.VariableTypes,'char','double'); % all are double except time
opt.PreserveVariableNames='true'; % and keep original symbol/name
opt.VariableNames(1)={'Date'}; % fix Date column name
opt=setvaropts(opt,'Date',"DatetimeFormat",'dd-MMM-uuuu'); % and a more concise display format
Now, since it is a set of time data, use the timetable object instead of just the straight table -- this gets you the date info essentially "for free" in conjunction with the data selection.
ttII=readtimetable('ii.csv',opt); % 't' or 'tt' prefix to remind of class
Now you can do addressing however wish for whatever purpose -- remember that "()" addressing on the table will return another table but "{}" or "dot" addressing the data.
With a plain table, though, the Date and data aren't the same type/class so you can't concatenate them--that's the advantage of the timetable; the date comes along for the ride...
So you can do things like--
>> ix=randperm(width(ttII),10) % pick a random set of columns/symbols
ix =
512.00 1168.00 899.00 177.00 286.00 463.00 739.00 975.00 159.00 734.00
>> format bank, format compact % since are stock prices; more legible display format
>> ttII(:,ix) % retrieve those -- note result is another table w/ Date
ans =
3×10 timetable
Date AGGLN EMGSNO SONIPL CBGLN ROGSW JPRLN FWBIM PFCLN NG_LN ATLIM
___________ _____ ______ ______ _____ ______ _____ _____ _____ _____ _____
14-Feb-2000 0.50 NaN NaN 4.25 103.46 25.31 NaN NaN 1.69 3.22
15-Feb-2000 0.50 NaN NaN 4.15 102.14 25.55 NaN NaN 1.66 3.24
16-Feb-2000 0.49 NaN NaN 4.05 102.62 25.63 NaN NaN 1.72 3.33
>> ttII{:,ix} % retrieve as array -- free of the date
ans =
0.50 NaN NaN 4.25 103.46 25.31 NaN NaN 1.69 3.22
0.50 NaN NaN 4.15 102.14 25.55 NaN NaN 1.66 3.24
0.49 NaN NaN 4.05 102.62 25.63 NaN NaN 1.72 3.33
>>
Read up on tables and timetables -- there is a lot of processing power packaged to go with them -- particularly varfun, rowfun and friends as well as time selection with timerange
Don't sell MATLAB short...takes some time and effort to learn about syntax, but it's extremely powerful when do.
ADDENDUM:
However, you'll undoubtedly run into memory and perhaps performance issues if you try to load years of data for this many variables. I suggest to make the selection of variables needed at the time you read the data and only read in those really need.
You can do this by manipulating the 'SelectedVariableNames' property in the import options object. You could have ended up with the above table by
opt.SelectedVariableNames=opt.SelectedVariableNames(ix);
to have gotten the randomized selection or by any other chosen pattern.
  댓글 수: 1
dpb
dpb 2020년 11월 29일
" the Date and data aren't the same type/class so you can't concatenate them-"
Which is the thing then that for the most part you really won't want or need to extract the data into an array, anyway--use the tools for tables instead.

댓글을 달려면 로그인하십시오.

추가 답변 (1개)

dpb
dpb 2020년 11월 27일
WOWSERS!!! You've got something like 1500 variables!!!??? That's definitely hard to deal with by variable name no matter what the language.
But, it's trivial to get the data from the table--
tII=readtable('ii.csv'); % read the file
tII.Properties.VariableNames(1)={'Date'}; % make convenient date variable name
After this, simply
>> tII.SXXP
ans =
232.2856
227.4823
230.4354
>>
Returns the data from the first variable; not complicated at all.
You can also defeference by subscript; note that in that case use {} instead of () to return the data as native type; the parentheses will return a table object containing the referenced variable(s).
>> tII{:,2}
ans =
232.2856
227.4823
230.4354
>>
Given the time nature of the data, I'd suggest probably a timetable is more appropriate.
See the doc for how to reference data from table object -- there's a veritable plethora of options available depending on need--see <Access-data-in-a-table> for the details.
  댓글 수: 2
Mihai Milea
Mihai Milea 2020년 11월 27일
Yes I read that carefully -:)).And yet all I tried gives some sort of errors . For example : Say I have a vector of indices (not all 1500) baptized v. With your example tII(:,v') or tII{:,v'} cannot be used (or I do not know how) to populate that matrix closes .I get all sort of errors :cell cannot be converted to double,etc
dpb
dpb 2020년 11월 28일
You didn't convert the input to double for all the missing data that was interpreted as 'char' data where there are missing data.
Use the import options object to force that to happen...
>> opt=detectImportOptions('ii.csv');
>> unique(opt.VariableTypes)
ans =
1×3 cell array
{'char'} {'datetime'} {'double'}
shows that some values will be 'char' because there weren't enough samples in the input file for it to be able to determine really was supposed to be numeric. So, since know what want/need--
>> opt.VariableTypes=strrep(opt.VariableTypes,'char','double');
>> unique(opt.VariableTypes)
ans =
1×2 cell array
{'datetime'} {'double'}
>>
Set all of those to also be 'double' before reading...then, I didn't look at the input file to see just why but noticed that the date column title wasn't recognized so may as well fix it now, too...
>> opt.VariableNames(1)={'Date'};
Then,
tII=readtable('ii.csv',opt);
and all your data will be numeric and no cell arrays be left.
It's tough to visualize such large datasets that aren't possible to look at on screen or even in many editors. Is there really a need for so many variables all at one time?
I also didn't look at the variable names in the file -- you'll have to choose which way to treat those -- if you want to use named variables for selection of a specific (set of) symbols, you may want to do as you did earlier and preserve those names despite them not being valid MATLAB names and perhaps harder to type. If not, you have potential to not have easy way to know what MATLAB named them as...again, all depends on just how you intend to/need to address so many variables.
You can create dynamic variable names from the string name property programmatically may be one way or build pick lists and search for where certain names are to return those.
However you choose to go, having 1500 variables is a challenge no matter what system you're using for anything other than fully automated analyses--just not feasible to do by hand.

댓글을 달려면 로그인하십시오.

카테고리

Help CenterFile Exchange에서 Whos에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by