MATLAB Answers

2

Difference between readmatrix() and readtable()

Asked by Robert Kugler on 24 Apr 2019
Latest activity Commented on by dpb
on 6 May 2019
Accepted Answer by dpb
Hi community,
After reading that xlsread() is not recommended anymore, I tried to get my excel table read through the readmatrix() function. However, I don't seem to get my excel file read in the correct way. The output that readmatrix gives me is never what what I can read in the excel file. I assume this has something to do with the coding, so I tried to get the output right through the "OutputType" option, but this did not yield satisfactory results either.
Later then, I found the readtable() function, and this seems to give me what I want. However, I still need to understand a bit more why it does so, and that brings me to my basic question:
I read the documentations on both functions and from what I understand the most basic difference between them seems to be that readmatrix() returns an array and readtable() returns a table. But why does this cause me trouble in getting the data from the excel file returned correctly/the way I need it when I use readmatrix()? What is the underlying difference in how Matlab handles the data in this case? And, if readtable() actually turns out to the function that is better suited in this case, for what prupose would readmatrix() be better suited then? Could you give an example?
Note on the data I am intending to import from the excel file: it is a relatively simple database-like structured sheet where each row is an item and each column represents a propoerty of that item (e.g.: item may be a certain material, and the colums can be "price", "weight", etc.). The data contained in the property colums can be of both text or numeric type, or empty.
Thanks in advance for your kind support!

  4 Comments

readtable should be used if the data in your file is most correctly represented as a table in MATLAB. I.e. there are multiple rows of data where each has multiple heterogeneous fields, but each of those fields has a consistent datatype.
readmatrix should be used if all the data is of the same type.
readcell is meant for the case where the data in the file is completly mixed. It will bring data in as text if it cannot otherwise be converted to a number, datetime, or duration.
Also, if you're having trouble with getting the import options for a particualr file, the Import Tool gui will generate MATLAB code (I think as of 18b for spreadsheet, 19a for delimted text files) which will construct the import options in a script or function and let you preview and customize the import.
Hi Jeremy, thanks for the additional explanation! This helps a lot
dpb
on 6 May 2019
If there are some user tools for defining and then cleaning up import option objects, that's a big step, potentially...(I'm stuck at R2017b for the time being for other reasons so can't go investigate just now).
While it's doable as is, the interface to the object through the supplied functions is very painful to deal with for complex files with large numbers of variables that may have issues such as outlined earlier (been there, done that number of times on the forum). Being able to make such changes interactively to specific variables would be great.
The same caveat as above must still be observed, however, and isn't documented nor discussed at all that I am aware of -- once one does this, one MUST retain that import object for use with all files of the type; rescanning another file may break the association thus so carefully and painfully created.

Sign in to comment.

Products


Release

R2019a

1 Answers

Answer by dpb
on 24 Apr 2019
Edited by dpb
on 26 Apr 2019
 Accepted Answer

It appears readmatrix is limited to returning one type of data in the output array as the 'OutputType' named parameter is limited to a scalar string/cell string. Therefore, there's no way to tell it to return the various types of data that may be in different columns in a spreadsheet(*).
readtable on the other hand, can and does support variables of different type by column and so can handle a spreadsheet similar to your description and is certainly the best option for that type of a spreadsheet when you want all the data and not just a subset of a given data type. A table, however, is limited to have the same number of rows in every variable so if the spreadsheet isn't perfectly regular, it will fill in a bunch of missing values down to the furtherest extent in the spreadsheet unless you limit the range. Sometimes this is ok, other times it's a pain.
xlsread is slow but has the facility to return all of what is in the spreadsheet; it does so by returning numeric and text fields separately, however, whereas a table contains them in one data structure retaining their type (with some help, sometimes from an import object with, particularly, dates). If you want the full spreadsheet unadulterated, the last optional raw output array is a cell array of the whole spreadsheet from which one can retrieve any piece of information from any row/col cell regardless of its type or whether it is consistent with the type of those cells around it. Sometimes, this feature is also invaluable. xlsread is the one option that existed before the introduction of any of the more general data classes except for cell arrays which is why it returns separate variables; there was no other option at the time.
The doc implies readmatrix will return a table if one passes the optional opts structure but I see little reason then for it in lieu of readtable which also can use an opts object.
Seems to me TMW is continuing to just add stuff just to be adding it without any clearly defined (to the user base, anyway) direction of where they're headed and why they don't just improve existing functions instead of overloading namespace and bloating code size with multiple essentially duplicated functionalities.
(*) And, of course, for an array other than a cell array, there can only be one data class for the array, anyway. Without the importoptions optional input, readmatrix just punts and returns only what numeric data it can find without offering the option of returning the text data or the combined that does xlsread. This, somehow, seems a step backwards and why not just rewrite xlsread if want to improve it? Instead, for historical reasons, it can never be actually removed and we've just got yet another eventually more or less orphaned function hanging around like textread (which, incidentally, still has some features that make it more useful on occasion than the favored textscan).

  1 Comments

I think what you want is READCELL.

Sign in to comment.



Translated by