Main Content

Read Spreadsheet Data into Table

The best way to represent spreadsheet data in MATLAB® is in a table, which can store a mix of numeric and text data, as well as variable and row names. You can read data into tables interactively or programmatically. To interactively select data, click Import Data on the Home tab, in the Variable section. To programmatically import data, use one of these functions:

  • readtable — Read a single worksheet.

  • spreadsheetDatastore — Read multiple worksheets or files.

This example shows how to import spreadsheet data programmatically using both functions. The sample data, airlinesmall_subset.xlsx, contains one sheet for each year between 1996 and 2008. The sheet names correspond to the year, such as 2003.

Read All Data from Worksheet

Call readtable to read all the data in the worksheet called 2008, and then display only the first 10 rows and columns. Specify the worksheet name using the Sheet name-value pair argument. If your data is on the first worksheet in the file, you do not need to specify Sheet.

T = readtable('airlinesmall_subset.xlsx','Sheet','2008');
T(1:10,1:10)
ans=10×10 table
    Year    Month    DayofMonth    DayOfWeek    DepTime    CRSDepTime    ArrTime    CRSArrTime    UniqueCarrier    FlightNum
    ____    _____    __________    _________    _______    __________    _______    __________    _____________    _________

    2008      1           3            4         1012         1010        1136         1135          {'WN'}           752   
    2008      1           4            5         1303         1300        1411         1415          {'WN'}          1161   
    2008      1           6            7         2134         2115        2242         2220          {'WN'}          1830   
    2008      1           7            1         1734         1655          54           30          {'WN'}           302   
    2008      1           8            2         1750         1755        2018         2035          {'WN'}          1305   
    2008      1           9            3          640          645         855          905          {'WN'}           896   
    2008      1          10            4         1943         1945        2039         2040          {'WN'}           120   
    2008      1          11            5         1303         1305        1401         1400          {'WN'}          1685   
    2008      1          13            7         1226         1230        1415         1400          {'WN'}          1118   
    2008      1          14            1         1337         1340        1623         1630          {'WN'}           730   

Read Selected Range from Specific Worksheet

From the worksheet named 1996, read only 10 rows of data from the first 5 columns by specifying a range, 'A1:E11'. The readtable function returns a 10-by-5 table.

T_selected = readtable('airlinesmall_subset.xlsx','Sheet','1996','Range','A1:E11')
T_selected=10×5 table
    Year    Month    DayofMonth    DayOfWeek    DepTime
    ____    _____    __________    _________    _______

    1996      1          18            4         2117  
    1996      1          12            5         1252  
    1996      1          16            2         1441  
    1996      1           1            1         2258  
    1996      1           4            4         1814  
    1996      1          31            3         1822  
    1996      1          18            4          729  
    1996      1          26            5         1704  
    1996      1          11            4         1858  
    1996      1           7            7         2100  

Convert Variables to Datetimes, Durations, or Categoricals

During the import process, readtable automatically detects the data types of the variables. However, if your data contains nonstandard dates, durations, or repeated labels, then you can convert those variables to their correct data types. Converting variables to their correct data types lets you perform efficient computations and comparisons and improves memory usage. For instance, represent the variables Year, Month, and DayofMonth as one datetime variable, the UniqueCarrier as categorical, and ArrDelay as duration in minutes.

data = T(:,{'Year','Month','DayofMonth','UniqueCarrier','ArrDelay'});
data.Date = datetime(data.Year,data.Month,data.DayofMonth);
data.UniqueCarrier = categorical(data.UniqueCarrier);
data.ArrDelay = minutes(data.ArrDelay);

Find the day of the year with the longest delay, and then display the date.

ind = find(data.ArrDelay == max(data.ArrDelay));
data.Date(ind)
ans = datetime
   07-Apr-2008

Read All Worksheets from Spreadsheet File

A datastore is useful for processing arbitrarily large amounts of data that are spread across multiple worksheets or multiple spreadsheet files. You can perform data import and data processing through the datastore.

Create a datastore from the collection of worksheets in airlinesmall_subset.xlsx, select the variables to import, and then preview the data.

ds = spreadsheetDatastore('airlinesmall_subset.xlsx');
ds.SelectedVariableNames = {'Year','Month','DayofMonth','UniqueCarrier','ArrDelay'};
preview(ds)
ans=8×5 table
    Year    Month    DayofMonth    UniqueCarrier    ArrDelay
    ____    _____    __________    _____________    ________

    1996      1          18           {'HP'}            6   
    1996      1          12           {'HP'}           11   
    1996      1          16           {'HP'}          -13   
    1996      1           1           {'HP'}            1   
    1996      1           4           {'US'}           -9   
    1996      1          31           {'US'}            9   
    1996      1          18           {'US'}           -2   
    1996      1          26           {'NW'}          -10   

Before importing data, you can specify what data types to use. For this example, import UniqueCarrier as a categorical variable.

 ds.SelectedVariableTypes(4) = {'categorical'};

Import data using the readall or read functions. The readall function requires that all the data fit into memory, which is true for the sample data. After the import, compute the maximum arrival delay for this dataset.

alldata = readall(ds);
max(alldata.ArrDelay)/60
ans = 
15.2333

For large data sets, import portions of the file using the read function. For more information, see Read Collection or Sequence of Spreadsheet Files.

See Also

|

Related Topics