MATLAB Answers

Help creating a homogenous array when one or more columns has data type 'cell'

조회 수: 1(최근 30일)
Alex Travis
Alex Travis 25 Aug 2020
편집: dpb 27 Aug 2020
I have a data set in table format in which all columns have a single entry (type double) except for a few columns that have type cell (but still only a single entry).
I'm trying to manipulate this data, but before I do I need to reformat the 'cell' columns from type cell into type double to match the rest of the data. This process needs to be generic enough to convert the whole table into the same data format no matter which columns are which data type.
I was thinking about doing some loop like:
for i = 1:size(table)
column(i) = 'thing that converts a vector of cells into same size vector of doubles'
Any help is appreciated

  댓글 수: 0

댓글을 달려면 로그인하십시오.


dpb 25 Aug 2020
I'm guessing here, but try
for i=1:width(t)
if iscell(t{:,i})
where t is your table variable.

  댓글 수: 5

표시 이전 댓글 수: 2
dpb 25 Aug 2020
Well, as said, it was a pure guess with no details to work with...we need to see the actual data structure that causes the problem so can see what it is that are trying to address. Attach a (smallish) .mat file with the data or at least a small text example pasted from command window...a whos on the variable name never hurts.
The code above presumes a table with columns of doubles and possibly columns of doubles in a cell will convert the cell array to a double array and place back into the table with the existing variable name.
Given an initial table that looks like
>> head(t)
ans =
8×2 table
ix ans
__ ______
1 {[ 1]}
2 {[ 2]}
4 {[ 4]}
5 {[ 5]}
8 {[ 8]}
9 {[ 9]}
10 {[10]}
12 {[12]}
the above code operatiing over t produces--
>> head(t)
ans =
8×2 table
ix ans
__ ___
1 1
2 2
4 4
5 5
8 8
9 9
10 10
12 12
That's what I guessed you had the form of from the description...if I erred in that guess, then it's the right solution; wrong problem.
I'd question why not fix the problem on the creation side of things instead, though -- what caused the mismatch in the beginng? Let's attack the problem there as being more effective solution.
Alex Travis
Alex Travis 26 Aug 2020
Hey, thanks for the reply. So the CSV files I'm trying to import are autogenerated by another program. Basically the CSVs are a giant matrix of numbers, but occasionally one of the cells has a word in it (see example below). When a single cell has a word, it seems that 'readtable' turns the whole column into data type 'cell'.
Here's the good news, I have code further down stream that eliminates these cells with words from my data set (as I only really need a subset of this data that will never have words in it. However, some of the data I do need has already been turned into cells at this point. Also, I want to have the entire data set for other purposes.
Example table:
Thanks again for your help
dpb 26 Aug 2020
"When a single cell has a word, ... 'readtable' turns the whole column into data type 'cell'." really has no choice; a column has to be a homogenous type and since the string can't be converted, the option would be to not return the column at all or return it but as a cell array.
However, you should be in luck -- use the import options object with the 'ImportErrorRule','fill' and the 'FillValue','NaN'. This is the default combination but you have to use the import options object to bring it into play, readtable on its own doesn't go the extra mile.
I just happened to have a suitable test file on hand from an Answers Q? of days of yore:
>> type ziad.txt
1 3 5 6 temp 56 d 3
4 f 45 5 6 H 4 5
has some character data scattered around amongst the numeric values as does your file..
>> t
t =
2×8 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8
____ ____ ____ ____ ____ ____ ____ ____
1 3 5 6 NaN 56 NaN 3
4 NaN 45 5 6 NaN 4 5
where the string data locations are now NaN but all variables are numeric.
Moral: It pays to read the See Also section of the documentation! :)
ADDENDUM: Depending on the location and number of strings in the input file, it is possible that detectImportOptions will decide a given column is character data instead of numeric. This can/will happen if the first several records for the variable happen to be character data instead of numeric.
So, for generic solution, you will want to ensure the identified variable types are double for all columns in the import options object.
Also NB: you can create one of these import objects that matches the file format and save it as a .mat file and reload/reuse it for any file of the will need to match the number of columns so if you have files with differing numbers of of variables, you'll need one consistent with the file. But, you don't have to call and cleanup the object for every file if you have a series of similar files.

댓글을 달려면 로그인하십시오.





Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by