For consistency with "nan", wouldn't it be nice to be able to issue "missing(3)"?
<missing> is the string counterpart to NaN. One can define, say, a 3x3 array of NaNs. Each NaN can then be replaced as the data is generated during the analysis, which is handy if the data isn't all generated at once. At no point is there any confusion between NaN and a valid zero datum.
It would be nice to be able to do this for strings. Of course, one can issue "string(nan(3))" or "repmat(missing,3,3)". But as code gets more intricate, simplicity becomes more valuable.
Comments: 11
Bruno Luong
26 Apr 2022
Edited: Bruno Luong
26 Apr 2022
No, please. Object-oriented types such as string and missing are slow and, IMO, do not add anything useful for serious programming. Let's stay with arrays of unambiguous IEEE numbers and NaN. Please don't mix the two worlds.
FM
26 Apr 2022
The two worlds don't have to mix. A huge amount of human effort goes into preparing data for computationally intensive analysis. Strings make MATLAB more suitable for data wrangling. The necessary conversions can then be done to make the data faster to manipulate in the computationally intensive parts of the analysis.
Rik
26 Apr 2022
I'm personally not a fan of strings. Maybe I don't use them enough.
I simply don't see the benefit over a cellstr (a cell vector of char vectors). There is a lot of syntactic sugar, but nothing fundamental. You can't do {'x'}+1 (while you can do "x"+1), but I don't see how that reinforces your point.
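To make the comparison concrete, here is a minimal sketch of the difference, assuming a release with double-quoted string literals (R2017a or later). The variable names are only illustrative.

```matlab
% Appending a number to a string scalar works through the + operator.
s = "x" + 1;                    % string "x1"

% The same operator is not defined for a cell array of char vectors.
try
    tmp = {'x'} + 1;            % expected to throw an error
    cellPlusWorked = true;
catch err
    cellPlusWorked = false;
    disp(err.message)
end

% The char-based equivalent needs an explicit conversion.
charResult = ['x', num2str(1)]; % char vector 'x1'
```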
Bruno Luong
26 Apr 2022
But when you do "x" + pi, you cannot control the number of digits, the length, or the format, so who really wants that?
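For what it's worth, the format can still be controlled by converting the number explicitly before concatenating. A small sketch, assuming compose (R2016b or later) and sprintf called with a string format:

```matlab
a = "x" + pi;                    % default number-to-text conversion
b = "x" + compose("%.8f", pi);   % eight decimals, chosen explicitly
c = "x" + sprintf("%.2f", pi);   % two decimals, chosen explicitly
```

Here b is "x3.14159265" and c is "x3.14", so the + shorthand and explicit formatting can be combined when the default is not wanted.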
FM
26 Apr 2022
Edited: FM
26 Apr 2022
At least for me, syntactic sugar has a very real impact when it comes to quickly and consistently modifying code during exploratory analysis. I'm using exactly the same equivalence-checking syntax as for numbers. When iterating over a horizontal vector of strings, I don't need to unpack the string from a cell. Within general-purpose utility functions, I don't need to test a variable to see if it needs special handling. The last thing I need is for a function written months or years ago to break because the code didn't account for the special syntax of strings. I then have to rip my mind away from what it was doing and descend into the minutiae of that code, hoping that I don't break anything else by modifying it.
Just to bring the discussion back on course, this question isn't about whether strings should be supported. It is about making the behaviour cleaner and more consistent with numbers.
But in response to the question about x+pi, you would probably instead do disp("The matching records in JoinTable = "+sum(BinaryMaskingVector)). It's more readable than the traditional fprintf, though nothing prevents you from resorting to the latter when it is more advantageous.
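A short sketch of the two styles side by side. The mask values are made up purely for illustration.

```matlab
% Hypothetical logical mask standing in for a real filter result
BinaryMaskingVector = logical([1 0 1 1 0]);

% String concatenation style
msg = "The matching records in JoinTable = " + sum(BinaryMaskingVector);
disp(msg)

% Traditional fprintf style, for when explicit formatting is wanted
fprintf('The matching records in JoinTable = %d\n', sum(BinaryMaskingVector));
```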
Rik
26 Apr 2022
The better readability is in the eye of the beholder.
I don't really see what you mean with a special syntax. If anything, using strings instead of char (or cellstr) is the more special syntax. In almost every case my functions will convert a string to a cellstr, after which I can use code that works on every release I managed to get to start.
I don't mean you should never use strings. What I do mean is that I view strings as being on the same level as scripts: fine for debugging, but any serious analysis should happen in a function.
FM
26 Apr 2022
As I said, exceptional handling means having to remove the cell wrapper, which is different from any numeric data. The same goes for equivalence checking. Consistency also means the code looks the same as with numerics, so that when you're filtering based on different combinations of conditions, you can quickly scan all your logic and see what you need to change. That is especially important in exploratory analysis, where change is constant.
Jan
26 Apr 2022
@Rik: I think cell strings would have been fine, but it was a mess that their elements could be CHAR matrices. As long as every string-handling function had to catch the exceptional case of CHAR matrices, the mess was too expensive. I've written dozens of functions for "string" handling (in the sense of character vectors) as MATLAB and C-MEX code, and they simply ignore 2-D char arrays.
A benefit of strings (as a class) is the improved memory consumption. In a cell array, each element has an overhead of about 100 bytes. For a set of 50 million gene sequences this matters.
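One way to check the memory footprint on a given installation is to compare whos output for the same data stored both ways. This is only a sketch: the sequence text and the element count are made up, and the byte counts depend on the release and platform, so no figures are quoted here.

```matlab
n = 1e5;                                 % illustrative element count
asString = repmat("ACGTACGT", n, 1);     % n-by-1 string array
asCell   = cellstr(asString);            % same data as a cell array of char vectors

infoS = whos('asString');
infoC = whos('asCell');
fprintf('string array: %d bytes\ncell array:   %d bytes\n', infoS.bytes, infoC.bytes);
```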
It would be nice to have a string class using 8-bit ASCII only.
But coming back to the question of the OP: missing(2,3, 'string') looks fine to create a [2 x 3] string array containing undefined strings.
Paul
26 Apr 2022
I did not realize that missing is a class:
m = missing;
whos m
Name Size Bytes Class Attributes
m 1x1 0 missing
class(m)
ans = 'missing'
However, it's not listed as one of the Fundamental Classes. Are there other types of built-in classes besides fundamental classes? Also, that doc page states that there are 16 fundamental classes, but the table contains 18.
FM
27 Apr 2022
@Jan: In 2019a, missing(2,3) and missing(2,3,'string') are not recognized. I'm still waiting to upgrade. The command "doc missing" yields a very sparse documentation page....
Jan
27 Apr 2022
@FM: In R2022a this syntax does not work either. You can try it here in the forum's interpreter:
missing(2,3,'string')
Error using missing
Too many input arguments.
So this is a useful enhancement request. Please use the link on the bottom of this page to contact the MathWorks team and suggest this improvement.
Answers (1)
Bruno Luong
27 Apr 2022
Why not define your own function?
mymissing
ans = missing
<missing>
mymissing(3)
ans = 3×3 missing array
<missing> <missing> <missing>
<missing> <missing> <missing>
<missing> <missing> <missing>
mymissing(2,3)
ans = 2×3 missing array
<missing> <missing> <missing>
<missing> <missing> <missing>
mymissing(2,'string')
ans = 2×2 string array
<missing> <missing>
<missing> <missing>
mymissing(2,3,'double')
ans = 2×3
NaN NaN NaN
NaN NaN NaN
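function x = mymissing(varargin)
% x = mymissing(size)
% x = mymissing(n1, n2, ...)
% x = mymissing(..., class)
x = missing;
if ~isempty(varargin)
    if ischar(varargin{end}) || isstring(varargin{end})
        % Last argument names the output class, e.g. 'string' or 'double'
        sz = [varargin{1:end-1}];
        if isempty(sz)
            sz = 1;
        end
        cls = varargin{end};
        % Convert the scalar missing to the requested class, then tile it
        x = repmat(feval(cls,x), sz);
    else
        % No class given: return an array of class missing
        sz = [varargin{1:end}];
        x = repmat(x, sz);
    end
end
end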
Comments: 12
FM
27 Apr 2022
Edited: FM
27 Apr 2022
Thanks, Bruno.
I know what I'm going to say is obvious, but a simple way would be string(nan(3)) or repmat(missing,3,2). The functionality wasn't the point so much as consistency. We have a thing called "missing", and having it behave in the same way as NaN (or zeros or ones) would streamline the language.
It wasn't until Jan posted the "missing" constructor that I realized it has become more complicated after 2019a. The doc page for 2019a says that "missing" is the string counterpart to NaN for doubles, but if missing(2,3,'string') works for later versions of Matlab, then it has become a more general generator of indicators of absent data for more than just strings.
My 2019a doesn't have such constructor behaviour, so I can't play with it and scope out what the behaviour is. I thought the constructor returned a string (in the same way that NaN is considered to be a double), but even on my 2019a, class(missing) returns 'missing'. It is its own special class with its own behaviour.
For example, x=repmat(missing,3,3);x(2,2)=pi causes an error converting pi to "missing", but x=ones(3);x(2,2)=missing is fine.
Because of its exceptional behaviour, I'm not sure if it makes sense to allow missing(3) as opposed to string(nan(3)).
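The asymmetry described above can be reproduced with a short script. This is a sketch of the behaviour reported in this thread, with the failing assignment wrapped in try/catch so the script runs to completion.

```matlab
% A pure missing array does not accept a numeric value.
x = repmat(missing, 3, 3);
try
    x(2,2) = pi;                 % reported above to raise a conversion error
    assignedIntoMissing = true;
catch err
    assignedIntoMissing = false;
    disp(err.message)
end

% A double array accepts missing, which is stored as NaN.
y = ones(3);
y(2,2) = missing;

% A string array of <missing> values can be filled in later, like nan(3).
z = string(nan(3));
z(2,2) = "dog";
```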
Paul
27 Apr 2022
There is no missing constructor:
missing(2,3,'string')
Error using missing
Too many input arguments.
FM
27 Apr 2022
Oh. OK, so my 2019a isn't so obsolete after all. But given that missing is its own class, and x=repmat(missing,3,3) creates an array that you can't update with non-missing values, it no longer seems to make sense to make it behave like NaN. At least for x=nan(3), you fill in the missing data using (say) x(2,2)=pi.
If TMW changes the behaviour of missing to be more consistent with NaN, then it might make sense to introduce constructor behaviour that returns an array (like NaN, zeros, and ones). Currently, it seems to be a meta-type that needs to be augmented with a more concrete type, e.g., string(nan(3)) yields an array of strings that show up as "missing". If strings are the only class for which missing serves to indicate absent data, then there seems to be no reason why it shouldn't be made to behave like NaN.
Paul
27 Apr 2022
No disagreement here. WRT "If TMW changes the behaviour" ... does the behaviour in question mean being able to assign into an array? If so, keep in mind that NaN is not its own class, and that a line like this
x = nan(3,3);
is actually a function call
which nan(3,3)
built-in (/MATLAB/toolbox/matlab/elmat/nan)
So even though the elements of x have nan values, x is still a double.
In addition to the missing constructor
which missing
/MATLAB/toolbox/matlab/datatypes/missing.m % missing constructor
it seems like it may be useful to also have an overloaded built-in that provides the functionality of @Bruno Luong's mymissing(), perhaps even extending it to include a case like
%mymissing(___,var)
that returns the missing that corresponds to the type of var.
Having said that, how should mymissing() work for user-defined classes?
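As a rough sketch of the mymissing(___,var) idea, the class can be taken from a prototype variable and passed to feval, the same conversion trick used in mymissing above. The name missingLike is hypothetical and not a MathWorks function.

```matlab
% Hypothetical helper: build a missing-valued array whose class matches
% the class of a prototype variable.
missingLike = @(sz, prototype) repmat(feval(class(prototype), missing), sz);

s = missingLike([2 3], "dog");   % 2x3 string array of <missing>
d = missingLike([2 3], 0);       % 2x3 double array of NaN
```

Under this approach a user-defined class would presumably need a constructor that accepts a missing input, and classes with no missing representation would be expected to error.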
Bruno Luong
27 Apr 2022
Edited: Bruno Luong
27 Apr 2022
@FM here is what the current doc states about missing. It is not particularly associated with string.
It looks like the designer (of missing) had in mind the principal use case of
mydata(rows,cols) = missing;
so to him/her, the possibility of creating an array of missing is not needed. I would also argue that the extension to arrays adds little to our comfort, but I see no technical reason to oppose an extension that creates missing with array dimensions.
Maybe the place where an array of missing (not cast to other native types) is needed is within a table or a struct. I believe these can be exported/serialized (writetable, writestruct), and then missing can be understood by other languages (C#, ...).
Bruno Luong
27 Apr 2022
Edited: Bruno Luong
27 Apr 2022
Frankly, a hypothetical extension allowing the syntax
mymissingarray = repmat(missing,100,100);
mymissingarray(7,2) = 6;
would be coherent with the MATLAB way of assignment, but I have a hard time imagining how it could be useful in real-life programming.
FM
27 Apr 2022
In my opinion, before implementing any changes, there needs to be some concept of the intended use of "missing". If it is only a string counterpart to double's NaN, then why can't it behave similarly, e.g., x=missing(3) or x=missing(3,2), where class(x) is 'string'? You should be able to assign a string to (say) x(2,2).
If the grand plan is for "missing" to have a more generalized purpose, then it's up to whoever has this vision to make the case for it. Right now, "doc missing" in 2019a says that "missing" is the string counterpart to NaN, but it certainly doesn't behave in an analogous manner. So my humble opinion is that the purpose needs to be clarified, and then the implementation can follow, be it to streamline it into a NaN counterpart for "string" or something more grand.
Bruno Luong
27 Apr 2022
Edited: Bruno Luong
27 Apr 2022
"If it is only a string counterpart to double's NaN,"
Not to me. That's why I sent you the link to the current doc: missing is not a counterpart of NaN for string.
The string class has its own missing value (distinct from the missing class), similar to NaN for double.
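The distinction can be seen by checking classes directly. A minimal sketch:

```matlab
m = missing;           % scalar of class missing
s = string(missing);   % the string class's own missing value

classOfM = class(m);   % 'missing'
classOfS = class(s);   % 'string'

% The string missing value lives inside an ordinary string array,
% so regular strings can be stored alongside it.
t = [s, "dog"];
```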
"Right now, "doc missing" in 2019a ..."
You can safely remove "Right now" in your sentence, we are in 2022. :)
FM
27 Apr 2022
@Bruno Luong:
You're right. Missing can be assigned in a vectorized manner (even though I don't see it in the documentation):
>> x=repmat("dog",3)
x = 3×3 string array
"dog" "dog" "dog"
"dog" "dog" "dog"
"dog" "dog" "dog"
>> x(2:3,2:3)=missing
x = 3×3 string array
"dog" "dog" "dog"
"dog" <missing> <missing>
"dog" <missing> <missing>
As I said, my suggestion to create an array of missing was based on the (mis)impression that it was a string counterpart to NaN, which it clearly isn't. In fact, the documentation clearly shows its use in the context of other types/classes. I was mistaken in thinking that the documentation for "missing" claims it to be a string counterpart to NaN. The only place where it is described as such is on a string page about handling missing values: https://www.mathworks.com/help/matlab/matlab_prog/test-for-empty-strings-and-missing-values.html#TestForEmptyStringsAndMissingValuesExample-3
Bruno Luong
27 Apr 2022
Edited: Bruno Luong
27 Apr 2022
"(even though I don't see it in the documentation):"
Well, it is always possible to assign a scalar to an indexed LHS, provided the RHS type can be cast to the class of the LHS; this is not specific to the "missing" class. It must be documented somewhere other than the doc for missing, since it is a generic feature.
I guess that in 2019 missing was brand new, with an implementation that started with strings, and the doc at that time was somewhat misleading.
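The generic rule can be illustrated with types that have nothing to do with missing. A small sketch in which the right-hand side is converted to the class of the array being indexed:

```matlab
a = zeros(2, 'int8');
a(1,:) = 3.7;          % double scalar is cast to int8 (rounds to 4) and expanded

s = ["a" "b"];
s(2) = 5;              % numeric scalar is converted to the string "5"

d = [1 2 3];
d(2) = missing;        % missing is converted to the double NaN
```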
FM
27 Apr 2022
@Bruno Luong: No, I should have been clearer. The 2019a documentation for "missing" does not describe it as specific to "string". I got that from the page cited in my last reply, which is a very specific page about strings and missing values.