Dynamic variable names for full workspace operations

Question

Votes: 2

To start with, I understand dynamic variable names are bad. I am not really trying to use them. What I really want to do is apply a specific operation to all variables in the current workspace; this way I can generate a generic function to apply that operation.

Two examples:

Example 1

Let's say that I have a code where I can use double or single precision depending on user choice. I want to cycle through all of the workspace variables looking for e.g. doubles that have numel>1000 and convert all of them to single. I can use who to get my workspace, and then a loop with isa and a boolean to find all the variables that match those criteria. What I want to do now is perform the operation varname = single(varname) to reassign those variables to the single-precision class while keeping the same name. Is there a way to do this other than using dynamic variable names?

Example 2

Let's say I ran into an "out-of-memory" error on the GPU because there is a bunch of junk left on there from other operations. I want to cycle through all gpuArray class variables and pull them down using varname = gather(varname), perform a reset(gpuDevice), and then possibly place them back on the GPU using varname = gpuArray(varname). Again, I understand that I could write code that knows all of the variable names; the point here is to write generic code that can do the operation on all the relevant workspace variables.

Again, if there is a totally obvious way of doing this that doesn't involve dynamic names, please let me know. Also, if there is something super bad about either of these concepts, I need to know that too.

Otherwise, how do you code something like this using dynamic variable names? Matlab seems to make that kind of operation intentionally difficult.

Thanks for your help, -Dan

Comments: 4

Stephen23, 3 Feb 2017

Edited: Stephen23, 3 Feb 2017

"...can use double or single precision depending on user choice. I want to cycle through all of the workspace variables"

"Is there a way to do this other than using dynamic variable names?"

How did these variables get into the workspace?

Did you type them all? Likely no. Which means they were imported somehow, or are defined in the functions themselves.

If they are imported using load or any file reading function then there is absolutely no reason why there needs to be a workspace full of variables: simply import into one array, cell, structure, table, etc and then your task is trivial (one line, no eval).
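A minimal sketch of that idea for Example 1, assuming the data lives in the fields of one structure. The structure S and its field names are made up for illustration and stand in for the output of S = load(...); a loop is used here rather than a one-liner so that the class and size test stays easy to read.

```matlab
% Hypothetical container standing in for S = load('somefile.mat')
S = struct('A', rand(50, 50), 'B', rand(10, 10), 'C', 'some text');

% Convert every double field with more than 1000 elements to single
names = fieldnames(S);
for k = 1:numel(names)
    value = S.(names{k});
    if isa(value, 'double') && numel(value) > 1000
        S.(names{k}) = single(value);   % dynamic field name, no eval needed
    end
end
```

The same loop should work unchanged on the output of S = load(filename), because load returns the file's variables as fields of a structure when it is called with an output argument.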

And if they are there because you are running a user's script then run for the hills screaming. So you sensibly have functions, and are not doing absurd things with load or assignin or the like. Then there is no way you cannot have total control over how those variables get into the function workspace (at generation or import). Most likely you could split the code into sub-functions, cells, loops, or whatever, which then allows easy points to check the values and adjust the data class as you require. Thus your entire question becomes moot.

Even though you have written "I understand dynamic variable names are bad" it seems you have not realized that you can fix the problem at its source, not by trying to patch it up later.

"totally obvious way of doing this that doesn't involve dynamic names"

Yeah, don't have lots of variables. Pretty simple really.

James Tursa, 4 Feb 2017

Edited: James Tursa, 4 Feb 2017

For re-classing variables from double to single, I will mention that this will have issues if any of the variables are shared copies of other variables. A loop that does varname = single(varname) will effectively unshare the variables which would have negative memory usage consequences. E.g., a simplistic example:

X = zeros(1, 2^27);  % a 1GB double array (2^27 elements x 8 bytes)
Y = X;  % a shared data copy of X.

At this point you only have 1GB of data in memory, since both X and Y are sharing data. Now see what happens when you make each of them single class:

X = single(X);  % X is unshared with Y and turned into single
Y = single(Y);  % Y is turned into single

After the 1st statement, total data memory is 1.5GB. After the 2nd statement, total data memory is back down to 1GB. But X and Y are not shared copies of each other any more, so there is no memory sharing benefit as was the case in the beginning. Ideally, if one knows that X and Y are shared, you would like to do something like this instead to get the total data memory down to 0.5GB:

X = single(X);
Y = X;

But there are no official mechanisms for detecting variable sharing status, either at the m-file level or in a mex routine. The only way to detect sharing is to hack into the variables in a mex routine, and even then it can get messy very quickly if there are cell arrays, struct arrays, or classdef objects involved.

Bottom line is that if there is a significant amount of data sharing involved, re-classing variables serially will wipe that sharing out and have negative memory usage consequences.

D. Plotnick, 10 Feb 2017

I did not realize data sharing was handled in that way; that is extremely helpful to know. I had thought that kind of sharing only applied when the data was stored in a handle class.


Answer 1

per isakson, 3 Feb 2017

Edited: per isakson, 8 Feb 2017

Votes: 1

There are good reasons to avoid eval (here, I use eval as shorthand for eval, evalin and assignin), see

"Example 1": I don't think there is a solution without eval. But after all, eval exists in several languages, and that's for a reason, I assume.

Here is my attempt to answer "Example 1".

M1 = ones(2e4)+eps;
M2 = ones(1e4)+eps;
variables = reshape( whos('M*'), 1,[] );
for v = variables 
    convert( v.name, 'single' )
end
whos('M*')

prints

Name          Size                    Bytes  Class     Attributes
M1        20000x20000            1600000000  single              
M2        10000x10000             400000000  single

where

function    convert( variable_name, new_class )
% convert variable, variable_name, to type, new_class, in the workspace of the caller
    %
    % assert that the value of variable_name is the name of a variable in the caller 
    xpr = sprintf( 'exist( ''%s'', ''var'' );', variable_name );
    num = evalin( 'caller', xpr );
    %
    if num == 1
        % str = sprintf( '%1$s = cast( %1$s, ''%2$s'' );', variable_name, new_class );
        % sts = evalin( 'caller', str );
        % Error: The expression to the left of the equals sign 
        % is not a valid target for an assignment.         
        xpr = sprintf( 'cast( %s, ''%s'' );', variable_name, new_class );
        try 
            assignin( 'caller', variable_name, evalin( 'caller', xpr ) );
        catch me
            fprintf( 2, 'Error: ''%s''\n', me.message );
        end
    else
        fprintf( 2, 'Undefined variable, ''%s''\n', variable_name );
    end
end

Stephen Cobeldick presents the following list of problems related to eval. I argue that my above use of eval avoids most of these problems.

Slow: the conversion in the above code is as fast as M1=cast(M1,'single'); M2=cast(M2,'single');
Buggy: No, not in this case. convert does one thing and it's possible to test it thoroughly.
Security Risk: Not in this case. All necessary tests may be done in convert.
Difficult to Work With: The use of convert should not cause any problems.
Obfuscated Code Intent: convert communicates the intent well enough.
Confuses Data with Code: Not applicable in this case.
Code Helper Tools do not Work: That's true in this case, but F1 works with convert.

ADDENDUM, 2017-02-08

An improved version of convert inspired by the comments by Jan Simon

function    convert( variable_name, new_type )
% convert variable, variable_name, to type, new_type, in the workspace of the caller
      narginchk( 2, 2 )
      assert( isa( variable_name, 'char' ), 'convert:IllegalClass'...
          ,   '"%s" is not a character array', value2short(variable_name) )
      assert( isrow( variable_name ), 'convert:IllegalSize'   ...
          ,   '"%s" is not a row', value2short(variable_name) )
      assert( isvarname( variable_name ), 'convert:IllegalName'   ...
          ,   '"%s" is not a valid variable name', variable_name  )
      assert( isa( new_type, 'char' ), 'convert:IllegalClass'    ...
          ,   'The type of new_type, %s, is not a char', value2short(new_type) )
      assert( isrow( new_type ), 'convert:IllegalSize'    ...
          ,   'The value of new_type, %s, is not a row', value2short(new_type)  )
      type_list = {'int8','uint8','int16','uint16','int32','uint32'   ...
                  ,'int64','uint64','double','single','logical','char'};
      assert( any(strcmp( new_type, type_list )), 'convert:IllegalType' ...
          ,   'The value of new_type, %s, is not a valid type name', new_type  )
      % assert that the value of variable_name is the name of a variable in the caller 
      xpr = sprintf( 'exist(''%s'', ''var'' );', variable_name );
      assert( evalin('caller',xpr) == 1, 'convert:UndefinedVariable' ...
          ,   '"%s" is not a defined variable', variable_name )
      cmd = sprintf( 'builtin( ''cast'', %s, ''%s'' );', variable_name, new_type );
      try 
          assignin( 'caller', variable_name, evalin( 'caller', cmd ) );
      catch me
          fprintf( 2, 'Error: "%s"\n', me.message );
      end
  end

where

function    str = value2short( val )
%   value2short converts value to a short string that is suitable to display
%
%   See also: mat2str
%
    if nargin > 0
        str     = workspacefunc( 'getshortvalue', val );
        max_len = 48;
        if length( str ) >= max_len
            str = [ str(1:max_len-4 ), ' ...' ];
        end
    else
        str = 'NIL';
    end
end

Comments: 12

Jan, 7 Feb 2017

Edited: Jan, 7 Feb 2017

Buggy, Security risk: What about:

cast = randi([97, 122], 255, 255);
convert('cast', 'single');

;-) What do you expect as output? How long did it take to find out, what happens? If it is more than 15 seconds, I call the code "obfuscated".

Of course, you can insert a builtin, which could be shadowed also, as well as exist.
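For readers who want to check the riddle without running convert: because cast is a variable in the caller's workspace, the string cast( cast, 'single' ) that the first version of convert evaluates there is, as far as I can tell, an indexing expression rather than a call to the cast function. A small sketch of just that expression (the name result is only for illustration):

```matlab
cast = randi([97, 122], 255, 255);   % variable that shadows the function cast

% Row subscripts are the 65025 values stored in cast (all between 97 and 122);
% column subscripts are the character codes of 'single' (all below 255).
result = cast(cast, 'single');

disp(size(result))    % 65025 6
disp(class(result))   % double, so nothing was converted to single
```

On this reading, the first version of convert would silently replace the 255-by-255 matrix with a 65025-by-6 double array and report no error.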

Slow: The conversion itself is fast, but it will slow down Matlab, because the JIT acceleration cannot work efficiently afterwards: loops will suffer when the type of the used variables cannot be identified uniquely. This can degrade the speed substantially; I've seen a factor of 100. This is an effect of poking in the lookup table of variables. Try this:

v = zeros(1, 1e6);
tic;
for k = 1:1e6
  v(k) = k;
end
toc
convert('v', 'single');
tic;
for k = 1:1e6
  v(k) = k;
end
toc

I get 0.007 seconds before the conversion and 0.0458 seconds after it (on an old R2009a in a Win7 64-bit VM). Initializing v=zeros(1,1e6, 'single') does not change the timings.

Difficult to Work With: Converting all variables whose names match a certain criterion like 'M*' is prone to bugs. If this is hidden anywhere in the code, adding this line at the top:

MyString = 'hello';

will lead to an unexpected behavior, if you display this at the bottom:

disp(MyString);

You can prevent this by using meaningful names like M_affectedByConvert. But then using a struct that contains all the variables that might be affected would be cleaner and easier.

Obfuscated Code Intent: If this function is used for the OP's problem, the dynamic conversion to a single is not obvious anymore. The calculations might depend on the used type of variables, e.g. when using eps() as tolerance or functions, which require double as input.
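A small sketch of the eps point, assuming a tolerance that is derived from eps of the data itself. The variable names are illustrative only.

```matlab
x = 1;
tolDouble = eps(x);           % spacing of doubles at 1, i.e. 2^-52
tolSingle = eps(single(x));   % spacing of singles at 1, i.e. 2^-23

% A difference that is visible in double disappears after the conversion
delta = 1e-10;
visibleInDouble = (x + delta) ~= x;                 % true
visibleInSingle = single(x + delta) ~= single(x);   % false
```

Code that compares against eps, or that relies on differences of this size, can therefore change behavior after an automatic conversion without any error being raised.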

My conclusion: Do not use any automatic conversions based on eval, because you will get the usual problems.

+1: For "There are good reasons to avoid eval", for mentioning the threads concerning the warnings and the clear and clean presentation of arguments and code. I do not agree with your conclusions, but the answer helps to analyse the OP's problem.

Stephen23, 18 Feb 2017

Edited: Stephen23, 19 Feb 2017

@per isakson: there is no trick. It was simply me stating what I would do if I was writing a function where at some unknown point during the calculation I needed to change the class of some variables. Here are a few starting assumptions:

The variables are known. For me this is a perfectly reasonable assumption as I never have unknown variables in my workspace (never use load directly into the workspace, avoid assignin, eval, or other dynamic variable names).
There are only a few variables. Again for me quite reasonable, because I do not fill my workspace with thousands of variables: that is what arrays are for.
The variables are accessible to the "change" function.

I do not claim that this will change many arbitrary, not previously specified variables, because I never have unknown variables in my workspace anyway (as we all know, that path leads to JIT problems, obfuscation, and hard to fix bugs). It does not happen in my code, therefore I do not need to solve that problem. I prefer to solve tasks through good design, rather than trying to patch them up later (and hence this nested function).

So in the end my code would have (by design) no unknown variables, and if there were more than a few values, have them stored in some array, giving:

function out = test(N) % try around 12
%
Z = 0;
for k = 1:N
    work()
end
%
    function work()
        Z = Z+1; % my work
        % change can be triggered anywhere:
        if rand()>0.8
            change()
        end
    end
%
    function change()
        if Z>10 % condition
            Z = single(Z);
        end
    end
%
out = class(Z);
end

Note that change can be called by any other nested or local functions, callbacks, timers, listeners, etc., at any point during the calculation.

I do not claim that this answers the original question of "cycle through all of the workspace variables": for the reasons I have given that problem would never occur in my code, allowing me to use this simple nested function to simply resolve the task of converting at any arbitrary moment during calculations involving my known variables.

Rather than trying to sledgehammer my way through my workspace, instead I asked myself: what am I trying to achieve, and found an elegant solution for that.

per isakson, 27 Feb 2017

@Stephen Cobeldick, Thank you for your answer. I agree fully regarding "good design" and "no unknown variables".

I assumed as a premise that OP had painted himself into a corner. After reading the question more carefully I realize that OP posed the question out of curiosity.


Answer 2

Edric Ellis, 2 Feb 2017

Votes: 1

For the gpuArray case, you could simply use save and load, i.e.

tempFile = [tempname(), '.mat'];  % explicit extension, so that delete finds the file that save creates
save(tempFile);
reset(gpuDevice);
load(tempFile);
delete(tempFile);

Comments: 2

D. Plotnick, 3 Feb 2017

Hmmm, I hadn't considered that. The problem is file sizes and transfer speeds. Several of the variables alone are > 1GB. Pulling them on and off the graphics card is relatively fast. Dropping them onto the hard disk has two major drawbacks: (1) this is a much slower operation, even with an SSD, and (2) due to the size of the resulting file I would need to use the '-v7.3' flag on 'save', which always makes things absolutely chug. Essentially I want to pull all of the active Matlab variables from the GPU onto RAM, flush the GPU, then put the variables back up. Right now I just do it on a variable-by-variable basis within the code, which works but is also a pain to code and leads to a lot of time debugging to make sure I caught everything.
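A sketch of that gather, reset, re-upload cycle without dynamic variable names, assuming the GPU arrays are kept as fields of one structure instead of as separate workspace variables. It needs Parallel Computing Toolbox and a supported GPU; the structure G and its fields are hypothetical.

```matlab
% Hypothetical container: two GPU arrays and one ordinary host variable
G = struct('a', gpuArray(rand(100)), 'b', gpuArray(ones(10)), 'n', 5);

% Find the fields that currently live on the GPU
names    = fieldnames(G);
onGpu    = structfun(@(v) isa(v, 'gpuArray'), G);
gpuNames = names(onGpu);

% Pull them into host memory
for k = 1:numel(gpuNames)
    G.(gpuNames{k}) = gather(G.(gpuNames{k}));
end

reset(gpuDevice);   % clear the device now that G holds only host data

% Push the same fields back to the GPU
for k = 1:numel(gpuNames)
    G.(gpuNames{k}) = gpuArray(G.(gpuNames{k}));
end
```

Any gpuArray that is not a field of G is not gathered by this loop, so it would presumably be lost when the device is reset; the sketch only helps if all GPU data is routed through the one structure.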

Walter Roberson, 3 Feb 2017

I did some poking around and thought I was getting somewhere but it didn't work. I was looking for a way to get at the workspace of the current function, with the idea that altering the workspace would be equivalent to altering the variable. I found that if you declare a nested function and use functions() that you get a workspace of the nested function that includes all variables in the parent assigned at the point you took the handle, which seemed like a doable way of getting access to your own workspace. Unfortunately changing the workspace did not change the variables in the function even for the shared variables. I was not able to get further on this.

It did leave me wondering if it would work for moving values in and out of the GPU array. If you have a shared variable that is assigned a gpu array and you gather it and send it again, then does that affect the original gpu array? The gather is clearly going to bring it back, but the rewrite might instead create a second variable. I consider evalin('caller') to be a form of eval(), though others might disagree, I guess.


Answer 3

Joss Knight, 5 Feb 2017

Edited: Joss Knight, 5 Feb 2017

Votes: 0

Well, if you're really serious about a tool for managing storage of GPU arrays, then you need a new class. This would be a numeric handle type that forwards all its functions to the underlying type, and adds all new objects to a static list. Every function runs inside a try...catch statement that catches parallel:gpu:array:OOM and, if that error is triggered, calls a static utility function to gather the contents of the list back to the host and then tries again.

The only difficulty here is that you need to provide an implementation of every single method you want your new type to implement, i.e. every method of gpuArray (and a few more that aren't methods of gpuArray but are functions that can take gpuArray inputs). But that code could be autogenerated fairly easily.

Comments: 2

D. Plotnick, 10 Feb 2017

I think I understand how to do what you are saying, and it may someday be worth it. However, for now it looks like considerably more work than taking the time to check my memory footprint during code prototyping, and just coding in a hard limit on the size of the variables I am using. Still, a really interesting idea that I wouldn't have thought of.

However I was wondering about your last comment about autogenerating code that includes any called methods; I did not know that was possible, do you have a link to any tutorial?

As always, thanks Joss.

Joss Knight, 17 Feb 2017

Edited: Joss Knight, 17 Feb 2017

It's just a boiler-plate method for any function, so, say, for plus:

   function varargout = plus(varargin)
       % This bit swaps out the custom-type arguments for
       % their underlying gpuArray property
       for i = 1:numel(varargin)
           if isa(varargin{i}, 'MyManagedGPUArrayType')
              varargin{i} = varargin{i}.UnderlyingArrayProperty;
           end
       end
       % Try at least twice
       for i = 1:2
           try
               [varargout{1:nargout}] = plus(varargin{:});
           catch me
               if i == 2 || ~strcmp(me.identifier, 'parallel:gpu:array:OOM')
                   rethrow(me);
               else
                   MyManagedGPUArrayType.doSomeGatheringToClawBackMemory();
                   continue;
               end
           end
           break;
       end
   end

So you create some script that reads a long list of functions and creates a file with all these forwarding methods in it, substituting in the name of each function. Well, no, you'd create a utility function for most of this call-gather-call structure and have a much simpler repeated boiler-plate for each method.
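A rough sketch of such a generator, for the simpler-boiler-plate variant. The function list and the static helper callWithRetry are hypothetical, and the class name MyManagedGPUArrayType is taken from the example above; callWithRetry stands for the utility function that would hold the call-gather-call logic.

```matlab
% Hypothetical list of functions to forward
funcNames = {'plus', 'minus', 'times'};

% Boiler-plate for one method; FUNC is replaced by each function name
template = { ...
    '    function varargout = FUNC(varargin)', ...
    '        [varargout{1:nargout}] = MyManagedGPUArrayType.callWithRetry(@FUNC, varargin{:});', ...
    '    end', ...
    ''};

methodsText = '';
for k = 1:numel(funcNames)
    block = strrep(strjoin(template, char(10)), 'FUNC', funcNames{k});
    methodsText = [methodsText, block, char(10)]; %#ok<AGROW>
end

% Print the generated methods; writing them to a file would use fopen/fprintf
fprintf('%s', methodsText);
```

The printed text is meant to be pasted, or written, into the methods block of the class definition; in practice the list of names would be much longer.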
