Program Independent Jobs for a Supported Scheduler
Create and Run Jobs
This section details the steps of a typical programming session using Parallel Computing Toolbox™ software to run independent jobs on a MATLAB® Job Scheduler cluster or any third-party cluster interfaced with MATLAB Parallel Server™.
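This section assumes that you are connecting to a MATLAB Job Scheduler cluster. The basic job programming sequence is the same for MATLAB Job Scheduler or any third-party scheduler cluster: define and select a profile, find a cluster, create a job, create tasks, submit the job to the job queue, and retrieve the job results.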
Note that the objects that the client session uses to interact with the cluster are only references to data that is actually contained in the MATLAB Job Scheduler, not in the client session. After jobs and tasks are created, you can close your client session and restart it, and your job is still stored in the MATLAB Job Scheduler. You can find existing jobs using the findJob function or the Jobs property of the MATLAB Job Scheduler cluster object.
Define and Select a Profile
A cluster profile identifies the type of cluster to use and its specific properties. In a profile, you define how many workers a job can access, where the job data is stored, where MATLAB is accessed and many other cluster properties. The exact properties are determined by the type of cluster.
The steps in this section all assume that the profile named MyProfile identifies the cluster you want to use, with all necessary property settings. With the proper use of a profile, the rest of the programming is the same, regardless of cluster type. After you define or import your profile, you can set it as the default profile in the Profile Manager GUI, or with the command:
parallel.defaultClusterProfile('MyProfile')
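As a minimal sketch of working with the default profile, the lines below query the current default, switch to MyProfile, and later restore the previous setting. The variable names currentDefault and previousProfile are illustrative only, and the sketch assumes that a profile named MyProfile has already been defined or imported.

```matlab
% Query the name of the current default profile (called with no input).
currentDefault = parallel.defaultClusterProfile;
fprintf('Default profile before the change: %s\n', currentDefault);

% Setting a new default returns the name of the previous default,
% which makes it easy to restore the old setting afterwards.
previousProfile = parallel.defaultClusterProfile('MyProfile');

% ... work with the cluster described by MyProfile ...

% Restore the profile that was the default before.
parallel.defaultClusterProfile(previousProfile);
```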
A few notes regarding different cluster types and their properties:
Notes
In a shared file system, all nodes require access to the folder specified in the cluster object's JobStorageLocation property.
Because Windows HPC Server requires a shared file system, all nodes require access to the folder specified in the cluster object's JobStorageLocation property.
In a shared file system, MATLAB clients on many computers can access the same job data on the network. Properties of a particular job or task should be set from only one client computer at a time.
When you use an LSF® scheduler in a nonshared file system, the scheduler might report that a job is in the finished state even though the LSF scheduler might not yet have completed transferring the job's files.
Find a Cluster
You use the parcluster function to identify a cluster and to create an object representing the cluster in your local MATLAB session.
To find a specific cluster, use the cluster profile to match the properties of the cluster you want to use. In this example, MyProfile is the name of the profile that defines the specific cluster.
c = parcluster('MyProfile');
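Once the cluster object exists, it can be useful to confirm that it points at the cluster you expect before creating jobs. The following sketch assumes the MyProfile profile from this section and only reads properties that appear in the cluster display.

```matlab
% Create the cluster object from the profile.
c = parcluster('MyProfile');

% Inspect a few properties to confirm the connection details.
fprintf('Cluster object class: %s\n', class(c));
fprintf('Number of workers:    %d\n', c.NumWorkers);
disp(c.JobStorageLocation)   % where job and task data is stored
```

Calling parcluster with no arguments uses the default profile instead of a named one.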
Create a Job
You create a job with the createJob function. Although this command executes in the client session, it actually creates the job on the cluster, c, and creates a job object, job1, in the client session.
job1 = createJob(c)
 Job

    Properties:

                   ID: 1
                 Type: independent
             Username: mylogin
                State: pending
       SubmitDateTime:
        StartDateTime:
      RunningDuration: 0 days 0h 0m 0s
           NumThreads: 1

      AutoAttachFiles: true
  Auto Attached Files: List files
        AttachedFiles: {}
    AutoAddClientPath: false
      AdditionalPaths: {}

    Associated Tasks:

       Number Pending: 0
       Number Running: 0
      Number Finished: 0
    Task ID of Errors: []
  Task ID of Warnings: []
Note that the job's State property is pending. This means the job has not been queued for running yet, so you can now add tasks to it.
The cluster's display now includes one pending job:
c
 MJS Cluster

    Properties:

                      Name: my_mjs
                   Profile: MyProfile
                  Modified: false
                      Host: myhost.mydomain.com
                  Username: myuser

                NumWorkers: 1
                NumThreads: 1
            NumBusyWorkers: 0
            NumIdleWorkers: 1

        JobStorageLocation: Database on myhost.mydomain.com
         ClusterMatlabRoot: C:\apps\matlab
         SupportedReleases: R2021b
           OperatingSystem: windows
          AllHostAddresses: 0:0:0:0
             SecurityLevel: 0 (No security)
    HasSecureCommunication: false
 RequiresClientCertificate: false
   RequiresOnlineLicensing: false

    Associated Jobs:

            Number Pending: 1
             Number Queued: 0
            Number Running: 0
           Number Finished: 0
You can transfer files to the worker by using the AttachedFiles property of the job object. For details, see Share Code with Workers.
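As a rough illustration of the AttachedFiles property, the sketch below attaches one file and one folder to the pending job. The names myHelperFunction.m and C:\work\mydata are hypothetical placeholders, not files that this section defines, and the sketch assumes that job1 is still in the pending state.

```matlab
% Assumes job1 was created with createJob(c) and has not been submitted.
% Both entries are hypothetical; replace them with your own files or folders.
job1.AttachedFiles = {'myHelperFunction.m', 'C:\work\mydata'};

% Confirm what will be sent to the workers with the job.
disp(job1.AttachedFiles)
```

Set the property before you submit the job, so that the files are available when the workers begin evaluating tasks.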
Create Tasks
After you have created your job, you can create tasks for the job using the createTask function. Tasks define the functions to be evaluated by the workers during the running of the job. Often, the tasks of a job are all identical. In this example, each task will generate a 3-by-3 matrix of random numbers.
createTask(job1, @rand, 1, {3,3});
createTask(job1, @rand, 1, {3,3});
createTask(job1, @rand, 1, {3,3});
createTask(job1, @rand, 1, {3,3});
createTask(job1, @rand, 1, {3,3});
The Tasks property of job1 is now a 5-by-1 matrix of task objects.
job1.Tasks
 5x1 Task array:

         ID        State      FinishDateTime  Function  Errors  Warnings
       -----------------------------------------------------------------
    1     1      pending                          rand       0         0
    2     2      pending                          rand       0         0
    3     3      pending                          rand       0         0
    4     4      pending                          rand       0         0
    5     5      pending                          rand       0         0
Alternatively, you can create the five tasks with one call to createTask by providing a cell array of five cell arrays defining the input arguments to each task.
T = createTask(job1, @rand, 1, {{3,3} {3,3} {3,3} {3,3} {3,3}});
In this case, T is a 5-by-1 matrix of task objects.
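The cell-array form is also convenient when the tasks should not be identical. The following sketch, which is an illustration rather than part of the original example, builds the inner cell arrays programmatically so that each task generates a matrix of a different size. It uses a separate job, job2, so that the five tasks of job1 are left unchanged; the names sizes, taskInputs, job2, and T2 are assumptions introduced here.

```matlab
% Assumes c is the cluster object returned by parcluster('MyProfile').
job2 = createJob(c);

% One inner cell array of input arguments per task: {2,2}, {3,3}, {4,4}.
sizes = [2 3 4];
taskInputs = arrayfun(@(n) {n, n}, sizes, 'UniformOutput', false);

% A single call creates one task per inner cell array.
T2 = createTask(job2, @rand, 1, taskInputs);
fprintf('Created %d tasks on job %d.\n', numel(T2), job2.ID);
```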
Submit a Job to the Job Queue
To run your job and have its tasks evaluated, you submit the job to the job queue with the submit function.
submit(job1)
The job manager distributes the tasks of job1 to its registered workers for evaluation.
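After submission, the client does not have to block indefinitely while the cluster works. As a small sketch, assuming job1 has been submitted as above, the wait function can be given a state and a timeout in seconds; the 60-second value and the variable name finishedInTime are arbitrary choices for illustration.

```matlab
% Wait up to 60 seconds for job1 to reach the finished state.
finishedInTime = wait(job1, 'finished', 60);

if finishedInTime
    disp('Job finished within the timeout.');
else
    % The job is still queued or running; report its current state.
    fprintf('Job is still in state "%s".\n', job1.State);
end
```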
Each worker performs the following steps for task evaluation:
1. Receive AttachedFiles and AdditionalPaths from the job. Place files and modify the path accordingly.
2. Run the jobStartup function the first time evaluating a task for this job. You can specify this function in AttachedFiles or AdditionalPaths. When using a MATLAB Job Scheduler, if the same worker evaluates subsequent tasks for this job, jobStartup does not run between tasks.
3. Run the taskStartup function. You can specify this function in AttachedFiles or AdditionalPaths. This runs before every task evaluation that the worker performs, so it could occur multiple times on a worker for each job.
4. If the worker is part of forming a new parallel pool, run the poolStartup function. (This occurs when executing parpool or when running other types of jobs that form and use a parallel pool, such as batch.)
5. Receive the task function and arguments for evaluation.
6. Evaluate the task function, placing the result in the task's OutputArguments property. Any error information goes in the task's Error property.
7. Run the taskFinish function.
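Because error information is recorded on each task rather than raised in the client, it can help to inspect the tasks once the job has finished. The loop below is a minimal sketch that assumes job1 has been submitted; it reads each task's ErrorMessage property, which is empty when the task ran without error.

```matlab
% Block until every task of job1 has been evaluated.
wait(job1);

tasks = job1.Tasks;
for k = 1:numel(tasks)
    if isempty(tasks(k).ErrorMessage)
        fprintf('Task %d completed without error.\n', tasks(k).ID);
    else
        fprintf('Task %d failed: %s\n', tasks(k).ID, tasks(k).ErrorMessage);
    end
end
```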
Retrieve Job Results
The results of each task's evaluation are stored in that task object's OutputArguments property as a cell array. Use the function fetchOutputs to retrieve the results from all the tasks in the job.
wait(job1)
results = fetchOutputs(job1);
Display the results from each task.
results{1:5}
    0.9501    0.4860    0.4565
    0.2311    0.8913    0.0185
    0.6068    0.7621    0.8214

    0.4447    0.9218    0.4057
    0.6154    0.7382    0.9355
    0.7919    0.1763    0.9169

    0.4103    0.3529    0.1389
    0.8936    0.8132    0.2028
    0.0579    0.0099    0.1987

    0.6038    0.0153    0.9318
    0.2722    0.7468    0.4660
    0.1988    0.4451    0.4186

    0.8462    0.6721    0.6813
    0.5252    0.8381    0.3795
    0.2026    0.0196    0.8318
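The cell array returned by fetchOutputs has one row per task and one column per output argument, so the five results can be combined with ordinary MATLAB operations. The following sketch is one possible way to do that, assuming job1 ran the five rand tasks above without error; the names stacked and meanMatrix are illustrative.

```matlab
% fetchOutputs requires a finished job, so wait first.
wait(job1);
results = fetchOutputs(job1);   % 5-by-1 cell array, one row per task

% Stack the five 3-by-3 matrices along the third dimension.
stacked = cat(3, results{:});   % 3-by-3-by-5 numeric array

% Element-wise mean across the five task results.
meanMatrix = mean(stacked, 3);
disp(meanMatrix)
```

Note that fetchOutputs reports an error if the job is not finished or if any task encountered an error during evaluation.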
Manage Objects in the Scheduler
Because all the data of jobs and tasks resides in the cluster job storage location, these objects continue to exist even if the client session that created them has ended. The following sections describe how to access these objects and how to permanently remove them:
What Happens When the Client Session Ends
When you close the client session of Parallel Computing Toolbox software, all of the objects in the workspace are cleared. However, the objects in MATLAB Parallel Server software or other cluster resources remain in place. When the client session ends, only the local reference objects are lost, not the actual job and task data in the cluster.
Therefore, if you have submitted your job to the cluster job queue for execution, you can quit your client session of MATLAB, and the job will be executed by the cluster. You can retrieve the job results later in another client session.
Recover Objects
A client session of Parallel Computing Toolbox software can access any of the objects in MATLAB Parallel Server software, whether the current client session or another client session created these objects.
You create cluster objects in the client session by using the parcluster function.
c = parcluster('MyProfile');
When you have access to the cluster through the object c, you can create objects that reference all the jobs contained in that cluster. The jobs are accessible in the cluster object's Jobs property, which is an array of job objects:
all_jobs = c.Jobs
You can index through the array all_jobs to locate a specific job.
Alternatively, you can use the findJob function to search in a cluster for any jobs or a particular job identified by any of its properties, such as its State.
all_jobs = findJob(c);
finished_jobs = findJob(c,'State','finished')
The second command returns an array of job objects that reference all finished jobs on the cluster c.
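Putting these pieces together, a new client session can recover a job that an earlier session submitted and retrieve its results. The sketch below assumes the job of interest has ID 1, as in the job display earlier in this section; in practice you would search on whichever property identifies your job.

```matlab
% In a new client session, reconnect to the same cluster.
c = parcluster('MyProfile');

% Look up the job by its ID (1 is taken from the earlier example).
j = findJob(c, 'ID', 1);

if isempty(j)
    disp('No job with that ID exists on the cluster.');
elseif strcmp(j.State, 'finished')
    results = fetchOutputs(j);   % one row per task
else
    fprintf('Job %d is in state "%s".\n', j.ID, j.State);
end
```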
Reset Callback Properties (MATLAB Job Scheduler Only)
When restarting a client session, you lose the settings of any callback properties (for example, the FinishedFcn property) on jobs or tasks. These properties are commonly used to get notifications in the client session of state changes in their objects. When you create objects in a new client session that reference existing jobs or tasks, you must reset these callback properties if you intend to use them.
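As a hedged sketch of resetting a callback, the lines below recover a job in a new session and assign its FinishedFcn property again. The job ID of 1 and the message printed by the anonymous function are assumptions for illustration; the callback is written to accept the calling object and an event data argument.

```matlab
% Reconnect to the MATLAB Job Scheduler cluster and recover the job.
c = parcluster('MyProfile');
j = findJob(c, 'ID', 1);   % hypothetical ID of a job that is still running

% Reassign the callback; the setting from the earlier session was lost.
% The function receives the job object and event data (unused here).
j.FinishedFcn = @(job, eventData) fprintf('Job %d finished.\n', job.ID);
```

If the job has already finished by the time the property is set, the callback has no state change left to report.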
Remove Objects Permanently
Jobs in the cluster continue to exist even after they are finished, and after the MATLAB Job Scheduler is stopped and restarted. The ways to permanently remove jobs from the cluster are explained in the following sections:
Delete Selected Objects. From the command line in the MATLAB client session, you can call the delete function for any job or task object. If you delete a job, you also remove all tasks contained in that job.
For example, find and delete all finished jobs in your cluster that belong to the user joep.
c = parcluster('MyProfile')
finished_jobs = findJob(c,'State','finished','Username','joep')
delete(finished_jobs)
clear finished_jobs
The delete function permanently removes these jobs from the cluster. The clear function removes the object references from the local MATLAB workspace.
Start a MATLAB Job Scheduler from a Clean State. When a MATLAB Job Scheduler starts, by default it starts so that it resumes its former session with all jobs intact. Alternatively, a MATLAB Job Scheduler can start from a clean state with all its former history deleted. Starting from a clean state permanently removes all job and task data from the MATLAB Job Scheduler of the specified name on a particular host.
As a network administration feature, the -clean flag of the startjobmanager script is described in Start in a Clean State (MATLAB Parallel Server) in the MATLAB Parallel Server System Administrator's Guide.