Configure a Hadoop Cluster

Parallel MATLAB® code that uses tall arrays or mapreduce functions can be submitted to a Hadoop cluster from suitably configured MATLAB clients.

To configure the client to run MATLAB code on the cluster, you must already be able to submit jobs to the cluster from the intended client machine. The client machine must have a Hadoop® installation that can access the cluster outside of MATLAB.

Many Hadoop distributions do not support direct access of Linux® based clusters from Windows® clients. Users of Windows clients typically need to set up a Linux gateway node that can be accessed from the Windows client via SSH or VNC. The cluster can then be accessed from this gateway node.

Cluster Configuration

  1. Integrate MATLAB Distributed Computing Server™ with your cluster infrastructure. For instructions, see Integrate MATLAB with Third-Party Schedulers.

  2. If your cluster requires Kerberos authentication, ensure your MATLAB Distributed Computing Server installations have been configured correctly. For instructions, see Kerberos Authentication.

Client Configuration

  1. Ensure your client can access the Hadoop cluster outside MATLAB.

  2. Ensure your client MATLAB installation has been configured for Kerberos authentication if your cluster requires it. For instructions, see Kerberos Authentication.

To access the cluster from within MATLAB, set up a parallel.cluster.Hadoop object using the following statements.

setenv('HADOOP_HOME', '/path/to/hadoop/install')
cluster = parallel.cluster.Hadoop;

Pass the Hadoop cluster object to mapreducer to specify that mapreduce runs on that cluster.
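As a minimal sketch of this step, the following passes the cluster object to mapreducer and then hands the resulting execution environment to mapreduce. The HDFS paths and the function names myMapper and myReducer are hypothetical placeholders and do not come from the instructions above.

```matlab
% Point MATLAB at the local Hadoop installation (placeholder path).
setenv('HADOOP_HOME', '/path/to/hadoop/install');
cluster = parallel.cluster.Hadoop;

% Make the Hadoop cluster the execution environment for mapreduce.
mr = mapreducer(cluster);

% Hypothetical input data on HDFS; replace with your own location.
ds = datastore('hdfs:///path/to/input/*.csv');

% myMapper and myReducer are hypothetical user-written functions.
outds = mapreduce(ds, @myMapper, @myReducer, mr, ...
    'OutputFolder', 'hdfs:///path/to/output');
```

Keeping the mapreducer object in a variable, rather than relying only on the global execution environment, makes it explicit which environment each mapreduce call uses.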

For examples of how to run parallel MATLAB code on your Hadoop cluster, see Run mapreduce on a Hadoop Cluster (Parallel Computing Toolbox) and Use Tall Arrays on a Spark Enabled Hadoop Cluster (Parallel Computing Toolbox).

Kerberos Authentication

If the cluster uses Kerberos authentication that requires the Oracle® Java® Cryptography Extension, you must configure all installations of MATLAB and MATLAB Distributed Computing Server. If you are using Hortonworks® or Cloudera® distributions, it is likely that you need to complete these configuration steps.

The configuration instructions are the same for client and worker MATLAB installations.

Starting in R2018b, configure your MATLAB installation by enabling the appropriate security policy in the Java installation.

  1. In the MATLAB Editor, open the file ${MATLAB_ROOT}/sys/java/jre/${ARCH}/jre/lib/security/java.security.

  2. Change the line

    #crypto.policy=unlimited
    to
    crypto.policy=unlimited
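The same edit can also be scripted instead of made by hand in the Editor. The sketch below assumes that ${MATLAB_ROOT} corresponds to the value returned by matlabroot and ${ARCH} to the value returned by computer('arch'), and that you have write permission on the installation folder. The backup file name is an illustrative choice.

```matlab
% Build the path to java.security for this installation and architecture.
% Assumption: ${ARCH} matches computer('arch'), for example 'glnxa64'.
secFile = fullfile(matlabroot, 'sys', 'java', 'jre', computer('arch'), ...
    'jre', 'lib', 'security', 'java.security');

% Keep a backup before modifying the file.
copyfile(secFile, [secFile '.bak']);

% Uncomment the crypto.policy line, leaving the rest of the file unchanged.
txt = fileread(secFile);
txt = regexprep(txt, '^#\s*crypto\.policy=unlimited', ...
    'crypto.policy=unlimited', 'lineanchors');

% Write the modified text back to the same file.
fid = fopen(secFile, 'w');
assert(fid ~= -1, 'Could not open %s for writing.', secFile);
fwrite(fid, txt);
fclose(fid);
```

Run the script once on every client and worker installation, since the configuration instructions are the same for both.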

For releases before R2018b, you must download additional security files from Oracle.

  1. Download the Oracle Java Cryptography Extension zip file from the Oracle Java SE page.

  2. Unzip the downloaded zip file into a temporary folder.

  3. Replace the files local_policy.jar and US_export_policy.jar in the folder ${MATLAB_ROOT}/sys/java/jre/${ARCH}/jre/lib/security with the downloaded versions.
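The replacement in the last step can be sketched in MATLAB as follows. The temporary folder path is a hypothetical placeholder, and the sketch assumes the two .jar files sit directly in that folder; the extracted archive may place them in a subfolder, so adjust tmpDir accordingly. It also assumes that ${ARCH} matches the value returned by computer('arch').

```matlab
% Hypothetical temporary folder holding the unzipped policy files.
tmpDir = '/path/to/unzipped/jce';

% Security folder of the JRE that ships with this MATLAB installation.
secDir = fullfile(matlabroot, 'sys', 'java', 'jre', computer('arch'), ...
    'jre', 'lib', 'security');

jars = {'local_policy.jar', 'US_export_policy.jar'};
for k = 1:numel(jars)
    target = fullfile(secDir, jars{k});
    % Back up the shipped file, then overwrite it with the downloaded one.
    copyfile(target, [target '.orig']);
    copyfile(fullfile(tmpDir, jars{k}), target, 'f');
end
```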

Hadoop Version Support

  • MATLAB mapreduce is supported on Hadoop 2.x clusters. Note that support for Hadoop 1.x clusters has been removed.

  • MATLAB tall arrays are supported on Spark® enabled Hadoop 2.x clusters. The client can run on any supported architecture, while the cluster must run on a Linux or Mac architecture. Cross-platform combinations of client and cluster are supported.
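For the tall array case, a minimal sketch of the client-side setup might look like the following. The SPARK_HOME setting, the HDFS location, and the variable name SomeNumericVariable are assumptions introduced for illustration and are not part of the text above.

```matlab
% Placeholder install locations on the client machine.
setenv('HADOOP_HOME', '/path/to/hadoop/install');
setenv('SPARK_HOME', '/path/to/spark/install');

% Make the cluster the current execution environment for tall arrays.
cluster = parallel.cluster.Hadoop;
mr = mapreducer(cluster);

% Hypothetical HDFS location; replace with your own data.
ds = datastore('hdfs:///path/to/data/*.csv');
tt = tall(ds);

% Operations on tall arrays are deferred until gather requests the result.
% SomeNumericVariable is a hypothetical variable name in the data.
avg = gather(mean(tt.SomeNumericVariable));
```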

Functionality: Support for running MATLAB mapreduce on Hadoop 1.x clusters has been removed.
Result: Errors
Use Instead: Use clusters that have Hadoop 2.x or higher installed to run MATLAB mapreduce.
Compatibility Considerations: Migrate MATLAB mapreduce code that runs on Hadoop 1.x to Hadoop 2.x.
