We followed the standard documentation for integrating Hadoop 2.7.2 with MATLAB R2016b. Link to the document: http://in.mathworks.com/help/mdce/configure-a-hadoop-cluster.html

We completed the steps up to this stage, but we are confused about the rest.

The requirements are:

  1. MATLAB® Distributed Computing Server™ must be installed or available on the cluster nodes. See Install Products and Choose Cluster Configuration.
  2. If the cluster is running Kerberos authentication that requires the Java Cryptography Extension, you must download and install the Oracle version of this extension into each MATLAB Distributed Computing Server installation. You must also perform this step for the MATLAB client installation. To install the extension, place the Java Cryptography Extension jar files into the folder ${MATLABROOT}/sys/jre/${ARCH}/jre/lib/security.
  3. You must have a Hadoop installation on the MATLAB client machine that can submit normal (non-MATLAB) jobs to the cluster.
  4. The cluster must identify its user home directory as a valid location that the nodes can access. You must choose a local filesystem path and typically use a local folder such as /tmp/hduserhome or /home/${USER}. Set yarn.nodemanager.user-home-dir for Hadoop version 2.X.
  5. There is one Hadoop property that must not be "final." (If properties are "final", they are locked to a fixed predefined value, and jobs cannot alter them.)
  6. The software needs to append a value to this property so that task processes are able to correctly run MATLAB. This property is passed as part of the job metadata given to Hadoop during job submission.
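For requirement 4, the setting goes into yarn-site.xml on the nodes. A minimal sketch, assuming the /tmp/hduserhome example location mentioned above:

```xml
<!-- $HADOOP_HOME/etc/hadoop/yarn-site.xml (Hadoop 2.x) -->
<property>
  <name>yarn.nodemanager.user-home-dir</name>
  <!-- a local path every node can access; /tmp/hduserhome is only an example -->
  <value>/tmp/hduserhome</value>
</property>
```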

This property is mapred.child.env, which controls environment variables for the job's task processes.
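In mapred-site.xml this means that the property, if declared at all, must not be marked final. A sketch of what to check (the empty value is just a placeholder; MATLAB appends to it at submission time):

```xml
<!-- $HADOOP_HOME/etc/hadoop/mapred-site.xml -->
<property>
  <name>mapred.child.env</name>
  <value></value>
  <!-- do NOT add <final>true</final> here; jobs must be able to append to it -->
</property>
```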

You must provide necessary information to the parallel.cluster.Hadoop object in the MATLAB client session. For example, see Run mapreduce on a Hadoop Cluster (Parallel Computing Toolbox) and Use Tall Arrays on a Spark Enabled Hadoop Cluster (Parallel Computing Toolbox).
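For reference, a minimal client-side sketch of that setup. All paths, the HDFS locations, and the myMapper/myReducer functions are placeholders for illustration, not values from this installation:

```matlab
% Tell MATLAB where the local Hadoop client installation lives (placeholder path).
setenv('HADOOP_HOME', '/usr/local/hadoop-2.7.2');

% Describe the cluster; ClusterMatlabRoot is where MDCS is installed on the nodes.
cluster = parallel.cluster.Hadoop( ...
    'HadoopInstallFolder', getenv('HADOOP_HOME'), ...
    'ClusterMatlabRoot',   '/usr/local/MATLAB/R2016b');

% Route subsequent mapreduce calls through the Hadoop cluster.
mr = mapreducer(cluster);

% Run a job: input read from HDFS, output written back to HDFS.
ds = datastore('hdfs:///data/input/*.csv');
result = mapreduce(ds, @myMapper, @myReducer, mr, ...
                   'OutputFolder', 'hdfs:///data/output');
```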

Our main questions are:

1. We are not able to see any cluster configuration under Home -> Parallel -> Manage Cluster Profiles. Why?

2. What is the role of MJS here, and how do we configure it?

3. We do not understand what we have to export and import for the rest of the worker (slave) and master nodes.

Thanks

2 Comments

Walter Roberson on 7 May 2017
Which MATLAB release are you using?
For me, in R2017a: look in the MATLAB toolbar, in the column that has "Preferences" at the top and "Set Path" below that. The column is between the Layout and the Add-Ons columns. The third item in the column is Parallel; click on it and it opens a menu that includes Manage Cluster Profiles.
Pulkesh Haran on 10 May 2017
I am using MATLAB R2016b and Hadoop 2.7.2. We did not find any info there (on the path you mention).

Sign in to comment.

Accepted Answer

Kojiro Saito on 8 May 2017


  1. When integrating with Hadoop, MATLAB does not use a cluster profile, so it is not a problem that no Hadoop cluster profile is listed in "Manage Cluster Profiles".
  2. When integrating with Hadoop, MJS is not used. MATLAB uses Hadoop's job scheduler, so you don't need to configure anything on the MATLAB side.
  3. For the rest of the worker and master nodes, I don't think you need to export or import anything.
If you have any questions, please ask us.

13 Comments

Pulkesh Haran on 10 May 2017 (edited 10 May 2017)
Then we integrated MATLAB and Hadoop correctly; we can see the program running in the Hadoop interface. But I have the following questions:
1. The MATLAB+Hadoop cluster (with 50 nodes) takes more time to run a map-reduce job than single-machine MATLAB map-reduce. Why?
2. What is the role of MDCS?
3. How do we define or configure the properties of MDCS?
4. In the MATLAB GUI, how can we see the MATLAB+Hadoop cluster, or the program running on the cluster, with detailed info?
5. The MATLAB+Hadoop cluster with 50 nodes takes more time than normal MATLAB map-reduce running on a single machine. How can we resolve that problem?
6. We are getting errors like the following while converting 50K images into a sequence file:
i. Unable to read MAT-file ...: not a binary file.
ii. We are getting a serialization error.
How do we resolve them?
Thanks
1. That seems strange.
2. MDCS performs the parallel execution of mapreduce on Hadoop.
3. In order to use MDCS with Hadoop, you need to set the following parameters.
In the Hadoop 2.x settings, set "yarn.nodemanager.user-home-dir" ($HADOOP_HOME/etc/hadoop/yarn-site.xml),
and in MATLAB, set the HADOOP_HOME environment variable and create a parallel.cluster.Hadoop object and a mapreducer.
This example will help you.
4. You can monitor the status using the Hadoop Web UI (http://YOUR_HADOOP_HOST:8088/) or with the following command in a terminal:
yarn application -status <application ID>
5, 6. Could you give us more detail on your MATLAB scripts? Or did you also ask MathWorks technical support?
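To pull the full logs of a failed run for inspection, commands along these lines work once YARN log aggregation is enabled; the application ID below is a placeholder, not one from this cluster:

```shell
# List recently failed applications to find the ID.
yarn application -list -appStates FAILED

# Check the final status of one run (placeholder ID).
yarn application -status application_1490718884252_0042

# Dump the aggregated container logs to a file for offline inspection.
yarn logs -applicationId application_1490718884252_0042 > app_0042.log
```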
We are facing this problem while running our MATLAB script on the cluster for 50K images.
We are creating a sequence file for 50 thousand images. I am attaching the MATLAB code and other details. Please help us with the same.
------------------------------- MATLAB Error 1 -------------------------------
Parallel mapreduce execution on the Hadoop cluster:
****************************** MAPREDUCE PROGRESS ******************************
Map 0% Reduce 0%
Map 1% Reduce 0%
Map 33% Reduce 0%
Map 93% Reduce 0%
Error using mapreduce (line 118)
Unable to read MAT-file /tmp/tp1ce5fe8e_0189_4e64_85a3_b671c61453a4/task_0_675_MAP_4.mat: not a binary MAT-file.
Error in create_seq (line 101)
seqds = mapreduce(imageDS, @identityMap, @identityReduce,'OutputFolder',output_identity);
------------------------------- MATLAB Error 2 -------------------------------
> whos -file '/home/nitw_viper_user/task_0_1081_MAP_4.mat'
Name Size Bytes Class Attributes
Error 1x1 2336 MException
>> Error
Error =
MException with properties:
identifier: 'parallel:internal:DeserializationException'
message: 'Deserialization threw an exception.'
cause: {0×1 cell}
stack: [3×1 struct]
------------------------------- MATLAB Error 3 -------------------------------
>> create_seq
Hadoop with properties:
HadoopInstallFolder: '/home/nitw_viper_user/hadoop-2.7.2'
HadoopConfigurationFile: ''
SparkInstallFolder: ''
HadoopProperties: [2×1 containers.Map]
SparkProperties: [0×1 containers.Map]
ClusterMatlabRoot: '/usr/local/MATLAB/R2016b'
RequiresMathWorksHostedLicensing: 0
LicenseNumber: ''
AutoAttachFiles: 1
AttachedFiles: {}
AdditionalPaths: {}
Parallel mapreduce execution on the Hadoop cluster:
****************************** MAPREDUCE PROGRESS ******************************
Map 0% Reduce 0%
Map 1% Reduce 0%
Map 2% Reduce 0%
Map 22% Reduce 0%
Map 40% Reduce 0%
Map 80% Reduce 0%
Error using mapreduce (line 118)
The HADOOP job failed to complete.
Error in create_seq (line 101)
seqds = mapreduce(imageDS, @identityMap, @identityReduce,'OutputFolder',output_identity);
Caused by:
Error using distcompdeserialize
Deserialization threw an exception.
Error using distcompdeserialize
Deserialization threw an exception.
Error using distcompdeserialize
Deserialization threw an exception.
Error using distcompdeserialize
Deserialization threw an exception.
Error using distcompdeserialize
Deserialization threw an exception.
Error using distcompdeserialize
Deserialization threw an exception.
Error using distcompdeserialize
Deserialization threw an exception.
Error using distcompdeserialize
Deserialization threw an exception.
>>
Date: 07-05-2017
[WARN] BlockReaderFactory - I/O error constructing remote block reader. java.io.IOException: Got error, status message opReadBlock BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461, for OP_READ_BLOCK, self=/192.168.192.128:60935, remote=/192.168.193.177:50010, for file /fv_13l_2/.matlaberror/task_0_797_MAP_1.mat, for pool BP-581788350-127.0.1.1-1490718884252 block 1075155042_1414461
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(Unknown Source)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
[WARN] DFSClient - Failed to connect to /192.168.193.177:50010 for block, add to deadNodes and continue. java.io.IOException: Got error, status message opReadBlock BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461, for OP_READ_BLOCK, self=/192.168.192.128:60935, remote=/192.168.193.177:50010, for file /fv_13l_2/.matlaberror/task_0_797_MAP_1.mat, for pool BP-581788350-127.0.1.1-1490718884252 block 1075155042_1414461
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(Unknown Source)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
[INFO] DFSClient - Successfully connected to /192.168.193.167:50010 for BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461
Error using mapreduce (line 118)
Unable to read MAT-file /tmp/tp3c745940_f508_4326_93f7_fc5f6fb9ef06/task_0_805_MAP_1.mat: not a binary MAT-file.
Error in main (line 270) res = mapreduce(seqds, @Ltrp_db1_seq_file_mapper, @Ltrp_db1_reducer, 'OutputFolder', Ltrp_db1_seq_file_result);
I tested create_seq.m.
Is the following correct?
seqds = mapreduce(imageDS, @identityMap, @identityReduce,'OutputFolder',output_identity);
I've modified it as follows,
seqds = mapreduce(imageDs, @identityMapper, @identityReducer, 'OutputFolder', output_identity);
and get no error. I'm not sure about the content of main.m, so I couldn't test init.m or main.m.
We are working on a (MATLAB+Hadoop) cluster consisting of 110 nodes, each having 8 GB RAM.
We tested making a sequence file for 40 thousand images and it works fine. For more than 40 thousand images it gives the following errors:
1. Error using distcompdeserialize: Deserialization threw an exception.
2. Error: Unable to read MAT-file /tmp/tp1ce5fe8e_0189_4e64_85a3_b671c61453a4/task_0_675_MAP_4.mat: not a binary MAT-file.
MException with properties:
identifier: 'parallel:internal:DeserializationException'
message: 'Deserialization threw an exception.'
cause: {0×1 cell}
stack: [3×1 struct]
We request you to please provide a solution for this error. Thanks
Walter Roberson on 19 May 2017
You should open a support case about this.
Could you attach the log of Hadoop failed job?
yarn logs -applicationId APPLICATION_ID
Anyway, I think it's better to contact MathWorks support, too.
Pulkesh Haran on 22 May 2017
This is the log of the Hadoop failed job:
[WARN] BlockReaderFactory - I/O error constructing remote block reader. java.io.IOException: Got error, status message opReadBlock BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461, for OP_READ_BLOCK, self=/192.168.192.128:60935, remote=/192.168.193.177:50010, for file /fv_13l_2/.matlaberror/task_0_797_MAP_1.mat, for pool BP-581788350-127.0.1.1-1490718884252 block 1075155042_1414461
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(Unknown Source)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
[WARN] DFSClient - Failed to connect to /192.168.193.177:50010 for block, add to deadNodes and continue. java.io.IOException: Got error, status message opReadBlock BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461, for OP_READ_BLOCK, self=/192.168.192.128:60935, remote=/192.168.193.177:50010, for file /fv_13l_2/.matlaberror/task_0_797_MAP_1.mat, for pool BP-581788350-127.0.1.1-1490718884252 block 1075155042_1414461
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(Unknown Source)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
Kojiro Saito on 23 May 2017
What information does "task_0_797_MAP_1.mat" file contain?
Pulkesh Haran on 26 May 2017
Hello Sir, I am attaching the file and an error screen picture with this. Please check it and guide us further.
Kojiro Saito on 29 May 2017 (edited 29 May 2017)
It seems the datanode disk check failed; that is why MDCS produced the deserialization error. Could you check the datanode's log and out files and find any error messages?
Pulkesh Haran on 29 May 2017
In the datanode logs, no error is showing. All containers are clear. Only a .matlaberror folder is created with a list of failed map tasks.
Kojiro Saito on 2 June 2017
All right. Are there any corrupted images among the 50K files?
If not, you should contact MathWorks Technical Support.

Sign in to comment.

More Answers (1)

lov kumar on 2 June 2019


How do I fix this error? Please help.
Error using mapreduce (line 124)
The HADOOP job failed to submit. It is possible that there is some issue with the HADOOP configuration.
