How to solve "All the input files must be Sequence files. Invalid file: ''"

Question

Jingyu Ru on 16 Aug 2015


https://kr.mathworks.com/matlabcentral/answers/234017-how-to-solve-all-the-input-files-must-be-sequence-files-invalid-file

Commented: Jingyu Ru on 20 Aug 2015

Accepted Answer: Rick Amos


Hadoop version 1.2.1, MATLAB version R2015a

OS: Ubuntu Linux 14

I installed Hadoop on a small cluster (one master and one slave).

The Hadoop 'wordcount' example runs successfully.

I can also read data from HDFS in MATLAB.

But when I try to run the MATLAB documentation example 'Run mapreduce on a Hadoop Cluster', it fails.

The output is:

run_mapreduce_on_a_hadoop
ans = 
      ArrDelay
      ________
       8      
       8      
      21      
      13      
       4      
      59      
       3      
      11      
Parallel mapreduce execution on the Hadoop cluster:
********************************
*      MAPREDUCE PROGRESS      *
********************************
Map   0% Reduce   0%
Map 100% Reduce  33%
Map 100% Reduce  71%
Map 100% Reduce 100%
Error using mapreduce (line 100)
All the input files must be Sequence files.
Invalid file: ''
Error in run_mapreduce_on_a_hadoop (line 24)
meanDelay = mapreduce(ds,@meanArrivalDelayMapper,@meanArrivalDelayReducer,mr,...

Here is my MATLAB code:

setenv('HADOOP_HOME','/usr/local/hadoop');
cluster = parallel.cluster.Hadoop;
cluster.HadoopProperties('mapred.job.tracker') = 'ubuntu:50031';
cluster.HadoopProperties('fs.default.name') = 'hdfs://ubuntu:8020';
outputFolder = '/home/rjy/logs/hadooplog';
mr = mapreducer(cluster);
ds = datastore('airlinesmall.csv','TreatAsMissing','NA','SelectedVariableNames','ArrDelay','ReadSize',1000);
preview(ds)
meanDelay = mapreduce(ds,@meanArrivalDelayMapper,@meanArrivalDelayReducer,mr,...
              'OutputFolder',outputFolder);
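The script passes two functions to mapreduce that are not shown above. For reference, here is a minimal sketch of what they might look like, modeled on the mean-arrival-delay example in the MATLAB documentation (the versions shipped with MATLAB may differ in detail). Each function would be saved in its own file on the MATLAB path. The key names 'PartialCountSumDelay' and 'MeanArrivalDelay' are illustrative choices.

```matlab
% --- meanArrivalDelayMapper.m ---
function meanArrivalDelayMapper(data, info, intermKVStore)
% data is one chunk (up to ReadSize rows) of the datastore, here a table
% with the single variable ArrDelay. info is unused in this sketch.
% Drop missing delays, then emit this chunk's [count, sum] under one key.
data(isnan(data.ArrDelay), :) = [];
partCountSum = [length(data.ArrDelay), sum(data.ArrDelay)];
add(intermKVStore, 'PartialCountSumDelay', partCountSum);
end

% --- meanArrivalDelayReducer.m ---
function meanArrivalDelayReducer(intermKey, intermValIter, outKVStore)
% Accumulate the partial [count, sum] pairs produced by every mapper call.
count = 0;
total = 0;
while hasnext(intermValIter)
    countSum = getnext(intermValIter);
    count = count + countSum(1);
    total = total + countSum(2);
end
% Emit a single key-value pair: the overall mean arrival delay.
add(outKVStore, 'MeanArrivalDelay', total / count);
end
```

Running the same pair locally first, with mapreducer(0) (serial execution in the MATLAB client) in place of mapreducer(cluster), is one way to separate problems in the functions themselves from cluster configuration problems such as where the output is written.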

What does "All the input files must be Sequence files. Invalid file: ''" mean?

I have never seen it before; I just copied the code from the MATLAB documentation.

I would like to know how to solve this problem. I have tried many approaches, but none has worked.

Please give me some suggestions. Thanks.


Answer 1

Rick Amos on 17 Aug 2015


https://kr.mathworks.com/matlabcentral/answers/234017-how-to-solve-all-the-input-files-must-be-sequence-files-invalid-file#answer_189537


This error occurs because MATLAB could not find the output files generated by the Hadoop job. For now, treat this message as meaning that MATLAB could not find the output of the Hadoop job. To resolve it, make sure the output folder is a location that both your local machine and the Hadoop cluster can access.

I see that outputFolder is set to "/home/rjy/logs/hadooplog". For MATLAB, this points to a folder under your home directory on your local machine, which the Hadoop cluster is probably not able to access. As an alternative, could you try:

outputFolder = 'hdfs://ubuntu:8020/home/rjy/out';

This location is guaranteed to be accessible to both your local machine and the Hadoop cluster.
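Applied to the question's script, the suggestion changes a single line. A sketch, assuming the host name ubuntu and port 8020 from the question's fs.default.name setting and the HDFS path /home/rjy/out suggested above; everything else is the question's own code, plus a readall call to read the result back.

```matlab
setenv('HADOOP_HOME', '/usr/local/hadoop');
cluster = parallel.cluster.Hadoop;
cluster.HadoopProperties('mapred.job.tracker') = 'ubuntu:50031';
cluster.HadoopProperties('fs.default.name') = 'hdfs://ubuntu:8020';

% Write the job output to HDFS, which both the MATLAB client and the
% cluster nodes can reach, instead of a local folder such as
% /home/rjy/logs/hadooplog.
outputFolder = 'hdfs://ubuntu:8020/home/rjy/out';

mr = mapreducer(cluster);
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', 'ArrDelay', 'ReadSize', 1000);
meanDelay = mapreduce(ds, @meanArrivalDelayMapper, @meanArrivalDelayReducer, mr, ...
    'OutputFolder', outputFolder);

% meanDelay is a datastore over the job's output files; read them back.
readall(meanDelay)
```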


Jingyu Ru on 20 Aug 2015

Thank you very much for helping me solve this problem! I really, really appreciate it!
