MATLAB PARALLEL COMPUTING TOOLBOX - S User Guide, Page 201

Run mapreduce on a Hadoop Cluster
In this section...
“Cluster Preparation”
“Output Format and Order”
“Calculate Mean Delay”
Cluster Preparation
Before you can run mapreduce on a Hadoop® cluster, make sure that the cluster and
client machine are properly configured. Consult your system administrator, or see
“Configure a Hadoop Cluster”.
Output Format and Order
When running mapreduce on a Hadoop cluster with binary output (the default), the
resulting KeyValueDatastore points to Hadoop Sequence files, instead of binary MAT-
files as generated by mapreduce in other environments. For more information, see the
'OutputType' argument description on the mapreduce reference page.
When running mapreduce on a Hadoop cluster, the order of the key-value pairs in the
output differs from the order produced when running mapreduce in other environments.
If your application depends on the arrangement of data in the output, you must sort
the data according to your own requirements.
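One way to impose your own order is to read the output into memory and sort it by key. The following is a minimal sketch, not the toolbox's prescribed method. It assumes the output is small enough to fit in memory, and it uses a hand-built table with hypothetical keys and values as a stand-in for what readall would return from the output KeyValueDatastore (a table with Key and Value variables).

```matlab
% Stand-in for: result = readall(outds);
% The keys and values below are hypothetical illustration data,
% deliberately listed out of order.
result = table({'UA'; 'AA'; 'DL'}, {12.5; 8.1; 9.7}, ...
    'VariableNames', {'Key', 'Value'});

% Sort the key-value pairs by key so that downstream code sees the
% same arrangement regardless of the environment that produced them.
sorted = sortrows(result, 'Key');
disp(sorted)
```

With real output, replace the hand-built table with the result of readall on the datastore returned by mapreduce. If the output is too large for memory, a different approach would be needed.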
Calculate Mean Delay
This example shows how to modify the MATLAB example for calculating mean airline
delays to run on a Hadoop cluster.
First, you must set environment variables and cluster properties as appropriate for your
specific Hadoop configuration. See your system administrator for the values for these and
other properties necessary for submitting jobs to your cluster.
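% Point MATLAB at the Hadoop installation (site-specific path).
setenv('HADOOP_HOME','/share/hadoop/a2.2.0');
% Create a Hadoop cluster object.
cluster = parallel.cluster.Hadoop;
% Set the job tracker and file system addresses for this cluster.
cluster.HadoopProperties('mapred.job.tracker') = 'hadoophost1:50031';
cluster.HadoopProperties('fs.default.name') = 'hdfs://hadoophost2:8020';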