org.apache.avro.mapreduce
Class AvroJob

java.lang.Object
  extended by org.apache.avro.mapreduce.AvroJob

public final class AvroJob
extends Object

Utility methods for configuring jobs that work with Avro.

When using Avro data as MapReduce keys and values, data must be wrapped in a suitable AvroWrapper implementation. MapReduce keys must be wrapped in an AvroKey object, and MapReduce values must be wrapped in an AvroValue object.

Suppose you would like to write a line count mapper that reads from a text file. If, instead of Text and IntWritable output key and value types, you would like to use Avro data with schemas of "string" and "int", respectively, you may parameterize your mapper with AvroKey<CharSequence> and AvroValue<Integer> types. Then use the setMapOutputKeySchema() and setMapOutputValueSchema() methods to set the writer schemas for the records you will generate.
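The line count scenario above might be sketched as follows. This is an illustrative outline, not code from the Avro distribution: the class names LineCountSketch and LineCountMapper, the job name, and the configure() helper are hypothetical, and Job.getInstance() assumes a Hadoop release that provides it.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical driver class; only AvroJob, AvroKey, and AvroValue come from Avro.
public class LineCountSketch {

  /** Emits (line text, 1) for every input line, wrapped in Avro types. */
  public static class LineCountMapper
      extends Mapper<LongWritable, Text, AvroKey<CharSequence>, AvroValue<Integer>> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      // Keys go in an AvroKey, values in an AvroValue.
      context.write(new AvroKey<CharSequence>(line.toString()),
                    new AvroValue<Integer>(1));
    }
  }

  /** Builds a job whose map output uses the "string" and "int" schemas. */
  public static Job configure(Configuration conf) throws IOException {
    Job job = Job.getInstance(conf, "line-count-sketch");
    job.setMapperClass(LineCountMapper.class);
    // Writer schemas for the records the mapper generates.
    AvroJob.setMapOutputKeySchema(job, Schema.create(Schema.Type.STRING));
    AvroJob.setMapOutputValueSchema(job, Schema.create(Schema.Type.INT));
    return job;
  }
}
```

Input paths, a reducer, and output settings are left out; a real job would still need them before submission.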


Field Summary
static String CONF_OUTPUT_CODEC
          The configuration key for a job's output compression codec.
 
Method Summary
static org.apache.avro.Schema getInputKeySchema(org.apache.hadoop.conf.Configuration conf)
          Gets the job input key schema.
static org.apache.avro.Schema getInputValueSchema(org.apache.hadoop.conf.Configuration conf)
          Gets the job input value schema.
static org.apache.avro.Schema getMapOutputKeySchema(org.apache.hadoop.conf.Configuration conf)
          Gets the map output key schema.
static org.apache.avro.Schema getMapOutputValueSchema(org.apache.hadoop.conf.Configuration conf)
          Gets the map output value schema.
static org.apache.avro.Schema getOutputKeySchema(org.apache.hadoop.conf.Configuration conf)
          Gets the job output key schema.
static org.apache.avro.Schema getOutputValueSchema(org.apache.hadoop.conf.Configuration conf)
          Gets the job output value schema.
static void setInputKeySchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema schema)
          Sets the job input key schema.
static void setInputValueSchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema schema)
          Sets the job input value schema.
static void setMapOutputKeySchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema schema)
          Sets the map output key schema.
static void setMapOutputValueSchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema schema)
          Sets the map output value schema.
static void setOutputKeySchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema schema)
          Sets the job output key schema.
static void setOutputValueSchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema schema)
          Sets the job output value schema.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CONF_OUTPUT_CODEC

public static final String CONF_OUTPUT_CODEC
The configuration key for a job's output compression codec. This takes one of the strings registered in CodecFactory.

See Also:
Constant Field Values
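
One way to set this key is sketched below. The helper class OutputCodecSketch and its method are hypothetical; the sketch assumes that "deflate" is among the names registered in CodecFactory, and it validates the name with CodecFactory.fromString() before storing it. Whether and how a given output format reads the key is not covered here.

```java
import org.apache.avro.file.CodecFactory;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.hadoop.conf.Configuration;

// Hypothetical helper, not part of Avro.
public final class OutputCodecSketch {
  private OutputCodecSketch() {}

  /** Stores codecName under CONF_OUTPUT_CODEC after checking that it is registered. */
  public static void setOutputCodec(Configuration conf, String codecName) {
    // fromString() throws AvroRuntimeException for an unrecognized codec name.
    CodecFactory.fromString(codecName);
    conf.set(AvroJob.CONF_OUTPUT_CODEC, codecName);
  }
}
```

For example, OutputCodecSketch.setOutputCodec(job.getConfiguration(), "deflate") would record the deflate codec for a job.
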
Method Detail

setInputKeySchema

public static void setInputKeySchema(org.apache.hadoop.mapreduce.Job job,
                                     org.apache.avro.Schema schema)
Sets the job input key schema.

Parameters:
job - The job to configure.
schema - The input key schema.

setInputValueSchema

public static void setInputValueSchema(org.apache.hadoop.mapreduce.Job job,
                                       org.apache.avro.Schema schema)
Sets the job input value schema.

Parameters:
job - The job to configure.
schema - The input value schema.

setMapOutputKeySchema

public static void setMapOutputKeySchema(org.apache.hadoop.mapreduce.Job job,
                                         org.apache.avro.Schema schema)
Sets the map output key schema.

Parameters:
job - The job to configure.
schema - The map output key schema.

setMapOutputValueSchema

public static void setMapOutputValueSchema(org.apache.hadoop.mapreduce.Job job,
                                           org.apache.avro.Schema schema)
Sets the map output value schema.

Parameters:
job - The job to configure.
schema - The map output value schema.

setOutputKeySchema

public static void setOutputKeySchema(org.apache.hadoop.mapreduce.Job job,
                                      org.apache.avro.Schema schema)
Sets the job output key schema.

Parameters:
job - The job to configure.
schema - The job output key schema.

setOutputValueSchema

public static void setOutputValueSchema(org.apache.hadoop.mapreduce.Job job,
                                        org.apache.avro.Schema schema)
Sets the job output value schema.

Parameters:
job - The job to configure.
schema - The job output value schema.
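
The input and output setters above could be combined as in the following sketch, which assumes a job that reads and writes Avro container files keyed by the same record type. The User schema, the class AvroFileJobSketch, and the job name are made up for illustration; AvroKeyInputFormat and AvroKeyOutputFormat are the Avro formats from this package, and Job.getInstance() assumes a Hadoop release that provides it.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.avro.mapreduce.AvroKeyOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver class, not part of Avro.
public final class AvroFileJobSketch {
  private AvroFileJobSketch() {}

  // Illustrative record schema; any valid Avro schema could be used instead.
  private static final String USER_SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"visits\",\"type\":\"int\"}]}";

  /** Configures Avro key schemas for both the job input and the job output. */
  public static Job configure(Configuration conf) throws IOException {
    Schema userSchema = new Schema.Parser().parse(USER_SCHEMA_JSON);
    Job job = Job.getInstance(conf, "avro-file-job-sketch");

    // Input side: Avro container files whose records arrive as AvroKey objects.
    job.setInputFormatClass(AvroKeyInputFormat.class);
    AvroJob.setInputKeySchema(job, userSchema);

    // Output side: records written as AvroKey objects with the same schema.
    job.setOutputFormatClass(AvroKeyOutputFormat.class);
    AvroJob.setOutputKeySchema(job, userSchema);
    return job;
  }
}
```

No value schemas are set in this sketch, so the corresponding getters would return null for them.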

getInputKeySchema

public static org.apache.avro.Schema getInputKeySchema(org.apache.hadoop.conf.Configuration conf)
Gets the job input key schema.

Parameters:
conf - The job configuration.
Returns:
The job input key schema, or null if not set.

getInputValueSchema

public static org.apache.avro.Schema getInputValueSchema(org.apache.hadoop.conf.Configuration conf)
Gets the job input value schema.

Parameters:
conf - The job configuration.
Returns:
The job input value schema, or null if not set.

getMapOutputKeySchema

public static org.apache.avro.Schema getMapOutputKeySchema(org.apache.hadoop.conf.Configuration conf)
Gets the map output key schema.

Parameters:
conf - The job configuration.
Returns:
The map output key schema, or null if not set.

getMapOutputValueSchema

public static org.apache.avro.Schema getMapOutputValueSchema(org.apache.hadoop.conf.Configuration conf)
Gets the map output value schema.

Parameters:
conf - The job configuration.
Returns:
The map output value schema, or null if not set.

getOutputKeySchema

public static org.apache.avro.Schema getOutputKeySchema(org.apache.hadoop.conf.Configuration conf)
Gets the job output key schema.

Parameters:
conf - The job configuration.
Returns:
The job output key schema, or null if not set.

getOutputValueSchema

public static org.apache.avro.Schema getOutputValueSchema(org.apache.hadoop.conf.Configuration conf)
Gets the job output value schema.

Parameters:
conf - The job configuration.
Returns:
The job output value schema, or null if not set.
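
Because each getter returns null when its schema was never set, callers may want to check the result before using it. The following is a minimal sketch of such a check; the class SchemaLookupSketch and its method are hypothetical, and throwing IllegalStateException is only one possible way to handle a missing schema.

```java
import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.hadoop.conf.Configuration;

// Hypothetical helper, not part of Avro.
public final class SchemaLookupSketch {
  private SchemaLookupSketch() {}

  /** Returns the job output key schema, failing fast if none was configured. */
  public static Schema requireOutputKeySchema(Configuration conf) {
    Schema schema = AvroJob.getOutputKeySchema(conf);
    if (schema == null) {
      // The getter signals "not set" with null rather than an exception.
      throw new IllegalStateException("Job output key schema has not been set");
    }
    return schema;
  }
}
```

Inside a task, the Configuration would typically come from context.getConfiguration().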


Copyright © 2009-2013 The Apache Software Foundation. All Rights Reserved.