Enabling compression for MapReduce output

 

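Method ①: Configure compression in code

For the map stage (the intermediate output that is shuffled to the reducers), add the following to the job driver, before the Job is created from this Configuration: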

Configuration configuration = new Configuration();
configuration.set("mapreduce.map.output.compress","true");
configuration.set("mapreduce.map.output.compress.codec","org.apache.hadoop.io.compress.SnappyCodec");

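For the reduce stage (the job's final output), add:

configuration.set("mapreduce.output.fileoutputformat.compress","true");
// compress.type only takes effect for SequenceFile output; valid values are NONE, RECORD and BLOCK
configuration.set("mapreduce.output.fileoutputformat.compress.type","RECORD");
configuration.set("mapreduce.output.fileoutputformat.compress.codec","org.apache.hadoop.io.compress.SnappyCodec");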

Method ②: Global MapReduce compression configuration

Edit the configuration file mapred-site.xml

Compress the map output data

<property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
</property>
<property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

Compress the reduce output data

<property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
</property>
<property>
    <name>mapreduce.output.fileoutputformat.compress.type</name>
    <value>RECORD</value>
</property>
<property>
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

After making the changes, restart the cluster.
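Before restarting, it can be worth checking that the edited file is still well-formed and that the compression properties are actually present. Here is a minimal sketch of such a check. It uses only the JDK's XML parser, not any Hadoop class, and assumes the file has the usual <configuration> root with <property> children. The class name MapredSiteCheck and its methods are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop: reads <property> entries from
// mapred-site.xml-style XML and reports which compression keys are missing.
public class MapredSiteCheck {

    // The five property names used in this post.
    static final String[] EXPECTED_KEYS = {
        "mapreduce.map.output.compress",
        "mapreduce.map.output.compress.codec",
        "mapreduce.output.fileoutputformat.compress",
        "mapreduce.output.fileoutputformat.compress.type",
        "mapreduce.output.fileoutputformat.compress.codec"
    };

    // Parses the XML text and returns name -> value for every complete <property>.
    public static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                NodeList names = property.getElementsByTagName("name");
                NodeList values = property.getElementsByTagName("value");
                if (names.getLength() == 0 || values.getLength() == 0) {
                    continue; // skip incomplete entries instead of failing
                }
                props.put(names.item(0).getTextContent().trim(),
                          values.item(0).getTextContent().trim());
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("configuration XML could not be parsed", e);
        }
    }

    // Returns the expected compression keys that are absent from the parsed properties.
    public static List<String> missingKeys(Map<String, String> props) {
        List<String> missing = new ArrayList<>();
        for (String key : EXPECTED_KEYS) {
            if (!props.containsKey(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Sample content; in practice, pass in the text of your own mapred-site.xml.
        String xml = "<configuration>"
                + "<property><name>mapreduce.map.output.compress</name><value>true</value></property>"
                + "<property><name>mapreduce.map.output.compress.codec</name>"
                + "<value>org.apache.hadoop.io.compress.SnappyCodec</value></property>"
                + "</configuration>";
        Map<String, String> props = readProperties(xml);
        System.out.println("found:   " + props.keySet());
        System.out.println("missing: " + missingKeys(props));
    }
}
```

With the sample above, the two map-output keys are found and the three reduce-output keys are reported as missing, which is what you would expect if only the first pair of properties had been added.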

Compression algorithms supported by Hadoop

Compression format   Tool    Algorithm   File extension   Splittable
DEFLATE              N/A     DEFLATE     .deflate         No
Gzip                 gzip    DEFLATE     .gz              No
bzip2                bzip2   bzip2       .bz2             Yes
LZO                  lzop    LZO         .lzo             No (unless indexed)
LZ4                  N/A     LZ4         .lz4             No
Snappy               N/A     Snappy      .snappy          No
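
To make the Gzip row a little more concrete: gzip is a container format around a DEFLATE stream, and the JDK ships an implementation of it. The sketch below uses only java.util.zip, not Hadoop's codec classes. The class name GzipRoundTrip and the sample records are invented for illustration. It compresses some repetitive tab-separated records and checks that they decompress unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative only: a gzip round trip with the JDK, independent of Hadoop.
public class GzipRoundTrip {

    // Compresses the bytes into gzip format (a DEFLATE stream plus header and trailer).
    public static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the gzip trailer has been written.
        return buffer.toByteArray();
    }

    // Decompresses gzip bytes back to the original content.
    public static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds 1000 made-up, highly repetitive tab-separated records.
    public static byte[] sampleRecords() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("user\t").append(i % 10).append("\tclick\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleRecords();
        byte[] packed = gzip(raw);
        System.out.println("raw bytes:     " + raw.length);
        System.out.println("gzip bytes:    " + packed.length);
        System.out.println("round trip ok: " + Arrays.equals(raw, gunzip(packed)));
    }
}
```

How much you save depends entirely on the data. Repetitive records like these shrink a lot, while data that is already compressed will barely shrink at all.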