Chapter 3: Basic Hadoop Operations

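Task preface
  Learning objectives
    (1) View basic information about the storage system.
    (2) View the computing resources of the Hadoop cluster.
    (3) Perform basic operations on the HDFS file system.
    (4) Submit MapReduce jobs with hadoop jar.
    (5) Manage multiple MapReduce jobs.
  Task background
    Example: counting user logins
      With MySQL, the statistics for one day of data take 6 min 10 s.
      For one month of data, MySQL can no longer cope.
      Hadoop can handle this statistics job.
  Task guide
    http://bigdata.hddly.cn/b37066/B37066L03 ...
  Learning videos
    Operating a Hadoop 3.x cluster from a laptop
      Viewing basic Hadoop cluster information
        http://i.hddly.cn/media/wcvjxKqwGT.mp4
      HDFS basic operation practice
        http://i.hddly.cn/media/cWULcNfnWz.mp4
        http://i.hddly.cn/media/pfKzq3bCTg.mp4
      Running MapReduce jobs
        http://i.hddly.cn/media/VNhj5aifJW.mp4
Task 3.1 Viewing basic Hadoop cluster information
  Core functions of a Hadoop cluster
    Distributed storage
      HDFS
      Ways to inspect it: command line, browser
    Parallel computing
      Computing resources are spread across the nodes of the cluster.
      The ResourceManager and the NodeManagers coordinate their allocation.
  Editing hosts on the operator's workstation
    Note:
    On Windows, locate System32-->drivers-->et ...
    192.168.137.134 slave1
    192.168.137.135 s ...
  Querying the cluster's storage system (HDFS)
    Hadoop 3.x
      Web: HDFS monitoring
        http://192.168.137.133:9870/
        Statistics
          Configured Capacity: 89.94 GB
          DFS Remaining: 81.29 GB (90.39%)
          Non DFS Used: 8.65 GB
          DFS Used: 16 KB (0%)
          Live Nodes 2 (Decommissioned: 0,  ...
      Command line
        hdfs dfsadmin -report
        hdfs dfsadmin -report -live|dead|decommi ...
      Block size: 128M
  Querying the cluster's computing resources (YARN)
    Hadoop 3.x
      Web: YARN monitoring
        http://192.168.137.133:8088/
        Statistics
          Active Nodes: 3
          Reserved Resources: <memory:0 B, vCores:0 ...
          Total Resources: <memory:3 GB, vCores:3>
  Querying monitoring logs (JOB)
    Web
      http://10.255.10.65:19888/
Task 3.2 Uploading files to an HDFS directory
  Understanding the HDFS file system
    Viewing HDFS files or directories
      http://192.168.137.133:9870/
    User's computer e:/data.txt -> cluster server node /root/hadoop/ -> H ...
  Mastering basic HDFS operations
    Note: replace myname everywhere below with the full pinyin spelling of your own name.
    Getting the test data
      Note: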
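      From the external network: wget https://hddly.oss-cn-hangzh ...
      cd /root/hadoop
      wget https://hddly.oss-cn-hangzhou.aliyu ...
      tar -zxvf ./email_log.tar.gz
    Creating an HDFS directory (replace myname with your own name)
      hdfs dfs -mkdir -p /user/myname
    Uploading files
      hdfs dfs -copyFromLocal email_log.txt /u ...
      hdfs dfs -moveFromLocal email_log.txt /u ...
      hdfs dfs -put email_log.txt /user/myname ...
    Downloading files
      hdfs dfs -copyToLocal /user/myname/email ...
      hdfs dfs -get /user/myname/email_log.txt ...
    Viewing files
      hdfs dfs -cat /user/myname/email_log.txt ...
      hdfs dfs -tail /user/myname/email_log.tx ...
    Deleting files or directories
      hdfs dfs -rm <file>
      hdfs dfs -rmdir <directory>
    Listing directories
      hdfs dfs -ls <directory>
        hdfs dfs -ls /user/myname
      hdfs dfs -ls -R <directory>
        hdfs dfs -ls -R /user/myname
  HDFS basic operation practice
    Note: replace myname everywhere below with the full pinyin spelling of your own name.
    Creating a private directory in HDFS
      Requirement: using the command line, in the HDFS file system of your group's cluster,
      create the directory: /user/ ...
      hdfs dfs -mkdir -p /user/myname
    Uploading, downloading and viewing files
      Requirement: do the following from the command line:
      1. Download the file email_log.txt (available via: wg ...
      Commands
        cd /root/hadoop
        wget http://10.255.10.50 ...
        hdfs dfs -copyFromLocal email_log.txt /u ...
        hdfs dfs -put email_log.txt /user/myname ...
        hdfs dfs -get /user/myname/1.txt ./
        hdfs dfs -moveFromLocal ./1.txt /user/my ...
        hdfs dfs -tail /user/myname/2.txt
        hdfs dfs -cat /user/myname/2.txt
        Press Ctrl+C to stop the cat output.
    Deleting files or directories
      Requirement: do the following from the command line:
      1) In your private directory on HDFS, create tm ...
      Commands
        hdfs dfs -mkdir -p /user/myname/tmp
        hdfs dfs -put email_log.txt /user/myname ...
        hdfs dfs -rm /user/myname/tmp/2.txt
        hdfs dfs -rmdir /user/myname/tmp
  Advanced HDFS operations
    Deleting a non-empty directory
      hdfs dfs -rm -r <directory>
      Example: hdfs dfs -rm -r /user/myname/tmp
    Checking the space used by a directory
      hdfs dfs -du <directory>
      Example: hdfs dfs -du /user/myname
    Merging HDFS files
      -getmerge
      Takes a source directory and a destination file as input and concatenates the files in src into the destination local file (puts the contents of the two files ...
    Filtering content
      grep: select the lines that contain a given string from a file on HDFS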
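      Usage: hdfs dfs -cat < ...
      hdfs dfs -cat /user/yuxm/output_music_da ...
    Creating a snapshot
      Allow snapshots: hdfs dfsadmin -allowSnapshot /user/mynam ...
      Create the snapshot: hdfs dfs -createSnapshot /user/myname my ...
    Restoring from a snapshot
      First delete a test file: hdfs dfs -rm /user/myname/2.txt
      Then restore it from the snapshot: hdfs dfs -cp /user/myname/.snapshot/myna ...
Task 3.3 Running your first MapReduce job
  Task description
    Process the file /usr/root/email_log.txt and work out the users' ...
    Use the word count module from the official Hadoop examples package.
  The official Hadoop examples package
    hadoop-mapreduce-examples
      wordcount: word frequency count
      pi: estimate the value of pi
      wordmean: mean word length
      wordmedian: median word length
  Configuration for MapReduce jobs
    vi /etc/profile and add the environment variable: export HADOOP_CLA ...
    /etc/profile after the change looks like: http://i.hddly.cn/medi ...
    Apply the environment variables: source /etc/profile
    vi /usr/local/hadoop-3.3.1/etc/hadoop/ma ...
    <?xml version="1.0"?>
    <?xml-stylesheet t ...
  Submitting a MapReduce job to the cluster
    cd /usr/local/hadoop-3.3.1/share/hadoop/ ...
    hadoop jar ./hadoop-mapreduce-examples-3 ...
Task 3.4 Managing multiple MapReduce jobs
  Task description
    Check the progress of several jobs.
    Interrupt a running job and look up a specific log file.
  Running MapReduce jobs
    Estimating the value of pi
      hadoop jar ./hadoop-mapreduce-examples-3 ...
      or hadoop jar ./hadoop-mapreduce-examples ...
    Word frequency count
      hadoop jar ./hadoop-mapreduce-examples- ...
    Mean word length
      hadoop jar ./hadoop-mapreduce-examples-3 ...
      result:
      count	8000000
      length	210379675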
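The two counters in the wordmean result above are enough to recover the mean by hand. The sketch below assumes that count is the number of words and length is the total number of characters in those words, which is how the bundled example is understood to report them. The helper name mean_length is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mean word length = total length / word count.
mean_length() {
  local count="$1" length="$2"
  # LC_ALL=C keeps the decimal separator a dot regardless of locale.
  LC_ALL=C awk -v c="$count" -v l="$length" 'BEGIN { printf "%.4f\n", l / c }'
}

# Counters copied from the result above.
mean_length 8000000 210379675
```

Running the division locally is a quick sanity check on the job output before moving on to the median and standard deviation jobs.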
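    Median word length
      In general, when n values are sorted by size, the value in the middle position (or the two values in the middle ...
      hadoop jar ./hadoop-mapreduce-examples-3 ...
    Standard deviation of word length
      hadoop jar ./hadoop-mapreduce-examples-3 ...
      result:
      count	8000000
      length	210379675
      s ...
Other references
  Operating a Hadoop 3.x cluster from a laptop
    Cluster environment
      Network configuration
        Master host: master 192.168.137.100
        Slave hosts (slave):
          192.168.137.101
          192.168.137.102
          192.168.137.103
      Hadoop version
        Hadoop 3.3.1
    Viewing basic Hadoop cluster information
      Editing hosts on the operator's workstation
        Note: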
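        On Windows, locate System32-->drivers-->et ...
        192.168.137.100 master
        192.168.137.101 s ...
      Querying the cluster's storage system (HDFS)
        Web
          http://master:9870/
        Command line
          hdfs dfsadmin -report
          hdfs dfsadmin -report -live|dead|decommi ...
      Querying the cluster's computing resources (YARN)
        Web
          http://master:8088/
      Querying monitoring logs (JOB)
        Web
    HDFS basic operation practice
      Note: replace myname everywhere below with the full pinyin spelling of your own name.
      Creating a private directory in HDFS
        Requirement: using the command line, in the HDFS file system of your group's cluster,
        create the directory: /user/ ...
        hdfs dfs -mkdir -p /user/myname
      Uploading, downloading and viewing files
        Requirement: do the following from the command line:
        1. Download and unpack the file email_log.txt:
        2) ...
        Commands
          cd /root/hadoop
          wget https://hddly.oss- ...
          hdfs dfs -copyFromLocal email_log.txt /u ...
          hdfs dfs -put email_log.txt /user/myname ...
          hdfs dfs -get /user/myname/1.txt ./
          hdfs dfs -moveFromLocal ./1.txt /user/my ...
          hdfs dfs -tail /user/myname/2.txt
          hdfs dfs -cat /user/myname/2.txt
          Press Ctrl+C to stop the cat output.
      Deleting files or directories
        Requirement: do the following from the command line:
        1) In your private directory on HDFS, create tm ...
        Commands
          hdfs dfs -mkdir -p /user/myname/tmp
          hdfs dfs -put email_log.txt /user/myname ...
          hdfs dfs -rm /user/myname/tmp/2.txt
          hdfs dfs -rmdir /user/myname/tmp
    Running MapReduce jobs
      Note: replace myname everywhere below with the full pinyin spelling of your own name.
      cd /usr/local/hadoop-3.3.1/share/hadoop/ ...
      Word frequency count
        hadoop jar ./hadoop-mapreduce-examples- ...
      Estimating the value of pi
        hadoop jar ./hadoop-mapreduce-examples-3 ...
      Mean word length
        hadoop jar ./hadoop-mapreduce-examples-3 ...
      Median word length
        In general, when n values are sorted by size, the value in the middle position (or the two values in the middle ...
        hadoop jar ./hadoop-mapreduce-examples-3 ...
      Standard deviation of word length
        hadoop jar ./hadoop-mapreduce-examples-3 ...
      Checking how jobs are running
        http://master:8088
      Managing and interrupting jobs
        At http://master:8088 -> the job -> kill App ...
  Hadoop 2.x cluster information
    Querying the cluster's storage system (HDFS)
      Hadoop 2.x
        Web: HDFS monitoring
          http://192.168.31.11:50070/
          Statistics
            Configured Capacity: 89.94 GB
            DFS Remaining: 81.29 GB (90.39%)
            Non DFS Used: 8.65 GB
            DFS Used: 16 KB (0%)
            Live Nodes 2 (Decommissioned: 0,  ...
        Command line
          hdfs dfsadmin -report
          hdfs dfsadmin -report -live|dead|decommi ...
        Block size: 128M
    Querying the cluster's computing resources (YARN)
      Hadoop 2.x
        Web
          http://192.168.31.11:8088/
          Statistics
            Active Nodes: 3
            Reserved Resources: <memory:0 B, vCores:0 ...
            Total Resources: <memory:3 GB, vCores:3>
    Querying monitoring logs (JOB)
      Web
        http://10.255.10.65:19888/
    Hadoop 2.x
    MapReduce
  Handling insufficient disk space
    Symptom: "no space" errors appear after repeatedly uploading files to HDFS and downloading files to the local disk.
    Fix: the Linux machine has at least 50 GB allocated but only 10 GB is actually in use, so the root volume can be enlarged.
    Step 1: create partition vda3
      fdisk /dev/vda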
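      At the prompt, enter: n
      Then choose a primary partition: p
      Use the default partition number ...
      partprobe makes the partition change take effect immediately
    Step 2: create the physical volume
      pvcreate /dev/vda3
    Step 3: extend the volume group
      vgextend centos /dev/vda3
    Step 4: extend the logical volume
      lvextend -L +10G /dev/centos/root
      xfs_growfs /dev/mapper/centos-root
  Insufficient permissions when operating on HDFS through the web UI
    Method 1: set the permissions of the /user directory on HDFS
      hdfs dfs -chmod -R 777 /user
      For rwx, 4+2+1=7;
      for rw-, 4+2=6;
      for ...
    Method 2: add settings to the configuration files
      cd /usr/local/hadoop-3.3.1/etc/hadoop
      Edit core-site.xml
        <!-- Set the current user to root throughout -->
        <property>
        <name>h ...
      Edit hdfs-site.xml
        <property>
        <!-- Whether permission checking is enabled in HDFS. -->
        <nam ...
  Common problems
    WARN hdfs.DataStreamer: Exception in cre ...
      Detailed error
        2022-02-09 08:06:22,360 WARN hdfs.DataSt ...
      Fix: turn off the firewall
        systemctl disable firewalld
        systemctl stop firewalld
    WARN hdfs.DataStreamer: Exception in cre ...
      Detailed error
        2022-02-09 08:06:22,540 WARN hdfs.DataSt ...
      Fix: turn off the firewall
    Cannot allocate containers as requested ...
      vi yarn-site.xml
        <name>yarn.nodemanager.resource. ...
      conf.Configuration: resource-types.xml n ...
      Disabling Erasure Coding for path: /tmp/ ...
      Fix: increase memory to 3 GB and CPUs to 2*2
    Hadoop path configuration error
      <property>
        <name>yarn.app.mapreduce.am ...
      vi /usr/local/hadoop-3.3.1/etc/hadoop/ma ...
    Output directory already exists
      INFO client.DefaultNoHARMFailoverProxyPr ...
    Exception when deleting a file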
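      Cannot delete /user/root/mongo. ...
      When the distributed file system starts up it begins in safe mode; while the distributed file system is in safe mode ...
      Safe mode exists mainly so that, at startup, the validity of the data blocks on each DataNode can be checked, and at the same time ...
      Leaving Hadoop safe mode
        bin/hadoop dfsadmin -safemode leave
    Entering safe mode
      The namenode log reports an exception
        INFO org.apache.hadoop.ipc.Server: IPC S ...
      Why the cluster entered safe mode
        The minimum number of live datanodes is  ...
        Insufficient disk space, insufficient memory, a power failure or similar causes lead to dataNode databl ...
      Leaving Hadoop safe mode
        bin/hadoop dfsadmin -safemode leave
        [root@c100 hadoop-3.3.1]# bin/hadoop dfs ...
      Step 1: leave safe mode by running: hdfs dfsadmin -saf ...
        [root@c100 logs]# hdfs dfsadmin -safemod ...
    Storage files lost after a sudden power failure
      Directory /data/hadoop/hdfs/name is in a ...
      Fix: vi ./core-site.xml
        <property>
            <name>hadoop.tmp.dir</n ...
      This method resolved the problem.
      Reformat the namenode on the master host
        cd /usr/local/hadoop-3.3.1/bin/ ./hdfs n ...
      Look up the new cluster ID
        cd /data/hadoop/hdfs/name/current
        more VERSION
        [root@c56 current]# more VERSION
        #Thu A ...
        clusterID=CID-3302445b-82c0-40e4-a199-12 ...
      On each slave, change the clusterID in the slave's VERSION file to the new cluster's cluster ...
    Exception when uploading a large file from an Eclipse script: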
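      DataStreamer: Dat ...
    Failed to execute 'send' on 'XMLHttpReq ...
      This error appears when viewing the wordcount results.
      Fix 1: hdfs-site.xml
        <property>
            <name>dfs.webhdfs.enabled ...
      Fix 2: in C:\Windows\System32\drivers\etc\hos ...
    cat: Unable to write to output stream.
      cat: Unable to write to output stream.
    Invalid resource request! Cannot alloca ...
      Job run on the cluster under Linux: hadoop jar ./hadoop-mapre ...
      Error reported by the job: http://i.hddly.cn/media/SecureCRT_jDbUrg ...
      java.io.IOException: org.apache.hadoop.y ...
      Cause
        YARN is deployed on hosts with little memory; with the default configuration even simple jobs fail for lack of resources.
      Countermeasure
        Configuration for a host with 2 GB of memory
          mapred-site.xml
            <property>
                <name>yarn.app.mapre ...
          /yarn-site.xml
            <property>
                <name>yarn.nodemana ...
          <property>
              <name>mapreduce.map.memo ...
    MapReduce programs run, but the web monitor does not show the job
      Jobs run on Linux show up; jobs run from Eclipse do not.
      Fix 1: yarn-site.xml
        <property>
                <name>yarn.nodemanage ...
      Fix 2: check whether the same domain name exists on both the internal and the external network, for example whether home.hddly.cn also exists on the internal network.
  Known issues without a solution yet
    Building one Hadoop cluster across wired and wireless networks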
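      Problem:
      The master host is on the wired ...
      In the web UI, under node on the datanode page
      hdfs dfsadmin -printTopology gives:
      [root@master sbin]# hdfs dfsadmin  ...
Exercises
  Finding a large file and running a word count on it
    Find large files
      find / -size -5000M -size +500M
      /var/log/messages-20220208
    Upload the file
      hdfs dfs -put /var/log/messages-20220208 ...
    Word frequency count
      cd /usr/local/hadoop-3.3.1/share/hadoop/ ...
      hadoop jar ./hadoop-mapreduce-examples-3 ...
  Using the Hadoop logs
    cd /usr/local/hadoop-3.3.1/logs
    ls -lrt
    Upload the Hadoop logs
      hdfs dfs -put ./hadoop-root-nodemanager- ...
    Word frequency count
      cd /usr/local/hadoop-3.3.1/share/hadoop/ ...
      hadoop jar ./hadoop-mapreduce-examples-3 ...
  Exercise 1: Mean length of all words in a file
    Training points
      Basic HDFS operations
      Submitting MapReduce jobs
      Querying and interrupting MapReduce jobs
    Requirements
      In the local directory on the cluster server, hadoop-root-nodemanager-*.log ...
    Approach and steps
      Upload the file hadoop-root-nodemanager-*.log to HDFS at ...
      Use the official /hadoop-mapreduce-examples-3.3.1.ja ...
      Check the output: look at hdfs:/user/myname/output_nodema ...
    Assignment
      1. Environment: group master host: , group member hosts: , this member's host:
      2. At http://master:9870, take a screenshot of, in your member directory on the group's cluster, /u ...
      3. On your Linux virtual machine, take a screenshot of the command line that runs the MR job, and a screenshot of the result, ...
      4. At http://master:8088, take a screenshot of the record row of the job you ran, and ...
      5. At http://master:9870, view the file /user/myname/ou ...
  Exercise 2: Querying and interrupting MapReduce jobs
    Training points
      Querying MapReduce job information
      Querying the cluster's computing resources
      Interrupting a running MapReduce job
    Requirements
      In the directory on the cluster server: local directory /usr/local/hadoop-3.3.1/lo ...
    Approach and steps
      Upload the log files hadoop-root-*.log to HDFS at: /user/myn ...
        hdfs dfs -put /usr/local/hadoop-3.3.1/lo ...
      Open three CRT sessions to your server, so that three tabs are connected to the same server at once.
      Use the official /hadoop-mapreduce-examples-3.3.1.ja ...
        cd /usr/local/hadoop-3.3.1/share/hadoop/ ...
        hadoop jar ./hadoop-mapreduce-examples-3 ...
        hadoop jar ./hadoop-mapreduce-examples-3 ...
        hadoop jar ./hadoop-mapreduce-examples-3 ...
      Go to http://master:8088 and open the MapReduce job list.
      Watch how the jobs (wordcount, wordmean, wordmedian) are running ...
      Find the second job, wordmean, open its details page, then interrupt it.
    Assignment
      1. Environment: group master host: , group member hosts: , this member's host:
      2. On your Linux virtual machine, take screenshots of running the MR jobs (wordcount, wor ...
      3. At http://master:8088, take screenshots of the detailed records of the 3 jobs you ran ...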
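Exercise 2 interrupts the wordmean job through the web UI at http://master:8088. As an alternative that is not part of the original exercise, the same thing can be done from a shell on a cluster node with the yarn application subcommands. The sketch below is one possible wrapper. kill_app is a hypothetical helper name, and the application ID has to be copied from the list output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: kill a YARN application by ID after a basic sanity check.
kill_app() {
  local app_id="$1"
  # YARN application IDs have the form application_<timestamp>_<sequence>.
  if ! printf '%s\n' "$app_id" | grep -Eq '^application_[0-9]+_[0-9]+$'; then
    echo "not a YARN application ID: $app_id" >&2
    return 1
  fi
  yarn application -kill "$app_id"
}

# On a cluster node (not run here):
#   yarn application -list -appStates RUNNING   # find the ID of the wordmean job
#   kill_app <application ID copied from the list>
```

The format check only guards against pasting a job name or a truncated ID. It does not verify that the application exists or is still running.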
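Common problems
  Entering safe mode
    Step 1: leave safe mode by running: hdfs dfsadmin -safemode leave
    Step 2: run a health check and delete the files whose blocks are corrupted or missing (their data is lost): hdfs fsck / -delete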
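Because step 2 is destructive, it can help to check the safe mode state first and to list the damaged files before deleting anything. The sketch below is one way to wrap step 1 under that assumption. leave_safemode_if_on is a hypothetical helper name, and the script expects the hdfs command to be on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around step 1: only leave safe mode if it is actually on.
leave_safemode_if_on() {
  local state
  # "hdfs dfsadmin -safemode get" reports "Safe mode is ON" or "Safe mode is OFF".
  state="$(hdfs dfsadmin -safemode get)"
  case "$state" in
    *"Safe mode is ON"*)
      hdfs dfsadmin -safemode leave
      ;;
    *)
      echo "Safe mode is already off; nothing to do."
      ;;
  esac
}

# On a cluster node (not run here):
#   leave_safemode_if_on
#   hdfs fsck / -list-corruptfileblocks   # inspect the damage first
#   hdfs fsck / -delete                   # then remove the affected files
```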
[root@c100 logs]# hdfs dfsadmin -safemode leave
Safe mode is OFF
[root@c100 logs]# hdfs fsck / -delete
Connecting to namenode via http://c100:9870/fsck?ugi=root&delete=1&path=%2F
FSCK started by root (auth:SIMPLE) from /10.255.10.100 for path / at Thu Apr 28 02:54:52 EDT 2022


/user/chenyanfang/1.txt: MISSING 2 blocks of total size 218379675 B.
/user/chenyanfang/2.txt: MISSING 2 blocks of total size 218379675 B.
/user/chenyanfang/email_log.txt: MISSING 2 blocks of total size 218379675 B.
/user/liuchenling/1.txt: MISSING 2 blocks of total size 218379675 B.
/user/liuchenling/email_log.txt: MISSING 2 blocks of total size 218379675 B.
/user/root/1.txt: MISSING 2 blocks of total size 218379675 B.
/user/yeying/1.txt: MISSING 2 blocks of total size 218379675 B.
/user/yeying/email_log.txt: MISSING 2 blocks of total size 218379675 B.
Status: CORRUPT
Number of data-nodes: 0
Number of racks: 0
Total dirs: 15
Total symlinks: 0

Replicated Blocks:
Total size: 1747037400 B
Total files: 8
Total blocks (validated): 16 (avg. block size 109189837 B)
********************************
UNDER MIN REPL'D BLOCKS: 16 (100.0 %)
MINIMAL BLOCK REPLICATION: 1
CORRUPT FILES: 8
MISSING BLOCKS: 16
MISSING SIZE: 1747037400 B
********************************
Minimally replicated blocks: 0 (0.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 0.0
Missing blocks: 16
Corrupt blocks: 0
Missing replicas: 0
Blocks queued for replication: 0

Erasure Coded Block Groups:
Total size: 0 B
Total files: 0
Total block groups (validated): 0
Minimally erasure-coded block groups: 0
Over-erasure-coded block groups: 0
Under-erasure-coded block groups: 0
Unsatisfactory placement block groups: 0
Average block group size: 0.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0
Blocks queued for replication: 0
FSCK ended at Thu Apr 28 02:54:53 EDT 2022 in 396 milliseconds


The filesystem under path '/' is CORRUPT
[root@c100 logs]#
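The figures in the fsck report above can be cross-checked with a little arithmetic. The sketch below assumes the 128M block size quoted earlier in the chapter (taken here as 134217728 bytes) and uses the per-file size and file count from the report. blocks_for is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of HDFS blocks needed for a file of a given size.
blocks_for() {
  local size="$1" block="${2:-134217728}"   # default: 128 MiB block
  echo $(( (size + block - 1) / block ))    # integer ceiling division
}

file_size=218379675   # size reported for each damaged file
files=8               # "Total files" in the report

per_file=$(blocks_for "$file_size")
echo "blocks per file: $per_file"
echo "total blocks:    $(( per_file * files ))"
echo "total size:      $(( file_size * files )) B"
```

Under these assumptions each file needs 2 blocks, which agrees with the "MISSING 2 blocks" lines, and 8 files give the 16 blocks and 1747037400 B shown in the summary.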
Invalid resource request! Cannot allocate containers as requested resource