18. MapReduce Counters, and Examples of Reading From / Writing To a Database with MapReduce
URL: https://blog.csdn.net/chenwewi520feng/article/details/130454774
This article introduces how to use MapReduce counters, how to define custom counters, and gives examples of reading from and writing to a database with MapReduce.
It assumes that Hadoop is running normally and that the MySQL table involved exists and contains data.
The article has two parts: counters, and reading/writing a MySQL database.
MapReduce is a distributed computing model, developed at Google and widely used in big-data processing. In MapReduce, counters are an important tool: they let developers collect and track statistics while a job executes, which helps in understanding and tuning program performance. Counters fall into two categories: Hadoop's built-in counters and custom counters.
Built-in counters are provided by the Hadoop framework itself, for example the number of map and reduce tasks and the volume of input and output data. They are updated automatically while a MapReduce job runs and are printed in its log, as in the sample log shown later in this article. For instance, "Total input files to process" is the total number of input files, "number of splits" is how many splits the input was divided into, and "Running job" reports the job's status.
Custom counters are created by developers to track metrics specific to their own jobs. A developer declares a counter in a Mapper or Reducer class and increments it in code; after the job completes, the counter values can be inspected to analyze the program's behavior and performance.
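A minimal sketch of such a custom counter, using the `org.apache.hadoop.mapreduce` API. The enum name, field names, and the CSV record format are illustrative assumptions, not something fixed by the framework:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RecordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Illustrative counter names; any enum can serve as a counter group.
    public enum MyCounters { MALFORMED_RECORDS, VALID_RECORDS }

    private final IntWritable one = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length < 2) {
            // Increment a custom counter; the framework aggregates it job-wide.
            context.getCounter(MyCounters.MALFORMED_RECORDS).increment(1);
            return;
        }
        context.getCounter(MyCounters.VALID_RECORDS).increment(1);
        context.write(new Text(fields[0]), one);
    }
}
```

When the job finishes, these counters are printed in the job log under the enum's class name, alongside the built-in counter groups; `context.getCounter("group", "name")` offers the same facility with plain strings.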
Next we discuss how MapReduce interacts with a database, MySQL in particular. In big-data scenarios it is sometimes necessary to store MapReduce results in a relational database, or to read data from one for processing. Hadoop builds on JDBC (Java Database Connectivity) so that MapReduce jobs can connect to and operate on a database.
To read from a database in MapReduce, first load the database driver and establish a connection for the Mapper; the required rows can then be fetched with SQL queries and fed to the map() method. In the reduce phase the data can be further processed and aggregated, and the results finally written back out.
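Hadoop packages this pattern as `DBInputFormat` (in `org.apache.hadoop.mapreduce.lib.db`), which handles the driver loading and connection management described above and hands each table row to the mapper. A minimal sketch, assuming a MySQL table `user(id, name)`; the JDBC URL, credentials, and table/column names are placeholders:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DbReadJob {

    // One instance of this bean is populated per table row.
    public static class UserRecord implements Writable, DBWritable {
        long id;
        String name;

        public void readFields(ResultSet rs) throws SQLException {
            id = rs.getLong("id");
            name = rs.getString("name");
        }
        public void write(PreparedStatement ps) throws SQLException {
            ps.setLong(1, id);
            ps.setString(2, name);
        }
        public void readFields(DataInput in) throws IOException {
            id = in.readLong();
            name = in.readUTF();
        }
        public void write(DataOutput out) throws IOException {
            out.writeLong(id);
            out.writeUTF(name);
        }
    }

    public static class ReadMapper
            extends Mapper<LongWritable, UserRecord, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, UserRecord row, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(new LongWritable(row.id), new Text(row.name));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder connection settings.
        DBConfiguration.configureDB(conf, "com.mysql.cj.jdbc.Driver",
                "jdbc:mysql://localhost:3306/test", "root", "password");
        Job job = Job.getInstance(conf, "db-read");
        job.setJarByClass(DbReadJob.class);
        job.setMapperClass(ReadMapper.class);
        job.setNumReduceTasks(0);
        job.setInputFormatClass(DBInputFormat.class);
        // Read columns id, name from table `user`, ordered by id.
        DBInputFormat.setInput(job, UserRecord.class, "user",
                null /* conditions */, "id" /* orderBy */, "id", "name");
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The MySQL Connector/J jar must be on the job's classpath for the driver class to load.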
Writing to the database is usually done in the Reducer's reduce() or cleanup() method: convert the processed data into a form suitable for database storage, then execute INSERT, UPDATE, or DELETE operations through the JDBC API. Note that because a MapReduce job may write large volumes of data, the database's concurrency limits and performance-tuning strategies need to be taken into account.
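For the write side, Hadoop's `DBOutputFormat` can generate the INSERT statements instead of hand-written JDBC in cleanup(). A sketch assuming an output table `word_count(word, cnt)`; the table and column names are placeholders:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class DbWriteSketch {

    // Each emitted key becomes one row:
    // INSERT INTO word_count (word, cnt) VALUES (?, ?)
    public static class WordCountRecord implements Writable, DBWritable {
        String word;
        int cnt;

        WordCountRecord() {}
        WordCountRecord(String word, int cnt) { this.word = word; this.cnt = cnt; }

        public void write(PreparedStatement ps) throws SQLException {
            ps.setString(1, word);
            ps.setInt(2, cnt);
        }
        public void readFields(ResultSet rs) throws SQLException {
            word = rs.getString(1);
            cnt = rs.getInt(2);
        }
        public void write(DataOutput out) throws IOException {
            out.writeUTF(word);
            out.writeInt(cnt);
        }
        public void readFields(DataInput in) throws IOException {
            word = in.readUTF();
            cnt = in.readInt();
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, WordCountRecord, NullWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(new WordCountRecord(key.toString(), sum), NullWritable.get());
        }
    }

    // In the driver (connection configured via DBConfiguration as for reading):
    static void configureOutput(Job job) throws IOException {
        job.setOutputFormatClass(DBOutputFormat.class);
        DBOutputFormat.setOutput(job, "word_count", "word", "cnt");
        job.setOutputKeyClass(WordCountRecord.class);
        job.setOutputValueClass(NullWritable.class);
    }
}
```

Batching and the database's maximum connection count still matter here: every reduce task opens its own connection, so the number of reducers effectively bounds write concurrency.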
In summary, MapReduce counters provide powerful monitoring and debugging capabilities, while database access from MapReduce broadens the range of big-data processing scenarios. Developers can use counters to tune job performance, and combine them with database operations to build more complex data pipelines. In practice, keeping the Hadoop environment and the database stable is essential, as is attention to data security and consistency.

一、Counters
1. Introduction to Counters
When a MapReduce program executes, the console log usually contains output like the snippet below.
Hadoop's built-in counters collect and report core statistics about the running program, helping users understand how the program behaved and assisting in diagnosing failures.
The sample log below, from one complete map-reduce run, illustrates the counters.
2022-09-15 16:21:33,324 WARN impl.MetricsConfig: Cannot locate configuration:
tried hadoop-metrics2-jobtracker.properties,hadoop-metrics2.properties
2022-09-15 16:21:33,361 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot
period at 10 second(s).
2022-09-15 16:21:33,361 INFO impl.MetricsSystemImpl: JobTracker metrics system
started
2022-09-15 16:21:33,874 WARN mapreduce.JobResourceUploader: No job jar file set.
User classes may not be found. See Job or Job#setJar(String).
# number of files in the input directory
2022-09-15 16:21:33,901 INFO input.FileInputFormat: Total input files to process
: 1
# number of splits the file is divided into for the map tasks
2022-09-15 16:21:33,920 INFO mapreduce.JobSubmitter: number of splits:1
# ID of the submitted job
2022-09-15 16:21:33,969 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_local279925986_0001
2022-09-15 16:21:33,970 INFO mapreduce.JobSubmitter: Executing with tokens: []
# URL for tracking the job's execution
2022-09-15 16:21:34,040 INFO mapreduce.Job: The url to track the job:
http://localhost:8080/
# the running job
2022-09-15 16:21:34,040 INFO mapreduce.Job: Running job: job_local279925986_0001
2022-09-15 16:21:34,041 INFO mapred.LocalJobRunner: OutputCommitter set in
config null
2022-09-15 16:21:34,044 INFO output.FileOutputCommitter: File Output Committer
Algorithm version is 2
2022-09-15 16:21:34,044 INFO output.FileOutputCommitter: FileOutputCommitter
skip cleanup _temporary folders under output directory:false, ignore cleanup
failures: false
2022-09-15 16:21:34,044 INFO mapred.LocalJobRunner: OutputCommitter is
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2022-09-15 16:21:34,062 INFO mapred.LocalJobRunner: Waiting for map tasks
# map task starts
2022-09-15 16:21:34,062 INFO mapred.LocalJobRunner: Starting task:
attempt_local279925986_0001_m_000000_0

2022-09-15 16:21:34,072 INFO output.FileOutputCommitter: File Output Committer
Algorithm version is 2
2022-09-15 16:21:34,072 INFO output.FileOutputCommitter: FileOutputCommitter
skip cleanup _temporary folders under output directory:false, ignore cleanup
failures: false
2022-09-15 16:21:34,077 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree
currently is supported only on Linux.
2022-09-15 16:21:34,102 INFO mapred.Task: Using ResourceCalculatorProcessTree :
org.apache.hadoop.yarn.util.WindowsBasedProcessTree@638e9d1a
# input file
2022-09-15 16:21:34,105 INFO mapred.MapTask: Processing split:
file:/D:/workspace/bigdata-component/hadoop/test/in/us-covid19-
counties.dat:0+136795
2022-09-15 16:21:34,136 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
# 100 MB in-memory buffer for map output
2022-09-15 16:21:34,136 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
# soft spill limit of the map output buffer is 80 MB (80% of 100 MB)
2022-09-15 16:21:34,136 INFO mapred.MapTask: soft limit at 83886080
2022-09-15 16:21:34,136 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2022-09-15 16:21:34,136 INFO mapred.MapTask: kvstart = 26214396; length =
6553600
2022-09-15 16:21:34,137 INFO mapred.MapTask: Map output collector class =
org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2022-09-15 16:21:34,155 INFO mapred.LocalJobRunner:
2022-09-15 16:21:34,155 INFO mapred.MapTask: Starting flush of map output
2022-09-15 16:21:34,155 INFO mapred.MapTask: Spilling map output
2022-09-15 16:21:34,155 INFO mapred.MapTask: bufstart = 0; bufend = 114725;
bufvoid = 104857600
2022-09-15 16:21:34,155 INFO mapred.MapTask: kvstart = 26214396(104857584);
kvend = 26201420(104805680); length = 12977/6553600
# the lines above show the in-memory map output being spilled to disk
2022-09-15 16:21:34,184 INFO mapred.MapTask: Finished spill 0
2022-09-15 16:21:34,199 INFO mapred.Task:
Task:attempt_local279925986_0001_m_000000_0 is done. And is in the process of
committing
2022-09-15 16:21:34,200 INFO mapred.LocalJobRunner: map
2022-09-15 16:21:34,200 INFO mapred.Task: Task
'attempt_local279925986_0001_m_000000_0' done.
# counters of the map task
2022-09-15 16:21:34,204 INFO mapred.Task: Final Counters for
attempt_local279925986_0001_m_000000_0: Counters: 17
File System Counters
FILE: Number of bytes read=136992
FILE: Number of bytes written=632934
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=3245
Map output records=3245
Map output bytes=114725
Map output materialized bytes=121221
Input split bytes=140
Combine input records=0
Spilled Records=3245

Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=5
Total committed heap usage (bytes)=255328256
File Input Format Counters
Bytes Read=136795
2022-09-15 16:21:34,204 INFO mapred.LocalJobRunner: Finishing task:
attempt_local279925986_0001_m_000000_0
2022-09-15 16:21:34,205 INFO mapred.LocalJobRunner: map task executor complete.
2022-09-15 16:21:34,207 INFO mapred.LocalJobRunner: Waiting for reduce tasks
2022-09-15 16:21:34,207 INFO mapred.LocalJobRunner: Starting task:
attempt_local279925986_0001_r_000000_0
2022-09-15 16:21:34,210 INFO output.FileOutputCommitter: File Output Committer
Algorithm version is 2
2022-09-15 16:21:34,210 INFO output.FileOutputCommitter: FileOutputCommitter
skip cleanup _temporary folders under output directory:false, ignore cleanup
failures: false
2022-09-15 16:21:34,210 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree
currently is supported only on Linux.
2022-09-15 16:21:34,239 INFO mapred.Task: Using ResourceCalculatorProcessTree :
org.apache.hadoop.yarn.util.WindowsBasedProcessTree@274c3e94
2022-09-15 16:21:34,240 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin:
org.apache.hadoop.mapreduce.task.reduce.Shuffle@5d892772
2022-09-15 16:21:34,241 WARN impl.MetricsSystemImpl: JobTracker metrics system
already initialized!
2022-09-15 16:21:34,248 INFO reduce.MergeManagerImpl: MergerManager:
memoryLimit=2639842560, maxSingleShuffleLimit=659960640,
mergeThreshold=1742296192, ioSortFactor=10, memToMemMergeOutputsThreshold=10
# EventFetcher fetches the map task's output
2022-09-15 16:21:34,249 INFO reduce.EventFetcher:
attempt_local279925986_0001_r_000000_0 Thread started: EventFetcher for fetching
Map Completion Events
2022-09-15 16:21:34,263 INFO reduce.LocalFetcher: localfetcher#1 about to
shuffle output of map attempt_local279925986_0001_m_000000_0 decomp: 121217 len:
121221 to MEMORY
2022-09-15 16:21:34,264 INFO reduce.InMemoryMapOutput: Read 121217 bytes from
map-output for attempt_local279925986_0001_m_000000_0
# merging the files the reduce task fetched from the map output
2022-09-15 16:21:34,265 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-
output of size: 121217, inMemoryMapOutputs.size() -> 1, commitMemory -> 0,
usedMemory ->121217
2022-09-15 16:21:34,265 INFO reduce.EventFetcher: EventFetcher is interrupted..
Returning
# reduce task copies the files
2022-09-15 16:21:34,266 INFO mapred.LocalJobRunner: 1 / 1 copied.
# flushing the in-memory files to disk
2022-09-15 16:21:34,266 INFO reduce.MergeManagerImpl: finalMerge called with 1
in-memory map-outputs and 0 on-disk map-outputs
2022-09-15 16:21:34,273 INFO mapred.Merger: Merging 1 sorted segments
2022-09-15 16:21:34,274 INFO mapred.Merger: Down to the last merge-pass, with 1
segments left of total size: 121179 bytes
2022-09-15 16:21:34,278 INFO reduce.MergeManagerImpl: Merged 1 segments, 121217
bytes to disk to satisfy reduce memory limit
2022-09-15 16:21:34,279 INFO reduce.MergeManagerImpl: Merging 1 files, 121221
bytes from disk

2022-09-15 16:21:34,279 INFO reduce.MergeManagerImpl: Merging 0 segments, 0
bytes from memory into reduce
2022-09-15 16:21:34,279 INFO mapred.Merger: Merging 1 sorted segments
2022-09-15 16:21:34,280 INFO mapred.Merger: Down to the last merge-pass, with 1
segments left of total size: 121179 bytes
2022-09-15 16:21:34,280 INFO mapred.LocalJobRunner: 1 / 1 copied.
2022-09-15 16:21:34,283 INFO Configuration.deprecation: mapred.skip.on is
deprecated. Instead, use mapreduce.job.skiprecords
2022-09-15 16:21:34,299 INFO mapred.Task:
Task:attempt_local279925986_0001_r_000000_0 is done. And is in the process of
committing
2022-09-15 16:21:34,301 INFO mapred.LocalJobRunner: 1 / 1 copied.
2022-09-15 16:21:34,301 INFO mapred.Task: Task
attempt_local279925986_0001_r_000000_0 is allowed to commit now
# output location of the reduce task
2022-09-15 16:21:34,306 INFO output.FileOutputCommitter: Saved output of task
'attempt_local279925986_0001_r_000000_0' to file:/D:/workspace/bigdata-
component/hadoop/test/out/covid/topn
2022-09-15 16:21:34,307 INFO mapred.LocalJobRunner: reduce > reduce
2022-09-15 16:21:34,307 INFO mapred.Task: Task
'attempt_local279925986_0001_r_000000_0' done.
2022-09-15 16:21:34,307 INFO mapred.Task: Final Counters for
attempt_local279925986_0001_r_000000_0: Counters: 24
File System Counters
FILE: Number of bytes read=379466
FILE: Number of bytes written=758828
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Combine input records=0
Combine output records=0
Reduce input groups=55
Reduce shuffle bytes=121221
Reduce input records=3245
Reduce output records=160
Spilled Records=3245
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
Total committed heap usage (bytes)=255328256
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Output Format Counters
Bytes Written=4673
2022-09-15 16:21:34,307 INFO mapred.LocalJobRunner: Finishing task:
attempt_local279925986_0001_r_000000_0
# reduce task execution complete

2022-09-15 16:21:34,308 INFO mapred.LocalJobRunner: reduce task executor
complete.
2022-09-15 16:21:35,045 INFO mapreduce.Job: Job job_local279925986_0001 running
in uber mode : false
2022-09-15 16:21:35,047 INFO mapreduce.Job: map 100% reduce 100%
2022-09-15 16:21:35,048 INFO mapreduce.Job: Job job_local279925986_0001
completed successfully
2022-09-15 16:21:35,056 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=516458
FILE: Number of bytes written=1391762
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=3245
Map output records=3245
Map output bytes=114725
Map output materialized bytes=121221
Input split bytes=140
Combine input records=0
Combine output records=0
Reduce input groups=55
Reduce shuffle bytes=121221
Reduce input records=3245
Reduce output records=160
Spilled Records=6490
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=5
Total committed heap usage (bytes)=510656512
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=136795
File Output Format Counters
Bytes Written=4673

2. Built-in MapReduce Counters
Hadoop maintains a set of built-in counters for every MapReduce job, reporting various metrics about the program's execution. Users can rely on them to judge whether the program's logic is reasonable and whether its results are correct.
The built-in counters are organized by function into counter groups (Counter Group), each group containing several individual counters.
All Hadoop counters are global to the MapReduce program: their values are aggregated across the whole distributed run; they are not per-task local statistics.
The built-in counter groups include: MapReduce framework counters (Map-Reduce Framework), file system counters (File System Counters), job counters (Job Counters), input file counters (File Input Format Counters), and output file counters (File Output Format Counters).
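The job-level counters printed in the log can also be read programmatically in the driver once the job finishes. A minimal sketch using the `org.apache.hadoop.mapreduce` API; the custom group/counter names looked up by string are hypothetical:

```java
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class CounterReport {

    // Assumes `job` has been fully configured elsewhere in the driver.
    static void printCounters(Job job) throws Exception {
        job.waitForCompletion(true);
        Counters counters = job.getCounters();

        // Built-in framework counter, e.g. "Map input records=3245" above.
        Counter mapIn = counters.findCounter(TaskCounter.MAP_INPUT_RECORDS);
        System.out.println(mapIn.getDisplayName() + " = " + mapIn.getValue());

        // A custom counter is looked up by group name and counter name
        // (both placeholders here).
        Counter bad = counters.findCounter("MyCounters", "MALFORMED_RECORDS");
        System.out.println("malformed records = " + bad.getValue());
    }
}
```

`TaskCounter` enumerates the Map-Reduce Framework group; the file system and format counter groups can likewise be retrieved through `counters.getGroup(...)`.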
Author: 一瓢一瓢的饮alanchanchn