Allen's Blog: Resource Scheduler, Calculator, Short-Circuit in Hadoop YARN and HDFS

To prepare next year's plan, I surveyed the research topics and technologies around Hadoop YARN and HDFS and made the following notes:

Since Hadoop YARN was proposed, this new-generation technology has been continuously discussed. To learn how YARN works, please refer to the post [1].
YARN ships with a default scheduler, the Capacity Scheduler [2][3], implemented in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler [4], which lets the Hadoop ecosystem manage its resources. It also provides a resource calculator, org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator [5], which the scheduler uses to compare resource requests against the capacity of each compute node. Note that DefaultResourceCalculator takes only memory into account; to consider CPU (vcores) as well, YARN provides DominantResourceCalculator in the same package.
System administrators and developers who are curious about how resource allocation and the schedulers operate can read the reports [6], [7] and [8].
On the research side, Project HaSTE [9] proposes a new Hadoop YARN scheduling algorithm that aims to use resources efficiently when scheduling map/reduce tasks in Hadoop YARN and to improve the makespan of MapReduce jobs.
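
To make the role of a resource calculator more concrete, here is a minimal sketch of two ways a scheduler might compare resource requests: by memory only, which is how DefaultResourceCalculator behaves as far as I know, and by dominant share, which is the idea behind YARN's DominantResourceCalculator. The names ResourceCompareSketch, Res, compareMemoryOnly, dominantShare and compareDominant are hypothetical and are not taken from the Hadoop code base. The sketch only models the idea and ignores details such as tie-breaking.

```java
public class ResourceCompareSketch {

    // Hypothetical, simplified stand-in for a YARN resource (memory + vcores).
    static final class Res {
        final long memoryMb;
        final int vcores;

        Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    // Memory-only ordering: vcores are ignored entirely.
    static int compareMemoryOnly(Res a, Res b) {
        return Long.compare(a.memoryMb, b.memoryMb);
    }

    // Dominant share: the larger of the memory fraction and the vcore
    // fraction of the whole cluster that this resource represents.
    static double dominantShare(Res r, Res cluster) {
        double memShare = (double) r.memoryMb / cluster.memoryMb;
        double cpuShare = (double) r.vcores / cluster.vcores;
        return Math.max(memShare, cpuShare);
    }

    // Dominant-resource ordering: compare the dominant shares.
    static int compareDominant(Res a, Res b, Res cluster) {
        return Double.compare(dominantShare(a, cluster), dominantShare(b, cluster));
    }

    public static void main(String[] args) {
        Res cluster = new Res(100_000, 100);
        Res a = new Res(2_000, 20);  // 2% of memory, 20% of vcores
        Res b = new Res(10_000, 4);  // 10% of memory, 4% of vcores

        // Memory-only: a is "smaller" than b (negative result).
        System.out.println(compareMemoryOnly(a, b));
        // Dominant share: a is "larger" than b (positive result),
        // because its CPU demand dominates.
        System.out.println(compareDominant(a, b, cluster));
    }
}
```

The two orderings disagree on the same pair of requests, which is why the choice of calculator matters on clusters where CPU, rather than memory, is the scarce resource.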

On the other hand, HDFS [10] is the file system most commonly used with Hadoop. However, a client normally reads and writes data through a TCP socket connection to the DataNode, even when the data sits on the same machine. For this reason, the I/O performance is lower than reading directly from the local disk without a network connection. Therefore, HDFS provides a feature called HDFS Short-Circuit Local Reads [11] and also provides a native library to directly access the HDFS file system. According to the report [12], the I/O performance of short-circuit reads is better than that of reads over TCP.
P.S. Another trick for improving the I/O performance of HDFS is to use CombineFileInputFormat [13][14]. However, I don't think this method is better than using short-circuit reads.
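
For reference, short-circuit local reads are switched on through hdfs-site.xml. The fragment below is a minimal sketch based on the properties described in [11]; the socket path is the example value from that documentation and should be treated as an assumption to adapt to your own installation. According to [11], the feature uses a UNIX domain socket and requires the native library libhadoop.so, and both the DataNode and the client need this configuration.

```xml
<!-- hdfs-site.xml (sketch; the socket path is an example value) -->
<configuration>
  <!-- Turn on short-circuit local reads. -->
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <!-- UNIX domain socket shared by the DataNode and local clients. -->
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>
```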

References

[1] Karthik Kambatla, Wing Yew Poon, and Vikram Srivastava, “How Apache Hadoop YARN HA Works,” Cloudera. Available: [Online] http://blog.cloudera.com/blog/2014/05/how-apache-hadoop-yarn-ha-works/
[2] Hadoop, “Hadoop MapReduce Next Generation – Capacity Scheduler,” Apache Hadoop. Available: [Online] http://hadoop.apache.org/docs/r2.5.2/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html#Configuration
[3] skyWalker_ONLY, “Learning Hadoop-2.4.1: The Capacity Scheduler” (in Chinese). Available: [Online] http://blog.csdn.net/skywalker_only/article/details/41351147
[4] GrepCode, “CapacityScheduler,” GrepCode.com. Available: [Online] http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-yarn-server-resourcemanager/2.6.0/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java/
[5] GrepCode, “DefaultResourceCalculator,” GrepCode.com. Available: [Online] http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-yarn-common/2.6.0/org/apache/hadoop/yarn/util/resource/DefaultResourceCalculator.java/
[6] Vinod Kumar Vavilapalli, “Resource Localization in YARN: Deep Dive,” Hortonworks. Available: [Online] http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/
[7] SEQUENCEIQ, “YARN Schedulers demystified – Part 1: Capacity.” Available: [Online] http://blog.sequenceiq.com/blog/2014/07/22/schedulers-part-1/
[8] SEQUENCEIQ, “YARN Schedulers demystified – Part 2: Fair.” Available: [Online] http://blog.sequenceiq.com/blog/2014/09/09/yarn-schedulers-demystified-part-2-fair/
[9] Bo Sheng, “Project HaSTE: Hadoop YARN Scheduling Based on Task-Dependency and Resource-Demand,” The 7th IEEE International Conference on Cloud Computing, Anchorage, AK, June 2014. Available: [Online] http://www.cs.umb.edu/~shengbo/research/haste.html
[10] Hadoop, “HDFS User Guide,” Apache Hadoop. Available: [Online] http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Related_Documentation
[11] Hadoop, “HDFS Short-Circuit Local Reads,” Apache Hadoop. Available: [Online] https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html
[12] Colin McCabe, “How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop,” Cloudera. Available: [Online] http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/
[13] Dhruba Borthakur, “HDFS block replica placement in your hands now!” Available: [Online] http://hadoopblog.blogspot.de/2009/09/hdfs-block-replica-placement-in-your.html
[14] Hadoop, “Class CombineFileInputFormat<K,V>.” Available: [Online] http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html

Saturday, May 30, 2015