Wednesday 31 May 2017

Top Hadoop Interview Questions For 2017

Here are top objective-type sample Hadoop interview questions, with their answers given just below them. These sample questions are framed by experts from Besant Technologies, who conduct Hadoop training in Chennai, to give you an idea of the type of questions that may be asked in an interview. We have taken full care to give correct answers for all the questions. In case you have attended Hadoop interviews previously, we encourage you to add your questions in the comments tab. We will be happy to answer them and spread the word to the community of fellow job seekers. Do comment your thoughts. Happy job hunting!

1. What is Hadoop and what are its components?

Ans : When “Big Data” emerged as a problem, Apache Hadoop evolved as a solution to it. Apache Hadoop is a framework which provides us various services or tools to store and process Big Data. It helps in analyzing Big Data and making business decisions out of it, which can’t be done efficiently and effectively using traditional systems.
♣ Tip: While explaining Hadoop, you should also explain its main components, i.e.:
  • Storage unit – HDFS (NameNode, DataNode)
  • Processing framework – YARN (ResourceManager, NodeManager)
2. Compare HDFS with Network Attached Storage (NAS).
Ans : In this question, first explain NAS and HDFS, and then compare their features as follows:
  • Network-attached storage (NAS) is a file-level computer data storage server connected to a computer network providing data access to a heterogeneous group of clients. NAS can either be a hardware or software which provides services for storing and accessing files. Whereas Hadoop Distributed File System (HDFS) is a distributed filesystem to store data using commodity hardware.
  • In HDFS Data Blocks are distributed across all the machines in a cluster. Whereas in NAS data is stored on a dedicated hardware.
  • HDFS is designed to work with MapReduce paradigm, where computation is moved to the data. NAS is not suitable for MapReduce since data is stored separately from the computations.
  • HDFS runs on commodity hardware, which is cost-effective, whereas NAS uses high-end storage devices, which come at a high cost.
3.What are active and passive “NameNodes”? 
Ans : In HA (High Availability) architecture, we have two NameNodes – Active “NameNode” and Passive “NameNode”.
  • Active “NameNode” is the “NameNode” that runs in the cluster and handles all client operations.
  • Passive “NameNode” is a standby “NameNode”, which has similar data as active “NameNode”.
When the active “NameNode” fails, the passive “NameNode” takes its place in the cluster. Hence, the cluster is never without a “NameNode”, so it never has a single point of failure.
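As a rough sketch, an HA NameNode pair is defined in hdfs-site.xml along these lines. The nameservice name (mycluster) and the hostnames are placeholders, not values from this post:

```xml
<!-- Hypothetical hdfs-site.xml fragment: one logical nameservice
     backed by two NameNodes, nn1 and nn2 -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <!-- Let ZooKeeper-based failover promote the standby automatically -->
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```

With automatic failover enabled, a ZooKeeper-based failover controller promotes the passive NameNode when the active one goes down, which is what makes the seamless takeover described above possible.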

4. What will you do when NameNode is down?
Ans : The NameNode recovery process involves the following steps to bring the Hadoop cluster up and running:
1. Use the file system metadata replica (FsImage) to start a new NameNode.
2. Then, configure the DataNodes and clients so that they acknowledge the new NameNode that has been started.
3. The new NameNode will start serving clients once it has finished loading the last checkpoint FsImage (for metadata information) and has received enough block reports from the DataNodes.
However, on large Hadoop clusters this NameNode recovery process may consume a lot of time, and it becomes an even greater challenge during routine maintenance. Therefore, we have the HDFS High Availability architecture, which addresses this problem.
5. Why do we use HDFS for applications with large data sets, and not when there are a lot of small files?
Ans : HDFS is more suitable for a large amount of data in a single file than for the same amount of data spread across many small files. As you know, the NameNode stores the metadata of the file system in RAM. Therefore, the amount of available memory places a limit on the number of files in an HDFS file system. In other words, too many files will lead to too much metadata, and storing all that metadata in RAM becomes a challenge. As a rule of thumb, the metadata for a file, block or directory takes about 150 bytes.
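The 150-byte rule of thumb above can be turned into a quick back-of-the-envelope calculation. This is a rough sketch only; the helper name and the file counts are made up for illustration:

```python
# Rule of thumb from the answer above: each file entry and each block
# entry costs roughly 150 bytes of NameNode heap.
BYTES_PER_OBJECT = 150

def namenode_heap_bytes(num_files, blocks_per_file=1):
    """Estimate NameNode heap used by num_files files,
    counting one metadata object per file plus one per block."""
    objects = num_files + num_files * blocks_per_file
    return objects * BYTES_PER_OBJECT

# One large file stored as 8 blocks: 1 file entry + 8 block entries.
one_big_file = namenode_heap_bytes(1, blocks_per_file=8)

# Ten million tiny files, one block each: 20 million metadata objects.
many_small_files = namenode_heap_bytes(10_000_000, blocks_per_file=1)

print(one_big_file)      # 1350 bytes
print(many_small_files)  # 3000000000 bytes, i.e. ~3 GB of heap
```

The point of the sketch: the small-file layout holds no more data, yet it forces the NameNode to keep gigabytes of metadata in RAM, which is exactly why HDFS prefers fewer, larger files.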
6. How can I restart “NameNode” or all the daemons in Hadoop? 
Ans : This question can have two answers; we will discuss both. We can restart the NameNode by the following methods:
  1. You can stop the NameNode individually using the ./sbin/hadoop-daemon.sh stop namenode command, and then start it again using ./sbin/hadoop-daemon.sh start namenode.
  2. To stop and start all the daemons, use ./sbin/stop-all.sh followed by ./sbin/start-all.sh, which stops all the daemons first and then starts them all again.
These script files reside in the sbin directory inside the Hadoop directory.