Wednesday 31 May 2017

Top Hadoop Interview Questions For 2017

Here are top objective-type sample Hadoop interview questions, with their answers given just below them. These sample questions are framed by experts from Besant Technologies, who train students in Hadoop in Chennai, to give you an idea of the type of questions that may be asked in an interview. We have taken full care to give correct answers to all the questions. In case you have attended Hadoop interviews previously, we encourage you to add your questions in the comments tab. We will be happy to answer them, and spread the word to the community of fellow job seekers. Do comment your thoughts. Happy job hunting!

1. What is Hadoop and what are its components?

Ans : When “Big Data” emerged as a problem, Apache Hadoop evolved as a solution to it. Apache Hadoop is a framework which provides us various services or tools to store and process Big Data. It helps in analyzing Big Data and making business decisions out of it, which can’t be done efficiently and effectively using traditional systems.
♣ Tip: Now, while explaining Hadoop, you should also explain the main components of Hadoop, i.e.:
  • Storage unit – HDFS (NameNode, DataNode)
  • Processing framework – YARN (ResourceManager, NodeManager)
2. Compare HDFS with Network Attached Storage (NAS).
Ans : In this question, first explain NAS and HDFS, and then compare their features as follows:
  • Network-attached storage (NAS) is a file-level computer data storage server connected to a computer network, providing data access to a heterogeneous group of clients. NAS can be either hardware or software that provides services for storing and accessing files. The Hadoop Distributed File System (HDFS), on the other hand, is a distributed filesystem that stores data on commodity hardware.
  • In HDFS, data blocks are distributed across all the machines in a cluster, whereas in NAS data is stored on dedicated hardware.
  • HDFS is designed to work with the MapReduce paradigm, where computation is moved to the data. NAS is not suitable for MapReduce, since data is stored separately from the computations.
  • HDFS uses commodity hardware, which is cost-effective, whereas NAS uses high-end storage devices, which come at a high cost.
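The block distribution described above can be sketched in a few lines of Python. This is purely an illustrative toy model (the function and node names are hypothetical, not HDFS code): it shows how a file is split into fixed-size blocks and how each block's replicas are spread across DataNodes. Real HDFS placement is more sophisticated and also considers rack topology.

```python
# Toy model of HDFS block splitting and replica placement (illustrative only).
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the HDFS default block size
REPLICATION = 3                 # default replication factor

def split_into_blocks(file_size_bytes):
    """Number of blocks a file of the given size occupies (ceiling division)."""
    return -(-file_size_bytes // BLOCK_SIZE)

def place_replicas(num_blocks, datanodes):
    """Assign each block's replicas round-robin across DataNodes."""
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + r) % len(datanodes)]
            for r in range(REPLICATION)
        ]
    return placement

blocks = split_into_blocks(1024 * 1024 * 1024)  # a 1 GB file
print(blocks)  # 8 blocks
layout = place_replicas(blocks, ["dn1", "dn2", "dn3", "dn4"])
print(layout[0])  # ['dn1', 'dn2', 'dn3'] -- 3 replicas on different nodes
```

Losing any single DataNode in this layout still leaves two replicas of every block, which is the point of distributing blocks rather than storing the file on one dedicated device.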
3. What are active and passive “NameNodes”?
Ans : In HA (High Availability) architecture, we have two NameNodes – an active “NameNode” and a passive “NameNode”.
  • The active “NameNode” is the “NameNode” that works and runs in the cluster, serving all client requests.
  • The passive “NameNode” is a standby “NameNode”, which holds the same metadata as the active “NameNode”.
When the active “NameNode” fails, the passive “NameNode” takes its place in the cluster. Hence, the cluster is never without a “NameNode”, and so it never fails.
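The failover idea can be modelled in a short Python sketch. This is a hypothetical toy, not HDFS code: real HDFS HA coordinates failover through ZooKeeper and shared edit logs (JournalNodes), but the essential state transition looks like this.

```python
# Toy model of active/passive NameNode failover (illustrative only).
class NameNode:
    def __init__(self, name, state):
        self.name = name
        self.state = state  # "active", "standby", or "failed"

def failover(active, standby):
    """Promote the standby when the active NameNode fails.

    The standby can take over immediately because it already holds
    the same file-system metadata as the active NameNode.
    """
    active.state = "failed"
    standby.state = "active"
    return standby

nn1 = NameNode("nn1", "active")
nn2 = NameNode("nn2", "standby")
current = failover(nn1, nn2)
print(current.name, current.state)  # nn2 active
```

Because the standby is promoted rather than a fresh NameNode being bootstrapped, the cluster avoids the lengthy recovery process described in the next question.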

4. What will you do when the NameNode is down?
Ans : The NameNode recovery process involves the following steps to get the Hadoop cluster up and running:
1. Use the file system metadata replica (FsImage) to start a new NameNode.
2. Then, configure the DataNodes and clients so that they can acknowledge the new NameNode that has been started.
3. Now the new NameNode will start serving clients after it has finished loading the last checkpoint FsImage (for metadata information) and has received enough block reports from the DataNodes.
However, on large Hadoop clusters this NameNode recovery process may consume a lot of time, and it becomes an even greater challenge during routine maintenance. This is why we have the HDFS High Availability architecture, which is covered in the HA architecture discussion above.
5. Why do we use HDFS for applications with large data sets and not when there are a lot of small files?
Ans : HDFS is more suitable for a large amount of data in a single file than for small amounts of data spread across multiple files. As you know, the NameNode stores the metadata about the file system in RAM. Therefore, the amount of memory puts a limit on the number of files in an HDFS file system. In other words, too many files lead to too much metadata, and storing all this metadata in RAM becomes a challenge. As a rule of thumb, the metadata for a file, block or directory takes about 150 bytes.
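The 150-byte rule of thumb makes it easy to see why many small files hurt. The sketch below is a back-of-the-envelope estimate, not an exact accounting (real per-object overhead varies by Hadoop version, and replicas of a block share one metadata entry):

```python
# Rough estimate of NameNode heap consumed by file-system metadata,
# using the ~150 bytes per object (file, block, directory) rule of thumb.
BYTES_PER_OBJECT = 150

def metadata_bytes(num_files, blocks_per_file):
    # Each file contributes one file object plus one object per block.
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT

# 10 million small files, one block each:
small = metadata_bytes(10_000_000, 1)   # 3,000,000,000 bytes, ~2.8 GiB
# Roughly the same number of blocks packed into 10,000 large files:
large = metadata_bytes(10_000, 1000)    # ~1.5 GB
print(small, large)
```

Both layouts hold about 10 million blocks of data, but the small-file layout needs roughly twice the NameNode memory, and the gap widens as files shrink further below the block size.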
6. How can I restart “NameNode” or all the daemons in Hadoop? 
Ans : This question can have two answers; we will discuss both. We can restart the NameNode by the following methods:
  1. You can stop the NameNode individually using the ./sbin/hadoop-daemon.sh stop namenode command and then start it again using the ./sbin/hadoop-daemon.sh start namenode command.
  2. To stop and start all the daemons, use ./sbin/stop-all.sh and then ./sbin/start-all.sh, which will stop all the daemons first and then start them all.
These script files reside in the sbin directory inside the Hadoop directory.







Sunday 28 May 2017

4 Benefits of Using Apache Kafka in Lieu of AMQP or JMS

As hotness goes, it's hard to beat Apache Spark. According to a new Syncsort survey, Spark has displaced Hadoop as the most visible and active big data project. Given that Spark makes it far easier (and feasible) to manage fast data, this isn't surprising.
What is surprising, however, is how quickly Apache Kafka is closing in on Spark, its kissing cousin.
According to Redmonk analysis, Kafka "is increasingly in demand for usage in streaming workloads like IoT, among others." This, according to Redmonk analyst Fintan Ryan, has resulted in "a big uptick in developer interest in, chatter around, and usage of, Kafka."
So, where will Kafka grow from here, and should you use it?
Batch-oriented data infrastructure was fine in the early days of big data, but as the industry has grown comfortable with streaming data, tools like Hadoop have fallen out of favor. While there will likely always be a place for Hadoop to shine, as Spark takes over, a general message broker like Kafka starts to make a lot of sense.

As Ryan writes, "With new workloads in areas like IoT, mobile and gaming generating massive, and ever increasing, streams of data, developers are looking for a mechanism to easily consume the data in a consistent and coherent manner."
Kafka sits at the front end of streaming data, acting as a messaging system to capture and publish feeds, with Spark (or another engine) as the transformation tier that allows data to be "manipulated, enriched and analyzed before it is persisted to be used by an application," as MemSQL CEO Eric Frenkiel wrote.
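The capture-and-publish model described above can be sketched as a toy in Python. The ToyLog class below is purely illustrative, not the Kafka API: it models Kafka's core abstraction, an append-only log that producers write to and that each consumer reads at its own offset. Real Kafka partitions topics across brokers, persists the log to disk and replicates it.

```python
# Toy in-memory model of Kafka's append-only log with consumer offsets
# (illustrative only; not the real Kafka client API).
class ToyLog:
    def __init__(self):
        self.records = []  # the append-only log
        self.offsets = {}  # consumer name -> next offset to read

    def publish(self, record):
        """Producers append records; nothing is ever overwritten."""
        self.records.append(record)

    def consume(self, consumer, max_records=10):
        """Each consumer reads from its own offset, independently of others."""
        start = self.offsets.get(consumer, 0)
        batch = self.records[start:start + max_records]
        self.offsets[consumer] = start + len(batch)
        return batch

log = ToyLog()
log.publish({"sensor": "t1", "temp": 21.5})
log.publish({"sensor": "t2", "temp": 19.8})

print(log.consume("spark-job"))   # both records
print(log.consume("spark-job"))   # [] -- this consumer's offset has advanced
print(log.consume("dashboard"))   # an independent consumer re-reads from 0
```

Because consumers track their own positions in a shared log, a Spark job and a dashboard can each process the same feed at their own pace, which is what makes Kafka attractive as the front end of a streaming pipeline.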
This partnership with modern streaming systems like Spark has resulted in "consistent growth of active users on the Kafka users mailing list, which is up over 260% since July 2014," Ryan notes.
In fact, demand for Kafka is so high right now that it is outpacing even Spark, at least in terms of relative employer demand: