Friday 16 June 2017

20 Unbelievable Facts About Big Data Hadoop

With Big Data, we are just at the beginning of a revolution that will touch every business and every life on this planet. But many people still treat the concept of Big Data as something they can choose to ignore, when actually they're about to be run over by the steamroller that is Big Data.

  1. The data volumes are exploding; more data has been created in the past two years than in the entire previous history of the human race.
  2. Data is growing faster than ever before, and by the year 2020 about 1.7 megabytes of new information will be created every second for every human being on the planet.
  3. By then, our accumulated digital universe of data will grow from 4.4 zettabytes today to around 44 zettabytes, or 44 trillion gigabytes.
  4. Every second we create new data. For example, we perform 40,000 search queries every second (on Google alone), which makes it 3.5 billion searches per day and 1.2 trillion searches per year.
  5. In August 2015, over 1 billion people used Facebook in a single day.
  6. Facebook users send on average 31.25 million messages and view 2.77 million videos every minute.
  7. We are seeing a massive growth in video and photo data, where every minute up to 300 hours of video are uploaded to YouTube alone.
  8. In 2015, a staggering 1 trillion photos were taken, and billions of them were shared online. By 2017, nearly 80% of photos will be taken on smartphones.
  9. This year, over 1.4 billion smartphones will be shipped, all packed with sensors capable of collecting all kinds of data, not to mention the data the users create themselves.
  10. By 2020, we will have over 6.1 billion smartphone users globally (overtaking basic fixed phone subscriptions).
  11. Within five years there will be over 50 billion smart connected devices in the world, all developed to collect, analyze and share data.
  12. By 2020, at least a third of all data will pass through the cloud (a network of servers connected to the Internet).
  13. Distributed computing (performing computing tasks using a network of computers in the cloud) is very real. Google uses it every day to involve about 1,000 computers in answering a single search query, which takes no more than 0.2 seconds to complete.
  14. The Hadoop (open source software for distributed computing) market is forecast to grow at a compound annual growth rate of 58%, surpassing $1 billion by 2020.
  15. Estimates suggest that by better integrating Big Data, healthcare could save as much as $300 billion a year; that's equivalent to reducing costs by $1,000 a year for every man, woman, and child.
  16. The White House has already invested more than $200 million in Big Data projects.
  17. For a typical Fortune 1000 company, just a 10% increase in data accessibility will result in more than $65 million of additional net income.
  18. Retailers who leverage the full power of Big Data could increase their operating margins by as much as 60%.
  19. 73% of organizations have already invested or plan to invest in Big Data by 2016.
  20. And one of my favorite facts: at the moment less than 0.5% of all data is ever analyzed and used; just imagine the potential here.

Sunday 11 June 2017

Importing Data from Relational Databases into Hadoop

Although Apache Hadoop was created to process huge amounts of unstructured data, especially from the web, you may find Hadoop sitting alongside a relational database. This stems from the prevalence of relational databases. You must also consider the fact that companies, individuals, and teams considering a migration to Hadoop need to port their data to Hadoop so that MapReduce jobs can use it. Although you can configure and manually execute data migration, there are tools available to do this for you. One such tool is Sqoop, which was released by Cloudera but is now an Apache project. As the name signifies, it scoops data from a relational source into HDFS, and vice versa.

Before we see how Sqoop imports data into Hadoop, let's first look at how we could do it without a third-party tool. The obvious approach is to create a MapReduce job that pulls data from a structured data source, such as a MySQL database, using direct JDBC access, and writes it into HDFS. The primary issue with this approach is configuring and handling DB connection pooling, since far too many mappers will try to maintain a connection while pulling data.

Now let's see how we do this using Sqoop. The first step is to download, install and configure Sqoop; make sure you have Hadoop 1.x or later on your machine. Configuring Sqoop is pretty simple; however, read "Time for action – downloading and configuring Sqoop" in Hadoop Beginner's Guide for more details. After installing Sqoop, make sure you have the JDBC driver for your relational database (MySQL, for example) and copy it into Sqoop's lib directory, as in the sketch below. Everything you need is then set up. Now we will attempt to simply dump data from a MySQL table into structured files on HDFS.
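On a typical Linux setup, the driver copy and a quick sanity check might look like this (SQOOP_HOME and the connector version are assumptions; adjust them to your installation):

# Copy the MySQL JDBC driver into Sqoop's lib directory
$ cp mysql-connector-java-5.1.42-bin.jar $SQOOP_HOME/lib/

# Sanity check: Sqoop runs and can reach the database
$ sqoop version
$ sqoop list-tables --connect jdbc:mysql://localhost/hadooptest \
    --username hadoopuser --password password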

Learn now! For the Hadoop Training Course, reach us: Hadoop Training in Chennai, Hadoop Training Institute in Chennai, Best Hadoop Training in Chennai, Hadoop Training, Hadoop Training in Chennai with Placement, Big Data Hadoop Training in Chennai

Create an "Employees" table in MySQL and populate it with some data; a minimal sketch of this setup follows. We will then import this table into HDFS with Sqoop.
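One way to set up the table, assuming the hadooptest database and hadoopuser account used by the import command below (the column layout is a hypothetical example; the lowercase table name matches the --table flag, since MySQL table names are case-sensitive on Linux):

$ mysql -u hadoopuser -p hadooptest
mysql> CREATE TABLE employees (
    ->     id     INT PRIMARY KEY,
    ->     name   VARCHAR(64),
    ->     salary INT
    -> );
mysql> INSERT INTO employees VALUES (1, 'Alice', 50000), (2, 'Bob', 60000);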
With the table in place, run the Sqoop import:

$ sqoop import --connect jdbc:mysql://localhost/hadooptest \
    --username hadoopuser --password password --table employees

Let's examine what just happened. With a single sqoop command, we pulled data from the MySQL database's "employees" table into Hadoop. The first option specifies the type of Sqoop operation (import in our case). Next, we listed the JDBC URI for our MySQL database along with the table name. Sqoop does the rest: it places the data pulled from the table into multiple files (based on the number of mappers it spun up) in the user's HDFS home directory, which we can list:

$ hadoop fs -ls employees

Found 6 items
-rw-r--r--   3 hadoop supergroup     
               0 2013-04-24 04:10 /user/hadoop/employees/_SUCCESS
drwxr-xr-x   - hadoop supergroup     
               0 2012-04-24 04:10 /user/hadoop/employees/_logs
-rw-r--r--   3 … /user/hadoop/employees/part-m-00000
-rw-r--r--   3 … /user/hadoop/employees/part-m-00001
-rw-r--r--   3 … /user/hadoop/employees/part-m-00002
-rw-r--r--   3 … /user/hadoop/employees/part-m-00003
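
Each part-m-* file holds the rows written by one map task, comma-separated by default. A quick spot check (shown here with the hypothetical sample rows from the sketch above) might look like:

$ hadoop fs -cat employees/part-m-00000 | head
1,Alice,50000
2,Bob,60000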


Wednesday 7 June 2017

Why Big Data Analytics is the Best Career Move?


If you are still not convinced by the fact that Big Data Analytics is one of the hottest skills, here are 10 more reasons for you to see the big picture.

1. Soaring Demand for Analytics Professionals:

Jeanne Harris, senior executive at Accenture Institute for High Performance, has stressed the significance of analytics professionals by saying, “…data is useless without the skill to analyze it.” There are more job opportunities in Big Data management and Analytics than there were last year, and many IT professionals are prepared to invest time and money in the training.
The job trend graph for Big Data Analytics from Indeed.com shows a growing trend and, as a result, a steady increase in the number of job opportunities.

2. Huge Job Opportunities & Meeting the Skill Gap:

The demand for Analytics skills is going up steadily, but there is a huge deficit on the supply side. This is happening globally and is not restricted to any particular geography. In spite of Big Data Analytics being a ‘hot’ job, there is still a large number of unfilled positions across the globe due to a shortage of the required skills. A McKinsey Global Institute study states that the US will face a shortage of about 190,000 data scientists and 1.5 million managers and analysts who can understand and make decisions using Big Data by 2018.
India currently has the highest concentration of analytics professionals globally. In spite of this, the scarcity of data analytics talent is particularly acute, and demand for talent is expected to be on the higher side as more global organizations outsource their work.
According to Srikanth Velamakanni, co-founder and CEO of Fractal Analytics, there are two types of talent deficits: Data Scientists, who can perform analytics, and Analytics Consultants, who can understand and use data. The talent supply for these job titles, especially Data Scientists, is extremely scarce, and the demand is huge.

3. Salary Aspects:

Strong demand for Data Analytics skills is boosting the wages for qualified professionals and making Big Data pay big bucks for the right skill. This phenomenon is being seen globally, where countries like Australia and the U.K. are witnessing this ‘Moolah Marathon’.
According to the 2015 Skills and Salary Survey Report published by the Institute of Analytics Professionals of Australia (IAPA), the annual median salary for data analysts is $130,000, up four per cent from last year. Continuing the trend set in 2013 and 2014, the median respondent earns 184% of the Australian full-time median salary. The rising demand for analytics professionals is also reflected in IAPA’s membership, which has grown to more than 5,000 members in Australia since its formation in 2006.
Randstad states that the annual pay hikes for Analytics professionals in India are on average 50% higher than for other IT professionals. According to The Indian Analytics Industry Salary Trend Report by the Great Lakes Institute of Management, average salaries for analytics professionals in India were up by 21% in 2015 compared to 2014. The report also states that 14% of all analytics professionals get a salary of more than Rs. 15 lakh per annum.
A look at the salary trend for Big Data Analytics in the UK also indicates positive and exponential growth. A quick search on Itjobswatch.co.uk shows a median salary of £62,500 in early 2016 for Big Data Analytics jobs, as compared to £55,000 in the same period in 2015, a year-on-year median salary change of +13.63%.
The table below looks at the statistics for Big Data Analytics skills in IT jobs advertised across the UK. Included is a guide to the salaries offered in IT jobs that have cited Big Data Analytics over the 3 months to 23 June 2016, with a comparison to the same period over the previous 2 years.

4. Numerous Choices in Job Titles and Types of Analytics:

From a career point of view, there are many options available, in terms of domain as well as the nature of the job. Since Analytics is utilized in varied fields, there are numerous job titles for one to choose from, including:
  • Big Data Analytics Business Consultant
  • Big Data Analytics Architect
  • Big Data Engineer
  • Big Data Solution Architect
  • Big Data Analyst
  • Analytics Associate
  • Business Intelligence and Analytics Consultant
  • Metrics and Analytics Specialist
A Big Data Analytics career runs deep, and one can choose from the three types of data analytics depending on the Big Data environment:
  • Prescriptive Analytics
  • Predictive Analytics
  • Descriptive Analytics
A huge array of organizations like Ayata, IBM, Alteryx, Teradata, TIBCO, Microsoft, Platfora, ITrend, Karmasphere, Oracle, Opera, Datameer, Pentaho, Centrofuge, FICO, Domo, Quid, Saffron, Jaspersoft, GoodData, Bluefin Labs, Tracx, Panaroma Software, and countless more are utilizing Big Data Analytics for their business needs, and huge job opportunities are available with them.

Wednesday 31 May 2017

Top Hadoop Interview Questions For 2017

Here are top objective-type sample Hadoop interview questions, with their answers given just below them. These sample questions are framed by experts from Besant Technologies, who provide Hadoop Training in Chennai, to give you an idea of the type of questions that may be asked in an interview. We have taken full care to give correct answers to all the questions. Do comment your thoughts. Happy job hunting! In case you have attended Hadoop interviews previously, we encourage you to add your questions in the comments tab. We will be happy to answer them, and spread the word to the community of fellow job seekers.

1. What is Hadoop and what are its components?

Ans : When “Big Data” emerged as a problem, Apache Hadoop evolved as a solution to it. Apache Hadoop is a framework which provides us with various services or tools to store and process Big Data. It helps in analyzing Big Data and making business decisions out of it, which can’t be done efficiently and effectively using traditional systems.
♣ Tip: Now, while explaining Hadoop, you should also explain the main components of Hadoop, i.e.:
  • Storage unit – HDFS (NameNode, DataNode)
  • Processing framework – YARN (ResourceManager, NodeManager)
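On a running single-node cluster, the JDK's jps tool shows the daemons behind these components (the process IDs and the exact daemon set will vary with your setup; this listing is illustrative):

$ jps
2881 NameNode
2995 SecondaryNameNode
3012 DataNode
3405 ResourceManager
3561 NodeManager
4012 Jps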
2. Compare HDFS with Network Attached Storage (NAS).
Ans : In this question, first explain NAS and HDFS, and then compare their features as follows:
  • Network-attached storage (NAS) is a file-level computer data storage server connected to a computer network providing data access to a heterogeneous group of clients. NAS can either be a hardware or software which provides services for storing and accessing files. Whereas Hadoop Distributed File System (HDFS) is a distributed filesystem to store data using commodity hardware.
  • In HDFS Data Blocks are distributed across all the machines in a cluster. Whereas in NAS data is stored on a dedicated hardware.
  • HDFS is designed to work with MapReduce paradigm, where computation is moved to the data. NAS is not suitable for MapReduce since data is stored separately from the computations.
  • HDFS uses commodity hardware, which is cost-effective, whereas NAS uses high-end storage devices, which involves higher costs.
3. What are active and passive “NameNodes”?
Ans : In HA (High Availability) architecture, we have two NameNodes – Active “NameNode” and Passive “NameNode”.
  • Active “NameNode” is the “NameNode” which works and runs in the cluster.
  • Passive “NameNode” is a standby “NameNode”, which has similar data as active “NameNode”.
When the active “NameNode” fails, the passive “NameNode” replaces the active “NameNode” in the cluster. Hence, the cluster is never without a “NameNode” and so it never fails.
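In an HA cluster you can check which role each NameNode currently holds with the hdfs haadmin utility (nn1 and nn2 are placeholder service IDs taken from the cluster configuration):

$ hdfs haadmin -getServiceState nn1
active
$ hdfs haadmin -getServiceState nn2
standby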

4. What will you do when NameNode is down?
Ans : The NameNode recovery process involves the following steps to make the Hadoop cluster up and running:
1. Use the file system metadata replica (FsImage) to start a new NameNode. 
2. Then, configure the DataNodes and clients so that they can acknowledge the new NameNode that has been started.
3. Now the new NameNode will start serving clients after it has completed loading the last checkpoint FsImage (for metadata information) and received enough block reports from the DataNodes.
However, on large Hadoop clusters this NameNode recovery process may consume a lot of time, and this becomes an even greater challenge in the case of routine maintenance. Therefore, we have the HDFS High Availability architecture, which was covered in question 3 above.
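As a rough sketch of steps 1 and 2 on a non-HA cluster (the host name and directory paths here are assumptions; your metadata directory comes from dfs.namenode.name.dir and the checkpoint configuration):

# Copy the latest FsImage from the checkpoint (secondary) node to the
# replacement NameNode's metadata directory
$ scp -r secondarynn:/data/checkpoint/current /data/namenode/
# Start the NameNode; it begins serving clients once the FsImage is loaded
# and enough block reports have arrived from the DataNodes
$ ./sbin/hadoop-daemon.sh start namenode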
5. Why do we use HDFS for applications having large data sets and not when there are a lot of small files?
Ans : HDFS is more suitable for large amounts of data in a single file than for small amounts of data spread across multiple files. As you know, the NameNode stores metadata information about the file system in RAM. Therefore, the amount of RAM places a limit on the number of files in the HDFS file system. In other words, too many files will lead to the generation of too much metadata, and storing all this metadata in RAM will become a challenge. As a rule of thumb, the metadata for a file, block, or directory takes about 150 bytes.
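A quick back-of-the-envelope calculation makes that limit concrete (the file count is hypothetical; the 150 bytes per object is the rule of thumb above):

$ FILES=10000000             # 10 million small files, one block each
$ OBJECTS=$((FILES * 2))     # roughly one file object plus one block object per file
$ echo "$((OBJECTS * 150 / 1024 / 1024)) MB of NameNode heap for metadata alone"
2861 MB of NameNode heap for metadata alone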
Check out our Hadoop Training in Chennai. Check out our Hadoop Course.
6. How can I restart “NameNode” or all the daemons in Hadoop? 
Ans : This question has two possible answers; we will discuss both. We can restart the NameNode by the following methods:
  1. You can stop the NameNode individually using the ./sbin/hadoop-daemon.sh stop namenode command, and then start it again using the ./sbin/hadoop-daemon.sh start namenode command.
  2. To stop and start all the daemons, use ./sbin/stop-all.sh and then ./sbin/start-all.sh, which will stop all the daemons first and then start all of them.
These script files reside in the sbin directory inside the Hadoop directory.

Sunday 28 May 2017

4 Benefits of Using Apache Kafka in Lieu of AMQP or JMS

As hotness goes, it's hard to beat Apache Spark. According to a new Syncsort survey, Spark has displaced Hadoop as the most visible and active Big Data project. Given that Spark makes it much easier (and possible) to manage fast data, this isn't surprising.
What is surprising, however, is how quickly Apache Kafka is closing in on Spark, its kissing kin.
According to RedMonk analysis, Kafka "is increasingly in demand for usage in streaming workloads like IoT, among others." This, according to RedMonk analyst Fintan Ryan, has resulted in "a big uptick in developer interest in, chatter around, and usage of, Kafka."
So, where will Kafka grow from here, and should you use it?
Batch-oriented data infrastructure was fine in the early days of Big Data, but as the industry has grown comfortable with streaming data, tools like Hadoop have fallen out of favor. While there will likely always be a place for Hadoop to shine, as Spark takes over, a general message broker like Kafka starts to make a lot of sense.

As Ryan writes, "With new workloads in areas like IoT, mobile and gaming generating massive, and ever increasing, streams of data, developers are looking for a mechanism to easily consume the data in a consistent and coherent manner."
Kafka sits at the front end of streaming data, acting as a messaging system to capture and publish feeds, with Spark (or another engine) as the transformation tier that allows data to be "manipulated, enriched and analyzed before it is persisted to be used by an application," as MemSQL CEO Eric Frenkiel wrote.
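To get a feel for Kafka in that role, the standard quickstart commands that ship with Kafka (0.10.x-era syntax; the topic name here is an arbitrary example) create a feed and push and pull a few messages through it:

# Create a topic to act as an incoming event feed
$ bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic sensor-events
# Publish messages from the console (type lines, Ctrl+C to stop)
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic sensor-events
# Consume the feed from the beginning, e.g. to hand it to a downstream Spark job
$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic sensor-events --from-beginning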
This partnership with modern streaming systems like Spark has resulted in "consistent growth of active users on the Kafka users mailing list, which is up just over 260% since July 2014," Ryan notes.
In fact, demand for Kafka is so high right now that it's outpacing even Spark, at least in terms of relative employer demand.