Hadoop is a Java-based programming framework that supports the processing and storage of extremely large datasets on clusters of inexpensive machines. It was the first major open source project in the big data field and provides high-throughput access to application data.
The main goal of this tutorial is to get a simple Hadoop installation up and running so that you can play around with the software and learn more about it.
Environment: This tutorial has been tested with the following software versions.
- RHEL (Red Hat Enterprise Linux 7.4) on VirtualBox 5.2
- Hadoop 2.7.3
- Update the /etc/hosts file with the hostname and IP address.
[root@cdhs ~]# cat /etc/hosts
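A typical entry, assuming the hostname cdhs used in this tutorial and an illustrative private IP address (use your VM's actual address), might look like this:

```
127.0.0.1       localhost
# Example entry only — replace with your VM's real IP and fully qualified name
192.168.56.10   cdhs.example.com   cdhs
```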
Dedicated Hadoop system user:
After the VM is set up, add a non-sudo user dedicated to Hadoop; this user will be used to configure and run Hadoop. The following commands add the user hduser and the group hadoop to the VM.
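The user and group creation described above can be sketched with the standard groupadd/useradd commands (run as root; hduser and hadoop are the names this tutorial uses):

```shell
# Create the dedicated group for Hadoop
groupadd hadoop
# Create hduser with hadoop as its primary group and a home directory
useradd -g hadoop -m hduser
# Set a password for the new account
passwd hduser
```

Later configuration steps in a single-node setup are typically performed as this hduser account rather than as root.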