For batch processing, we can write custom map and reduce scripts in a scripting language of our choice, while Hive provides a SQL-like environment with support for easy querying. In this tutorial, you will learn-

- Working with Structured Data using Hive
- Working with Semi-structured Data using Hive (XML, JSON)
- Hive in Real-time Projects – When and Where to Use

Working with Structured Data using Hive

Structured data means data that is in a proper format of rows and columns, much like RDBMS data. Here we are going to load structured data present in text files into Hive.

Step 1) In this step, we create the table "employees_guru" with column names such as Id, Name, Age, Address, Salary, and Department of the employees, along with their data types.

From the above screenshot, we can observe the following,

- Creation of table "employees_guru"
- Loading data from Employees.txt into table "employees_guru"

Step 2) In this step, we display the contents stored in this table by using the "Select" command. We can observe the table contents in the following screenshot.

Sample Code Snippet: Queries to be performed

  1. create table employees_guru(Id INT, Name STRING, Age INT, Address STRING, Salary FLOAT, Department STRING) row format delimited fields terminated by ',';
  2. load data local inpath '/home/hduser/Employees.txt' into table employees_guru;
  3. select * from employees_guru;
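Since the table is declared with fields terminated by ',', each line of Employees.txt must hold one comma-separated record in column order. A hypothetical Employees.txt matching this schema (the names and values below are illustrative, not from the original tutorial) might look like:

```text
1,Gopal,35,Bangalore,45000.0,Technical
2,Manisha,32,Chennai,40000.0,HR
3,Kiran,29,Hyderabad,38000.0,Finance
```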

Working with Semi-structured Data using Hive (XML, JSON)

Hive performs ETL functionality in the Hadoop ecosystem by acting as an ETL tool. Writing map reduce directly can be difficult in some types of applications; Hive can reduce that complexity and provides an effective solution for IT applications in the data warehousing sector. Semi-structured data such as XML and JSON can be processed with less complexity using Hive. First, we will see how we can use Hive for XML.

XML TO HIVE TABLE

In this section, we are going to load XML data into Hive tables, and we will fetch the values stored inside the XML tags.

Step 1) Creation of table "xmlsample_guru" with a single column "str" of string data type.

From the above screenshot, we can observe the following

- Creation of table "xmlsample_guru"
- Loading data from test.xml into table "xmlsample_guru"

Step 2) Using the xpath() method, we will be able to fetch the data stored inside the XML tags.

From the above screenshot, we can observe the following

- Using the xpath() method, we are fetching the values stored under /emp/esal/ and /emp/ename/
- In this step, we are displaying the actual values present inside the XML tags in table "xmlsample_guru"

Step 3) In this step, we will fetch and display the Raw XML of table “xmlsample_guru.”

From the above screenshot, we can observe the following

The actual XML data is displayed with its tags. If we observe a single record, "emp" is the parent tag, with "ename" and "esal" as child tags.

Code Snippet: Queries to be performed

  1. create table xmlsample_guru(str string);
  2. load data local inpath '/home/hduser/test.xml' overwrite into table xmlsample_guru;
  3. select xpath(str,'emp/ename/text()'), xpath(str,'emp/esal/text()') from xmlsample_guru;
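For the xpath() calls above to return values, each row in the str column must hold a well-formed XML fragment with emp as the root. A hypothetical record of test.xml (illustrative content, not from the original tutorial) could be:

```xml
<emp><ename>gopal</ename><esal>45000</esal></emp>
```

Note that Hive's xpath() UDF returns an array of strings per row, so a record containing several matching tags yields several values; xpath_string() can be used instead when only the first match is needed.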

JSON (JavaScript Object Notation)

Twitter and website data is often stored in JSON format. Whenever we try to fetch data from online servers, it is returned as JSON files. Using Hive as a data store, we can load JSON data into Hive tables by creating schemas.

JSON TO HIVE TABLE

In this section, we are going to load JSON data into Hive tables, and we will fetch the values stored in the JSON schema.

Step 1) In this step, we are going to create a JSON table named "json_guru". Once it is created, we load and display the contents of the actual schema.

From the above screenshot, we can observe the following

- Creation of table "json_guru"
- Loading data from test.json into table "json_guru"
- Displaying the actual schema of the JSON file stored in the json_guru table

Step 2) Using the get_json_object() method, we can fetch the data values stored in the JSON hierarchy.

From the above screenshot, we can observe the following

- Using get_json_object(str,'$.ecode'), we can fetch the ecode values from table json_guru
- Similarly, get_json_object(str,'$.ename') and get_json_object(str,'$.sal') fetch the ename and sal values from table json_guru
- These are the values stored inside the JSON hierarchy in json_guru

Code Snippet: Queries to be performed

  1. create table json_guru(str string);
  2. load data local inpath '/home/hduser/test.json' into table json_guru;
  3. select * from json_guru;
  4. select get_json_object(str,'$.ecode') as ecode, get_json_object(str,'$.ename') as ename, get_json_object(str,'$.sal') as salary from json_guru;
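get_json_object() expects each row of the str column to contain one complete JSON document. A hypothetical line of test.json matching the paths $.ecode, $.ename, and $.sal (the values are illustrative) could be:

```json
{"ecode": 1201, "ename": "gopal", "sal": 45000}
```

If a requested path does not exist in a given row, get_json_object() simply returns NULL for that row.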

Complex JSON TO HIVE TABLE

In this section, we are going to load complex JSON data into Hive tables, and we will fetch the values stored in the JSON schema.

Step 1) Creating table complexjson_guru with a single column field

From the above screenshot, we can observe the following

- Creation of table complexjson_guru with a single column field of string data type
- Loading data into complexjson_guru from the complex JSON file emp.json

Step 2) By using get_json_object, we can retrieve the actual content stored inside the JSON file hierarchy. In the following screenshot, we can see the output of the data stored in complexjson_guru.

Step 3) In this step, by using the "Select" command, we can see the complex JSON data stored inside table "complexjson_guru".

Sample Code Snippet: Queries to be performed

  1. create table complexjson_guru(json string);
  2. load data local inpath '/home/hduser/emp.json' into table complexjson_guru;
  3. select get_json_object(json,'$.ecode') as ecode, get_json_object(json,'$.b') as code, get_json_object(json,'$.c') from complexjson_guru;
  4. select * from complexjson_guru;
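The paths $.ecode, $.b, and $.c in the query above suggest a nested document. An emp.json record shaped like the following (a hypothetical structure, since the actual file is not shown in the tutorial) would satisfy those paths:

```json
{"ecode": 1201, "b": "emp-code-b", "c": {"dept": "IT", "city": "Bangalore"}}
```

When a path such as $.c points at an object, get_json_object() returns that object as a JSON string; nested fields can be reached directly with paths like $.c.dept.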

Hive in Real time projects – When and Where to Use

When and Where to Use Hive in the Hadoop Ecosystem

When

- When working with strong and powerful statistical functions on the Hadoop ecosystem
- When working with structured and semi-structured data processing
- As a data warehouse tool with Hadoop
- For real-time data ingestion together with HBase, Hive can be used

Where

- For ease of use as an ETL and data warehousing tool
- To provide a SQL-like environment and to query like SQL using HiveQL
- To use and deploy custom-specified map and reducer scripts for specific client requirements
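As a sketch of that last point, HiveQL lets you plug a custom script into a query with the TRANSFORM clause. The script name and path below are hypothetical, assuming a Python script that reads tab-separated rows from stdin and writes them back to stdout:

```sql
-- register the (hypothetical) script with the Hive session
ADD FILE /home/hduser/format_names.py;

-- stream Id and Name through the script, mapping its output back to columns
SELECT TRANSFORM (Id, Name)
USING 'python format_names.py'
AS (Id INT, Name STRING)
FROM employees_guru;
```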