Hive interview questions part-1

- 8/05/2022 10:44:00 PM

1. Differentiate between Pig and Hive.

Criteria	Apache Pig	Apache Hive
Nature	Procedural data flow language	Declarative SQL-like language
Application	Used for programming	Used for report creation
Used by	Researchers and programmers	Mainly Data Analysts
Operates on	The client-side of a cluster	The server-side of a cluster
Accessing raw data	Not as fast as HiveQL	Faster with in-built features
Schema or data type	Always defined in the script itself	Stored in the Hive metastore (a relational database)
Ease of learning	Takes a little extra time and effort to master	Easy to learn for people with SQL/database experience

2. What is a Hive variable? What do we use it for?

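Hive variables are named values, created in the Hive environment, that Hive scripts can reference. They let us pass values into a Hive query when the query starts executing, so the same script can be reused with different inputs. A variable is defined with SET hivevar:name=value; in the Hive shell, or with --hivevar name=value (or --define name=value) on the command line, and is referenced in a query as ${hivevar:name}.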
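Hive substitutes variables into the query text before the query is compiled. As a minimal sketch, the Python function below models only the ${hivevar:name} form. The function name substitute and the variable dept are hypothetical, real Hive also resolves the hiveconf:, env: and system: prefixes, and leaving unknown names unchanged is a choice made by this sketch.

```python
from __future__ import annotations

import re

# Matches ${hivevar:name}; a simplified model of Hive's substitution syntax.
_HIVEVAR = re.compile(r"\$\{hivevar:([A-Za-z_][A-Za-z0-9_.]*)\}")


def substitute(query: str, hivevars: dict[str, str]) -> str:
    """Replace each ${hivevar:name} with its value; unknown names are left as-is."""

    def repl(match: re.Match) -> str:
        name = match.group(1)
        # Fall back to the original ${hivevar:...} text if the name is not defined.
        return hivevars.get(name, match.group(0))

    return _HIVEVAR.sub(repl, query)


if __name__ == "__main__":
    # Hypothetical variable, as if set with: SET hivevar:dept=sales;
    variables = {"dept": "sales"}
    print(substitute("SELECT * FROM employee WHERE dept = '${hivevar:dept}';", variables))
    # prints: SELECT * FROM employee WHERE dept = 'sales';
```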

3. Can we change the settings within a Hive session? If yes, how?

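Yes, we can change settings within a Hive session using the SET command. A value set this way applies for the rest of the session, so it can be used to tune the Hive job settings for a particular query. For example, the following command makes Hive distribute inserted data into the number of buckets given in the table definition (in Hive 2.0 and later this property was removed and bucketing is always enforced):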

hive> SET hive.enforce.bucketing=true;

We can see the current value of any property by using SET with the property name:

hive> SET hive.enforce.bucketing;

hive.enforce.bucketing=true

SET on its own, with no property name, lists all the properties whose values have been set by the user or by Hive. This list does not include the Hadoop defaults, so to see those we use:

hive> SET -v;

It lists all the properties in the system, including the Hadoop defaults.
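To make the forms of SET concrete, here is a toy Python model of a session. The class ToyHiveSession and the sample Hadoop default are hypothetical, and this is not how Hive implements SET; it only mirrors the behavior described above: SET key=value overrides a property for the rest of the session, SET key shows its value, bare SET lists only session values, and SET -v adds the Hadoop defaults.

```python
from __future__ import annotations


class ToyHiveSession:
    """Toy model of SET in a Hive session; not Hive's actual implementation."""

    def __init__(self, hadoop_defaults: dict[str, str]) -> None:
        self.hadoop_defaults = dict(hadoop_defaults)  # listed only by SET -v
        self.session_values: dict[str, str] = {}      # values set by the user or Hive

    @staticmethod
    def _lines(props: dict[str, str]) -> list[str]:
        return [f"{k}={v}" for k, v in sorted(props.items())]

    def execute(self, statement: str) -> list[str]:
        stmt = statement.strip().rstrip(";").strip()
        if stmt.upper() == "SET":
            # Bare SET: only the values set in this session.
            return self._lines(self.session_values)
        if stmt.upper() == "SET -V":
            # SET -v: session values plus the Hadoop defaults.
            return self._lines({**self.hadoop_defaults, **self.session_values})
        if not stmt.upper().startswith("SET "):
            raise ValueError(f"not a SET statement: {statement!r}")
        arg = stmt[4:].strip()
        if "=" in arg:
            # SET key=value: override for the rest of the session.
            key, value = arg.split("=", 1)
            self.session_values[key.strip()] = value.strip()
            return []
        # SET key: show the current value, preferring the session value.
        value = self.session_values.get(arg, self.hadoop_defaults.get(arg))
        return [f"{arg}={value}"] if value is not None else [f"{arg} is undefined"]


if __name__ == "__main__":
    session = ToyHiveSession({"mapreduce.job.reduces": "1"})
    session.execute("SET hive.enforce.bucketing=true;")
    print(session.execute("SET hive.enforce.bucketing;"))
    print(session.execute("SET;"))     # only the session value
    print(session.execute("SET -v;"))  # also includes the Hadoop default
```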

4. How to change the column data type in Hive? Explain RLIKE in Hive.

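We can change a column's data type with ALTER TABLE ... CHANGE. CHANGE takes the old column name, the new column name and the new data type, so to keep the name unchanged we simply repeat it:

ALTER TABLE table_name CHANGE old_column_name new_column_name new_datatype;

This changes only the table metadata; the existing data files are not rewritten.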

For example, if we want to change the data type of the salary column from integer to bigint in the employee table, we can use the following:

ALTER TABLE employee CHANGE salary salary BIGINT;

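RLIKE: RLIKE (a synonym for REGEXP) is Hive's regular-expression match operator. A RLIKE B evaluates to TRUE if any substring of A matches the Java regular expression B, to FALSE otherwise, and to NULL if A or B is NULL. For example, 'foobar' RLIKE 'foo' and 'foobar' RLIKE '^f.*r$' both evaluate to TRUE.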
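The semantics of RLIKE can be approximated in Python as below. This is a sketch that assumes Python's re syntax agrees with Java's for the patterns used; the two dialects differ in some details (for example, Java supports Unicode property classes such as \p{L}, which Python's re module does not). The function name rlike is hypothetical.

```python
from __future__ import annotations

import re


def rlike(a: str | None, b: str | None) -> bool | None:
    """Approximate Hive's `A RLIKE B`.

    None (NULL) if either operand is None; True if any substring of `a`
    matches the regular expression `b`; otherwise False.
    """
    if a is None or b is None:
        return None
    # re.search finds a match anywhere in the string (a substring match),
    # rather than requiring the whole string to match.
    return re.search(b, a) is not None


if __name__ == "__main__":
    print(rlike("foobar", "foo"))      # True
    print(rlike("foobar", "^f.*r$"))   # True
    print(rlike("foobar", "^bar"))     # False
    print(rlike(None, "foo"))          # None
```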

5. What are the components of the Hive query processor?

The Hive query processor consists of the following components:

Parse and Semantic Analysis (ql/parse)
Metadata Layer (ql/metadata)
Type Interfaces (ql/typeinfo)
Sessions (ql/session)
Map/Reduce Execution Engine (ql/exec)
Plan Components (ql/plan)
Hive Function Framework (ql/udf)
Tools (ql/tools)
Optimizer (ql/optimizer)

6. What are Buckets in Hive?

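Buckets divide the data of a Hive table (or of each partition) into a fixed number of files, based on a hash of the bucketing column declared with CLUSTERED BY (column) INTO n BUCKETS. Rows with the same value in that column always land in the same bucket, which makes sampling and bucketed map joins more efficient. (Partitions, by contrast, are stored as directories.)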
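How rows are assigned to buckets can be sketched in Python. This assumes an INT bucketing column and Hive's original bucketing scheme, in which an INT hashes to its own value and the bucket is (hash & Integer.MAX_VALUE) % number_of_buckets. Hive 3 introduced a newer scheme (bucketing_version 2) based on Murmur3 hashing, which this sketch does not model. The function name bucket_for_int is hypothetical.

```python
INT32_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def bucket_for_int(value: int, num_buckets: int) -> int:
    """Bucket index for an INT bucketing column under the original scheme,
    where the hash of an INT is the value itself."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value must fit in a 32-bit INT")
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    # Clear the sign bit so negative hashes give a non-negative result,
    # then take the remainder to pick one of num_buckets files.
    return (value & INT32_MAX) % num_buckets


if __name__ == "__main__":
    # Hypothetical table bucketed on an INT id column INTO 4 BUCKETS.
    for emp_id in [1, 2, 3, 4, 5, 6, 7, 8]:
        print(emp_id, "-> bucket", bucket_for_int(emp_id, 4))
```

Because the bucket depends only on the column value, every row with a given id is written to the same file, which is what lets Hive sample or join bucket by bucket.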

7. What kind of data warehouse application is suitable for Hive? What are the types of tables in Hive?

Hive is not a full database: the design constraints of Hadoop and HDFS limit what Hive can do. However, Hive is well suited to data warehouse applications in which:

Relatively static data is analyzed
Fast response times are not required
The data does not change rapidly

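Although Hive lacks features that are fundamental to Online Transaction Processing (OLTP), it is well suited to data warehouse applications over large datasets. There are two types of tables in Hive:

Managed (internal) tables: Hive manages both the metadata and the data; DROP TABLE deletes both.
External tables: Hive manages only the metadata; DROP TABLE removes the table definition but leaves the data files in place.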
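The difference between the two table types shows up when a table is dropped. The toy Python model below tracks a metastore and a set of data locations; the class ToyWarehouse and the example paths are hypothetical, and it ignores details such as partitions and the trash directory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyWarehouse:
    """Toy model: metastore (table -> (kind, location)) plus existing data paths."""

    metastore: dict[str, tuple[str, str]] = field(default_factory=dict)
    files: set[str] = field(default_factory=set)

    def create(self, name: str, location: str, external: bool = False) -> None:
        self.metastore[name] = ("EXTERNAL" if external else "MANAGED", location)
        self.files.add(location)

    def drop(self, name: str) -> None:
        kind, location = self.metastore.pop(name)  # metadata is always removed
        if kind == "MANAGED":
            self.files.discard(location)  # managed: the data is deleted too
        # external: the data at `location` is left in place


if __name__ == "__main__":
    w = ToyWarehouse()
    w.create("sales_managed", "/user/hive/warehouse/sales_managed")
    w.create("sales_ext", "/data/raw/sales", external=True)
    w.drop("sales_managed")
    w.drop("sales_ext")
    print(w.metastore)  # {}
    print(w.files)      # {'/data/raw/sales'}
```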

8. What is the definition of Hive? What is the present version of Hive? Explain ACID transactions in Hive.

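Hive is an open-source data warehouse system built on top of Hadoop. We can use Hive to analyze and query large datasets with HiveQL, a query language similar to SQL. The version often quoted for this question, 0.13.1, is long outdated; when this post was written (2022), the current stable release line was Hive 3.1.x. Hive supports ACID (Atomicity, Consistency, Isolation, and Durability) transactions at the row level: transactions were introduced in Hive 0.13, and INSERT ... VALUES, UPDATE and DELETE followed in Hive 0.14. Full ACID tables must be stored as ORC and declared with TBLPROPERTIES ('transactional'='true'). The row-level operations Hive supports on such tables are: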

Insert
Delete
Update

Kishan’s big data world