Hive interview questions part-1


1. Differentiate between Pig and Hive.

Criteria | Apache Pig | Apache Hive
Nature | Procedural data-flow language | Declarative, SQL-like language
Application | Used for programming | Used for creating reports
Used by | Researchers and programmers | Mainly data analysts
Operates on | The client side of a cluster | The server side of a cluster
Accessing raw data | Not as fast as HiveQL | Faster, with built-in features
Schema or data type | Always defined in the script itself | Stored in the metastore database
Ease of learning | Takes a little extra time and effort to master | Easy to learn for those with a database (SQL) background

2. What is a Hive variable? What do we use it for?

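Hive variables are created in the Hive environment and can be referenced from Hive scripts and queries. They let us pass values into a Hive query before it starts executing: Hive replaces each reference, written as ${hivevar:name}, with the variable's value. A variable can be defined when starting the CLI with --hivevar name=value (or --define name=value), or inside a session with SET hivevar:name=value;. A script that defines and uses variables can be run from the CLI with the source command.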
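
The substitution step can be pictured as plain text replacement that happens before the query is parsed. The sketch below is a simplified, hypothetical model in Python, not Hive's implementation: it handles only the hivevar namespace (hiveconf, env and system references are ignored), and the choice to leave undefined references untouched is an assumption of this sketch. The function name substitute_hivevars and the sample table and values are illustrative.

```python
import re

# Hypothetical, simplified model of Hive variable substitution: only the
# hivevar namespace is handled (hiveconf, env and system are ignored).
_HIVEVAR_REF = re.compile(r"\$\{hivevar:([^}]+)\}")


def substitute_hivevars(query: str, hivevars: dict) -> str:
    """Replace each ${hivevar:name} in `query` with hivevars[name]."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        # In this sketch, a reference with no defined value is left as-is.
        return hivevars.get(name, match.group(0))

    return _HIVEVAR_REF.sub(replace, query)


if __name__ == "__main__":
    # Roughly what `hive --hivevar tbl=employee --hivevar min_sal=50000` supplies.
    variables = {"tbl": "employee", "min_sal": "50000"}
    query = "SELECT * FROM ${hivevar:tbl} WHERE salary > ${hivevar:min_sal};"
    print(substitute_hivevars(query, variables))
    # prints: SELECT * FROM employee WHERE salary > 50000;
```

Because substitution is purely textual, a variable can stand in for a table name, a column name or a literal; the query text produced after substitution is what Hive actually compiles.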

3. Can we change the settings within a Hive session? If yes, how?

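Yes, we can change settings within a Hive session using the SET command. A setting changed this way applies to the queries that follow in the same session, so it can be tuned for a particular query. For example, the following command makes Hive enforce bucketing, so that data inserted into a bucketed table is distributed into buckets as the table definition specifies (from Hive 2.0 onward bucketing is always enforced and this property no longer exists):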

hive> SET hive.enforce.bucketing=true;

We can see the current value of any property by using SET followed by the property name. SET on its own lists all the properties whose values have been set by Hive, together with those values.

hive> SET hive.enforce.bucketing;

hive.enforce.bucketing=true

This list does not include the Hadoop defaults. To list every property in the system, including the Hadoop defaults, use:

hive> SET -v;

4. How to change the column data type in Hive? Explain RLIKE in Hive.

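We can change a column's data type with ALTER TABLE ... CHANGE. The column name is given twice, as the old name and the new name (which can be the same), followed by the new data type: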

ALTER TABLE table_name CHANGE column_name column_name new_datatype;

For example, if we want to change the data type of the salary column from integer to bigint in the employee table, we can use the following:

ALTER TABLE employee CHANGE salary salary BIGINT;

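RLIKE: RLIKE (a synonym for REGEXP) is the regular-expression counterpart of LIKE in Hive. A RLIKE B evaluates to TRUE if any substring of A matches the Java regular expression B, to FALSE otherwise, and to NULL if A or B is NULL. For example, 'foobar' RLIKE 'foo' is TRUE, and so is 'foobar' RLIKE '^f.*r$'.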
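
The key point about RLIKE is that it searches for a matching substring rather than matching the whole string, and that NULL propagates. The Python sketch below models those two rules with re.search, using None to stand in for SQL NULL. It is an illustration only: the function name rlike is hypothetical, and Hive uses Java's regex engine, whose syntax differs from Python's in some details (simple patterns like these behave the same).

```python
import re
from typing import Optional


def rlike(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """Sketch of Hive's `A RLIKE B` using Python's re module.

    None stands in for SQL NULL. The pattern is searched for anywhere in A,
    so it does not have to match the whole string.
    """
    if a is None or b is None:
        return None  # NULL in, NULL out
    return re.search(b, a) is not None


if __name__ == "__main__":
    print(rlike("foobar", "oba"))     # True: a substring matches
    print(rlike("foobar", "^bar"))    # False: "bar" is not at the start
    print(rlike("foobar", "^f.*r$"))  # True: anchors force a whole-string match
    print(rlike(None, "foo"))         # None, i.e. NULL
```

This is where RLIKE differs from LIKE: 'foobar' LIKE 'oba' is FALSE, because a LIKE pattern (with its % and _ wildcards) must match the entire string.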

5. What are the components used in Hive Query Processor?

Following are the components of a Hive Query Processor:

  • Parse and Semantic Analysis (ql/parse)
  • Metadata Layer (ql/metadata)
  • Type Interfaces (ql/typeinfo)
  • Sessions (ql/session)
  • Map/Reduce Execution Engine (ql/exec)
  • Plan Components (ql/plan)
  • Hive Function Framework (ql/udf)
  • Tools (ql/tools)
  • Optimizer (ql/optimizer)

6. What are Buckets in Hive?

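Buckets in Hive divide the data of a table (or of each partition) into a fixed number of files. The bucket for each row is chosen by hashing the value of the bucketing column(s) declared with CLUSTERED BY (column) INTO n BUCKETS. Unlike partitions, which are stored as directories, buckets are stored as files. Bucketing makes querying more efficient, for example for sampling and for joins between tables bucketed on the join key.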
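
To make "hashing the bucketing column" concrete, the sketch below assigns INT keys to buckets the way the legacy scheme (bucketing_version 1) does: an INT's hash code is the value itself, the sign bit is masked off, and the result is taken modulo the number of buckets. This is an assumption-laden model, not Hive code: it covers INT columns only, other types hash differently, and Hive 3's bucketing_version 2 uses a Murmur3 hash instead. The function names are hypothetical.

```python
from collections import defaultdict

JAVA_INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def bucket_for_int(key: int, num_buckets: int) -> int:
    """Bucket index of an INT key under the legacy (bucketing_version 1) scheme:
    hash code = the value itself, sign bit masked off, then modulo num_buckets."""
    if not -2**31 <= key < 2**31:
        raise ValueError("key must fit in a 32-bit INT")
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    # Python's & on a negative int behaves like Java's two's-complement &.
    return (key & JAVA_INT_MAX) % num_buckets


def assign_buckets(keys, num_buckets: int) -> dict:
    """Group keys by bucket index (roughly, Hive writes each bucket to its own file)."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[bucket_for_int(key, num_buckets)].append(key)
    return dict(buckets)


if __name__ == "__main__":
    # e.g. a table declared with CLUSTERED BY (id) INTO 4 BUCKETS
    print(assign_buckets([1, 2, 5, 8, -3], 4))
    # prints: {1: [1, 5, -3], 2: [2], 0: [8]}
```

Because equal keys always land in the same bucket, two tables bucketed the same way on a join key can be joined bucket by bucket instead of shuffling all rows.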

7. What kind of data warehouse application is suitable for Hive? What are the types of tables in Hive?

Hive is not considered a full database. The design constraints of Hadoop and HDFS limit what Hive can do. However, Hive is well suited to data warehouse applications that:

  • Analyze relatively static data
  • Do not need fast response times
  • Do not make rapid changes to the data

Although Hive doesn’t provide the features that Online Transaction Processing (OLTP) fundamentally requires, it is well suited to data warehouse applications on large datasets. There are two types of tables in Hive:

  • Managed tables
  • External tables
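
The two table types differ mainly in who owns the data. Hive manages the data of a managed table and deletes it on DROP TABLE, whereas dropping an external table removes only its metadata and leaves the files at their location. The toy model below illustrates that rule only. The ToyWarehouse class, its methods and the paths are hypothetical, and version-specific table properties that can change this behaviour are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWarehouse:
    """Hypothetical toy model: `tables` plays the metastore, `storage` plays HDFS."""

    tables: dict = field(default_factory=dict)   # table name -> (kind, location)
    storage: dict = field(default_factory=dict)  # location -> list of file names

    def create_table(self, name: str, location: str, external: bool = False) -> None:
        kind = "EXTERNAL" if external else "MANAGED"
        self.tables[name] = (kind, location)
        self.storage.setdefault(location, [])

    def drop_table(self, name: str) -> None:
        kind, location = self.tables.pop(name)  # metadata is always removed
        if kind == "MANAGED":
            # Hive owns a managed table's data, so the files are deleted too.
            self.storage.pop(location, None)
        # For an EXTERNAL table the files are left where they are.


if __name__ == "__main__":
    wh = ToyWarehouse()
    wh.create_table("sales_managed", "/warehouse/sales_managed")
    wh.create_table("sales_ext", "/data/sales", external=True)
    wh.storage["/warehouse/sales_managed"].append("part-0")
    wh.storage["/data/sales"].append("part-0")
    wh.drop_table("sales_managed")
    wh.drop_table("sales_ext")
    print(wh.tables)   # prints: {}
    print(wh.storage)  # prints: {'/data/sales': ['part-0']}
```

This is why external tables are the usual choice when the same files are shared with other tools: dropping the Hive table does not destroy data that something else still reads.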


8. What is the definition of Hive? What is the present version of Hive? Explain ACID transactions in Hive.

Hive is an open-source data warehouse system built on Hadoop. We can use Hive to analyze and query large datasets with HiveQL, a query language similar to SQL. When this post was written, the latest release of Hive was 0.13.1; newer releases, including the 3.x and 4.x lines, have appeared since. Hive supports ACID (Atomicity, Consistency, Isolation, and Durability) transactions at the row level on transactional tables, starting with Hive 0.14. Row-level ACID support enables the following operations:

  • Insert
  • Delete
  • Update
