Hive interview questions Part-2

 1. What is the maximum size of a string data type supported by Hive? Explain how Hive supports binary formats.

The maximum size of a string data type supported by Hive is 2 GB. Hive supports the text file format by default, and it also supports the binary format sequence files, ORC files, Avro data files, and Parquet files.

  • Sequence file: It is a splittable, compressible, and row-oriented file with a general binary format.
  • ORC file: Optimized row columnar (ORC) format file is a record-columnar and column-oriented storage file. It divides the table in row split. Each split stores the value of the first row in the first column and follows subsequently.
  • Avro data file: It is the same as a sequence file that is splittable, compressible, and row-oriented but without the support of schema evolution and multilingual binding.
  • Parquet file: In Parquet format, along with storing rows of data adjacent to one another, we can also store column values adjacent to each other such that both horizontally and vertically datasets are partitioned.


2. What is the precedence order of Hive configuration?

We are using a precedence hierarchy for setting properties:

  1. The SET command in Hive
  2. The command-line –hiveconf option
  3. Hive-site.XML
  4. Hive-default.xml
  5. Hadoop-site.xml
  6. Hadoop-default.xml


3. If you run a select * query in Hive, why doesn't it run MapReduce?

The hive.fetch.task.conversion property of Hive lowers the latency of MapReduce overhead, and in effect when executing queries such as SELECT, FILTER, LIMIT, etc. it skips the MapReduce function.

4. How can we improve the performance with ORC format tables in Hive?

We can store Hive data in a highly efficient manner in an Optimized Row Columnar (ORC) file format. It can simplify many Hive file format limitations. We can improve the performance by using ORC files while reading, writing, and processing data.

Set hive.compute.query.using.stats-true;
Set hive.stats.dbclass-fs;
CREATE TABLE orc_table (
idint,
name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘\:’
LINES TERMINATED BY ‘\n’
STORES AS ORC;

Comments

Popular posts from this blog

MongoDB - Data Modelling

SPARK - Deployment

SQOOP