Hive interview questions part-6

When you point a partition of a Hive table to a new directory, what happens to the data?
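For reference, a partition is usually repointed with ALTER TABLE ... SET LOCATION. Below is a minimal sketch, assuming a hypothetical table log_messages partitioned by year and month and a made-up HDFS path. The statement only changes the location recorded in the metastore, so files already sitting in the old directory are neither moved nor deleted.

```sql
-- Hypothetical table, partition values, and path, for illustration only.
-- Only the metastore entry changes; existing files stay in the old directory.
ALTER TABLE log_messages PARTITION (year = 2012, month = 1)
SET LOCATION 'hdfs://namenode/data/log_messages/2012/01';
```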

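Hive's CHANGE COLUMN clause has no BEFORE keyword; it accepts only FIRST or AFTER, and it needs the old column name, the new column name, and the type. To place an existing column new_col before x_col, position it after the column that precedes x_col (prev_col below is a placeholder), or use FIRST if x_col is the first column.

ALTER TABLE table_name
CHANGE COLUMN new_col new_col INT
AFTER prev_col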

No. It only reduces the number of files, which makes them easier for the NameNode to manage.

By using the ENABLE OFFLINE clause with the ALTER TABLE statement.
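A short sketch of what that statement can look like, assuming a hypothetical partitioned table named log_messages. Note that this protection feature belongs to older Hive releases; as far as I know it was removed in Hive 2.0, so check your version before relying on it.

```sql
-- Hypothetical table name. Queries against the table fail while it is offline.
ALTER TABLE log_messages ENABLE OFFLINE;

-- The same clause can target a single partition.
ALTER TABLE log_messages PARTITION (year = 2012, month = 1) ENABLE OFFLINE;

-- Bring the table back online.
ALTER TABLE log_messages DISABLE OFFLINE;
```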

By omitting the LOCAL clause in the LOAD DATA statement.
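The difference is easiest to see side by side. The table name and paths here are made up for illustration.

```sql
-- With LOCAL: the path is on the local file system of the machine running
-- the Hive client, and the file is copied into the table's directory.
LOAD DATA LOCAL INPATH '/tmp/employees.txt' INTO TABLE employees;

-- Without LOCAL: the path is in HDFS (the default file system), and the
-- file is moved into the table's directory.
LOAD DATA INPATH '/user/hive/staging/employees.txt' INTO TABLE employees;
```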

The new incoming files are simply added to the target directory, and existing files with the same names are overwritten. Files whose names do not match any of the incoming files continue to exist.

If you add the OVERWRITE clause, all the existing data in the directory is deleted before the new data is written.
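As a sketch, the OVERWRITE keyword goes between the path and INTO TABLE. The table, partition, and path are assumptions made for this example.

```sql
-- Hypothetical partitioned table. Everything already in the target
-- partition's directory is removed before the new file is placed there.
LOAD DATA INPATH '/user/hive/staging/employees_us_ca.txt'
OVERWRITE INTO TABLE employees
PARTITION (country = 'US', state = 'CA');
```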

It creates partitions on the table employees, with the partition values coming from the columns in the SELECT clause. This is called a dynamic partition insert.
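The query this answer describes is not shown, so here is a sketch of a typical dynamic partition insert. The staging table staged_employees and its columns are hypothetical. Hive maps the last columns of the SELECT list, in order, to the partition columns.

```sql
-- Dynamic partitioning usually has to be switched on first.
SET hive.exec.dynamic.partition=true;
-- nonstrict allows every partition column to be dynamic.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- se.cnty and se.st (the last two columns) supply country and state.
INSERT OVERWRITE TABLE employees
PARTITION (country, state)
SELECT se.name, se.salary, se.cnty, se.st
FROM staged_employees se;
```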

A table generating function is a function that takes a single column as its argument and expands it into multiple columns or rows. Example: explode()
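A small sketch of explode() in use. The employees table with an array column named subordinates is an assumption, and the first query relies on SELECT without FROM, which needs a reasonably recent Hive.

```sql
-- On its own, explode() turns one array value into one row per element.
SELECT explode(array(1, 2, 3)) AS n;

-- To keep other columns next to the exploded values, use LATERAL VIEW.
SELECT e.name, sub.subordinate
FROM employees e
LATERAL VIEW explode(e.subordinates) sub AS subordinate;
```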

If we set the property hive.exec.mode.local.auto to true, Hive will try to run suitable (typically small) queries in local mode instead of submitting a MapReduce job to the cluster.
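The property can be set for the current session as below, or permanently in hive-site.xml.

```sql
-- Let Hive decide, per query, whether local mode is appropriate.
SET hive.exec.mode.local.auto=true;
```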

The LIKE operator behaves the same way as the standard SQL LIKE operator used in select queries. Example −

street_name LIKE '%Chi'

But the RLIKE operator uses the more advanced regular expressions available in Java.

Example − street_name RLIKE '.*(Chi|Oho).*' which matches any value that contains either Chi or Oho.
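Put into complete queries, the two predicates could look like this. The addresses table and its name column are hypothetical.

```sql
-- LIKE: simple wildcard matching; '%Chi' matches values that end in Chi.
SELECT name, street_name
FROM addresses
WHERE street_name LIKE '%Chi';

-- RLIKE: Java regular expression; matches values containing Chi or Oho.
SELECT name, street_name
FROM addresses
WHERE street_name RLIKE '.*(Chi|Oho).*';
```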

No, because this kind of join cannot be implemented in MapReduce.

In a join query, the smallest table should be placed first and the largest table last.
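The reason for this ordering is that Hive buffers the rows of the earlier tables and streams the last table through the reducers. A sketch with three hypothetical tables, ordered from smallest to largest, followed by the STREAMTABLE hint as an alternative when the order cannot be changed:

```sql
-- Hypothetical tables: dim_region is the smallest, fact_sales the largest.
SELECT f.order_id, r.region_name, p.product_name
FROM dim_region r
JOIN dim_product p ON (r.region_id = p.region_id)
JOIN fact_sales f ON (p.product_id = f.product_id);

-- Alternative: name the table to stream explicitly, wherever it appears.
SELECT /*+ STREAMTABLE(f) */ f.order_id, r.region_name
FROM fact_sales f
JOIN dim_region r ON (f.region_id = r.region_id);
```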

It controls how the map output is divided among the reducers. It is useful in the case of streaming data.
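Assuming the clause being described is DISTRIBUTE BY (the question itself is not shown here), a sketch on a hypothetical stocks table would be:

```sql
-- All rows with the same symbol go to the same reducer;
-- SORT BY then orders the rows within each reducer.
SELECT s.ymd, s.symbol, s.price_close
FROM stocks s
DISTRIBUTE BY s.symbol
SORT BY s.symbol ASC, s.ymd ASC;
```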

SELECT CAST(price AS FLOAT)

Hive will return NULL.
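Assuming this answer is about a cast whose input cannot be converted, the behaviour can be checked with literals:

```sql
-- 'abc' is not a valid integer, so the cast yields NULL rather than an error.
SELECT CAST('abc' AS INT);

-- A well-formed numeric string converts normally.
SELECT CAST('12.5' AS FLOAT);
```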

No. The name of a view must be unique when compared to all other tables and views present in the same database.

No. A view cannot be the target of an INSERT or LOAD statement.

Indexes occupy space, and there is a processing cost in arranging the values of the column on which the index is created.

SHOW INDEX ON table_name 

This will list all the indexes created on any of the columns in the table table_name.

The values in a column are hashed into a user-defined number of buckets. It is a way to avoid too many partitions or nested partitions while still keeping queries efficient.
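A sketch of a bucketed table definition. The table, its columns, and the bucket count are assumptions chosen for illustration.

```sql
-- Partition by date, then hash user_id into a fixed number of buckets
-- instead of creating one partition per user.
CREATE TABLE weblog (
  user_id   INT,
  url       STRING,
  source_ip STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 96 BUCKETS;
```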

It is a query hint to stream a table into memory before running the query. It is a query optimization technique.

Yes. A partition can be archived. The advantage is that it decreases the number of files the NameNode has to track, and the archived partition can still be queried using Hive. The disadvantage is that queries become less efficient, and archiving does not offer any space savings.
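Archiving is done per partition with ALTER TABLE. A minimal sketch, assuming a hypothetical managed table log_messages partitioned by year and month:

```sql
-- Archiving has to be enabled for the session first.
SET hive.archive.enabled=true;

-- Pack the partition's files into a Hadoop archive (HAR).
ALTER TABLE log_messages ARCHIVE PARTITION (year = 2012, month = 1);

-- Restore the partition to regular files.
ALTER TABLE log_messages UNARCHIVE PARTITION (year = 2012, month = 1);
```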

It is a UDF written in Java to serve a specific need not covered by the existing functions in Hive. It can detect the type of its input arguments programmatically and respond appropriately.

The local inpath should contain a file and not a directory. ${env:HOME} is a valid variable available in the Hive environment.

The TBLPROPERTIES clause is used to add the creator name while creating a table.

The TBLPROPERTIES is added like −

TBLPROPERTIES('creator'='Kishan')
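In context, the clause sits at the end of the CREATE TABLE statement. The table name and columns below are hypothetical.

```sql
-- Record the creator as a table property at creation time.
CREATE TABLE employees_audit (
  name   STRING,
  salary FLOAT
)
TBLPROPERTIES ('creator' = 'Kishan');

-- List the properties to confirm the value was stored.
SHOW TBLPROPERTIES employees_audit;
```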
