Hive interview questions Part-3

 1. Explain the functionality of ObjectInspector.

ObjectInspector helps analyze the internal structure of a row object and the individual structure of columns in Hive. It also provides a uniform way to access complex objects that can be stored in multiple formats in the memory.

  • An instance of Java class
  • A standard Java object
  • A lazily initialized object

ObjectInspector tells the structure of the object and also the ways to access the internal fields inside the object.

2. Whenever we run a Hive query, a new metastore_db is created. Why?

A local metastore is created when we run Hive in an embedded mode. Before creating, it checks whether the metastore exists or not, and this metastore property is defined in the configuration file, hive-site.xml. The property is:

javax.jdo.option.ConnectionURL

with the default value:

jdbc:derby:;databaseName=metastore_db;create=true

Therefore, we have to change the behavior of the location to an absolute path so that from that location the metastore can be used.


3.Differentiate between Hive and HBase.

HiveHBase
Enables most SQL queriesDoes not allow SQL queries
Operations do not run in real timeOperations run in real time
A data warehouse frameworkA NoSQL database
Runs on top of MapReduceRuns on top of HDFS

4. How can we access the subdirectories recursively?

By using the below commands, we can access subdirectories recursively in Hive:

hive> Set mapred.input.dir.recursive=true;
hive> Set hive.mapred.supports.subdirectories=true;

Hive tables can be pointed to the higher level directory, and this is suitable for the directory structure:

/data/country/state/city/

5.  What are the uses of Hive Explode?

Hadoop Developers consider an array as their input and convert it into a separate table row. To convert complicated data types into desired table formats, Hive uses Explode.

6.  What is the available mechanism for connecting applications when we run Hive as a server?

  • Thrift Client: Using Thrift, we can call Hive commands from various programming languages, such as C++, PHP, Java, Python, and Ruby.
  • JDBC Driver: JDBC Driver enables accessing data with JDBC support, by translating calls from an application into SQL and passing the SQL queries to the Hive engine.
  • ODBC Driver: It implements the ODBC API standard for the Hive DBMS, enabling ODBC-compliant applications to interact seamlessly with Hive.

7. How do we write our own custom SerDe?

Mostly, end-users prefer writing a Deserializer instead of using SerDe as they want to read their own data format instead of writing to it, e.g., RegexDeserializer deserializes data with the help of the configuration parameter ‘regex’ and with a list of column names.

If our SerDe supports DDL (i.e., SerDe with parameterized columns and column types), we will probably implement a protocol based on DynamicSerDe, instead of writing a SerDe. This is because the framework passes DDL to SerDe through the ‘Thrift DDL’ format and it’s totally unnecessary to write a “Thrift DDL” parser.

8. Mention various date types supported by Hive.

The timestamp data type stores date in the java.sql.timestamp format.

Three collection data types in Hive are:

  • Arrays
  • Maps
  • Structs

9. Can we run UNIX shell commands from Hive? Can Hive queries be executed from script files? If yes, how? Give an example.

Yes, we can run UNIX shell commands from Hive using an ‘!‘ mark before the command. For example, !pwd at Hive prompt will display the current directory.
We can execute Hive queries from the script files using the source command.



Comments

Popular posts from this blog

MongoDB - Data Modelling

SPARK - Deployment

SQOOP