Spark Interview Questions – Part 2

1. Does Spark provide the storage layer too?

No, Spark does not provide a storage layer, but it lets you use many data sources. It provides the ability to read from almost every popular storage system, such as HDFS, Cassandra, Hive, HBase, and SQL servers.

2. Where does the Spark driver run on YARN?

If you submit a job with --master yarn-client (or, in newer Spark versions, --master yarn --deploy-mode client), the Spark driver runs on the client's machine. If you submit it with --master yarn-cluster (--deploy-mode cluster), the Spark driver runs inside a YARN container on the cluster.
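
For example, a minimal sketch of the two submission modes (the script name my_app.py is just a placeholder):

spark-submit --master yarn --deploy-mode client my_app.py    # driver runs on the client machine
spark-submit --master yarn --deploy-mode cluster my_app.py   # driver runs inside a YARN container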

3. To use Spark on an existing Hadoop cluster, do we need to install Spark on all the nodes of the cluster?

Since Spark runs as an application on top of YARN, it uses YARN to execute its tasks on the cluster's nodes. So you do not need to install Spark on all the nodes. When a job is submitted, the Spark libraries are shipped temporarily to the nodes on which execution is needed.

4. What is SparkContext?

SparkContext is the entry point to Spark. Using SparkContext you create RDDs, which provide various ways of processing data.

5. What is DAG – Directed Acyclic Graph?

A Directed Acyclic Graph (DAG) is a graph data structure whose edges are directional and which does not contain any loops or cycles.

People use DAGs almost all the time. Let's take the example of getting ready for office.

[Figure: a DAG of the steps involved in getting ready for office]

DAG is a way of representing dependencies between objects. It is widely used in computing. Examples of where it is used in computing are:

  1. Build tools such as Apache Ant, Apache Maven, make, and sbt
  2. Task dependencies in project management – Microsoft Project
  3. The data model of Git

6. What is an RDD?

RDD stands for Resilient Distributed Dataset. It is a representation of data distributed across a cluster, which is

  • Immutable – You can operate on an RDD to produce another RDD, but you can't alter it.
  • Partitioned / Parallel – The data in an RDD is operated on in parallel. Any operation on an RDD is carried out by multiple nodes.
  • Resilient – If a node hosting a partition fails, the partition is rebuilt on another node from its lineage.

You can think of an RDD as a big array that is, under the hood, spread over many computers, which are completely abstracted away. So, an RDD is made up of many partitions, each potentially on a different machine.

RDD provides two kinds of operations: Transformations and Actions.
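
As a minimal sketch (assuming an existing SparkContext "sc"): a transformation such as map merely describes a computation, while an action such as collect actually triggers it.

numbers = sc.parallelize([1, 2, 3, 4, 5])   # create an RDD
doubled = numbers.map(lambda x: x * 2)      # transformation: nothing is computed yet
print(doubled.collect())                    # action: computes and returns [2, 4, 6, 8, 10]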

An RDD can hold data of any type from any supported programming language, such as Python, Java, or Scala. The case where each element of an RDD is a tuple of (key, value) is called a Pair RDD. Pair RDDs provide extra functionality such as "group by" and joins.
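
For instance, a small sketch of a Pair RDD (again assuming an existing SparkContext "sc"):

pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])
sums = pairs.reduceByKey(lambda x, y: x + y)   # combines values per key
print(sums.collect())                          # e.g. [('a', 4), ('b', 2)]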

An RDD is generally lazily computed, i.e. it is not computed unless an action on it is called. So, an RDD is either prepared out of another RDD or loaded from a data source. In case it is loaded from a data source, it has a binding between the actual data storage and its partitions. So, an RDD is essentially a pointer to the actual data, not the data itself, unless it is cached.

If a machine that holds a partition of an RDD dies, the same partition is regenerated on another node using the lineage graph of the RDD.
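
In PySpark, you can inspect this lineage graph using toDebugString (a sketch, assuming an existing RDD "myrdd"):

print(myrdd.toDebugString())   # describes this RDD and its recursive dependencies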

If there is a certain RDD that you require very frequently, you can cache it so that it is readily available instead of being recomputed every time. Please note that the cached RDD will be available only during the lifetime of the application. If it is costly to recreate the RDD every time, you may want to persist it to disk.

RDDs can be saved to various data stores (such as HDFS, databases, etc.) in many formats.
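
For example, a simple sketch of saving an RDD as plain text (the output path here is just a placeholder):

myrdd.saveAsTextFile("/data/output_dir")   # writes one text file per partition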

7. What is lazy evaluation and how is it useful?

Imagine there are two restaurants, I (immediate) and P (patient).

In restaurant I, the waiters are very prompt – as soon as you utter an order, they run to the kitchen and place it with the chef. If you order multiple things, the waiter makes multiple trips to the kitchen.

In restaurant P, the waiter patiently listens to your full order, and only once you confirm it does he go to the chef and place it. The waiter might combine multiple dishes and have them prepared together. This can lead to tremendous optimization.

While in restaurant I the work appears to happen immediately, in restaurant P the work is actually faster because multiple items are clubbed together for preparation and serving. What restaurant P does is what we call 'lazy evaluation'.

Examples of lazy evaluation are Spark and Pig (Pig Latin). Examples of immediate execution are the Python interactive shell, SQL, etc.
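
In Spark terms, a minimal sketch of lazy evaluation (assuming an existing SparkContext "sc" and the HDFS file used later in this post):

lines = sc.textFile("/data/file_hdfs.txt")      # nothing is read yet
errors = lines.filter(lambda l: "ERROR" in l)   # still nothing runs
print(errors.count())                           # only this action makes Spark read the file and filter it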

8. How to create an RDD?

You can create an RDD from in-memory data or from a data source such as HDFS.

You can load data from memory using the parallelize method of SparkContext in the following manner, in Python:

myrdd = sc.parallelize([1, 2, 3, 4, 5])

Here myrdd is the variable that represents an RDD created out of an in-memory object. "sc" is the SparkContext, which is readily available if you are running in interactive mode using PySpark. Otherwise, you will have to import SparkContext and initialize it yourself.
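
For a standalone script, that initialization could look like this (a minimal sketch; the application name is arbitrary):

from pyspark import SparkContext

sc = SparkContext(appName="my_app")   # connects using the configuration passed to spark-submit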

And to create an RDD from a file in HDFS, use the following:

linesrdd = sc.textFile("/data/file_hdfs.txt");

This would create linesrdd by loading a file from HDFS. Please note that a path without a scheme, like the one above, works only if HDFS is configured as Spark's default file system (as is typically the case when Spark runs on top of YARN). If you want to load data from an external HDFS cluster, you may have to specify the protocol and name node explicitly:

linesrdd = sc.textFile("hdfs://namenode_host/data/file_hdfs.txt");

In a similar fashion, you can load data from third-party systems.
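
For example, reading from Amazon S3 might look like this (assuming the Hadoop S3A connector is available on the classpath; the bucket name is a placeholder):

linesrdd = sc.textFile("s3a://my_bucket/data/file.txt")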

9. When we create an RDD, does it bring the data and load it into memory?

No. An RDD is made up of partitions which are located on multiple machines. A partition is kept in memory only if the data was loaded from memory, or if the RDD has been cached/persisted in memory. Otherwise, an RDD is just a mapping between the actual data and its partitions.

10. If there is certain data that we want to use again and again in different transformations, what should we do to improve performance?

An RDD can be persisted or cached. There are various ways in which it can be persisted: in memory, on disk, etc. So, if there is a dataset that needs a good amount of computing to arrive at, you should consider caching it. You can cache it to disk if preparing it again is far costlier than just reading it from disk, or if it is very huge and would not fit in RAM. You can cache it in memory if it fits.
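
A minimal sketch of both options (assuming an existing RDD "expensive_rdd"):

from pyspark import StorageLevel

expensive_rdd.cache()   # keep it in memory; shorthand for persist(StorageLevel.MEMORY_ONLY)
# Alternatively (an RDD's storage level can be set only once):
# expensive_rdd.persist(StorageLevel.MEMORY_AND_DISK)   # spill partitions to disk if they don't fit in RAM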
