A reduce action converts an RDD to a single value by applying recursively the provided (in argument) function on the elements of an RDD until only one value is left. The provided function must be commutative and associative – the order of arguments or in what way we apply the function should not make difference.
The following diagram shows the process of applying “sum” reduce function on an RDD containing 1, 2, 3, 4.
Data in MongoDB has a flexible schema.documents in the same collection. They do not need to have the same set of fields or structure Common fields in a collection’s documents may hold different types of data. Data Model Design MongoDB provides two types of data models: — Embedded data model and Normalized data model. Based on the requirement, you can use either of the models while preparing your document. Embedded Data Model In this model, you can have (embed) all the related data in a single document, it is also known as de-normalized data model. For example, assume we are getting the details of employees in three different documents namely, Personal_details, Contact and, Address, you can embed all the three documents in a single one as shown below − { _id : , Emp_ID : "10025AE336" Personal_details :{ First_Name : "Kishan" , Last_Name : "choudhary" , Date_Of_Birth : "1995-09-26" }, Contact : { e - mail : "
Spark application, using spark-submit, is a shell command used to deploy the Spark application on a cluster. It uses all respective cluster managers through a uniform interface. Therefore, you do not have to configure your application for each one. Example Let us take the same example of word count, we used before, using shell commands. Here, we consider the same example as a spark application. Sample Input The following text is the input data and the file named is in.txt . people are not as beautiful as they look, as they walk or as they talk. they are only as beautiful as they love, as they care as they share. Look at the following program − SparkWordCount.scala import org . apache . spark . SparkContext import org . apache . spark . SparkContext . _ import org . apache . spark . _ object SparkWordCount { def main ( args : Array [ String ]) { val sc = new SparkContext ( "local" , "Word Count" , "/usr/local/spark" ,
What is Apache Sqoop? Many of us still wonder what Apache Sqoop is, its architecture, features, uses, and how it is related to big data. In this Sqoop write up, we will talk about everything along with its requirements. Let’s get started! Apache Sqoop is a big data tool for transferring data between Hadoop and relational database servers. Sqoop is used to transfer data from RDBMS (relational database management system) like MySQL and Oracle to HDFS (Hadoop Distributed File System). Big Data Sqoop can also be used to transform data in Hadoop MapReduce and then export it into RDBMS. Sqoop is a data collection and ingestion tool used to import and export data between RDBMS and HDFS. SQOOP = SQL + HADOOP Why do we need Big Data Sqoop? Sqoop Big Data Tool is primarily used for bulk data transfer to and from relational databases or mainframes. Sqoop in Big Data can import from entire tables or allow the user to specify predicates to restrict data selection. You can write directly to HDFS
Comments
Post a Comment