Real Databricks Associate-Developer-Apache-Spark Exam Questions Study Guide [Q54-Q69]

Updated and Accurate Associate-Developer-Apache-Spark Questions for passing the exam Quickly


How to become Databricks Associate Developer Apache Spark certified

  • The Databricks Associate Developer Apache Spark certification is a good way to demonstrate hands-on experience with Apache Spark, one of the most popular big data platforms. It is awarded for passing a certification exam on Apache Spark.

  • The certification is aimed at developers who create applications that run on the Databricks platform. While preparing, you'll learn how to use the Spark shell to interact with the data stored in Databricks, the Spark DataFrame API, and Spark SQL. You'll also gain exposure to a variety of machine learning and data mining algorithms that can be used to solve real-world problems.


What is the Databricks Associate Developer Apache Spark Exam?

The Databricks Associate Developer Apache Spark Exam is a certification that can be earned by anyone who has successfully completed the Databricks Associate Developer Apache Spark Certification Training. The exam covers all the material that was covered in the training. The exam is designed to test your knowledge of the concepts, skills, and abilities that you learned during the course.

Do you want to become a Data Engineer or a Spark Architect? If so, then the Databricks Associate Developer Apache Spark Exam is a must-pass. The Databricks Associate Developer Apache Spark Exam is designed to help you develop a complete understanding of the technology used by the Databricks platform. You will learn about the basics of Spark, including the Spark programming model, Spark SQL, Spark Streaming, and the Spark ecosystem.

The Databricks Associate Developer Apache Spark Exam is a test that aims to assess whether you have the knowledge required to become a certified Apache Spark developer. The Databricks Associate Developer Apache Spark Exam consists of two parts: the first part tests your knowledge of the fundamentals of the Apache Spark framework and the second part tests your ability to apply this knowledge. This post will help you get a head start in preparing for the Databricks Associate Developer Apache Spark Exam.


What does the Databricks Associate Developer Apache Spark Exam cost?

The cost of the Databricks Associate Developer Apache Spark Exam is 200 USD per attempt.

 

NEW QUESTION 54
Which of the following describes tasks?

  • A. A task is a command sent from the driver to the executors in response to a transformation.
  • B. A task is a collection of rows.
  • C. Tasks transform jobs into DAGs.
  • D. Tasks get assigned to the executors by the driver.
  • E. A task is a collection of slots.

Answer: D

Explanation:
Tasks get assigned to the executors by the driver.
Correct! Or, in other words: Executors take the tasks that the driver assigned to them, run them over partitions, and report their outcomes back to the driver.
Tasks transform jobs into DAGs.
No, this statement disrespects the order of elements in the Spark hierarchy. The Spark driver transforms jobs into DAGs. Each job consists of one or more stages. Each stage contains one or more tasks.
A task is a collection of rows.
Wrong. A partition is a collection of rows. Tasks have little to do with a collection of rows. If anything, a task processes a specific partition.
A task is a command sent from the driver to the executors in response to a transformation.
Incorrect. The Spark driver does not send anything to the executors in response to a transformation, since transformations are evaluated lazily. So, the Spark driver would send tasks to executors only in response to actions.
A task is a collection of slots.
No. Executors have one or more slots to process tasks and each slot can be assigned a task.

 

NEW QUESTION 55
Which of the following statements about executors is correct?

  • A. Each node hosts a single executor.
  • B. Executors store data in memory only.
  • C. An executor can serve multiple applications.
  • D. Executors stop upon application completion by default.
  • E. Executors are launched by the driver.

Answer: D

Explanation:
Executors stop upon application completion by default.
Correct. Executors only persist during the lifetime of an application.
A notable exception to that is when Dynamic Resource Allocation is enabled (which it is not by default). With Dynamic Resource Allocation enabled, executors are terminated when they are idle, independent of whether the application has been completed or not.
An executor can serve multiple applications.
Wrong. An executor is always specific to one application. It is terminated when the application completes (for the exception, see above).
Each node hosts a single executor.
No. Each node can host one or more executors.
Executors store data in memory only.
No. Executors can store data in memory or on disk.
Executors are launched by the driver.
Incorrect. Executors are launched by the cluster manager on behalf of the driver.
More info: Job Scheduling - Spark 3.1.2 Documentation, How Applications are Executed on a Spark Cluster | Anatomy of a Spark Application | InformIT, and Spark Jargon for Starters. This blog is to clear some of the... | by Mageswaran D | Medium

 

NEW QUESTION 56
Which of the following code blocks reads all CSV files in directory filePath into a single DataFrame, with column names defined in the CSV file headers?
Content of directory filePath:
_SUCCESS
_committed_2754546451699747124
_started_2754546451699747124
part-00000-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-298-1-c000.csv.gz
part-00001-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-299-1-c000.csv.gz
part-00002-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-300-1-c000.csv.gz
part-00003-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-301-1-c000.csv.gz

spark.option("header",True).csv(filePath)

  • A. spark.read().option("header",True).load(filePath)
  • B. spark.read.format("csv").option("header",True).option("compression","zip").load(filePath)
  • C. spark.read.load(filePath)
  • D. spark.read.format("csv").option("header",True).load(filePath)

Answer: D

Explanation:
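The files in directory filePath are partitions of a DataFrame that have been exported using gzip compression.
Spark automatically recognizes this situation and imports the CSV files as separate partitions into a single DataFrame. It is, however, necessary to specify that Spark should load the file headers in the CSV with the header option, which is set to False by default.
Two of the other code blocks fail for reasons unrelated to compression: in PySpark, spark.read is a property and not a method, so spark.read() raises an error, and load() without format("csv") assumes the default data source, which is Parquet.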

 

NEW QUESTION 57
Which of the following code blocks reduces a DataFrame from 12 to 6 partitions and performs a full shuffle?

  • A. DataFrame.repartition(12)
  • B. DataFrame.repartition(6)
  • C. DataFrame.coalesce(6, shuffle=True)
  • D. DataFrame.coalesce(6)
  • E. DataFrame.coalesce(6).shuffle()

Answer: B

Explanation:
DataFrame.repartition(6)
Correct. repartition() always triggers a full shuffle (different from coalesce()).
DataFrame.repartition(12)
No, this would just leave the DataFrame with 12 partitions and not 6.
DataFrame.coalesce(6)
No. coalesce() does not perform a full shuffle of the data. Whenever you see "full shuffle", you know that you are not dealing with coalesce(). While coalesce() can perform a partial shuffle when required, it will try to minimize shuffle operations, that is, the amount of data that is sent between executors.
Here, 12 partitions can easily be repartitioned to be 6 partitions simply by stitching every two partitions into one.
DataFrame.coalesce(6, shuffle=True) and DataFrame.coalesce(6).shuffle()
These statements are not valid Spark DataFrame API syntax.
More info: Spark Repartition & Coalesce - Explained and Repartition vs Coalesce in Apache Spark - Rock the JVM Blog
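The difference between stitching partitions together and a full shuffle can be sketched without Spark. The following is a toy model in plain Python, not Spark's implementation: the functions coalesce and repartition below are hypothetical stand-ins, the round-robin redistribution is an assumption made for illustration, and partitions are modeled as simple lists of rows.

```python
from itertools import chain

def coalesce(partitions, n):
    """Toy model: merge adjacent partitions into n groups (no redistribution)."""
    size = len(partitions) // n  # assumes len(partitions) is a multiple of n
    return [list(chain.from_iterable(partitions[i * size:(i + 1) * size]))
            for i in range(n)]

def repartition(partitions, n):
    """Toy model: redistribute every row round-robin across n new partitions."""
    out = [[] for _ in range(n)]
    for i, row in enumerate(chain.from_iterable(partitions)):
        out[i % n].append(row)
    return out

def sources_per_partition(partitions):
    """Count how many original partitions contributed rows to each new one."""
    return [len({src for src, _ in part}) for part in partitions]

# 12 partitions with 6 rows each; every row remembers its source partition.
parts = [[(p, r) for r in range(6)] for p in range(12)]

merged = coalesce(parts, 6)
shuffled = repartition(parts, 6)

print(sources_per_partition(merged))    # [2, 2, 2, 2, 2, 2]
print(sources_per_partition(shuffled))  # [12, 12, 12, 12, 12, 12]
```

In this model, each coalesced partition is built from just two neighbouring partitions, while each repartitioned partition receives rows from all twelve originals, which is what makes the shuffle "full".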

 

NEW QUESTION 58
Which of the following code blocks generally causes a great amount of network traffic?

  • A. DataFrame.select()
  • B. DataFrame.count()
  • C. DataFrame.collect()
  • D. DataFrame.rdd.map()
  • E. DataFrame.coalesce()

Answer: C

Explanation:
DataFrame.collect() sends all data in a DataFrame from executors to the driver, so this generally causes a great amount of network traffic in comparison to the other options listed.
DataFrame.coalesce() just reduces the number of partitions and generally aims to reduce network traffic in comparison to a full shuffle.
DataFrame.select() is evaluated lazily and, unless followed by an action, does not cause significant network traffic.
DataFrame.rdd.map() is evaluated lazily and therefore does not cause great amounts of network traffic.
DataFrame.count() is an action. While it does cause some network traffic, for the same DataFrame, collecting all data in the driver would generally be considered to cause a greater amount of network traffic.

 

NEW QUESTION 59
The code block shown below should return a single-column DataFrame with a column named consonant_ct that, for each row, shows the number of consonants in column itemName of DataFrame itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.
DataFrame itemsDf:
+------+----------------------------------+-----------------------------+-------------------+
|itemId|itemName                          |attributes                   |supplier           |
+------+----------------------------------+-----------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|[blue, winter, cozy]         |Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |[red, summer, fresh, cooling]|YetiX              |
|3     |Outdoors Backpack                 |[green, summer, travel]      |Sports Company Inc.|
+------+----------------------------------+-----------------------------+-------------------+
Code block:
itemsDf.select(__1__(__2__(__3__(__4__), "a|e|i|o|u|\s", "")).__5__("consonant_ct"))

  • A. 1. length
    2. regexp_replace
    3. lower
    4. col("itemName")
    5. alias
  • B. 1. lower
    2. regexp_replace
    3. length
    4. "itemName"
    5. alias
  • C. 1. size
    2. regexp_replace
    3. lower
    4. "itemName"
    5. alias
  • D. 1. size
    2. regexp_extract
    3. lower
    4. col("itemName")
    5. alias
  • E. 1. length
    2. regexp_extract
    3. upper
    4. col("itemName")
    5. as

Answer: A

Explanation:
Correct code block:
itemsDf.select(length(regexp_replace(lower(col("itemName")), "a|e|i|o|u|\s", "")).alias("consonant_ct"))
Returned DataFrame:
+------------+
|consonant_ct|
+------------+
|          19|
|          16|
|          10|
+------------+
This question tries to make you think about the string functions Spark provides and in which order they should be applied. Arguably the most difficult part, the regular expression "a|e|i|o|u|\s", is not a numbered blank. However, if you are not familiar with the string functions, it may be a good idea to review those before the exam.
The size operator and the length operator can easily be confused. size works on arrays, while length works on strings. Luckily, this is something you can read up about in the documentation.
The code block works by first converting all uppercase letters in column itemName into lowercase (the lower() part). Then, it replaces all vowels and whitespace characters with "nothing" - an empty string "" (the regexp_replace() part). Now, only lowercase consonants remain in each value. Then, per row, the length operator counts these remaining characters. Note that column itemName in itemsDf does not include any numbers or other characters, so we do not need to make any provisions for these. Finally, by using the alias() operator, we rename the resulting column to consonant_ct.
More info:
- lower: pyspark.sql.functions.lower - PySpark 3.1.2 documentation
- regexp_replace: pyspark.sql.functions.regexp_replace - PySpark 3.1.2 documentation
- length: pyspark.sql.functions.length - PySpark 3.1.2 documentation
- alias: pyspark.sql.Column.alias - PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 2
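The returned counts can be double-checked outside Spark. The sketch below applies the same three steps (lowercase, remove vowels and whitespace, take the length) with Python's re module to the three itemName values shown in the question. The helper name consonant_ct is chosen here for illustration only, and the sketch assumes that Python's regex engine treats this simple pattern the same way Spark's does.

```python
import re

item_names = [
    "Thick Coat for Walking in the Snow",
    "Elegant Outdoors Summer Dress",
    "Outdoors Backpack",
]

def consonant_ct(name):
    # Same order as the Spark expression: lower(), regexp_replace(), length().
    stripped = re.sub(r"a|e|i|o|u|\s", "", name.lower())
    return len(stripped)

counts = [consonant_ct(name) for name in item_names]
print(counts)  # [19, 16, 10]
```

Swapping the order, for example removing vowels before lowercasing, would leave uppercase vowels such as the "E" in "Elegant" and "O" in "Outdoors" in place and inflate the counts.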

 

NEW QUESTION 60
Which of the following code blocks returns a new DataFrame in which column attributes of DataFrame itemsDf is renamed to feature0 and column supplier to feature1?

  • A. itemsDf.withColumnRenamed("attributes", "feature0")
    itemsDf.withColumnRenamed("supplier", "feature1")
  • B. itemsDf.withColumn("attributes", "feature0").withColumn("supplier", "feature1")
  • C. itemsDf.withColumnRenamed("attributes", "feature0").withColumnRenamed("supplier", "feature1")
  • D. itemsDf.withColumnRenamed(col("attributes"), col("feature0"), col("supplier"), col("feature1"))
  • E. itemsDf.withColumnRenamed(attributes, feature0).withColumnRenamed(supplier, feature1)

Answer: C

Explanation:
itemsDf.withColumnRenamed("attributes", "feature0").withColumnRenamed("supplier", "feature1")
Correct! Spark's DataFrame.withColumnRenamed syntax makes it relatively easy to change the name of a column.
itemsDf.withColumnRenamed(attributes, feature0).withColumnRenamed(supplier, feature1)
Incorrect. In this code block, the Python interpreter will try to use attributes and the other column names as variables. Needless to say, they are undefined, and as a result the block will not run.
itemsDf.withColumnRenamed(col("attributes"), col("feature0"), col("supplier"), col("feature1"))
Wrong. The DataFrame.withColumnRenamed() operator takes exactly two string arguments. So, in this answer both using col() and using four arguments is wrong.
itemsDf.withColumnRenamed("attributes", "feature0")
itemsDf.withColumnRenamed("supplier", "feature1")
No. In this answer, only column supplier is renamed in the returned DataFrame, since the result of the first line is not written back to itemsDf.
itemsDf.withColumn("attributes", "feature0").withColumn("supplier", "feature1")
Incorrect. While withColumn works for adding and naming new columns, you cannot use it to rename existing columns. In addition, its second argument must be a Column, not a string.
More info: pyspark.sql.DataFrame.withColumnRenamed - PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 3
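Why the two separate statements lose the first rename, while the chained call keeps both, comes down to DataFrames being immutable. The sketch below models this in plain Python. The Frame class and its with_column_renamed method are hypothetical stand-ins that track column names only; they are not part of any Spark API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    """Hypothetical immutable stand-in for a DataFrame (column names only)."""
    columns: tuple

    def with_column_renamed(self, existing, new):
        # Returns a new Frame; the Frame it is called on is never modified.
        return Frame(tuple(new if c == existing else c for c in self.columns))

items = Frame(("itemId", "itemName", "attributes", "supplier"))

# Separate statements: the result of the first call is discarded.
items.with_column_renamed("attributes", "feature0")
separate = items.with_column_renamed("supplier", "feature1")

# Chained calls: the second rename operates on the result of the first.
chained = (items.with_column_renamed("attributes", "feature0")
                .with_column_renamed("supplier", "feature1"))

print(separate.columns)  # ('itemId', 'itemName', 'attributes', 'feature1')
print(chained.columns)   # ('itemId', 'itemName', 'feature0', 'feature1')
print(items.columns)     # ('itemId', 'itemName', 'attributes', 'supplier')
```

The original object is unchanged in both cases, so any rename that should persist has to be chained or assigned to a variable.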

 

NEW QUESTION 61
Which of the following describes the conversion of a computational query into an execution plan in Spark?

  • A. Depending on whether DataFrame API or SQL API are used, the physical plan may differ.
  • B. Spark uses the catalog to resolve the optimized logical plan.
  • C. The catalog assigns specific resources to the optimized memory plan.
  • D. The executed physical plan depends on a cost optimization from a previous stage.
  • E. The catalog assigns specific resources to the physical plan.

Answer: D

Explanation:
The executed physical plan depends on a cost optimization from a previous stage.
Correct! Spark considers multiple physical plans on which it performs a cost analysis and selects the final physical plan in accordance with the lowest-cost outcome of that analysis. That final physical plan is then executed by Spark.
Spark uses the catalog to resolve the optimized logical plan.
No. Spark uses the catalog to resolve the unresolved logical plan, but not the optimized logical plan. Once the unresolved logical plan is resolved, it is then optimized using the Catalyst Optimizer.
The optimized logical plan is the input for physical planning.
The catalog assigns specific resources to the physical plan.
No. The catalog stores metadata, such as a list of names of columns, data types, functions, and databases.
Spark consults the catalog for resolving the references in a logical plan at the beginning of the conversion of the query into an execution plan. The result is a resolved logical plan, which is then optimized.
Depending on whether DataFrame API or SQL API are used, the physical plan may differ.
Wrong - the physical plan is independent of which API was used. And this is one of the great strengths of Spark!
The catalog assigns specific resources to the optimized memory plan.
There is no specific "memory plan" on the journey of a Spark computation.
More info: Spark's Logical and Physical plans ... When, Why, How and Beyond. | by Laurent Leturgez | datalex | Medium

 

NEW QUESTION 62
Which of the following statements about Spark's execution hierarchy is correct?

  • A. In Spark's execution hierarchy, manifests are one layer above jobs.
  • B. In Spark's execution hierarchy, a job may reach over multiple stage boundaries.
  • C. In Spark's execution hierarchy, executors are the smallest unit.
  • D. In Spark's execution hierarchy, tasks are one layer above slots.
  • E. In Spark's execution hierarchy, a stage comprises multiple jobs.

Answer: B

Explanation:
In Spark's execution hierarchy, a job may reach over multiple stage boundaries.
Correct. A job is a sequence of stages, and thus may reach over multiple stage boundaries.
In Spark's execution hierarchy, tasks are one layer above slots.
Incorrect. Slots are not a part of the execution hierarchy. Tasks are the lowest layer.
In Spark's execution hierarchy, a stage comprises multiple jobs.
No. It is the other way around - a job consists of one or multiple stages.
In Spark's execution hierarchy, executors are the smallest unit.
False. Executors are not a part of the execution hierarchy. Tasks are the smallest unit!
In Spark's execution hierarchy, manifests are one layer above jobs.
Wrong. Manifests are not a part of the Spark ecosystem.

 

NEW QUESTION 63
Which of the following code blocks returns a single-column DataFrame showing the number of words in column supplier of DataFrame itemsDf?
Sample of DataFrame itemsDf:
+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+

  • A. itemsDf.split("supplier", " ").size()
  • B. spark.select(size(split(col(supplier), " ")))
  • C. itemsDf.select(word_count("supplier"))
  • D. itemsDf.split("supplier", " ").count()
  • E. itemsDf.select(size(split("supplier", " ")))

Answer: E

Explanation:
Output of correct code block:
+----------------------------+
|size(split(supplier,  , -1))|
+----------------------------+
|                           3|
|                           1|
|                           3|
+----------------------------+
This question shows a typical use case for the split command: Splitting a string into words. An additional difficulty is that you are asked to count the words. Although it is tempting to use the count method here, the size method (as in: size of an array) is actually the correct one to use. Familiarize yourself with the split and the size methods using the linked documentation below.
More info:
Split method: pyspark.sql.functions.split - PySpark 3.1.2 documentation
Size method: pyspark.sql.functions.size - PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 2
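The split-then-size idea can be reproduced in plain Python as a quick sanity check of the output above. This is only an analogy: str.split stands in for Spark's split function and len for size, using the three supplier values from the sample.

```python
suppliers = ["Sports Company Inc.", "YetiX", "Sports Company Inc."]

# Splitting on a single space yields a list of words per row;
# the length of that list is the per-row word count.
word_counts = [len(supplier.split(" ")) for supplier in suppliers]
print(word_counts)  # [3, 1, 3]
```

The result is one number per row, which is why a per-row array function is needed rather than an aggregate such as count.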

 

NEW QUESTION 64
Which of the following describes characteristics of the Spark driver?

  • A. The Spark driver's responsibility includes scheduling queries for execution on worker nodes.
  • B. The Spark driver processes partitions in an optimized, distributed fashion.
  • C. The Spark driver requests the transformation of operations into DAG computations from the worker nodes.
  • D. If set in the Spark configuration, Spark scales the Spark driver horizontally to improve parallel processing performance.
  • E. In a non-interactive Spark application, the Spark driver automatically creates the SparkSession object.

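Answer: A

Explanation:
The Spark driver's responsibility includes scheduling queries for execution on worker nodes.
Correct. The driver breaks the work into tasks and schedules them on the executors, which run on the worker nodes.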
The Spark driver requests the transformation of operations into DAG computations from the worker nodes.
No, the Spark driver transforms operations into DAG computations itself.
If set in the Spark configuration, Spark scales the Spark driver horizontally to improve parallel processing performance.
No. There is always a single driver per application, but one or more executors.
The Spark driver processes partitions in an optimized, distributed fashion.
No, this is what executors do.
In a non-interactive Spark application, the Spark driver automatically creates the SparkSession object.
Wrong. In a non-interactive Spark application, you need to create the SparkSession object. In an interactive Spark shell, the Spark driver instantiates the object for you.

 

NEW QUESTION 65
Which of the following is a characteristic of the cluster manager?

  • A. The cluster manager receives input from the driver through the SparkContext.
  • B. Each cluster manager works on a single partition of data.
  • C. In client mode, the cluster manager runs on the edge node.
  • D. The cluster manager does not exist in standalone mode.
  • E. The cluster manager transforms jobs into DAGs.

Answer: A

Explanation:
The cluster manager receives input from the driver through the SparkContext.
Correct. In order for the driver to contact the cluster manager, the driver launches a SparkContext. The driver then asks the cluster manager for resources to launch executors.
In client mode, the cluster manager runs on the edge node.
No. In client mode, the cluster manager is independent of the edge node and runs in the cluster.
The cluster manager does not exist in standalone mode.
Wrong, the cluster manager exists even in standalone mode. Remember, standalone mode is an easy means to deploy Spark across a whole cluster, with some limitations. For example, in standalone mode, no other frameworks can run in parallel with Spark. The cluster manager is part of Spark in standalone deployments however and helps launch and maintain resources across the cluster.
The cluster manager transforms jobs into DAGs.
No, transforming jobs into DAGs is the task of the Spark driver.
Each cluster manager works on a single partition of data.
No. Cluster managers do not work on partitions directly. Their job is to coordinate cluster resources so that they can be requested by and allocated to Spark drivers.
More info: Introduction to Core Spark Concepts * BigData

 

NEW QUESTION 66
The code block displayed below contains an error. The code block is intended to perform an outer join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively.
Find the error.
Code block:
transactionsDf.join(itemsDf, [itemsDf.itemId, transactionsDf.productId], "outer")

  • A. The "outer" argument should be eliminated from the call and join should be replaced by joinOuter.
  • B. The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.itemId == transactionsDf.productId.
  • C. The join type needs to be appended to the join() operator, like join().outer() instead of listing it as the last argument inside the join() call.
  • D. The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.col("itemId") == transactionsDf.col("productId").
  • E. The "outer" argument should be eliminated, since "outer" is the default join type.

Answer: B

Explanation:
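Correct code block:
transactionsDf.join(itemsDf, itemsDf.itemId == transactionsDf.productId, "outer")
The join condition needs to be a single expression that compares the two columns. A list of the two columns does not state how they relate to each other. The "outer" argument has to stay, since the default join type is inner.
Static notebook | Dynamic notebook: See test 1 (https://flrs.github.io/spark_practice_tests_code/#1/33.html, https://bit.ly/sparkpracticeexams_import_instructions)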
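What an outer join on itemId == productId produces can be sketched in plain Python. The outer_join helper below is hypothetical and written for illustration only. The transactions rows and their transactionId column are made up, since the question does not show transactionsDf. The sketch also assumes that the two inputs share no column names.

```python
def outer_join(left, right, condition):
    """Full outer join of two lists of dicts on a boolean condition(l, r)."""
    left_cols = sorted({key for row in left for key in row})
    right_cols = sorted({key for row in right for key in row})
    blank_left = dict.fromkeys(left_cols)    # all-None padding for left columns
    blank_right = dict.fromkeys(right_cols)  # all-None padding for right columns
    joined, matched_right = [], set()
    for l_row in left:
        hits = [i for i, r_row in enumerate(right) if condition(l_row, r_row)]
        matched_right.update(hits)
        if hits:
            joined.extend({**l_row, **right[i]} for i in hits)
        else:
            joined.append({**l_row, **blank_right})
    # Right rows that never matched are kept as well, padded with None.
    joined.extend({**blank_left, **r_row}
                  for i, r_row in enumerate(right) if i not in matched_right)
    return joined

# Hypothetical sample data.
transactions = [
    {"transactionId": 1, "productId": 2},
    {"transactionId": 2, "productId": 4},
]
items = [
    {"itemId": 2, "itemName": "Elegant Outdoors Summer Dress"},
    {"itemId": 3, "itemName": "Outdoors Backpack"},
]

result = outer_join(transactions, items,
                    lambda t, i: i["itemId"] == t["productId"])
for row in result:
    print(row)
```

The condition is one boolean comparison between a column of each side, mirroring itemsDf.itemId == transactionsDf.productId. Unmatched rows from both sides survive with None in the other side's columns, which is what distinguishes an outer join from an inner join.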

 

NEW QUESTION 67
Which of the following describes characteristics of the Spark UI?

  • A. There is a place in the Spark UI that shows the property spark.executor.memory.
  • B. The Scheduler tab shows how jobs that are run in parallel by multiple users are distributed across the cluster.
  • C. Via the Spark UI, workloads can be manually distributed across executors.
  • D. Via the Spark UI, stage execution speed can be modified.
  • E. Some of the tabs in the Spark UI are named Jobs, Stages, Storage, DAGs, Executors, and SQL.

Answer: A

Explanation:
There is a place in the Spark UI that shows the property spark.executor.memory.
Correct, you can see Spark properties such as spark.executor.memory in the Environment tab.
Some of the tabs in the Spark UI are named Jobs, Stages, Storage, DAGs, Executors, and SQL.
Wrong - Jobs, Stages, Storage, Executors, and SQL are all tabs in the Spark UI. DAGs can be inspected in the "Jobs" tab in the job details or in the Stages or SQL tab, but are not a separate tab.
Via the Spark UI, workloads can be manually distributed across executors.
No, the Spark UI is meant for inspecting the inner workings of Spark which ultimately helps understand, debug, and optimize Spark transactions.
Via the Spark UI, stage execution speed can be modified.
No, see above.
The Scheduler tab shows how jobs that are run in parallel by multiple users are distributed across the cluster.
No, there is no Scheduler tab.

 

NEW QUESTION 68
The code block displayed below contains an error. The code block should return all rows of DataFrame transactionsDf, but including only columns storeId and predError. Find the error.
Code block:
spark.collect(transactionsDf.select("storeId", "predError"))

  • A. Columns storeId and predError need to be represented as a Python list, so they need to be wrapped in brackets ([]).
  • B. The take method should be used instead of the collect method.
  • C. Instead of collect, collectAsRows needs to be called.
  • D. Instead of select, DataFrame transactionsDf needs to be filtered using the filter operator.
  • E. The collect method is not a method of the SparkSession object.

Answer: E

Explanation:
Correct code block:
transactionsDf.select("storeId", "predError").collect()
collect() is a method of the DataFrame object.
More info: pyspark.sql.DataFrame.collect - PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 2

 

NEW QUESTION 69
......
