exam questions

Exam Certified Data Engineer Professional All Questions

View all questions & answers for the Certified Data Engineer Professional exam

Exam Certified Data Engineer Professional topic 1 question 113 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 113
Topic #: 1
[All Certified Data Engineer Professional Questions]

A Delta Lake table in the Lakehouse named customer_churn_params is used in churn prediction by the machine learning team. The table contains information about customers derived from a number of upstream sources. Currently, the data engineering team populates this table nightly by overwriting the table with the current valid values derived from upstream data sources.

Immediately after each update succeeds, the data engineering team would like to determine the difference between the new version and the previous version of the table.

Given the current implementation, which method can be used?

  • A. Execute a query to calculate the difference between the new version and the previous version using Delta Lake’s built-in versioning and lime travel functionality.
  • B. Parse the Delta Lake transaction log to identify all newly written data files.
  • C. Parse the Spark event logs to identify those rows that were updated, inserted, or deleted.
  • D. Execute DESCRIBE HISTORY customer_churn_params to obtain the full operation metrics for the update, including a log of all records that have been added or modified.
  • E. Use Delta Lake’s change data feed to identify those records that have been updated, inserted, or deleted.
Show Suggested Answer Hide Answer
Suggested Answer: A 🗳️

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
_Lukas_
1 day, 11 hours ago
Selected Answer: A
Correct: A - Since Delta Lake automatically maintains a transaction log with version history, you can easily query the table as it existed before the update (using VERSION AS OF or TIMESTAMP AS OF) and compare it to the current version. This allows you to run standard SQL set operations (like EXCEPT or ANTI JOIN) to identify exactly which rows were added, removed, or changed between the two versions without needing to parse raw logs or enable additional features
upvoted 1 times
...
KadELbied
6 months, 3 weeks ago
Selected Answer: A
Suretly A
upvoted 1 times
...
arekm
10 months, 4 weeks ago
Selected Answer: A
D - see the discussion under Jugiboss comment.
upvoted 1 times
...
Sriramiyer92
11 months, 2 weeks ago
Selected Answer: A
CDF is particularly useful in Incremental loads. In our case it is overwrite. Hence A.
upvoted 2 times
...
Jugiboss
1 year, 1 month ago
Selected Answer: E
he best method to determine the difference between the new version and the previous version of the customer_churn_params table in Delta Lake is: E. Use Delta Lake’s change data feed to identify those records that have been updated, inserted, or deleted. This approach leverages Delta Lake's built-in functionality to track changes at the record level, providing a clear view of what has changed between versions.
upvoted 1 times
Jugiboss
1 year, 1 month ago
Nevermind, it's A
upvoted 2 times
...
arekm
10 months, 4 weeks ago
It is the best method so long as it is enabled, and we don't know. Moreover, CDF is designed for a small portion of the data being changed, not a full overwrite.
upvoted 1 times
...
...
cales
1 year, 1 month ago
Selected Answer: E
Change data feed allows to check for changes between versions
upvoted 1 times
benni_ale
1 year ago
There is no info about cdf being enabled on the table
upvoted 1 times
...
...
robodog
1 year, 3 months ago
Selected Answer: A
Answer is A. The easy way to get the difference between those tables is by travel time by version
upvoted 2 times
...
HelixAbdu
1 year, 4 months ago
Answer is A. There is no clue that CDF is enabled for the table
upvoted 2 times
...
c00ccb7
1 year, 4 months ago
Selected Answer: A
Answer A
upvoted 3 times
...
Deb9753
1 year, 5 months ago
Answer : E
upvoted 2 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...