
Exam Certified Data Engineer Professional topic 1 question 212 discussion

Actual exam question from Databricks' Certified Data Engineer Professional
Question #: 212
Topic #: 1

A team of data engineers is adding tables to a DLT pipeline that contain repetitive expectations for many of the same data quality checks. One member of the team suggests reusing these data quality rules across all tables defined for this pipeline.

What approach would allow them to do this?

  • A. Add data quality constraints to tables in this pipeline using an external job with access to pipeline configuration files.
  • B. Use global Python variables to make expectations visible across DLT notebooks included in the same pipeline.
  • C. Maintain data quality rules in a separate Databricks notebook that each DLT notebook or file can import as a library.
  • D. Maintain data quality rules in a Delta table outside of this pipeline's target schema, providing the schema name as a pipeline parameter.
Suggested Answer: D

Comments

benni_ale
Highly Voted 11 months, 3 weeks ago
Selected Answer: D
https://docs.databricks.com/en/delta-live-tables/expectations.html "You can maintain data quality rules separately from your pipeline implementations. Databricks recommends storing the rules in a Delta table with each rule categorized by a tag."
upvoted 7 times
...
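The Delta-table pattern the docs describe (answer D) can be sketched roughly as below. Table contents, column names, and the `get_rules` helper are illustrative, not from the question; the `dlt`/`spark` calls only run inside a pipeline, so they are shown as comments, with a plain list standing in for the Delta table.

```python
# Sketch of the Databricks-recommended pattern (answer D): keep rules in a
# Delta table, one row per rule, tagged so each dataset can pull its own set.
# In a real pipeline the rows would come from something like:
#   spark.read.table("dq.rules").filter(f"tag = '{tag}'").collect()
# Here a plain list stands in for that table; all names are illustrative.
RULES_TABLE = [
    {"name": "valid_id",   "constraint": "id IS NOT NULL",               "tag": "core"},
    {"name": "valid_date", "constraint": "event_date >= '2020-01-01'",   "tag": "core"},
    {"name": "valid_amt",  "constraint": "amount > 0",                   "tag": "finance"},
]

def get_rules(tag):
    """Return {rule_name: SQL constraint} for every rule carrying the given tag."""
    return {r["name"]: r["constraint"] for r in RULES_TABLE if r["tag"] == tag}

# Inside a DLT notebook, the dict plugs straight into an expectations decorator:
#
# import dlt
#
# @dlt.table
# @dlt.expect_all_or_drop(get_rules("core"))
# def orders_clean():
#     return spark.read.table("orders_raw")

print(get_rules("core"))
```

Because the rules live in data rather than code, adding or tightening a constraint is an `INSERT`/`UPDATE` on the rules table, with no notebook changes.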
stopthisnow
Most Recent 10 hours, 15 minutes ago
Selected Answer: D
Both C and D work; Databricks recommends D.
upvoted 1 times
...
Ral17
1 day, 15 hours ago
Selected Answer: C
Why option C is correct: create a shared notebook with common expectation functions (e.g., check_not_null(), check_valid_date()), then import it into each DLT pipeline notebook via %run or Python imports. This is the DLT-native pattern for code reuse.
upvoted 1 times
...
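The shared-library pattern argued for in option C can be sketched as follows. The module name `dq_rules.py`, the rule names, and the `with_rules` helper are hypothetical; the `dlt` decorator usage is shown as comments since it only runs inside a pipeline.

```python
# Sketch of the shared-module pattern (answer C): common expectations live in
# one helper file (e.g. dq_rules.py) that every DLT notebook imports.
# All names here are illustrative.

# --- contents of the shared helper (dq_rules.py) ---
COMMON_RULES = {
    "not_null_id": "id IS NOT NULL",
    "recent_date": "event_date >= '2020-01-01'",
}

def with_rules(extra=None):
    """Merge the shared rules with any table-specific ones."""
    merged = dict(COMMON_RULES)
    merged.update(extra or {})
    return merged

# --- in each DLT notebook or Python file ---
# from dq_rules import with_rules
# import dlt
#
# @dlt.table
# @dlt.expect_all(with_rules({"positive_amount": "amount > 0"}))
# def payments_clean():
#     return spark.read.table("payments_raw")

print(with_rules({"positive_amount": "amount > 0"}))
```

This centralizes the rules in code under version control; the trade-off versus answer D is that changing a rule means editing and redeploying the module rather than updating a row in a Delta table.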
gizzamo
2 months, 2 weeks ago
Selected Answer: C
Reasoning: DLT expectations are often repeated across multiple tables (e.g., column not null, value ranges, valid enums). To avoid duplication, the best practice is to factor them out into reusable functions or libraries, and then import them into multiple DLT notebooks. Databricks supports importing shared Python modules or notebooks to centralize and reuse logic.
upvoted 2 times
...
ealpuche
2 months, 2 weeks ago
Selected Answer: C
C, For Sure
upvoted 1 times
...
Billybob0604
3 months, 3 weeks ago
Selected Answer: C
The best practice for code reuse is to write the rules once in a shared utility notebook.
upvoted 2 times
...
RajeshMP2023
3 months, 4 weeks ago
Selected Answer: C
Reusability of data quality rules: by maintaining the data quality rules in a separate notebook, the team can centralize the logic for expectations and reuse them across multiple tables and pipelines. This approach ensures consistency and reduces duplication of code.

Importing as a library: Databricks allows you to modularize code by creating reusable notebooks or Python files. These can be imported into other notebooks or DLT pipelines, making it easy to apply the same set of expectations across multiple tables.
upvoted 1 times
...
gloomy_marmot
4 months ago
Selected Answer: D
https://docs.databricks.com/aws/en/dlt/expectation-patterns#portable-and-reusable-expectations Expectations should be stored in a Delta table.
upvoted 1 times
...
happyhelppy
4 months ago
Selected Answer: C
Answer D is confusing where it mentions passing the schema name as a pipeline parameter. Defining expectations in a Python module and importing it later is described in the docs: https://docs.databricks.com/aws/en/dlt/expectation-patterns?language=Python%C2%A0Module#portable-and-reusable-expectations
upvoted 1 times
...
KadELbied
6 months, 3 weeks ago
Selected Answer: D
Surely D
upvoted 1 times
...
lakime
8 months, 1 week ago
Selected Answer: C
Initially C, currently D
upvoted 1 times
...
arekm
10 months, 4 weeks ago
Selected Answer: D
D is what Databricks suggests as of now
upvoted 1 times
...
Thameur01
11 months, 3 weeks ago
Selected Answer: C
To reuse repetitive data quality rules across multiple tables in a Delta Live Tables (DLT) pipeline, the most efficient approach is to maintain these rules in a separate notebook or Python module and import them where needed. This promotes code reusability, maintainability, and consistency.
upvoted 2 times
...