
Exam Certified Data Engineer Professional topic 1 question 212 discussion

Actual exam question from Databricks' Certified Data Engineer Professional
Question #: 212
Topic #: 1

A team of data engineers is adding tables to a DLT pipeline that contain repetitive expectations for many of the same data quality checks. One member of the team suggests reusing these data quality rules across all tables defined for this pipeline.

What approach would allow them to do this?

  • A. Add data quality constraints to tables in this pipeline using an external job with access to pipeline configuration files.
  • B. Use global Python variables to make expectations visible across DLT notebooks included in the same pipeline.
  • C. Maintain data quality rules in a separate Databricks notebook that each DLT notebook or file can import as a library.
  • D. Maintain data quality rules in a Delta table outside of this pipeline's target schema, providing the schema name as a pipeline parameter.
Suggested Answer: D

Comments

benni_ale
Highly Voted 11 months, 3 weeks ago
Selected Answer: D
https://docs.databricks.com/en/delta-live-tables/expectations.html "You can maintain data quality rules separately from your pipeline implementations. Databricks recommends storing the rules in a Delta table with each rule categorized by a tag."
upvoted 7 times
...
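The Delta-table pattern the docs describe (answer D) can be sketched roughly as below. Table contents, column names, and the `get_rules` helper are illustrative, not from the question; the `dlt`/`spark` calls only run inside a pipeline, so they are shown as comments, with a plain list standing in for the Delta table.

```python
# Sketch of the Databricks-recommended pattern (answer D): keep rules in a
# Delta table, one row per rule, tagged so each dataset can pull its own set.
# In a real pipeline the rows would come from something like:
#   spark.read.table("dq.rules").filter(f"tag = '{tag}'").collect()
# Here a plain list stands in for that table; all names are illustrative.
RULES_TABLE = [
    {"name": "valid_id",   "constraint": "id IS NOT NULL",               "tag": "core"},
    {"name": "valid_date", "constraint": "event_date >= '2020-01-01'",   "tag": "core"},
    {"name": "valid_amt",  "constraint": "amount > 0",                   "tag": "finance"},
]

def get_rules(tag):
    """Return {rule_name: SQL constraint} for every rule carrying the given tag."""
    return {r["name"]: r["constraint"] for r in RULES_TABLE if r["tag"] == tag}

# Inside a DLT notebook, the dict plugs straight into an expectations decorator:
#
# import dlt
#
# @dlt.table
# @dlt.expect_all_or_drop(get_rules("core"))
# def orders_clean():
#     return spark.read.table("orders_raw")

print(get_rules("core"))
```

Because the rules live in data rather than code, adding or tightening a constraint is an `INSERT`/`UPDATE` on the rules table, with no notebook changes.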
stopthisnow
Most Recent 10 hours, 15 minutes ago
Selected Answer: D
Both C and D work; Databricks recommends D.
upvoted 1 times
...
Ral17
1 day, 15 hours ago
Selected Answer: C
Why option C is correct: create a shared notebook with common expectation functions (e.g., check_not_null(), check_valid_date()), then import it into each DLT pipeline notebook via %run or Python imports. This is the DLT-native pattern for code reuse.
upvoted 1 times
...
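The shared-library pattern argued for in option C can be sketched as follows. The module name `dq_rules.py`, the rule names, and the `with_rules` helper are hypothetical; the `dlt` decorator usage is shown as comments since it only runs inside a pipeline.

```python
# Sketch of the shared-module pattern (answer C): common expectations live in
# one helper file (e.g. dq_rules.py) that every DLT notebook imports.
# All names here are illustrative.

# --- contents of the shared helper (dq_rules.py) ---
COMMON_RULES = {
    "not_null_id": "id IS NOT NULL",
    "recent_date": "event_date >= '2020-01-01'",
}

def with_rules(extra=None):
    """Merge the shared rules with any table-specific ones."""
    merged = dict(COMMON_RULES)
    merged.update(extra or {})
    return merged

# --- in each DLT notebook or Python file ---
# from dq_rules import with_rules
# import dlt
#
# @dlt.table
# @dlt.expect_all(with_rules({"positive_amount": "amount > 0"}))
# def payments_clean():
#     return spark.read.table("payments_raw")

print(with_rules({"positive_amount": "amount > 0"}))
```

This centralizes the rules in code under version control; the trade-off versus answer D is that changing a rule means editing and redeploying the module rather than updating a row in a Delta table.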
gizzamo
2 months, 2 weeks ago
Selected Answer: C
Reasoning: DLT expectations are often repeated across multiple tables (e.g., column not null, value ranges, valid enums). To avoid duplication, the best practice is to factor them out into reusable functions or libraries, and then import them into multiple DLT notebooks. Databricks supports importing shared Python modules or notebooks to centralize and reuse logic.
upvoted 2 times
...
ealpuche
2 months, 2 weeks ago
Selected Answer: C
C, For Sure
upvoted 1 times
...
Billybob0604
3 months, 3 weeks ago
Selected Answer: C
The best practice for code reuse is to write the rules once in a shared utility notebook.
upvoted 2 times
...
RajeshMP2023
3 months, 4 weeks ago
Selected Answer: C
Reusability of data quality rules: by maintaining the data quality rules in a separate notebook, the team can centralize the logic for expectations and reuse them across multiple tables and pipelines. This approach ensures consistency and reduces duplication of code.

Importing as a library: Databricks allows you to modularize code by creating reusable notebooks or Python files. These can be imported into other notebooks or DLT pipelines, making it easy to apply the same set of expectations across multiple tables.
upvoted 1 times
...
gloomy_marmot
4 months ago
Selected Answer: D
https://docs.databricks.com/aws/en/dlt/expectation-patterns#portable-and-reusable-expectations Expectations should be stored in a Delta table.
upvoted 1 times
...
happyhelppy
4 months ago
Selected Answer: C
Answer D is confusing where it mentions passing the schema name as a pipeline parameter. Defining expectations in a Python module and importing it later is described in the docs: https://docs.databricks.com/aws/en/dlt/expectation-patterns?language=Python%C2%A0Module#portable-and-reusable-expectations
upvoted 1 times
...
KadELbied
6 months, 3 weeks ago
Selected Answer: D
Surely D
upvoted 1 times
...
lakime
8 months, 1 week ago
Selected Answer: C
Initially C, currently D
upvoted 1 times
...
arekm
10 months, 4 weeks ago
Selected Answer: D
D is what Databricks suggests as of now
upvoted 1 times
...
Thameur01
11 months, 3 weeks ago
Selected Answer: C
To reuse repetitive data quality rules across multiple tables in a Delta Live Tables (DLT) pipeline, the most efficient approach is to maintain these rules in a separate notebook or Python module and import them where needed. This promotes code reusability, maintainability, and consistency.
upvoted 2 times
...