Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 166 discussion

A data engineer configured an AWS Glue Data Catalog for data that is stored in Amazon S3 buckets. The data engineer needs to configure the Data Catalog to receive incremental updates.

The data engineer sets up event notifications for the S3 bucket and creates an Amazon Simple Queue Service (Amazon SQS) queue to receive the S3 events.

Which combination of steps should the data engineer take to meet these requirements with LEAST operational overhead? (Choose two.)

  • A. Create an S3 event-based AWS Glue crawler to consume events from the SQS queue.
  • B. Define a time-based schedule to run the AWS Glue crawler, and perform incremental updates to the Data Catalog.
  • C. Use an AWS Lambda function to directly update the Data Catalog based on S3 events that the SQS queue receives.
  • D. Manually initiate the AWS Glue crawler to perform updates to the Data Catalog when there is a change in the S3 bucket.
  • E. Use AWS Step Functions to orchestrate the process of updating the Data Catalog based on S3 events that the SQS queue receives.
Suggested Answer: AB
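The plumbing described in the question stem (S3 event notifications feeding an SQS queue that S3 is allowed to write to) can be sketched as plain request payloads. This is a minimal sketch: the bucket name, queue ARN, and account ID are hypothetical, and the dicts are the arguments you would pass to boto3's `s3.put_bucket_notification_configuration` and `sqs.set_queue_attributes`.

```python
import json

BUCKET = "example-data-bucket"  # hypothetical bucket name
QUEUE_ARN = "arn:aws:sqs:us-east-1:123456789012:glue-crawler-events"  # hypothetical ARN

# S3 event notification: deliver object-created/removed events to the SQS
# queue. This dict is the NotificationConfiguration argument for
# s3.put_bucket_notification_configuration in boto3.
notification_config = {
    "QueueConfigurations": [
        {
            "QueueArn": QUEUE_ARN,
            "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
        }
    ]
}

# The SQS queue needs a resource policy that lets the S3 service publish
# to it; without this, the bucket notification setup is rejected.
queue_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": QUEUE_ARN,
            "Condition": {"ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{BUCKET}"}},
        }
    ],
}

print(json.dumps(notification_config, indent=2))
```

The queue policy is applied as the `Policy` attribute of the queue (JSON-encoded); the crawler then reads change events from this queue instead of re-listing the bucket.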


Comments
data025
1 day, 23 hours ago
Selected Answer: AB
A = primary mechanism (event-driven incremental updates) B = fallback mechanism (periodic incremental updates)
upvoted 1 times
...
AlejandroU
1 month, 2 weeks ago
Selected Answer: AB
A) S3 event-based Glue crawler: configure S3 to send object events to SQS, and set the crawler to "Crawl based on events" using that queue. The crawler ingests only changes (incremental) and avoids full listings. B) Time-based schedule: event-based crawlers still run on a schedule to poll SQS; if there are events, they update the Catalog.
upvoted 1 times
...
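The combination described above (an event-based crawler that also runs on a schedule) maps onto a single `glue.create_crawler` call in boto3: the SQS queue goes in `EventQueueArn` on the S3 target, `CRAWL_EVENT_MODE` makes recrawls incremental, and `Schedule` sets the polling cadence. A minimal sketch, with every name, ARN, and path hypothetical:

```python
# Request parameters for boto3's glue.create_crawler. The crawler targets
# an S3 path, reads change events from the SQS queue (EventQueueArn),
# recrawls only what changed (CRAWL_EVENT_MODE), and polls the queue on a
# cron schedule -- options A and B together.
crawler_params = {
    "Name": "s3-event-driven-crawler",  # hypothetical crawler name
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical role
    "DatabaseName": "analytics_db",  # hypothetical catalog database
    "Targets": {
        "S3Targets": [
            {
                "Path": "s3://example-data-bucket/raw/",
                "EventQueueArn": "arn:aws:sqs:us-east-1:123456789012:glue-crawler-events",
            }
        ]
    },
    # Incremental, event-driven recrawl instead of a full bucket listing.
    "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_EVENT_MODE"},
    "SchemaChangePolicy": {
        "UpdateBehavior": "UPDATE_IN_DATABASE",
        "DeleteBehavior": "LOG",
    },
    # Every 6 hours the crawler polls SQS; with no events it exits quickly.
    "Schedule": "cron(0 */6 * * ? *)",
}
```

Unpacking this dict into `glue.create_crawler(**crawler_params)` would create the crawler; no Lambda code or Step Functions state machine is needed.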
AminTriton
3 months, 1 week ago
Selected Answer: AB
C has higher operational effort: you’d need to write/maintain Lambda code for schema inference, catalog updates, and error handling. Glue already provides managed crawlers.
upvoted 1 times
...
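To make the overhead argument concrete, here is a sketch of the custom code option C would require: a Lambda handler that parses each S3 event out of the SQS batch, derives the Hive-style partition from the object key, and builds the `PartitionInput` that `glue.create_partition` expects. All names and the key layout are hypothetical, and schema inference, column definitions, retries, and error handling are omitted; a managed crawler provides all of that out of the box.

```python
import json
from urllib.parse import unquote_plus

def partition_from_key(bucket: str, key: str) -> dict:
    """Derive a Glue PartitionInput from a key like raw/year=2024/month=05/file.parquet."""
    key = unquote_plus(key)  # S3 events URL-encode object keys
    parts = [p for p in key.split("/") if "=" in p]  # e.g. ["year=2024", "month=05"]
    values = [p.split("=", 1)[1] for p in parts]
    location = f"s3://{bucket}/{'/'.join(key.split('/')[:-1])}/"
    return {
        "Values": values,
        "StorageDescriptor": {"Location": location},  # columns/SerDe omitted in this sketch
    }

def handler(event, context=None):
    """SQS-triggered entry point: one PartitionInput per S3 record in the batch."""
    partitions = []
    for sqs_record in event["Records"]:
        s3_event = json.loads(sqs_record["body"])  # S3 notification is the SQS body
        for rec in s3_event.get("Records", []):
            partitions.append(
                partition_from_key(
                    rec["s3"]["bucket"]["name"], rec["s3"]["object"]["key"]
                )
            )
    # A real implementation would call glue.batch_create_partition here,
    # plus handle duplicates, malformed keys, and schema drift.
    return partitions
```

Even this stripped-down version has to be deployed, permissioned, monitored, and kept in sync with the table schema by hand, which is the operational overhead that rules option C out.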
Ell89
9 months ago
Selected Answer: AC
• A leverages the event-driven capability of Glue Crawlers. • C uses AWS Lambda for direct and real-time updates to the Data Catalog. • This combination ensures incremental updates are made only when changes occur, reducing costs and operational complexity.
upvoted 1 times
...
YUICH
10 months ago
Selected Answer: AB
(A) S3 Event-Based Crawler: Automatically triggers incremental catalog updates whenever new data arrives in the S3 bucket, reducing the need for custom code and manual intervention. (B) Time-Based Schedule: Periodically runs the crawler to catch any missed events and keep the data catalog accurate and up to date. Using both methods minimizes operational overhead while ensuring comprehensive and reliable incremental updates.
upvoted 1 times
...
axantroff
11 months ago
Selected Answer: AB
Check out the design pattern documentation for this case. There's no need for Lambda here, so option C should be excluded. Option B seems viable, along with option A (A is the obvious choice for me). https://aws.amazon.com/blogs/big-data/run-aws-glue-crawlers-using-amazon-s3-event-notifications/
upvoted 1 times
...
michele_scar
1 year ago
Selected Answer: AC
B and D are wrong due to "Manually" and "Scheduling". E is too much for this use case.
upvoted 3 times
...
tucobbad
1 year ago
Selected Answer: AC
- Option A suggests creating an S3 event-based AWS Glue crawler to consume events from the SQS queue. This option is appropriate as it allows the crawler to automatically respond to events, thereby reducing manual intervention and ensuring timely updates to the Data Catalog.
- Option C involves using an AWS Lambda function to directly update the Data Catalog based on S3 events received from the SQS queue. This is a strong candidate as it automates the update process without the need for manual scheduling or intervention, thus minimizing operational overhead.
AWS Glue Crawlers can consume events from an SQS queue: https://docs.aws.amazon.com/glue/latest/dg/crawler-s3-event-notifications.html
upvoted 3 times
...
pikuantne
1 year ago
Selected Answer: AB
Based on this article (Option 1 for the architecture) it should be AB:
1. Run the crawler on a schedule.
2. The crawler polls for object-create events in the SQS queue.
3a. If there are events, the crawler updates the Data Catalog.
3b. If not, the crawler stops.
upvoted 3 times
...
ae35a02
1 year, 1 month ago
Selected Answer: BC
AWS Glue Crawlers cannot consume events from an SQS queue. D introduces a manual operation, and E introduces more complexity, so BC.
upvoted 1 times
tucobbad
1 year ago
Answer is A and C. AWS Glue Crawlers can in fact consume events: https://docs.aws.amazon.com/glue/latest/dg/crawler-s3-event-notifications.html
upvoted 2 times
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other