Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 166 discussion

A data engineer configured an AWS Glue Data Catalog for data that is stored in Amazon S3 buckets. The data engineer needs to configure the Data Catalog to receive incremental updates.

The data engineer sets up event notifications for the S3 bucket and creates an Amazon Simple Queue Service (Amazon SQS) queue to receive the S3 events.

Which combination of steps should the data engineer take to meet these requirements with LEAST operational overhead? (Choose two.)

  • A. Create an S3 event-based AWS Glue crawler to consume events from the SQS queue.
  • B. Define a time-based schedule to run the AWS Glue crawler, and perform incremental updates to the Data Catalog.
  • C. Use an AWS Lambda function to directly update the Data Catalog based on S3 events that the SQS queue receives.
  • D. Manually initiate the AWS Glue crawler to perform updates to the Data Catalog when there is a change in the S3 bucket.
  • E. Use AWS Step Functions to orchestrate the process of updating the Data Catalog based on S3 events that the SQS queue receives.
Suggested Answer: AB
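The plumbing described in the question stem (S3 event notifications feeding an SQS queue that S3 is allowed to write to) can be sketched as plain request payloads. This is a minimal sketch: the bucket name, queue ARN, and account ID are hypothetical, and the dicts are the arguments you would pass to boto3's `s3.put_bucket_notification_configuration` and `sqs.set_queue_attributes`.

```python
import json

BUCKET = "example-data-bucket"  # hypothetical bucket name
QUEUE_ARN = "arn:aws:sqs:us-east-1:123456789012:glue-crawler-events"  # hypothetical ARN

# S3 event notification: deliver object-created/removed events to the SQS
# queue. This dict is the NotificationConfiguration argument for
# s3.put_bucket_notification_configuration in boto3.
notification_config = {
    "QueueConfigurations": [
        {
            "QueueArn": QUEUE_ARN,
            "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
        }
    ]
}

# The SQS queue needs a resource policy that lets the S3 service publish
# to it; without this, the bucket notification setup is rejected.
queue_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": QUEUE_ARN,
            "Condition": {"ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{BUCKET}"}},
        }
    ],
}

print(json.dumps(notification_config, indent=2))
```

The queue policy is applied as the `Policy` attribute of the queue (JSON-encoded); the crawler then reads change events from this queue instead of re-listing the bucket.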


Comments
data025
1 day, 23 hours ago
Selected Answer: AB
A = primary mechanism (event-driven incremental updates) B = fallback mechanism (periodic incremental updates)
upvoted 1 times
...
AlejandroU
1 month, 2 weeks ago
Selected Answer: AB
A) S3 event-based Glue crawler: configure S3 to send object events to SQS, and set the crawler to "Crawl based on events" using that queue. The crawler ingests only changes (incremental) and avoids full listings. B) Time-based schedule: event-based crawlers still run on a schedule to poll SQS; if there are events, they update the Catalog.
upvoted 1 times
...
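The combination described above (an event-based crawler that also runs on a schedule) maps onto a single `glue.create_crawler` call in boto3: the SQS queue goes in `EventQueueArn` on the S3 target, `CRAWL_EVENT_MODE` makes recrawls incremental, and `Schedule` sets the polling cadence. A minimal sketch, with every name, ARN, and path hypothetical:

```python
# Request parameters for boto3's glue.create_crawler. The crawler targets
# an S3 path, reads change events from the SQS queue (EventQueueArn),
# recrawls only what changed (CRAWL_EVENT_MODE), and polls the queue on a
# cron schedule -- options A and B together.
crawler_params = {
    "Name": "s3-event-driven-crawler",  # hypothetical crawler name
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical role
    "DatabaseName": "analytics_db",  # hypothetical catalog database
    "Targets": {
        "S3Targets": [
            {
                "Path": "s3://example-data-bucket/raw/",
                "EventQueueArn": "arn:aws:sqs:us-east-1:123456789012:glue-crawler-events",
            }
        ]
    },
    # Incremental, event-driven recrawl instead of a full bucket listing.
    "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_EVENT_MODE"},
    "SchemaChangePolicy": {
        "UpdateBehavior": "UPDATE_IN_DATABASE",
        "DeleteBehavior": "LOG",
    },
    # Every 6 hours the crawler polls SQS; with no events it exits quickly.
    "Schedule": "cron(0 */6 * * ? *)",
}
```

Unpacking this dict into `glue.create_crawler(**crawler_params)` would create the crawler; no Lambda code or Step Functions state machine is needed.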
AminTriton
3 months, 1 week ago
Selected Answer: AB
C has higher operational effort: you’d need to write/maintain Lambda code for schema inference, catalog updates, and error handling. Glue already provides managed crawlers.
upvoted 1 times
...
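To make the overhead argument concrete, here is a sketch of the custom code option C would require: a Lambda handler that parses each S3 event out of the SQS batch, derives the Hive-style partition from the object key, and builds the `PartitionInput` that `glue.create_partition` expects. All names and the key layout are hypothetical, and schema inference, column definitions, retries, and error handling are omitted; a managed crawler provides all of that out of the box.

```python
import json
from urllib.parse import unquote_plus

def partition_from_key(bucket: str, key: str) -> dict:
    """Derive a Glue PartitionInput from a key like raw/year=2024/month=05/file.parquet."""
    key = unquote_plus(key)  # S3 events URL-encode object keys
    parts = [p for p in key.split("/") if "=" in p]  # e.g. ["year=2024", "month=05"]
    values = [p.split("=", 1)[1] for p in parts]
    location = f"s3://{bucket}/{'/'.join(key.split('/')[:-1])}/"
    return {
        "Values": values,
        "StorageDescriptor": {"Location": location},  # columns/SerDe omitted in this sketch
    }

def handler(event, context=None):
    """SQS-triggered entry point: one PartitionInput per S3 record in the batch."""
    partitions = []
    for sqs_record in event["Records"]:
        s3_event = json.loads(sqs_record["body"])  # S3 notification is the SQS body
        for rec in s3_event.get("Records", []):
            partitions.append(
                partition_from_key(
                    rec["s3"]["bucket"]["name"], rec["s3"]["object"]["key"]
                )
            )
    # A real implementation would call glue.batch_create_partition here,
    # plus handle duplicates, malformed keys, and schema drift.
    return partitions
```

Even this stripped-down version has to be deployed, permissioned, monitored, and kept in sync with the table schema by hand, which is the operational overhead that rules option C out.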
Ell89
9 months ago
Selected Answer: AC
• A leverages the event-driven capability of Glue Crawlers. • C uses AWS Lambda for direct and real-time updates to the Data Catalog. • This combination ensures incremental updates are made only when changes occur, reducing costs and operational complexity.
upvoted 1 times
...
YUICH
10 months ago
Selected Answer: AB
(A) S3 Event-Based Crawler: Automatically triggers incremental catalog updates whenever new data arrives in the S3 bucket, reducing the need for custom code and manual intervention. (B) Time-Based Schedule: Periodically runs the crawler to catch any missed events and keep the data catalog accurate and up to date. Using both methods minimizes operational overhead while ensuring comprehensive and reliable incremental updates.
upvoted 1 times
...
axantroff
11 months ago
Selected Answer: AB
Check out the design pattern documentation for this case. There's no need for Lambda here, so option C should be excluded. Option B seems viable, along with option A (A is the obvious choice for me). https://aws.amazon.com/blogs/big-data/run-aws-glue-crawlers-using-amazon-s3-event-notifications/
upvoted 1 times
...
michele_scar
1 year ago
Selected Answer: AC
B and D are wrong due to "Manually" and "Scheduling". E is too much for this use case.
upvoted 3 times
...
tucobbad
1 year ago
Selected Answer: AC
- Option A suggests creating an S3 event-based AWS Glue crawler to consume events from the SQS queue. This option is appropriate as it allows the crawler to automatically respond to events, thereby reducing manual intervention and ensuring timely updates to the Data Catalog.
- Option C involves using an AWS Lambda function to directly update the Data Catalog based on S3 events received from the SQS queue. This is a strong candidate as it automates the update process without the need for manual scheduling or intervention, thus minimizing operational overhead.
AWS Glue Crawlers can consume events from an SQS queue: https://docs.aws.amazon.com/glue/latest/dg/crawler-s3-event-notifications.html
upvoted 3 times
...
pikuantne
1 year ago
Selected Answer: AB
Based on this article (Option 1 for the architecture) it should be AB:
1. Run the crawler on a schedule.
2. The crawler polls for object-create events in the SQS queue.
3a. If there are events, the crawler updates the Data Catalog.
3b. If not, the crawler stops.
upvoted 3 times
...
ae35a02
1 year, 1 month ago
Selected Answer: BC
AWS Glue Crawlers cannot consume events from an SQS queue. D introduces a manual operation, and E introduces more complexity, so BC.
upvoted 1 times
tucobbad
1 year ago
Answer is A and C. AWS Glue Crawlers can in fact consume events: https://docs.aws.amazon.com/glue/latest/dg/crawler-s3-event-notifications.html
upvoted 2 times
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other