[Q153-Q168] Full Professional-Data-Engineer Practice Test and 270 unique questions with explanations waiting just for you!

Google Cloud Certified Dumps Professional-Data-Engineer Exam for Full Questions - Exam Study Guide


Career Opportunities

Certified individuals can explore a variety of job opportunities. Positions they can take up include Software Engineer, Cloud Architect, Data Engineer, Sales Engineer, Data Scientist, Cloud Developer, and Kubernetes Architect, among others. The salary outlook for these job roles is an average of $128,500 per annum.

 

NEW QUESTION # 153
You need to store and analyze social media postings in Google BigQuery at a rate of 10,000 messages per minute in near real-time. Initially, design the application to use streaming inserts for individual postings.
Your application also performs data aggregations right after the streaming inserts. You discover that the queries after streaming inserts do not exhibit strong consistency, and reports from the queries might miss in-flight data. How can you adjust your application design?

  • A. Estimate the average latency for data availability after streaming inserts, and always run queries after waiting twice as long.
  • B. Load the original message to Google Cloud SQL, and export the table every hour to BigQuery via streaming inserts.
  • C. Re-write the application to load accumulated data every 2 minutes.
  • D. Convert the streaming insert code to batch load for individual messages.

Answer: A


NEW QUESTION # 154
Which methods can be used to reduce the number of rows processed by BigQuery?

  • A. Splitting tables into multiple tables; using the LIMIT clause
  • B. Splitting tables into multiple tables; putting data in partitions
  • C. Splitting tables into multiple tables; putting data in partitions; using the LIMIT clause
  • D. Putting data in partitions; using the LIMIT clause

Answer: B

Explanation:
If you split a table into multiple tables (such as one table for each day), then you can limit your query to the data in specific tables (such as for particular days). A better method is to use a partitioned table, as long as your data can be separated by the day.
If you use the LIMIT clause, BigQuery will still process the entire table.
Reference: https://cloud.google.com/bigquery/docs/partitioned-tables
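
The difference between partition filtering and LIMIT described above can be illustrated with a toy Python model. This is only a sketch: it does not call BigQuery, and the table layout and function names (`build_partitioned_table`, `scan_with_partition_filter`, `scan_with_limit`) are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Toy model only: a "partitioned table" is a dict mapping a day to its rows.
def build_partitioned_table(rows):
    table = defaultdict(list)
    for row in rows:
        table[row["day"]].append(row)
    return table

def scan_with_partition_filter(table, wanted_day):
    """Read only the partition for wanted_day; return (rows, rows_processed)."""
    rows = table.get(wanted_day, [])
    return rows, len(rows)

def scan_with_limit(table, limit):
    """Model the explanation above: the whole table is processed, then truncated."""
    all_rows = [row for part in table.values() for row in part]
    return all_rows[:limit], len(all_rows)

# Three daily partitions with 100 rows each.
rows = [{"day": date(2024, 1, d), "value": i} for d in (1, 2, 3) for i in range(100)]
table = build_partitioned_table(rows)

_, processed_partition = scan_with_partition_filter(table, date(2024, 1, 2))
limited_rows, processed_limit = scan_with_limit(table, 10)

print(processed_partition)  # 100: only one day's partition is read
print(processed_limit)      # 300: every row is processed even though 10 are returned
```

In this model the LIMIT query returns fewer rows but processes just as many, which is why option B lists only table splitting and partitioning.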


NEW QUESTION # 155
Which Java SDK class can you use to run your Dataflow programs locally?

  • A. DirectPipelineRunner
  • B. LocalPipelineRunner
  • C. MachineRunner
  • D. LocalRunner

Answer: A

Explanation:
DirectPipelineRunner allows you to execute operations in the pipeline directly, without any optimization. It is useful for small local executions and tests.
Reference: https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/runners/DirectPipelineRunner


NEW QUESTION # 156
Flowlogistic's management has determined that the current Apache Kafka servers cannot handle the data volume for their real-time inventory tracking system. You need to build a new system on Google Cloud Platform (GCP) that will feed the proprietary tracking software. The system must be able to ingest data from a variety of global sources, process and query in real-time, and store the data reliably. Which combination of GCP products should you choose?

  • A. Cloud Pub/Sub, Cloud Dataflow, and Cloud Storage
  • B. Cloud Pub/Sub, Cloud Dataflow, and Local SSD
  • C. Cloud Load Balancing, Cloud Dataflow, and Cloud Storage
  • D. Cloud Pub/Sub, Cloud SQL, and Cloud Storage

Answer: A

Topic 2, MJTelco Case Study
Company Overview
MJTelco is a startup that plans to build networks in rapidly growing, underserved markets around the world.
The company has patents for innovative optical communications hardware. Based on these patents, they can create many reliable, high-speed backbone links with inexpensive hardware.
Company Background
Founded by experienced telecom executives, MJTelco uses technologies originally developed to overcome communications challenges in space. Fundamental to their operation, they need to create a distributed data infrastructure that drives real-time analysis and incorporates machine learning to continuously optimize their topologies. Because their hardware is inexpensive, they plan to overdeploy the network allowing them to account for the impact of dynamic regional politics on location availability and cost.
Their management and operations teams are situated all around the globe, creating a many-to-many relationship between data consumers and providers in their system. After careful consideration, they decided public cloud is the perfect environment to support their needs.
Solution Concept
MJTelco is running a successful proof-of-concept (PoC) project in its labs. They have two primary needs:
* Scale and harden their PoC to support significantly more data flows generated when they ramp to more than 50,000 installations.
* Refine their machine-learning cycles to verify and improve the dynamic models they use to control topology definition.
MJTelco will also use three separate operating environments - development/test, staging, and production - to meet the needs of running experiments, deploying new features, and serving production customers.
Business Requirements
* Scale up their production environment with minimal cost, instantiating resources when and where needed in an unpredictable, distributed telecom user community.
* Ensure security of their proprietary data to protect their leading-edge machine learning and analysis.
* Provide reliable and timely access to data for analysis from distributed research workers.
* Maintain isolated environments that support rapid iteration of their machine-learning models without affecting their customers.
Technical Requirements
* Ensure secure and efficient transport and storage of telemetry data.
* Rapidly scale instances to support between 10,000 and 100,000 data providers with multiple flows each.
* Allow analysis and presentation against data tables tracking up to 2 years of data storing approximately 100m records/day.
* Support rapid iteration of monitoring infrastructure focused on awareness of data pipeline problems both in telemetry flows and in production learning cycles.
CEO Statement
Our business model relies on our patents, analytics and dynamic machine learning. Our inexpensive hardware is organized to be highly reliable, which gives us cost advantages. We need to quickly stabilize our large distributed data pipelines to meet our reliability and capacity commitments.
CTO Statement
Our public cloud services must operate as advertised. We need resources that scale and keep our data secure.
We also need environments in which our data scientists can carefully study and quickly adapt our models.
Because we rely on automation to process our data, we also need our development and test environments to work as we iterate.
CFO Statement
The project is too large for us to maintain the hardware and software required for the data and analysis. Also, we cannot afford to staff an operations team to monitor so many data feeds, so we will rely on automation and infrastructure. Google Cloud's machine learning will allow our quantitative researchers to work on our high-value problems instead of problems with our data pipelines.


NEW QUESTION # 157
You are developing an application that uses a recommendation engine on Google Cloud. Your solution
should display new videos to customers based on past views. Your solution needs to generate labels for
the entities in videos that the customer has viewed. Your design must be able to provide very fast filtering
suggestions based on data from other customer preferences on several TB of data. What should you do?

  • A. Build and train a classification model with Spark MLlib to generate labels. Build and train a second
    classification model with Spark MLlib to filter results to match customer preferences. Deploy the
    models using Cloud Dataproc. Call the models from your application.
  • B. Build an application that calls the Cloud Video Intelligence API to generate labels. Store data in Cloud
    Bigtable, and filter the predicted labels to match the user's viewing history to generate preferences.
  • C. Build and train a complex classification model with Spark MLlib to generate labels and filter the results.
    Deploy the models using Cloud Dataproc. Call the model from your application.
  • D. Build an application that calls the Cloud Video Intelligence API to generate labels. Store data in Cloud
    SQL, and join and filter the predicted labels to match the user's viewing history to generate
    preferences.

Answer: B


NEW QUESTION # 158
You are the head of BI at a large enterprise company with multiple business units that each have different priorities and budgets. You use on-demand pricing for BigQuery with a quota of 2K concurrent on-demand slots per project. Users at your organization sometimes don't get slots to execute their queries and you need to correct this. You'd like to avoid introducing new projects to your account.
What should you do?

  • A. Increase the amount of concurrent slots per project at the Quotas page at the Cloud Console.
  • B. Switch to flat-rate pricing and establish a hierarchical priority model for your projects.
  • C. Create an additional project to overcome the 2K on-demand per-project quota.
  • D. Convert your batch BQ queries into interactive BQ queries.

Answer: B


NEW QUESTION # 159
MJTelco Case Study
You create a new report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. It is company policy to ensure employees can view only the data associated with their region, so you create and populate a table for each region. You need to enforce the regional access policy to the data.
Which two actions should you take? (Choose two.)

  • A. Ensure all the tables are included in global dataset.
  • B. Adjust the settings for each table to allow a related region-based security group view access.
  • C. Adjust the settings for each view to allow a related region-based security group view access.
  • D. Adjust the settings for each dataset to allow a related region-based security group view access.
  • E. Ensure each table is included in a dataset for a region.

Answer: D,E


NEW QUESTION # 160
You have enabled the free integration between Firebase Analytics and Google BigQuery. Firebase now automatically creates a new table daily in BigQuery in the format app_events_YYYYMMDD. You want to query all of the tables for the past 30 days in legacy SQL. What should you do?

  • A. Use WHERE date BETWEEN YYYY-MM-DD AND YYYY-MM-DD
  • B. Use the WHERE_PARTITIONTIME pseudo column
  • C. Use SELECT IF.(date >= YYYY-MM-DD AND date <= YYYY-MM-DD
  • D. Use the TABLE_DATE_RANGE function

Answer: D

Explanation:
Legacy SQL uses the TABLE_DATE_RANGE function to query a range of daily tables, whereas standard SQL uses a wildcard table with the _TABLE_SUFFIX pseudo column.
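
As a rough sketch of the two query styles, the Python helpers below only build query strings; they do not run anything against BigQuery. The dataset name `my_dataset` and both function names are assumptions made for illustration; only the `app_events_` prefix comes from the question.

```python
def legacy_sql_last_n_days(dataset, table_prefix, days):
    """Build a legacy SQL query over daily tables using TABLE_DATE_RANGE."""
    return (
        "SELECT * FROM TABLE_DATE_RANGE("
        f"[{dataset}.{table_prefix}], "
        f"DATE_ADD(CURRENT_TIMESTAMP(), -{days}, 'DAY'), "
        "CURRENT_TIMESTAMP())"
    )

def standard_sql_last_n_days(dataset, table_prefix, days):
    """Build the standard SQL counterpart using a wildcard table and _TABLE_SUFFIX."""
    return (
        f"SELECT * FROM `{dataset}.{table_prefix}*` "
        "WHERE _TABLE_SUFFIX BETWEEN "
        f"FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL {days} DAY)) "
        "AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())"
    )

# "my_dataset" is a hypothetical dataset name.
legacy_query = legacy_sql_last_n_days("my_dataset", "app_events_", 30)
standard_query = standard_sql_last_n_days("my_dataset", "app_events_", 30)
print(legacy_query)
print(standard_query)
```

The legacy query passes the table prefix and two timestamps to TABLE_DATE_RANGE, which matches option D; the standard SQL variant is shown only for comparison.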


NEW QUESTION # 161
You are building an application to share financial market data with consumers, who will receive data feeds.
Data is collected from the markets in real time. Consumers will receive the data in the following ways:
* Real-time event stream
* ANSI SQL access to real-time stream and historical data
* Batch historical exports
Which solution should you use?

  • A. Cloud Pub/Sub, Cloud Storage, BigQuery
  • B. Cloud Pub/Sub, Cloud Dataproc, Cloud SQL
  • C. Cloud Dataproc, Cloud Dataflow, BigQuery
  • D. Cloud Dataflow, Cloud SQL, Cloud Spanner

Answer: A


NEW QUESTION # 162
You are managing a Cloud Dataproc cluster. You need to make a job run faster while minimizing costs, without losing work in progress on your clusters. What should you do?

  • A. Increase the cluster size with preemptible worker nodes, and configure them to use graceful decommissioning.
  • B. Increase the cluster size with preemptible worker nodes, and configure them to forcefully decommission.
  • C. Increase the cluster size with preemptible worker nodes, and use Cloud Stackdriver to trigger a script to preserve work.
  • D. Increase the cluster size with more non-preemptible workers.

Answer: A

Explanation:
Reference: https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/flex


NEW QUESTION # 163
If you're running a performance test that depends upon Cloud Bigtable, all the choices except one below are recommended steps. Which is NOT a recommended step to follow?

  • A. Use at least 300 GB of data.
  • B. Do not use a production instance.
  • C. Before you test, run a heavy pre-test for several minutes.
  • D. Run your test for at least 10 minutes.

Answer: B

Explanation:
If you're running a performance test that depends upon Cloud Bigtable, be sure to follow these steps as you plan and execute your test:
* Use a production instance. A development instance will not give you an accurate sense of how a production instance performs under load.
* Use at least 300 GB of data. Cloud Bigtable performs best with 1 TB or more of data. However, 300 GB of data is enough to provide reasonable results in a performance test on a 3-node cluster. On larger clusters, use 100 GB of data per node.
* Before you test, run a heavy pre-test for several minutes. This step gives Cloud Bigtable a chance to balance data across your nodes based on the access patterns it observes.
* Run your test for at least 10 minutes. This step lets Cloud Bigtable further optimize your data, and it helps ensure that you will test reads from disk as well as cached reads from memory.
Reference: https://cloud.google.com/bigtable/docs/performance
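
The data-size guidance above (at least 300 GB, and 100 GB per node on larger clusters) can be written as a small helper. This is a sketch of the arithmetic only; the function name `min_test_data_gb` is invented for illustration and is not part of any Google Cloud library.

```python
def min_test_data_gb(node_count):
    """Return the minimum test data size in GB suggested by the guidance above.

    At least 300 GB is recommended (enough for a 3-node cluster);
    larger clusters should use 100 GB of data per node.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return max(300, 100 * node_count)

print(min_test_data_gb(3))   # 300
print(min_test_data_gb(10))  # 1000
```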


NEW QUESTION # 164
You are running a pipeline in Cloud Dataflow that receives messages from a Cloud Pub/Sub topic and writes the results to a BigQuery dataset in the EU. Currently, your pipeline is located in europe-west4 and has a maximum of 3 workers, instance type n1-standard-1. You notice that during peak periods, your pipeline is struggling to process records in a timely fashion, when all 3 workers are at maximum CPU utilization. Which two actions can you take to increase performance of your pipeline? (Choose two.)

  • A. Use a larger instance type for your Cloud Dataflow workers
  • B. Create a temporary table in Cloud Bigtable that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Bigtable to BigQuery
  • C. Change the zone of your Cloud Dataflow pipeline to run in us-central1
  • D. Increase the number of max workers
  • E. Create a temporary table in Cloud Spanner that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Spanner to BigQuery

Answer: A,D


NEW QUESTION # 165
You want to archive data in Cloud Storage. Because some data is very sensitive, you want to use the "Trust No One" (TNO) approach to encrypt your data to prevent the cloud provider staff from decrypting your data. What should you do?

  • A. Use gcloud kms keys create to create a symmetric key. Then use gcloud kms encrypt to encrypt each archival file with the key. Use gsutil cp to upload each encrypted file to the Cloud Storage bucket.
    Manually destroy the key previously used for encryption, and rotate the key once.
  • B. Use gcloud kms keys create to create a symmetric key. Then use gcloud kms encrypt to encrypt each archival file with the key and unique additional authenticated data (AAD). Use gsutil cp to upload each encrypted file to the Cloud Storage bucket, and keep the AAD outside of Google Cloud.
  • C. Specify customer-supplied encryption key (CSEK) in the .boto configuration file. Use gsutil cp to upload each archival file to the Cloud Storage bucket. Save the CSEK in Cloud Memorystore as permanent storage of the secret.
  • D. Specify customer-supplied encryption key (CSEK) in the .boto configuration file. Use gsutil cp to upload each archival file to the Cloud Storage bucket. Save the CSEK in a different project that only the security team can access.

Answer: D
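
Options C and D refer to a customer-supplied encryption key (CSEK) set in the .boto configuration file. As a hedged sketch, a CSEK is a base64-encoded AES-256 key, which could be generated and rendered into a config stanza as follows. The helper names `generate_csek` and `boto_stanza` are invented for illustration; the code only produces text and does not write files or call gsutil.

```python
import base64
import secrets

def generate_csek():
    """Generate a random 256-bit key and return it base64-encoded."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")

def boto_stanza(key_b64):
    """Render the [GSUtil] section that carries the encryption key."""
    return f"[GSUtil]\nencryption_key = {key_b64}\n"

key = generate_csek()
stanza = boto_stanza(key)
print(stanza)
```

Cloud Storage does not keep a copy of a customer-supplied key, so whoever holds it is responsible for storing it safely; losing the key means losing access to the data.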


NEW QUESTION # 166
When a Cloud Bigtable node fails, ____ is lost.

  • A. no data
  • B. all data
  • C. the time dimension
  • D. the last transaction

Answer: A

Explanation:
A Cloud Bigtable table is sharded into blocks of contiguous rows, called tablets, to help balance the workload of queries. Tablets are stored on Colossus, Google's file system, in SSTable format. Each tablet is associated with a specific Cloud Bigtable node.
Data is never stored in Cloud Bigtable nodes themselves; each node has pointers to a set of tablets that are stored on Colossus. As a result:
* Rebalancing tablets from one node to another is very fast, because the actual data is not copied. Cloud Bigtable simply updates the pointers for each node.
* Recovery from the failure of a Cloud Bigtable node is very fast, because only metadata needs to be migrated to the replacement node.
* When a Cloud Bigtable node fails, no data is lost.
Reference: https://cloud.google.com/bigtable/docs/overview
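
The pointer-based design described above can be illustrated with a toy Python model. This is not how Cloud Bigtable is implemented; the class `ToyBigtableCluster`, its methods, and the node and tablet names are all invented, and the `storage` dict merely stands in for Colossus.

```python
class ToyBigtableCluster:
    """Toy model: nodes hold only pointers (tablet ids); data lives in shared storage."""

    def __init__(self, node_names, tablets):
        self.storage = dict(tablets)  # tablet id -> rows; stands in for Colossus
        self.assignments = {name: [] for name in node_names}
        # Spread tablet pointers across nodes round-robin.
        for i, tablet_id in enumerate(sorted(self.storage)):
            self.assignments[node_names[i % len(node_names)]].append(tablet_id)

    def fail_node(self, failed):
        """Remove a node and hand its tablet pointers to the surviving nodes."""
        orphaned = self.assignments.pop(failed)
        survivors = sorted(self.assignments)
        if not survivors:
            raise RuntimeError("no surviving nodes to take over tablets")
        for i, tablet_id in enumerate(orphaned):
            self.assignments[survivors[i % len(survivors)]].append(tablet_id)

    def readable_tablets(self):
        """Return every tablet id that some node currently points to."""
        return sorted(t for ids in self.assignments.values() for t in ids)

tablets = {f"tablet-{i}": [f"row-{i}"] for i in range(6)}
cluster = ToyBigtableCluster(["node-a", "node-b", "node-c"], tablets)
cluster.fail_node("node-b")

print(len(cluster.readable_tablets()))  # 6: every tablet is still reachable
print(cluster.storage == tablets)       # True: stored data was never touched
```

Only the pointer lists change when a node fails, which mirrors the explanation that recovery migrates metadata rather than data.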


NEW QUESTION # 167
You work for a large financial institution that is planning to use Dialogflow to create a chatbot for the company's mobile app. You have reviewed old chat logs and tagged each conversation for intent based on each customer's stated intention for contacting customer service. About 70% of customer requests are simple requests that are solved within 10 intents. The remaining 30% of inquiries require much longer, more complicated requests. Which intents should you automate first?

  • A. Automate the more complicated requests first because those require more of the agents' time
  • B. Automate intents in places where common words such as "payment" appear only once so the software isn't confused
  • C. Automate the 10 intents that cover 70% of the requests so that live agents can handle more complicated requests
  • D. Automate a blend of the shortest and longest intents to be representative of all intents

Answer: C


NEW QUESTION # 168
......


Preparing for the exam requires a combination of hands-on experience with Google Cloud Platform, as well as studying relevant documentation and training materials. Google offers a variety of resources for exam preparation, including online courses, hands-on labs, and practice exams.

 

Authentic Best resources for Professional-Data-Engineer Online Practice Exam: https://www.prepawayete.com/Google/Professional-Data-Engineer-practice-exam-dumps.html

Get the superior quality Professional-Data-Engineer Dumps Questions from PrepAwayETE: https://drive.google.com/open?id=1quAV0hcCCAmlCKs2I4XtQ43fZtvmP4K6
