Within 30 days I passed both the Google Cloud Platform Professional Data Engineer and Architect Certification exams.
However, it took me much longer than 30 days of study and experience to pass the exams.
Fortunately, there was a lot of overlap between the two exams, so if anyone else wants to put their personal life on hold for a few months and attempt something as crazy as passing two of the hardest cloud certifications in a short period of time, here are some tips to help you out.
First, the professional certifications are as much about critical thinking as they are about technical knowledge, meaning you will not know the correct answer for many questions, but you might know which answers are wrong. The test requires process of elimination. When you face a question that does not have an obvious answer, make sure to read all of the answer choices to see if there are any obvious candidates for elimination.
For example, there was a question about architecting a VM-hosted web application and how best to accommodate a business requirement for HTTP failover. You had to decide whether to point the load balancer at the individual VM instances’ IP addresses or at a VM instance GROUP’s IP address. See https://cloud.google.com/solutions/best-practices-floating-ip-addresses#option_3_failover_using_different_priority_routes and https://cloud.google.com/compute/docs/tutorials/high-availability-load-balancing
If you’ve only used deployment templates or worked more with managed services rather than compute (or focused more on development or architecture rather than networking), this is not a situation you’ll come across very often. In the relatively rare case when someone has configured an HTTP load balancer and an instance group *within the GCP console*, they would know you can only point a load balancer to an instance group, not an instance itself; but for the rest of us there is still a way we can figure out the answer.
We should know that HTTP failover implies a load balancer, so any answer not mentioning a layer 7 load balancer can be excluded. That leaves two options: pointing the load balancer at the VM instances or at the instance group. I should mention that traffic technically reaches the instances in both options, but this question is about configuration, not physical architecture.
(Note: layer 7 load balancing is, somewhat oversimplified, HTTP traffic allocation with some routing logic, whereas layer 4 is TCP/UDP traffic allocation with little logic: https://www.nginx.com/resources/glossary/layer-7-load-balancing/)
Let’s assume we don’t know the right answer, but we do know that managed instance groups allow autoscaling of VMs based on usage. We also know enough about IP addresses and load balancers to know that when a new VM instance is created, the load balancer needs to learn the new instance’s IP address; otherwise it won’t know where to forward traffic. Since managed instance groups are often used for scalable web applications, it only makes sense to point the load balancer at the managed instance *group* and not at each individual instance.
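To make that reasoning concrete, here is a minimal toy sketch in Python. It is not GCP code: `InstanceGroup`, `StaticLoadBalancer`, and `GroupLoadBalancer` are hypothetical names invented for illustration, and the IP addresses are made up. It only shows why a load balancer configured with a fixed list of instance IPs misses a newly autoscaled VM, while one that references the group does not.

```python
class InstanceGroup:
    """Toy stand-in for a managed instance group (hypothetical, not a GCP API)."""

    def __init__(self):
        self.ips = []

    def scale_out(self):
        # Pretend the autoscaler created a new VM with a fresh internal IP.
        ip = f"10.0.0.{len(self.ips) + 2}"
        self.ips.append(ip)
        return ip


class StaticLoadBalancer:
    """Configured once with a copy of the individual instance IPs."""

    def __init__(self, ips):
        self._ips = list(ips)  # snapshot taken at configuration time

    def backends(self):
        return list(self._ips)


class GroupLoadBalancer:
    """Configured with a reference to the group, so membership stays current."""

    def __init__(self, group):
        self._group = group

    def backends(self):
        return list(self._group.ips)


group = InstanceGroup()
group.scale_out()
group.scale_out()

static_lb = StaticLoadBalancer(group.ips)
group_lb = GroupLoadBalancer(group)

# Traffic spikes and the group adds a third VM.
new_ip = group.scale_out()

print(new_ip in static_lb.backends())  # the static list never learns about the new VM
print(new_ip in group_lb.backends())   # the group-backed balancer sees it immediately
```

The same line of thinking applies to failover: if an instance is replaced, only the configuration that tracks the group keeps routing to healthy members.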
Speaking of networking, you’ll need to study and know a lot of networking. Some examples of terms and concepts to be familiar with (non-exhaustive):
- CIDR: https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing
  - Pronounced “Cider”
- RFC 1918 Address Space: https://help.it.ox.ac.uk/network/rfc1918/index
- Subnet Mask: https://www.iplocation.net/subnet-mask
- GCP VPC: https://cloud.google.com/vpc
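If CIDR notation, subnet masks, and RFC 1918 ranges are still abstract, Python’s standard `ipaddress` module is a quick way to experiment with them. The sketch below is only a study aid; the helper names `describe_cidr` and `ranges_overlap` and the example ranges are my own assumptions, and it handles IPv4 only.

```python
import ipaddress

# The three private IPv4 blocks defined by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def describe_cidr(cidr):
    """Break an IPv4 CIDR range into the pieces exam questions tend to ask about."""
    net = ipaddress.ip_network(cidr)
    return {
        "network": str(net.network_address),
        "netmask": str(net.netmask),           # the subnet mask form of the prefix
        "prefix_length": net.prefixlen,        # the number after the slash
        "total_addresses": net.num_addresses,  # 2 ** (32 - prefix_length)
        "is_rfc1918": any(net.subnet_of(block) for block in RFC1918_BLOCKS),
    }


def ranges_overlap(a, b):
    """Return True if two CIDR ranges share any addresses."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))


print(describe_cidr("10.128.0.0/20"))
print(ranges_overlap("10.0.0.0/16", "10.0.1.0/24"))
```

Checking for overlap is worth practicing because subnet ranges in VPC networks that you want to peer must not overlap.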
Related to networking, you’ll need to know how data is shared between GCP organizations, on-premises systems, and other cloud providers. There are a lot of options for this, and all are situational, so understand the differences between:
- Peering: https://cloud.google.com/vpc/docs/vpc-peering
- Cloud Interconnect: https://cloud.google.com/interconnect/docs/concepts
  - Dedicated and Partner
- Data transfer services: https://cloud.google.com/bigquery-transfer/docs/how-to
- Data transfer appliance: https://cloud.google.com/transfer-appliance
On the subject of data, the Data Engineer certification had much more architecture than I expected. You’ll need to understand both application data architecture and analytics data architecture. There’s so much information, but at a high level, you’ll need to know when to use:
- Cloud Storage: https://cloud.google.com/products/storage
- Cloud Firestore / Datastore: https://cloud.google.com/firestore
  - Firestore is the next version of Datastore: https://cloud.google.com/datastore/docs/firestore-or-datastore
- Cloud Bigtable: https://cloud.google.com/bigtable
- Cloud Spanner: https://cloud.google.com/spanner
- Cloud SQL: https://cloud.google.com/sql
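One way to rehearse the "when to use which" decision is to write your own mental flowchart down as code. The sketch below is a heavily oversimplified assumption of mine, not official guidance: the `Workload` fields and the `suggest_storage` function are hypothetical, and real exam questions add factors such as latency, consistency, cost, and access patterns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical, simplified description of a storage requirement."""
    structured: bool       # rows/documents rather than raw files or blobs
    relational: bool       # needs SQL, joins, and transactions
    very_high_scale: bool  # beyond what a single database instance handles well


def suggest_storage(w):
    """Return a first-guess GCP storage product for the workload (rough heuristic)."""
    if not w.structured:
        return "Cloud Storage"  # object storage for unstructured data
    if w.relational:
        # Spanner scales relational data horizontally; Cloud SQL is the managed single-region database.
        return "Cloud Spanner" if w.very_high_scale else "Cloud SQL"
    # Non-relational: wide-column at very high throughput vs. a document database.
    return "Cloud Bigtable" if w.very_high_scale else "Cloud Firestore"


print(suggest_storage(Workload(structured=True, relational=True, very_high_scale=False)))
```

If you can explain where this heuristic breaks down for a given scenario, you are studying the right things.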
You’ll also need to understand the business cases for the different ML and AI Platform services: https://cloud.google.com/ai-platform. For example, when is it better to use one of GCP’s pre-trained ML APIs (e.g. Vision API) vs. training your own model in AutoML vs. deploying your own custom-built models using a tool like AI Platform Prediction (https://cloud.google.com/ai-platform/prediction/docs/overview)?
It’s difficult to describe my full experience without turning this article into even more of a study guide, but allow me to give some helpful resources.
My starting point was Earl Gay’s excellent study guide on Medium (https://medium.com/@earlg3/google-cloud-architect-exam-study-materials-updates-for-2019-re-certification-c4894d3a82e7). It has a lot of helpful links which I will not reproduce in this article, so check out Earl’s guide for more info.
https://grumpygrace.dev/posts/gcp-flowcharts/: If you are able to explain why every decision was made in every single flowchart on this site, then you should be able to pass both GCP Architect and Data Engineer Professional Certifications.
To gain that knowledge, the most complete online courses I found were at Linux Academy: https://linuxacademy.com/course/google-cloud-certified-professional-cloud-architect/ for Cloud Architect and https://linuxacademy.com/course/google-cloud-data-engineer/ for Data Engineer.
The Linux Academy courses also contain practice tests with different questions than the sample test provided by GCP.
Most people consider Coursera first when they want to study online. I found the Coursera options to be lacking, both in practical training and in content, so I would not recommend taking them unless you have a lot of experience in GCP and only need a refresher. The course progression through their certificate tracks is also confusing, as you often just hope you’re taking the correct class for a given certificate. (I took an entire course on Kubernetes before I realized it belonged to the Application Developer track and not Cloud Architect.)
The Coursera practice exam questions were almost identical to the sample practice exams provided by GCP, so there wasn’t a lot of benefit in taking the Coursera practice exams if you had already taken the free GCP practice exam.
(Note: This article is in no way sponsored by Linux Academy, nor at the time of writing does TheoryLane have any form of business relationship with Linux Academy. These opinions are from my experience alone and may not reflect the views of others at TheoryLane.)
Stay in Touch!
I hope this information was helpful, or at least guided you to information that was helpful.
If you have any questions or would just like to connect, feel free to reach out to me on LinkedIn: https://www.linkedin.com/in/daniel-smith-data-scientist/ or use the contact form below.