Interpreting Patterns in Multi-Variate Multi-Horizon Time-Series Forecasts from Google’s Temporal Fusion Transformer Model

Wow, that title is a mouthful… But it’s not as complicated as it seems. Let’s break it down:

  • Multi-Variate Time-Series Forecasts – Single-variate time-series forecasting uses only the historical values of the series whose future values we are attempting to predict. (For example, exponential smoothing, moving average, auto-regressive moving average.) Multi-variate forecasting allows additional time-series and non-time-series variables to be included in the model to enhance its predictive capability and give a better understanding of what influences our target value(s). (For example, including the weighted seven-day moving average sentiment of news articles about a company when forecasting its stock price for tomorrow; see the short sketch after this list.)
  • Multi-Horizon Time-Series Forecasts – Traditional time-series forecasting is typically optimized for a specified number of periods ahead (for example, a produce department predicting next week’s potato sales to determine inventory). Multi-horizon means we attempt to predict many different future periods within the same model. (For example, predicting daily potato sales for every day over the next four weeks to reduce the number of orders and restocking times to schedule.)
  • Interpreting Patterns – A good model doesn’t only provide an accurate prediction; it also gives insight into which inputs are driving the results, that is, the model is interpretable.
  • Temporal Fusion Transformer – The name of the proposed multi-horizon time-series forecasting framework. It combines elements of Long Short-Term Memory (LSTM) recurrent neural networks and a mechanism popularized by sequence-to-sequence and Transformer models called “Attention” (we’ll talk more about attention later).
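
To make the first two bullets concrete, here is a small pandas sketch (the data and column names are made up for illustration) that builds one multi-variate covariate, a weighted seven-day moving average of a daily news-sentiment score, and frames a multi-horizon problem by creating one target column per future day:

    import numpy as np
    import pandas as pd

    # Toy daily data: a closing price plus a news-sentiment score (illustrative only).
    rng = np.random.default_rng(0)
    df = pd.DataFrame(
        {
            "date": pd.date_range("2020-01-01", periods=120, freq="D"),
            "close": 100 + rng.normal(0, 1, 120).cumsum(),
            "sentiment": rng.normal(0, 1, 120),
        }
    ).set_index("date")

    # Multi-variate feature: weighted 7-day moving average of sentiment,
    # with more weight placed on recent days.
    weights = np.arange(1, 8, dtype=float)
    df["sentiment_wma7"] = (
        df["sentiment"]
        .rolling(7)
        .apply(lambda window: np.dot(window, weights) / weights.sum(), raw=True)
    )

    # Multi-horizon targets: one column per future day over the next 28 days,
    # so a single model learns to predict every horizon at once.
    for h in range(1, 29):
        df[f"close_t_plus_{h}"] = df["close"].shift(-h)

    print(df.dropna().head())

A framework like the Temporal Fusion Transformer consumes this kind of covariate and multi-horizon target structure directly, rather than requiring a separate model per horizon.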

Notes from passing both GCP Cloud Architect and Data Engineer Professional Certifications in 30 days

Within 30 days I passed both the Google Cloud Platform Professional Data Engineer and Professional Cloud Architect certification exams.

However, it took me much longer than 30 days of study and experience to pass the exams.

Fortunately, there was a lot of overlap between the two exams, so if anyone else wants to put their personal life on hold for a few months and attempt something as crazy as passing two of the hardest cloud certifications in a short period of time, here are some tips to help you out.

First, the professional certifications are just as much about technical knowledge as they are about critical thinking – meaning you will not know the correct answer for many questions, but you might know the wrong answers. The test requires process of elimination. When you face a question that does not have an obvious answer, make sure to read the other answer choices to see if there are any obvious candidates for elimination.

For example, there was a question about architecting a VM-hosted web application and how best to accommodate a business requirement for HTTP failover. You had to decide whether to point the load balancer to individual VM instances’ IP addresses or to a VM instance GROUP’s IP address. See https://cloud.google.com/solutions/best-practices-floating-ip-addresses#option_3_failover_using_different_priority_routes and https://cloud.google.com/compute/docs/tutorials/high-availability-load-balancing

If you’ve only used deployment templates or worked more with managed services rather than Compute Engine – or focused more on development or architecture rather than networking – this is not a situation you’ll come across very often. In the relatively rare case where someone has configured an HTTP load balancer and an instance group *within the GCP console*, they would know you can only point a load balancer to an instance group, not to an instance itself; but for the rest of us there is still a way to figure out the answer.

We should know that HTTP failover implies a load balancer, so any answer not mentioning a layer 7 load balancer should be excluded. That leaves us with the options of pointing the load balancer either to the VM instances or to the instance group. I should mention that we are technically pointing the load balancer at the instances in both options, but this is about configuration, not physical architecture.

(Note: Layer 7 load balancing is, somewhat oversimplified, HTTP traffic allocation with application-level logic, whereas layer 4 is TCP/UDP traffic allocation with little logic: https://www.nginx.com/resources/glossary/layer-7-load-balancing/)

Let’s assume we don’t know the right answer, but we do know that managed instance groups allow autoscaling of VMs based on usage, and we know enough about IP addresses and load balancers to know that if a new VM instance is created, the load balancer needs to know the new instance’s IP address; otherwise the load balancer won’t know where to forward traffic. So, knowing that managed instance groups are often used for scalable web applications, it only makes sense to point the load balancer at the managed instance *group* rather than at each individual instance.
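
To make that concrete, here is a rough sketch of attaching a backend for an HTTP(S) load balancer to a managed instance group with the google-cloud-compute Python client library. The project, zone, and resource names are placeholders I’ve invented, and a complete load balancer also needs a URL map, target proxy, and forwarding rule; the point is that a backend’s group field takes an instance group (or network endpoint group) URL, never an individual VM instance:

    from google.cloud import compute_v1

    project = "my-project"      # placeholder project ID
    zone = "us-central1-a"      # placeholder zone

    # Look up the managed instance group created for the web tier.
    mig_client = compute_v1.InstanceGroupManagersClient()
    mig = mig_client.get(project=project, zone=zone, instance_group_manager="web-mig")

    # Define a backend that points at the instance GROUP, not at any single VM.
    backend = compute_v1.Backend()
    backend.group = mig.instance_group          # URL of the underlying instance group
    backend.balancing_mode = "UTILIZATION"

    backend_service = compute_v1.BackendService()
    backend_service.name = "web-backend-service"
    backend_service.protocol = "HTTP"
    backend_service.load_balancing_scheme = "EXTERNAL"
    backend_service.health_checks = [f"projects/{project}/global/healthChecks/web-health-check"]
    backend_service.backends = [backend]

    # Create the backend service; the URL map, target HTTP proxy, and forwarding
    # rule that complete the load balancer would then reference this service.
    bs_client = compute_v1.BackendServicesClient()
    operation = bs_client.insert(project=project, backend_service_resource=backend_service)
    operation.result()  # wait for the create operation to finish

Because new or replaced VMs in the group are picked up automatically through the instance group, the load balancer keeps forwarding traffic correctly as the group scales or recovers from failures.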

Speaking of networking, you’ll need to study and know a lot of networking. Some examples of terms and concepts to be familiar with (non-exhaustive):

Related to networking, you’ll need to know how data is shared between GCP organizations, on-premises systems, and other cloud providers. There are a lot of options for this, and all are situational, so understand the differences between:

On the subject of data, the Data Engineer certification had much more architecture than I expected; you’ll need to understand both application data architecture and analytics data architecture. There’s a lot of information, but at a high level, you’ll need to know when to use:

As well as understanding the different business cases for when to use the different ML and AI Platform services: https://cloud.google.com/ai-platform. For example, when is it better to use one of GCP’s pre-trained ML APIs (e.g. the Vision API) vs. training your own model in AutoML vs. deploying your own custom-built models using a tool like AI Platform Prediction (https://cloud.google.com/ai-platform/prediction/docs/overview)?
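
As a rough illustration of the “pre-trained API” end of that spectrum, here is roughly what calling the Vision API from Python looks like (the file name is made up); when the generic labels it returns are not specific enough for your business problem, that is usually the signal to move toward AutoML or a custom model served on AI Platform:

    from google.cloud import vision

    client = vision.ImageAnnotatorClient()

    # Read a local image; in practice this could also be a Cloud Storage URI.
    with open("store-shelf.jpg", "rb") as f:    # placeholder file name
        image = vision.Image(content=f.read())

    # Pre-trained label detection: no training data or model management required.
    response = client.label_detection(image=image)
    for label in response.label_annotations:
        print(label.description, round(label.score, 3))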

Learning Resources

It’s difficult to describe my full experience without turning this article into even more of a study guide, but allow me to give some helpful resources.

My starting point was Earl Gay’s excellent study guide on Medium: https://medium.com/@earlg3/google-cloud-architect-exam-study-materials-updates-for-2019-re-certification-c4894d3a82e7  It has a lot of helpful links which I will not reproduce in this article, so check out Earl’s guide for more info.

https://grumpygrace.dev/posts/gcp-flowcharts/: If you are able to explain why every decision was made in every single flowchart on this site, then you should be able to pass both GCP Architect and Data Engineer Professional Certifications.

In order to gain that knowledge, the most complete online courses I found were at Linux Academy. https://linuxacademy.com/course/google-cloud-certified-professional-cloud-architect/ for Cloud Architect; and https://linuxacademy.com/course/google-cloud-data-engineer/ for Data Engineer.

The Linux Academy courses also contain practice tests with different questions than the sample test provided by GCP.

Most people consider Coursera first when they want to study online. In my opinion, the Coursera options are lacking, both in practical training and in content, so I would not recommend taking them unless you already have a lot of experience in GCP and only need a refresher. Also, the course progression through their certificate tracks is confusing, as you often just hope you’re taking the correct class for a given certificate (I took an entire course on Kubernetes before I realized I was in the Application Developer track and not the Cloud Architect track).

The Coursera practice exam questions were almost identical to the sample practice exams provided by GCP, so there wasn’t much benefit in taking the Coursera practice exams if you had already taken the free GCP practice exam.

(Note: This article is in no way sponsored by Linux Academy, nor at the time of writing does TheoryLane have any form of business relationship with Linux Academy. These opinions are from my experience alone and may not reflect the views of others at TheoryLane.)

Stay in Touch!

I hope this information was helpful, or at least guided you to information that was helpful.

If you have any questions or would just like to connect, feel free to reach out to me on LinkedIn: https://www.linkedin.com/in/daniel-smith-data-scientist/ or use the contact form below.

New DataFlow Job Metrics vs. StackDriver

Promoted as new capabilities in “DataFlow observability”, GCP is finally giving us the ability to see a time-series graph of CPU utilization and throughput for a given DataFlow job within the DataFlow console.

https://cloud.google.com/blog/products/data-analytics/better-data-pipeline-observability-for-batch-and-stream-processing

Before, we used StackDriver (which is getting rebranded, by the way) to view the VM CPU utilization from our DataFlow jobs. The new DataFlow capabilities do not replace aggregate monitoring and alerting in StackDriver; rather, StackDriver and the new observability features serve different use cases – the observability functions are more for DataFlow job debugging and optimization, while StackDriver is for holistic tracking and monitoring. That is, the DataFlow UI is for job-specific DataFlow ops, and StackDriver is for administration.

[Screenshot: the DataFlow Job details page, with the new JOB METRICS tab shown alongside JOB GRAPH and a “back to old job page” link]

Those of us who have used StackDriver will appreciate the added visibility into CPU and throughput, as all we previously had in the console was the resource metrics panel to the right of the job topology.

I did a quick run of the standard wordcount example to generate some data. The new graphs are simple and to the point. I like them.
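
For reference, here is a minimal sketch of the kind of wordcount pipeline that produces this data (the project, region, buckets, and step names are placeholders, not the exact sample code); the named steps are what appear as individual series in the new throughput chart:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Illustrative Dataflow options; project, region, and buckets are placeholders.
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | "read" >> beam.io.ReadFromText("gs://dataflow-samples/shakespeare/kinglear.txt")
            | "split" >> beam.FlatMap(lambda line: line.split())
            | "pair" >> beam.Map(lambda word: (word, 1))
            | "group" >> beam.CombinePerKey(sum)
            | "format" >> beam.MapTuple(lambda word, count: f"{word}: {count}")
            | "write" >> beam.io.WriteToText("gs://my-bucket/wordcount/output")
        )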

[Screenshot: the Throughput (elements/sec) chart for the wordcount job on Mar 2, 2020, showing per-step throughput of roughly 262 elements/sec for read/Read, split, group/Read, group/Reify, and group/Write, with a “Create alerting policy” option]

Now we can see specifically which ops are taking the most IO and CPU for a given job – without the overhead of creating a new StackDriver dashboard or filtering to a specific job. In fact, there’s no real way to get this level of visual detail out of the box in StackDriver. (At least not that I’m aware of; let me know if there is a simple configuration setting I’ve been overlooking!) In StackDriver the minimum alignment period is 1 minute, so the best we can do is see operation counts or vCPUs per minute. In the new DataFlow UI we can see throughput and vCPU per second.
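
For comparison, here is a rough sketch of pulling a DataFlow job metric through the Cloud Monitoring (StackDriver) API with the google-cloud-monitoring client library, aligned to the 1-minute minimum mentioned above. The project ID, job name, and metric type are assumptions for illustration (and ALIGN_RATE assumes the metric is a cumulative counter):

    import time
    from google.cloud import monitoring_v3

    project_id = "my-project"                  # placeholder project
    client = monitoring_v3.MetricServiceClient()

    now = int(time.time())
    interval = monitoring_v3.TimeInterval(
        {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
    )

    # Align raw points into 60-second buckets, the minimum alignment period.
    aggregation = monitoring_v3.Aggregation(
        {
            "alignment_period": {"seconds": 60},
            "per_series_aligner": monitoring_v3.Aggregation.Aligner.ALIGN_RATE,
        }
    )

    results = client.list_time_series(
        request={
            "name": f"projects/{project_id}",
            # Assumed Dataflow job metric and job name, for illustration only.
            "filter": 'metric.type = "dataflow.googleapis.com/job/element_count" '
                      'AND resource.labels.job_name = "wordcount-example"',
            "interval": interval,
            "aggregation": aggregation,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )

    for series in results:
        for point in series.points:
            print(point.interval.end_time, point.value)

Even with everything configured, each point here represents a full minute of activity, which is why the per-second view in the new DataFlow UI is better suited to spotting a single hot step during job tuning.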

For a StackDriver workflow, per-second detail is way too granular; however, when testing DataFlow jobs prior to a large-scale deployment, lower-level detail is important for introspection before rolling out inefficient – and expensive – DataFlow jobs.