Interpreting Patterns in Multi-Variate Multi-Horizon Time-Series Forecasts from Google’s Temporal Fusion Transformer Model

Wow, that title is a mouthful… But it’s not as complicated as it seems. Let’s break it down:

  • Multi-Variate Time-Series Forecasts – Single-variate time-series forecasting uses only the historical values of the series whose future values we are trying to predict. (For example, exponential decay, moving average, auto-regressive moving average.) Multi-variate forecasting allows additional time-series and non-time-series variables to be included in the model to enhance the model’s predictive capability and give a better understanding of what influences our predicted target value(s). (For example, including the weighted seven-day moving average sentiment of news articles about a company when forecasting its stock price for tomorrow.)
  • Multi-Horizon Time-Series Forecasts – Traditional time-series forecasting is typically optimized for a specified number of periods ahead (for example, a produce department predicting next week’s potato sales to determine inventory). Multi-horizon means we attempt to predict many different future periods within the same model. (For example, predicting daily potato sales for every day over the next four weeks to reduce the number of orders and schedule times for restocking.)
  • Interpreting Patterns – A good model doesn’t only provide an accurate prediction, it also gives insights as to what inputs are driving the results, that is, the model is interpretable.
  • Temporal Fusion Transformer – The name of the proposed Multi-Horizon Time-Series Forecasting framework. It combines Long Short-Term Memory (LSTM) recurrent layers with a mechanism first used in image recognition called “Attention” (we’ll talk more about attention later).
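
To make the multi-horizon idea concrete, here is a minimal sketch of how a single series can be cut into training pairs where each target is several future periods rather than one. The function name, the sample numbers, and the `lookback` / `horizon` parameters are illustrative assumptions, not part of the Temporal Fusion Transformer itself.

```python
from typing import List, Sequence, Tuple


def make_multi_horizon_windows(
    series: Sequence[float], lookback: int, horizon: int
) -> List[Tuple[List[float], List[float]]]:
    """Pair each run of `lookback` past values with the next `horizon` values."""
    windows = []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        past = list(series[start : start + lookback])
        future = list(series[start + lookback : start + lookback + horizon])
        windows.append((past, future))
    return windows


# Hypothetical daily potato sales: look back 3 days, predict the next 2 days.
daily_sales = [10, 12, 11, 13, 15, 14, 16, 18]
windows = make_multi_horizon_windows(daily_sales, lookback=3, horizon=2)
for past, future in windows:
    print(past, "->", future)
```

A single-horizon setup is the special case `horizon=1`; the multi-horizon model is trained to fill in the whole `future` list at once.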

Journey to an Enterprise Service Bus Integration – Part 4

This article is the fourth in a series dedicated to the ESB integration architecture. The initial article is located here.

Publish-Subscribe asynchronous messaging

In a preceding article, I talked about 3 limitations of a traditional integration:

  1. Lack of scalability and fault tolerance.
  2. There are no economies of scale.
  3. Sender and receiver are tied by a contract.

In my previous article, I presented Point-to-Point asynchronous messaging as a way to alleviate limitation #1 by adding a persistent, scalable and fault-tolerant layer to decouple the flow between a sender and a receiver.

In this article, I am going to discuss another form of asynchronous messaging called Publish-Subscribe, often referred to as Pub/Sub.

Pub/Sub is an asynchronous communication method in which applications exchange messages without knowing the identity of the sender or recipient.

Pub/Sub messaging solves limitation #2. This is a One-to-Many delivery: several receivers (referred to as subscribers) can receive a message from a sender (referred to as a publisher) without having to duplicate the data flow. Rather than using queues as Point-to-Point messaging does, Pub/Sub makes use of topics, which allow several receivers to process the same message. In this case, the message is removed from the topic after all subscribers have processed it.
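
The one-to-many idea can be sketched in a few lines. This is a toy, in-memory illustration only: the `Broker` class, the topic name and the inboxes are hypothetical, and delivery here is synchronous, whereas a real message bus is asynchronous and persistent.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """Toy topic-based broker: every subscriber of a topic gets every message."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Hand the message to each subscriber; return how many received it."""
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = Broker()
warehouse_inbox: List[dict] = []
finance_inbox: List[dict] = []

# Two independent subscribers; the publisher does not know about either one.
broker.subscribe("customer.created", warehouse_inbox.append)
broker.subscribe("customer.created", finance_inbox.append)

delivered = broker.publish("customer.created", {"customer_id": 42})
print(delivered, warehouse_inbox, finance_inbox)
```

Adding a third receiver is one more `subscribe` call; the publisher's side of the flow does not change.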


Advantages of this model:

  1. There is no more compounded complexity when multiple systems or applications need to receive the same data, as they can simply subscribe to the topic.
  2. As the inbound flows (publisher to bus) become reusable, there is economy of scale.
  3. As with Point-to-Point messaging, there is loose coupling between publishers and subscribers, allowing them to operate independently of each other.
  4. Scalability is achieved through multi-threading, message caching, etc., and reliability / fault tolerance through clustering and load balancing.


One of the biggest limitations of this model lies in its main advantage. As publishers and subscribers are not aware of each other, the architecture itself does not guarantee delivery: it guarantees the availability of a message for a subscriber to consume, but not the fact that the subscriber will consume it. Therefore, designs outside of the architecture must be in place if guaranteed delivery is a requirement.

As with Point-to-Point messaging, the scalability of the message bus is relative. Slow subscribers or large messages may overwhelm the message broker to the point of exhausting its available resources and bringing it down. I experienced these limitations with a slow subscriber combined with a verbose publisher, leading to an accumulation of messages beyond what the broker could handle. Mitigation actions can include purging all the messages (at the cost of information loss), or routing messages to a disaster recovery area specifically designed to handle large volumes of messages. These messages might have to be “re-played” once the situation returns to normal, depending on the business case.
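
The mitigation described above (park the excess, then replay it) can be sketched as follows. This is a simplified illustration under my own assumptions: the class name, the fixed `capacity`, and the in-memory `overflow` list standing in for a recovery area are all hypothetical, and the sketch does not preserve ordering if new messages arrive while older ones are still parked.

```python
from collections import deque


class BoundedSubscriberBuffer:
    """Holds messages for one slow subscriber, parking the excess for later replay."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pending = deque()  # messages waiting for the subscriber
        self.overflow = []      # stand-in for a dedicated recovery area

    def deliver(self, message) -> None:
        """Accept a message, parking it if the subscriber is too far behind."""
        if len(self.pending) < self.capacity:
            self.pending.append(message)
        else:
            self.overflow.append(message)

    def consume(self):
        """Return the next pending message, or None when nothing is waiting."""
        return self.pending.popleft() if self.pending else None

    def replay_overflow(self) -> int:
        """Move parked messages back while there is room; return how many moved."""
        replayed = 0
        while self.overflow and len(self.pending) < self.capacity:
            self.pending.append(self.overflow.pop(0))
            replayed += 1
        return replayed


buffer = BoundedSubscriberBuffer(capacity=2)
for message_id in range(1, 6):  # a verbose publisher sends five messages
    buffer.deliver(message_id)

first, second = buffer.consume(), buffer.consume()  # the subscriber catches up
replayed = buffer.replay_overflow()
print(first, second, replayed, list(buffer.pending), buffer.overflow)
```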

When to use

This model is typically used for scenarios when the publisher does not need to know if a message has been successfully distributed to a subscriber, because by design the publisher is not aware of the subscribers.


In the next article, I will discuss alleviating limitation #3 through the use of Canonical Models.

If you have any questions, feel free to leave a comment below or contact us here.

Journey to an Enterprise Service Bus Integration – Part 3

This article is the third in a series dedicated to the ESB integration architecture. The initial article is located here.

Point-to-Point asynchronous messaging

In my previous article, I talked about 3 limitations of a traditional integration:

  1. Lack of scalability and fault tolerance.
  2. There are no economies of scale.
  3. Sender and receiver are tied by a contract.

In this article, I will describe a form of asynchronous messaging called Point-to-Point.

Point-to-Point asynchronous messaging solves limitation #1. It does so by introducing a messaging layer, often referred to as a message bus or event bus, and by using queues. It is still a one-to-one delivery though, similar to traditional integration.


A couple of advantages of this model:

  1. Senders and receivers are now physically loosely coupled. For example, if the receiver is down or slows down (it is still working but cannot accept messages as quickly as they are being sent, because it is not sized appropriately, for example), the messaging layer can still receive messages and hold them until the receiver is ready to consume them, effectively playing a regulating role.
  2. Scalability and fault tolerance are built into the model: a persistence layer (file or database), vertical scalability since the bus runs on a separate set of servers that can be scaled independently of the sender or receiver, and horizontal scalability through the ability to create multiple reception threads.


However, this form of messaging still does not allow for economies of scale as there can be only one receiver for a given message. It also ties all parties to a data schema and any change to this schema often requires all parties to be involved.

When to use

This model is typically used to facilitate communication between 2 components (whether they are tightly coupled or written in a different language/technology platform), and/or when you need to ensure a message is processed by one receiver only. If multiple receivers point to the same queue, only one of them will dequeue a given message, and the message will be removed once acknowledged.
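
A small sketch of the “one receiver only” behaviour, using Python's standard `queue` module as a stand-in for a real message bus. The receiver names and order ids are made up, and `task_done()` is used here only to represent the acknowledgement step; unlike a real broker, `queue.Queue` removes the message as soon as it is fetched.

```python
import queue

orders = queue.Queue()
for order_id in (101, 102, 103):
    orders.put(order_id)


def receive_one(receiver_name, source, processed):
    """Dequeue one message, record who handled it, then acknowledge it."""
    message = source.get_nowait()
    processed.append((receiver_name, message))
    source.task_done()  # stand-in for acknowledging the message


processed = []
# Two receivers point to the same queue, yet each message is handled once.
receive_one("receiver_a", orders, processed)
receive_one("receiver_b", orders, processed)
receive_one("receiver_a", orders, processed)
print(processed, orders.empty())
```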


In the next article, I will discuss alleviating limitation #2 above through Publish-Subscribe asynchronous messaging.

If you have any questions, feel free to leave a comment below or contact us here.

Journey to an Enterprise Service Bus Integration – Part 2

This article is the second in a series dedicated to the ESB integration architecture. The initial article is located here.

Traditional data integration

In my previous article, I talked about how an ESB architecture is, in my opinion, the go-to architecture for tackling an enterprise’s challenges in today’s world.

But in order to understand why I value an ESB, I find it essential to first explain the shortcomings of a traditional/point-to-point data integration, as well as some of the pushback I initially experienced.

In a traditional data integration, interfaces are built between applications and systems through direct connections, as a Point-to-Point integration. As such, the number of interfaces tends to grow very quickly (quadratically in the worst case, since n systems can require up to n(n-1)/2 interfaces) as the data needs increase and/or the landscape of applications and systems expands.

Some refer to this architecture as a “spaghetti architecture”, because all systems have to be interconnected with each other in order to speak to each other.
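
As a rough back-of-the-envelope illustration: if every system had to talk directly to every other system, n systems would need n(n-1)/2 interfaces, whereas connecting each system once to a shared integration layer needs n. The function names below are my own, and real landscapes are rarely fully connected, so treat this as an upper bound.

```python
def point_to_point_interfaces(systems: int) -> int:
    """Worst case: one direct interface for every pair of systems."""
    return systems * (systems - 1) // 2


def bus_connections(systems: int) -> int:
    """Each system connects once to the shared integration layer."""
    return systems


for n in (3, 5, 10, 20):
    print(n, "systems:", point_to_point_interfaces(n), "direct interfaces vs",
          bus_connections(n), "bus connections")
```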

Taken individually, traditional interfaces are easy and cheap to build. By individually I mean: each interface considered on its own, without looking at the system as a whole in terms of flexibility and scalability.

Push back

Below are some of the arguments I faced when talking about the need to introduce an integration layer:

  1. There are only 2 teams involved. No need for a third “integration team” that needs to be brought up to speed on each system’s requirements.
  2. There are only 2 systems involved in each interface, limiting the points of failure and troubleshooting efforts.
  3. Sender and receiver teams can quickly discuss the data requirements and come up with a format satisfying the needs of the interface as well as a transport method satisfying each system’s technical limitations.
  4. With this type of interfacing, the sender and/or the receiver remain in control of the data and transformations hence they can troubleshoot issues faster without having to rely, once again, on a “third team”.


The points above are very valid when you operate in a world where there are few senders and receivers, when you mostly need to send data once a day/week/month in a large batch always at the same time, and when the data exchanged only needs to go to one place.

However, there are also very valid limitations to this model:

  1. Lack of scalability and fault tolerance. For example, an overloaded receiver could start running slower, or lose connections leading to delay in processing or worse, message loss. 
  2. There are no economies of scale. If (when) 2 receivers are interested in the same information, the integration needs to be duplicated either by the sender, or by one of the receivers to be then forked to the other, or in a staging area which adds another point of failure (and defeats one of the “pro traditional” arguments). All these duplications cause compounded complexity.
  3. Sender and receiver are tied by a contract (a.k.a. the data schema) that cannot be changed unless both agree on the terms.

Through these limitations, you can see how 2 systems interfaced using this model are tightly coupled and one change in one system may require intervention of all parties involved.

Another consequence: because of the lack of governance that this model entails, all these disparate interfaces, built over time by different developers with different naming conventions, start to weigh heavily on the support side, creating pockets of expertise. These pockets become a liability for an enterprise due to the amount of effort and training required to maintain knowledge.

When to use

I personally would not use Traditional Integration end-to-end. But system limitations sometimes force the use of traditional integration as a step in an overall integration. For example: when a 3rd party customer or supplier is only able to send a file through [s]FTP. In this case, the receiver’s first task would be to decouple the flow, as I will discuss in the next articles.


In the next article, I will discuss a first step in alleviating Limitation #1 above: Point-to-Point asynchronous messaging.

If you have any questions, feel free to leave a comment below or contact us here.

Journey to an Enterprise Service Bus Integration


In the 1990s, Enterprise Resource Planning solutions rose tremendously. At the time, they were the new “grail”, concentrating every single one of an enterprise’s needs in a packaged solution. It did not take long, roughly 10 years, for executives to determine that their bloated, overly customized and overly expensive-to-upgrade solutions would never bring the promised reductions in complexity and the savings they were expecting.

Worse, each new change would compound the existing complexity: changing Chart of Accounts, product codification, pricing structure, customer structure, etc.

An idea started to form: rather than concentrating everything in one place, a better approach would be to leave best-of-breed applications alone and integrate them. How?

The Enterprise Service Bus is arguably the “grail” of data integration solutions, especially when it comes to Enterprise data integrations.

I have seen several reasons for that:

  • For the last 20 years, enterprise growth has been greatly accelerated through acquisitions, creating the need to integrate, quickly and in a repeatable way, different business models built on different IT systems. But what if I’m on Oracle Applications and they’re on SAP, now what?
  • The speed at which companies are doing business, and the number of actors involved, created the need to disseminate crucial information in real time across numerous disparate IT systems. How do I enable customer registration in my e-commerce portals, and propagate this information in real time into my order entry systems, my warehouse management systems, financial systems and Salesforce CRM so the customer can start placing orders right away?
  • “Data is the new gold”: Being able to gain insights by introspecting and analyzing in real-time data passing through enterprise systems in order to make on-the-spot operational decisions. How do I route a customer order so it is served by a location that will provide the fastest service, at the cheapest cost?

At 50,000 ft, an ESB architecture looks a lot like a Hub & Spoke:

However, the distributed architecture in the central Hub alleviates the bottleneck challenges that the Hub & Spoke model introduced. I will come back to this aspect later.

To that extent, I see the ESB architecture as a natural progression of the Hub & Spoke model to better face today’s enterprise challenges.

In this series of articles, I will review various data integration models and how they address the shortcomings of a Traditional integration, paving the way for the utilization of an Enterprise Service Bus architecture. I will not go into deep technical details; rather, I will provide a practical review of each model and its applicability to an enterprise.

The next article in the series can be found here.

If you have any questions, feel free to leave a comment below or contact us here.

How to configure a local Pystan environment

Despite all the hype around Deep Learning Models and AI as a Service APIs, there’s still a need for Data Scientists to explain – in simple terms – what factors influence a given prediction. And even more importantly, sometimes we want to construct a model that represents a real-world process, rather than have input values feed into a programmatically optimized series of neural networks and produce a predicted value.

I am not attempting to argue Deep Learning is not effective. Deep Learning is the best tool we have right now for predicting outcomes. However, prediction alone does not necessarily lead to a better human understanding of what influences said prediction. In many cases a human can’t understand why a Deep Learning model calculates its predictions because by the time we understand the current model, the Deep Learning routine has already updated based on new information. Case in point, Reinforcement Learning routines are always adapting to the decisions of Actors in near real time. We typically do not judge the efficiency of a Reinforcement Learning model based on the decisions it makes, but on the associated effects of the human actors interacting with the model – are players in the battle arena giving up when playing against bots? Are chatbot responses receiving poor reviews? 

Bayesian Analytics or Bayesian Analysis uses a Bayesian Approach to create human-understandable insights from models. “But Bayesian Statistics is so hard? It has lots of weird symbols and probabilities and stuff?” Yes, learning Bayesian Statistics can be a challenge, but the notation is no more complex than that of any other type of statistics; most of us were just taught frequentist stats first, so it seems more intuitive.

Regardless, an easy way to learn Bayesian Analysis is using this book: followed by this book:

Between Kruschke’s and Gelman’s books, you can get a strong foundation in using Bayesian statistics for analysis. (See Andrew Gelman’s recent review of the Santa Clara SARS-CoV-2 antibody prevalence study for why a strong foundation in Bayesian statistics is important for analysis.)

Unfortunately, both Kruschke and Gelman use R rather than Python in their examples. Fortunately, the MCMC sampling applications BUGS, JAGS, and Stan are not actually R or Python programs; R and Python merely call their APIs. So, setting up Python to use Stan, for example, is no harder than using R.

My process for setting up a local virtual environment is below. Please note, if you just want to get started, Google Colab is much easier. As an example, see the notebook provided by Ethan Steinburg in the comments of Gelman’s article:

For a local environment, it’s a little more complex, but not too bad.

Pystan’s repo documentation isn’t bad either; in this article I’m providing a supplement with my typical workflow.

This configuration assumes you have Anaconda installed and are able to set up a virtual environment on your machine.

Once you have Anaconda installed and accessible via the command line, simply run the following commands the first time you use the environment:

conda create -n stan_env python=3.7 numpy scipy matplotlib libpython m2w64-toolchain -c conda-forge -c msys2

conda activate stan_env

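python -m pip install "pystan<3" arviz scikit-learn statsmodels plotly seaborn nbformat

The first line creates a Python 3 environment with the packages required for the pystan installation. (libpython and m2w64-toolchain provide the compiler toolchain on Windows.)

The next activates the environment.

The third installs pystan and the packages I often use for analysis. PyStan is pinned below version 3 because the “import pystan” check later in this article applies to the 2.x series; PyStan 3 is imported as “stan” instead.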

Additionally, you probably want to use Jupyter Lab for development, so here are some additional configurations, again only necessary the first time you activate the environment.

pip install --user ipykernel
python -m ipykernel install --user --name=stan_env
conda install ipywidgets
conda install -c conda-forge nodejs
jupyter labextension install jupyterlab-plotly

These commands install the necessary widgets for visualization, nodejs for rendering the widgets, and the plotly extension for interactive visuals.

Now you should be ready to launch Jupyter in your new pystan environment!

Make sure you have the stan_env virtual environment active by typing…

conda activate stan_env

… in your terminal / command line / powershell

Then type “jupyter lab” (after activating the virtual environment).

Once Jupyter Lab loads, try to execute “import pystan”. If there are no errors, congrats! You now have a functional Pystan Jupyter Notebook!
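
If you would like a slightly broader smoke test than a single import, a small notebook cell like the one below can report which packages are not visible to the active kernel. This is just a convenience sketch: the helper name is my own, the package list mirrors the pip command above (note that scikit-learn is imported as `sklearn`), and it assumes the PyStan 2.x series, which is the one imported as `pystan`.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for the packages installed earlier (assumes PyStan 2.x).
wanted = ["pystan", "arviz", "sklearn", "statsmodels", "plotly", "seaborn"]
print("Missing:", missing_packages(wanted) or "nothing")
```

An empty result only tells you the packages are installed; compiling a first Stan model is still the real test that the compiler toolchain works.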

Next time you need to use the notebook, you only need to type

conda activate stan_env

And you are ready to launch your Jupyter Lab or Jupyter Notebook.

Notes from passing both GCP Cloud Architect and Data Engineer Professional Certifications in 30 days

Within 30 days I passed both the Google Cloud Platform Professional Data Engineer and Architect Certification exams.

However, it took me much longer than 30 days of study and experience to pass the exams.

Fortunately, there was a lot of overlap between the two exams, so if anyone else wants to put their personal life on hold for a few months and attempt something as crazy as passing two of the hardest cloud certifications in a short period of time, here are some tips to help you out.

First, the professional certifications are just as much about critical thinking as they are about technical knowledge – meaning you will not know the ‘correct’ answer for many questions, but you might know the wrong answers. The test requires process of elimination. When you face a question that does not have an obvious answer, make sure to read the other answer options to see if there are any obvious candidates for elimination.

For example, there was a question about architecting a VM-hosted web application and how to best accommodate a business requirement for HTTP failover. You had to decide whether to point the load balancer to the individual VM instances’ IP addresses or to a VM instance GROUP’s IP address and

If you’ve only used deployment templates or worked more with managed services rather than compute – or focused more on development or architecture rather than networking – this is not a situation you’ll come across very often. In the relatively rare case when someone has configured an http load balancer and an instance group *within the GCP console*, they would know you can only point a load balancer to an instance group, not an instance itself; but for the rest of us there is still a way we can figure out the answer.

We should know that HTTP failover means a load balancer, so any answer not mentioning a Layer 7 load balancer should be excluded. So we are left with the options of pointing the load balancer to either the VM instances or the instance group. I should mention, we are technically pointing the load balancer to the instances in both options, but this is about configuration, not physical architecture.

(Note: Layer 7 load balancing is, somewhat oversimplified, HTTP traffic allocation with some logic, whereas Layer 4 is TCP / UDP with little logic.)

Let’s assume we don’t know the right answer, but we do know that managed instance groups allow autoscaling of VMs based on usage, and we know enough about IP addresses and load balancers to know that if a new VM instance is created, the load balancer needs to know the IP address of the new instance; otherwise the load balancer won’t know where to forward traffic. So, knowing that managed instance groups are often used for scalable web applications, it only makes sense to point the load balancer to the managed instance *group* rather than to each individual instance.
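
The reasoning can be illustrated with a toy model. To be clear, this is not the GCP API: the classes, IP addresses and the `scale_out` method below are hypothetical stand-ins that only show why a fixed list of instance addresses goes stale when a group autoscales, while a reference to the group does not.

```python
class ManagedInstanceGroup:
    """Toy stand-in for an autoscaling group: it owns the current list of IPs."""

    def __init__(self, instance_ips):
        self.instance_ips = list(instance_ips)

    def scale_out(self, new_ip):
        self.instance_ips.append(new_ip)


class LoadBalancer:
    """Toy load balancer: `backend` is a callable returning the IPs to route to."""

    def __init__(self, backend):
        self.backend = backend

    def targets(self):
        return list(self.backend())


group = ManagedInstanceGroup(["10.0.0.2", "10.0.0.3"])

static_ips = list(group.instance_ips)                 # snapshot of individual instances
lb_to_instances = LoadBalancer(lambda: static_ips)
lb_to_group = LoadBalancer(lambda: group.instance_ips)

group.scale_out("10.0.0.4")                           # autoscaling adds a VM

print("pointing at instances:", lb_to_instances.targets())
print("pointing at the group:", lb_to_group.targets())
```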

Speaking of networking, you’ll need to study and know a lot of networking. Some examples of terms and concepts to be familiar with (non-exhaustive):

Related to networking, you’ll need to know how data is shared between GCP organizations, on premise data, and other cloud providers. There are a lot of options for this, and all are situational, so understand the differences between:

On the subject of data, the Data Engineer certification had much more architecture than I expected. You’ll need to understand both application data architecture and analytics data architecture. There’s so much information, but at a high level, you’ll need to know when to use:

You’ll also need to understand the business cases for the different ML and AI Platform services. For example, when is it better to use one of GCP’s pre-trained ML APIs (e.g. Vision API) vs. training your own model in AutoML vs. deploying your own custom-built models using a tool like AI Platform Prediction?

Learning Resources

It’s difficult to describe my full experience without turning this article into even more of a study guide, but allow me to give some helpful resources.

My starting point was Earl Gay’s excellent study guide on Medium. It has a lot of helpful links which I will not reproduce in this article, so check out Earl’s guide for more info. If you are able to explain why every decision was made in every single flowchart on this site, then you should be able to pass both GCP Architect and Data Engineer Professional Certifications.

In order to gain that knowledge, the most complete online courses I found were at Linux Academy: one for Cloud Architect and one for Data Engineer.

The Linux Academy courses also contain practice tests with different questions than the sample test provided by GCP.

Most people consider Coursera first when they want to study online. In my personal opinion, I found the Coursera options to be lacking, both in practical training and in content, so I would not recommend taking them unless you have a lot of experience in GCP and only need a refresher. Also the course progression through their certificate tracks is confusing as you often just hope you’re taking the correct class for a given certificate (I took an entire course on Kubernetes before I realized I was taking a course in the Application Developer track and not for Cloud Architect.)

The Coursera practice exam questions were almost identical to the sample practice exams provided by GCP – therefore there wasn’t a lot of benefit taking the Coursera practice exams if you already had taken the free GCP practice exam.

(Note: This article is in no way sponsored by Linux Academy, nor at the time of writing does TheoryLane have any form of business relationship with Linux Academy, these opinions are from my experience alone and may not reflect the views of others at TheoryLane.)

Stay in Touch!

I hope this information was helpful, or at least guided you to information that was helpful.

If you have any questions or would just like to connect, feel free to reach out to me on LinkedIn or use the contact form below.

New DataFlow Job Metrics vs. StackDriver

Promoted as new capabilities in “DataFlow observability”, GCP is finally giving us the ability to see time-series graphs of CPU utilization and throughput for a given DataFlow job within the DataFlow console.

Before, we used StackDriver (which is getting rebranded, by the way) to view the VM CPU utilization of our DataFlow jobs. The new DataFlow capabilities do not replace aggregate monitoring and alerting in StackDriver; rather, StackDriver and DataFlow observability serve different use cases: the observability features are for DataFlow job debugging and optimization, while StackDriver is for holistic tracking and monitoring. In other words, the DataFlow UI is for job-specific DataFlow Ops; StackDriver is for administration.

Job details 

Those of us who have used StackDriver appreciate more visibility into CPU and throughput, as all we had in the console was the resource metrics to the right of the job topology.

I did a quick run of the standard wordcount example to generate some data. The new graphs are simple and to the point. I like them.

[Screenshot: Throughput (elements/sec) graph for the wordcount job on Mar 2, 2020, with series for group/Reify, group/Write, split and read/Read, and a “Create alerting policy” button.]

Now we see specifically which ops are taking the most IO and CPU for a given job – without the overhead of creating a new StackDriver dashboard or filtering to a specific job. In fact, there’s no real way to get this level of visual detail out of the box in StackDriver. (At least not that I’m aware of; let me know if there is a simple configuration setting I’ve been overlooking!) In StackDriver the minimum alignment period is 1 minute, so the best we can do is see operation counts or vCPUs per minute. In our new DataFlow UI we can see throughput and vCPU per second.
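
To see why the alignment period matters, here is a small illustration with made-up numbers (not real DataFlow or StackDriver output): a five-second burst that is obvious at per-second resolution mostly disappears once the samples are averaged into one-minute buckets. The `align` helper is my own simplification of what an alignment period does.

```python
from statistics import mean


def align(samples, period):
    """Average consecutive chunks of `period` samples into one value each."""
    return [mean(samples[i : i + period]) for i in range(0, len(samples), period)]


# Hypothetical throughput: 100 elements/sec, with a 5-second burst of 1000.
per_second = [100] * 60
per_second[30:35] = [1000] * 5

per_minute = align(per_second, 60)
print("peak at per-second resolution:", max(per_second))
print("value at per-minute resolution:", per_minute)
```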

For a StackDriver workflow, per-second detail is way too granular; however, when testing DataFlow jobs prior to a large-scale deployment, lower-level detail is important for introspection prior to rolling out inefficient – and expensive – DataFlow jobs.

Applying Continuous Delivery Patterns to Data Development

Historically, application development and data pipeline development have been kept separate. We are seeing this pattern begin to change. (See posts on Conway’s Law for reasons why.)

This means ETL/ELT development will almost certainly begin to model application development. App dev contains far more controls and business continuity, but more importantly Application Development has spent the past two decades refining coding patterns consistent with reusable and extensible code objects.

App DevOps has so refined their ability to quickly modify and deploy changes to app code that the current state of the industry is talking about not only pushing code for testing thousands of times a day, but also how they can automate pushing code changes to production! Continuously!

That sounds like an impossibility to many ETL/ELT devs, but the truth is – there is nothing stopping continuous deployment patterns in data dev. In fact, there is a movement toward Behavior-Driven Development (BDD) as an extension of Test-Driven Development (TDD).

BDD and TDD are development patterns which integrate acceptance criteria into the code itself, meaning the first round of quality assurance must happen before any human lays eyes on the data output. App DevOps has found this can help find root causes of code issues, as teams can focus on specific problems (e.g. “Are the acceptance criteria correct? If so, is Dave’s code correctly testing for them?”) rather than general problems (e.g. “Dave is an idiot”).
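
As a minimal sketch of what “acceptance criteria in the code itself” can look like for a data pipeline step, here is a hypothetical transformation with two tests written as plain functions. The function, field names and criteria are invented for illustration; in practice a test runner such as pytest would discover and run the `test_` functions.

```python
def clean_orders(rows):
    """Drop rows without a customer id and normalise amounts to floats."""
    cleaned = []
    for row in rows:
        if not row.get("customer_id"):
            continue  # acceptance criterion: every output row has a customer
        cleaned.append({"customer_id": row["customer_id"], "amount": float(row["amount"])})
    return cleaned


def test_rows_without_customer_are_rejected():
    rows = [
        {"customer_id": "C1", "amount": "10.50"},
        {"customer_id": None, "amount": "3"},
    ]
    assert clean_orders(rows) == [{"customer_id": "C1", "amount": 10.5}]


def test_amounts_are_numeric():
    result = clean_orders([{"customer_id": "C2", "amount": "7"}])
    assert all(isinstance(row["amount"], float) for row in result)


# Run the checks directly; a test runner would normally do this for you.
test_rows_without_customer_are_rejected()
test_amounts_are_numeric()
print("acceptance criteria met")
```

When a check fails, the conversation starts from a specific criterion and a specific piece of code, which is the point made above.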

Databricks has a great article on how to architect dev/prod CI/CD. It shows details such as how each developer has their own ‘development’ environment, but with managed configurations and plugins to ensure all development occurs in the same configuration.

I would personally love to see data science engineering development patterns move into Test- / Behavior-Driven Development – mostly because it makes things a lot easier on data end users; but also because it forces strict requirement definitions:…

It’s a little late to pick up on HumbleBundle, but Continuous Delivery with Docker and Jenkins by Rafał Leszko is a great read on the topic of Continuous Delivery, even if you are unfamiliar with the technologies.

At TheoryLane our architects help dis-entangle your existing data processes to help operationalize machine learning and data science solutions. Our data development patterns construct reusable, governed information objects.  Combined, innovative data architecture and development patterns provide reusable streaming context to break the barriers to continuous data deployment and create true value added applications!

Contact us for more information.

Where is Artificial Intelligence in the Hype Cycle?

Artificial Intelligence has an adoption problem.

There are plenty of articles about what AI can do for your business, but why isn’t everyone using it?

We are likely seeing AI going into the downward slope of the “Hype Cycle”.

The Hype Cycle is a visual representation tool from the American research firm Gartner. It shows how, for a new technology, people get really excited at first, are then disappointed when their inflated expectations are not met, and then slowly start getting real work done with the new tech.

Why does the ‘trough of disillusionment’ occur? Many reasons – but often it’s because the other parts of the organization are unable to fully support whatever tech is currently being hyped. AI is no different.

AI is often sold as a ‘spot solution’ – this is industry jargon for a specific tool to solve a specific problem. Unfortunately it doesn’t really work that way. A constant flow of relevant and accurate data is required for an AI to learn and improve – there is a reason why “Artificial Intelligence” and “Machine Learning” are used together so much.

So we are seeing organizations struggle to force their current data architecture to support AI frameworks. Unfortunately this often fails and leads to disappointment, resulting in said disillusionment and, worse, in abandoning AI altogether.

At TheoryLane our architects help dis-entangle your existing data processes to help operationalize machine learning and data science solutions. Our data development patterns construct reusable, governed information objects.  Combined, innovative data architecture and development patterns provide reusable streaming context to break the barriers of the hype cycle and create true value added applications!

Contact us for more information.