Delivering Digital Analytics Through Auto-Capture

Over recent years I have managed a team that implements digital analytics solutions by integrating third-party tooling within the browser. Historically this approach has required in-browser code for every analytic you want to report on. The solution worked, in that it provided the reports our analysts required, but it felt off to me. We were not collecting the raw events, so potential data science use cases could not be satisfied. If an issue occurred with the deployment of these in-browser solutions, the result was incorrect and unrecoverable data. There is also the concern of relying on third-party solutions in an era of increasing privacy scrutiny and regulation.

Anatomy of Digital Analytics

There are three phases of a digital analytics solution:

  1. Collection: the process of capturing raw data in the browser
  2. Refinement: the process of transforming the raw data into “business” events/data
  3. Fulfillment and consumption: the process of pushing the refined data into the right reporting and analytics solutions.

The remainder of this article will focus on discussing how one would perform the collection.

Auto-capture

Auto-capture is the method of identifying raw technical events, such as mouse clicks and field completion, attaching an event handler to those events and forwarding that data to the server.

Identifying the elements that need event handlers attached is as simple as looking for certain element types and/or looking for existing event handlers (e.g. onClick). Once the page is loaded, one can simply walk the DOM and examine each element independently. However, the advent of DHTML (the tech is old enough that this term applies) created an ecosystem where walking the DOM on page load is not enough. MV* frameworks essentially deliver code that generates the UI as the end user interacts with the application. Elements that need to be auto-captured can be introduced later, and they are not part of that initial walk.

Enter the Mutation Observer API

While it is a relatively new API, it has pretty good browser support [1], with polyfills that broaden its availability [2]. The value of the API is that it notifies us of elements as they are added to the page, which gives us the chance to manipulate them. That happens to be exactly what we want in our auto-capture scenario: modify the element to chain an additional onclick or onblur handler onto it.
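As a minimal sketch of what "chaining" could look like, assuming a hypothetical helper named chainHandler (the name is mine, not part of any library): addEventListener adds our listener alongside whatever the page already installed, whereas assigning to element.onclick would replace it.

```javascript
// Hypothetical helper: add a capture handler without disturbing handlers the
// application already registered on the element.
function chainHandler(element, eventName, handler) {
  // addEventListener appends to the element's listener list; it does not
  // overwrite an existing inline handler such as element.onclick.
  element.addEventListener(eventName, handler, false);
  return element;
}
```

A clickable element would get `chainHandler(el, 'click', send)` and a form field `chainHandler(el, 'blur', send)`, where `send` stands for whatever forwards the event to the server.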

The first step is to create a function, observeDOM, which takes an element to serve as the starting point for observation and a callback to invoke when a mutation occurs. In my implementation I chose to create that function via a self-invoking function expression so that we can capture and normalize the name of the MutationObserver. The rest of the code is the mechanics of attaching to the MutationObserver or to the legacy event listener code.
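A minimal sketch of such a function follows. It is not the original code: it is split into a named factory (makeObserveDOM, a name assumed here) that is applied immediately to the global object, which has the same effect as a self-invoking expression but can be exercised outside a browser. The prefixed WebKitMutationObserver name and the deprecated DOMNodeInserted mutation event are the legacy paths I am assuming are relevant.

```javascript
// Builds observeDOM against a given global object (window in a browser).
function makeObserveDOM(root) {
  // Normalize the constructor name: older WebKit builds exposed it with a prefix.
  const Observer = root.MutationObserver || root.WebKitMutationObserver;

  return function observeDOM(target, callback) {
    // This sketch only observes element nodes (nodeType 1).
    if (!target || target.nodeType !== 1) return null;

    if (Observer) {
      const observer = new Observer(callback);
      // childList + subtree: report nodes added or removed anywhere below target.
      observer.observe(target, { childList: true, subtree: true });
      return observer;
    }

    // Legacy fallback: deprecated DOM mutation events, wrapped so the callback
    // still receives an array of record-like objects.
    target.addEventListener('DOMNodeInserted', (event) => {
      callback([{ addedNodes: [event.target], removedNodes: [] }]);
    }, false);
    return null;
  };
}

// Apply it immediately to the real global object.
const observeDOM = makeObserveDOM(typeof window !== 'undefined' ? window : globalThis);
```

Returning the observer in the modern path leaves the caller free to call disconnect() later; the legacy path returns null because there is no observer object to hand back.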

In usage this would look like this:
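A minimal sketch of such usage, assuming an observeDOM function with the (target, callback) signature described above; forEachAddedNode is a hypothetical helper name introduced for this example, and the console.log stands in for the real work:

```javascript
// Hypothetical helper: call onAdded for every node added in a batch of records.
function forEachAddedNode(mutations, onAdded) {
  for (const mutation of mutations) {
    // addedNodes is a NodeList in the browser; Array.from makes it iterable everywhere.
    for (const node of Array.from(mutation.addedNodes || [])) {
      onAdded(node);
    }
  }
}

// Browser-only wiring; skipped where there is no DOM or no observeDOM in scope.
if (typeof document !== 'undefined' && typeof observeDOM === 'function') {
  observeDOM(document.body, (mutations) => {
    forEachAddedNode(mutations, (node) => console.log('added', node));
  });
}
```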

You start by identifying the parent element and passing it to the observeDOM function. The callback is invoked with an array of mutation records [3], and each record comes with a collection of both added and removed nodes. In our case we don’t really care about the removed nodes (since you cannot interact with them any more). For each of the added nodes, we need to do some work… potentially attach our event handler.

In addition to handling elements as they are added to the DOM, we also need to handle all the initial elements. We should probably keep observeDOM as an implementation detail of our application too. A better API would be something like instrumentAt(element). My implementation includes a recursive function, innerInstrument, that does the work of instrumenting the initial DOM, and follows that by setting up the MutationObserver through the observeDOM function.
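The shape of instrumentAt could be sketched as follows. This is an illustration under assumptions, not the original code: makeInstrumentAt and its parameters shouldCapture, attach and observe are placeholders I introduce so the block stands alone, and the WeakSet guard against instrumenting a node twice is my own addition.

```javascript
// Hypothetical factory: the predicate, the handler attachment and the observer
// are passed in so the sketch does not depend on any particular implementation.
function makeInstrumentAt({ shouldCapture, attach, observe }) {
  const seen = new WeakSet(); // guards against instrumenting a node twice

  // Recursively instrument a node and all of its element descendants.
  function innerInstrument(node) {
    if (!node || node.nodeType !== 1) return; // skip text nodes, comments, etc.
    if (!seen.has(node)) {
      seen.add(node);
      if (shouldCapture(node)) attach(node);
    }
    for (const child of Array.from(node.children || [])) {
      innerInstrument(child);
    }
  }

  return function instrumentAt(element) {
    // 1. Instrument the elements that are present right now.
    innerInstrument(element);
    // 2. Instrument anything that is added later.
    observe(element, (mutations) => {
      for (const mutation of mutations) {
        for (const added of Array.from(mutation.addedNodes || [])) {
          innerInstrument(added);
        }
      }
    });
  };
}
```

In a browser, `observe` would be observeDOM, and the resulting function would be called once, for example as `instrumentAt(document.body)`.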

Data Transfer

Sending the event has a few components:

  1. a mechanism to remove duplicates (hasNotFired)
  2. a transformation of HTML / DOM objects to a representation to transfer to the server (makeBasicEventObject, exercise left to the reader)
  3. a bit to perform the transfer (executeSendEvent, preferably using the Fetch API; again left to the reader)
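One way the three pieces could fit together is sketched below. Everything beyond the three function names is an assumption: the '/collect' endpoint is hypothetical, the fields copied into the payload are just a plausible minimum, and I am guessing that duplicates arise when the same event object reaches more than one instrumented element as it bubbles.

```javascript
// Hypothetical endpoint; replace with wherever your collector listens.
const COLLECT_URL = '/collect';

const fired = new WeakSet();

// 1. De-duplication: remember which event objects were already sent.
function hasNotFired(event) {
  if (fired.has(event)) return false;
  fired.add(event);
  return true;
}

// 2. Reduce the DOM event to a plain, serializable object.
function makeBasicEventObject(event) {
  const target = event.target || {};
  return {
    type: event.type,
    tag: target.tagName || null,
    id: target.id || null,
    name: target.name || null,
    timestamp: Date.now(),
  };
}

// 3. Transfer with the Fetch API; keepalive lets the request outlive a page
//    unload, and failures are swallowed so analytics never breaks the page.
function executeSendEvent(payload, url = COLLECT_URL) {
  return fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
    keepalive: true,
  }).catch(() => undefined);
}

// Glue: send each distinct event once. `send` is injectable for testing.
function sendEvent(event, send = executeSendEvent) {
  if (!hasNotFired(event)) return false;
  send(makeBasicEventObject(event));
  return true;
}
```

Note that the payload deliberately carries no field values; what to capture beyond identifiers is a privacy decision, not a technical one.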

Supplemental code

The last bits are the predicates that the instrumentAt code uses to identify the elements that will generate the events. The code isn’t comprehensive, but should provide the intuition to build out on your own.
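As a rough illustration of such predicates (isClickable, isField and the list of button-like input types are names and choices I am assuming, not the original code):

```javascript
// Input types that behave like buttons rather than fields.
const BUTTON_INPUT_TYPES = ['button', 'submit', 'reset', 'image'];

function tagOf(el) { return String(el.tagName || '').toUpperCase(); }
function typeOf(el) { return String(el.type || '').toLowerCase(); }

// True for elements a user would click: buttons, links, button-like inputs,
// or anything that already carries an onclick handler.
function isClickable(el) {
  const tag = tagOf(el);
  if (tag === 'BUTTON' || tag === 'A') return true;
  if (tag === 'INPUT' && BUTTON_INPUT_TYPES.includes(typeOf(el))) return true;
  return typeof el.onclick === 'function';
}

// True for elements a user fills in, where a blur signals field completion.
function isField(el) {
  const tag = tagOf(el);
  if (tag === 'SELECT' || tag === 'TEXTAREA') return true;
  return tag === 'INPUT'
    && typeOf(el) !== 'hidden'
    && !BUTTON_INPUT_TYPES.includes(typeOf(el));
}
```

A clickable element would then receive a click handler and a field a blur handler; elements that match neither predicate are left alone.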

There are many changes and optimizations needed to make this code production-worthy. You also need an endpoint to post the data to, which must be performant and scalable (perhaps FaaS and queues), and there is the work of refining raw events into the things you are interested in reporting on (perhaps something ETL-ish).

Obviously this doesn’t solve any of the BI and analytics problems, but reporting on data seems like a very different concern from capturing it.

About me

A 25-year software industry veteran with a passion for functional programming, architecture, mentoring / team development, xp/agile and doing the right thing.

References

  1. https://caniuse.com/mutationobserver
  2. https://www.npmjs.com/package/mutationobserver-polyfill
  3. https://developer.mozilla.org/en-US/docs/Web/API/MutationObserver/MutationObserver
  4. https://github.com/tb01923/bobo/blob/master/bobo.js