Delivering Digital Analytics Through Auto-Capture

Over recent years I have managed a team that implements digital analytics solutions by integrating third-party tooling within the browser. This approach has historically required in-browser code for every analytic you want to report on. The solution worked, in that it provided the reports our analysts required, but it felt off to me. We were not collecting the raw events, so potential data science use cases could not be satisfied. If an issue occurred with the deployment of these in-browser solutions, the result was incorrect and unrecoverable data. On top of that, relying on third-party solutions is a growing concern in an era of increasing privacy scrutiny and regulation.

Anatomy of Digital Analytics

There are three phases of a digital analytics solution, the first of which is:

  1. Collection: the process of capturing raw data in the browser

The remainder of this article focuses on how to perform collection.

Auto-capture

Auto-capture is the practice of identifying the elements that produce raw technical events, such as mouse clicks and field completion, attaching an event handler to those events, and forwarding the resulting data to a server.

Identifying the elements that need event handlers attached can be as simple as looking for certain element types and/or for existing event handlers (e.g. onclick). Once the page is loaded, one can walk the DOM and examine each element independently. However, the advent of DHTML (the technique is old enough that this term applies) created an ecosystem where walking the DOM on page load is not enough. MV* frameworks deliver code that generates the UI as the end user interacts with the application, so elements that need to be auto-captured can be introduced after that initial walk has finished.
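The page-load walk can be sketched as a depth-first traversal. This is a minimal illustration rather than the article's code: `collectMatching` is a hypothetical helper, and it only assumes each node exposes `tagName` and an iterable `children`, so plain objects can stand in for DOM elements when running it outside a browser.

```javascript
// Hypothetical helper: walk a DOM-like tree once (e.g. on page load) and
// collect every element that satisfies `predicate`.
// Assumes each node has a `children` collection (array or HTMLCollection).
function collectMatching(root, predicate) {
  const matches = [];
  const stack = [root]; // explicit stack avoids deep recursion on large pages
  while (stack.length > 0) {
    const node = stack.pop();
    if (predicate(node)) {
      matches.push(node);
    }
    // Push children in reverse so they are visited in document order.
    const children = Array.from(node.children || []);
    for (let i = children.length - 1; i >= 0; i--) {
      stack.push(children[i]);
    }
  }
  return matches;
}

// Example with plain objects standing in for DOM elements.
const page = {
  tagName: 'BODY',
  children: [
    { tagName: 'DIV', children: [{ tagName: 'BUTTON', children: [] }] },
    { tagName: 'A', children: [] },
  ],
};
const isButtonOrLink = (el) => el.tagName === 'BUTTON' || el.tagName === 'A';
console.log(collectMatching(page, isButtonOrLink).map((el) => el.tagName));
// → [ 'BUTTON', 'A' ]
```

Anything inserted into the page after this walk runs is never examined, which is exactly the gap that DHTML-style applications create.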

Enter the Mutation Observer API

Although it is a relatively new API, MutationObserver has good browser support [1], and polyfills broaden its availability further [2]. Its value is that it notifies us as elements are added to the page, giving us the chance to modify them. That is exactly what our auto-capture scenario needs: modify each new element to attach an additional click or blur event listener.

The first step is a function, observeDOM, which takes an element to serve as the starting point for observation and a callback to invoke when a mutation occurs. In my implementation I create that function with an immediately invoked function expression, so that we can capture and normalize the name of the MutationObserver constructor (some older WebKit browsers only expose it as WebKitMutationObserver). The rest of the code is the mechanics of attaching to the MutationObserver, or of falling back to the legacy (now deprecated) DOMNodeInserted/DOMNodeRemoved mutation events.

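const observeDOM = (function () {
  // normalize the constructor name: older WebKit browsers use a prefix
  const MutationObserver = window.MutationObserver ||
    window.WebKitMutationObserver;

  return function (obj, callback) {
    // validation: only observe element nodes
    if (!obj || obj.nodeType !== 1) return;

    if (MutationObserver) {
      // define a new observer that forwards each batch of records
      const obs = new MutationObserver(function (mutations) {
        callback(mutations);
      });
      // observe obj for added/removed children anywhere in its subtree
      obs.observe(obj, {childList: true, subtree: true});
    }

    else if (window.addEventListener) {
      // legacy mutation events: wrap each event in a record-like object
      // so the callback always receives an array of records
      obj.addEventListener('DOMNodeInserted', function (e) {
        callback([{addedNodes: [e.target], removedNodes: []}]);
      }, false);
      obj.addEventListener('DOMNodeRemoved', function (e) {
        callback([{addedNodes: [], removedNodes: [e.target]}]);
      }, false);
    }
  };
})();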

Usage looks like this:

const parent = document.querySelector('html > body');
observeDOM(parent, function (domMutation) {
  const handleRecord = (record) => {
    record.addedNodes.forEach(/* some callback */);
  };

  domMutation.forEach(handleRecord);
});

Start by identifying the parent element and passing it to observeDOM. The callback is invoked with an array of mutation records [3] describing a batch of changes, and each record carries a collection of added nodes and a collection of removed nodes. In our case we don't care about the removed nodes, since users can no longer interact with them. Each added node needs some work, which may include attaching our event handlers.
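In practice `addedNodes` also contains text and comment nodes, which have no `tagName` or `children`, so it helps to filter for element nodes before doing any work. The sketch below is an illustration rather than the article's code; `addedElements` is a hypothetical name, and the records in the example are plain objects shaped like `MutationRecord` (an `addedNodes` list of nodes carrying a `nodeType`).

```javascript
// Numeric value of Node.ELEMENT_NODE, hard-coded so this runs outside a browser.
const ELEMENT_NODE = 1;

// Hypothetical helper: flatten a batch of mutation records into the list of
// element nodes that were added, ignoring text/comment nodes and removals.
function addedElements(mutations) {
  const elements = [];
  for (const record of mutations) {
    for (const node of record.addedNodes) {
      if (node.nodeType === ELEMENT_NODE) {
        elements.push(node);
      }
    }
  }
  return elements;
}

// Example batch: one record adds a text node and a <button>, another removes a node.
const batch = [
  { addedNodes: [{ nodeType: 3 }, { nodeType: 1, tagName: 'BUTTON' }], removedNodes: [] },
  { addedNodes: [], removedNodes: [{ nodeType: 1, tagName: 'DIV' }] },
];
console.log(addedElements(batch).map((el) => el.tagName)); // → [ 'BUTTON' ]
```

Inside the observeDOM callback this could be used as `addedElements(mutations).forEach(...)`; in a browser, `Node.ELEMENT_NODE` can replace the hard-coded constant.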

In addition to handling elements as they are added to the DOM, we also need to handle the elements that are present initially. We should also keep observeDOM as an implementation detail; a better public API would be something like instrumentAt(element). The implementation below includes the recursive function innerInstrument, which instruments the initial DOM, and then sets up the MutationObserver through the observeDOM function.

const instrumentAt = (parentElement) => {

  const innerInstrument = (element) => {
    // addedNodes can include text and comment nodes; only elements
    // have tagName/children, so skip everything else
    if (element.nodeType !== Node.ELEMENT_NODE) return;

    if (isClickable(element)) {
      element.addEventListener('click',
        sendEvent.bind(null, element));
    }

    if (isBlurable(element)) {
      element.addEventListener('blur',
        sendEvent.bind(null, element));
    }

    // recurse into the element's children
    for (const item of element.children) {
      innerInstrument(item);
    }
  };

  // instrument everything already on the page
  innerInstrument(parentElement);

  ///////////////////////////////////////////////////////////////
  // observe changes to the DOM, and instrument each of them
  ///////////////////////////////////////////////////////////////
  observeDOM(parentElement, function (domMutation) {
    const handleRecord = (record) => {
      record.addedNodes.forEach(innerInstrument);
    };

    domMutation.forEach(handleRecord);
  });
};
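One edge case worth guarding against: `sendEvent.bind(null, element)` creates a new function on every call, so if a node is removed and re-inserted (frameworks often move nodes around), innerInstrument attaches a second set of listeners. The hasNotFired check in the Data Transfer section suppresses the duplicate sends, but a cheaper option is to remember which elements were already instrumented. A minimal sketch using a `WeakSet`, so that detached elements can still be garbage-collected; `markInstrumented` is a hypothetical name, not part of the original code.

```javascript
// Hypothetical guard: returns true the first time it sees an element and
// false afterwards. A WeakSet holds its entries weakly, so elements that are
// removed from the page can still be garbage-collected.
const instrumented = new WeakSet();

function markInstrumented(element) {
  if (instrumented.has(element)) {
    return false; // listeners were already attached to this element
  }
  instrumented.add(element);
  return true;
}

// Inside innerInstrument, one might wrap only the addEventListener calls:
//   if (markInstrumented(element)) { /* attach click/blur listeners */ }
// while still recursing into element.children, because a re-inserted
// subtree can contain elements that were added while it was detached.

const el = { tagName: 'BUTTON' }; // plain object standing in for an element
console.log(markInstrumented(el)); // → true
console.log(markInstrumented(el)); // → false
```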

Data Transfer

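Sending the event has a few components:

  1. a mechanism to remove duplicates (hasNotFired)
  2. a function that converts the element and event into an object to send to the server (makeBasicEventObject)
  3. a function that transmits that object (executeSendEvent)
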
// note: this map grows for the lifetime of the page
const alreadyFired = {};

// One click can reach several of our listeners (e.g. a button inside an
// instrumented link); they all receive the same event type and timeStamp.
const makeKey = (element, event) => `${event.type}${event.timeStamp}`;

const hasNotFired = (element, event) => {
  const key = makeKey(element, event);

  if (alreadyFired[key]) {
    return false;
  }
  alreadyFired[key] = 1;
  return true;
};

const executeSendEvent = (object) => {
  /* some fetch-api code */
};

const makeBasicEventObject = (element, event) => {
  /* some code to convert this event ->
     to an object to send to the server */
};

// bound as sendEvent.bind(null, element); the browser supplies the event
const sendEvent = (element, event) => {
  //////////////////////////////////////////////////
  // we might not want all clicks to fire
  //////////////////////////////////////////////////
  if (hasNotFired(element, event)) {
    const object = makeBasicEventObject(element, event);
    executeSendEvent(object);
  }
};
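The code above leaves makeBasicEventObject and executeSendEvent as stubs. One possible way to fill them in is sketched below; the payload fields, the `/analytics/events` endpoint, and the `makeSender` factory are all hypothetical. The sender prefers `navigator.sendBeacon`, which is designed to deliver data even as the page unloads (a click on a link often navigates away), and falls back to `fetch` with `keepalive: true`. The transports are passed in as parameters so the logic can be exercised without a browser.

```javascript
// Hypothetical payload: copy only plain, serializable fields, since DOM nodes
// and Event objects cannot be sent as-is. Input values are deliberately not
// captured, in keeping with the privacy concerns raised earlier.
const makeBasicEventObject = (element, event) => ({
  type: event.type,              // e.g. 'click' or 'blur'
  timeStamp: event.timeStamp,    // ms since the page's time origin
  tagName: element.tagName.toLowerCase(),
  id: element.id || null,
  name: element.name || null,
  page: typeof location !== 'undefined' ? location.pathname : null,
});

// Hypothetical factory. In a browser one might wire it up as:
//   const executeSendEvent = makeSender({
//     endpoint: '/analytics/events',
//     beacon: navigator.sendBeacon ? navigator.sendBeacon.bind(navigator) : undefined,
//     fetchFn: window.fetch.bind(window),
//   });
const makeSender = ({ endpoint, beacon, fetchFn }) => (object) => {
  const body = JSON.stringify(object);
  // sendBeacon sends a string body as text/plain, so the endpoint should
  // parse JSON regardless of content type. It returns false if the browser
  // refused to queue the data, in which case we fall through to fetch.
  if (beacon && beacon(endpoint, body)) {
    return true;
  }
  if (fetchFn) {
    fetchFn(endpoint, {
      method: 'POST',
      body,
      headers: { 'Content-Type': 'application/json' },
      keepalive: true, // lets the request outlive the page, like a beacon
    }).catch(() => { /* analytics must never break the page */ });
    return true;
  }
  return false; // no transport available
};

// Example with a stub beacon that records what would be sent.
const sent = [];
const send = makeSender({
  endpoint: '/analytics/events',
  beacon: (url, body) => { sent.push({ url, body }); return true; },
});
const payload = makeBasicEventObject({ tagName: 'BUTTON', id: 'buy' }, { type: 'click', timeStamp: 1234.5 });
send(payload);
console.log(sent[0].body);
// → {"type":"click","timeStamp":1234.5,"tagName":"button","id":"buy","name":null,"page":null}
```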

Supplemental code

The last bits are the predicates that instrumentAt uses to identify the elements that will generate events. The code isn't comprehensive, but it should provide the intuition to build out your own.

// helpers
const or = (boolean1, boolean2) => boolean1 || boolean2;
const any = (arr) => arr.reduce(or, false);
const identity = i => i;
const leftToRightComposition = (f, g) => (x) => g(f(x));
// reduce(callback, initialValue): start from identity, compose left to right
const pipe = (functions) =>
  functions.reduce(leftToRightComposition, identity);

// predicates
const isTag = (tagName) => (element) =>
  element.tagName.toLowerCase() === tagName;

// true when a handler was assigned via the on* property or HTML attribute;
// listeners added with addEventListener are not visible this way
const hasEventHandler = (eventName) => (element) =>
  element[eventName] !== null &&
  element[eventName] !== undefined;

const isInput = isTag('input');
const isInputType = (inputType) => (element) =>
  isInput(element) && element.type.toLowerCase() === inputType;


// elementPassesAnyPredicate: takes an array of (DOM -> boolean)
// "predicate" functions, then a DOM element; returns true if
// the DOM element passes any of the predicates
const elementPassesAnyPredicate = (predicateArray) => (element) => {
  // apply the DOM element to some function f
  const applyElement = (f) => f(element);
  // convert Array<predicate> to Array<boolean>
  const bools = predicateArray.map(applyElement);
  // true if the element passes at least one predicate
  return any(bools);
};

// these are things that can be changed by typing
// (<textarea> is its own tag, not an input type)
const isBlurable = elementPassesAnyPredicate([
  isInputType('text'),
  isTag('textarea')
]);

// these are things that are inherently clickable
const isClickable = elementPassesAnyPredicate([
  isTag('button'),
  isTag('a'),
  isInputType('submit'),
  isInputType('button'),
  isInputType('radio'),
  isInputType('checkbox'),
  hasEventHandler('onclick')
]);

There are many changes and optimizations needed to make this code production-worthy. It needs an endpoint to post the data to, which must be performant and scalable (perhaps FaaS and queues), and work to refine raw events into the things you are interested in reporting on (perhaps something ETL-ish).
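To make the ETL-ish step more concrete, here is a minimal sketch of one possible refinement. It assumes (hypothetically, since the article leaves the payload format open) that each raw event is a plain object with `type`, `tagName`, and `id` fields; `countByTarget` is an illustrative name.

```javascript
// Hypothetical refinement step: roll raw events up into counts keyed by
// event type and target, the kind of aggregate an analyst might report on.
// Because the raw events are kept, a different aggregation can be run later.
function countByTarget(rawEvents) {
  const counts = new Map();
  for (const e of rawEvents) {
    const key = `${e.type} ${e.tagName}#${e.id ?? '(no id)'}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const raw = [
  { type: 'click', tagName: 'button', id: 'buy' },
  { type: 'click', tagName: 'button', id: 'buy' },
  { type: 'blur', tagName: 'input', id: null },
];
const counts = countByTarget(raw);
console.log(counts.get('click button#buy'));   // → 2
console.log(counts.get('blur input#(no id)')); // → 1
```

Keeping this step on the server, rather than in browser code, is what makes it recoverable: if the aggregation is wrong, it can be fixed and re-run over the stored raw events.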

Obviously this doesn't solve any of the BI and analytics problems, but reporting on data is a very different concern from capturing it.

About me

A 25-year software industry veteran with a passion for functional programming, architecture, mentoring/team development, XP/agile, and doing the right thing.

References

  1. https://caniuse.com/mutationobserver
  2. https://www.npmjs.com/package/mutationobserver-polyfill
  3. https://developer.mozilla.org/en-US/docs/Web/API/MutationObserver/MutationObserver
  4. https://github.com/tb01923/bobo/blob/master/bobo.js