Datadog Real User Monitoring (RUM) for a faster payment page

Smitvyas52/ July 21, 2020/ Data Engineering, Engineering, Frontend, Web

At Shaadi.com we focus a lot on customer experience, and one of the key outcomes we strive for is a seamless experience for our users across our website. As part of that effort, we are working on making our payment pages blazing fast!

While planning this, one of the first things that we had to determine was our current performance baseline. Once we established a baseline, we had to measure the improvements while tweaking the performance. Initially, we tried getting the current performance data using various lab tools like Lighthouse.

Shortcomings of lab tools
While lab tools give a general idea of how the application is performing, they may not give a true representation of what users experience. They do not show how performance varies by device type, network connection, browser version, etc. Lab tools also did not give us the flexibility to automatically log the gathered data.

These shortcomings led us to research how we could get a real baseline for the performance of our current payment pages. This is where we started toying with the idea of gathering more meaningful data from real users, which is popularly known as Real User Monitoring (RUM).

So what is RUM?

Real User Monitoring (RUM) is a type of performance monitoring that captures and analyzes each transaction by users of a website or application. It’s also known as real user measurement, real user metrics, end-user experience monitoring, or simply RUM.

Real User Monitoring vs. Synthetic Monitoring

Real User Monitoring is a form of ‘passive’ web monitoring. We say passive because it relies on services that constantly observe the system in the background, tracking availability, functionality, and responsiveness.

Synthetic Monitoring is a form of ‘active’ web monitoring. In synthetic monitoring, behavioural scripts are deployed in a browser to simulate the path an end-user takes through a website.

Active monitoring permits webmasters to test the application before launch. That makes it an essential tool for sites with a high volume of traffic. Unlike synthetic monitoring, RUM never rests. It collects data from each user using every browser across each request.

How does it work?

RUM technology collects a website or app’s performance measures straight from the browser of the end-user. A small amount of JavaScript is embedded in each page. This script then collects data from each user as he or she explores the page, and transfers that data back for analysis.
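
To make the idea concrete, here is a minimal sketch of what such an embedded script could collect, using the browser's standard Performance API. This is not how Datadog's agent is implemented; `collectPageMetrics` and the `/rum-collect` endpoint are hypothetical names used only for illustration.

```javascript
// Sketch only: `collectPageMetrics` is a hypothetical helper, not a Datadog API.
// `perf` is expected to behave like the browser's `window.performance`.
function collectPageMetrics(perf) {
  const [nav] = perf.getEntriesByType('navigation');
  const fcp = perf
    .getEntriesByType('paint')
    .find((entry) => entry.name === 'first-contentful-paint');

  return {
    // Milliseconds since navigation start; null when the browser has no entry.
    firstContentfulPaint: fcp ? Math.round(fcp.startTime) : null,
    domContentLoaded: nav ? Math.round(nav.domContentLoadedEventEnd) : null,
    loadEvent: nav ? Math.round(nav.loadEventEnd) : null,
  };
}

// In a browser, the result could be shipped once the page has loaded:
// window.addEventListener('load', () => {
//   setTimeout(() => {
//     const body = JSON.stringify(collectPageMetrics(window.performance));
//     navigator.sendBeacon('/rum-collect', body); // hypothetical endpoint
//   }, 0);
// });
```

Passing the performance object in as an argument keeps the function easy to try outside a browser with a stub.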

So, was RUM the solution for us? YES, definitely!

So what next?
While we knew RUM would solve our problem, our next discussions were about how to do it. Building something in-house was definitely possible, but we weighed the ‘Buy vs. Build’ options and explored the available tools. Our must-haves, in addition to measuring FCP and load times, were breakdowns by country, device, and URL path. The tool also needed to measure the time spent loading resources and making API calls and, in general, allow us to do a detailed post-mortem of the time spent loading a page. That’s when we finalized Datadog RUM.

How are we improving our payment page performance using RUM?

The first thing we did to start improving our performance was to build our performance dashboards. We built them using RUM and started analyzing the data flowing into them.

Integration

There are two ways to integrate Datadog RUM: the bundle setup and the npm setup. At Shaadi, we have used the npm setup.

1) Bundle Setup

<script
   src="https://www.datadoghq-browser-agent.com/datadog-rum-us.js"
   type="text/javascript">
</script>
<script>
   if ("%NODE_ENV%" === "development") {
      window.DD_RUM && window.DD_RUM.init({
        clientToken: 'pubXXXXXXXXXXXXX',
        applicationId: 'APPXXXXXXXXXXXXX',
        sampleRate: 0.5
      });
   }
</script>

2) npm Setup (Integrated in Shaadi React)

Add @datadog/browser-rum to your package.json file, then initialize the library once when the application starts.

Initialization
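
The original initialization snippet is not reproduced here, so the following is only a sketch of what it might look like. It reuses the option names from the bundle setup above (`clientToken`, `applicationId`, `sampleRate`); `initRum` and its `enabledEnvs` parameter are hypothetical, and option names can differ between SDK versions, so check them against the version you install.

```javascript
// Sketch only: `initRum` and `enabledEnvs` are hypothetical names.
// `rum` is the `datadogRum` object exported by @datadog/browser-rum.
function initRum(rum, env, enabledEnvs) {
  if (!enabledEnvs.includes(env)) {
    return false; // skip RUM outside the chosen environments
  }
  rum.init({
    clientToken: 'pubXXXXXXXXXXXXX',
    applicationId: 'APPXXXXXXXXXXXXX',
    sampleRate: 100, // assumed to be a percentage of sessions to track
  });
  return true;
}
```

In the application entry point this could be called as `initRum(datadogRum, process.env.NODE_ENV, ['production'])` after `import { datadogRum } from '@datadog/browser-rum';`. Taking the SDK object as an argument is a design choice that keeps the environment check testable without loading the real library.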

Adding custom data
While Datadog RUM captures some information by itself, it also provides ways of capturing custom data using the global context. This can be used to capture data like the deployment version and payment modes, and to build relevant dashboards around it.

1) Add Global Context
Once RUM has been initialized, add extra context to all RUM events collected from your application.

import { datadogRum } from '@datadog/browser-rum';

datadogRum.addRumGlobalContext('<CONTEXT_KEY>', <CONTEXT_VALUE>);

// Code example
datadogRum.addRumGlobalContext('usr', {
    id: 123,
    plan: 'premium'
});


2) Replace Global Context
Once RUM has been initialized, replace the default context for all your RUM events.

import { datadogRum } from '@datadog/browser-rum';

datadogRum.setRumGlobalContext({ '<CONTEXT_KEY>': '<CONTEXT_VALUE>' });

// Code example
datadogRum.setRumGlobalContext({
    codeVersion: 34,
});


3) Custom User Actions
Once RUM has been initialized, you can generate user actions when you want to monitor specific interactions on your application pages or measure custom timings.

import { datadogRum } from '@datadog/browser-rum';

datadogRum.addUserAction('<NAME>', <JSON_OBJECT>);

// Code example
datadogRum.addUserAction('checkout', {
    cart: {
        amount: 5057,
        currency: 'Rs',
        productCode: 'SSP_SPlus',
        discountPercentage: 15,
    },
});

What data are we measuring?

We extended the preset template that Datadog provides to build our custom dashboards.

  • Performance metrics: For all views, four browser metrics are highlighted: Loading Time, First Contentful Paint, DOM Content Loaded, and Load Event. For each one of these metrics, widgets show the median, the 75th percentile, and the 90th percentile.
  • Trends: Visualize the evolution of page views, frontend errors related to backend calls failing, JS errors, and long tasks.
  • Page views breakdown: Analyze the nature of your traffic and the associated loading time for each segment.
  • Categorization: Analyze performance based on countries, browsers, devices, and URL paths.
  • Page performance details: Details on the loading time of every resource, API call, and chunk.
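
To illustrate what the median, 75th, and 90th percentile widgets summarize, here is a small sketch that computes them from a list of load-time samples using the nearest-rank method. The helper names are hypothetical, and Datadog may compute its percentiles differently.

```javascript
// Sketch only: `percentile` and `summarize` are hypothetical helpers.
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples, p) {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Summarize a metric the way the dashboard widgets do: median, p75, and p90.
function summarize(samples) {
  return {
    p50: percentile(samples, 50),
    p75: percentile(samples, 75),
    p90: percentile(samples, 90),
  };
}
```

Higher percentiles matter because a healthy median can hide a slow experience for a sizeable share of users, for example those on slower networks or devices.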

How did this help us?
With RUM, we solved our long-pending problem of knowing where we stand in terms of performance for our payment pages. We were able to identify how the performance of each page varies by country, browser, and much more. This helped us identify our focus areas.
With our performance baseline in place, we started shipping performance improvements. After every release, we are now able to measure the improvements we are making and adjust our development based on the data we are gathering.
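
Comparing a release against the baseline comes down to a percent change per metric. The sketch below shows that arithmetic; the function name, the metric keys, and all the numbers are made up for illustration and are not our actual data.

```javascript
// Sketch only: the function name, metric keys, and numbers are illustrative.
// Returns the percent change per metric; a negative value means the page got faster.
function compareToBaseline(baseline, current) {
  const change = {};
  for (const metric of Object.keys(baseline)) {
    const before = baseline[metric];
    const after = current[metric];
    change[metric] =
      before > 0 && typeof after === 'number'
        ? ((after - before) * 100) / before
        : null; // not comparable without both values
  }
  return change;
}

const baseline = { fcpP75: 2000, loadP75: 5000 }; // milliseconds, made up
const afterRelease = { fcpP75: 1600, loadP75: 4500 }; // milliseconds, made up
console.log(compareToBaseline(baseline, afterRelease));
```

With these sample values, the 75th percentile FCP changes by -20% and the 75th percentile load time by -10%.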

What is missing?

While RUM has definitely helped us move one step ahead, there are still some shortcomings:

1) Lack of benchmarking
RUM is unable to examine competitors’ websites. That means benchmarking our site’s performance against others is difficult.

2) Effectiveness in pre-production settings is limited
RUM has little to offer in pre-production settings because it depends on real user traffic, which means it’s a reactive mechanism. If we could simulate the same level of load and variation in pre-production, it would be a definite success.

3) Too much data
Some users complain that sifting through the sheer volume of data provided can be daunting.

4) Available on web/msite only
It is not yet supported for mobile apps, which is where most of our traffic is. (Though we do hear that it is in the works and should be available soon.)

It’s a Win-win!

These dashboards allow us to monitor the FCP and page load time trends. We can also see the trends based on country, device, and browser. With one of the performance improvements that we made, we saw an improvement of 20%. This directly resulted in a ~10% win in Orders.

We continue our journey of making our applications faster by using the extensive dashboards that we have. The improvements that we plan are a direct result of the performance analysis done using RUM. With a dashboard-first approach, we see ourselves heading in the right direction.

In conclusion…

While we continue to improve our application performance, adopting a data-first and dashboard-first strategy has helped us measure our outcomes while also helping us define a clear direction for future tweaks.
