_January 22, 2021_ Ferenc Varga

CACHING AND INDEXING STRATEGIES FOR PWAS – HOW TO BOOST THE PERFORMANCE OF MAGENTO’S VENIA THEME

Progressive Web Applications (PWAs) are meant to be fast by design. One of their greatest promises is that they stay fast after the initial load has finished, because page elements do not have to be re-downloaded each time. Progressive enhancement means providing access to the same content for users with slower internet connections or less feature-rich browsers, and progressively enhancing the user experience as browser and network capabilities increase.

The easiest way to fulfill this promise is to optimize every aspect of the application from the bottom up for slower networks. PWAs optimized this way should be fast enough on slow networks and should fly on faster ones. This involves a number of steps, both in architectural design and in implementation: from separating the presentation semantics from the presentation layer to making sure the data is stored, processed, cached and returned efficiently.

As far as progressive browser features are concerned, the ones that matter most for performance are service workers. Service workers act as proxies between the web application and the network. They run in the background and allow very efficient caching. Besides them, regular browser caches are more widely available. Because service workers can be turned off or may be unavailable in certain scenarios, a PWA is supposed to fall back gracefully to some other caching strategy.

Caching in the PWA world has three main aspects, each of them important:

Server-side caching and other types of performance optimization
CDN-level caching
Browser caching

PWA SERVER SIDE CACHING, INDEXING AND OTHER PERFORMANCE IMPROVEMENTS

The Magento backend and PWA Studio use GraphQL as the API between them. GraphQL is extremely flexible, as it lets the client pull exactly the data it needs. This is good for performance when it is used to fetch only partial data, but potentially bad for performance when the queries are too complex or too many nested objects are retrieved.

A good example of a nested query is a cart query that pulls the related products of all the child products of bundle items in the cart, or a product query that pulls all the information needed to build a product page, including all pricing info, child products, product relations, attributes, reviews and ratings, etc. Although GraphQL allows these types of complex queries, this approach can become counterproductive, especially with frameworks such as Magento that use a relational database as the storage engine.

When it comes to boosting GraphQL performance, we have the following options:

Writing a series of simple queries on the client side instead of a smaller number of huge, convoluted ones – this is good practice, as it allows parts of the page to be rendered earlier during page build.
Optimizing the internals of query resolvers and loaders with batch processing and deferred resolution – the Magento 2 team is working hard on this with great success, but this approach has its own limitations.
Caching the output of the queries themselves on the server side, for example with Varnish’s “full page caching”.
Storing, restructuring or indexing data in a smart way so that it is best suited for GraphQL.
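
To illustrate the second option, here is a minimal sketch of the general batching and deferral idea, written in TypeScript for readability. It is not Magento's implementation (Magento's resolvers are written in PHP), and the `BatchLoader` class, `priceLoader` and the fake backend function are hypothetical names made up for this example. Each resolver asks for one key, the loader defers the work until all sibling resolvers have registered their keys, and then a single batched backend call serves them all.

```typescript
// A function that answers many keys with one backend round trip.
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (reason: unknown) => void;
}

// Hypothetical loader: collects keys requested in the same tick and
// resolves all of them from a single batched call.
class BatchLoader<K, V> {
  private queue: Pending<K, V>[] = [];
  private scheduled = false;
  private batchFn: BatchFn<K, V>;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      if (!this.scheduled) {
        this.scheduled = true;
        // Defer the flush to a microtask so that every resolver running in
        // the current synchronous pass can queue its key first.
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const pending = this.queue;
    this.queue = [];
    this.scheduled = false;
    try {
      // The batch function is assumed to return values in key order.
      const values = await this.batchFn(pending.map((p) => p.key));
      pending.forEach((p, i) => p.resolve(values[i]));
    } catch (err) {
      pending.forEach((p) => p.reject(err));
    }
  }
}

// Stand-in for one "SELECT ... WHERE id IN (...)" query; counts round trips.
let backendCalls = 0;
const priceLoader = new BatchLoader<number, string>(async (ids) => {
  backendCalls += 1;
  return ids.map((id) => `price-for-${id}`);
});
```

Three resolvers calling `priceLoader.load(1)`, `priceLoader.load(2)` and `priceLoader.load(3)` in the same pass trigger one backend call instead of three. The limitation mentioned above still applies: batching reduces the number of round trips, but each batched query still has to be answered by the relational database.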

VARNISH AS A CACHING SOLUTION FOR PWA GRAPHQL REQUESTS?

Full-page caching (FPC) is relatively easy to use to cache full GraphQL query responses. GraphQL uses the HTTP protocol, so the response of a GraphQL query can be cached the same way as rendered HTML. This approach works fine, but there are at least two reasons why it cannot be considered ideal.

One drawback is that Varnish is a frontend cache sitting in front of the backend server as an independent entity, so every request has to pass through it once before it can be cached. Of course, there are ways to pre-warm the cache, but the first hit will always be served by the backend and be slower. For traditional HTML websites, where the content as well as the markup is generated by the backend, Varnish is the single best idea, but for GraphQL responses, which essentially contain pure data, its use cannot be fully justified.

Another problem with using Varnish for caching GraphQL responses is that normally a small change in the query parameters requires a completely new cache entry so it cannot be efficiently used for caching the layered navigation or the search queries which often allow for billions of different permutations of filter variables.
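
A short sketch makes the scale of the problem concrete. It assumes GraphQL queries are sent over HTTP GET and that the cache keys each response by its full URL; the helper names and the filter counts are hypothetical and chosen only for illustration.

```typescript
// Hypothetical helper: the URL (and therefore the cache key) of a GET-based
// GraphQL request is derived from the query text plus its variables.
function graphqlCacheKey(query: string, variables: object): string {
  return (
    "/graphql?query=" + encodeURIComponent(query) +
    "&variables=" + encodeURIComponent(JSON.stringify(variables))
  );
}

// Each multi-select filter attribute with k options can be in 2^k states,
// and the states of independent attributes multiply.
function countFilterCombinations(optionsPerAttribute: number[]): number {
  return optionsPerAttribute.reduce(
    (total, options) => total * Math.pow(2, options),
    1
  );
}

const productsQuery =
  "query($filter: ProductAttributeFilterInput) { products(filter: $filter) { total_count } }";

// Adding a single filter value produces a different URL, hence a new entry.
const keyRed = graphqlCacheKey(productsQuery, {
  filter: { color: { in: ["red"] } },
});
const keyRedBlue = graphqlCacheKey(productsQuery, {
  filter: { color: { in: ["red", "blue"] } },
});

// Three attributes with 10, 8 and 12 options: 2^30 possible selections.
const combinations = countFilterCombinations([10, 8, 12]);
```

With just three filter attributes of this size, the number of distinct filter selections already exceeds one billion, before sorting, paging and search terms are taken into account. A cache keyed by the full request will rarely see the same entry twice.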

ELASTICSEARCH AS A BACKEND OPTION FOR GRAPHQL

A relational database (MySQL in Magento’s case) is not necessarily an ideal data store for complex GraphQL queries. The fastest way to find and grab the data needed is to store and organize it in a way that is optimized for both finding and collecting the necessary bits and pieces.

Magento’s layered navigation is a good example of this. Although all the data needed to be displayed on a catalog page within the layered navigation can be pulled from the backend in a single GraphQL query, a number of different SQL queries are required to filter the catalog and build all the different filters and collect product data.

Elasticsearch on the other hand – and a number of similar solutions – is optimized for filtering the data, creating the faceted search filters and retrieving the result in a single, extremely fast operation on a huge dataset.
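
As a rough illustration, the sketch below builds a single Elasticsearch search request body that filters the catalog and computes the facet counts in the same round trip, using a `bool` query with `terms` filters and one `terms` aggregation per facet. The function name, the field names (`color`, `size`, `category_id`) and the bucket limit are assumptions for the example, not the index layout of any particular Magento installation.

```typescript
interface TermsFilter {
  terms: { [field: string]: string[] };
}

interface TermsAggregation {
  terms: { field: string; size: number };
}

interface FacetedSearchBody {
  size: number;
  query: { bool: { filter: TermsFilter[] } };
  aggs: { [name: string]: TermsAggregation };
}

// Hypothetical builder: one request body that both filters the products
// and asks for the value counts of every facet.
function buildFacetedSearch(
  selected: { [field: string]: string[] },
  facetFields: string[],
  pageSize: number
): FacetedSearchBody {
  // One "terms" filter per attribute the shopper has selected.
  const filter: TermsFilter[] = Object.keys(selected).map((field) => ({
    terms: { [field]: selected[field] },
  }));

  // One "terms" aggregation per facet shown in the layered navigation.
  const aggs: { [name: string]: TermsAggregation } = {};
  for (const field of facetFields) {
    aggs[field] = { terms: { field, size: 50 } };
  }

  return { size: pageSize, query: { bool: { filter } }, aggs };
}

const searchBody = buildFacetedSearch(
  { color: ["red", "blue"], category_id: ["12"] },
  ["color", "size"],
  24
);
```

Sending `searchBody` to the `_search` endpoint of a product index returns the matching products and the bucket counts for each facet in one response. A production implementation usually needs more care, for example so that a facet's own selection does not hide its remaining options, which this sketch does not handle.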

At ITG Commerce we leveraged these features of Elasticsearch to create our PWA Performance Package, which boosts the search and catalog performance of GraphQL queries. To make it happen, we built our own GraphQL parser, implemented as a lightweight drop-in replacement for Magento’s original GraphQL engine, with Elasticsearch as the backend. The same method can be used to fetch product data for the product details pages.

This approach combines the flexibility of Magento and the power of PWA Studio, GraphQL and Elasticsearch and results in super-fast catalog performance.

SERVER-SIDE RENDERING, PRE-RENDERING AND SEO

Currently, a huge challenge for PWA sites is search engine compatibility. There are many aspects to it, such as making the site crawlable via internal links and providing a clean URL structure. Although Google and other bots can run JavaScript applications in headless browsers and cache the content rendered on the client side, the best practice is still to implement some kind of server-side rendering or pre-rendering, or to provide SEO-relevant content in the initial payload of the web response.

The main difference between server-side rendering (SSR) and pre-rendering is that SSR is an integral part of the server side of the application, while pre-rendering involves placing a headless browser between the bots and the application so that the bots get rendered HTML content, which they can index as regular content.

Pre-rendering is much easier to implement for PWAs, but it is less sophisticated. The generated HTML content is only suitable for search bots, as the JS business logic is lost during the rendering process.

Server-side rendering is more flexible and renders the content on the server in a way that lets it stay dynamic and functional in the browser for end users. The process of enlivening the server-generated code on the client side is called (re)hydration. SSR also allows hybrid methods in which only certain parts of the HTML content are rendered on the server side. SSR can also be static (content rendered at build time) or dynamic (content generated on page load). Of course, any dynamically generated content can be cached by a frontend cache such as Varnish.

STATIC CONTENT CACHING WITH CDNS

CDNs are essential for good performance and lift a huge weight from the server infrastructure at the same time.

CLIENT SIDE CACHING

PWAs, at least after the first load and re-hydration, start to behave as single-page apps and render content from data pulled via some kind of API. To eliminate the need to pull the same information multiple times, API responses can be efficiently cached on the client side. Magento PWA uses GraphQL, which turns out to be especially straightforward to cache, as a GraphQL response can be split into its constituent pieces, stored in an optimized way, and re-assembled and updated freely when requested again.
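
The idea of splitting a response into its constituent pieces can be sketched as follows. This is a deliberately simplified illustration of a normalized cache, assuming every cacheable object carries `__typename` and `id` fields; the `normalize` function and the sample data are hypothetical and much simpler than what a real GraphQL client does.

```typescript
type Entity = { __typename: string; id: string | number };
type Store = { [ref: string]: { [field: string]: unknown } };

function isEntity(value: object): value is Entity {
  return "__typename" in value && "id" in value;
}

// Walks a response, stores every identifiable object once under
// "Typename:id", and replaces it in the tree with a small reference.
function normalize(value: unknown, store: Store): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => normalize(item, store));
  }
  if (typeof value !== "object" || value === null) {
    return value;
  }
  const source = value as { [field: string]: unknown };
  const fields: { [field: string]: unknown } = {};
  for (const key of Object.keys(source)) {
    fields[key] = normalize(source[key], store);
  }
  if (isEntity(value)) {
    const ref = `${value.__typename}:${value.id}`;
    // Merge with what is already known, so later responses update the entry.
    store[ref] = { ...store[ref], ...fields };
    return { __ref: ref };
  }
  return fields;
}

const store: Store = {};

// A catalog response and a later product-page response for the same item.
const listing = normalize(
  { products: { items: [{ __typename: "SimpleProduct", id: 1, name: "Bag", price: 10 }] } },
  store
);
normalize({ product: { __typename: "SimpleProduct", id: 1, price: 12 } }, store);
```

After both calls, `store["SimpleProduct:1"]` holds the name from the first response and the updated price from the second, and any view that references the product sees the fresh data without another network request.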

Magento PWA Studio uses the Apollo GraphQL client, which has an excellent and highly customizable built-in GraphQL cache. In PWA Studio, Apollo’s GraphQL cache works well out of the box in most cases, without the need for much fine-tuning.

When it comes to client-side caching in PWAs, service workers must be mentioned. The service worker in a PWA sits between the network and the application, making it possible to cache files and content for offline access and to use different caching and fallback strategies depending on the network resource and the nature of the application.

Caching, as well as different forms of pre-rendering and indexing, is at the heart of PWAs on both the client and the server side. Without caching done right on all levels, PWAs cannot be both dynamic and performant at the same time.
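
Two of the most common strategies can be sketched in a few lines. To keep the example self-contained and runnable outside a browser, the cache and the network are represented by small hypothetical interfaces (`SimpleCache`, `Fetcher`) instead of the browser's Cache API and `fetch`; in a real service worker the same logic would run inside a `fetch` event handler.

```typescript
// Minimal stand-ins for the browser's cache storage and network access.
interface SimpleCache<T> {
  get(key: string): Promise<T | undefined>;
  put(key: string, value: T): Promise<void>;
}
type Fetcher<T> = (url: string) => Promise<T>;

// Cache-first: suited to static assets that rarely change.
async function cacheFirst<T>(
  url: string,
  cache: SimpleCache<T>,
  fetcher: Fetcher<T>
): Promise<T> {
  const hit = await cache.get(url);
  if (hit !== undefined) {
    return hit;
  }
  const fresh = await fetcher(url);
  await cache.put(url, fresh);
  return fresh;
}

// Network-first: suited to dynamic data, with the cache as offline fallback.
async function networkFirst<T>(
  url: string,
  cache: SimpleCache<T>,
  fetcher: Fetcher<T>
): Promise<T> {
  try {
    const fresh = await fetcher(url);
    await cache.put(url, fresh);
    return fresh;
  } catch (err) {
    const hit = await cache.get(url);
    if (hit !== undefined) {
      return hit;
    }
    throw err; // Nothing cached either: surface the network error.
  }
}

// In-memory cache used only to make the sketch runnable.
class MemoryCache<T> implements SimpleCache<T> {
  private entries: { [key: string]: T } = {};

  async get(key: string): Promise<T | undefined> {
    return this.entries[key];
  }

  async put(key: string, value: T): Promise<void> {
    this.entries[key] = value;
  }
}
```

Which strategy fits depends on the resource: images, fonts and bundled scripts are natural candidates for cache-first, while prices or stock levels are better served network-first so that stale data is only shown when the network is unavailable.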


Ferenc Varga

As Chief Technology Officer, Ferenc leads the technology department at ITG Commerce. He is a recognized Magento Master and certified Shopware Developer.