How to Use Cache Busting to Clear Long-Term Cache

What Is Cache Busting?

Caching is a simple technique for improving the experience of people who visit your website. On the first visit, the browser downloads all the files the page needs and stores them in its cache, so those assets do not have to be downloaded again on every later visit. This is one of the most effective ways to keep a site fast for returning visitors, but it creates a problem when files are marked to be cached far into the future. For example, if your HTML, JavaScript, or CSS files are served with instructions to cache them for a year, the browser keeps using the copies it downloaded first and ignores newer versions for that entire period. No matter what updates you make to your files, a returning visitor will not see them until the cached copies expire.
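To make “cached for a year” concrete: the server typically sends a Cache-Control header such as max-age=31536000 (365 days in seconds), and the browser reuses its stored copy without asking the server until the copy is that old. The sketch below is a simplified model of that freshness check; it ignores the Age and Date headers and revalidation, and the function names and dates are hypothetical.

```javascript
// Simplified model of how a browser decides whether a cached file is still
// "fresh" (usable without contacting the server). Real browsers also use
// the Age and Date headers; this sketch ignores them.

// Extract the max-age value (in seconds) from a Cache-Control header.
function maxAgeSeconds(cacheControl) {
  const match = /max-age=(\d+)/.exec(cacheControl);
  return match ? Number(match[1]) : 0;
}

// A cached copy is fresh while its age is below max-age.
function isFresh(cacheControl, downloadedAt, now) {
  const ageSeconds = (now.getTime() - downloadedAt.getTime()) / 1000;
  return ageSeconds < maxAgeSeconds(cacheControl);
}

const oneYear = 365 * 24 * 60 * 60; // 31536000 seconds
const header = `public, max-age=${oneYear}`;
const downloaded = new Date('2024-01-01T00:00:00Z');

// Even if the site is updated in February, on March 1 the cached copy is
// still fresh, so the browser keeps using the old file.
console.log(isFresh(header, downloaded, new Date('2024-03-01T00:00:00Z'))); // true

// Only once the year is up does the browser fetch the file again.
console.log(isFresh(header, downloaded, new Date('2025-01-02T00:00:00Z'))); // false
```

This is why simply updating a file on the server is not enough: nothing prompts the browser to look for it until max-age has passed.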

There are two obvious ways to fix this problem. The first, and least likely, is that the visitor thinks to clear their own browser cache. The second is to change the server’s cache settings so browsers stop relying on their cached copies and fetch new files, but this defeats the purpose of setting up a cache in the first place. A better solution is cache busting, which forces the browser to download the new files. You enable it by renaming your files every time you update your site.

This fix may sound unnecessarily complicated at first, but there are tools that create a file hash for you, which eliminates the need to rename your files manually. A file hash is a short string of characters calculated from the contents of a file, so whenever a file is revised, its hash changes to reflect the new contents. Because the hash becomes part of the file name, each change to a file gives it a new name, and the browser recognizes it as a new file and downloads it.
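As a rough illustration, the sketch below uses Node’s built-in crypto module to fingerprint a file’s contents and build a hashed file name. It loosely mimics the naming style gulp-rev produces (the first 10 hex characters of an MD5 digest, joined to the name with a dash), but it is a hand-written sketch, not gulp-rev’s code, and the helper names contentHash and hashedName are hypothetical.

```javascript
const crypto = require('crypto');
const path = require('path');

// Fingerprint file contents: first 10 hex characters of an MD5 digest.
function contentHash(contents) {
  return crypto.createHash('md5').update(contents).digest('hex').slice(0, 10);
}

// Insert the hash before the extension: "css/main.css" -> "css/main-<hash>.css".
// path.posix keeps forward slashes so the result can be used in URLs.
function hashedName(filePath, contents) {
  const { dir, name, ext } = path.posix.parse(filePath);
  return path.posix.join(dir, `${name}-${contentHash(contents)}${ext}`);
}

const before = hashedName('css/main.css', 'body { color: black; }');
const after = hashedName('css/main.css', 'body { color: navy; }');
const unchanged = hashedName('css/main.css', 'body { color: black; }');

console.log(before !== after);     // true: edited contents get a new name
console.log(before === unchanged); // true: identical contents keep their name
console.log(contentHash(''));      // d41d8cd98f (MD5 of an empty file)
```

The useful property is that the name depends only on the contents: an untouched file keeps its name (and stays cached), while an edited one gets a name the browser has never seen.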

The Plan of Attack

Here is what this looks like: first, generate a file hash for each cached file and append it to the file name, so that a file such as main.css becomes something like main-1a2b3c4d5e.css. This hash is the differentiator that changes whenever the file’s contents change, and it is generated automatically by a command-line build tool rather than by hand. The first time the site loads, all of these files are stored in the browser cache and set to expire in one year. Cache busting happens when you update the site: the changed files get new hashes, and therefore new names, so the browser treats them as files it has never seen and downloads them even though the old copies have not yet expired. The HTML page that references these files keeps its name, so it should not be given a long cache lifetime; otherwise returning visitors would never see the updated references. This process gets around the problem of cached files not updating for long periods of time.
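A minimal sketch of how a server might assign those cache lifetimes, assuming hashed names follow the pattern name-(10 hex characters).ext: hashed CSS and JavaScript get a one-year lifetime, while anything else, including the HTML page, is sent with no-cache so the browser checks back with the server before reusing it. The function name cacheControlFor and the exact header values are illustrative choices, not part of the original setup.

```javascript
const path = require('path');

const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60;

// Matches names like "main-1a2b3c4d5e.css" or "app-0f9e8d7c6b.js".
const HASHED_ASSET = /-[0-9a-f]{10}\.(?:css|js)$/;

// Choose a Cache-Control value for a requested file.
function cacheControlFor(filePath) {
  const fileName = path.posix.basename(filePath);
  if (HASHED_ASSET.test(fileName)) {
    // The name changes whenever the contents change, so a long lifetime is safe.
    return `public, max-age=${ONE_YEAR_SECONDS}, immutable`;
  }
  // Unhashed files (such as index.html) keep their name, so the browser
  // must revalidate them before reusing a cached copy.
  return 'no-cache';
}

console.log(cacheControlFor('/css/main-1a2b3c4d5e.css')); // public, max-age=31536000, immutable
console.log(cacheControlFor('/index.html'));              // no-cache
```

In a plain Node http server, you might call res.setHeader('Cache-Control', cacheControlFor(req.url)) before sending each file.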

Tools of the Trade

This solution is not without its faults. Every update adds a new set of hashed files, so outdated versions pile up: in the production folder on the server, and in visitors’ browser caches until the browser evicts them. Without a few additional tools on the back end, this clutter builds up quickly.

The first recommended tool is gulp-rev, a gulp plugin that automatically appends a file hash to each file name and can write a manifest listing the original and hashed names. Next, gulp-rev-replace uses that manifest to rewrite the references in your HTML (for example, index.html) so they point to the newly hashed file names. Finally, rev-del deletes outdated hashed files, so you are left with only the most recent version of each file. Used together, these three plugins take care of what would otherwise be the impossible task of updating everything by hand.

A Real Life Scenario

In a test site, install the three plugins we just discussed: ‘gulp-rev’, ‘gulp-rev-replace’, and ‘rev-del’. Before beginning, create a folder named ‘limbo’; this folder will hold the optimized HTML, JavaScript, and CSS files.

It acts as a holding area between build steps. Now let’s look at how to implement these plugins and run a cache bust.

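Below is a sample gulpfile for you to work from. It is written for gulp 3: each task lists its dependencies in an array passed as the second argument to gulp.task, a form that gulp 4 no longer accepts (gulp 4 uses gulp.series and gulp.parallel instead).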

var gulp = require('gulp'),
  webserver = require('gulp-webserver'),
  postcss = require('gulp-postcss'),
  autoprefixer = require('autoprefixer'),
  precss = require('precss'),
  image = require('gulp-image'),
  htmlmin = require('gulp-htmlmin'),
  minify = require('gulp-minify'),
  cssnano = require('cssnano'),
  rev = require('gulp-rev'),
  revReplace = require('gulp-rev-replace'),
  revDel = require('rev-del'),

  limbo = 'limbo/',
  source = 'development/',
  dest = 'production/';

// Optimize images through gulp-image
gulp.task('imageoptim', function() {
  // Return the stream so gulp knows when the task has finished
  return gulp.src(source + 'images/**/*.{jpg,JPG}')
    .pipe(image())
    .pipe(gulp.dest(dest + 'images'));
});

// HTML
gulp.task('html', function() {
  return gulp.src(source + '*.html')
    .pipe(htmlmin({
      collapseWhitespace: true,
      minifyJS: true,
      removeComments: true
    }))
    .pipe(gulp.dest(limbo));
});

// JavaScript
gulp.task('javascript', function() {
  return gulp.src(source + 'JS/**/*.js')
    .pipe(minify({
      // exclude the libs directory from minification
      exclude: ['libs']
    }))
    .pipe(gulp.dest(limbo + 'JS'));
});

// CSS
gulp.task('css', function() {
  return gulp.src(source + '**/*.css')
    .pipe(postcss([
      precss(),
      autoprefixer(),
      cssnano()
    ]))
    .pipe(gulp.dest(limbo));
});

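// Rename assets based on a hash of their contents.
// Runs only after the html, css and javascript tasks have finished.
gulp.task('revision', ['html','css','javascript'], function() {
  return gulp.src(limbo + '**/*.{js,css}')
    .pipe(rev())                  // append a content hash to each file name
    .pipe(gulp.dest(dest))        // write the hashed files to production
    .pipe(rev.manifest())         // build rev-manifest.json (original name -> hashed name)
    .pipe(revDel({dest: dest}))   // delete hashed files the new manifest no longer lists
    .pipe(gulp.dest(dest));       // save the manifest for the next run
});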

// Replace URLs with hashed ones based on rev manifest.
// Runs immediately after revision:
gulp.task('revreplace', ['revision'], function() {
  var manifest = gulp.src(dest + 'rev-manifest.json');

  return gulp.src(limbo + '**/*.html')
    .pipe(revReplace({manifest: manifest}))
    .pipe(gulp.dest(dest));
});

// Watch everything
gulp.task('watch', function() {
  gulp.watch(source + '**/*.{html,css,js}', ['revreplace']);
  gulp.watch(source + 'images/**/*.{jpg,JPG}', ['imageoptim']);
});

// Run a livereload web server because lazy
gulp.task('webserver', function() {
  gulp.src(dest)
    .pipe(webserver({
      livereload: true,
      open: true
    }));
});

// Default task (runs at initiation: gulp --verbose)
gulp.task('default', ['imageoptim', 'revreplace', 'watch', 'webserver']);

Step 1: Add two new tasks: ‘revision’, which runs rev, and ‘revreplace’, which rewrites the file references inside the HTML. In the ‘revision’ task, grab all files in limbo with a .js or .css extension.

Step 2: Run ‘rev’ on these files to append a file hash to each name, then place the hashed files in the production folder.

Step 3: Generate a manifest, a .json file that maps each original file name to its new hashed name. Use ‘rev-del’ to compare the previous manifest with the new one and delete the hashed files it no longer lists from the production folder.

Step 4: Write the manifest to the destination folder so it can be compared against the next time the process runs.
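To make Steps 3 and 4 more concrete, here is a hypothetical sketch of the kind of comparison rev-del performs, written from scratch rather than taken from its source: read the previous manifest, find hashed files the new manifest no longer lists, delete them from the destination folder, and save the new manifest for the next run. The function names staleFiles and updateManifest are illustrative, and the sketch assumes the manifest is the flat original-to-hashed JSON object that gulp-rev writes.

```javascript
const fs = require('fs');
const path = require('path');

// Hashed files listed in the old manifest but absent from the new one.
function staleFiles(oldManifest, newManifest) {
  const current = new Set(Object.values(newManifest));
  return Object.values(oldManifest).filter((file) => !current.has(file));
}

// Delete stale files from destDir, then save the new manifest for next time.
function updateManifest(destDir, newManifest) {
  const manifestPath = path.join(destDir, 'rev-manifest.json');
  const oldManifest = fs.existsSync(manifestPath)
    ? JSON.parse(fs.readFileSync(manifestPath, 'utf8'))
    : {}; // first run: nothing to compare against

  const stale = staleFiles(oldManifest, newManifest);
  for (const file of stale) {
    const filePath = path.join(destDir, file);
    if (fs.existsSync(filePath)) fs.unlinkSync(filePath); // skip files already gone
  }

  fs.writeFileSync(manifestPath, JSON.stringify(newManifest, null, 2));
  return stale;
}

const previous = { 'css/main.css': 'css/main-1a2b3c4d5e.css', 'js/app.js': 'js/app-0f9e8d7c6b.js' };
const next = { 'css/main.css': 'css/main-9f8e7d6c5b.css', 'js/app.js': 'js/app-0f9e8d7c6b.js' };

console.log(staleFiles(previous, next)); // [ 'css/main-1a2b3c4d5e.css' ]
```

Only main.css changed in this example, so only its old hashed copy is removed; app.js keeps the same hash and is left alone.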

Step 5: Run the ‘revreplace’ task. It processes the HTML files in limbo and uses the manifest to replace each original file name with its hashed counterpart before writing the HTML to the production folder. Because every update to a file produces a new hash, and the HTML is rewritten to match, the page always references the latest version of each file.
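The idea behind Step 5 can be sketched in a few lines: look up each src or href value in the manifest and swap in the hashed name when there is a match. This is a simplified, hypothetical stand-in for gulp-rev-replace, not its implementation; it assumes the HTML references assets with exactly the relative paths used as manifest keys (no leading slash or ./ prefix).

```javascript
// Rewrite src="..." and href="..." values that appear in the manifest.
function replaceReferences(html, manifest) {
  return html.replace(/\b(src|href)="([^"]+)"/g, (match, attr, url) =>
    Object.prototype.hasOwnProperty.call(manifest, url)
      ? `${attr}="${manifest[url]}"`
      : match // leave unrelated URLs (images, links) untouched
  );
}

const manifest = {
  'css/main.css': 'css/main-1a2b3c4d5e.css',
  'js/app.js': 'js/app-0f9e8d7c6b.js',
};

const html =
  '<link rel="stylesheet" href="css/main.css">' +
  '<script src="js/app.js"></script>' +
  '<img src="images/logo.jpg">';

console.log(replaceReferences(html, manifest));
// <link rel="stylesheet" href="css/main-1a2b3c4d5e.css"><script src="js/app-0f9e8d7c6b.js"></script><img src="images/logo.jpg">
```

Matching whole attribute values, rather than doing a plain substring replace, avoids accidentally rewriting a longer name that happens to contain a shorter one.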

Note: This combination of tasks lets everything run automatically, but each gulp task still depends on others, so the tasks must run in a specific order. Before ‘revreplace’ can run, ‘revision’ must run, and ‘revision’ runs only after the ‘html’, ‘css’, and ‘javascript’ tasks have finished. In the sample gulpfile, these dependencies are the arrays passed as the second argument to gulp.task.
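The ordering constraint in that note can be modeled as a small dependency graph. The sketch below walks the dependencies depth-first and lists the tasks so that every task appears after the tasks it depends on. It only illustrates the ordering: gulp 3 itself may run independent dependencies such as ‘html’, ‘css’, and ‘javascript’ at the same time. The taskDeps map mirrors the gulp.task declarations in the sample gulpfile, and runOrder is a hypothetical helper.

```javascript
// Dependencies as declared in the sample gulpfile (task -> tasks it needs first).
const taskDeps = {
  html: [],
  css: [],
  javascript: [],
  revision: ['html', 'css', 'javascript'],
  revreplace: ['revision'],
};

// Depth-first walk: each task is listed only after all of its dependencies.
function runOrder(task, deps, done = new Set(), inProgress = new Set(), order = []) {
  if (done.has(task)) return order;
  if (inProgress.has(task)) throw new Error(`Circular dependency at "${task}"`);
  inProgress.add(task);
  for (const dep of deps[task] || []) {
    runOrder(dep, deps, done, inProgress, order);
  }
  inProgress.delete(task);
  done.add(task);
  order.push(task);
  return order;
}

console.log(runOrder('revreplace', taskDeps));
// [ 'html', 'css', 'javascript', 'revision', 'revreplace' ]
```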

Step 6: Create a ‘watch’ task that tracks changes to HTML, CSS, and JavaScript files and triggers ‘revreplace’, which runs ‘revision’ (and therefore the HTML, CSS, and JavaScript tasks) first.

The file names have to change so that the manifest is updated, the HTML references point to the new names, and the old files can be deleted to clear out the clutter.

Step 7: Change the default task to include ‘revreplace’. This is the task that runs when you type ‘gulp --verbose’ on the command line.

Note: If you run this task (with ‘gulp --verbose’) before the limbo folder exists, or before the CSS, JavaScript, and index.html files have been written to the production folder, the browser will probably show a “Cannot GET” error because the HTML file has not been generated yet.

However, once the build has finished, reloading the page should show everything working correctly.

If you look in your production folder, the CSS and JavaScript files should all have a hash appended to their file names. There should also be a new file called ‘rev-manifest.json’ that lists each old file name alongside its new hashed name. The ‘revreplace’ task uses this manifest to replace the URLs in your HTML, so the right files are always used.

Once this is set up, any change you make to your site while the watch task is running automatically triggers these tasks: changed files get new hashed names, old files are removed, and the cache stays uncluttered, giving your visitors the best possible performance.

Author

Emin is a web developer at Amberd Web Design in Los Angeles, CA. In his free time, he enjoys painting, reading, and writing tutorials and “how to” guides to help others with their website issues.
