Karine Bosch’s Blog

On SharePoint

Part 13: The ETag header


Introduction

Entity tags (ETags) are a mechanism that web servers and browsers use to validate cached components. As the browser downloads components, it stores them in its cache. On subsequent page views, the cached components are read from disk on the condition that they are still “fresh”.

One of the checks the browser executes to decide whether a component is “fresh” or not is evaluating the Expires header. This Expires header is sent by the server. If the component has not expired yet, no additional HTTP request is sent to the server.
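As a minimal sketch of that freshness check (not how any particular browser implements it), the Python function below parses an Expires value and compares it with the current time. The function name is_fresh and the choice to treat an unparseable value as expired are assumptions made for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header: str, now: datetime) -> bool:
    """Return True while the cached copy may be used without a request."""
    try:
        expires = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        # Assumption: an unparseable Expires value counts as already expired.
        return False
    if expires.tzinfo is None:
        # Dates without an explicit zone are interpreted as UTC here.
        expires = expires.replace(tzinfo=timezone.utc)
    return now < expires


now = datetime(2015, 4, 13, 9, 30, 5, tzinfo=timezone.utc)
still_fresh = is_fresh("Mon, 13 Apr 2015 10:00:00 GMT", now)   # True
expired = is_fresh("Mon, 13 Apr 2015 09:00:00 GMT", now)       # False
```

Only when this check fails does the browser need to contact the server again.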

ETags provide another way to determine whether the component in the browser cache still matches the component or entity on the server.

The ETag HTTP header was introduced with HTTP/1.1. It is a string that uniquely identifies a specific version of a component. The server sends the ETag as an HTTP response header.

The ETag was introduced to provide a more flexible mechanism for validating entities than the last-modified date. If, for example, a component differs based on the User-Agent or Accept-Language headers, the state of the entity can be reflected in the ETag.

If the visitor requests the page again, the browser can validate the component in its cache by sending an If-None-Match header to the server. If the ETag on the server matches the one sent by the browser, a 304 Not Modified status code is returned and the component stored in the browser cache can be used. A component that carries only an ETag causes a conditional GET request to the server each time it is needed, while a component with an Expires header only triggers a GET request once that date has passed.
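The exchange can be sketched in a few lines of Python. This is a simplified model, not a real web server: handle_get is a hypothetical handler, the ETag value "v1" is made up, and the comparison is a plain string match on a single tag.

```python
def handle_get(request_headers: dict, current_etag: str, body: bytes):
    """Return (status, response_headers, body) for one component."""
    if request_headers.get("If-None-Match") == current_etag:
        # The cached copy is still valid: no body is sent.
        return 304, {"ETag": current_etag}, b""
    return 200, {"ETag": current_etag}, body


css = b"body { color: red; }"

# First visit: nothing cached yet, the full component is returned.
status1, headers1, body1 = handle_get({}, '"v1"', css)
cached_etag, cached_body = headers1["ETag"], body1

# Second visit: the browser validates its cached copy.
status2, headers2, body2 = handle_get({"If-None-Match": cached_etag}, '"v1"', css)
```

On the second visit the status is 304 and the body is empty, so the browser reuses cached_body.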

In some cases there is no Expires header, but there are an ETag header and a Last-Modified header. In that case the resource is validated based on the last-modified date and the ETag. The browser looks at its own cache and then issues a request that includes the If-Modified-Since and If-None-Match headers. If the If-None-Match header matches the ETag and the If-Modified-Since date is still the same as the Last-Modified date, the web server responds with a 304 Not Modified response. The moment you change the file, the cached copy becomes invalid and the web server issues a new 200 OK response and sends the file to the browser.

If both the Last-Modified and the ETag HTTP response headers are available, the If-None-Match header takes precedence over If-Modified-Since. Even if your components have a far-future Expires header, a conditional GET request is still made whenever the user clicks Reload or Refresh, resulting in a small performance degradation.
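The precedence rule can be sketched as follows. This is an illustrative Python model of the server-side decision, not the code of IIS or SharePoint; is_not_modified and its parameters are hypothetical names, and last_modified is assumed to be a timezone-aware datetime.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Matches quoted entity tags, optionally prefixed with the weak marker W/.
_ETAG = re.compile(r'(?:W/)?"[^"]*"')


def _opaque(tag: str) -> str:
    """Drop the weak prefix so that tags can be compared weakly."""
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers: dict, etag: str, last_modified: datetime) -> bool:
    """Return True when a 304 Not Modified response is appropriate."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match wins: If-Modified-Since is not consulted at all.
        if if_none_match.strip() == "*":
            return True
        sent = {_opaque(t) for t in _ETAG.findall(if_none_match)}
        return _opaque(etag) in sent
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False
        return last_modified <= since
    return False


modified_at = datetime(2015, 4, 13, 9, 0, 0, tzinfo=timezone.utc)
request = {
    "If-None-Match": '"old"',
    "If-Modified-Since": "Mon, 13 Apr 2015 09:00:00 GMT",
}
# The date still matches, but the stale ETag decides: the result is False.
result = is_not_modified(request, '"new"', modified_at)
```

The tags are extracted with a regular expression rather than by splitting on commas, because an ETag value may itself contain a comma.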

The problem with ETags

As the ETag HTTP header is typically issued by a server, the ETag becomes an issue when a web site is hosted on more than one server. As already mentioned, the ETag is a string that uniquely identifies a version of a component. When the web site is hosted on more than one server, the string is extended to make it unique to a specific server.

Let’s say you have a SharePoint farm with 4 web front-end servers (WFEs). If the component was served from the first WFE on the first request, its ETag will be different than when it is served from a different WFE. When the visitor later wants to view the page again and the browser sends a conditional GET request, it is possible that the load balancer sends this request to a different WFE. In most cases the ETag will be different and the component will be resent to the browser, although the component itself hasn’t changed. Instead of the small 304 Not Modified response, the browser will get a normal 200 OK response along with the component, which of course degrades performance.
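A small simulation makes the effect visible. The sketch below is purely illustrative: the server names, the two ETag schemes and the round-robin load balancer are assumptions, not the way SharePoint or IIS actually builds its ETags.

```python
import hashlib
from itertools import cycle, islice


def server_specific_etag(server_id: str, content: bytes) -> str:
    """Hypothetical scheme: the tag depends on the server and the content."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f'"{server_id}-{digest}"'


def content_only_etag(server_id: str, content: bytes) -> str:
    """Hypothetical scheme: the tag depends on the content only."""
    return '"' + hashlib.sha256(content).hexdigest()[:8] + '"'


def count_full_downloads(make_etag, servers, content: bytes, requests: int) -> int:
    """Count the 200 OK responses one browser receives behind a round-robin balancer."""
    cached_etag = None
    full_downloads = 0
    for server_id in islice(cycle(servers), requests):
        etag = make_etag(server_id, content)
        if cached_etag != etag:
            # The validator does not match: the whole component is sent again.
            full_downloads += 1
            cached_etag = etag
    return full_downloads


wfes = ["WFE1", "WFE2", "WFE3", "WFE4"]
script = b"function addToCart() {}"
per_server = count_full_downloads(server_specific_etag, wfes, script, 8)  # 8
shared = count_full_downloads(content_only_etag, wfes, script, 8)         # 1
```

With server-specific tags every one of the 8 requests ends in a full download, while a tag that is identical on all servers needs only the first one.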

The ETag issue also degrades the effectiveness of proxy caches. The ETag cached by users behind the proxy frequently won’t match the ETag cached by the proxy, resulting in unnecessary requests back to the origin server. Instead of one 304 Not Modified response between the user and the proxy, there will be two (slower, bigger) 200 OK responses: one from the origin server to the proxy, and another from the proxy to the user.

How is an ETag generated in SharePoint 2010?

SharePoint resources that are typically needed on web pages are CSS files, JavaScript files, images and multimedia files. These can be stored in different ways in SharePoint. The most popular ways are:

  • Uploading files to document libraries. This way they can be authored by content editors. It is not unusual for even CSS files and JavaScript files to be uploaded to document libraries if your content editors are skilled enough to make regular changes to them.
  • Deploying files in the _layouts folder of the SharePoint root.

Files uploaded in document libraries

When a file is uploaded in a document library, it is stored in the content database. When the file is requested, it is retrieved from the content database. When BLOB caching is activated on the SharePoint web application, the files will be stored in the BLOB cache for a well-defined period of time.

In SharePoint 2010 the ETag is part of the file properties which are stored together with the file in the content database.

[Screenshot: ETag file properties shown in PowerShell]

You see that the vti_etag property is composed of a GUID and an integer number. The integer number comes from the vti_docstoreversion property, which indicates how many times the file has been changed. We can also find the GUID in the AllDocs table in the content database, where it is the ID of the document.
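To illustrate that composition, here is a small Python sketch that splits such a value into its two parts. The exact textual layout (a GUID in braces, a comma, then the integer, optionally wrapped in double quotes) and the sample value are assumptions for illustration; check them against the values in your own environment.

```python
import re
import uuid

# Assumed layout: "{GUID},version" with optional surrounding quotes.
_SP_ETAG = re.compile(r'^"?\{(?P<guid>[0-9A-Fa-f-]{36})\},(?P<version>\d+)"?$')


def parse_sharepoint_etag(value: str):
    """Split an ETag into (document id, number of times the file changed)."""
    match = _SP_ETAG.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected ETag layout: {value!r}")
    return uuid.UUID(match.group("guid")), int(match.group("version"))


# Hypothetical sample value, not taken from a real content database.
doc_id, version = parse_sharepoint_etag('"{4A1B2C3D-0000-4000-8000-1234567890AB},7"')
```

Because the GUID is stored with the document and not derived from the server, every WFE that reads the same content database produces the same pair.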

[Screenshot: the document ID in the AllDocs table of the content database]

We did the test by restoring the same content database on a number of different environments (development, test, acceptance and production). When we requested the home page, we always saw that the ETags for the requested components contained that UniqueId, suffixed with an integer number.

This Id is created during execution of the [proc_AddDocument] stored procedure, which is called when a document is added to a document library.

Files deployed in the _layouts folder

But how is the ETag generated in SharePoint when the file is deployed in the _layouts folder? It looks like an identifier that is generated on the fly, and in that case the ETag differs from WFE to WFE. The following screenshots were taken from 2 subsequent page loads and clearly show a different ETag for the same file:

[Screenshots: different ETags for the same file on 2 subsequent page loads]

I assume that the ETag is calculated based on the Last-Modified date; as there is a small difference in that date between the servers, the ETag is different. We assume that this small difference in the Last-Modified date is caused by the deployment.
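The following Python sketch shows why a timestamp-derived ETag would behave this way. The formula is hypothetical and chosen only for illustration (hexadecimal microseconds since 1970 plus a ":0" suffix); it is not claimed to be the algorithm IIS uses.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def etag_from_last_modified(last_modified: datetime) -> str:
    """Hypothetical ETag: the last-modified timestamp rendered as hex."""
    micros = (last_modified - _EPOCH) // timedelta(microseconds=1)
    return f'"{micros:x}:0"'


# The same file copied to two WFEs a couple of seconds apart during deployment.
deployed_on_wfe1 = datetime(2015, 4, 13, 9, 30, 5, tzinfo=timezone.utc)
deployed_on_wfe2 = deployed_on_wfe1 + timedelta(seconds=2)

etag_wfe1 = etag_from_last_modified(deployed_on_wfe1)
etag_wfe2 = etag_from_last_modified(deployed_on_wfe2)
tags_differ = etag_wfe1 != etag_wfe2   # True
```

Even a difference of a few seconds is enough to produce two different validators for byte-identical files.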

As a result, the WFE considers the resource modified and returns it with a 200 OK response, resulting in a small performance decrease.

How to solve the issue with ETags in SharePoint 2010?

If you have components that have to be validated based on something other than the last-modified date, ETags are a powerful way of doing that. If you don’t need to customize ETags, it is best to simply remove them. Both Apache and IIS have identified ETags as a performance issue, and suggest changing the contents of the ETag or even removing it.

There are two solutions to the problem:

  • Configure the ETag sent
  • Remove the ETag from the HTTP response header

Remove the ETag from the HTTP response header

You can remove the ETag in 2 different ways:

  • with an IIS rewrite module
  • with an HttpModule

With an IIS rewrite module, you could add the following rewrite rule:

<rewrite>
   <outboundRules>
      <rule name="Remove ETag">
         <match serverVariable="RESPONSE_ETag" pattern=".+" />
         <action type="Rewrite" value="" />
      </rule>
   </outboundRules>
</rewrite>

I found this solution here.

Within an HttpModule class, you can remove the HTTP response header in the PostReleaseRequestState event handler.

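using System;
using System.Web;

// Strips the Server, X-AspNet-Version and ETag headers from every response.
// Response.Headers can only be modified in the IIS integrated pipeline mode.
public class CustomHttpModule : IHttpModule
{
    public void Init(HttpApplication context)
    {
        // Subscribe to the event in which the headers are removed.
        context.PostReleaseRequestState += context_PostReleaseRequestState;
    }

    private static void context_PostReleaseRequestState(object sender, EventArgs e)
    {
        HttpApplication httpApplication = sender as HttpApplication;
        if (httpApplication != null)
        {
            HttpResponse response = httpApplication.Response;
            response.Headers.Remove("Server");
            response.Headers.Remove("X-AspNet-Version");
            response.Headers.Remove("ETag");
        }
    }

    public void Dispose()
    {
        // Nothing to clean up.
    }
}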

You have to register your custom HttpModule in the web.config:

<system.webServer>
...
    <modules runAllManagedModulesForAllRequests="true">
        <add name="CustomHttpModule" type="MyNamespace.CustomHttpModule, MyAssembly"/>
    </modules>
...
</system.webServer>

If you already have an HttpModule class in your assemblies, I would opt for this solution.

Remark: a lot of developers remove the HTTP response headers in the PreSendRequestHeaders event handler, but it seems this can cause heap corruption when the HttpCacheModule is enabled on IIS 7.0/7.5. You can read the details here.

2 Comments

  1. very informative article! Thanks

    Comment by Ruchi | February 26, 2015

  2. Thank you – you saved me a lot of work 🙂

    I am using the Sublime Text editor, which has a plugin that allows you to make a request simply by writing a URL. The response opens in a new tab with the HTTP headers, load time information and of course the response body. It also allows you to add HTTP headers to your request by adding them below the URL like this:

    http://www.altistore.no/Files/Templates/Designs/Altistore-2012/javascript/addToCart.js
    If-None-Match:"5941dbdf826bd01:0"

    And that is where your post came in handy: You described how to handle ETags in the request, and it proved to me that the solution handled the ETags correctly. I got the response:

    304 Not Modified
    Accept-Ranges:bytes
    ETag:"5941dbdf826bd01:0"
    Server:Microsoft-IIS/8.5
    X-Powered-By:ASP.NET
    Date:Mon, 13 Apr 2015 09:30:05 GMT

    Latency: 6ms
    Download time:0ms

    Comment by netsi1964 (@netsi1964) | April 13, 2015
