Speed Kit: Making Websites Faster With Service Workers and Dynamic Caching
Jul 21, 2023 • 21 min read
Table of Contents
- Targeting large gains in web performance
- Bottlenecks in modern web performance
- Speed Kit: How it works
- Other optimizations powered by Speed Kit
- Evaluating performance with A/B tests and RUM
- Future work
Targeting large gains in web performance
Twelve years ago, we started our research on web performance and caching of dynamic content with the goal of making the web a much faster place.
In this article, we share the results of this journey and explain how Speed Kit enables large performance gains, A/B tested and measured with Real User Monitoring, like the one in the chart below.
Largest Contentful Paint comparison from an A/B test measured via Real User Monitoring (RUM).
For a comprehensive list, you can scroll down to our publications list at the very end.
Bottlenecks in modern web performance
On a high level, all pages on the web load in the same way and experience the same sources of performance bottlenecks.
Let’s start with the basics here, to better highlight what changes about the page loading process when Speed Kit is used. If you are a web expert, feel free to skip to the next section.
At its core, loading a website is very simple:
The client requests the HTML resource.
The server generates a response and sends it to the client.
The client loads linked resources like CSS, JS, images, and other assets.
These steps also make up the main performance bottlenecks; or to frame it more positively, they highlight the common areas of optimization:
Improve the first request with protocol optimizations (fast DNS, 0 round-trip TLS, OCSP stapling, HTTP/3).
On the backend, try to keep processing overhead to a minimum, optimize database queries and slow code paths, and use caching wherever possible.
Keep your critical rendering path (CRP) as short as possible by reducing the number of bytes sent over the network and avoiding render- or parser-blocking resources.
Most likely, any company operating a website has lots of optimizations like these in their backlog, some being harder to implement than others.
With Speed Kit, we will go a slightly different route and not try to solve all of a website’s bottlenecks individually. Instead, we use Speed Kit to change how the browser loads the page.
That does not mean Speed Kit is there to just mask issues in the website’s stack, though. It still makes sense to optimize the current stack to achieve optimal performance in combination with Speed Kit.
Speed Kit: How it works
In contrast to traditional optimizations implemented on an infrastructural or backend level, Speed Kit comes in on the client side and optimizes how the browser loads the page and how it makes use of available caches. This also means that Speed Kit does not require any changes to the backend, frontend or infrastructure of the site.
To implement Speed Kit, you add an async script to your HTML template and host a Service Worker file. Once activated, the service worker runs in a background thread in the user’s browser and acts as a network proxy, seeing every request that the browser sends to the network. Service workers are supported in 97% of browsers, do not require permission to be used, and are completely invisible to users.
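As a rough sketch of that integration (the function and file names here are illustrative, not Speed Kit's actual API), the async snippet registers the hosted service worker file and degrades gracefully in browsers without support:

```javascript
// Hypothetical integration sketch. The HTML template would contain an
// async script tag such as: <script src="/speed-kit-snippet.js" async>
// which then registers the service worker file hosted on the site.
function registerSpeedKit(nav, swUrl = '/speed-kit-sw.js') {
  // Feature-detect: browsers without service worker support simply
  // load the page as before.
  if (!nav || !('serviceWorker' in nav)) return Promise.resolve('unsupported');
  return nav.serviceWorker
    .register(swUrl, { scope: '/' }) // proxy every request under /
    .then(() => 'registered')
    .catch(() => 'failed');          // never break the page on error
}

// In the browser: registerSpeedKit(navigator);
```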
Speed Kit uses the service worker to reroute requests to a distributed caching network, shown as Speed Kit service in the graph below. The unique part here is that not only static files like JS, CSS, and images are cached, but also dynamic resources like APIs or HTML that can change at any point in time.
The Speed Kit service uses Fastly as a CDN layer with code execution on the edge and AWS services for application servers, backend caching, and RUM data processing.
For HTML resources, Speed Kit ensures that personalized content (e.g. login state, recommendations, cart) is still loaded from the website's servers. It also automatically keeps the cached content up-to-date. Finally, it enables caching of these dynamic resources on every level of the cache hierarchy, including the user’s device itself.
Caching HTML resources of any website in a generic way is quite a complex task — for e-commerce pages in particular — and we are faced with three main challenges:
How can HTML be cached if it is personalized?
How can caches be kept up to date for millions of pages when content changes unpredictably?
How can dynamic content be cached in the client without an API to purge it?
The next three sections cover how Speed Kit tackles those three challenges.
Challenge 1: Caching personalized content
Personalization and caching are usually not compatible, since multiple users cannot receive the same cached response without losing their personalization. So, HTML that contains personalization is generally considered uncacheable.
That said, in most scenarios, only parts of a page’s content are personalized, and large sections look the same for all users. A good example of this is product detail pages on e-commerce platforms, where prices, recommendations, or the cart icon may be personalized, but the product image, title, and description are the same for all users.
Speed Kit uses this fact to its advantage and loads personalized (non-cacheable) and common (cacheable) parts separately by changing how the browser loads the initial HTML. In e-commerce in particular, the common part usually contains the important information that needs to reach the user as fast as possible.
The loading process, shown below, works like this:
A user navigates to a page
Speed Kit issues two requests in parallel that race each other:
One request is sent to the Speed Kit service, loading an anonymous version of the page. The request does not include any cookies or session information, and the page is the same for all users and therefore cacheable. It is the one you would see if you opened the page in an Incognito or private tab.
The other request is sent to the origin backend. It contains all the usual cookies and returns the personalized page tailored to the exact user. This is the exact same request that the browser would send without Speed Kit.
Usually, the cached HTML response is much faster and wins the race. The browser receives the HTML and can immediately start loading all the dependencies and render the anonymous page.
Once the backend response from the origin server with the personalized HTML is received, it is merged with the already-rendered HTML and the personalized sections of the page become visible. To the user, the personalized parts are progressively rendered.
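The racing requests above can be sketched in a few lines of illustrative JavaScript (not Speed Kit's actual code); the fetch, render, and merge functions are stand-ins:

```javascript
// Illustrative sketch of the two parallel requests: the anonymous,
// cacheable fetch races the personalized origin fetch. Whichever HTML
// arrives first is rendered; the personalized response is merged in
// once it lands (usually the cached response wins the race).
async function loadPage(fetchCached, fetchOrigin, render, merge) {
  const cachedPromise = fetchCached(); // no cookies: same for all users
  const originPromise = fetchOrigin(); // full cookies: personalized
  // Render whichever response wins the race.
  const first = await Promise.race([cachedPromise, originPromise]);
  render(first);
  // When the personalized HTML arrives, merge it into the rendered page.
  const personalized = await originPromise;
  merge(personalized);
}
```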
Challenge 1.1: Delaying JS to avoid problems from merging personalized content
The following GIF shows how requesting and merging two HTML files plays together with delaying the JS execution.
Speed Kit delays JS execution to avoid breakage after merging anonymous and personalized HTML
The details on how JS is delayed are very important to ensure correct execution and good performance. This is how Speed Kit deals with this challenge:
The cached HTML needs a few preparations that are applied automatically:
The code responsible for merging the personalized HTML document is inserted at the end of the body.
Since the merging happens asynchronously, we place a blocking external script after that to pause JS execution until after the document merge.
Next, all external scripts are re-inserted after the blocking script, preserving their order. All inline scripts are removed (they tend to contain personalization and need to be executed from the personalized HTML).
An inline script is placed before each external script tag that executes all inline scripts from the personalized HTML.
Once the cached HTML is handled by the browser and the parser gets to the end of the body, it will detect the blocking script and request it. As long as that request is pending, JS execution cannot continue.
The Service Worker will receive that special JS request and withhold the response.
Once the merge of the personalized HTML into the rendered document has completed, the page sends a message to the Service Worker.
On receiving the message, the Service Worker responds to the pending script request with an empty script, causing the JS execution to continue normally.
The inline scripts placed between the external scripts now have access to the personalized HTML and can insert and execute its inline scripts in the correct order.
Delaying JS execution through this approach has some important advantages. Firstly, it prevents DOM events like DOMContentLoaded from firing before the actual JS is executed. Secondly, since the external scripts are in the cached document already, the browser’s preload scanner will discover them early, then download and compile them for faster execution. Thirdly, personalized or dynamic inline scripts, which are very common in e-commerce applications, are easily covered by this approach.
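The withholding mechanism can be modeled as a small gate (an illustrative sketch, not Speed Kit's implementation): the fetch-handler side answers the blocking script request only after the merge message arrives, and the answer is an empty script so the parser continues:

```javascript
// Minimal model of the blocking-script gate. In a real service worker,
// respondToBlockingScript would back a fetch event's respondWith(), and
// onMergeDone would run in the worker's message handler.
function createMergeGate() {
  let release;
  const merged = new Promise((resolve) => { release = resolve; });
  return {
    // Fetch-handler side: the response body (an empty script) is only
    // produced once the page signals that the merge has finished.
    respondToBlockingScript: () => merged.then(() => ''),
    // Message-handler side: called when the page posts "merge done".
    onMergeDone: () => release(),
  };
}
```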
Challenge 2: Automatic cache sync via Change Detection
Personalization is not the only challenge when caching dynamic resources like HTML or API responses; even the anonymous version of the main document can change at any point in time, making it necessary to update caches accordingly. With potentially millions of cache entries, this is a big challenge.
Speed Kit tackles this challenge with a concept called Change Detection. It builds on the mechanism used for personalization to detect changes to the HTML document.
Since every user loads both the anonymous cached version of the HTML and the personalized version from origin, we already have everything we need to validate the cache entry on the client side. If something in the cached version has changed, this is communicated to the Speed Kit service, which refreshes the resource and updates the caches only if the content has actually changed.
As an example: let’s say someone changes the product title for a product detail page in the CMS. The cached version of the page will contain the old product title and is no longer up-to-date. When a user navigates to the product page, they receive the stale cache entry with the old title. Once the personalized HTML with the new product title arrives in the browser, the versions are merged and the new product title appears. As both HTML versions are compared, a cache refresh is triggered because of the changed title.
The first user to detect the change will experience a flicker of the title when the up-to-date title appears upon merge. All other users and even the first user upon reload will not experience that flicker and receive an up-to-date cached version. This effectively crowdsources the challenge of discovering outdated cache entries among millions of cache entries without increasing the origin server load.
Additionally, we use the “cache hotness” information from our shared caches to sample the change detection on frequently visited pages to reduce overhead.
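As an illustrative sketch (names are assumptions, not Speed Kit's API), the client-side validation boils down to comparing the cached anonymous HTML with the anonymous content of the fresh origin response and reporting a mismatch so the caching service can refresh the entry:

```javascript
// Illustrative change detection: a real implementation would compare
// normalized DOM sections and ignore the personalized fragments; a
// plain string comparison stands in here.
function detectChange(cachedHtml, freshHtml, reportChange) {
  if (cachedHtml !== freshHtml) {
    reportChange(); // e.g. notify the Speed Kit service to refresh the URL
    return true;    // cache entry is stale and will be refreshed
  }
  return false;     // cache entry is still valid
}
```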
Deployments of the whole application are treated separately from page-level differences because they can result in structural changes that make it impossible to merge the cached version with the new original version. Speed Kit handles new deployments in a process we call Deployment Detection: If the site links new JS and CSS versions, Speed Kit infers that a deployment was rolled out, so it can purge the entire cache and rebuild it asynchronously.
Speed Kit also has the option to define periodic refreshes (mostly used for dynamic 3rd-party scripts) and a Purge API for deeper integration. Whenever content is updated and caches are purged, Speed Kit automatically pre-warms cache PoPs in regions with a lot of traffic to prevent cache misses.
Challenge 3: Browser caching without staleness
Now that we’ve covered how Speed Kit handles server-side caching, let’s talk about browser caching. The main open question is how dynamic content can be cached in the browser without an API to purge it.
Standard HTTP caching is not equipped to deal with resources that change irregularly. The standard procedure with HTTP caching is:
The client sends a request to the server.
The server responds with a resource and attaches a Cache-Control header, which contains a time-to-live (TTL).
On the way to the client, the resource is stored in caches. We distinguish between two kinds of caches:
Invalidation-based caches, which have APIs to remove (purge) content from them. This is your standard CDN or server-side cache that is shared by all users. It will generally hold on to your cache entry until the TTL expires, but you can send purge requests to evict cache entries immediately.
Expiration-based caches hold on to resources until their TTL expires without an API to purge them. This is your client-side cache, like browser cache or service worker cache storage.
On subsequent requests, these caches serve the stored content as long as the TTL permits.
This makes it hard to cache HTML files in the client because the content is stale when it changes before the TTL expires. While invalidation-based caches can be purged, expiration-based caches will continue to send stale content back to clients.
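The staleness problem follows directly from the freshness check an expiration-based cache performs (a tiny sketch with illustrative names): until the TTL runs out, the cached copy is served no matter whether the resource changed in the meantime.

```javascript
// Illustrative expiration-based cache check: storedAt is the time the
// response was cached, ttlSeconds comes from its Cache-Control max-age.
// There is no purge API; "fresh" only means "TTL not yet expired".
function isFresh(entry, nowMs) {
  return nowMs - entry.storedAt < entry.ttlSeconds * 1000;
}
```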
With Speed Kit, all resources are cached in the browser and set to high TTL values that are dynamically estimated. So what happens when cached resources are changed before their TTL expires?
When the Speed Kit service detects a resource change that is still cached by clients (e.g. an HTML file), it does two things:
It purges the content in the invalidation-based cache in the CDN and backend.
It adds the URL of the resource to a probabilistic set data structure called a counting Bloom filter for the remainder of the highest delivered TTL.
The counting Bloom filter is then flattened into a regular Bloom filter and transferred to every connecting client.
When the client loads a resource, the service worker first checks whether the URL is contained in the Bloom filter. If it is not, it can be safely taken from the browser cache. If it is, the worker forwards the request to the network, loads the resource from the CDN, and updates the local cache with it.
The great thing about Bloom filters is that they are very compact by allowing for false positives: sometimes the Bloom filter will match a URL that was not inserted. False negatives, on the other hand, cannot happen — a URL that was inserted will always be matched by the Bloom filter.
To give an example of compactness: A Bloom filter with a false positive rate of < 5% can store 20k different URLs in under 10 KB. So it fits within the initial TCP congestion window and can usually be transferred to the client in a single round trip.
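A minimal Bloom filter sketch illustrates the client-side check (Speed Kit's production filter is a flattened counting Bloom filter with tuned parameters; this simplified version is only for illustration):

```javascript
// Simplified Bloom filter: k salted hash functions set/check k bits.
class BloomFilter {
  constructor(bits = 1024, hashes = 4) {
    this.bits = bits;
    this.hashes = hashes;
    this.array = new Uint8Array(Math.ceil(bits / 8));
  }
  // FNV-1a-style hash, salted per hash function.
  hash(url, seed) {
    let h = 2166136261 ^ seed;
    for (let i = 0; i < url.length; i++) {
      h ^= url.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return (h >>> 0) % this.bits;
  }
  add(url) {
    for (let s = 0; s < this.hashes; s++) {
      const b = this.hash(url, s);
      this.array[b >> 3] |= 1 << (b & 7);
    }
  }
  // May return true for URLs never added (false positive), but never
  // false for an added URL (no false negatives).
  mightContain(url) {
    for (let s = 0; s < this.hashes; s++) {
      const b = this.hash(url, s);
      if (!(this.array[b >> 3] & (1 << (b & 7)))) return false;
    }
    return true;
  }
}
```

In the service worker's fetch handler, a `mightContain` hit would mean "bypass the browser cache and revalidate at the CDN"; a miss means the browser cache is safe to use.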
Of course, there are lots of intricate details and trade-offs involved — Bloom filter size, update cycles, TTL estimation, distributed server implementations to name a few — to make sure the performance is maximized.
The way Speed Kit works makes it a very powerful tool, but as with any technology, there are trade-offs that make it more suitable for some pages than for others.
The way personalized content is loaded and rendered with Speed Kit means that the biggest performance impact is on pages where the main content, including the Largest Contentful Paint (LCP) element, is not personalized and can be rendered from the cached HTML. If the main content is personalized, Speed Kit still improves performance, but the optimization is less impactful. On e-commerce pages, we therefore focus mostly on home, category, and product pages and exclude fully personalized pages like cart or checkout.
Client-side rendered sites are a nightmare when it comes to web performance; their Core Web Vitals rank among the worst. On those sites, Speed Kit can struggle to achieve a large performance impact: as long as the HTML is personalized, JS execution needs to be delayed, which also delays the rendering of the page. On the positive side, most e-commerce pages we deal with already use server-side rendering (SSR), and we are testing ways to render above-the-fold content on our cache servers for pages that cannot migrate easily.
The first load of a completely new user is not accelerated by Speed Kit because the service worker needs to be installed on that first page load. It is very persistent afterwards, so returning users and any navigation on the site are fully accelerated.
Change Detection works extremely well in production due to an important trade-off in its design: it takes one user loading a stale cache entry to detect the change and update the cache. That one user will see a flicker from the old content to the new content when the fresh document from the origin server is merged. For all other users and further reloads, the flicker is fixed by the change detection. In many scenarios, this is a worthwhile trade-off; content for which it is not acceptable needs to be hidden until after the merge.
Other optimizations powered by Speed Kit
Speed Kit’s use of a service worker coupled with our access to Real User Monitoring (RUM) data enable other powerful optimizations (we’ll do a separate blog post with more detail in the future):
Predictive Preloads: RUM data enables learning algorithms to predict where the user will navigate next, so the cached HTML can be preloaded before the navigation. Usually, the servers of shop systems cannot cope with the added load of such preloads (+40% to 600%), but cache servers do not mind the extra traffic.
3rd-Party Caching: 3rd-party resources can be cached by Speed Kit as well to optimize their caching, save the additional TLS connection, and reuse a connection that has already ramped up its bandwidth, avoiding TCP slow start for a new connection.
Image Optimization: By running on the user’s device, the service worker has access to the screen size and DPR of the user. It attaches this information to the image requests and uses an image service to transcode, resize and recompress images automatically.
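The image optimization can be sketched as a URL rewrite in the service worker (the query parameter names and the example host are assumptions for illustration, not the actual image service API):

```javascript
// Illustrative sketch: attach the device pixel ratio and viewport width
// to an image request so an image service can resize and recompress it.
// In a service worker, dpr and viewportWidth would be read from the
// client (e.g. devicePixelRatio) and forwarded with the fetch.
function optimizedImageUrl(url, { dpr, viewportWidth }) {
  const u = new URL(url, 'https://example.com'); // hypothetical origin
  u.searchParams.set('dpr', String(dpr));
  u.searchParams.set('w', String(viewportWidth));
  return u.toString();
}
```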
Evaluating performance with A/B tests and RUM
Of course, performance bottlenecks vary from site to site. That is why we constantly evaluate and test how Speed Kit can improve performance on new websites using A/B tests and Real User Monitoring (RUM) data.
After a few weeks of data collection, we can compare the performance between the two groups and are able to prove the significant improvements to our customers. The RUM tool also stays in place after the A/B test to monitor performance and work with our clients to achieve optimal performance in the long run.
We also use the same methodology of A/B tests with RUM tracking to develop and evaluate new product features and optimizations. Access to detailed RUM data from about 300 million users per month is what enables us to perfect Speed Kit and achieve this great performance.
The next graph shows an example of our A/B test reporting for the Largest Contentful Paint (LCP) enabled by RUM data and the impact that Speed Kit can have on websites.
Future work
Even though Speed Kit is already a proven technology that accelerates hundreds of millions of user experiences on a variety of large-scale websites (mostly e-commerce), we still have strong ties to research and are always working on exciting new projects.
Current projects include:
Predictive preloads that use RUM data and on-page signals to learn navigation patterns and preload pages that are likely to be visited (see this talk for details).
Signed Exchanges for e-commerce pages. This technology from Google lets the browser preload pages from Google search results for instant navigation. Usually, it is not usable for dynamic sites, but Speed Kit’s caching enables it even for e-commerce sites.
Offline mode is a commonly discussed PWA feature. Speed Kit can support both an offline mode for users, who can browse already-visited pages and get an offline page as a fallback, and a server offline mode (when the origin server is not reachable) that serves the cached version of the page with limited interaction (often called catalog mode).
Speed Kit Delivery Network (SDN) is a variation of Speed Kit where the service is used as a reverse proxy (like a CDN) to accelerate the first loads of new users as well.
Fit-to-layout image optimization aims to improve the current approach of optimizing images for the device screen size: it learns the actually rendered image size from RUM data so that images can be requested at the exact resolution needed.
Delta encoding of HTML and other resources to reduce the footprint of pages and optimize bandwidth usage.
Publications
Wolfram Wingerath, Benjamin Wollmer, Markus Bestehorn, Stephan Succo, Florian Bücklers, Jörn Domnik, Fabian Panse, Erik Witt, Anil Sener, Felix Gessert, and Norbert Ritter. Beaconnect: Continuous Web Performance A/B-Testing at Scale. Proceedings of the 48th International Conference on Very Large Data Bases, 2022. [ Bibtex | Slides | Video | Paper | Url ]
Felix Kiehn, Mareike Schmidt, Daniel Glake, Fabian Panse, Wolfram Wingerath, Benjamin Wollmer, Martin Poppinga, and Norbert Ritter. Polyglot Data Management: State of the Art & Open Challenges. Proceedings of the 48th International Conference on Very Large Data Bases, 2022. [ Website | Bibtex | Slides | Paper ]
Fabian Panse, Meike Klettke, Johannes Schildgen, and Wolfram Wingerath. Similarity-driven Schema Transformation for Test Data Generation. In Proceedings of the 25th International Conference on Extending Database Technology, EDBT 2022, Edinburgh, UK, March 29-April 1, 2022, 2022.[ Dblp | Paper ]
Benjamin Wollmer, Wolfram Wingerath, Sophie Ferrlein, Fabian Panse, Felix Gessert, and Norbert Ritter. The Case for Cross-Entity Encoding in Web Compression. In Web Engineering - 22nd International Conference, ICWE 2022, Bari, Italy, from July 5th until July 8th, 2022. Springer, 2022.[ Bibtex | Paper ]
Benjamin Wollmer, Wolfram Wingerath, Sophie Ferrlein, Felix Gessert, and Norbert Ritter. Compaz: Exploring the Potentials of Shared Dictionary Compression on the Web. In Web Engineering - 22nd International Conference, ICWE 2022, Bari, Italy, from July 5th until July 8th, 2022. Springer, 2022.[ Bibtex | Paper ]
Wolfram Wingerath, Benjamin Wollmer, Felix Gessert, Stephan Succo, and Norbert Ritter. Going for Speed: Full-Stack Performance Engineering in Modern Web-Based Applications. In Companion Proceedings of the 30th World Wide Web Conference, WWW 2021, Ljubljana, Slovenia, 2021. [ Website | Handout | Bibtex | Slides | Video | Paper | Url ]
Wolfram Wingerath and Michaela Gebauer. Handsfree Coding: Softwareentwicklung ohne Maus und Tastatur. iX, 9:70-73, August 2021. (PDF version: wingerath.cloud/2021/ix).[ Website | Bibtex | Video | Paper | Url ]
Fabian Panse, André Düjon, Wolfram Wingerath, and Benjamin Wollmer. Generating Realistic Test Datasets for Duplicate Detection at Scale Using Historical Voter Data. In Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021, Nicosia, Cyprus, March 23-26, 2021, 2021.[ Dblp | Paper ]
Wolfram Wingerath, Benjamin Wollmer, Markus Bestehorn, Daniel Zaeh, Florian Bücklers, Jörn Domnik, Anil Sener, Stephan Succo, and Virginia Amberg. How Baqend Built a Real-Time Web Analytics Platform Using Amazon Kinesis Data Analytics for Apache Flink. AWS Big Data Blog, February 2021.[Bibtex | Url ]
Wolfram Wingerath, Felix Gessert, and Norbert Ritter. InvaliDB: Scalable Push-Based Real-Time Queries on Top of Pull-Based Databases (Extended). Proceedings of the 46th International Conference on Very Large Data Bases, 2020.[ Dblp | Paper ]
Wolfram Wingerath, Felix Gessert, Erik Witt, Hannes Kuhlmann, Florian Bücklers, Benjamin Wollmer, and Norbert Ritter. Speed Kit: A Polyglot & GDPR-Compliant Approach For Caching Personalized Content. In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, Texas, April 20-24, 2020, 2020.[ Dblp | Paper ]
Wolfram Wingerath, Felix Gessert, and Norbert Ritter. InvaliDB: Scalable Push-Based Real-Time Queries on Top of Pull-Based Databases. In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, Texas, April 20-24, 2020, 2020.[ Dblp | Paper ]
Benjamin Wollmer, Wolfram Wingerath, and Norbert Ritter. Context-aware encoding & delivery in the web. In Web Engineering - 20th International Conference, ICWE 2020, Helsinki, Finland, June 9-12, 2020, Proceedings. Springer, 2020.[ Dblp | Paper ]
Wolfram Wingerath, Felix Gessert, and Norbert Ritter. Twoogle: Searching twitter with mongodb queries. In Datenbanksysteme für Business, Technologie und Web (BTW), 18. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), 4.-6.3.2019 in Hamburg, Germany. Proceedings, 2019.[ Dblp | Paper ]
Wolfram Wingerath, Felix Gessert, and Norbert Ritter. Nosql & real-time data management in research & practice. In Datenbanksysteme für Business, Technologie und Web (BTW 2019), 18. Fachtagung des GI-Fachbereichs ,,Datenbanken und Informationssysteme" (DBIS), 4.-8. März 2019, Rostock, Germany, Workshopband, pages 267-270, 2019.[ Dblp | Paper | Url ]
Wolfram Wingerath. Skalierbare und Push-basierte Echtzeitanfragen für Pull-basierte Datenbanken. In Steffen Hölldobler, editor, Ausgezeichnete Informatikdissertationen 2019, volume D-20 of LNI. GI, 2019.[ Dblp ]
Wolfram Wingerath, Felix Gessert, Erik Witt, Steffen Friedrich, and Norbert Ritter. Real-Time Data Management for Big Data. In Proceedings of the 21th International Conference on Extending Database Technology, EDBT 2018, Vienna, Austria, March 26-29, 2018. OpenProceedings.org, 2018.[ Dblp | Slides | Paper ]
Wolfram Wingerath, Felix Gessert, Steffen Friedrich, Erik Witt, and Norbert Ritter. The Case For Change Notifications in Pull-Based Databases. In Datenbanksysteme für Business, Technologie und Web (BTW 2017) - Workshopband, 2.-3. März 2017, Stuttgart, Germany, 2017.[ Dblp | Slides | Paper ]
Felix Gessert, Wolfram Wingerath, and Norbert Ritter. Scalable Data Management: An In-Depth Tutorial on NoSQL Data Stores. In Datenbanksysteme für Business, Technologie und Web (BTW 2017) - Workshopband, 2.-3. März 2017, Stuttgart, Germany, volume P-266 of LNI, pages 399-402. GI, 2017.[ Dblp | Slides | Paper ]
Felix Gessert, Michael Schaarschmidt, Wolfram Wingerath, Erik Witt, Eiko Yoneki, and Norbert Ritter. Quaestor: Query Web Caching for Database-as-a-Service Providers. Proceedings of the 43rd International Conference on Very Large Data Bases, 2017.[ Dblp | Video | Paper ]
Steffen Friedrich, Wolfram Wingerath, and Norbert Ritter. Coordinated Omission in NoSQL Database Benchmarking. In Datenbanksysteme für Business, Technologie und Web (BTW 2017) - Workshopband, 2.-3. März 2017, Stuttgart, Germany, 2017.[ Dblp | Slides | Paper ]
Thomas Seidl, Norbert Ritter, Harald Schöning, Kai-Uwe Sattler, Theo Härder, Steffen Friedrich, and Wolfram Wingerath, editors. Datenbanksysteme für Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), 4.-6.3.2015 in Hamburg, Germany. Proceedings, volume 241 of LNI. GI, 2015.[ Website | Dblp | Proceedings | Url ]
Norbert Ritter, Andreas Henrich, Wolfgang Lehner, Andreas Thor, Steffen Friedrich, and Wolfram Wingerath, editors. Datenbanksysteme für Business, Technologie und Web (BTW 2015) - Workshopband, 2.-3. März 2015, Hamburg, Germany, volume 242 of LNI. GI, 2015.[ Website | Dblp | Proceedings | Url ]
Wolfram Wingerath, Steffen Friedrich, Felix Gessert, and Norbert Ritter. Who Watches the Watchmen? On the Lack of Validation in NoSQL Benchmarking. In Datenbanksysteme für Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), 4.-6.3.2015 in Hamburg, Germany. Proceedings, pages 351-360, 2015.[ Dblp | Slides | Paper | Url ]
Felix Gessert, Michael Schaarschmidt, Wolfram Wingerath, Steffen Friedrich, and Norbert Ritter. The Cache Sketch: Revisiting Expiration-based Caching in the Age of Cloud Data Management. In Datenbanksysteme für Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), 4.-6.3.2015 in Hamburg, Germany. Proceedings, pages 53-72, 2015.[ Dblp | Paper | Url ]
Felix Gessert, Steffen Friedrich, Wolfram Wingerath, Michael Schaarschmidt, and Norbert Ritter. Towards a Scalable and Unified REST API for Cloud Data Stores. In 44. Jahrestagung der Gesellschaft für Informatik, Informatik 2014, Big Data - Komplexität meistern, 22.-26. September 2014 in Stuttgart, Deutschland, pages 723-734, 2014.[ Dblp | Paper | Url ]
Steffen Friedrich, Wolfram Wingerath, Felix Gessert, and Norbert Ritter. NoSQL OLTP Benchmarking: A Survey. In 44. Jahrestagung der Gesellschaft für Informatik, Informatik 2014, Big Data - Komplexität meistern, 22.-26. September 2014 in Stuttgart, Deutschland, pages 693-704, 2014.[ Dblp | Slides | Paper | Url ]
Fabian Panse, Wolfram Wingerath, Steffen Friedrich, and Norbert Ritter. Key-Based Blocking of Duplicates in Entity-Independent Probabilistic Data. In Proceedings of the 17th International Conference on Information Quality, IQ 2012, Paris, France, November 16-17, 2012., pages 278-296, 2012.[ Dblp | Paper ]
Steffen Friedrich and Wolfram Wingerath. Evaluation of Tuple Matching Methods on Generated Probabilistic Data. Master's thesis, University of Hamburg, 2012.[ Bibtex ]
Steffen Friedrich and Wolfram Wingerath. Search-Space Reduction Techniques for Duplicate Detection in Probabilistic Data. Master's thesis, University of Hamburg, 2010.[ Bibtex ]