Code reviewers: Enigel, james_
Testers: Ariana, Lady Oscar, mumble, Ridicully, sarken
With today's deploy we're making some changes to our search index code, which we hope will solve some ongoing problems with suddenly "missing" works or bookmarks and inaccurate work counts.
In order to improve consistency and reduce the load on our search engine, we'll be sending updates to it on a more controlled schedule. The trade-off is that it may take a couple of minutes for new works, chapters, and bookmarks to appear on listing pages (e.g. for a fandom tag or in a collection), but those pages will ultimately be more consistent and our systems should function more reliably.
You can read on for technical details!
We use a software package called Elasticsearch for most of our search and filtering needs. It's a powerful system for organizing and presenting all the information in our database and allows for all sorts of custom searches and tag combinations. To keep our search results up to date for everyone using the Archive, we need to ensure that freshly-posted works, new comments and kudos, edited bookmarks, new tags, etc. all make it into our search index practically in real time.
As the volume of updates has grown considerably over the last couple of years, however, that's increased the time it takes to process those updates and slowed down the general functioning of the underlying system. That slowness has interacted badly with the way we cache data in our current code: works and bookmarks seem to occasionally appear and disappear from site listings and the counts you see on different pages and sidebars may be significantly different from one another.
That's understandably alarming to anyone who encounters it, and fixing it has been our top priority.
The First Step
We are making some major changes to our various "re-indexing" processes, which take every relevant change that happens to works/bookmarks/tags and update our massive search index accordingly:
- Instead of going directly into Elasticsearch, all indexing tasks will now be added to a queue that can be processed in a more orderly fashion. (We were queueing some updates before, but not all of them.)
- The queued updates will then be sent to the search engine in batches to reduce the number of requests, which should help with performance.
- Cached pages get expired (i.e., updated to reflect new data) not when the database says so, but when Elasticsearch is ready.
- Updates concerning hit counts, kudos, comments, and bookmarks on a work (i.e. "stats" data) will be processed more efficiently but less frequently.
As a result, work updates will take a minute to affect search results and work listings, and background changes to tags (e.g. two tags being linked together) will take a few minutes longer to be reflected in listings. Stats data (hits, kudos, etc.) will be added to the search index only once an hour. The upside of this is that listings should be more consistent across the site!
(Please note that this affects only searching, sorting, and filtering! The kudos count in a work blurb, for example, is based on the database total, so you may notice slight inconsistencies between those numbers and the order you see when sorting by kudos.)
The Next Step
We're hoping that these changes will help to solve the immediate problems that we're facing, but we're also continuing to work on long-term plans and improvements. We're currently preparing to upgrade our Elasticsearch cluster from version 0.90 to 1.3 (which has better performance and backup tools), switch our code to a better client, and make some changes to the way we index data to continue to make the system more efficient.
One big improvement will be in the way we index bookmarks. When we set up our current system, we had a much smaller number of bookmarks relative to other content on the site. The old Elasticsearch client we were using also had some limitations on its functionality, so we ended up indexing the data for bookmarked works together with each of their individual bookmarks, which meant that updates to the work meant updates to dozens or hundreds of bookmark records. That's been a serious problem when changes are made to tags, in particular, where a small change can potentially kick off a large cascade of re-indexes. It's also made it more difficult to keep up with regular changes to works, which led to problems with bookmark sorting by date. We're reorganizing that, using Elasticsearch's parent-child index structure, and we hope that this will also have positive long-term effects on performance.
Overall, we're continuing to learn and look for better solutions as the Archive grows. We apologize for the bumpy ride lately, and we hope that the latest set of changes will make things run more smoothly. We should have more improvements for you in the coming months, and in the meantime, we thank you for your patience!