AO3 News

AO3 performance and growth: some details

Published: 2012-07-15 05:23:30 -0400

Everyone at the Archive of Our Own has been working hard dealing with the recent site expansion and performance problems. Now that we've been able to deal with the immediate issues, we wanted to give everyone a bit more detail on what's happening and what we're working on.

The basics

Our recent performance problems hit when a big increase in users happened, putting pressure on all the bits of the site not optimised for lots of users. We were able to make some emergency fixes which targeted the most problematic points and thus fixed the performance problems for now. However, we know we need to do quite a bit more work to make sure the site is scalable. The good news is there are lots of things we know we can work on, and we have resources to help us do it.

Some users have been concerned that the recent performance problems mean that the site is in serious trouble. However, we've got lots of plans in place to tackle the growth of the site, and we're also currently comfortable about our financial prospects (we'll be posting about this separately). As long as we are careful and don't rush to increase the number of users too fast, the site should remain stable.

The tl;dr details

What level of growth are we experiencing?

The easiest aspect of site growth for us to measure is the number of user accounts. This has definitely grown significantly: since May 1 almost 12,000 new user accounts have been created, which means a 25% increase in user numbers in the past two months. However, the number of new accounts created is only a small proportion of the overall increase in traffic.
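
As a quick check on those figures, here is the arithmetic written out. It uses the rounded numbers quoted above (12,000 new accounts, a 25% increase), so the totals it prints are estimates rather than official counts.

```python
# Rounded figures from this post; the results are estimates only.
new_accounts = 12_000
growth_rate = 0.25  # a 25% increase since May 1

accounts_before = new_accounts / growth_rate   # implied total on May 1
accounts_now = accounts_before + new_accounts  # implied total today

print(f"Before: ~{accounts_before:,.0f}, now: ~{accounts_now:,.0f}")
# Before: ~48,000, now: ~60,000
```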

We know that lots more people are using the site without an account. There are currently almost 30,000 people waiting for an invitation, but even that is a very, very partial picture of how many people are actually visiting the site. In fact, we now have approximately one and a half million unique visitors per month. That's a lot of users (even if we assume that some of those visitors represent the same users accessing the site from different locations)!

A bit about scalability

The recent problems we've been experiencing were related to the increase in the number of people accessing the site. This is a problem of scalability: the requirements of a site serving a small number of users can be quite different to those of a site with a large userbase. When more users are accessing a site, any weak points in the code will also become more of a problem: something which is just a little bit slow when you have 20,000 users may grind to a halt entirely by the time you hit 60,000.

The slightly counterintuitive thing about scalability is that the difference between a happy site and an overwhelmed one can be one user. Problems tend to arise when the site hits a particular break point - for example, a database table getting one more record than it can handle - and so performance problems can appear suddenly and dramatically.

When coding and designing a site, you try to ensure it is scalable: that is, you set up the hardware so that it's easy to add more capacity, you design the code so it will work for more users than you have right now, etc. However, this is always a balancing act: you want to ensure the site can grow, but you also need to ensure there's not too much redundancy and you're not paying for more things than you need. Some solutions simply don't make any sense when you have a smaller number of users, even if you think you'll need them one day in the future. In addition, there are lots of factors which can result in code which isn't very scalable: sometimes it makes sense to implement code which works now and revise it when you see how people are using the site, sometimes things progress in unexpected ways (and testing for scalability can be tricky), sometimes you simply don't know enough to detect problem areas in the code. All of these factors have been at work for the AO3 at one time or another (as for most other sites).

Emergency fixes for scalability

When lots and lots of new users arrived at the Archive at once, all the bits of the site which were not very scalable began to creak. This happened more suddenly than we were anticipating, largely because changes at the biggest multifandom archive, Fanfiction.net, meant that lots of users from there were coming over to us en masse. So, we had to make some emergency fixes to make the site more able to cope with lots more users.

In our case, we already knew we had one bit of code that was extremely UNscalable - the tag filters used to browse lists of works. These were fine and dandy when we had a very small number of works on the Archive, but they had a big flaw - they were built on demand from the list of works returned when a user accessed a particular page. This made them up-to-the-minute and detailed, but it became a big problem once the works returned for a given fandom numbered in the thousands. While we designed a new system, we worked around this by limiting the number of returned works to 1000. It was also a problem because building the filters on demand meant that our servers had to redo the work every time someone hit a page with filters on it. When thousands of people were hitting the site every minute, that put the servers under a lot of strain. Fortunately, the filters happen to be a bit of code that's relatively easy to disable without hitting anything else, so we were able to remove them as an emergency measure to deal with the performance problems. Because they were such a big part of the problem, doing this had a dramatic effect on the many 502s and slowdowns.
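
To give a rough idea of why on-demand filters are expensive, here is a minimal sketch in Python. It is not the Archive's actual code: the `WORKS` data and the function names are made up for illustration. The point is only that the tag counts are rebuilt from the full list of works on every page view.

```python
from collections import Counter

# Hypothetical stand-in for the works returned for one fandom.
WORKS = [
    {"id": 1, "tags": ["Gen", "Humor"]},
    {"id": 2, "tags": ["M/M", "Angst"]},
    {"id": 3, "tags": ["Gen", "Angst"]},
]

def build_filters(works):
    """Count how many of the given works carry each tag.

    The cost grows with the number of works, and in an on-demand
    design this runs again for every single page view.
    """
    counts = Counter()
    for work in works:
        counts.update(work["tags"])
    return counts

def works_page(works, limit=1000):
    """Cap the list (like the old 1000-work limit), then count tags."""
    capped = works[:limit]
    return {"works": capped, "filters": build_filters(capped)}

page = works_page(WORKS)
print(page["filters"]["Gen"], page["filters"]["Angst"])  # 2 2
```

With three works this is instant; with thousands of works and thousands of visitors a minute, the same counting is repeated far too often.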

We also did some other work to help the site cope with more users: largely this involved implementing a lot more caching and tuning our servers so they manage their workload slightly differently. All these changes were enough to deal with the short-term issues, but we need to do some more, and more sustained, work to ensure that the site can grow and meet the demands of its users.

Scalability work we're doing right now

We've got a bunch of plans for things which will help scalability and thus ensure good site performance. In the short term (approximate timescales included below) we are:

  • Installing more RAM - within the next week. This will allow us to run more server processes at once so we can serve more users at the same time. This is a priority right now because our servers are running out of memory: they're regularly going over 95% of usage, which is not ideal! We have purchased new RAM and it will be installed as soon as we can book a maintenance slot with our server hosts.
  • Changing our version of MySQL to Percona - within the next week. This will give us more information about what our server is doing, helping us identify problem spots in the site which we need to work on. It should also work a bit faster. We've currently installed Percona on our Test Archive and have been checking to see it doesn't cause any unexpected problems - we'll be putting it on the main site in the next week or so. Percona is an open source version of MySQL which has additional facilities which will help us look at our problems. In addition we hope to draw on the support of the company who produce it (also called Percona).
  • Completing the work on our new tag filters - within the next month. These will (we hope!) be much, much more scalable than the old ones. They'll use a system called Elasticsearch, which is built on Lucene (the same search library that Solr uses). It keeps its own search index instead of querying the MySQL database, so it cuts down on a lot of database calls.

Scalability stuff we're doing going forward

We want to continue working on scalability going forward. We've reached a point where the site is only going to get bigger, so we need to be ready to accommodate that. This involves some complex work, so there are a bunch of conversations ongoing. However, this will involve some of the following:

  • Analysis of our systems and code to identify problem spots. We've installed a system called New Relic which can be used to analyse what's going on in the site, how scalable it is, and where problems are occurring. Percona also provides more tools to help us analyse the site. In addition, Mark from Dreamwidth has kindly offered to work with us to take a look at our Systems setup - Mark runs the Systems side of things at Dreamwidth and has lots of experience in scalability issues, so having his fresh eyes on the performance side will help us figure out the work we need to do.
  • Caching, caching and more caching. We've been working on implementing more caching for some time, and we added a lot more caching as part of our emergency fixes. However, there is still a LOT more caching we can do. Caching essentially saves a copy of a page and delivers it up to the next person who wants to see the page, instead of creating it fresh each time. Obviously, this is really helpful if you have a lot of page views: we now have over 16 million page views per week, so caching is essential. We'll be looking to implement three types:
    • Whole page caching. This is the type we implemented as an emergency fix during the recent performance issues. It uses something called Squid, and it's the best performance saver because it can just grab the whole page with no extra processing. Unfortunately, this can also cause some problems, since we have a lot of personalised pages on the site - for example, when we first implemented it, some people were getting cached pages with skins on they hadn't chosen to use. There are ways around this, however, which allow you to serve a cached page and then personalise it, so we'll be working on implementing those.
    • Partial page caching. This is something we already do a lot of - if there are bits that repeat a lot, you can cache them so that everything isn't generated fresh each time. For example, the 'work blurbs' (the information about individual works in a list of search results) are all cached. This uses a system called memcached. We'll be looking to do more, and better, partial caching.
    • Database caching. This would mean we use a secondary server to do complex queries and then put the results on the primary server, so all the primary server is doing is grabbing them.
  • Adding more servers. We’re definitely going to need more database servers to manage site growth, and we’re currently finalising some decisions on that. At the moment, it looks like the way we’re going to go is to add a new machine which would be dedicated to read requests (which is most of our traffic – people looking at works rather than posting them) while one of our older machines will be dedicated to write requests (posting, commenting, etc). Once we've confirmed the finer details (hopefully this week), we expect it to take about two months for the new server to be purchased and installed.
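
The read/write split described in the last point can be sketched as a small router. This is only an illustration of the idea, not our actual setup: the class name and the `"write-db"` / `"read-db"` labels are placeholders, and a real router would hold database connections rather than strings.

```python
class QueryRouter:
    """Send read-only statements to one server and everything else to another.

    `primary` and `replica` are placeholder labels here; in a real
    setup they would be database connections.
    """

    READ_PREFIXES = ("select", "show")

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica

    def target_for(self, sql):
        # Reads (viewing works) go to the read server;
        # writes (posting, commenting) go to the write server.
        statement = sql.lstrip().lower()
        if statement.startswith(self.READ_PREFIXES):
            return self.replica
        return self.primary

router = QueryRouter(primary="write-db", replica="read-db")
print(router.target_for("SELECT title FROM works WHERE id = 1"))       # read-db
print(router.target_for("INSERT INTO comments (body) VALUES ('hi')"))  # write-db
```

Since most traffic is people reading works, most queries end up on the read server, which is why a dedicated machine for reads takes so much pressure off.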

Resources: finances

We'll be posting separately about the financial setup for the AO3, but the key thing to say is that we're currently in a healthy financial state. :D However, as the site gets bigger its financial needs will also get bigger, and we always welcome donations - if you want to donate and you can afford to do so, then donating to the OTW will help us stay on good financial footing. We really appreciate the immense generosity of the fannish community for the support you've already shown us. <3

Resources: people

A lot of supporting the site and dealing with scalability is down to the people. As we grow, we need to ensure we have the people and expertise to keep things running. We are a volunteer-run site and as such our staff have varying levels of time, expertise, and so on. One important part of expanding slowly is ensuring that we don't get into crisis situations which not only suck for our users (like when the 502s were making the site inaccessible) but also cause massive stress for the people working to fix the problems. So, we're proceeding cautiously to try to avoid those situations.

We've been working hard over the last year or so to make it easier for people to get involved with coding and working on the site. We're happy to say this is definitely paying off: we've had eight new coders come on board during the last few months who have already started contributing code. Our code is public on github, and we welcome 'drive by' code contributions: one thing we'd like to do is make that a bit easier by providing more extensive setup instructions so people who want to try running the code on their own machines can do so.

If you'd like to get more involved in our coding teams, then you can volunteer via our technical recruitment form. Please note that at the moment, we're only taking on fairly experienced people - normally we very much welcome absolute beginners as well, but we're taking a brief break while our established team get some of the performance problems under control so that we don't wind up taking on more people than we can support. We love helping people to acquire brand-new skills, but we want to be sure we can mentor and train them when they join us.

Lots of people have asked whether we'd consider having paid employees. It's unlikely that we'll have permanent employees in the foreseeable future, for a number of reasons (taxes, insurance, etc), but we are considering areas where we would benefit from paid expertise for particular tasks. Ideally, this would enable us to offer more training to our volunteers while targeting particularly sticky sections of code. Paying for help has a lot of implications (most obviously, it would add to our financial burden) and we want to think carefully about what makes sense for us. However, the OTW Board are discussing those options.

We're incredibly grateful to the hard-working volunteers who give their time and energy to all aspects of running the AO3. They are our most precious resource and we would like to take the opportunity to say thanks to all our volunteers, past, present and future. <3

Farewell Sidra

Published: 2012-07-05 07:27:07 -0400

The OTW Board announces with regret the resignation of Sidra, the co-chair and technical lead of our Systems committee, who has been one of our senior technical staff members from our early days. She has supported virtually all our projects with her vast technical skills and immense generosity with her time, and has been one of the foremost contributors to the Archive of Our Own.

Sidra is staying on as a consultant with Systems, so she is not vanishing completely, but we're hugely grateful to her for all her work and want to thank her publicly for her years of incredible service to the org, and we hope all our members and the users of our varied projects will join us in those thanks!

Mirrored from an original post on the OTW blog. Find related news by viewing our tag cloud.

Release Notes for Release 0.8.20 and 0.8.21

Published: 2012-07-04 15:58:31 -0400

These release notes bring together details for two small updates, one we deployed shortly after Release 0.8.19 to address a couple of issues which arose, and the one we deployed today.

Some of you may have noticed a temporary site glitch on June 30/July 1. This was caused by a leap second in the UTC time standard which caused problems for a number of sites across the web. Our servers got all confuzzled and had to be rebooted by our ever-alert Systems team. Unfortunately, a small number of kudos and subscriptions emails were lost during this time: many apologies for this!

Today's deploy comprises a handful of small tweaks to the Archive, which make up the last of our performance-related "emergency" updates. We will now focus on the upcoming 0.9.0 release, which we're hoping to have ready to go later this month. This release will come with a major rewrite of the browse & search navigation (replacing the currently disabled filtering system) and a host of bug fixes. Expect significant improvements to rich text editing, collections and challenges, and the help pop-ups all around the site, among many other things.

Over the past few months, we've had a total of eight new coder volunteers coming in, many of whom dived right into our code and have already submitted bug fixes or are currently working on feature enhancements. In addition, the Testers group welcomed several new volunteers. We are grateful to anyone donating their time, skills and passion to the Archive, be it as coders, testers, tag wranglers or Support staff. Many thanks also to the members of the Volunteers & Recruiting committee, who have been tireless in getting new people sorted and settled in. \o/

Mini-release: 0.8.20 (deployed 17 June 2012)

  • We made some tweaks to the caching system introduced earlier (more details in our Update on AO3 performance issues), because it was generating error messages for some tag pages.

Current release: 0.8.21

  • We added a help pop-up to the Work Search form, including some tips for searching by tag, which lets you combine ratings, warnings, fandoms, characters etc. into a host of search options.
  • Since it was taking a very long time to generate the search index, making it impossible to search for new works, we've disabled searching by the number of kudos (which were the major bottleneck) until the new system is up and running.
  • Several users contacted Support in alarm when it looked like they were suddenly logged into someone else's account. At no point was security actually compromised: this was merely the result of faulty cookie handling in some browsers confusing the caching, which then served up "logged in" pages to guests. Since guests cannot access areas restricted to account holders to begin with, this only affected public pages. We believe the problem is now fixed.
  • Assigning numbered IDs to each invite request created a potential security issue which we fixed in Release 0.8.19. Due to this change, the previous, bookmarkable page to check one's position in the queue wasn't working anymore. We fixed the message and page text accordingly.
  • When checking up on an invite request on the requests page, the search button would just hang without a message if your email wasn't found in the queue. This has been fixed.
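
The "logged in" page mix-up mentioned in the list above is a common hazard of whole-page caching: a page rendered for one visitor must never be stored where another visitor can receive it. Below is a minimal sketch of one way to avoid that, assuming a hypothetical in-memory `page_cache` and a stand-in `render` function. It is not how our Squid setup is actually configured.

```python
page_cache = {}  # hypothetical cache: URL -> rendered guest page

def render(url, user):
    """Stand-in for real page rendering."""
    greeting = f"Hi, {user}!" if user else "Log in"
    return f"[{url}] {greeting}"

def get_page(url, user=None):
    # Logged-in pages contain personal details, so they bypass the
    # cache entirely and are never stored in it.
    if user is not None:
        return render(url, user)
    # Guest pages are identical for everyone, so they are safe to reuse.
    if url not in page_cache:
        page_cache[url] = render(url, None)
    return page_cache[url]

print(get_page("/works", user="alice"))  # [/works] Hi, alice!
print(get_page("/works"))                # [/works] Log in
```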

Known issues

See our Known Issues page. For the latest site status information, check our Twitter AO3_Status.

From FF.net to AO3 - some frequently asked questions

Published: 2012-06-21 10:13:09 -0400

The last month or so has seen lots of Fanfiction.net users joining the Archive of Our Own - you are all very welcome! As such, we've had a number of Support questions about the ways in which the AO3 differs from FF.net, so we've put together a quick primer to let you know about a few key details. (Although they're focused around the questions we've received from FF.net users, this post will also be relevant to any new user of the site.)

Getting an account

How do I create an account?

You need an invitation to create an account. This is to help us manage site growth (as you may have noticed, the recent expansion has caused a few performance issues). You can request an invitation by adding your name to the invitations queue.

The invitations queue is really long! Can I get an invitation any quicker?

Because so many people have recently moved to the AO3, the invitations queue is very long and wait times currently extend into next year. At the moment, we're still working on dealing with some performance issues, so we're not able to issue invitations any faster. As soon as we're confident we have those under control we'll review the number of invitations we're issuing and take measures to try to reduce the queue. We're really sorry we can't issue invitations to everyone who wants one right away, but we need to ensure the site can cope with the demand first. In the meantime, if you are concerned that your work might be deleted without warning, we recommend saving a copy to your hard drive so that you have a backup you can repost here or elsewhere. (And don't forget to save your reviews, which can't be reproduced elsewhere!)

I have an account - can I get an invitation to give to a friend?

We normally allow existing users to request invitation codes to give to friends, but because of the very high demand we've had to stop accepting these requests for now. We'll reenable this option when we have the performance issues under control - we're sorry to have to disappoint you right now.

Posting

Can I import my stories directly from FF.net?

Unfortunately, no. Fanfiction.net is blocking requests from our server. You can read more about this issue in our news post, Problems with imports from FF.net.

Can I import stories from other sites?

Yes, although there may be issues, depending on where you're importing from. To import a story, choose "post new" from the header menu and then click on the "Import From An Existing URL Instead?" button above the text entry fields. You can also go directly to the Import New Work page. For more information, see the Importing and Mass Editing FAQ and our Known Issues relating to imports.

Can I upload my story from a file?

This is not currently possible. You can copy and paste your text from a Word file or the FF.net document editor, but you can't upload or manage files.

How do I keep my formatting when pasting into the work text box?

By default, the text entry box comes up in the mode for directly entering HTML code. To paste formatted text in, click the "Rich Text" button above the box to switch to the Rich Text editor, then paste your text.

Since the Rich Text option relies on a third-party tool, which comes with its own set of bugs, there are currently several issues when pasting in formatted text. In particular, it does not work properly with Internet Explorer 9. Some of these issues should be fixed in an update fairly soon, but in the meantime you can read about them in our Known Issues. If you do have trouble formatting your work, please contact Support and we'll be happy to help you try to work around the difficulties.

How does tagging work?

There are 7 categories of tags you can enter: Rating, Archive Warnings, Category, Fandom, Character, Relationship, Additional Tags.

The Rating, Archive Warnings, and Category tags for your work are set by the choices you tick in the form while posting. You can read more about our policies on ratings and warnings in our Terms of Service FAQ. The important thing to note is that we usually consider ratings to be in the eye of the beholder, so you should use the rating that seems right to you. "Category" is the Archive term for describing a work based on its main relationship (or lack thereof), i.e. M/M, F/M, Gen and so on.

The Fandom, Character, Relationship, and Additional Tags categories are created by typing into the appropriate boxes on the posting form. Tags should be separated by commas. The autocomplete will show suggestions for existing tags, but you can also create completely new tags. The AO3's tag wranglers will link new tags to the "canonical tag" with the same meaning where possible (for example, all versions of a given character's name will be linked). To make it easier for readers to find your work, it's best to choose clear, non-ambiguous tags. You can read more about how tagging works on the Archive in the Tags FAQ.
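
For the technically curious, the two steps above (splitting the field on commas, then linking variants to a canonical tag) can be sketched like this. The lookup table is entirely hypothetical: on the Archive the linking is done by volunteer tag wranglers, not by a fixed list.

```python
# Hypothetical synonym table, for illustration only.
CANONICAL_TAGS = {
    "sherlock holmes": "Sherlock Holmes",
    "holmes": "Sherlock Holmes",
    "sherlock": "Sherlock Holmes",
}

def split_tags(field):
    """Split a comma-separated tag field, dropping blanks and extra spaces."""
    return [tag.strip() for tag in field.split(",") if tag.strip()]

def canonical_for(tag):
    """Return the canonical tag if one is linked, else the tag as typed."""
    return CANONICAL_TAGS.get(tag.lower(), tag)

tags = split_tags("Holmes, John Watson , sherlock")
print([canonical_for(tag) for tag in tags])
# ['Sherlock Holmes', 'John Watson', 'Sherlock Holmes']
```

Note that a brand-new tag such as "John Watson" passes through unchanged, which matches how new tags can be created freely.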

How do I post a crossover?

Whereas on FF.net there's a special section for crossovers, a work on the AO3 can be marked as a crossover by adding all applicable fandoms as tags and, optionally, using keywords such as "Crossover" or "Fusion" under Additional Tags.

If your work is based on more than one fandom, all you need to do is to enter all the fandom names in the "Fandom" field when you post, separated by commas. So, instead of having to choose a separate "crossover" category, you would simply enter Bleach, Homestuck, Hawaii Five-O (1968) into the Fandom field. Your work will then be listed under all three fandoms. If you include the "Crossover" tag in the Additional Tags, users will be able to more easily find (or avoid) your work in the appropriate tag searches.

Can I post explicit works? Can I post explicit works featuring underage characters? Can I post Real Person Fiction?

Yes, yes, and yes. If your work is fannish in nature and abides by our Terms of Service, you can post it here. We do ask that you label your work appropriately, which can include using the "Choose Not To Use Archive Warnings" and "Not Rated" options if you prefer not to warn or pick a rating. You may also make use of Additional Tags to add content notes that aren't covered by the Archive Warnings.

Does AO3 have "communities"?

The closest thing to FF.net's "communities" on AO3 are Collections. The main difference is that the collection moderator cannot post stories to the collection personally; that has to be done by the authors of the stories. The moderator can add bookmarks to a collection, which will point to the works instead of gathering them up directly.

You can read more about this feature in the Collections and Challenges FAQ (although this section is currently in need of updating). Collections can also be used to set up gift exchanges and prompt memes.

Search

The Archive offers a search form to find exactly the works you're looking for. However, documentation of all its features is in need of an update. In the meantime, you can find lots of tips & tricks, including several example searches, in this post: Disabling filters: information and search tips.

One particularly useful thing to keep in mind is that because Ratings, Warnings, and Categories are all tags, you can search for (or exclude) them. So, for example, if you're looking for explicit slash but don't care for violence, you could enter "Explicit" "M/M" -"Graphic Violence" in the Tag search box along with whatever other terms (fandoms, pairings, etc.) you're looking for.
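
If it helps to see the logic spelled out, here is a small sketch of how a query like that one separates into tags to require and tags to exclude. It is an illustration only, not the Archive's search engine, and it only understands tags in double quotes with an optional leading minus sign.

```python
import re

# Matches "Some Tag" or -"Some Tag"; the minus sign marks an exclusion.
TERM = re.compile(r'(-?)"([^"]+)"')

def parse_tag_query(query):
    """Split a query into (tags to include, tags to exclude)."""
    include, exclude = [], []
    for minus, tag in TERM.findall(query):
        (exclude if minus else include).append(tag)
    return include, exclude

def matches(work_tags, include, exclude):
    """True if the work has every included tag and none of the excluded ones."""
    tags = set(work_tags)
    return all(t in tags for t in include) and not any(t in tags for t in exclude)

include, exclude = parse_tag_query('"Explicit" "M/M" -"Graphic Violence"')
print(include, exclude)
# ['Explicit', 'M/M'] ['Graphic Violence']
```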

At the time of posting, browsing filters are turned off for performance reasons (you'll see the grey box where they would usually be on work pages). They'll be replaced in a few weeks with all-new filters which will give you more ways to find things on the site.

Reading

Can I make the text bigger or smaller?

Yes! Unlike FF.net, we don't have these options on the individual work pages. If you're a logged-in user, then you can change the text size (and most other display features) with a site skin: see the Skins FAQ for more information. If you're logged out, or if you just want a quick and easy way to change the text size, then most modern browsers will let you do this by hitting Control + or Control -, or Cmd +/- on a Mac. (This will work on any site!)

Can I make the text light on dark, change the margins, or choose a sans-serif font?

Yes, if you're a logged-in user. These options aren't on individual work pages like on FF.net, but you can change most aspects of the way the site looks with a site skin: see the Skins FAQ for more information. For performance reasons, we've had to disable skins for logged-out users. We're working on ways of bringing them back. In the meantime, if you need a modified display for accessibility reasons and you don't have an account, please contact Support and they'll help you out.

Communication and Feedback

How do I leave a review? Do I need an account?

To leave a review, just type your feedback into the comment field at the bottom of the work you enjoyed (surrounded by a grey box). You don't need to have an account - if you're not a logged-in user, you'll need to leave a name and email address (your email won't be displayed, but the name you give will).

Can I send a private message to another user?

Currently, no. This is a very frequently requested feature that has been approved for future implementation, but at the moment the only option for private communication is email. Some users opt to display an email address publicly; you can check for one by going to the user's home page and choosing "Profile" from their dashboard. Some also link to their journal accounts (such as Livejournal or Dreamwidth) or blogs in their Bio section.

Can I block a user from leaving me feedback?

No. You can, however, delete user comments on your works, including comments left by users who are logged in. If you feel a user's comments constitute harassment (see our TOS) please submit an abuse report. Note that you should not delete the comments in this case, as once deleted they cannot be recovered for review.

Can I turn off comments/kudos from users who are anonymous/not logged in?

No. For several reasons, including avoiding the exclusion of users who have not yet been able to get an invite, this is not a preference we offer to authors. Comments can be deleted, and spam comments should be marked using the "Spam" button to notify our automated spam-tracking software.

Does the Archive have author/story alerts?

Yes, called "subscriptions"--when an update is posted, subscribed users are sent a notification. You can currently subscribe to authors, stories, and series. To manage your subscriptions, click on the "Subscriptions" link in your Dashboard.

Can I see who is subscribed to me/my works?

No. We know that many users want to know how many people are following their works, so we made the numbers available in our Stats feature. (You can read more about it in our admin post about the Statistics Page.) However, to protect user privacy, you cannot see specific information about subscribers.

I want to favorite an author/work/series. How do I do that?

You can bookmark works and series. In this case, unless a bookmark is set to private, an author can see who has bookmarked their works and what notes they've added. An option to bookmark authors is planned, but hasn't been implemented yet.

More Questions and Troubleshooting

What should I read to learn more about how to use the site?

After reviewing our Terms of Service, we recommend that new users should look through our FAQ pages. We also regularly update our news posts with Tutorials, so check there often!

Where can I read updates on changes to the site or known problems?

All updates and changes will be publicised in our News Posts, many of which are mirrored in several locations, including the OTW's Dreamwidth, Livejournal, and Tumblr pages. All code updates (deploys) come with a set of detailed Release Notes, listing all bug fixes and describing major changes.

Is there a way to know if the site is down/having issues?

You can always check our Twitter feed at @AO3_Status.

Where do I go for help?

Whether it's a question, a bug report, or a feature request, you can submit it all through our Support form. We promise to take your question, suggestion, or problem seriously. You can submit anonymous feedback if you desire, but if you leave an email address we will get back to you with an answer as soon as we are able! Your IP address will be registered for spam protection purposes, but that information is never available to our support staff.

Release Notes 0.8.19

Published: 2012-06-13 07:46:29 -0400

Yet another update from your tireless archive volunteers! James from our Systems Committee has been making adjustments behind the scenes to stabilize the servers and get the most out of our caching, and we've seen some good improvements there. At the same time, we've been working on improving or scaling back the areas of our code where changes will give us the biggest gains.

Filtering

In order to improve performance further, tag filtering on work listing pages is disabled for the time being, until we roll out our new system. You can read more about this change in our post on disabling filters. We know this is an inconvenience for many users, but the filters are really the 800-pound gorilla sitting on top of our database - the works pages are both the most popular and the slowest on the site, which is a bad combination. We've had plans to fix them for a while, and that's underway. However, we need a few more weeks to finish and deploy the upgrade, since it also affects our search engine and quite a lot of our code. Our top priority is to make sure works remain accessible to users, and that new works and feedback can be posted and accessed. Looking carefully at our code and our stats, we concluded that removing filtering was the best way to ensure these goals in the short-term.

You'll still be able to view all the works for a particular tag, view the works for a user or collection in a particular fandom, and use our search feature to refine your results. Our post on disabling filters includes some handy tips to help you find what you're looking for. We hope to have full functionality restored to you soon! As a bonus side effect of this change, we've been able to remove the 1000 work limit on lists of works. This is because without the filters we can rely on the pagination system to limit the amount that we retrieve from the database at one time. So, while you can't filter your results any more, you CAN go through and read every work posted in your fandom! We hope this will compensate a little for the inconvenience.
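For the technically curious, here is a minimal sketch of why pagination makes the cap unnecessary: each page request only asks for one small slice of the list, so the total length of the list stops mattering. This is an illustration in Python rather than the Archive's actual code; the function name `fetch_page` and the page size of 20 are assumptions made for the example.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def fetch_page(works: Sequence[T], page: int, per_page: int = 20) -> List[T]:
    """Return only the slice of works needed for one page (pages start at 1)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    offset = (page - 1) * per_page
    # Only per_page items are ever touched, however long the full list is.
    return list(works[offset:offset + per_page])


# Stand-in for a fandom with 2,500 works (well past the old 1000 cap).
all_works = [f"work-{n}" for n in range(1, 2501)]
page_3 = fetch_page(all_works, page=3)
```

In a real database the same idea is usually expressed as a limit and an offset on the query, so the cost of one page does not grow with the size of the fandom.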

Work Stats Caching

We've also done more caching of work stats (all the counts of comments, bookmarks, hits, etc.), so you may notice that these update more slowly on index pages now. The information is all still being recorded; we're just waiting a little longer to go get the counts for each work to spread out the load.

People Listings

The alphabetical people listings on the People page weren't actually that useful for finding users, and they were another performance drain.

We've replaced the full alphabetical listing with a listing of 10 random users, and added emphasis on the search. Note you can use wildcards in the search, so if you're not sure of someone's name you can enter part of it followed by an asterisk to get similar names. For example, entering Steve* would get Steve_Rogers_lover, SteveMcGarrettsGirl, stevecarrellrocks, etc.
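As a rough illustration of how a trailing asterisk behaves, the sketch below uses shell-style pattern matching from Python's standard library. It is not the Archive's search code; the helper name `search_names` and the case-insensitive comparison are assumptions chosen to reproduce the example above.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def search_names(pattern: str, names: Iterable[str]) -> List[str]:
    """Return the names matching a wildcard pattern, ignoring case."""
    lowered = pattern.lower()
    # fnmatchcase treats '*' as "any run of characters".
    return [name for name in names if fnmatchcase(name.lower(), lowered)]


names = ["Steve_Rogers_lover", "SteveMcGarrettsGirl", "stevecarrellrocks", "TonyStark"]
matches = search_names("Steve*", names)
```

Here `matches` holds the three names beginning with "steve" in any capitalisation, and "TonyStark" is left out.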

Invitation Requests

We've suspended user requests for additional invitations for now as well. If you need invitations urgently for a challenge or for an archive rescue project, please contact Support. We also fixed an issue that potentially allowed users to snoop for other emails in the waiting queue.

Thank you!

Thanks to everyone who has been working hard on these issues, especially James, who has put in lots of hours tweaking the servers, and Elz, who has been doing the heavy lifting on code changes. Thanks also to all of you for your patience and understanding while we work!

And finally...

The great news is that so far, this emergency measure does seem to be having a noticeable effect. Our server load has diminished dramatically since we deployed this change:

Graph showing server load, with a mark showing the time of the deploy. The load drops dramatically from this time onwards.

Disabling filters: information and search tips

Published: 2012-06-12 19:58:16 -0400

Key information: As an emergency measure to deal with recent performance issues, we have disabled browsing filters on the site (the grey box of choices which appears on work index pages). This is a temporary measure to ensure that as many people as possible can access the site. You can still use our tags and advanced search feature to find the works you want. As an additional bonus, removing the filters has allowed us to remove the 1000 works cap on lists of works, so you can browse through all the works in your fandom! Read on for more information!

What's happening

As detailed in our recent post on performance, our coders and sys-admins are continuing to work on the performance issues we've been experiencing. We've made some server adjustments which have alleviated some of the worst problems, but we still need to make some substantial changes to fix the issues. We're aware that lots of users are still unable to access the site; as an emergency measure, we've decided to disable tag filters, which put a very heavy load on our servers. This means that the grey box with tags you can check to filter a list of works will no longer appear on the work index pages. We know this will be an inconvenience for many users, but the filters are really the 800-pound gorilla sitting on top of our database. Removing them for now will mean that people can access the site, even if they can't browse quite as easily as usual.

We've been working on significantly redesigning the part of our code that handles filtering for a while - because it's a major performance burden on some of the most popular pages of the site, refactoring this code to make it more efficient has been a priority for some time now. We're almost done with the rewritten version, but it needs more work and extended testing before we roll it out. (We want to be sure it doesn't introduce new bugs.) So, the filters will go away for a few weeks, and will then be replaced by the new, rewritten version.

One major disadvantage of the way the filters were designed was that they needed to retrieve the tags from the list of works found in order to build the filter options. This meant that we had to limit the number of works returned at one time to 1000, because otherwise building the filters would take too long. A side bonus of removing the filters is that we've been able to remove the 1000 works cap! The browsing redesign in progress aims to work around this issue, so we hope to avoid re-introducing this limitation when filtering returns.
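To make the cost concrete, here is a simplified sketch of the general approach described above: the filter options are built by walking every work in the result list and tallying its tags, so the work done grows with the number of works found. This is an illustration only, not the Archive's code; `build_filter_options`, the dictionary layout of a work, and the sample data are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def build_filter_options(works: Sequence[Dict], cap: int = 1000) -> List[Tuple[str, int]]:
    """Tally the tags on at most `cap` works to produce filter choices."""
    counts: Counter = Counter()
    # Every work in the (capped) list has to be visited to collect its tags.
    for work in works[:cap]:
        counts.update(work["tags"])
    return counts.most_common()


sample_works = [
    {"title": "A", "tags": ["Fluff", "Friendship"]},
    {"title": "B", "tags": ["Fluff", "Angst"]},
    {"title": "C", "tags": ["Fluff"]},
]
options = build_filter_options(sample_works)
```

With three works this is instant, but the loop has to touch every work returned, which is why a cap was needed while the filters were built this way.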

How can I find the works I want?

Although the removal of the filters will make it harder to browse the works listings for specific things, there are still lots of ways to find the works you need.

Fandoms page

If you're looking for a specific fandom, you can browse the Fandoms page. Fandoms are organised by media type; the easiest way to find a particular fandom is to use Ctrl + F (or Command + F on a Mac) to search the page in your browser. The fandom pages will give you a list of all the works in your fandom; unfortunately there will be no way to filter that list down further.

Tags

Clicking on any tag will still bring up works with that tag, or with any tag marked as a synonym. So, if you click on Riza Hawkeye you'll get all the works tagged with 'Riza Hawkeye', 'Riza', 'Riza is awesome', etc. Again, while the filters are disabled there'll be no way to filter this list further.

Advanced Search

If you want more refined control over which works you find, you will need to use our Work Search. This feature could use a little bit of prettifying, but the underlying search is quite powerful. Use the following tips to help you find exactly the works you want:

  • A space equals AND. So, entering Fluff Friendship would find you works tagged with both 'fluff' and 'friendship'
  • | equals OR. So, entering Homestuck | My Little Pony will find you works tagged with 'Homestuck' AND/OR 'My Little Pony'
  • - equals NOT. So, entering Supernatural - Castiel/Dean Winchester will find works tagged Supernatural, but will exclude those tagged Castiel/Dean Winchester.
  • Fandom, Character, Relationship, Rating, Category, and Warning are all classed as tags (as well as the 'Additional tags'). So, you can search for works which are Explicit, or exclude works tagged 'Major Character Death'.
  • Using quotes around a phrase will search for that exact phrase. So, "Harry Potter" will get works tagged with 'Harry Potter', whereas Harry Potter will get works tagged with 'Harry' and works tagged with 'Potter'.
  • Entering a term in the tag field will only find works with exactly that tag - so searching for Charles/Erik will bring up only the few works tagged with exactly that tag, not the ones tagged 'Erik Lehnsherr/Charles Xavier' (whereas if you click on the 'Charles/Erik' tag you'll get works with all variations of that pairing).
  • The search has trouble with tags which have dashes in them. If you search for X-Men, for instance, you'll notice you get lots of results with X and none with X-Men. To get around this, put the tag in quotes: "X-Men".

As well as searching tags, titles, and authors, you can also search for specific word counts, hits, kudos, and dates - including ranges, which is a useful tool for finding fics in a fandom. For example, you can search for all Stargate Atlantis fics published 5-6 years ago.

Some search examples!

  • Find an explicit Fullmetal Alchemist work with the pairing Riza Hawkeye/Roy Mustang, with no Archive Warnings: Enter "Fullmetal Alchemist" "Riza Hawkeye/Roy Mustang" "No Archive Warnings Apply" Explicit.
  • Find works with Rodney McKay but without John Sheppard: Enter "Rodney McKay" -"John Sheppard".
  • Find works tagged with "Alternate Universe" in either the Homestuck or White Collar fandoms: Enter "Alternate Universe" Homestuck | "White Collar".
  • Find all explicit works tagged as angst, but excluding M/M pairings: Enter Angst Explicit -"M/M"
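If you build the same kinds of searches often, the rules above can be captured in a few lines of code. The sketch below is a hypothetical helper (`build_query` is not an AO3 tool) that assembles a search string from terms you want, alternatives, and terms to exclude. As an assumption, it puts quotes around any term that contains a space, dash, slash, or other punctuation, which sidesteps the dash problem mentioned earlier.

```python
from typing import Iterable


def quote_term(term: str) -> str:
    """Quote a term unless it is a single plain word."""
    return term if term.isalnum() else f'"{term}"'


def build_query(include: Iterable[str] = (),
                any_of: Iterable[str] = (),
                exclude: Iterable[str] = ()) -> str:
    """Combine terms using space for AND, | for OR, and - for NOT."""
    parts = [quote_term(term) for term in include]
    alternatives = [quote_term(term) for term in any_of]
    if alternatives:
        parts.append(" | ".join(alternatives))
    parts.extend("-" + quote_term(term) for term in exclude)
    return " ".join(parts)


mckay = build_query(include=["Rodney McKay"], exclude=["John Sheppard"])
angst = build_query(include=["Angst", "Explicit"], exclude=["M/M"])
au = build_query(include=["Alternate Universe"], any_of=["Homestuck", "White Collar"])
```

These three calls produce the same strings as the second, fourth, and third examples in the list above. The helper does not handle terms that themselves contain a double quote.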

Search bookmarklets

If you find yourself re-using the same search parameters (only T-rated works, only works under 5,000 words, only works with over 10 kudos) for new fandoms or characters you fall in love with, you could give these custom search bookmarklets a try. They are not official AO3 tools, but made by one of our own and utilizing the Advanced Search functionality. Think of them as a saved search that lets you enter a keyword (such as a fandom name or specific kink) and spits out only the kind of work you want to see. For help in putting together your own saved search, don't hesitate to comment on the post or here.

What next?

This is definitely a short-term measure, but we think it will have a big effect on site performance. In a few weeks we hope to deploy our all-new search and browse features, which will restore more browsing functionality without placing the same load on the servers. We thank you for your patience and understanding while we continue to work on the problem areas.

Post edited 2012-06-13, 12.00 UTC to reflect some minor changes in functionality & bring it up to date.

Update on AO3 performance issues

Published: 2012-06-11 08:12:17 -0400

Since last month, we've been experiencing frequent and worsening performance problems on the Archive of Our Own as the site has expanded suddenly and dramatically. The number of new users joining the site doubled between April and May, and we currently have over 17,000 users waiting for an invitation. We've been working hard to deal with the 502 errors and site slowdowns, and we've implemented a number of emergency fixes which have slightly alleviated the issues, but these haven't been as effective as we'd hoped. We're confident that we will be able to fix the problems, but unfortunately we expect the next round of fixes to take at least two weeks to implement.

We know that it's really frustrating for users when the site is inaccessible, and we're sorry that we're not able to fix the problems more quickly. We wanted to give you an update on what's going on and what we're doing to fix it: see below for some more details on the problems. While we work on these issues, you should get better performance (and alleviate the load on the servers) by browsing logged-out where possible (more details below).

Why so many problems?

As we mentioned in our previous post on performance issues, the biggest reason for the site slowdowns is that site usage has increased dramatically! We've almost doubled our traffic since January, and since the beginning of May the pace of expansion has accelerated rapidly. In the last month, more than 8,000 new user accounts were created, and more than 31,000 new works were posted. This is a massive increase: April saw just 4,000 new users and 19,000 new works. In addition to the growing number of registered users, we know we've had a LOT more people visiting the site: between 10 May and 9 June we had over 3,498.622 GB of traffic. In the past week, there were over 12.2 million page views - this number only includes the ones where the page loaded successfully, so it represents a lot of site usage!

This sudden and dramatic expansion has come about largely as a result of changes on Fanfiction.net, which has recently introduced more stringent enforcement of its policies relating to explicit fanworks; as a result, some fans are no longer able to host their works there. One of the primary reasons the AO3 was created was to provide a home for fanworks which were at risk of deletion elsewhere, so we're very keen to welcome these new users, but in the short term this does present us with some challenges!

We'd already been preparing for site expansion and identifying areas of the site which needed work in order to ensure that we could grow. This means some important performance work has been ongoing; however, we weren't expecting quite such a rapid increase, so we've had to implement some changes on an emergency basis. This has sometimes meant a few additional unexpected problems: we're sorry if you ran into bugs while our maintenance was in progress.

What we've done so far

Our sys-admins and coders have implemented a number of things designed to reduce the load on the site over the last week:

  • Implemented Squid caching for a number of the most performance-intensive places on the site, including work index pages. For the biggest impact, we focused on caching the pages which are delivered to logged-out users. This is because all logged-out users usually see the same things, whereas logged-in users might have set preferences (e.g. to hide warnings) which can't be respected by the cache. We initially implemented Squid caching for individual works, but this caused quite a few bugs, so we've suspended that for now while we figure out ways of making it work right. (You can read more about what Squid is and what it does in Release Notes 0.8.17.)
  • Redistributed and recalibrated our unicorns (which deliver requests to the server and retrieve the data) to make sure they're focused on the areas where we need them most. This included setting priorities on posting actions (so that you're less likely to lose data when posting or commenting), increasing the numbers of unicorns, and adjusting the time they wait for an answer.
  • Simplified bookmark listings, which were using lots of processing power. We'll be looking into revamping these in the future, but right now we've stripped them back to the basics to try to reduce the load on the site.
  • Cached the listing of guest kudos so the number doesn't have to be fetched from the database every time there are new kudos (which caused a big strain on the servers).

Implementing these changes has involved sustained work on the part of our sys-admins, coders and testers; in particular, the Squid caching involved a great deal of hard work in order to set up and test. Several members of the team worked through the night in the days leading up to the weekend (when we knew we would have lots of visitors) in order to implement the performance fixes. So, we're disappointed that the changes so far haven't done as much as we'd hoped to get rid of the performance problems - we were hoping to be able to restore site functionality quickly for our users, but that hasn't been possible.

What we're going to do next

Although the emergency fixes we've implemented haven't had as much impact as we'd hoped, we're confident that there are lots of things we can do to address the performance problems. We're now working on the following:

  • New search and browse code. As we announced in our previous post on performance issues, we've been working for some time on refactoring our search and browse code, which is used on some of the most popular pages and needs to be more efficient. This is almost ready to go -- in fact, we delayed putting it onto our test archive in order to test and implement some of the emergency fixes -- so as soon as we have been able to test it and verify that it's working as it should, then we will deploy this code.
  • More Squid caching. We weren't able to cache as many things as we'd initially hoped because the Squid caching threw up some really tricky bugs. We're continuing to work on that and we'll implement more caching across the site once we've tested it more thoroughly.
  • More servers. We're currently looking at purchasing a more robust database server and moving our old database server (aka 'the Beast') into an application slot, giving us three app servers. We'll also be upgrading the database software we use so that we can make the most of this server power.

When we'll be able to implement the fixes

We're working as fast as we can to address the problems -- we poured all our resources into the emergency fixes this week to try to get things up and running again quickly. Now that we've implemented those emergency fixes, we think that we need to focus on making some really substantive changes. This means we will have to slow down a little bit in order to make the bigger changes and test them thoroughly (to minimise the chances of introducing new bugs while we fix the existing problems). Buying servers will also take us some time because we need to identify the right machines, order them and install them. For this reason, we expect it to take at least two weeks for us to implement the next round of major fixes.

We're sorry that we're not able to promise that we'll fix these problems right away. We're working as hard as we can, but we think it's better to take the time to fix the problems properly rather than experimenting with lots of emergency fixes that may not help. Since the AO3 is run entirely by volunteers, we also need to make sure we don't burn out our staff, who have been working many hours while also managing their day jobs. So, for the long term health of the site as a whole, we need to ensure we're spending time and resources on really effective fixes.

Invitations and the queue

As a result of the increasing demand for the site, we're experiencing a massive increase in requests for invitations: our invitations queue now stands at over 17,000. We know that people are very disappointed at having to wait a long time for an invitation, and we'd love to be able to issue them faster. However, the main reason we have an invitations system for creating accounts is to help manage the growth of the site -- if the 17,000 people currently waiting for an invitation all signed up and started posting works on the same day, the site would definitely collapse. So, we're not able to speed up issuing invitations at this time: right now we're continuing to issue 100 invitations to the queue each day, but we'll be monitoring this closely and we may consider temporarily suspending issuing invitations if we need to.
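To give a rough sense of scale, the figures above can be turned into a back-of-the-envelope waiting time. The sketch below assumes the queue holds exactly 17,000 people, that 100 invitations go out every day, and that nobody new joins the queue in the meantime; none of these will hold exactly in practice, and `days_to_clear` is a name made up for the example.

```python
import math


def days_to_clear(queue_length: int, invitations_per_day: int) -> int:
    """Days needed to invite everyone currently waiting, ignoring new arrivals."""
    if invitations_per_day <= 0:
        raise ValueError("invitations_per_day must be positive")
    return math.ceil(queue_length / invitations_per_day)


days = days_to_clear(17_000, 100)  # 170 days
```

Under those assumptions, someone joining the back of the queue today would wait about 170 days, or roughly five and a half months.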

Until recently, we were releasing some invitations to existing users who requested them. However, we've taken the decision to suspend issuing invitations this way for the present, to enable us to better monitor site usage. We know that this will be a disappointment to many users who want to be able to invite friends to the site, but we feel that the fairest and most manageable way to handle account creation at present is via the queue alone.

What can users do?

We've been really moved by the amount of support our users have given us while we've been working on these issues. We know that it's incredibly annoying when you arrive at the Archive full of excitement about the latest work in your fandom, only to be greeted by the 502 error. We appreciate the way our users have reached out to ask if they can help. We've had lots of questions about whether we need donations to pay for our servers. We always appreciate donations to our parent Organization for Transformative Works, but thanks to the enormous generosity fandom showed in the last OTW membership drive, we aren't in immediate need of donations for new servers. In fact, thanks to your kindness in donating during the last drive, we're in good financial shape and we're able to buy the new server we need just as soon as we've done all the necessary work.

As we've mentioned a few times over the weekend, we can always use additional volunteers who are willing to code and test. If this is you or anyone you know, stop by GitHub or our IRC chat room #otw-dev!

There are a few things users can do when browsing which will make the most of the performance fixes we've implemented so far. Doing the following should ease the pressure on the site and also get you to the works you want to see faster:

  • Browse while logged out, and only log in when you need to (e.g. to leave comments, subscribe to a work, etc). Most of our caching is currently working for logged-out users, as those pages are easier to cache, so this will mean you get the saved copies which come up faster.
  • Go direct to works when you can - for example, follow the feeds for your favourite fandoms to keep up with new works without browsing the AO3 directly, so you can click straight into the works you like the sound of.

Support form

Our server problems have caused some problems accessing our support form. If you have an urgent query, you can reach our Support team via the backup Support form. It's a little more difficult to manage queries coming through this route, so we'd appreciate it if you'd avoid submitting feature requests through this form, to enable us to keep on top of bug reports. Thanks!

Thank you

We'd like to say a big, big thank you to all our staff who have been working really hard to address these problems. A particular shoutout to James, Elz, Naomi and Arrow, who have been doing most of the high level work and have barely slept in the last few days! We're also incredibly grateful to all our coders and testers who have been working on fixing issues and testing them, to our Support team, who have done an amazing job of keeping up with the many support tickets, and to our Communications folk who've done their best to keep our users updated on what's going on.

We'd also like to say a massive thank you to all our users for your incredible patience and support. It means so much to us to hear people sending us kind words while we work on these issues, and we hope we can repay you by restoring the site to full health soon.

A note on comments: We've crossposted this notice to multiple OTW news sites in order to ensure that as many people see it as possible. We'll do our best to keep up with comments and questions; however, it may be difficult for us to answer quickly (and on the AO3, the performance issues may also inhibit our responses). We're also getting lots of traffic on our AO3_Status Twitter! Thanks for your patience if we don't respond immediately.

Release Notes 0.8.17

Published: 2012-06-09 01:42:50 -0400

Welcome to our third release this week! Elz, James, and Naomi contributed code to this release, and Ariana, bingeling, Enigel, Jenn, and Kylie from our testing teams worked it over. Our sysadmins and coders have done more work to address the performance issues that have been affecting the archive, and have fixed several other bugs as well.

PLEASE NOTE: in the name of drastically improving performance, this deploy may have a few side effects that appear at first to be errors or confusing! Please do read over these release notes and make sure that they don't cover a problem you are experiencing before you contact support.

Further efforts to battle the 502 errors!

This release includes caching of most pages for guests using Squid! Squid will serve up saved versions of pages without hitting our database or application, which increases speed and decreases server load for everyone.

The tricky part is making this work with all of the dynamic elements of the site: skins, content that gets updated by users, personalized messages, etc. We have decided to turn Squid on quickly to keep the Archive running smoothly, but we'll be working on finding the right balance between customization and performance as we go forward, so you may see some tweaks to different aspects of the site as we fine-tune this.

Current issues related to the caching:

  • Site skins have been disabled for logged out users for the time being - if you rely on this feature for accessibility needs, please contact support and we will get you an account ASAP so you can use the skins again.
  • Comments and kudos from guests may not show up at once for other guests. When a guest leaves kudos or posts a comment, they will see the comment/kudos added. If another guest then visits that same page (or the same guest reloads the page), however, they will see the most-recently-cached version, which may not yet show their comment/kudos count.
  • Guests may occasionally see a stray error message or notice appearing at the top of a page that does not appear to be related to anything they've done. We are working to track all of these errors down but it is hard to be sure we've gotten them all. The messages should not affect using the archive.
  • Hits that are handled by Squid (most hits from guests) will not appear in the hit count immediately. The hit counts will be updated once a day from the Squid logs.
  • Duplicate hits from the log files (for instance on page reloads by the same guest) will no longer be removed because of technical limitations, so hit counts may increase more quickly in some cases.
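For a sense of what updating hit counts from the logs could look like, here is a simplified sketch that reads access log lines and tallies requests per work. It is not the Archive's actual job: the log lines are made-up samples loosely modelled on Squid's access log, the `/works/<id>` URL pattern is an assumption, and `count_hits` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed URL shape: a work's page lives at /works/<numeric id>.
WORK_URL = re.compile(r"/works/(\d+)")


def count_hits(log_lines: Iterable[str]) -> Counter:
    """Count one hit per logged request for a work, duplicates included."""
    hits: Counter = Counter()
    for line in log_lines:
        match = WORK_URL.search(line)
        if match:
            hits[int(match.group(1))] += 1
    return hits


sample_log = [
    "1339200000.123 45 203.0.113.7 TCP_HIT/200 5120 GET http://archive.example/works/101 - NONE/- text/html",
    "1339200004.456 12 203.0.113.7 TCP_HIT/200 5120 GET http://archive.example/works/101 - NONE/- text/html",
    "1339200010.789 30 198.51.100.9 TCP_HIT/200 8192 GET http://archive.example/works/202/chapters/3 - NONE/- text/html",
    "1339200015.000 20 198.51.100.9 TCP_HIT/200 4096 GET http://archive.example/tags/Fluff/works - NONE/- text/html",
]
daily_hits = count_hits(sample_log)
```

Note that the reload from the same visitor in the second line is counted again, which matches the behaviour described in the last bullet above.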

Squid will be enabled after we update the code, so you may not notice any changes right away.

For those interested in knowing more about Squid, see the detailed explanation below!

Changes to Subscription emails

We've gotten feedback about how people use their subscription emails, and in response we have adjusted the subject lines and message content to allow people to identify the content more easily. Emails will now contain subscriptions of one type (author, series, or work) and the name/title of the first one in the subject together with the number of other updates.

Details

  • Subscriptions:
    • Email subjects will now say [AO3] instead of [Archive Of Our Own].
    • Subscriptions will be bundled by type with subject lines of the form [authors] posted [first item] and [#] more, where first item will be one of: [Work Title], [Chapter Title] of [Work Title], [Work Title] in [Series Name].
  • Performance:
    • Skin chooser is turned off for logged out users.
    • Nearly all pages will be cached for logged out users.
    • Comment forms and other forms that are getting data for logged out users will have their details remembered in cookies and filled in by Javascript rather than remembered in the page.
  • Bug Fixes:
    • 500 errors were appearing on some work listings because of an interaction between caching and time zone conversion - this should be fixed now.
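To illustrate the new subject line format described under Subscriptions above, here is a small sketch that assembles a subject from its parts. The exact punctuation, the way multiple authors are joined, and the function name `subscription_subject` are assumptions for the sake of the example; the real emails may differ in detail.

```python
from typing import Sequence


def subscription_subject(authors: Sequence[str], first_item: str, total_updates: int) -> str:
    """Build a subject like '[AO3] author posted Title and 2 more'."""
    if total_updates < 1:
        raise ValueError("there must be at least one update to announce")
    subject = f"[AO3] {', '.join(authors)} posted {first_item}"
    more = total_updates - 1
    # Only mention further updates when the bundle holds more than one.
    if more > 0:
        subject += f" and {more} more"
    return subject


single = subscription_subject(["testy"], "My Story", 1)
bundle = subscription_subject(["testy"], "Chapter 2 of My Story", 3)
```

Here `single` has no trailing count, while `bundle` names the first item and adds "and 2 more".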

Details About Squid

Senior coder Ana has written up some helpful information about Squid for those who are curious:

"Squid is a really powerful tool that does a lot of things, but we’re using it primarily as one thing: a reverse-proxy cache. A reverse-proxy cache is a system designed to cache (that is, store copies of) web pages. It sits between users’ requests and the rest of the site and stores the responses to some requests so that instead of making the server build the page from scratch again, Squid can check to see if someone’s looked at that page recently and pass on the cached version. This is really useful when you want to send the same page to lots and lots of users because it means that instead of forcing the servers to generate the pages over and over, we can store a copy and give that copy out to everyone.

Of course, sometimes pages change: an author edits a story, or someone leaves kudos, so you don’t want to let Squid keep those copies around forever. Right now we let Squid keep copies for 20 minutes, and then it throws them away and gets a new one. This feels like the right balance between keeping things up to date, but not overloading the servers.

In addition, logged in users get customization on every page, in the form of the user bar at the top of the page if nothing else, which means that we don’t want Squid to store or give pages to logged in users. If it did, then every user would see the user bar for whoever made the request that Squid saved, and it would only change every twenty minutes.

This same principle holds true for all on-page customization (such as the skin-chooser), and finding the right balance between customization and cacheability (how suitable a page is for storing and giving out to everyone) is going to be an ongoing project as we try to weigh site performance against nifty features and information."
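The behaviour Ana describes can be sketched in a few lines. The toy cache below stores pages for guests for 20 minutes and always passes logged-in requests straight through to the application. It is a simplified illustration of the idea, not how Squid itself is implemented or configured; `PageCache`, the `render` callback, and the injectable clock are all assumptions made so the example can run on its own.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PageCache:
    """Toy reverse-proxy cache: guests share stored pages, users bypass it."""

    def __init__(self, render: Callable[[str, Optional[str]], str],
                 ttl_seconds: float = 20 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str, user: Optional[str] = None) -> str:
        if user is not None:
            # Logged-in pages are personalised, so never store or reuse them.
            return self._render(url, user)
        now = self._clock()
        cached = self._store.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        page = self._render(url, None)
        self._store[url] = (now, page)
        return page


calls = []


def render(url: str, user: Optional[str]) -> str:
    calls.append((url, user))  # record each time the application does real work
    return f"page {url} for {user or 'guest'}"


fake_now = [0.0]
cache = PageCache(render, clock=lambda: fake_now[0])

first = cache.get("/works/1")                    # built by the application
second = cache.get("/works/1")                   # served from the stored copy
fake_now[0] = 21 * 60                            # 21 minutes later the copy has expired
third = cache.get("/works/1")                    # built again
personal = cache.get("/works/1", user="alice")   # logged in: bypasses the cache
```

Four requests reach the cache but the application only renders three pages; on a busy site, where thousands of guests ask for the same page within 20 minutes, the saving is far larger.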

These release notes written and compiled by Ana, Claudia, Elz, Enigel, Jenn, Lucy, and Naomi.
