Black Friday: How It Played Out, An Eyewitness Account

by Alex Dault, NGSIS Program Assistant

“If we make it to 9:10 am, we’ll be okay.”

Miki Harmath utters these words at 8:54 AM, August 4th in the EASI war room. The room is lit up with projections of ACORN log-on stats and server usage. There is a box of untouched doughnuts and muffins on the table but no one has much of an appetite. All eyes are glued to the screens indicating the load balance of the four servers and the steadily increasing number of users attempting to access the system.

Laurel Williams, IT Analyst and Java Build Coordinator, nurses her coffee and murmurs:

“I told the barista that it was a big day for enrolment. He asked if I was a registrar. ‘No, I’m in IT,’ I said. He gave me this coffee for free.”

At 9:00 am, registration opens to the 4th year students. The number of active users starts to leap upwards, jumping up into the thousands in a matter of seconds.

This is Black Friday, the first Friday in August, which is the “priority drop” enrolment period for all Faculty of Arts & Sciences (FAS) students. The day has become infamous among IT and Enrolment staff because for the past two years (2015, 2016) the massive volume of log-ins (33,000 anticipated but as many as 60,000 possible) to the ACORN and SWS sites has overloaded the servers and crashed the sites.

A reddit thread from the r/UofT subreddit records the frustration of last year’s meltdown.

“RIP ACORN,” posted user BenZion last year. “Couldn’t even make it to the login page.”

“There is no hope,” wrote back user SoupDoge (also back in 2016).

Late in the afternoon in 2016, a representative from the ACORN project team replied to the chorus of acrimony with the following response:

We built ACORN to improve the student user experience. This takes into account not only features and interactions on the app itself but the entire experience of using it. Today we clearly failed big time. For those impacted we apologize sincerely. There’s not much to say that will change or help what happened this morning. The idea that this is an inevitable occurrence each year is not in line with our goals for ACORN or our values as a team. It’s obviously something we need to change and improve upon so this doesn’t happen again.

-ACORN Project Team

It was a contrite and heartfelt apology, and a sincere one: the ACORN and IT systems teams were determined not to let this happen again the following year.

* * *

“Well, it’s 9:11 so I think we’ll make it….” murmurs Miki Harmath, his knuckle tapping superstitiously on the wooden table. Up on the screen, the number of active users is dropping and the rush is coming to an end.

But the worst is yet to come. This was just the 4th Year registration. In less than an hour, the third years will log on at 10:00 am, then the second years at 11:00 am.

“But the real tsunami wave comes at noon,” cautions Marilee Keogh, Manager for Technical Services.

“That’s when the first years come online.”

* * *

Over the course of the last six months, a number of initiatives have been undertaken to prepare for the Black Friday slam, some short term and some long term.

In the short term, the EASI/NGSIS team has developed Peak Load Mode ACORN (PLM ACORN), which disables some non-essential functionality (Notifications, Invoice, Financial Awards & Aid information and multiple degree invitations) to improve performance. This new lightweight landing page is presented to students on heavy registration days; it places less overhead on system resources than the regular ACORN dashboard and thus loads faster.
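For readers curious how a mode like this might look in code, here is a minimal sketch in Python. It is not the ACORN implementation. It only illustrates the idea of switching off the non-essential features listed above while a peak-load flag is set. The feature identifiers, the `course_search` entry and the function name are all hypothetical.

```python
# Hypothetical feature identifiers. The non-essential set mirrors the list
# in the article; "course_search" is an assumed example of an essential one.
NON_ESSENTIAL = frozenset({
    "notifications",
    "invoice",
    "financial_awards_aid",
    "multiple_degree_invitations",
})
ESSENTIAL = frozenset({"enrolment_cart", "course_search"})
ALL_FEATURES = ESSENTIAL | NON_ESSENTIAL


def enabled_features(peak_load_mode: bool) -> frozenset:
    """Return the features the dashboard should render.

    In peak load mode only the essential features remain, so the landing
    page has less work to do per request.
    """
    if peak_load_mode:
        return ALL_FEATURES - NON_ESSENTIAL
    return ALL_FEATURES


if __name__ == "__main__":
    print(sorted(enabled_features(peak_load_mode=True)))
```

A single flag like this keeps the decision in one place, so staff could in principle turn the lighter page on for heavy registration days and off again afterwards.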

A specialized team within EASI and ITS has also improved load balancing on the ACORN application servers, ensuring the load is better shared among the multiple servers. Furthermore, they’ve streamlined the login process, speeding up the communications between the different systems involved in authenticating and authorizing a student when accessing ACORN, and made a number of important technical changes that improve overall performance in ACORN’s backend systems.
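The article does not say which balancing strategy the team chose. As a rough illustration of what sharing load among several servers can mean, the sketch below uses a simple least-connections rule: each new request goes to the server with the fewest active sessions. The server names and functions are hypothetical.

```python
from typing import Dict


def pick_server(active: Dict[str, int]) -> str:
    """Return the server with the fewest active sessions.

    Ties are broken alphabetically by server name so the choice is
    deterministic.
    """
    return min(sorted(active), key=lambda name: active[name])


def route(active: Dict[str, int], n_requests: int) -> Dict[str, int]:
    """Assign n_requests one at a time, updating the session counts in place."""
    for _ in range(n_requests):
        active[pick_server(active)] += 1
    return active


if __name__ == "__main__":
    # Four hypothetical application servers, matching the count in the story.
    servers = {"app1": 0, "app2": 0, "app3": 0, "app4": 0}
    print(route(servers, 10))
```

With a rule like this, no server ends up more than one session ahead of the others when they all start empty, which is the kind of even spread the war room screens were watching for.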

ITS has also set its eyes on the bigger ‘backend’ issue, which is the Mainframe. The University is badly in need of new infrastructure to support the student database. Efforts to replace this backend system are well underway, but completing such a massive effort in a single year was not even a remote possibility.

* * *

It is 11:35 am. Fourth, Third and Second Year students have all been granted access to their enrolment cart. There are 1728 active users on the ACORN site and the number is beginning to descend.

The EASI Black Friday Team in their War Room

Frank Boshoff, Data Architect and Client Services Manager, glances at an infographic indicating that 20% of users are registering for their courses on a mobile device.

“If I was a student, I wouldn’t be doing something so important on my mobile phone.”

“They’re totally different people than us, Frank,” answers Laurel Williams, taking a wistful sip of coffee.

“It’s a new age.”

Silence falls over the team as the clock hits 11:50 am.

The fact that there are still 1600 active users with ten minutes to go before the first years come online is making the team nervous. The Google Analytics page is showing that first year students are already starting to appear online by the hundreds and encounter the staggered enrolment page.

“They’re at the gates,” says Haroon Rafique, Technical Lead for SIS.

“What are they saying on social media?” asks Laurel Williams anxiously.

“It’s pretty quiet for the most part,” replies Mike Clark, EASI UX Manager.

“There’s a small thread going on the U of T subreddit comparing the load speed of SWS and ACORN.”

Up on the screens, it is obvious why SWS is running slightly faster than ACORN at the present moment. ACORN has 1700 live users compared to just 500 on the SWS. Power users accustomed to rapidly typing in course codes heavily favour SWS for its speed and familiarity.

At 11:58 a message comes in to Mike from a student asking if ACORN is down. The third year student says that they should have been able to access the service at 11:00. Mike needs more information to assist and writes a response to the student with a small flurry of questions.

“Something is going on with IDP,” says Andre Kalamandeen, an ACORN development team leader.

“Well, if they can just fix that in the next minute, that’d be great,” says Mike wryly as he shoots a second email out to registrars. His soldierly calm in the face of potential disaster is impressive to say the least.

It is now 12:00 pm.

The number of users online ticks up to 2,300 and keeps rising. A world map infographic shows that users are logging in from as far away as Beijing and Siberia. If this were the olden days of lining up to enrol at the Registrar’s office, it would be akin to two thousand students coming in through the windows, doors, floorboards and ventilation pipes to simultaneously ask to be registered.

“They can’t get in. There’s a bottleneck at IDP,” says Andre Kalamandeen, a hint of anxiousness in his voice.

The pipes in the wall creak and groan sympathetically, the building itself manifesting the strain on the four servers.

The team watches helplessly as the number of users stuck on the IDP authentication page climbs into the thousands. The minutes tick by and there is a noticeable slowdown in service. The four servers are being hammered from all over the world.

This is the moment. All the preparations and tests and meetings have led to this moment. If ACORN is going to explode, it is going to happen right now.

But it doesn’t explode. And the number trapped on the IDP page starts to drop. First year students are getting onto the site and registering for courses.

The service is working. There was a bit of slowdown, but it is still working.

* * *

A few hours later, the team has gone out for a celebratory lunch. The consensus is that there is still much more to be done but that huge strides have been made. Accolades have been pouring in from students and from the registrars’ offices.

“I just wanted to say thank you to you and the team for managing a really complicated challenge today. Great job!”

“Congratulations! I did see the improvement on ACORN. This year I spent less time on loading in ACORN than last year. From 30 mins to 15 mins. Good job!”

“I really appreciate all the hard work your team put into preparations for today, and how communicative you’ve been with the registrarial community this morning. It’s made a big difference for our staff and our students.”

“Good job guys, and thank you for the updates, they were very useful.”

Best of all, there is a rumour that they have even come up with a new name for the first Friday in August.

They’re calling it Light Grey Friday.