A thread on how/why the birdsite is "still up":
Software in general can run a long time if you leave it alone. Think of your laptop. If you open it up, leave it on and plugged in, how long could it run if you didn’t change anything?
At some point, there will be a need for some maintenance. If you leave your laptop on long enough it might restart automatically for a software update to install new features, bug fixes, and security patches. Your Wi-Fi router might overheat. Or a shark might chew through an undersea internet cable (look it up).
When running at Twitter’s scale, components fail all the time. Hard drives have moving parts. Servers have fans that cool them. These moving parts fail at an especially high rate. At Twitter’s scale, multiple failures occur every day that might need attention from the hardware and software teams. Jeff Dean's presentation from a decade ago lists the probability of some of these events https://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
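To get a feel for why "multiple failures a day" is just arithmetic at this scale, here is a minimal back-of-envelope sketch in Python. The fleet sizes and annualized failure rates are illustrative assumptions, not Twitter's actual numbers.

    # Expected hardware failures per day for a large fleet.
    # Counts and annualized failure rates (AFR) below are hypothetical.

    def expected_daily_failures(num_components: int, annual_failure_rate: float) -> float:
        """Expected failures per day, assuming failures are independent."""
        return num_components * annual_failure_rate / 365.0

    fleet = {
        # name: (count, AFR) -- illustrative figures only
        "hard drives": (100_000, 0.02),  # ~2% AFR is a commonly cited ballpark
        "servers":     (50_000, 0.01),
        "switches":    (2_000, 0.01),
    }

    for name, (count, afr) in fleet.items():
        print(f"{name}: ~{expected_daily_failures(count, afr):.1f} failures/day")

Even with these modest assumed rates, the fleet sheds several drives a day, every day, forever.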
Companies like Twitter operate on timescales of months and years. Teams plan years in advance to ensure future user growth and revenue growth can be accommodated. Want to double user growth? Some systems will need wholesale redesigning to maintain scalability.
Want to ensure Twitter can handle that World Cup final peak traffic when the entire world is tweeting at the same time? Extra hardware capacity might need to be purchased, and you can’t exactly order this kind of hardware on Amazon Prime with overnight delivery. Twitter added a brand-new data center last year. This was a companywide effort that took ~three years.
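The capacity-planning arithmetic itself is simple; the hard part is the lead time. A rough sketch of the kind of estimate involved, with every number (baseline load, spike multiplier, per-server capacity, headroom) being a hypothetical placeholder rather than a real Twitter figure:

    # Rough capacity estimate for an event-driven traffic spike.
    # All numbers here are assumptions for illustration.

    baseline_rps = 300_000     # steady-state requests per second (assumed)
    spike_multiplier = 5       # e.g. a World Cup final moment (assumed)
    per_server_rps = 1_000     # sustainable load per server (assumed)
    headroom = 1.3             # 30% buffer for failures and hot spots

    peak_rps = baseline_rps * spike_multiplier
    servers_needed = int(peak_rps * headroom / per_server_rps) + 1
    print(f"Peak ~{peak_rps:,} rps -> provision ~{servers_needed:,} servers")

The output of a calculation like this turns into purchase orders, rack space, and network builds, which is why the planning happens years ahead.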
Most of the time, however, software engineers aren’t manually turning cranks to keep systems running. They strive to let systems run themselves so they can focus on adding features and fixing bugs.
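"Letting systems run themselves" mostly means automation like the toy sketch below: detect a problem and remediate it without waking a human. The health endpoint and restart command are placeholders, not anything Twitter actually runs.

    # Minimal self-healing loop: poll a health endpoint, restart on failure.
    # HEALTH_URL and RESTART_CMD are hypothetical placeholders.

    import subprocess
    import time
    import urllib.request

    HEALTH_URL = "http://localhost:8080/health"
    RESTART_CMD = ["systemctl", "restart", "myservice"]

    def healthy() -> bool:
        try:
            with urllib.request.urlopen(HEALTH_URL, timeout=2) as resp:
                return resp.status == 200
        except Exception:
            return False

    while True:
        if not healthy():
            # Self-heal; real systems would also alert if restarts keep happening.
            subprocess.run(RESTART_CMD, check=False)
        time.sleep(30)

Real production automation is far more sophisticated, but the principle is the same: machines handle the routine failures so people can work on the next thing.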
With the massive reduction in headcount, very little of this medium-term or long-term work stands a chance of happening. If the number of users continues to grow, and the traffic into the site continues to grow (the World Cup will be an interesting case), Twitter will lack the staffing and specialized knowledge to add hardware capacity, respond to incidents, move traffic around, etc.
Complex systems such as Twitter are most in danger of breaking when new features are rolled out. Since the acquisition went into effect, Twitter has been in code freeze almost the entire time, which means that the risk of anything breaking has been relatively low. When you add a new feature, many types of issues can arise: functionality bugs, privacy bugs, scaling issues, reliability issues, etc.
The fact that the site is still running mostly fine is a testament to how resilient these systems were built to be. But it would be extremely daft to say that our jobs were unessential. We were constantly making our own jobs redundant, automating tasks we used to do manually so that we could move on to more impactful work.
There was much important work remaining to be done at Twitter to reach more users, protect their privacy, and give them a safe experience. I am so sad that so many wonderful teams were cruelly forced to leave this work behind. /fin