Should Twitter Publish Its Ranking Algorithm?

Published

Apr 20, 2022

Dear friends,

Last week, Elon Musk launched a surprise attempt to acquire Twitter. The $43-billion bid was motivated, he said, by his desire to protect free speech endangered by the company’s practice of promoting some tweets while burying others. To that end, he proposes publishing the company’s ranking algorithm, the program that decides which tweets appear in a given user’s feed.

Social media companies generally keep their ranking algorithms secret. Let’s take a look at some pros and cons of letting people see what these companies are doing behind the scenes.

Why keep ranking algorithms secret?

Keeping the algorithm secret arguably makes it harder for scammers and spammers to manipulate its output. Security through obscurity can’t be the only defense, but it is one barrier. It’s true that open-source software can be highly secure because public scrutiny reveals holes to be patched. But I think there’s a difference between defending traditional software from hackers and defending a ranking algorithm from statistical manipulation. If the algorithm were public, attackers would no longer need to probe a live website, which may alert the security team. Instead, they could repeatedly probe an offline copy of the algorithm to find message formats that it’s likely to promote.
Crucially, if the point is to enable people to understand how a learning algorithm works, then publishing it also requires publishing the data that drives it — the system’s behavior depends on both. But releasing Twitter’s data isn’t practical. One reason is the massive size of the dataset. Another is the company’s obligation to protect users’ privacy when the dataset presumably includes intimate details like user locations, interests, and times of use.
Even if both the code and the data were available, the algorithm’s behavior would still be very difficult to analyze due to the black-box nature of machine learning.
Proprietary algorithms confer a competitive advantage. Twitter developed its ranking algorithm at great cost in time and money, and it’s an important part of what differentiates the company from competitors. Publishing it would give rivals a leg up.
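
To make the point about code and data concrete, here is a minimal sketch of why publishing code alone reveals only part of a learning system’s behavior. It is a toy illustration, not Twitter’s algorithm: the `rank` function, the feature names, and the weights are all hypothetical, and the two weight sets simply stand in for parameters learned from two different datasets.

```python
# Toy illustration only. Nothing here reflects Twitter's actual system;
# every name and number is a hypothetical stand-in.

def rank(tweets, weights):
    """Order tweet ids by a weighted sum of features (the 'published code')."""
    def score(tweet):
        return sum(weights[name] * value
                   for name, value in tweet["features"].items())
    return [t["id"] for t in sorted(tweets, key=score, reverse=True)]

tweets = [
    {"id": "a", "features": {"likes": 0.9, "recency": 0.1}},
    {"id": "b", "features": {"likes": 0.2, "recency": 0.8}},
]

# Two weight sets, standing in for parameters learned from different data.
weights_from_data_1 = {"likes": 1.0, "recency": 0.2}
weights_from_data_2 = {"likes": 0.2, "recency": 1.0}

order_1 = rank(tweets, weights_from_data_1)  # tweet "a" ranks first
order_2 = rank(tweets, weights_from_data_2)  # tweet "b" ranks first
print(order_1, order_2)
```

The ranking code is identical in both calls, yet the ordering flips. Someone who could read only `rank` would have no way to tell which tweet ends up on top.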

On the other hand, there are clear benefits to making ranking algorithms public.

Researchers and the broader public could gain more insight into how the algorithms work, spot problems, and evaluate the provider’s neutrality. Such scrutiny would put pressure on companies to improve flawed products and, if they were to do so, raise public confidence in their services.
Given the huge impact of these algorithms on millions of people — including, perhaps, influencing the outcomes of democratic elections — there’s a case to be made that citizens and governments alike deserve to know more about how they work.

Of course, overseeing ranking algorithms is only a small part of protecting free speech online. Some commentators panned Musk’s views on social media moderation as naive. Other social networks have been overrun by toxic communication, scams, and spam when they allowed people to post without restriction. Former Reddit CEO Yishan Wong offered insights into the difficulty of moderating social network posts in a widely read tweet storm.

Twitter has been a valuable place for the AI community to share knowledge and perspectives, and I have deep respect for Parag Agrawal and Jack Dorsey, the current and former CEOs of Twitter, who have kept their product successful through difficult changes in social media. I also applaud Twitter’s ML Ethics, Transparency and Accountability team for its insightful studies. Nonetheless, Twitter has been criticized for its business performance, which has created an opening for corporate raiders like Musk and private equity firms.

Whether or not Musk’s bid is successful, the question remains: Would society be better off if internet companies were to publish their ranking algorithms? This is a complicated question that deserves more than simplistic statements about freedom of speech. My gut says “yes,” and I believe the benefit of even the partial transparency afforded by publishing the code (but not the data) would outweigh the harm. Having said that, how to secure such open-source learning algorithms, and whether demanding disclosure is fair considering the huge investment it takes to develop this intellectual property, are questions that require careful thought.

Keep learning!

Andrew
