Challenges, priorities, and progress in anti-censorship technology at Tor

Philipp Winter
August 27, 2020

This blog post seeks to bring clarity to the modus operandi of the Tor Project in the anti-censorship space by summarizing the challenges we face, the priorities we focus on, and the progress we have made so far on our circumvention technology. Censorship circumvention is a complex and ever-evolving problem, and this blog post summarizes our approach to tackling it. Please feel free to ask any related questions in the comments. Thanks to hanneloresx’s translation, you can find a Chinese version of this blog post below.

Tor’s Anti-Censorship space

In February 2019, we hired two engineers to focus on and advance anti-censorship technology at Tor. The anti-censorship team also includes several other people in the Tor community who contribute designs, code, insight into past work, infrastructure, documentation, and resources. The goal of Tor’s anti-censorship team is to understand network censorship and build technology to circumvent it so that the Tor network can be accessible to everyone.

The state of circumvention

Some Internet Service Providers (ISPs) block the domain www.torproject.org, making it difficult for their users to download a copy of Tor Browser.  Our service GetTor can help these users get Tor Browser despite the block: simply send an email to gettor@torproject.org, which will automatically respond with alternative download links for Tor Browser. These download links point to GitHub, GitLab, the Internet Archive, and Google Drive. At least one of these hosting providers should be accessible to each of our users. For example, users from China can download Tor Browser from our GitHub mirror.
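
For readers who prefer to script the request, here is a minimal sketch in Python that composes such an email without sending it. Only the address gettor@torproject.org comes from this post. The sender address, the subject line, and the idea of naming a platform in the body are assumptions made for illustration; check GetTor's automatic reply for the exact format it expects.

```python
from email.message import EmailMessage

GETTOR_ADDRESS = "gettor@torproject.org"  # address given in this post


def build_gettor_request(sender, platform="windows"):
    """Compose (but do not send) a request email for GetTor."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = GETTOR_ADDRESS
    msg["Subject"] = "Tor Browser download links"  # hypothetical subject
    # Assumption: the body names the platform you want links for.
    msg.set_content(platform)
    return msg


request = build_gettor_request("alice@example.com")
print(request)
```

To actually send the message, hand it to whatever mail client or SMTP account you normally use.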
 
Once you have your copy of Tor Browser, you are ready to connect to the Tor network.  Unfortunately, some ISPs interfere yet again, aided by technology that blocks the IP addresses of Tor relays, detects the Tor protocol dynamically by inspecting network traffic that passes the ISP’s perimeter (so-called deep packet inspection, or DPI), or both.
 
If you are unable to directly connect to the Tor network, you need to use bridges.  Bridges are unlisted Tor relays and, depending on the bridge type, can obfuscate network traffic in a way that’s more difficult for ISPs to detect.  The simplest method of censorship circumvention in Tor Browser is to use our default bridges: a set of a dozen bridges that are part of Tor Browser.  These bridges are essentially public, which is why more effective censorship systems such as China’s Great Firewall (GFW) block them, but they are still effective in many places like Iran.  Take a look at our Tor Browser manual to learn how to enable default bridges.
 
If you are unable to connect to our default bridges, you currently have three options:
  1. Use an obfs4 bridge.  You can request obfs4 bridges in three ways: directly in Tor Browser, by visiting bridges.torproject.org, or by sending an email to bridges@torproject.org.  More technical users can set up their own obfs4 bridge.  Unfortunately, many of the bridges you obtain this way may not work in China.  We are currently implementing a social bridge distribution system called Salmon, which will make it significantly harder for the GFW to block obfs4 bridges.  More on that below.
  2. Use Snowflake. Snowflake is currently only available in our Tor Browser alpha version but is on track to be part of Tor Browser stable. Our most recent changes added a new set of STUN servers, making Snowflake available in China and other places that block access to Google services. We are currently stress testing the system to handle more users as we move towards a stable release.
  3. Use meek-azure.  While meek-azure should work everywhere (including behind the GFW), it is overloaded and therefore slow.  Microsoft’s Azure CDN (which meek-azure is based on) is expensive, which is why we have to place a traffic cap on the meek-azure bridge.
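
Tor Browser lets you paste bridge lines directly into its settings. If you run a standalone tor client instead, the same lines go into its configuration file. The sketch below renders such a configuration from a list of bridge lines. The IP address, fingerprint, and cert value are placeholders rather than a real bridge, and the obfs4proxy path is an assumption that depends on your system.

```python
def render_bridge_config(bridge_lines, obfs4proxy_path="/usr/bin/obfs4proxy"):
    """Render torrc lines that make a tor client use the given obfs4 bridges."""
    lines = [
        "UseBridges 1",
        # Assumption: obfs4proxy is installed at this path on your system.
        f"ClientTransportPlugin obfs4 exec {obfs4proxy_path}",
    ]
    lines += [f"Bridge {line}" for line in bridge_lines]
    return "\n".join(lines) + "\n"


# Placeholder bridge line: documentation-range IP, made-up fingerprint and cert.
example_bridges = [
    "obfs4 192.0.2.10:443 0123456789ABCDEF0123456789ABCDEF01234567 "
    "cert=PLACEHOLDER iat-mode=0",
]

config = render_bridge_config(example_bridges)
print(config)
```

Replace the placeholder with the bridge lines you received from Tor Browser, bridges.torproject.org, or bridges@torproject.org.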

What are our challenges?

Technological obstacles

A successful circumvention system consists of two components:
  • a network protocol (for example WebRTC, obfs4, or TLS) and
  • an endpoint to connect to (for example a Snowflake proxy, a CDN server for meek, or an obfs4 bridge).  
Both the protocol and the endpoint must resist detection.  The GFW is able to detect the obfs2 and obfs3 protocols on the wire, meaning that it can identify them by simply looking at the bytes that cross the national perimeter.  The latest iteration in the obfs series, obfs4, currently still works in China in the sense that the GFW cannot (or chooses not to) block it by simply looking at bytes on the wire.  Unfortunately, obfs4 being unblocked is not enough.  One also needs an unblocked endpoint to connect to, and this is where the trouble starts.  We currently use the service BridgeDB to hand out bridges to our users.  After the user solves a CAPTCHA, BridgeDB will return up to three bridges.  CAPTCHAs are not the only defense that prevents censors from learning all bridges because, in the age of deep learning, CAPTCHAs represent only a small obstacle.  Handing out bridges to users while preventing censors from learning them all remains a hard problem, but more on that later.  Fortunately, most censoring countries are limited in the amount of time, money, and talent that they can dedicate to blocking the Tor network, which is why BridgeDB is still effective in many places.
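
To make the enumeration problem concrete, here is a small sketch of one way a distributor could limit what a single requester learns. This is not BridgeDB's actual code. It uses rendezvous hashing keyed with a secret so that the same requester always receives the same (up to three) bridges, and asking again reveals nothing new. The requester identifier, the secret, and the bridge names are all hypothetical.

```python
import hashlib
import hmac

MAX_BRIDGES = 3  # the post says BridgeDB returns up to three bridges


def pick_bridges(bridges, requester_id, secret, count=MAX_BRIDGES):
    """Deterministically pick up to `count` bridges for one requester."""

    def score(bridge):
        # Keyed hash of (requester, bridge); unpredictable without the secret.
        message = f"{requester_id}|{bridge}".encode()
        return hmac.new(secret, message, hashlib.sha256).digest()

    return sorted(bridges, key=score)[:count]


pool = [f"bridge-{i}" for i in range(20)]  # hypothetical bridge identifiers
secret = b"distributor-secret"             # hypothetical server-side secret

first = pick_bridges(pool, "requester-a", secret)
again = pick_bridges(pool, "requester-a", secret)
print(first, first == again)
```

A censor who controls many requester identities can still collect many bridges this way, which is why per-requester limits alone do not solve the problem.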

Resource limitations

We are limited in the amount of time and money that we can dedicate to circumvention technology, which means that we are unable to address all problems that we want to (and should) address; as a result, we need to think carefully about how we can best spend the limited time we have. Consider, for example, that there are numerous trivial fixes that could make Tor available again in places like China.  Unfortunately, censors can often react just as swiftly and block Tor yet again, rendering such fixes a poor investment of our time.  The key is to develop technology that is asymmetrically more difficult for a censor to block.  A promising circumvention technology is one that takes us n hours to deploy and 2^n hours for a censor to block.  Needless to say, it’s not always clear which technology would work best in advance of deployment, which is why we seek to maximize our impact with limited resources by formulating and focusing on clear priorities, as explained below.
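
The n versus 2^n figure is an idealization, but a few lines of Python show how quickly such an asymmetry grows. The helper name and the sample values of n are chosen purely for illustration.

```python
def censor_to_defender_ratio(n):
    """Hours the censor spends per hour we spend, under the n vs. 2^n model."""
    return 2 ** n / n


for n in (1, 5, 10, 20):
    ratio = censor_to_defender_ratio(n)
    print(f"n = {n:2d} hours to deploy -> censor needs {2 ** n} hours ({ratio:.1f}x)")
```

By contrast, a trivial fix that costs both sides roughly the same effort keeps this ratio near 1, which is why it is a poor investment.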

What are our priorities?

Our goal is to maximize the number of people that we help circumvent Internet censorship around the world.  Internet censorship is a moving target, which means that technologies that worked great five years ago find themselves blocked by many ISPs today.  This is why we need to constantly invest in new research and technology to be ahead of censors.
 
Over the last year or so, we invested heavily in Snowflake, which originally started as a research project. Over the next few months, we aim to land Snowflake in the stable Tor Browser version, marking a significant milestone. We have been slowly ramping up its use and now have over 6,000 volunteer proxies helping users circumvent censorship and providing moving targets that are difficult for censors to enumerate and block.
 
BridgeDB, our existing bridge distribution system, is showing its age. It’s heavily tailored towards a specific purpose, making it difficult to extend and generalize.  We therefore started working on a more flexible and lightweight reimplementation.  This reimplementation will provide the following benefits over the old BridgeDB:
  1. A feedback loop that hands out bridges to censorship measurement platforms such as OONI and feeds the resulting reachability information back into bridge distribution.  That means that if a user from country X requests a bridge, we won’t give them a bridge that’s known to be blocked in country X.
  2. Periodic bridge testing with the help of bridgestrap, so that we won’t hand out bridges whose obfs4 port is firewalled or that are otherwise broken.
  3. The Salmon bridge distribution system, which we are building to help with the problem of endpoint blocking.  Salmon was originally proposed in a research paper at PETS’17.  The idea is that users have a “reputation score” that goes down when one of their bridges is blocked and goes up if their assigned bridges remain unblocked.  If a user’s reputation score gets too low, the user gets blocked, and if it gets high enough, the user can invite others to the system.  Take a look at this net4people thread for a crisp overview of Salmon.
We have a roadmap that we revisit every three months to identify a handful of short-term goals.  Take a look at our roadmap to see these goals.
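
The three ideas listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the reimplementation itself and not the algorithm from the Salmon paper: the class names, the thresholds, and the one-point score changes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Bridge:
    address: str
    reachable: bool = True                        # result of a periodic test
    blocked_in: set = field(default_factory=set)  # country codes from measurements


def eligible_bridges(bridges, country):
    """Skip bridges that failed testing or are known to be blocked in `country`."""
    return [b for b in bridges if b.reachable and country not in b.blocked_in]


@dataclass
class User:
    name: str
    reputation: int = 0
    banned: bool = False


BAN_THRESHOLD = -2     # hypothetical: at or below this score, the user is blocked
INVITE_THRESHOLD = 3   # hypothetical: at or above this score, the user may invite


def record_outcome(user, bridge_blocked):
    """Lower the score when an assigned bridge gets blocked, raise it otherwise."""
    user.reputation += -1 if bridge_blocked else 1
    if user.reputation <= BAN_THRESHOLD:
        user.banned = True


def can_invite(user):
    return not user.banned and user.reputation >= INVITE_THRESHOLD


bridges = [
    Bridge("bridge-a", blocked_in={"X"}),
    Bridge("bridge-b", reachable=False),
    Bridge("bridge-c"),
]
print([b.address for b in eligible_bridges(bridges, "X")])

alice = User("alice")
for _ in range(3):
    record_outcome(alice, bridge_blocked=False)
print(alice.reputation, can_invite(alice))
```

In this toy model, a user whose bridges keep getting blocked is eventually cut off, while a user whose bridges survive earns the ability to invite others.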

How can you learn more or get involved?

Our lack of resources is our biggest challenge against powerful nation-state censors, so we are always looking for new contributors and collaborators.  If you would like to learn more or get involved, take a look at the following:
 
Tor’s anti-censorship team meets once a week, on Thursdays at 16:00 UTC in the #tor-meeting IRC channel.  Joining a weekly meeting is the best way to meet us and get started on the team. Don’t worry if you miss a meeting: we post our meeting logs to the tor-project mailing list.  Most of us are also available in the #tor-dev and the #tor-project IRC channels on irc.oftc.net, so feel free to reach out any time.  Every two weeks, we typically have a reading group right after our regular meeting.  We use these reading groups to discuss research papers or software projects.  Take a look at our meeting pad to learn what our next reading group is going to be about and feel free to join our discussion.
 
At the end of each month, we publish a monthly team report to the tor-project mailing list, which summarizes what we accomplished the month before.  For example, here’s our July 2020 report.
 
All of our software lives in the anti-censorship GitLab project.  If you’re interested in getting involved, take a look at issues that have the “First Contribution” label.
 
Finally, here is a list of past blog posts and presentations that provide more information:

Tor在规避审查领域上的挑战、策划、与进展

我们希望以这篇博文介绍Tor项目在规避审查领域面临的挑战、以及相关策划与进展,也借此交待Tor项目在规避审查领域的运作方式。审查规避是一个复杂且不断发展的科技领域,这篇博文总结了我们面对审查的策略。请大家在评论区提出任何意见和相关问题。

Tor的规避审查团队

在2019年2月份,我们聘请了两名工程师,专注于推进Tor的规避审查技术。我们的团队还包括Tor社区的其他一些成员,他们贡献了系统设计、代码、基础设施、说明文档 、对于已有系统的见解、和其他资源。 Tor规避审查团队的目标是了解网络审查制度,并及开拓技术来规避这些制度,让每个人都可以使用Tor网络。

审查规避的现状

一些互联网服务提供商(ISP)屏蔽了域名www.torproject.org,使得其用户很难下载Tor浏览器。在这种情况下,我们的服务GetTor 可以帮助这些用户获得Tor浏览器: 用户只需发送电子邮件到 gettor@torproject.org,它将自动回复一些Tor浏览器的代替下载链接。这些代替下载链接分别指向GitHub、GitLab、Internet Archive、和Google Drive。 每个用户应该至少可以访问其中一个主机提供商。例如, 来自中国的用户可以从我们的GitHub镜像下载Tor浏览器。

用户一旦有了Tor浏览器就可以准备连接到Tor网络。这时候,有些ISP会再次干扰,屏蔽Tor中继的IP地址和/或通过检查经过ISP外围的网络流量来动态检测Tor协议-所谓的深层数据包检测(DPI)。

如果您无法直接连接到Tor网络,您需要使用网桥。网桥是未公开的Tor中继,根据网桥类型不同,可以混淆网络流量,使ISP更难检测。在Tor浏览器中,最简单的规避审查的方法是使用我们的默认网桥–这是Tor浏览器中的十几个网桥。这些默认网桥基本上是公开的, 所以更高能的审查系统(如中国防火墙(GFW))会阻止它们,但默认网桥在伊朗等许多地方仍然有效。如果您需要了解如何启动默认网桥,请看我们的Tor浏览器手册

如果您无法连接到我们的默认网桥,您目前有三个选择:

  1. 使用 obfs4 网桥。 您可以通过三种方式获取 obfs4网桥:直接在Tor浏览器中请求,访问bridges.torproject.org,或者发送电子邮件到 bridges@torproject.org。更熟悉技术的用户可以 建立自己的obfs4网桥。不幸的是,许多通过这种方式获得的网桥可能无法在中国使用。我们目前正在 实施一个称为Salmon的社会化网桥分配系统,这将大大增加GFW封杀obfs4网桥的难度。 以下有更多对Salmon的介绍。

  2. 使用Snowflake。Snowflake目前只在Tor浏览器的alpha版本中使用,但计划成为Tor浏览器稳定版的一部分。我们对Snowflake最新的改善是增加了一组新的STUN服务器, 使Snowflake可以在中国和其他阻止访问Google服务的地方使用。当前,我们正在对Snowflake系统进行压力测试,以处理更多的用户,因为我们正走向稳定版本。

  3. 使用meek-azure。 虽然meek-azure理论上应该在任何地方都可以使用(包括在GFW以内),但它已超载运行,所以速度很慢。微软的Azure CDN(meek-azure是基于Azure CDN的)价格比较贵,所以我们必须在meek-azure网桥上设置流量上限。

我们面临的挑战是什么?

科技难度

一个成功的规避系统由两个部分组成:

  • 网络协议(例如WebRTC、obfs4、TLS),以及

  • 要连接到的端点(例如Snowflake代理、meek的CDN服务器、obfs4网桥)。

这两个部分,协议和端点都必须能够抵御检测。GFW能够检测线路上的obfs2和obfs3协议,也就是说,它可以通过简单地观察跨越国家边界的字节来检测这些协议。obfs系列的最新迭代–obfs4,目前在中国在某种意义上仍然可以使用,就是说GFW不能(或者说选择不) 仅仅通过观察线路上的字节来阻断它。不幸的是,obfs4没有被屏蔽是不够的。我们还需要一个未被屏蔽的端点来连接,这就是问题的开始。目前我们使用BridgeDB服务来给用户发布网桥。 在用户解决了验证码之后,BridgeDB会返回最多三个网桥。验证码并不是防止审查员获取大量网桥的唯一防线,因为在深度学习时代,验证码只是一个小小的障碍。在向用户发布网桥的同时防止审查员获取大量网桥仍是一个科技难点,但我们以后再深度聊这个话题。 幸运的是,大多数审查国家所能够投入封锁Tor网络的时间、金钱和人才都是有限的,所以BridgeDB在很多地方依然有效。

资源有限

我们团队的时间和资金是有限的,这意味着我们无法解决我们想要(和应该)解决的所有问题;因此,我们需要仔细考虑如何最好地利用我们有限的时间。比如,我们可以做很多琐细的修复和措施使Tor在例如中国的地域再次使用。但是,审查人员往往能迅速做出反应, 并再次封锁Tor,这使得盲目的修复很不划算。我们的关键是要创作使审查难度比开发难度大的非对称性技术。一个有前途的规避技术是,我们需要n个小时来部署,而审查员需要2^n个小时来阻止。毋庸置疑,在部署之前,我们不可能总清楚哪种技术最有效, 所以我们要通过明确的策划,以有限的资源发挥出最大的影响,如下文所述。

我们的策划是什么?

我们的目标是最大化全球我们帮助规避互联网审查的人数。互联网审查是一个移动的目标,这意味着五年前还很好用的技术,今天却被许多互联网服务供应商封锁了。这就是为什么我们需要不断投资于新的研究和技术,以领先于审查者。

在过去一年多时间里,我们在Snowflake,一个初为科研的项目,上投入了很多资源。在接下来的几个月中,我们计划将Snowflake加入Tor浏览器的稳定版本中。这会是一个重要的里程碑。我们在逐步扩大Snowflake的使用范围, 现在有6千多个志愿代理帮助用户规避审查并提供难以被审查员列举和阻止的移动目标。

BridgeDB,我们现有的网桥发布系统,已经显出了它的年龄。它是针对特定的目的而定制的,所以很难扩展和通用。因此,我们开始开发一个更灵活、更轻量级的下一代实施。 与旧的BridgeDB相比,下一代实施系统将提供以下优点:

  1. 实现一个反馈循环将网桥发布给OONI等审查测评平台,并将从其测评平台收到的可达性信息反馈给网桥分发系统。这意味着,如果一个来自X国的用户请求一个网桥,我们不会给他一个已知在X国被屏蔽的网桥。

  2. 借助bridgestrap定期测试网桥,这样我们就不会把有问题的网桥发布出去,例如obfs4端口有防火墙的网桥。

  3. 包含Salmon网桥发布系统,这将有助于解决端点屏蔽的问题。Salmon源于PETS’17的一篇研究论文。在Salmon系统里,每个用户有个“信誉分数”。当一个用户的网桥被屏蔽时,这个用户的信誉分数会下降, 而如果一个用户的网桥一直都没有被屏蔽,这个用户的信誉分数会上升。如果一个用户的信誉分数太低用户就会被封,而如果一个用户的信誉分数够高,用户就可以邀请其他人加入系统。 这篇net4people帖子对Salmon有很清晰的概述。我们正在构建Salmon网桥发布系统。

我们有一个路线图,每三个月重新审视一次,确定一些短期目标。 请在此查看我们的路线图以了解这些目标。

如何了解更多或参与团队?

我们的资源缺短是我们面对强大的国家级审查机构的最大挑战,所以我们一直欢迎新的贡献者和合作者。如果您想了解更多或者参与我们的工作,请看以下内容:

Tor的规避审查团队每周开一次会,每周四下午16:00UTC,在#tor-meeting IRC频道。参加会议是认识我们或加入团队的最好方式。如果您错过了一次会议也不用担心: 我们会将会议记录发布到tor-project邮件列表。我们大多数人也经常在irc.oftc.net的#tor-dev和#tor-project IRC频道上,可以随时联系我们。 每两周,我们通常会在会议后举行一次阅读小组。 我们用这些阅读小组来讨论研究论文或软件项目。请看一下我们的会议记事本以了解我们下一个阅读小组的内容。欢迎您参入我们的讨论。

我们的代码在anti-censorship GitLab项目中。如果您有兴趣参与, 可以看看那些带有“First Contribution”标签的问题

以下为一些过去的博文和演讲,以供参考:
