Connection reset. That is the famous response shown to web users in China when they try to access one of the more than 2,600 websites and searches that the authorities have blocked. But what exactly is blocked, and what causes the connection to be disrupted? Our research suggests that the Great Firewall (GFW) does not filter incoming data at all, only outgoing traffic.

It is often said that the GFW imposes three types of censorship. The first is DNS poisoning. This happens, for example, when users in China attempt to access facebook.com: the name servers return an incorrect IP address that doesn't work, and the browser keeps trying to load the website for a while until the whole exercise 'times out'.

The second type of censorship is keyword-based. For example, if one searches on Google for facebook, freedom, or any of the other 250+ blocked search terms, the connection is immediately reset. In addition, users are blocked from accessing any content on the same website for a minute or so.
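The reset-plus-cooldown behavior described above can be modeled with a toy filter. This is a minimal sketch under stated assumptions, not the GFW's actual implementation: the keyword list, the 60-second penalty, the per-host bookkeeping and the class name ToyKeywordFilter are all hypothetical, chosen only to mirror what users observe.

```python
import time
from urllib.parse import urlsplit

# Hypothetical, illustrative keyword list; NOT the real GFW list.
BLOCKED_KEYWORDS = ("facebook", "freedom")
# Assumed penalty length, approximating "a minute or so".
PENALTY_SECONDS = 60


class ToyKeywordFilter:
    """Toy model: reset on a keyword hit, then block the host for a while."""

    def __init__(self, keywords=BLOCKED_KEYWORDS, penalty=PENALTY_SECONDS):
        self.keywords = tuple(k.lower() for k in keywords)
        self.penalty = penalty
        self.blocked_until = {}  # host -> time at which its block expires

    def check(self, url, now=None):
        """Return 'reset' or 'pass' for an outgoing request to url."""
        now = time.monotonic() if now is None else now
        host = urlsplit(url).hostname or ""
        # Any request to a host still inside its penalty window is reset,
        # whatever it contains.
        if self.blocked_until.get(host, 0) > now:
            return "reset"
        # Plain substring match on the lower-cased outgoing URL.
        if any(k in url.lower() for k in self.keywords):
            self.blocked_until[host] = now + self.penalty
            return "reset"
        return "pass"


if __name__ == "__main__":
    f = ToyKeywordFilter()
    print(f.check("https://www.google.com/search?q=facebook", now=0))  # reset
    print(f.check("https://www.google.com/search?q=weather", now=10))  # reset (penalty window)
    print(f.check("https://www.google.com/search?q=weather", now=61))  # pass
```

Passing an explicit `now` keeps the example deterministic; in a live setting the clock would come from `time.monotonic()`.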

The third type is supposed to be content-based. Regardless of what was searched for or which address was entered, if the content contains certain keywords, it is supposed to be blocked. This is where our findings become relevant: they suggest that the GFW does not interfere at all with the content sent back to the web user in China.

One way to illustrate this is to look at some sample blocked keywords, such as those listed in the table at the top right of this page. The first example is facebook. Facebook.com is blocked in China, and so is the keyword facebook. This means that if you search on Google for facebook, your connection will be reset. The first result on that page is, of course, facebook.com. But if we search for a slightly different term, face book, the connection is not reset. The result is the same, though: we still find facebook.com.
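Why the two searches behave differently becomes clearer if you look at the request that actually leaves the browser. The sketch below is a simplified model, assuming a single hypothetical blocked keyword and a plain substring check; `search_url`, `outbound_blocked` and `deliver_response` are illustrative names, not part of any real system. It builds Google-style search URLs with Python's `urllib.parse.urlencode` and checks only the outgoing URL, leaving the response untouched, in line with our findings.

```python
from urllib.parse import urlencode

# Hypothetical single-entry keyword list, for illustration only.
BLOCKED_KEYWORDS = ("facebook",)


def search_url(query):
    """Build a Google-style search URL; urlencode turns spaces into '+'."""
    return "https://www.google.com/search?" + urlencode({"q": query})


def outbound_blocked(data):
    """Modeled behavior: only client-to-server data is checked."""
    return any(k in data.lower() for k in BLOCKED_KEYWORDS)


def deliver_response(body):
    """Responses are passed back unchanged, even if they contain
    blocked keywords such as a link to facebook.com."""
    return body


if __name__ == "__main__":
    for q in ("facebook", "face book"):
        url = search_url(q)
        print(url, "->", "reset" if outbound_blocked(url) else "pass")
    # https://www.google.com/search?q=facebook -> reset
    # https://www.google.com/search?q=face+book -> pass
```

The outgoing request for face book contains `face+book`, which does not match the keyword, while the page that comes back (with facebook.com at the top) is never inspected in this model.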

Figure: The error message displayed in China when trying to access blocked content.

The second example is the keyword 王. Why this keyword is blocked is something of a mystery: 王, or Wang, is one of the most common surnames in China. One possible explanation is 王府井, or Wangfujing, Beijing's main shopping street and the chosen location for the attempted Jasmine protests earlier this year. Another is that, because the name is so commonplace, Wang is bound to be the surname of many dissidents in China. Regardless, the first Google result when searching for 王 is the Wiktionary page for the character. Can we reach the same result another way? If we enter the more complex search term wiktionary wang "han character", we find the same Wiktionary page, but this time the connection is not reset.

The third example is the keyword freedom, which is also blocked in China. One of the first results is the Wikipedia page for freedom. Again, if we alter the search slightly and look for wikipedia free dom instead, the search is not blocked.

What does this all mean? For one, you could set up mirrors of blocked websites, and as long as they don't contain blocked keywords in their URLs or in any other data sent to the server, they should be able to get past the GFW. Wikileaks is an example. Its main website is blocked in China, but it has a large number of mirror websites, and our system currently contains three of them, none of which is blocked (wikileaks.ch, wikileaks.l0cal.com, wikileaks.delfic.org).

Also, a GFW circumvention tool could focus solely on encrypting outgoing traffic while passing all responses back unmodified.

Do you have any findings suggesting otherwise? Please add your comments below!