Cross-Lingual Network Analysis of Wikipedia

TLDR: A network importance analysis of Wikipedia may reveal differences in perspective across regions.

Individuals have different perspectives on events based on their background and common knowledge of a topic. On an individual basis, one could talk to someone to learn where they stand, but this does not scale easily, particularly if one wishes to understand perspectives across countries or regions.


Wikipedia is a user-created and user-edited online encyclopedia, with versions in over 250 languages. Based on editor surveys, the contributors to a given language edition of Wikipedia tend to be concentrated in particular regions. For example, most of the editors of the Italian Wikipedia are from Italy. It might then be possible to glean information about different perspectives on a topic across countries using Wikipedia.

Although one could do a textual analysis country by country, this would prove difficult for several reasons. Each page would need to be translated and then run through an NLP-based topic analysis, which is likely nontrivial. Further, the text of a given page can be subtly, or not so subtly, altered to skew perspective. This is a particular issue during election seasons, when information may be inserted into or removed from a politician's page. In the example above, Indiana Representative Luke Messer's page was updated to emphasize his lobbying background and previously unsuccessful races. Doing a network analysis instead of a textual analysis may avoid many of these problems.

By using hyperlinks between pages, one can create a network. An importance analysis may then be done to figure out which pages are most relevant to a given topic. This approach addresses several of the problems with a textual analysis. A full page translation is not necessary; only the page title needs to be translated. Interwiki links on a page link to the same page in different languages, so the title mapping can be pulled directly. Also, whereas individual lines of text are easy to manipulate, it is far more difficult to alter the global network structure of Wikipedia links.
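The idea can be sketched in a few lines of Python. The text does not say which importance measure was used, so this sketch assumes a personalized PageRank seeded on the topic page. The link graph, the page titles, and the title mapping are toy placeholders, not data from the analysis.

```python
def personalized_pagerank(links, seed, damping=0.85, iterations=100):
    """Score pages by importance relative to `seed` using power iteration.

    `links` maps a page title to the list of titles it links to.
    Assumption: personalized PageRank stands in for the unspecified
    "importance analysis" described in the text.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)

    # Start with all rank mass on the seed page.
    rank = {p: 1.0 if p == seed else 0.0 for p in pages}
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: send its mass back to the seed.
                new_rank[seed] += damping * rank[page]
        # Teleport step: always jump back to the seed, never to a random page.
        new_rank[seed] += 1.0 - damping
        rank = new_rank
    return rank


# Toy link graph for one language edition (hypothetical titles).
links_de = {
    "Thema": ["Seite A", "Seite B"],
    "Seite A": ["Thema"],
    "Seite B": ["Seite A"],
    "Seite C": ["Thema"],
}

# Title mapping of the kind the interwiki links provide (hypothetical).
to_english = {
    "Thema": "Topic",
    "Seite A": "Page A",
    "Seite B": "Page B",
    "Seite C": "Page C",
}

rank = personalized_pagerank(links_de, seed="Thema")
ranking = [to_english[p] for p in sorted(rank, key=rank.get, reverse=True)]
print(ranking)
```

Only the titles pass through the mapping, so rankings computed separately for each language edition end up in a shared vocabulary without translating any page text.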

Investigations of topics with this approach suggest it can give insight. For the MH17 page copied above, the German Wikipedia emphasizes Russia's role in selling missiles. In contrast, the concept of "Pilot in Command" and regulatory bodies are more important in the Russian version of the page.

The news articles most important and relevant to MH17 in each country, based on Google searches, echo the Wikipedia findings. The top German news articles emphasize Russia's role in providing weapons. The top Russian articles emphasize that a no-fly zone had been set up and that the shot-down plane was the only one to violate it. In other words, they place the fault on the airline for trying to save money with a more direct route and on the pilot in command for violating the airspace.

Analyses like this have been done on an article-by-article basis as well as in aggregate. The aggregate analyses look at the distribution of importance similarity between languages and cluster languages by how similarly they rank topic importance for a given event.
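A pairwise comparison of this kind might look like the sketch below. The similarity measure is not named in the text, so this assumes a Jaccard overlap of each language's top-k pages. The rankings are toy placeholders, not results.

```python
from itertools import combinations


def top_k_overlap(ranking_a, ranking_b, k=3):
    """Jaccard overlap of the top-k titles in two importance rankings.

    Assumption: the text does not specify a similarity measure; this is
    one simple choice among many (rank correlation would be another).
    """
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0


# Hypothetical per-language rankings, already mapped to shared titles.
rankings = {
    "de": ["Topic", "Page A", "Page B", "Page C"],
    "ru": ["Topic", "Page C", "Page D", "Page A"],
    "en": ["Topic", "Page A", "Page C", "Page B"],
}

# Similarity for every pair of languages.
similarity = {
    (a, b): top_k_overlap(rankings[a], rankings[b])
    for a, b in combinations(sorted(rankings), 2)
}

for pair, score in similarity.items():
    print(pair, round(score, 2))
```

The resulting pairwise scores can be summarized as a distribution or passed to a standard clustering routine to group languages for a given event.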

Top German Articles

Top Russian Articles