
A free press, if you can keep it: What natural language processing reveals about freedom of the press in Hong Kong

Giovanna Maria Dora Dore, Arya D. McCarthy, James A. Scharf
This review was not published, as it was not approved by Computing Reviews' Category Editor. Still, I think it is worth sharing as part of my blog!

The British colony of Hong Kong was returned to Chinese sovereignty in 1997, after 150 years of occupation, under the “one country, two systems” principle. Given that the Chinese government is widely known in the West for its tight grip on its press, this short book sets out to use computational analysis, specifically natural language processing, to assess changes in press freedom over the 25 years of Chinese rule over Hong Kong.

The authors state that a mixed-methods comparison of English-language news coverage of the various Hong Kong protest movements that have surfaced since mainland China reclaimed control, drawing on both local and international sources, should bring to the surface political shifts regarding freedom of the press. From the very beginning of the book, the authors explain that the Chinese authorities, starting even a few years before the handover, have been able to influence local media by co-opting media owners, setting norms of journalistic political correctness, and employing “strategic ambiguity” (that is, deliberately not laying out clear guidelines). The analysis is performed over a corpus of 4522 news articles covering the 1998-2020 period, drawn from two Hong Kong-based English-language newspapers and six Western English-language newspapers (two from the United Kingdom, four from the USA).

The authors choose several Natural Language Processing (NLP) techniques for their analysis, both synchronic (comparing treatment of events occurring at the same time) and diachronic (comparing the nature of coverage over time). The techniques used include topic modeling, comparison of lexical frequency, lexical usage, computational sentiment analysis, embedding neighborhood comparisons, Vicinato plots and Granger causality. The analysis is split into two major divisions, a split reflected in the book’s table of contents, with one of the five chapters devoted to each (and these are, after all, the core chapters): non-rhetorical tactics and rhetorical tactics.
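As a rough illustration of the simplest technique in that list, comparison of lexical frequency, the following is a minimal sketch of how one might measure whether a word takes up a larger share of the text in one group of articles than in another. It is not taken from the book and does not claim to reproduce the authors' method: the function names `relative_frequencies` and `frequency_gap`, the crude tokenizer and the toy sentences are all invented here for illustration.

```python
import re
from collections import Counter


def relative_frequencies(texts):
    """Return each word's share of all tokens found in a list of texts."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = [w for t in texts for w in re.findall(r"[a-z']+", t.lower())]
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def frequency_gap(source_a, source_b, word):
    """Share of `word` in source_a minus its share in source_b."""
    freq_a = relative_frequencies(source_a).get(word, 0.0)
    freq_b = relative_frequencies(source_b).get(word, 0.0)
    return freq_a - freq_b


# Invented toy sentences, NOT quotations from any real newspaper.
local_articles = [
    "Protesters gathered peacefully in Central",
    "Police said the rally was unlawful",
]
intl_articles = [
    "Protesters clashed with police",
    "Protesters demanded democracy",
]

# A negative value means the word is relatively more frequent in the second group.
gap = frequency_gap(local_articles, intl_articles, "protesters")
```

A synchronic comparison would apply this to two sources covering the same event, and a diachronic one to the same source across different years. A real study would also need proper tokenization, stop-word handling and some test of statistical significance.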

The chapter on non-rhetorical tactics mostly presents the news item dataset and the set of major protest events it covers, provides a contextualized understanding of the volume and timing of the articles published, and presents correlations between news coverage of protests and the protests’ size. The chapter on rhetorical tactics deals more with the NLP treatment proper of the information: semantic difference or separation, the diachronic consistency of the discourse in each of the news sources, an analysis of how protests are framed by each of the sources, and stylometric differences between news sources.

I found the book’s framing of Hong Kong history interesting. However, I have to admit I was expecting a book closer to Frazer Heritage’s «Language, gender and videogames: using corpora to analyse the representation of gender in fantasy videogames», for which Computing Reviews published my review in April this year. Although this book is less than half the length of Heritage’s, I found it considerably harder to read. Although the book’s pace is agreeable and the events it presents are interesting, I found it hard to relate to, and even to finish. The book starts with a title that sets high expectations about the results to be found, but it never settles on any specific changes that any of its methods was able to detect. The most plausible explanation I can find for this is that the Hong Kong press in colonial times was not very different: the text notes that the colonial government «had laws in place that gave sweeping power to control and punish news organizations when contents were deemed seditious and anti-government», with «as many as 30 laws that could be used to curb media freedom».

Besides this issue, my feeling is that the authors’ choice to base the study on only two local English-language newspapers constitutes a pre-selection bias. It is understandable that tools used to analyze linguistic data cannot work reliably across linguistic borders, and that the quality of automated translation tools is still too far from what this undertaking would need if Cantonese and Mandarin texts were to be included. However, according to Wikipedia, Cantonese is the native language of 88.2% of the local population, with English at a very distant 4.6%. Wikipedia lists 17 Chinese-language and 7 English-language newspapers (without detailing which Chinese language they are written in), so the choice of only two of them is problematic, and should be better explained.

All in all, although the book presents an interesting account of events in Hong Kong focused on the first 22 years of Chinese rule, and the narrative does follow the results of computer analysis based on NLP tools, I cannot see this book as geared towards Computer Science readers, nor as very telling of the tools or results of the field. I believe the title to be misleading, as there are no conclusions leading the reader to believe that press freedom in Hong Kong has improved or diminished since the sovereignty handover, nor that journalists or civil society are to thank (or to blame) for its current state.