Petr Korab · Text Networks · 6 min read

Text Network Analysis: Theory and Practice

Text network analysis belongs to the broader skill set of most text data-oriented analysts.

Introduction

Developments since the late 20th century, such as…

• availability of data from social networks (e.g. Twitter), transcripts of politicians’ statements and central bankers’ meetings, publicly available APIs to text databases (Google Trends, RSS feeds, Wikipedia, Google Ngrams)

• general development of technologies to process big data

…and potentially many other factors have resulted in a vast amount of text data easily accessible to analysts, students, and researchers. Over time, scientists developed numerous complex methods to understand the relations in the text datasets, including text network analysis.

This first article on text network analysis in Python will briefly survey the underpinnings of text network analysis, real-world applications of text networks, and their implementation in major data science and business intelligence (BI) software.

Text Network Analysis

In the academic literature, networks are more formally referred to as graphs. More rigorous theoretical propositions of graph theory can be traced back to the 1950s (Berge, 1958). Over time, text network literature has evolved into several streams:

• semantic networks: modeling inter-connections of concepts, topics, and keywords (Netzer et al., 2012; Griffiths et al., 2007)

• graph neural networks: combining neural networks with network data structures (Liao et al., 2021; Krenn & Zeilinger, 2020; Yao et al., 2019)

• network visualization methods: proposing new methods of network visualization and graph discovery (Paranyushkin, 2019; Celardo & Everett, 2020)

• software development and algorithm implementation: see the examples in the third section below.

This field has experienced a rapid increase in popularity among academic researchers. It is reflected in the volume of papers with the “network” keyword in the JSTOR database and in the popularity of the terms “text networks” and “semantic networks” in Google Books.

Formally, a network consists of two sets of objects (Ma & Seth, 2022):

• A node set: the “entities” in a graph

• An edge set: the record of “relationships” between the entities in the graph.

For example, if a node set n is comprised of elements:

n = {a, b, c, d, …}

Then, the edge set e would be represented as pairs of elements:

e = {(a, b), (a, c), (c, d), …}
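As a minimal sketch of this definition in plain Python, the node set can be stored as a set of labels and the edge set as a set of label pairs. The labels a to d and the helper name neighbors are made up for illustration; they are not taken from Ma & Seth (2022).

```python
# Illustrative node set: the "entities" in the graph (labels are assumptions).
nodes = {"a", "b", "c", "d"}

# Illustrative edge set: each edge is a pair of nodes. frozenset makes the
# pair unordered, so ("a", "b") and ("b", "a") are the same undirected edge.
edges = {frozenset(pair) for pair in [("a", "b"), ("a", "c"), ("c", "d")]}

# Sanity check: every edge must connect two members of the node set.
assert all(edge <= nodes for edge in edges)


def neighbors(node):
    """Return the nodes that share an edge with `node` (hypothetical helper)."""
    return {other for edge in edges if node in edge for other in edge - {node}}


print(sorted(neighbors("a")))  # ['b', 'c']
```

The same two collections could be handed to a graph library such as NetworkX through Graph.add_nodes_from and Graph.add_edges_from, after converting each pair to a tuple.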

If we draw out a network, nodes are commonly represented as shapes, such as circles, while edges are the lines between the shapes. In text mining, edges and nodes might be represented by:

• Social and professional networks: nodes - individual users, edges - “one user has decided to follow another”

• Identification of fake news in newspaper articles: nodes - most frequent and relevant words associated with fake news newspaper articles, edges - co-occurrence of words in articles (Segev, 2020)

• Discovering public discourse of US senators about impeachment: nodes - senators, edges - similarities in the senators’ public statements

• Understanding policy communication by analyzing written texts: nodes - concepts in written communication, edges - co-occurrence of concepts within a sentence or paragraph (Shim et al., 2015)

• Role of advocacy organizations in shaping conversation on social media: nodes - actors engaged in public conversation about an advocacy issue, edges - similarities in the content of their messages (Bail, 2016).

Generally, a specified workflow used in network approaches starts with a definition of research questions and leads to inference and decision-making. Several steps can be omitted in a less complex task, but a complete workflow involves the steps in Figure 2.

Figure 2. Schematic representation of the workflow used in network approaches. Source: Borsboom et al., 2021. Image by draw.io

An interesting stream in the literature uses text networks for forecasting. Here, graph structures serve as a model whose weights can be optimized by a neural network and used to predict specific variables of interest. Krenn & Zeilinger (2020) use semantic networks and deep learning to predict which research topics will be published in quantum physics over the next five years.
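The approach of Krenn & Zeilinger (2020) relies on a trained neural network, which is beyond a short example. As a much simpler stand-in for the same idea of forecasting future connections, the sketch below scores currently unconnected concept pairs by the Jaccard overlap of their neighborhoods, a classical link-prediction heuristic. The toy concept network and the function names are invented for illustration and do not reproduce the paper's method or data.

```python
from itertools import combinations

# Toy semantic network (invented data): concept -> set of linked concepts.
graph = {
    "qubit": {"entanglement", "decoherence", "error correction"},
    "entanglement": {"qubit", "teleportation"},
    "decoherence": {"qubit", "error correction"},
    "error correction": {"qubit", "decoherence"},
    "teleportation": {"entanglement"},
}


def jaccard(u, v):
    """Shared neighbors of u and v divided by all their neighbors (0 to 1)."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def rank_missing_links():
    """Rank unconnected pairs; a higher score suggests a likelier future link."""
    candidates = [
        (u, v) for u, v in combinations(sorted(graph), 2) if v not in graph[u]
    ]
    return sorted(candidates, key=lambda pair: jaccard(*pair), reverse=True)


for u, v in rank_missing_links():
    print(f"{u} - {v}: {jaccard(u, v):.2f}")
```

In this heuristic, two concepts that already share many neighbors are treated as candidates for a direct connection later on. A learned model replaces the fixed score with weights fitted to historical network snapshots.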

Network Techniques in Major Software

Network analysis methods are implemented in all major data science and BI tools. Here is a list of the most common libraries and commercial programs. Some of them are not primarily designed for text data analysis, but by feeding them correctly transformed data, we can display the network structure of a text:

Python:

• Network construction: NetworkX, Snap, GraphiPy

• Network visualization: Pyvis, Nxviz

• Both network construction and visualization: Textnets for Python, NetworKit, Igraph

Julia:

• GraphPlot, MatrixNetworks, EvolvingGraphs, Networks, EcologicalNetworks

R:

• Textnets for R, visNetwork (this list is not complete as I am not a frequent R user)

As companies need to understand complex network data structures, most BI programs include network methods and graphics. See the tutorial by Data Surfers (2019) on network visualizations for Tableau and a list of network methods in Power BI. For other commercial and open-source software, we might opt for Infranodus (text network analysis), Neo4j (graph data science), Gephi (network visualization and exploration), or SocNetV (social network analysis).
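As one possible reading of “correctly transformed data”, the sketch below turns raw sentences into a weighted edge list in which nodes are words and edge weights count the sentences where two words co-occur. The sample sentences, the stop-word list, and the function name are assumptions made for illustration; a real pipeline would use proper tokenization and cleaning.

```python
import re
from collections import Counter
from itertools import combinations

# Invented sample corpus; each string is treated as one sentence.
sentences = [
    "The central bank raised interest rates.",
    "Interest rates affect the bank lending.",
    "The bank reported strong lending growth.",
]

# Tiny illustrative stop-word list (an assumption, not a standard resource).
STOP_WORDS = {"the", "a", "an", "of", "and"}


def cooccurrence_edges(texts):
    """Count, for each unordered word pair, the sentences containing both."""
    weights = Counter()
    for text in texts:
        words = set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS
        # sorted() gives each pair one canonical order, so (a, b) == (b, a).
        weights.update(combinations(sorted(words), 2))
    return weights


edges = cooccurrence_edges(sentences)
for (u, v), weight in edges.most_common(3):
    print(u, v, weight)
```

The resulting (word, word, weight) triples can be loaded into NetworkX with Graph.add_weighted_edges_from, after which the visualization libraries listed above can draw the network.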

Where do we go from here?

This article is the first part of an upcoming series on analyzing text networks in Python. Stay tuned for the following pieces:

Text Network Analysis From Scratch #1 — Data Prep and Network Construction

Text Network Analysis from Scratch #2 — Make Beautiful Network Visualisations

Text Network Analysis from scratch #3 — Using Text Networks for Forecasting

Petr Korab is a Python Engineer and Founder of Text Mining Stories with over eight years of experience in Business Intelligence and NLP.

References

[1] Bail, C. A. 2016. Combining natural language processing and network analysis to examine how advocacy organizations stimulate conversation on social media. Proceedings of the National Academy of Sciences, vol. 113, no. 42.

[2] Berge, C. 1958. Théorie des graphes et ses applications. Paris: Dunod Editeur.

[3] Borsboom, D., et al. 2021. Network analysis of multivariate data in psychological science. Nature Reviews Methods Primers, vol. 1, no. 58.

[4] Celardo, L., Everett, M. G. 2020. Network text analysis: A two-way classification approach. International Journal of Information Management, vol. 51, April.

[5] Griffiths, T. L., Steyvers, M., Firl, A. 2007. Google and the Mind: Predicting Fluency With PageRank. Psychological Science, vol. 18, no. 12.

[6] Krenn, M., Zeilinger, A. 2020. Predicting research trends with semantic and neural networks with an application in quantum physics. Proceedings of the National Academy of Sciences, vol. 117, no. 4.

[7] Liao, W., Zeng, B., Liu, J., Wei, P., Cheng, X., Zhang, W. 2021. Multi-level graph neural network for text sentiment analysis. Computers & Electrical Engineering, vol. 92, June.

[8] Ma, E., Seth, M. 2022. Network Analysis Made Simple: An Introduction to Network Analysis and Applied Graph Theory using Python and NetworkX. Lean Publishing. 2022-05-16 ed.

[9] Netzer, O., Feldman, R., Goldenberg, J., Fresko, M. 2012. Mine Your Own Business: Market-Structure Surveillance Through Text Mining. Marketing Science, vol. 31, no. 3.

[10] Paranyushkin, D. 2019. InfraNodus: Generating Insight Using Text Network Analysis. In Proceedings of WWW19: The Web Conference, May 13, 2019, San Francisco, USA.

[11] Segev, E. 2020. Textual network analysis: Detecting prevailing themes and biases in international news and social media. Sociology Compass, vol. 14, no. 4.

[12] Shim, J., Park, C., Wilding, M. 2015. Identifying policy frames through semantic network analysis: an examination of nuclear energy policy across six countries. Policy Sciences, vol. 48.

[13] The Data Surfers. 2019. How to use Gephi to create Network Visualizations for Tableau. Retrieved 2022-05-31 from https://thedatasurfers.com/2019/08/27/how-to-use-gephi-to-create-network-visualizations-for-tableau/.

[14] Yao, L., Mao, C., Luo, Y. 2019. Graph Convolutional Networks for Text Classification. In Proceedings of The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), Hawaii, USA.