Identifying influential nodes in complex networks using a gravity model based on the H-index method

Zhu, Siqi; Zhan, Jie; Li, Xing

doi:10.1038/s41598-023-43585-x

Article
Open access
Published: 29 September 2023

Siqi Zhu¹,
Jie Zhan¹ &
Xing Li¹

Scientific Reports volume 13, Article number: 16404 (2023)

Abstract

Identifying influential spreaders in complex networks is a widely discussed topic in the field of network science. Numerous methods have been proposed to rank key nodes in the network, and while gravity-based models often perform well, most existing gravity-based methods either rely on node degree, k-shell values, or a combination of both to differentiate node importance without considering the overall impact of neighboring nodes. Relying solely on a node's individual characteristics to identify influential spreaders has proven to be insufficient. To address this issue, we propose a new gravity centrality method called HVGC, based on the H-index. Our approach considers the impact of neighboring nodes, path information between nodes, and the positional information of nodes within the network. Additionally, it is better able to identify nodes with smaller k-shell values that act as bridges between different parts of the network, making it a more reasonable measure compared to previous gravity centrality methods. We conducted several experiments on 10 real networks and observed that our method outperformed previously proposed methods in evaluating the importance of nodes in complex networks.

Introduction

Complex networks are a pervasive presence in various domains of both human society and the natural world. In each system, individuals and their relationships can be represented as networks consisting of nodes and edges^1,2. Recently, the identification of significant nodes in complex networks has gained significant attention from researchers, providing a new perspective for understanding the objective world and facilitating a better comprehension of the spread of diseases^3,4,5, power grid protection⁶, information dissemination^7,8,9, protein discovery¹⁰, and immunization strategies^11,12, among other fields^13,14,15.

To date, numerous centrality methods have been proposed to detect key nodes in complex networks. Centrality measurement methods can be primarily categorised into three types: local indices, global indices, and hybrid indices¹⁶. Local-index-based centrality methods include classical measures such as degree centrality¹⁷ (DC) and H-index¹⁸. Local-index-based methods have low computational complexity and are suitable for large-scale networks as they only consider the local neighbourhood information of nodes. However, their ability to identify influential nodes that are not central but have high impact is limited. To address this limitation, many researchers have proposed improvements, such as extended H-index centrality¹⁹ (EHC) and local clustering H-index centrality²⁰ (LCH) methods. Global-index-based centrality methods assess individuals' influence by considering the global structural information of the network, such as closeness centrality²¹ (CC) and betweenness centrality²² (BC). The main drawbacks of these measurement methods are their high computational complexity and inapplicability to large-scale networks²³. Among them, the K-shell decomposition method²⁴ (KS), as a global approach, determines the influence of nodes by differentiating their core levels and operates at a faster speed. However, the main limitation of k-shell is that it assigns the same k-shell value to many nodes, resulting in low differentiation in node influence ranking. Many efforts have been made to address this issue, such as extended neighbourhood coreness²⁵ (CNC+), classifying neighbourhood²⁶ (CN), k-shell iteration factor²⁷ (KSIF), and Mixed Degree Decomposition²⁸ (MDD). The primary limitations of these global methods are their typically high computational costs as they consider the entire topological structure of the network. 
Hybrid-index-based centrality methods, such as local and global influence²⁹ (LGI), local and global centrality³⁰ (LGC) and global and local information³¹ (GLI) integrate both local and global information about nodes, aiming to strike a balance between algorithm accuracy and computational complexity.

The gravity model not only considers the attributes held by two nodes but also takes into account the shortest path information between nodes, which represents their mutual interactions and provides a basis for integrating local and global information. Inspired by this formula, Ma et al.³² proposed two models (G and G+) based on the gravity formula. These models adopt the k-shell value of a node as its mass and use the shortest path distance between two nodes as the distance. Building upon this, Wang et al.³³ improved the model by considering the degree values of neighbouring nodes, resulting in the improved gravity centrality (IGC). Li et al.³⁴ introduced the gravity model (GM), which employs the degree of nodes as their mass, and developed the local gravity model (LGM), which only considers node pairs within a truncated radius. Furthermore, Li et al.³⁵ combined the local clustering coefficient and degree value as the mass of nodes, proposing the generalized gravity centrality (GGC). In addition, Yang et al.³⁶ introduced a gravity centrality (KSGC) based on the K-shell value of nodes, considering the variations in interactions when nodes are located in different shell layers. Li et al.³⁷ combined the k-shell value and k-shell iteration factor as the mass of nodes, presenting the DK-based gravity model (DKGM) to enhance the model's performance. Subsequently, they considered multiple features of nodes and proposed the multi-characteristics gravity model³⁸ (MCGM). Liu et al.³⁹ introduced the spreading entropy gravity Model (SEGM), incorporating the spreading information entropy of nodes into consideration.

From the above, we can observe that many of the gravity models mentioned are based on node degree, on the k-shell value, or on a combination of both. However, it is not enough to evaluate the importance of a node solely on the basis of its individual attributes; it is also necessary to consider the location of the node and the overall influence of neighbouring nodes on it. For instance, some nodes may have a relatively small k-shell index but possess significant influence since they act as bridges connecting different communities within the network. Similarly, some nodes have lower degree or k-shell values than others but are closer to the most important nodes in the network and are surrounded by highly influential nodes; as a result, their importance is also enhanced. To address this issue, we propose the H-index-based gravity centrality method (HVGC), which not only considers the path information of nodes but also incorporates the overall influence of neighbouring nodes, the structural hole position information of nodes, and the differential gravitational impact of nodes positioned at different locations. Experimental results demonstrate that our proposed method is highly competitive with other advanced gravity models. Particularly in networks with evident community structures, it exhibits outstanding accuracy, unlike other algorithms that are prone to identifying false core nodes.

Preliminaries

Centrality measures

In the context of an undirected and unweighted simple network $G = \langle V, E \rangle$, $V$ and $E$ respectively represent the sets of nodes and links. The cardinality of $V$ and $E$ can be expressed as $\left| V \right| = N$ and $\left| E \right| = M$, indicating the presence of $N$ nodes and $M$ links within the network. The network's connectivity structure is typically captured by its adjacency matrix $A = (a_{ij} )_{N \times N}$, where $a_{ij} = 1$ if node $i$ and node $j$ are linked, and 0 otherwise.

Degree centrality¹⁷ of node $i$ is defined as

$$ DC(i) \, = \, k(i), $$

(1)

where $k(i) = \sum\limits_{j = 1}^{N} {a_{ij} } .$

The H-index¹⁸ of node $i$, denoted by $H(i)$, is the largest integer $h$ such that node $i$ has at least $h$ neighbors whose degrees are all at least $h$.
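The H-index definition can be illustrated with a minimal Python sketch. The adjacency dictionary `adj` and the helper name `h_index` are assumptions made for this illustration; the five-node graph is a hypothetical example, not the toy network of Fig. 1.

```python
def h_index(adj, node):
    """Largest h such that at least h neighbours of `node` have degree >= h."""
    # Degrees of the neighbours, from largest to smallest.
    neighbour_degrees = sorted((len(adj[nb]) for nb in adj[node]), reverse=True)
    h = 0
    for rank, degree in enumerate(neighbour_degrees, start=1):
        if degree >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical undirected example graph (not taken from the paper).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}
h_values = {v: h_index(adj, v) for v in adj}
print(h_values)
```

In this small graph four of the five nodes share the same H-index, which shows how coarse the measure can be.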

The k-shell decomposition method²⁴ (KS) operates through an iterative process of decomposing the network into distinct shells. Initially, KS removes all nodes with a degree of 1 from the network, which decreases the degree values of the remaining nodes. Nodes whose residual degree is now less than or equal to 1 are removed in turn, and this is repeated until all remaining nodes have residual degrees greater than 1. The nodes removed during this first stage constitute the 1-shell, and their k-shell values are assigned as 1. The same procedure is then applied with thresholds 2, 3, and so on to obtain the 2-shell, 3-shell, and higher shells. The decomposition continues until all nodes in the network have been assigned a k-shell value.
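The peeling procedure described above can be sketched as follows. This is an illustrative implementation under stated assumptions: the function name `k_shell`, the adjacency-dictionary input, and the convention that isolated nodes receive shell value 0 are choices made here, not details given in the paper.

```python
def k_shell(adj):
    """Assign a k-shell value to every node by iterative peeling."""
    degree = {v: len(nbs) for v, nbs in adj.items()}  # residual degrees
    remaining = set(adj)
    shell = {}
    k = 0  # assumption: isolated nodes end up in shell 0
    while remaining:
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1  # nothing left to remove at this level, move to the next shell
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already removed via a duplicate entry
            remaining.discard(v)
            shell[v] = k
            for nb in adj[v]:
                if nb in remaining:
                    degree[nb] -= 1
                    if degree[nb] <= k:
                        peel.append(nb)
    return shell


# Hypothetical undirected example graph (not taken from the paper).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}
shells = k_shell(adj)
print(shells)
```

Only the pendant node `e` falls in the 1-shell here; all other nodes share the value 2, which illustrates the low resolution of k-shell values discussed in the Introduction.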

Gravity centrality³² (G) of node $i$ is defined as

$$ G(i) = \sum\limits_{{j \in \psi_{i} }} {\frac{{k_{s} (i)k_{s} (j)}}{{d^{2} (i,j)}}} , $$

(2)

where $k_{s} (i)$ is the k-shell value of node $i$, $d(i,j)$ is the shortest path distance from node $i$ to node $j$, and $\psi_{i}$ is the set of nodes whose distance from node $i$ does not exceed 3.

Extended gravity centrality³² (G+) of node $i$ is described as

$$ G_{ + } (i) = \sum\limits_{{j \in \Lambda_{i} }} G (j), $$

(3)

where $\Lambda_{i}$ is the nearest neighborhood of node $i$.

The improved gravity centrality³³ (IGC) of node $i$ is measured by

$$ IGC(i) = \sum\limits_{d(i,j) \le R,j \ne i} {\frac{{k_{s} (i)k(j)}}{{d^{2} (i,j)}}} , $$

(4)

where $R$ is the truncation radius, and the optimal truncation radius $R^{*}$ can be estimated by

$$ R^{*} \approx \frac{1}{2}\langle d\rangle , $$

(5)

where $\langle d\rangle$ is the average distance of the network.

Extended improved gravity centrality³³ (IGC+) of node $i$ is described as

$$ IGC_{ + } (i) = \sum\limits_{{j \in \Lambda_{i} }} {IGC} (j), $$

(6)

where $\Lambda_{i}$ is the nearest neighborhood of node $i$.

The local gravity model³⁴ (LGM) of node $i$ is determined by

$$ LGM(i) = \sum\limits_{d(i,j) \le R,j \ne i} {\frac{k(i)k(j)}{{d^{2} (i,j)}}} , $$

(7)
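As an illustration of how the truncation radius $R$ enters a gravity-type sum, the following sketch evaluates Eq. (7) with a breadth-first search that stops at distance $R$. The helper names `distances_within` and `local_gravity` are assumptions for this example, and the three-node path graph is hypothetical.

```python
from collections import deque


def distances_within(adj, source, radius):
    """Shortest-path distances from `source` to all nodes at distance 1..radius."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the truncation radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[source]  # the sum in Eq. (7) excludes j = i
    return dist


def local_gravity(adj, radius):
    """LGM(i) = sum over j with d(i, j) <= R of k(i) * k(j) / d(i, j)^2."""
    degree = {v: len(adj[v]) for v in adj}
    return {
        i: sum(degree[i] * degree[j] / d ** 2
               for j, d in distances_within(adj, i, radius).items())
        for i in adj
    }


# Hypothetical path graph a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(local_gravity(path, radius=1))
print(local_gravity(path, radius=2))
```

The other truncated gravity measures in this section differ only in the mass assigned to each node, so the same breadth-first search can be reused with a different mass function.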

The generalized gravity centrality³⁵ (GGC) of node $i$ is defined as

$$ GGC(i) = \sum\limits_{d(i,j) \le R,j \ne i} {\frac{{S_{p} (i)S_{p} (j)}}{{d^{2} (i,j)}}} , $$

(8)

$$ S_{p} (i) = {\text{e}}^{{ - \alpha C_{i} }} \times k(i) $$

(9)

$$ C_{i} = \frac{{2n_{i} }}{k(i)(k(i) - 1)} $$

(10)

where $C_{i}$ is the local clustering coefficient of node $i$, $n_{i}$ denotes the number of edges between neighbors of node $i$, and $\alpha = 2$.
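The node mass used by GGC in Eqs. (9) and (10) can be sketched as below. The function names are hypothetical, and returning a clustering coefficient of 0 for nodes with fewer than two neighbours is an assumption made here, since Eq. (10) is undefined in that case.

```python
import math


def clustering_coefficient(adj, node):
    """C_i = 2 * n_i / (k(i) * (k(i) - 1)), with n_i the links among neighbours."""
    nbs = list(adj[node])
    k = len(nbs)
    if k < 2:
        return 0.0  # assumption: Eq. (10) is undefined for k(i) < 2
    links = sum(1 for idx, u in enumerate(nbs) for v in nbs[idx + 1:] if v in adj[u])
    return 2 * links / (k * (k - 1))


def ggc_mass(adj, node, alpha=2.0):
    """S_p(i) = exp(-alpha * C_i) * k(i), with alpha = 2 as stated in the text."""
    return math.exp(-alpha * clustering_coefficient(adj, node)) * len(adj[node])


# Hypothetical undirected example graph (not taken from the paper).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}
print({v: round(ggc_mass(adj, v), 4) for v in adj})
```

The exponential factor discounts the degree of nodes whose neighbours are densely interconnected.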

The k-shell based on gravity centrality³⁶ (KSGC) is defined as

$$ KSGC(i) = \sum\limits_{d(i,j) \le R,j \ne i} {F(i,j)} , $$

(11)

$$ F(i,j) = c_{ij} \frac{k(i)k(j)}{{d^{2} (i,j)}}, $$

(12)

$$ c_{ij} = e^{{\frac{k_{s} (i) - k_{s} (j)}{{k_{s\max } - k_{s\min } }}}} $$

(13)

where $c_{ij}$ is the coefficient of attraction exerted by node $i$ on node $j$, and $k_{s} (i)$ and $k_{s} (j)$ denote the k-shell values of node $i$ and node $j$, respectively. $k_{s\max }$ and $k_{s\min }$ refer to the largest and smallest k-shell values present in the network, and $d(i,j)$ is the shortest path distance from node $i$ to node $j$.

The DK-based gravity model³⁷ (DKGM) is measured by

$$ DKGM(i) = \sum\limits_{d(i,j) \le R,j \ne i} {\frac{DK(i)DK(j)}{{d^{2} (i,j)}}} , $$

(14)

$$ DK(i) = k(i) + k_{s}^{*} (i), $$

(15)

$$ k_{s}^{*} (i) = k_{s} (i) + \frac{p(i)}{{q(k) + 1}}, $$

(16)

where $k_{s} (i)$ is the k-shell value of node $i$, $q(k)$ is the total number of iterations in the k-degree iteration process, and $p(i)$ is the iteration of that process in which node $i$ is removed. $k_{s}^{*} (i)$ is called the improved k-shell index of node $i$.

The multi-characteristics gravity model³⁸ (MCGM) is measured by

$$ {\text{MCGM}} (i) = \sum\limits_{d(i,j) \le R,j \ne i} {\frac{{\left( {\frac{k(i)}{{k_{\max } }} + \frac{{\alpha k_{s} (i)}}{{k_{s\max } }} + \frac{x(i)}{{x_{\max } }}} \right)\left( {\frac{k(j)}{{k_{\max } }} + \frac{{\alpha k_{s} (j)}}{{k_{s\max } }} + \frac{x(j)}{{x_{\max } }}} \right)}}{{d^{2} (i,j)}}} , $$

(17)

$$ \alpha = \frac{{\max \{ \frac{{k_{{{\text{mi}} d}} }}{{k_{\max } }},\frac{{x_{{{\text{mi}} d}} }}{{x_{\max } }}\} }}{{\frac{{k_{{s{\text{mi}} d}} }}{{k_{s\max } }}}}, $$

(18)

where $k_{mid}$, $k_{smid}$ and $x_{mid}$ denote the median of degree value, k-shell value and eigenvector centrality value, respectively. $k_{\max }$, $k_{s\max }$ and $x_{\max }$ denote the maximum values of degree value, k-shell value, and eigenvector centrality value.

The entropy-based gravity model³⁹ (SEGM) is defined as

$$ SEGM(i) = \sum\limits_{d(i,j) \le R,j \ne i} {\frac{SE(i)SE(j)}{{d^{2} (i,j)}}} , $$

(19)

$$ SE(i) = e^{E(i)} k(i), $$

(20)

$$ E(i) = - \sum\limits_{j \in \Gamma (i)} {I(j)\ln I(j)} , $$

(21)

$$ I(i) = \frac{k(i)}{{\sum\limits_{j \in \Gamma (i)} {k(j)} }}, $$

(22)

where $E(i)$ is the information entropy of node $i$, $\Gamma (i)$ represents the set of neighboring nodes of node $i$, and $I(i)$ is the importance of node $i$.

The SIR model used in this paper

To obtain a reference ranking of node influence against which the rankings produced by the algorithms can be compared, we employed the widely used SIR model⁴⁰. In the beginning, a single node in the network, referred to as the "source node," is in the infected state (I), while the remaining nodes are in the susceptible state (S). An infected node has the potential to infect its susceptible neighbors with a probability of $\beta$, and the probability of each infected node entering the recovery (R) state is $\lambda$, after which it ceases to participate in the dynamics. This propagation process continues until no infected nodes remain in the network. The impact of any given node $i$ can be estimated by

$$ F(i) = N_{r} /N $$

(23)

where $N_{r}$ is the number of recovered nodes after the diffusion process has stabilized. For the sake of simplicity, $\lambda$ has been set to 1. The corresponding epidemic threshold⁴¹ can then be computed by

$$ \beta_{c} \approx \frac{\langle k\rangle }{{\langle k^{2} \rangle - \langle k\rangle }} $$

(24)

where $\langle k\rangle$ and $\langle k^{2} \rangle$ are the average degree and the second-order moment of the degree distribution, respectively.
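The simulation and the threshold of Eq. (24) can be sketched in a few lines of Python. This is a minimal illustration with $\lambda = 1$, not the authors' code; the function names, the synchronous update scheme, and the fixed random seed are assumptions made here.

```python
import random


def epidemic_threshold(adj):
    """beta_c ~ <k> / (<k^2> - <k>), as in Eq. (24)."""
    degrees = [len(nbs) for nbs in adj.values()]
    k1 = sum(degrees) / len(degrees)
    k2 = sum(d * d for d in degrees) / len(degrees)
    return k1 / (k2 - k1)


def sir_spread(adj, source, beta, rng):
    """One SIR run with recovery probability 1; returns F = N_r / N."""
    susceptible = set(adj) - {source}
    infected = {source}
    recovered = set()
    while infected:
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                if v in susceptible and rng.random() < beta:
                    newly_infected.add(v)
        susceptible -= newly_infected
        recovered |= infected  # lambda = 1: every infected node recovers after one step
        infected = newly_infected
    return len(recovered) / len(adj)


def spreading_capability(adj, source, beta, runs=1000, seed=0):
    """Average of F over independent runs started from `source`."""
    rng = random.Random(seed)
    return sum(sir_spread(adj, source, beta, rng) for _ in range(runs)) / runs


# Hypothetical undirected example graph (not taken from the paper).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}
beta_c = epidemic_threshold(adj)
print(beta_c, spreading_capability(adj, "a", beta_c, runs=200))
```

Averaging over many runs, as done with 1000 runs in the experiments, reduces the variance of the estimated spreading capability of each source node.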

Measures

Kendall’s tau coefficient

Kendall's tau coefficient⁴² is a measure of correlation between two sequences, with a larger value indicating a greater similarity between the sequences. It is defined as follows. Given two sequences $X$ and $Y$ of the same length, let $x_{i}$ and $y_{i}$ denote their $i$-th values, and let each pair of corresponding elements form an ordered pair $(x_{i} ,y_{i} )$. If $x_{i} > x_{j}$ and $y_{i} > y_{j}$, or $x_{i} < x_{j}$ and $y_{i} < y_{j}$, the pairs $(x_{i} ,y_{i} )$ and $(x_{j} ,y_{j} )$ are considered concordant. They are considered discordant if $x_{i} > x_{j}$ and $y_{i} < y_{j}$, or $x_{i} < x_{j}$ and $y_{i} > y_{j}$. If $x_{i} = x_{j}$ or $y_{i} = y_{j}$, the pair is neither concordant nor discordant. Kendall's tau coefficient $\tau$ is then defined as

$$ \tau = \frac{{2(n_{ + } - n_{ - } )}}{N(N - 1)} $$

(25)

where $n_{ + }$ is the number of concordant pairs, $n_{ - }$ is the number of discordant pairs, and $N$ is the length of the two sequences.
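Equation (25) translates directly into a short function. The sketch below is illustrative and uses a plain quadratic loop over all pairs; the name `kendall_tau` is an assumption, and tied pairs are counted as neither concordant nor discordant.

```python
from itertools import combinations


def kendall_tau(x, y):
    """tau = 2 * (n_plus - n_minus) / (N * (N - 1)) for equal-length sequences."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in x or y: neither concordant nor discordant
    return 2 * (concordant - discordant) / (n * (n - 1))


print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```

In the experiments, `x` would hold the centrality scores of the nodes and `y` their simulated spreading capabilities, listed in the same node order.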

Jaccard similarity coefficient

In some applications, concentrating on the top-rank nodes rather than all nodes may be appropriate. In contrast to the Kendall correlation coefficient, the Jaccard similarity coefficient is utilized to assess the similarity between the top-k nodes in two ranking lists^25,43. The Jaccard similarity is calculated by dividing the number of common nodes by the number of unique nodes in the two lists, and its expression is

$$ Jaccard(X,Y) = \frac{|X \cap Y|}{{|X \cup Y|}} $$

(26)

where $X$ and $Y$ represent the top-k nodes with the highest influence as determined by two different methods. In the context of our experiments, $X$ represents the top-k nodes identified by HVGC and other baseline methods, while $Y$ represents the top-k nodes obtained through the SIR simulation. We use the Jaccard similarity coefficient to measure the similarity between these two sets of top-k nodes. The Jaccard similarity coefficient ranges from 0 to 1, where a higher value indicates a greater degree of similarity between the two ranking results. A Jaccard similarity coefficient of 0 indicates completely distinct results, while a value of 1 indicates that the two sets of top-k nodes are identical.
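A minimal sketch of the top-k comparison in Eq. (26) follows. The helper names and the score dictionaries are hypothetical; ties in the scores are broken by dictionary order here, which is an assumption rather than something the paper specifies.

```python
def top_k(scores, k):
    """Set of the k nodes with the highest scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def jaccard(x, y):
    """|X intersect Y| / |X union Y|; two empty sets are treated as identical."""
    x, y = set(x), set(y)
    union = x | y
    return len(x & y) / len(union) if union else 1.0


# Hypothetical scores from a centrality method and from an SIR simulation.
centrality_scores = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 2.0}
sir_scores = {"a": 0.1, "b": 0.9, "c": 0.8, "d": 0.2}
similarity = jaccard(top_k(centrality_scores, 2), top_k(sir_scores, 2))
print(similarity)
```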

The monotonicity index

The monotonicity²⁵ $M$ is used to quantitatively measure the resolution of different indices in ranking list $X$, and can be calculated by

$$ M(X) = \left[ {1 - \frac{{\sum\limits_{c \in V} {N_{c} } \left( {N_{c} - 1} \right)}}{N(N - 1)}} \right]^{2} $$

(27)

where $N$ is the size of network, and $N_{c}$ is the number of nodes with the same index value $c$.
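The monotonicity of Eq. (27) only depends on how many nodes share each score, so it can be computed from a frequency count. The function name and the example score dictionaries below are assumptions for illustration.

```python
from collections import Counter


def monotonicity(scores):
    """M = [1 - sum_c N_c * (N_c - 1) / (N * (N - 1))]^2 over score values c."""
    n = len(scores)
    counts = Counter(scores.values())  # N_c for every distinct score value c
    tied = sum(c * (c - 1) for c in counts.values())
    return (1 - tied / (n * (n - 1))) ** 2


print(monotonicity({"a": 1, "b": 2, "c": 3, "d": 4}))  # all scores distinct
print(monotonicity({"a": 1, "b": 1, "c": 2, "d": 3}))  # one tied pair
print(monotonicity({"a": 7, "b": 7, "c": 7, "d": 7}))  # all scores identical
```

A value of 1 means every node receives a distinct score, while 0 means all nodes are indistinguishable.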

Results

Algorithms

Previous research has utilized the gravity model approach to analyze node importance in complex networks. Degree and k-shell values are commonly used metrics to consider the number of neighbors a node has and its position within the network, respectively. However, these metrics alone do not capture the overall influence of a node's neighbors. While the H-index considers the importance of a node's neighbors, it may overlook certain information from neighboring nodes, failing to account for the collective impact of all neighbors. We take the toy network shown in Fig. 1 to illustrate the problem for the H-index, where the node spreading capacity derived from 1000 independent runs of the SIR model has been numerically labeled in Fig. 1. Obviously, $H(1) = H(2) = H(3) = H(4) = H(10) = 1$, $H(5) = H(7) = H(8) = 3$, and $H(6) = H(9) = 2$, where $H(i)$ represents the H-index of node $i$. The H-index thus assigns the same value to many different nodes, which limits its ability to differentiate the influence of nodes.

The same issue exists in DC¹⁷ and KS²⁴. Additionally, from Fig. 1, it can be observed that Node 3 has a higher propagation capability than Node 9, but a lower H-index. This indicates that the H-index overlooks some information from the neighbors of a node. We therefore select all neighbors of node $i$ whose degree is greater than or equal to $H(i)$ and sum the degree values of these nodes to measure the overall influence of the neighboring nodes on node $i$. The value obtained is denoted as $HV(i)$, and the expression is

$$ HV(i) = \sum\limits_{{j \in \Lambda_{i} }} {\left[ {k(j)|k(j) \ge H(i)} \right]} , $$

(28)

where $\Lambda_{i}$ is the nearest neighborhood of node $i$, and $H(i)$ represents the H-index of node $i$.
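To make Eq. (28) concrete, the following sketch computes $HV(i)$ on a small hypothetical graph (not the toy network of Fig. 1). The helper names `h_index` and `hv` are assumptions made for this illustration.

```python
def h_index(adj, node):
    """Largest h such that at least h neighbours of `node` have degree >= h."""
    neighbour_degrees = sorted((len(adj[nb]) for nb in adj[node]), reverse=True)
    h = 0
    for rank, degree in enumerate(neighbour_degrees, start=1):
        if degree >= rank:
            h = rank
        else:
            break
    return h


def hv(adj, node):
    """HV(i): sum of the degrees of neighbours whose degree is at least H(i)."""
    h = h_index(adj, node)
    return sum(len(adj[nb]) for nb in adj[node] if len(adj[nb]) >= h)


# Hypothetical undirected example graph (not taken from the paper).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}
hv_values = {v: hv(adj, v) for v in adj}
print(hv_values)
```

Nodes `a`, `b`, `c`, and `d` all have H-index 2 in this graph, yet their HV values split into two groups, which is the added resolution the definition aims for.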

Incorporating the overall influence of a node's neighbors into the definition enhances the discriminative power of node identification compared to the H-index. However, it is still insufficient to accurately distinguish cluster-like nodes: because of their close connections, these nodes can more easily achieve greater HV values, but their actual influence may not be greater than that of nodes with lower HV values. As shown in Fig. 1, $HV(6) = 8$, $HV(9) = 7$, and $HV(3) = 4$, whereas the actual propagation capacity from high to low is nodes 3, 9, and 6. A similar problem with the k-shell approach was noted by Liu et al.⁴⁴ In other words, removing node 3 from the network would result in nodes 1, 2, and 4 losing their interactions with the core nodes, while removing node 6 has a minimal impact on information transmission in the network. This finding demonstrates the higher importance of nodes that serve as bridges between different clusters compared to those within individual clusters.

Based on this, we considered the structural hole position of nodes to enhance the algorithm's ability to identify nodes within community networks. This allows us to identify those bridge nodes that may not have high HV values but play a crucial role in facilitating information flow across different parts of the network. The network constraint coefficient measures the level of constraints imposed on nodes forming a structural hole (SH) in a network⁴⁵, and it can be calculated as follows:

$$ c(i) = \sum\limits_{j \in \Gamma (i)} {\left( {p_{ij} + \sum\limits_{w \in \Gamma (j) \cap \Gamma (i)} {p_{iw} } \cdot p_{wj} } \right)^{2} } , $$

(29)

$$ p_{ij} = \frac{{z_{ij} }}{{\sum\limits_{w \in \Gamma (i)} {z_{iw} } }}, $$

(30)

where $\Gamma (i)$ represents the set of neighboring nodes of node $i$, and $w \in \Gamma (i) \cap \Gamma (j)$ indicates the nodes that are common neighbors of both node $i$ and node $j$. $p_{ij}$ represents the proportion of energy invested by node $i$ to maintain its relationship with node $j$, where $z_{ij} = 1 \, (i \ne j)$ if there is a link between nodes $i$ and $j$, and $z_{ij} = 0$ otherwise. Based on the above discussions, the gravity centrality based on the H-index (HVGC) measure proposed in this paper is defined as follows:

$$ HVGC(i) = \sum\limits_{d(i,j) \le R,j \ne i} {e^{ - c(i)} \frac{HV(i)HV(j)}{{d^{2} (i,j)}}} , $$

(31)

where $c(i)$ represents the structural hole constraint coefficient in Eq. (29). A smaller value of $c(i)$ indicates that the node occupies more structural holes and has a stronger ability to bridge different parts of the network. Finally, the metrics, including HVGC, H-index, HV, DC, and KS, were computed for each node in the toy network and compared with the node's spreading capability (SC). The results are presented in Table 1, revealing that HVGC achieves a nearly identical ranking to SC, indicating excellent performance. The algorithmic description of the HVGC is provided in Algorithm 1.
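Putting Eqs. (28) to (31) together, the computation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the function names are hypothetical, the graph is an unweighted adjacency dictionary of sets (so $p_{ij} = 1/k(i)$), and the path graph used as input is a made-up example.

```python
import math
from collections import deque


def h_index(adj, node):
    """Largest h such that at least h neighbours of `node` have degree >= h."""
    degs = sorted((len(adj[nb]) for nb in adj[node]), reverse=True)
    h = 0
    for rank, degree in enumerate(degs, start=1):
        if degree < rank:
            break
        h = rank
    return h


def hv(adj, node):
    """Eq. (28): sum of degrees of neighbours with degree >= H(node)."""
    h = h_index(adj, node)
    return sum(len(adj[nb]) for nb in adj[node] if len(adj[nb]) >= h)


def constraint(adj, node):
    """Eqs. (29)-(30) for an unweighted graph, where p_ij = 1 / k(i)."""
    total = 0.0
    for j in adj[node]:
        indirect = sum((1 / len(adj[node])) * (1 / len(adj[w]))
                       for w in adj[node] & adj[j])  # common neighbours of i and j
        total += (1 / len(adj[node]) + indirect) ** 2
    return total


def distances_within(adj, source, radius):
    """Breadth-first distances to nodes at distance 1..radius from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[source]
    return dist


def hvgc(adj, radius):
    """Eq. (31): exp(-c(i)) * sum of HV(i) * HV(j) / d(i, j)^2 within radius R."""
    hv_value = {v: hv(adj, v) for v in adj}
    scores = {}
    for i in adj:
        pull = sum(hv_value[i] * hv_value[j] / d ** 2
                   for j, d in distances_within(adj, i, radius).items())
        scores[i] = math.exp(-constraint(adj, i)) * pull
    return scores


# Hypothetical path graph a - b - c; node b bridges the two end nodes.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(hvgc(path, radius=1))
```

In this example the bridging node `b` has the smallest constraint coefficient and therefore the largest score, in line with the role the factor $e^{-c(i)}$ is meant to play.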

Table 1 The ranking results of SIR, DC, KS, H-index, HV, and HVGC on the toy network.

In addition, Fig. 2 depicts a network with a clear community structure, where the four nodes with the strongest propagation capabilities are marked in green. The propagation capabilities of these nodes were determined through 1000 independent experiments using the SIR model. We compared HVGC with other gravity model-based methods in identifying the top 5 nodes in this network, and the results are presented in Table 2.

Table 2 Comparison of the rankings of the top-5 nodes identified by different methods and the rankings based on the SIR propagation ability in the sample network.

Data description

This paper evaluates the efficacy of HVGC by analyzing ten real networks from six distinct domains, including a transportation network (USAir⁴⁶), an infrastructure network (Power⁴⁷), a communication network (Email⁴⁸), a technology network (Router⁴⁹), two collaborative networks (Jazz⁵⁰ and NS⁵¹), and four social networks (Facebook⁵², PB⁵³, WV⁵⁴, and Sex⁵⁵). Table 3 presents the fundamental topological properties of these networks. $N$ represents the number of nodes in the network, and $M$ represents the number of links. The average degree of nodes is denoted as $\langle k\rangle$, and the average distance between pairs of nodes is denoted as $\langle d\rangle$. The clustering coefficient⁴⁷ of the network is denoted by $C$, while $r$ represents the assortative coefficient⁵⁶. The degree heterogeneity⁵⁷ of the network is denoted by $H$. Additionally, $\beta_{c}$ represents the epidemic threshold⁵⁸ of the SIR model⁴⁰ used to simulate the diffusion process.

Table 3 The topological features of ten real networks.

Empirical results

Based on the aforementioned real networks, we conducted simulations and compared the influence rankings of various algorithms utilizing the SIR model. In order to ensure the credibility of our findings and the standard ranking of nodes' influence, we conducted 1000 independent experiments for each given network and transmission probability $\beta$, with each node being chosen as the seed node once during each run. The calculations were performed on an i7-12700H processor with Python 3. The development platform used for this paper is Anaconda 3, and the code was executed in Jupyter Notebook. Kendall's tau ($\tau$) was utilized to evaluate the accuracy of the algorithms, with a higher value indicating a greater correlation between the observed sequences and an improved algorithm performance. Table 4 provides a comparison of the accuracy of the proposed algorithm (HVGC) and ten benchmark algorithms, which include degree centrality¹⁷ (DC), k-shell decomposition method²⁴ (KS), the extended version of gravity centrality³² (G+), extended version of improved gravity centrality³³ (IGC+), local gravity model³⁴ (LGM), generalized gravity centrality³⁵ (GGC), the improved gravitational centrality based on k-shell values³⁶ (KSGC), the DK-based gravity model³⁷ (DKGM), multi-characteristics gravity model³⁸ (MCGM), and entropy-based gravity model³⁹ (SEGM). Additionally, Fig. 3 displays the accuracy of the different algorithms for varying values of $\beta$, within the range of $0.5\beta_{c}$ to $1.5\beta_{c}$.

Table 4 The algorithms’ accuracies for $\beta = \beta_{c}$, measured by the Kendall’s Tau (τ).

According to Table 4, the methods that utilise the gravitational formula (G+, IGC+, LGM, GGC, KSGC, DKGM, MCGM, SEGM, and HVGC) exhibit significant advantages over classical methods (DC and KS). These advantages are especially prominent in the Power, Router, NS, and Sex networks. Furthermore, it is noteworthy that among all gravity-based algorithms tested on the ten networks, HVGC exhibited the best overall performance. Its Kendall coefficient ranked first in six out of ten networks and within the top two in seven of them (70%). Specifically, HVGC ranked first in the Jazz, Email, Facebook, PB, WV, and USAir networks and second in the Router network. Additionally, as shown in Fig. 3, although HVGC did not perform best in the NS, Power, and Sex networks when $\beta = \beta_{c}$, its performance becomes very close to or even surpasses that of the previous best-performing algorithm as $\beta$ increases. Taking into account HVGC's superior performance in community-type networks discussed earlier, it demonstrates a stronger overall performance, affirming the robustness of our findings. Furthermore, Fig. 4 displays the optimal truncation radius of HVGC in the ten real networks, revealing that the majority of networks concentrate their optimal truncation radius at $R = 1$. This indicates that HVGC achieves remarkably high accuracy by considering only the influence of the first-order neighbouring nodes of a node, while most other gravity model methods require considering information from second- or third-order neighbouring nodes. In other words, HVGC achieves a high level of accuracy while incurring lower time costs.

Discussion

This paper introduces a novel method called HVGC for identifying influential nodes in a network. While the original gravity model considered both neighbourhood and path information, this new method enhances the existing gravity centrality approaches by taking into account the overall influence of a node's neighbourhood, considering the structural hole position of nodes, and incorporating the differences in interactions between nodes. This method addresses the limitations of existing gravity centrality methods and strengthens the ability to identify important nodes in networks with clear community structures. Therefore, this approach demonstrates a high level of comprehensive performance. We conducted an analysis of the SIR dynamic propagation process in 10 real networks to compare the performance of HVGC with previous state-of-the-art methods. The results, as shown in Table 4, indicate the strong competitiveness of our method.

In certain scenarios, it is necessary to identify the top-k influential nodes for controlling information propagation. Therefore, in addition to evaluating the different ranking methods for individual nodes, we also assessed their performance in identifying the top-k influential spreaders. In other words, we compared the ranked lists of node influence obtained from the ranking methods with the ranked lists of node influence obtained from the SIR simulation, both sorted in descending order. Subsequently, we analysed the similarity between the two lists by considering the top-k nodes. Figure 5 illustrates the results of the Jaccard coefficient for identifying the top-k influential spreaders, ranging from 5 to 100 with a step size of 5. The X -axis shows the number of top influential spreaders, and the Y -axis shows the Jaccard similarity coefficients.

We can observe that, except for the Sex, Power, PB, and Router networks, HVGC exhibits the best and most stable overall performance in identifying the top-k influential spreaders in other networks. Specifically, across all networks, as the number of selected top-k nodes increases, HVGC consistently maintains a high-level or steadily increasing Jaccard coefficient, while other methods display varying degrees of fluctuations. Furthermore, we provide detailed plots for the top-25 nodes, revealing that HVGC consistently ranks among the top three in identifying the top-25 influential spreaders and, in some cases, even secures the first position, except for the Sex network. Therefore, we can conclude that HVGC not only accurately ranks the influence of all nodes in the network but also successfully identifies the top-k nodes with the highest impact.

After applying monotonicity²⁵, we assessed the resolution of various algorithms. Table 5 illustrates that HVGC and MCGM demonstrate similar performance in terms of monotonicity. However, HVGC excels in the majority of networks by solely considering the first-order neighbour information of nodes, whereas MCGM, even with the inclusion of second-order neighbour information, does not necessarily outperform HVGC and incurs higher computational complexity. Furthermore, HVGC demonstrates significantly better performance in identifying important nodes in networks with community structure compared to MCGM. Therefore, overall, HVGC surpasses other gravity model algorithms. Based on the results presented in Table 5, HVGC consistently ranks either at the top or very close to the best-performing algorithm in terms of monotonicity.

Table 5 Monotonicity of the various algorithms, with the best algorithm for each network highlighted in bold.

Based on the above discussion, it is evident that centrality based on the gravitational model is more accurate than classical centrality. However, many of these models tend to identify false core nodes in the network and do not take into account the influence of neighbouring nodes. In our proposed HVGC (H-index-based Gravity Centrality), we address this limitation by comprehensively considering the overall impact of a node's neighbours and its position within the network's structural holes. This approach effectively overcomes the drawbacks of gravity-based methods and demonstrates superior performance compared to other algorithms.

Despite the excellent performance exhibited by HVGC, it shares a common limitation with other gravity-based methods, namely the need to determine the optimal truncation radius $R$. However, this disadvantage is mitigated by the fact that most real networks exhibit small-world characteristics^47,59, and the optimal truncation radius is approximately linearly related to the average distance³⁴. Furthermore, since HVGC is derived from the domain centrality method, even considering only the first-order neighbor nodes in the ten real networks studied can lead to very high performance and accurate results.

In conclusion, HVGC demonstrates better overall performance than other gravity-based methods and improves on existing gravity models, but some aspects still require refinement. First, the current approach does not weight the different indicators; it operates directly on the indicator values of the nodes. The relative weights of HV and the structural hole constraint coefficient $c(i)$ may affect the accuracy of the algorithm: in networks with clear community structure, a higher weight for $c(i)$ may lead to better performance, while in other types of networks a lower weight may yield better results. Future work may therefore introduce adjustable parameters to balance the weights of the different indicators and to modify the interplay of gravitational forces among nodes. Second, these algorithms have not been evaluated on weighted networks, where the impact of the path from node $i$ to node $j$ may differ from that of the path from node $j$ to node $i$, and where link heterogeneity⁶⁰ may result in varying node impact.

Data availability

All relevant data are available at https://github.com/MLIF/Network-Data.

References

1. Strogatz, S. H. Exploring complex networks. Nature 410, 268–276 (2001).
2. Newman, M. E. J. The structure and function of complex networks. SIAM Rev. 45, 167–256 (2003).
3. Barabási, A.-L., Gulbahce, N. & Loscalzo, J. Network medicine: A network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011).
4. Zhu, P., Zhi, Q., Guo, Y. & Wang, Z. Analysis of epidemic spreading process in adaptive networks. IEEE Trans. Circuits Syst. II Express Briefs 66, 1252–1256 (2019).
5. Yao, S., Fan, N. & Hu, J. Modeling the spread of infectious diseases through influence maximization. Optim. Lett. 16, 1563–1586 (2022).
6. Albert, R., Albert, I. & Nakarado, G. L. Structural vulnerability of the North American power grid. Phys. Rev. E 69, 025103 (2004).
7. Hosni, A. I. E., Li, K. & Ahmad, S. Minimizing rumor influence in multiplex online social networks based on human individual and social behaviors. Inf. Sci. 512, 1458–1480 (2020).
8. Ahmed, W., Vidal-Alaball, J., Downing, J. & Seguí, F. L. COVID-19 and the 5G conspiracy theory: Social network analysis of Twitter data. J. Med. Internet Res. 22, e19458 (2020).
9. Xu, W. et al. Identifying structural hole spanners to maximally block information propagation. Inf. Sci. 505, 100–126 (2019).
10. Csermely, P., Korcsmáros, T., Kiss, H. J. M., London, G. & Nussinov, R. Structure and dynamics of molecular networks: A novel paradigm of drug discovery: A comprehensive review. Pharmacol. Ther. 138, 333–408 (2013).
11. Liu, Y., Wang, X. & Kurths, J. Framework of evolutionary algorithm for investigation of influential nodes in complex networks. IEEE Trans. Evol. Comput. 23, 1049–1063 (2019).
12. Morone, F. & Makse, H. A. Influence maximization in complex networks through optimal percolation. Nature 524, 65–68 (2015).
13. Sui, L. et al. The fractal description model of rock fracture networks characterization. Chaos Solitons Fractals 129, 71–76 (2019).
14. Huang, Y., Dong, H., Zhang, W. & Lu, J. Stability analysis of nonlinear oscillator networks based on the mechanism of cascading failures. Chaos Solitons Fractals 128, 5–15 (2019).
15. Zhao, J. & Deng, Y. Complex network modeling of evidence theory. IEEE Trans. Fuzzy Syst. 29, 3470–3480 (2021).
16. Namtirtha, A., Dutta, A. & Dutta, B. Weighted kshell degree neighborhood: A new method for identifying the influential spreaders from a variety of complex network connectivity structures. Expert Syst. Appl. 139, 112859 (2020).
17. Bonacich, P. Factoring and weighting approaches to status scores and clique identification. J. Math. Sociol. 2, 113–120 (1972).
18. Lü, L., Zhou, T., Zhang, Q.-M. & Stanley, H. E. The H-index of a network node and its relation to degree and coreness. Nat. Commun. 7, 10168 (2016).
19. Zareie, A. & Sheikhahmadi, A. EHC: Extended H-index Centrality measure for identification of users’ spreading influence in complex networks. Phys. A Stat. Mech. Appl. 514, 141–155 (2019).
20. Xu, G.-Q., Meng, L., Tu, D.-Q. & Yang, P.-L. LCH: A local clustering H-index centrality measure for identifying and ranking influential nodes in complex networks. Chin. Phys. B 30, 088901 (2021).
21. Freeman, L. C. Centrality in social networks conceptual clarification. Soc. Netw. 1, 215–239 (1979).
22. Freeman, L. C. A set of measures of centrality based on betweenness. Sociometry 40, 35–41 (1977).
23. Lü, C. et al. Identifying influential nodes in complex networks (2012).
24. Kitsak, M. et al. Identification of influential spreaders in complex networks. Nat. Phys. 6, 888–893 (2010).
25. Bae, J. & Kim, S. Identifying and ranking influential spreaders in complex networks by neighborhood coreness. Phys. A Stat. Mech. Appl. 395, 549–559 (2014).
26. Li, C., Wang, L., Sun, S. & Xia, C. Identification of influential spreaders based on classified neighbors in real-world complex networks. Appl. Math. Comput. 320, 512–523 (2018).
27. Wang, Z., Zhao, Y., Xi, J. & Du, C. Fast ranking influential nodes in complex networks using a k-shell iteration factor. Phys. A Stat. Mech. Appl. 461, 171–181 (2016).
28. Zeng, A. & Zhang, C.-J. Ranking spreaders by decomposing complex networks. Phys. Lett. A 377, 1031–1035 (2013).
29. Qiu, L., Zhang, J. & Tian, X. Ranking influential nodes in complex networks based on local and global structures. Appl. Intell. 51, 4394–4407 (2021).
30. Ullah, A. et al. Identifying vital nodes from local and global perspectives in complex networks. Expert Syst. Appl. 186, 115778 (2021).
31. Yang, Y.-Z., Hu, M. & Huang, T.-Y. Influential nodes identification in complex networks based on global and local information. Chin. Phys. B 29, 088903 (2020).
32. Ma, L., Ma, C., Zhang, H.-F. & Wang, B.-H. Identifying influential spreaders in complex networks based on gravity formula. Phys. A Stat. Mech. Appl. 451, 205–212 (2016).
33. Wang, J., Li, C. & Xia, C. Improved centrality indicators to characterize the nodal spreading capability in complex networks. Appl. Math. Comput. 334, 388–400 (2018).
34. Li, Z. et al. Identifying influential spreaders by gravity model. Sci. Rep. 9, 8387 (2019).
35. Li, H., Shang, Q. & Deng, Y. A generalized gravity model for influential spreaders identification in complex networks. Chaos Solitons Fractals 143, 110456 (2021).
36. Yang, X. & Xiao, F. An improved gravity model to identify influential nodes in complex networks based on k-shell method. Knowl. Based Syst. 227, 107198 (2021).
37. Li, Z. & Huang, X. Identifying influential spreaders in complex networks by an improved gravity model. Sci. Rep. 11, 22194 (2021).
38. Li, Z. & Huang, X. Identifying influential spreaders by gravity model considering multi-characteristics of nodes. Sci. Rep. 12, 9879 (2022).
39. Liu, Y., Cheng, Z., Li, X. & Wang, Z. An entropy-based gravity model for influential spreaders identification in complex networks. Complexity 2023, e6985650 (2023).
40. Hethcote, H. W. The mathematics of infectious diseases. SIAM Rev. 42, 599–653 (2000).
41. Castellano, C. et al. Thresholds for epidemic spreading in networks. Phys. Rev. Lett. 105, 218701. https://doi.org/10.1103/PhysRevLett.105.218701 (2010).
42. Kendall, M. G. A new measure of rank correlation. Biometrika 30, 81–93 (1938).
43. Zareie, A., Sheikhahmadi, A., Jalili, M. & Fasaei, M. S. K. Finding influential nodes in social networks based on neighborhood correlation coefficient. Knowl. Based Syst. 194, 105580 (2020).
44. Liu, Y., Tang, M., Zhou, T. & Do, Y. Core-like groups result in invalidation of identifying super-spreader by k-shell decomposition. Sci. Rep. 5, 9602 (2015).
45. Lu, M. Node importance evaluation based on neighborhood structure hole and improved TOPSIS. Comput. Netw. 178, 107336 (2020).
46. Pajek Datasets. http://vlado.fmf.uni-lj.si/pub/networks/data/.
47. Watts, D. J. & Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998).
48. Guimerà, R., Danon, L., Díaz-Guilera, A., Giralt, F. & Arenas, A. Self-similar community structure in a network of human interactions. Phys. Rev. E 68, 065103 (2003).
49. Spring, N., Mahajan, R., Wetherall, D. & Anderson, T. Measuring ISP topologies with Rocketfuel. IEEE/ACM Trans. Netw. 12, 2–16 (2004).
50. Gleiser, P. M. & Danon, L. Community structure in jazz. Adv. Complex Syst. 6, 565–573 (2003).
51. Newman, M. E. J. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74, 036104 (2006).
52. Leskovec, J. & McAuley, J. Learning to discover social circles in ego networks. In Advances in Neural Information Processing Systems. Vol. 25 (Curran Associates, Inc., 2012).
53. Adamic, L. A. & Glance, N. The political blogosphere and the 2004 U.S. election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery. 36–43 (Association for Computing Machinery, 2005). https://doi.org/10.1145/1134271.1134277.
54. Leskovec, J., Huttenlocher, D. & Kleinberg, J. Predicting positive and negative links in online social networks. In Proceedings of the 19th International Conference on World Wide Web. 641–650 (Association for Computing Machinery, 2010). https://doi.org/10.1145/1772690.1772756.
55. Rocha, L. E. C., Liljeros, F. & Holme, P. Simulated epidemics in an empirical spatiotemporal network of 50,185 sexual contacts. PLOS Comput. Biol. 7, e1001109 (2011).
56. Newman, M. E. J. Assortative mixing in networks. Phys. Rev. Lett. 89, 208701 (2002).
57. Hu, H.-B. & Wang, X.-F. Unified index to quantifying heterogeneity of complex networks. Phys. A Stat. Mech. Appl. 387, 3769–3780 (2008).
58. Castellano, C. & Pastor-Satorras, R. Thresholds for epidemic spreading in networks. Phys. Rev. Lett. 105, 218701 (2010).
59. Amaral, L. A. N., Scala, A., Barthélémy, M. & Stanley, H. E. Classes of small-world networks. Proc. Natl. Acad. Sci. 97, 11149–11152 (2000).
60. Bellingeri, M., Bevacqua, D., Scotognella, F. & Cassi, D. The heterogeneity in link weights may decrease the robustness of real-world complex weighted networks. Sci. Rep. 9, 10692 (2019).

Author information

Authors and Affiliations

Physical and Electronic Sciences College, Hunan University of Science and Technology of China, Xiangtan, 411100, People’s Republic of China
Siqi Zhu, Jie Zhan & Xing Li

Contributions

S.Z. devised the research project. S.Z. performed the research. S.Z. and J.Z. analyzed the data. S.Z. and X.L. wrote the paper.

Corresponding authors

Correspondence to Siqi Zhu or Jie Zhan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zhu, S., Zhan, J. & Li, X. Identifying influential nodes in complex networks using a gravity model based on the H-index method. Sci Rep 13, 16404 (2023). https://doi.org/10.1038/s41598-023-43585-x

Received: 14 March 2023
Accepted: 26 September 2023
Published: 29 September 2023
DOI: https://doi.org/10.1038/s41598-023-43585-x
