Introduction

Complex networks are a pervasive presence in various domains of both human society and the natural world. In each system, individuals and their relationships can be represented as networks consisting of nodes and edges1,2. Recently, the identification of significant nodes in complex networks has gained significant attention from researchers, providing a new perspective for understanding the objective world and facilitating a better comprehension of the spread of diseases3,4,5, power grid protection6, information dissemination7,8,9, protein discovery10, and immunization strategies11,12, among other fields13,14,15.

To date, numerous centrality methods have been proposed to detect key nodes in complex networks. Centrality measurement methods can be primarily categorised into three types: local indices, global indices, and hybrid indices16. Local-index-based centrality methods include classical measures such as degree centrality17 (DC) and H-index18. Local-index-based methods have low computational complexity and are suitable for large-scale networks as they only consider the local neighbourhood information of nodes. However, their ability to identify influential nodes that are not central but have high impact is limited. To address this limitation, many researchers have proposed improvements, such as extended H-index centrality19 (EHC) and local clustering H-index centrality20 (LCH) methods. Global-index-based centrality methods assess individuals' influence by considering the global structural information of the network, such as closeness centrality21 (CC) and betweenness centrality22 (BC). The main drawbacks of these measurement methods are their high computational complexity and inapplicability to large-scale networks23. Among them, the K-shell decomposition method24 (KS), as a global approach, determines the influence of nodes by differentiating their core levels and operates at a faster speed. However, the main limitation of k-shell is that it assigns the same k-shell value to many nodes, resulting in low differentiation in node influence ranking. Many efforts have been made to address this issue, such as extended neighbourhood coreness25 (CNC+), classifying neighbourhood26 (CN), k-shell iteration factor27 (KSIF), and Mixed Degree Decomposition28 (MDD). The primary limitations of these global methods are their typically high computational costs as they consider the entire topological structure of the network. Hybrid-index-based centrality methods, such as local and global influence29 (LGI), local and global centrality30 (LGC) and global and local information31 (GLI) integrate both local and global information about nodes, aiming to strike a balance between algorithm accuracy and computational complexity.

The gravity model not only considers the attributes held by two nodes but also takes into account the shortest path information between nodes, which represents their mutual interactions and provides a basis for integrating local and global information. Inspired by this formula, Ma et al.32 proposed two models (G and G+) based on the gravity formula. These models adopt the k-shell value of a node as its mass and use the shortest path distance between two nodes as the distance. Building upon this, Wang et al.33 improved the model by considering the degree values of neighbouring nodes, resulting in the improved gravity centrality (IGC). Li et al.34 introduced the gravity model (GM), which employs the degree of nodes as their mass, and developed the local gravity model (LGM), which only considers node pairs within a truncated radius. Furthermore, Li et al.35 combined the local clustering coefficient and degree value as the mass of nodes, proposing the generalized gravity centrality (GGC). In addition, Yang et al.36 introduced a gravity centrality (KSGC) based on the K-shell value of nodes, considering the variations in interactions when nodes are located in different shell layers. Li et al.37 combined the k-shell value and k-shell iteration factor as the mass of nodes, presenting the DK-based gravity model (DKGM) to enhance the model's performance. Subsequently, they considered multiple features of nodes and proposed the multi-characteristics gravity model38 (MCGM). Liu et al.39 introduced the spreading entropy gravity Model (SEGM), incorporating the spreading information entropy of nodes into consideration.

From the above, we can observe that many of the gravity models mentioned are either based on node degree, related to the k-shell value, or a combination of both. However, It is not enough to evaluate the importance of a node solely on the basis of its single attributes; it is also necessary to consider the location of the node and the overall influence of neighbouring nodes on it. For instance, some nodes may have a relatively small k-shell index but possess significant influence since they act as bridges connecting different communities within the network. Similarly, there are nodes with lower degree or k-shell values compared to others but are closer to the most important nodes in the network, surrounded by highly influential nodes, as a result, their importance will also be enhanced. To address this issue, we propose the H-index-based gravity centrality method (HVGC), which not only considers the path information of nodes but also incorporates the overall influence of neighbouring nodes, structural hole position information of nodes, and the differential gravitational impact of nodes positioned at different locations. Experimental results demonstrate that our proposed method exhibits significant competitiveness compared to other advanced gravity models, Particularly in networks with evident community structures, it exhibits outstanding accuracy, unlike other algorithms that are prone to identifying false core nodes.

Preliminaries

Centrality measures

In the context of an undirected and unweighted simple network \(G \, = < V, \, E >\),\(V\) and \(E\) respectively represent the sets of nodes and links. The cardinality of \(V\) and \(E\) can be expressed as \(\left| V \right| = N\) and \(\left| E \right| = M\), indicating the presence of \(N\) nodes and \(M\) links within the network. The network's connectivity structure is typically captured by its adjacency matrix \(A = (a_{ij} )_{N \times N}\), where \(a_{ij} = 1\) if node \(i\) and node \(j\) are linked, and 0 otherwise.

Degree centrality17 of node \(i\) is defined as

$$ DC(i) \, = \, k(i), $$
(1)

where \(k(i) = \sum\limits_{j = 1}^{N} {a_{ij} } .\)

The maximum integer fulfilling that there are at least \(H(i)\) neighbors of node \(i\) whose degrees are all at least \(H(i)\), represented by \(H(i)\), is known as the H-index18 of the node \(i\).

The k-shell decomposition method24(KS), operates through an iterative process of decomposing the network into distinct shells. Initially, KS removes nodes with a degree of 1 from the network, resulting in a decrease in the degree values of the remaining nodes. This process is repeated by removing nodes with residual degrees less than or equal to 1 until all remaining nodes have residual degrees greater than 1. The nodes removed in the first step constitute the 1-shell, and their k-shell values are assigned as 1. This process is then iteratively applied to obtain the 2-shell, 3-shell, and so on. The decomposition process continues until all nodes in the network have been accounted for.

Gravity centrality32 (G) of node \(i\) is defined as

$$ G(i) = \sum\limits_{{j \in \psi_{i} }} {\frac{{k_{s} (i)k_{s} (j)}}{{d^{2} (i,j)}}} , $$
(2)

where \(k_{s} (i)\) is the k-shell value of node \(i\), \(d(i,j)\) is the shortest path distance from node \(i\) to node \(j\), and \(\psi_{i}\) is the set of nodes whose distance from node \(i\) does not exceed 3.

Extended gravity centrality32 (G+) of node \(i\) is described as

$$ G_{ + } (i) = \sum\limits_{{j \in \Lambda_{i} }} G (j), $$
(3)

\(\Lambda_{i}\) is the nearest neighborhood of node \(i\).

The improved gravity centrality33 (IGC) of node \(i\) is measured by

$$ IGC(i) = \sum\limits_{d(i,j) \le R,j \ne i} {\frac{{k_{s} (i)k(j)}}{{d^{2} (i,j)}}} , $$
(4)

where \(R\) is the truncation radius, and the optimal truncation radius \(R^{*}\) can be estimated by

$$ R^{*} \approx \frac{1}{2}\langle d\rangle , $$
(5)

where \(\langle d\rangle\) is the average distance of the network.

Extended improved gravity centrality33 (IGG+) of node \(i\) is described as

$$ IGC_{ + } (i) = \sum\limits_{{j \in \Lambda_{i} }} {IGC} (j), $$
(6)

\(\Lambda_{i}\) is the nearest neighborhood of node \(i\).

The local gravity model34 (LGM) of node \(i\) is determined by

$$ LGM(i) = \sum\limits_{d(i,j) \le R,j \ne i} {\frac{k(i)k(j)}{{d^{2} (i,j)}}} , $$
(7)

The generalized gravity centrality35 (GGC) of node \(i\) is defined as

$$ GGC(i) = \sum\limits_{d(i,j) \le R,j \ne i} {\frac{{S_{p} (i)S_{p} (j)}}{{d^{2} (i,j)}}} , $$
(8)
$$ S_{p} (i) = {\text{e}}^{{ - \alpha C_{i} }} \times k(i) $$
(9)
$$ C_{i} = \frac{{2n_{i} }}{k(i)(k(i) - 1)} $$
(10)

where \(C_{i}\) is the local clustering coefficient of node \(i\), \(n_{i}\) denotes the number of edges between neighbors of node \(i\), and \(\alpha = 2\).

The k-shell based on gravity centrality36 (KSGC) is defined as

$$ KSGC(i) = \sum\limits_{d(i,j) \le R,j \ne i} {F(i,j)} , $$
(11)
$$ F(i,j) = c_{ij} \frac{k(i)k(j)}{{d^{2} (i,j)}}, $$
(12)
$$ c_{ij} = e^{{\frac{ks(i) - ks(j)}{{ks_{\max } - ks_{\min } }}}} $$
(13)

where \(c_{ij}\) is the coefficient of attraction exerted by node \(i\) on node \(j\), \(k_{s} (i)\) and \(k_{s} (j)\) denote the k-shell values of node \(i\) and node \(j\), respectively. \(ks_{\max }\) and \(ks_{\min }\) refer to the largest and smallest k-shell values present in the network. \(d(i,j)\) is the shortest path distance from node \(i\) to node \(j\).

The DK-based gravity model37 (DKGM) is measured by

$$ DKGM(i) = \sum\limits_{d(i,j) \le R,j \ne i} {\frac{DK(i)DK(j)}{{d^{2} (i,j)}}} , $$
(14)
$$ DK(i) = k(i) + k_{s}^{*} (i), $$
(15)
$$ k_{s}^{*} (i) = k_{s} (i) + \frac{p(i)}{{q(k) + 1}}, $$
(16)

assume that the value of the k-shell of node \(i\) is \(k_{s} (i).\) For the process of the k-degree iteration, the total iteration number is \(q(k)\), and node \(i\) is removed in the \(p(i)\) iteration of the k-degree process. \(k_{s}^{*} (i)\) is called the improved k-shell index of node \(i\).

The multi-characteristics gravity model38 (MCGM) is measured by

$$ {\text{MCGM}} (i) = \sum\limits_{d(i,j) \le R,j \ne i} {\frac{{\left( {\frac{k(i)}{{k_{\max } }} + \frac{{\alpha k_{s} (i)}}{{k_{s\max } }} + \frac{x(i)}{{x_{\max } }}} \right)\left( {\frac{k(j)}{{k_{\max } }} + \frac{{\alpha k_{s} (j)}}{{k_{s\max } }} + \frac{x(j)}{{x_{\max } }}} \right)}}{{d^{2} (i,j)}}} , $$
(17)
$$ \alpha = \frac{{\max \{ \frac{{k_{{{\text{mi}} d}} }}{{k_{\max } }},\frac{{x_{{{\text{mi}} d}} }}{{x_{\max } }}\} }}{{\frac{{k_{{s{\text{mi}} d}} }}{{k_{s\max } }}}}, $$
(18)

where \(k_{mid}\), \(k_{smid}\) and \(x_{mid}\) denote the median of degree value, k-shell value and eigenvector centrality value, respectively. \(k_{\max }\), \(k_{s\max }\) and \(x_{\max }\) denote the maximum values of degree value, k-shell value, and eigenvector centrality value.

The entropy-based gravity model39 (SEGM) is defined as

$$ SEGM(i) = \sum\limits_{d(i,j) \le R,j \ne i} {\frac{SE(i)SE(j)}{{d^{2} (i,j)}}} , $$
(19)
$$ SE(i) = e^{E(i)} k(i), $$
(20)
$$ E(i) = - \sum\limits_{j \in \Gamma (i)} {I(j)\ln I(j)} , $$
(21)
$$ I(i) = \frac{k(i)}{{\sum\limits_{j \in \Gamma (i)} {k(j)} }}, $$
(22)

where \(E(i)\) is the information entropy of node \(i\), \(\Gamma (i)\) represents the set of neighboring nodes of node \(i\),and \(I(i)\) is the importance of node \(i\).

The SIR model used in this paper

To evaluate the ranking of impact generated by the algorithm and the simulation, we employed the widely used SIR model40. In the beginning, a single node in the network, referred to as the "source node," is in the infected state (I), while the remaining nodes are in the susceptible state (S). An infected node has the potential to infect its susceptible neighbors with a probability of \(\beta\), and the probability of each infected node entering the recovery (R) state is \(\lambda\), after which it ceases to participate in the dynamics. This propagation process continues until no infected nodes remain in the network. The impact of any given node \(i\) can be estimated by

$$ F(i) = N_{r} /N $$
(23)

the number of nodes that recover after the diffusion process has stabilized is represented by \(N_{r}\). For the sake of simplicity,\(\lambda\) has been set to 1. Subsequently, the corresponding epidemic threshold41 can be computed by

$$ \beta_{c} \approx \frac{\langle k\rangle }{{\langle k^{2} \rangle - \langle k\rangle }} $$
(24)

where \(\langle k\rangle\) and \(\langle k^{2} \rangle\) are the degree distribution's average degree and second-order moments.

Measures

Kendall’s tau coefficient

Kendall's tau coefficient42 is a measure of correlation between two sequences, with a larger value indicating a greater similarity between the sequences. The definition of Kendall's tau coefficient is as follows: given two sequences \(X\) and \(Y\) of the same length, where the \(i\) th values are represented by \(x_{i}\) and \(y_{i}\), respectively. Let each pair of elements \(x_{i}\) and \(y_{i}\) form a set, denoted by \((x_{i} ,y_{i} )\). If \(x_{i} > x_{j}\) and \(y_{i} > y_{j}\), or \(x_{i} < x_{j}\) and \(y_{i} < y_{j}\), the pairs \((x_{i} ,y_{i} )\) and \((x_{j} ,y_{j} )\) are considered concordant. They are considered discordant if \(x_{i} > x_{j}\) and \(y_{i} < y_{j}\), or \(x_{i} < x_{j}\) and \(y_{i} > y_{j}\). If \(x_{i} = x_{j}\) and \(y_{i} = y_{j}\), the pair is neither concordant nor discordant. Therefore, the Kendall's tau coefficient τ is defined as

$$ \tau = \frac{{2(n_{ + } - n_{ - } )}}{N(N - 1)} $$
(25)

where \(n_{ + }\) is the number of concordant pairs, and \(n_{ - }\) is the number of discordant pairs.

Jaccard similarity coefficient

In some applications, concentrating on the top-rank nodes rather than all nodes may be appropriate. In contrast to the Kendall correlation coefficient, the Jaccard similarity coefficient is utilized to assess the similarity between the top-k nodes in two ranking lists25,43. The Jaccard similarity is calculated by dividing the number of common nodes by the number of unique nodes in the two lists, and its expression is

$$ Jaccard(X,Y) = \frac{|X \cap Y|}{{|X \cup Y|}} $$
(26)

where \(X\) and \(Y\) represent the top-k nodes with the highest influence as determined by two different methods. In the context of our experiments, \(X\) represents the top-k nodes identified by HvGC and other baseline methods, while \(Y\) represents the top-k nodes obtained through the SIR simulation. We use the Jaccard similarity coefficient to measure the similarity between these two sets of top-k nodes. The Jaccard similarity coefficient ranges from 0 to 1, where a higher value indicates a greater degree of similarity between the two ranking results. A Jaccard similarity coefficient of 0 indicates completely distinct results, while a value of 1 indicates that the two sets of top-k nodes are identical.

The monotonicity index

The monotonicity25 \(M\) is used to quantitatively measure the resolution of different indices in ranking list \(X\), and can be calculated by

$$ M(X) = \left[ {1 - \frac{{\sum\limits_{c \in V} {N_{c} } \left( {N_{c} - 1} \right)}}{N(N - 1)}} \right]^{2} $$
(27)

where \(N\) is the size of network, and \(N_{c}\) is the number of nodes with the same index value \(c\).

Results

Algorithms

Previous research has utilized the gravity model approach to analyze node importance in complex networks. Degree and k-shell values are commonly used metrics to consider the number of neighbors a node has and its position within the network, respectively. However, these metrics alone do not capture the overall influence of a node's neighbors. While the H-index considers the importance of a node's neighbors, it may overlook certain information from neighboring nodes, failing to account for the collective impact of all neighbors. We take the toy network shown in Fig. 1 to illustrate the problem for H-index, where the node spreading capacity derived from 1000 independent runs of the SIR model has been numerically labeled in Fig. 1. Obviously, \(H(1) = H(2) = H(3) = H(4) = H(10) = 1\), \(H(5) = H(7) = H(8) = 3\),\(H(6) = H(9) = 2\), where \(H(i)\) represents the H-index of node \(i\). The H-index always assigns the same value to different nodes, which leads to a lack of excellence in the ability to differentiate the influence of nodes.

Figure 1
figure 1

A toy network. The red node is ranked first in terms of H-index, while green and yellow represent second and third, respectively.

The same issue exists in DC17 and KS24. Additionally, from Fig. 1, it can be observed that Node 3 has a higher propagation capability compared to Node 9, but Node 3 has a lower H-index than Node 9. This indicates that the H-index overlooks some information from the neighbors of a node. From this, we take out all neighboring nodes in the set of neighbors of node \(i\) with degree values greater than or equal to \(H(i)\) and add up the degree values of these nodes to measure the overall influence of the neighboring nodes on node \(i\). The value obtained is denoted as \(HV(i)\), and the expression is

$$ HV(i) = \sum\limits_{{j \in \Lambda_{i} }} {\left[ {k(j)|k(j) \ge H(i)} \right]} , $$
(28)

where \(\Lambda_{i}\) is the nearest neighborhood of node \(i\),\(H(i)\) represents the H-index of node \(i\).

By incorporating the overall influence of node neighbors into the definition, it enhances the discriminative power of node identification compared to the H-index. However, it is still insufficient to accurately distinguish cluster-like nodes, due to their close connections, these nodes can more easily achieve greater HV values, but, their actual influence may not be greater than that of nodes with lower HV values, As shown in Fig. 1. \(HV(6) = 8\),\(HV(9) = 7\),\(HV(3) = 4\), and the actual propagation capacity from high to low is nodes 3, 9, and 6, a similar problem with the k-shell approach was noted by Liu et al.44 In other words, removing node 3 from the network would result in nodes 1, 2, and 4 losing their interactions with the core nodes, while removing node 6 has a minimal impact on information transmission in the network. This finding demonstrates the higher importance of nodes that serve as bridges between different clusters compared to those within individual clusters.

Based on this, we considered the structural hole position of nodes to enhance the algorithm's ability to identify nodes within community networks. This allows us to identify those bridge nodes that may not have high HV values but play a crucial role in facilitating information flow across different parts of the network. The network constraint coefficient measures the level of constraints imposed on nodes forming a structural hole (SH) in a network45, and it can be calculated as follows:

$$ c(i) = \sum\limits_{j \in \Gamma (i)} {\left( {p_{ij} + \sum\limits_{w \in \Gamma (j) \cap \Gamma (i)} {p_{iw} } \cdot p_{wj} } \right)^{2} } , $$
(29)
$$ p_{ij} = \frac{{z_{ij} }}{{\sum\limits_{w \in \Gamma (i)} {z_{iw} } }}, $$
(30)

where \(\Gamma (i)\) represents the set of neighboring nodes of node \(i\), and \(w \in \Gamma (i) \cap \Gamma (j)\) indicates the nodes that are common neighbors of both node \(i\) and node \(j\). \(p_{ij}\) represents the proportion of energy invested by node \(i\) to maintain its relationship with node \(j\). where \(z_{ij} = 1 \, (i \ne j)\) if there is a link between nodes \(i\) and \(j\), otherwise \(z_{ij} = 0\). Based on the above discussions, the gravity centrality based on the H-index (HVGC) measure proposed in this paper is defined as follows:

$$ HVGC(i) = \sum\limits_{d(i,j) \le R,j \ne i} {e^{ - c(i)} \frac{Hv(i)Hv(j)}{{d^{2} (i,j)}}} , $$
(31)

where \(c(i)\) represents the structural hole constraint coefficient in Eq. (29). A smaller value of \(c(i)\) indicates that the node occupies more structural holes and has a stronger ability to bridge different parts of the network. Finally, the metrics, including HVGC, H-index, HV, DC, and KS, were computed for each node in the toy network and compared with the node's spreading capability (SC). The results are presented in Table 1, revealing that HVGC achieves a nearly identical ranking to SC, indicating excellent performance. The algorithmic description of the HVGC is provided in Algorithm 1.

Table 1 The ranking results of SIR, DC, KS, H-index, HV, and HVGC on the toy network.

In addition, Fig. 2 depicts a network with a clear community structure, where the four nodes with the strongest propagation capabilities are marked in green. The propagation capabilities of these nodes were determined through 1000 independent experiments using the SIR model. We compared HVGC with other gravity model-based methods in identifying the top 5 nodes in this network, and the results are presented in Table 2.

figure a
Figure 2
figure 2

A sample network with an obvious community structure, where the four nodes with the strongest propagation capabilities are marked in green.

Table 2 Comparison of the rankings of the top-5 nodes identified by different methods and the rankings based on the SIR propagation ability in the sample network.

Data description

This paper evaluates the efficacy of HVGC by analyzing ten real networks from six distinct domains, including a transportation network(USAir46), an infrastructure network (Power47), a communication network (Email48), a technology network (Router49), two collaborative networks (Jazz50and NS51), and four social networks (Facebook52, PB53, WV54, and Sex55). Table 3 presents the fundamental topological properties of these networks. \(N\) represents the number of nodes in the network, and \(M\) represents the number of links. The average degree of nodes is denoted as \(\langle k\rangle\), and the average distance between pairs of nodes is denoted as \(\langle d\rangle\). The clustering coefficient47 of the network is denoted by \(C\), while \(r\) represents the assortative coefficient56. The degree heterogeneity57 of the network is denoted by \(H\). Additionally, \(\beta_{c}\) represents the epidemic threshold58 of the SIR model40 used to simulate the diffusion process.

Table 3 The topological features of ten real networks.

Empirical results

Based on the aforementioned real network, we conducted simulations and compared the influence rankings of various algorithms utilizing the SIR model. In order to ensure the credibility of our findings and the standard ranking of nodes' influence, we conducted 1000 independent experiments for each given network and transmission probability \(\beta\), with any one node being chosen as the seed node once during each run. The processor and runtime environment used for the calculations are i7-12700H and Python 3. The development platform used for this paper is Anaconda 3, and the code was executed in Jupyter Notebook. Kendall's tau (\(\tau\)) was utilized to evaluate the accuracy of the algorithms, with a higher value indicating a greater correlation between the observed sequences and an improved algorithm performance. Table 4 provides a comparison of the accuracy of the proposed algorithm (HVGC) and ten benchmark algorithms, which include degree centrality17 (DC), k-shell decomposition method24 (KS), the extended version of gravity centrality32 (G+), extended version of improved gravity centrality33 (IGC+), local gravity model34 (LGM), generalized gravity centrality35 (GGC), the improved gravitational centrality based on k-shell values36 (KSGC), the DK-based gravity model37 (DKGM), multi-characteristics gravity model38 (MCGM), and entropy-based gravity model39 (SEGM).Additionally, Fig. 3 displays the accuracy of the different algorithms for varying values of \(\beta\), within the range of \(0.5\beta_{c}\) to \(1.5\beta_{c}\).

Table 4 The algorithms’ accuracies for \(\beta = \beta_{c}\), measured by the Kendall’s Tau (τ).
Figure 3
figure 3figure 3

Kendall's Tau was utilized to measure the accuracy of the algorithms at various \(\beta\) values. The different colour symbols represent different methods, and the red symbol represents HVGC algorithms.

According to Table 4, the methods that utilise the gravitational formula (G+, IGC+, LGM, GGC, KSGC, DKGM, MCGM, SEGM, and HVGC) exhibit significant advantages over classical methods (DC and KS). These advantages are especially prominent in the Power, Router, NS, and Sex networks. Furthermore, it is noteworthy that among all gravity-based algorithms tested on the ten networks, HVGC exhibited the best overall performance. Its Kendall coefficient ranked first in six out of ten networks, with a remarkable 70% proportion being in the top two ranks. Specifically, HVGC ranked first in the Jazz, email, Facebook, PB, WV, and USAir networks and second in the Router network. Additionally, as shown in Fig. 3, when \(\beta = \beta_{c}\), although HVGC did not perform best in the NS, Power, and Sex networks, as \(\beta\) increases, its performance becomes very close to or even surpasses the previous best-performing algorithm. Taking into account HVGC's superior performance in community-type networks discussed earlier, it demonstrates a stronger overall performance, affirming the robustness of our findings. Furthermore, Fig. 4 displays the optimal truncation radius of HVGC in the ten real networks, revealing that the majority of networks concentrate their optimal truncation radius at \(R = 1\). This indicates that HVGC achieves remarkably high accuracy by considering only the influence of the first-order neighbouring nodes of a node, while most other gravity model methods require considering information from second- or third-order neighbouring nodes. In other words, HVGC achieves a high level of accuracy while incurring lower time costs.

Figure 4
figure 4

The optimal truncation radius \(R^{*}\) of HVGC at \(\beta = \beta_{c}\) is presented in the graph. Each pentagram in the graph corresponds to a network, with a total of ten networks represented. The blue line corresponds to \(R^{*} = 1\). Specifically, for HVGC, the value of \(R^{*}\) is 1 in Email, Facebook, Jazz, PB, USAir, NS and WV networks, 2 in Router and Sex networks, and 4 in Power network. The majority of the networks have an optimal truncation radius of 1, with the next most common radius being 2. This outcome aligns with the characteristics of domain centrality, which typically considers first-order and second-order neighbor nodes. HVGC represents a significant advancement over the H-index in domain centrality to obtain centrality, which is consistent with this characteristic. However, this does not impede its competitiveness relative to other algorithms, as it manages to achieve both simplicity and accuracy.

Discussion

This paper introduces a novel method called HVGC for identifying influential nodes in a network. While the original gravity model considered both neighbourhood and path information, this new method enhances the existing gravity centrality approaches by taking into account the overall influence of a node's neighbourhood, considering the structural hole position of nodes, and incorporating the differences in interactions between nodes. This method addresses the limitations of existing gravity centrality methods and strengthens the ability to identify important nodes in networks with clear community structures. Therefore, this approach demonstrates a high level of comprehensive performance. We conducted an analysis of the SIR dynamic propagation process in 10 real networks to compare the performance of HVGC with previous state-of-the-art methods. The results, as shown in Table 4, indicate the strong competitiveness of our method.

In certain scenarios, it is necessary to identify the top-k influential nodes for controlling information propagation. Therefore, in addition to evaluating the different ranking methods for individual nodes, we also assessed their performance in identifying the top-k influential spreaders. In other words, we compared the ranked lists of node influence obtained from the ranking methods with the ranked lists of node influence obtained from the SIR simulation, both sorted in descending order. Subsequently, we analysed the similarity between the two lists by considering the top-k nodes. Figure 5 illustrates the results of the Jaccard coefficient for identifying the top-k influential spreaders, ranging from 5 to 100 with a step size of 5. The X -axis shows the number of top influential spreaders, and the Y -axis shows the Jaccard similarity coefficients.

Figure 5
figure 5figure 5

The Jaccard similarity coefficients on the top-k influential spreaders.

We can observe that, except for the Sex, Power, PB, and Router networks, HVGC exhibits the best and most stable overall performance in identifying the top-k influential spreaders in other networks. Specifically, across all networks, as the number of selected top-k nodes increases, HVGC consistently maintains a high-level or steadily increasing Jaccard coefficient, while other methods display varying degrees of fluctuations. Furthermore, we provide detailed plots for the top-25 nodes, revealing that HVGC consistently ranks among the top three in identifying the top-25 influential spreaders and, in some cases, even secures the first position, except for the Sex network. Therefore, we can conclude that HVGC not only accurately ranks the influence of all nodes in the network but also successfully identifies the top-k nodes with the highest impact.

After applying monotonicity25, we assessed the resolution of various algorithms. Table 5 illustrates that HVGC and MCGM demonstrate similar performance in terms of monotonicity. However, HVGC excels in the majority of networks by solely considering the first-order neighbour information of nodes, whereas MCGM, even with the inclusion of second-order neighbour information, does not necessarily outperform HVGC and incurs higher computational complexity. Furthermore, HVGC demonstrates significantly better performance in identifying important nodes in networks with community structure compared to MCGM. Therefore, overall, HVGC surpasses other gravity model algorithms. Based on the results presented in Table 5, HVGC consistently ranks either at the top or very close to the best-performing algorithm in terms of monotonicity.

Table 5 Monotonicity of the various algorithms is observed, with the best algorithm for each network highlighted in bold.

Based on the above discussion, it is evident that centrality based on the gravitational model is more accurate than classical centrality. However, many of these models tend to identify false core nodes in the network and do not take into account the influence of neighbouring nodes. In our proposed HVGC (H-index-based Gravity Centrality), we address this limitation by comprehensively considering the overall impact of a node's neighbours and its position within the network's structural holes. This approach effectively overcomes the drawbacks of gravity-based methods and demonstrates superior performance compared to other algorithms.

Despite the excellent performance exhibited by HVGC, it shares a common limitation with other gravity-based methods, namely the need to determine the optimal truncation radius \(R\). However, this disadvantage is mitigated by the fact that most real networks exhibit small-world characteristics47,59, and the optimal truncation radius is approximately linearly related to the average distance34. Furthermore, since HVGC is derived from the domain centrality method, even considering only the first-order neighbor nodes in the ten real networks studied can lead to very high performance and accurate results.

In conclusion, while HVGC demonstrates better overall performance compared to other gravity-based methods and introduces improvements to existing gravity models, there are still areas that require further refinement. For example, the current approach does not consider the influence of weight factors associated with different indicators. Instead, it directly operates on the indicator values of the nodes. The weights of HV and the structural hole constraint coefficient \(c(i)\) in the computation process may affect the accuracy of the algorithm. In networks with clear community structures, a higher weight for \(c(i)\) may lead to better performance, while in other types of networks, a lower weight may yield better results. Therefore, future work may involve incorporating adjustable parameters to balance the weights of different indicators, which is a direction for further exploration. Additionally, these algorithms have not been evaluated in weighted networks, where the impact of the path from node \(i\) to node \(j\) may differ from that of the path from node \(j\) to node \(i\), and the link heterogeneity60 in a weighted network may result in varying node impact. Lastly, future research may involve incorporating adjustable parameters to modify the interplay of gravitational forces among nodes and balance the weights of different metrics in order to improve the performance of the algorithm.