Decoding Virgil van Dijk: Part 3
Identifying Similar Players Through Advanced Statistical Modeling
Hello and welcome to the last chapter of my data scouting series, where I try to find the next Virgil van Dijk.
I am really excited about this one because I got some incredible help from a colleague and friend, Marwane Hamdani.
Marwane is a football data scientist who specializes in using advanced statistical models to analyze both players and the game itself. I find his articles on team rebuilds particularly fascinating, and I highly recommend checking them out. You can find his Substack below 💡
I asked Marwane if he could give me a list of players similar to Van Dijk, based on data from the current season in the top 5 leagues. In contrast to my own framework in Part 2, where I handpicked statistics, Marwane’s model is based on a statistical measure called cosine similarity.
The Model
In short, cosine similarity measures how similar two players are by calculating the cosine of the angle between their statistical vectors. Instead of comparing raw values directly, it assesses the direction of a player's statistical profile: two players with similar styles will have a smaller angle between their vectors (and a similarity closer to 1), even if their overall volume of actions differs. This lets us find comparable players based on patterns in their data rather than just absolute numbers.
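To make the idea concrete, here is a minimal Python sketch of cosine similarity and of ranking players by it. It is not Marwane's actual model: the function names, stat names and per-90 values are hypothetical, chosen so that one player has exactly half the target's volume but the same style.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two stat vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(target: str, profiles: dict[str, list[float]], n: int = 15) -> list[tuple[str, float]]:
    """Rank every other player by cosine similarity to the target player."""
    scores = [
        (name, cosine_similarity(profiles[target], vec))
        for name, vec in profiles.items()
        if name != target
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:n]

# Hypothetical per-90 profiles: [passes into final third, interceptions, aerials won]
profiles = {
    "target": [7.0, 1.0, 3.0],
    "same_style_low_volume": [3.5, 0.5, 1.5],  # exactly half the target's volume
    "different_style": [1.0, 4.0, 0.5],
}

print(most_similar("target", profiles, n=2))
```

The low-volume player scores a similarity of (essentially) 1.0, because only the direction of the vector matters. In practice, stats on very different scales are often standardized first so that high-volume actions such as passes don't dominate the angle; whether the model used here does that isn't stated.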
Here are the 15 players most similar to Virgil van Dijk:
Some very interesting players, indeed! Which player stands out for you the most?
At the top of the list we have Thilo Kehrer. If I’m not mistaken, he actually started his career as a full back. He previously played for PSG and now finds himself at Monaco.
In second place is Taylor Harwood-Bellis, currently at Southampton. Looping back to Liverpool’s lessons in recruitment, you can often find top players in teams that are going down. Could he be a hidden gem and a future elite centre back? He has already made his senior England debut, during Lee Carsley’s brief stint as interim manager before Tuchel took over.
Other names that catch my eye include Julian Chabot in third, doing well for a revitalized Stuttgart side in the Bundesliga. Then there’s Philipp Lienhart, also in Germany, putting in solid performances for Freiburg.
And right after that, we’ve got 41-year-old Dante, still going strong. Legend.
I’m also happy to see Nico Schlotterbeck, Lewis Dunk, and Cristian Romero on the list. All of them scored highly in my own data search from the previous article.
And last but not least, Alexsandro Ribeiro, who ended up being my final pick for an elite centre back. You can read the full breakdown on him below.
Analysis
These are all interesting players, but now it’s time to dig a little deeper. First, I’m going to rank the listed players by tackles per 90 minutes, since that’s how this whole series started in the first place.
My belief is that elite centre backs should score low in tackles. It’s a sign that they’re preventing danger before it happens, relying on smart positioning and reading of the game rather than last-ditch interventions.
The players highlighted in the bar chart are the ones with the lowest tackles per 90. For reference, Virgil van Dijk averages around 1 tackle per 90. Using that as the benchmark, 8 of the 15 players drop out of contention on this stat alone.
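The filtering step can be sketched as converting season totals to per-90 rates and keeping only players at or below the benchmark. The player names and totals below are hypothetical; only the benchmark of roughly 1 tackle per 90 comes from the text, and treating a tie as "at or below" is an assumption.

```python
def per_90(total: float, minutes: float) -> float:
    """Convert a season total into a per-90-minutes rate."""
    return total * 90 / minutes

BENCHMARK = 1.0  # roughly Van Dijk's tackles per 90, per the text

# Hypothetical season totals: (tackles, minutes played)
players = {
    "player_a": (18, 1800),  # 0.9 tackles per 90
    "player_b": (40, 2000),  # 1.8 tackles per 90
    "player_c": (25, 1500),  # 1.5 tackles per 90
    "player_d": (12, 2160),  # 0.5 tackles per 90
}

rates = {name: per_90(tackles, minutes) for name, (tackles, minutes) in players.items()}

# Rank from fewest to most tackles per 90, keeping only those at or below the benchmark
shortlist = [name for name, rate in sorted(rates.items(), key=lambda kv: kv[1]) if rate <= BENCHMARK]
print(shortlist)  # ['player_d', 'player_a']
```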
That leaves a shortlist of seven players: Philipp Lienhart, Lewis Dunk, Ladislav Krejčí, Taylor Harwood-Bellis, Alexsandro Ribeiro, Arthur Theate and Julian Chabot.
Moving on, I’m going to rank the 7 selected players based on progression. Since Van Dijk’s strongest progressive stat is passes into the final third, that’s the one I’m focusing on here.
Harwood-Bellis stands out here too, averaging 6.36 passes into the final third per 90, just one pass shy of Van Dijk.
Ribeiro tops the list, followed by Dunk and Ladislav Krejčí, who plays for Girona. I wasn’t too familiar with him before, but I see he’s only 25 and has chipped in with 2 goals in La Liga this season. That hints that he can be a threat in the opponent’s box as well, another trait we often associate with Van Dijk.
Let’s see if that holds up by looking at how these players perform in terms of expected goals, a stat we don’t typically use to evaluate defenders, but one that might reveal a bit more here.
As we can see, the intuition was right. Krejčí and Harwood-Bellis both rank at the top for non-penalty xG among the seven players. For reference, Van Dijk averages around 0.07 xG per 90 himself.
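Non-penalty xG per 90 removes the xG from penalty kicks and then normalises by minutes played. A minimal sketch, with a hypothetical defender's season numbers (the function name is illustrative):

```python
def npxg_per_90(total_xg: float, penalty_xg: float, minutes: float) -> float:
    """Non-penalty xG per 90: remove penalty xG, then normalise by minutes played."""
    return (total_xg - penalty_xg) * 90 / minutes

# Hypothetical defender: 2.3 total xG, none of it from penalties, 2700 minutes
print(round(npxg_per_90(2.3, 0.0, 2700), 3))  # 0.077
```

Stripping penalties matters because each penalty carries a large, fixed xG value set by the data provider, which would otherwise inflate the numbers of whoever takes them.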
Based on this full statistical breakdown, my final pick is Taylor Harwood-Bellis.
At just 23 years old, he’s already showing the right profile, and it wouldn’t be a surprise if a bigger club comes in for him ahead of next season.
Let’s wrap it up by taking a closer look at his radar chart.
Radars
Harwood-Bellis’ radar chart paints a highly promising picture. His relative metrics bear a strong resemblance to Virgil van Dijk’s, which is unsurprising given the underlying statistical model for similarity.
However, there is one notable exception: Harwood-Bellis ranks in the 95th percentile for blocks.
This is easily explained by context: he has to defend a lot more than Van Dijk, especially inside his own box. Playing for a team heading for relegation naturally shows up in a stat like this, so it shouldn't be overemphasized when assessing his true quality.
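One common way to put volume-driven defensive numbers in context is a possession adjustment, which scales a defender's counts by how much of the ball his opponents had. The sketch below assumes a simple linear version (normalising to 50% opponent possession); the function name is hypothetical, it is not used anywhere in this analysis, and data providers use their own variants.

```python
def possession_adjusted(stat_per_90: float, team_possession_pct: float) -> float:
    """Rescale a defensive per-90 count to a 50% opponent-possession baseline.

    Simple linear assumption: defensive opportunities grow in proportion to
    how much of the ball the opponent has.
    """
    opponent_possession_pct = 100 - team_possession_pct
    return stat_per_90 * 50 / opponent_possession_pct

# Same hypothetical raw figure (2.0 per 90) for a low- and a high-possession team
print(possession_adjusted(2.0, 40))  # opponent has 60% of the ball, so adjusted down
print(possession_adjusted(2.0, 60))  # opponent has 40% of the ball, so adjusted up
```

It is only a rough fix: blocks depend heavily on how many shots a team concedes, which possession alone doesn't capture.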
Instead, we can look at his progressive stats, which are quite good: he is in the 95th percentile for passes into the final third. Remember that Southampton was really good in the Championship last season with a heavily possession-based approach, the same approach that eventually led to Russell Martin’s dismissal in the Premier League.
The silver lining, however, is that Harwood-Bellis has received consistent coaching in building from the back and progressing play through the thirds.
He is not as dominant in the air as Van Dijk, but his numbers are respectable: he wins 2 aerial duels per game at a success rate of 63.5%.
Perhaps most intriguing is his low tackling volume (15th percentile), which I read as a positive. Either he is consistently late to engage, or he reads the game so well that he rarely needs to tackle. I lean towards the latter, as he also scores well for challenges lost and fouls committed.
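Radar charts like these plot percentile ranks: where a player's per-90 value sits within a pool of comparable players. A minimal sketch, assuming a "share of peers strictly below" definition and a hypothetical 20-player pool; providers differ in how they define both the pool and the percentile.

```python
def percentile_rank(value: float, peer_values: list[float]) -> float:
    """Percentage of peers whose value is strictly below `value` (0-100)."""
    below = sum(1 for v in peer_values if v < value)
    return 100 * below / len(peer_values)

# Hypothetical per-90 values of one stat for a pool of 20 centre backs
pool = [0.8, 0.9, 1.0, 1.1, 1.1, 1.2, 1.2, 1.3, 1.3, 1.4,
        1.4, 1.5, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.6]

print(percentile_rank(2.6, pool))  # 95.0: 19 of 20 peers are lower
print(percentile_rank(1.1, pool))  # 15.0: only 3 of 20 peers are lower
```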
The overall conclusion is that Harwood-Bellis, at only 23 years old, is an exciting prospect who shows similar characteristics to Virgil van Dijk.
The question is, who will sign him?
Bonus - Dean Huijsen
Finally, let’s talk about a player who’s had a real breakthrough season. Dean Huijsen is arguably one of the hottest centre backs on the market right now after an impressive spell with Bournemouth. How Juventus let him go, I’ll never understand.
Huijsen is tall, fast, physical, and incredibly smooth on the ball. He’s also comfortable with both feet and regularly picks out runners with well-weighted balls in behind.
The reason I’m bringing him up at the end of this series is that he might be the closest match to Van Dijk. He didn’t make the initial shortlists simply because he fell just below my threshold for aerial duel success rate.
He has a 59% success rate, but he averages 2.5 aerial duels per 90, which is not bad. He also records slightly more than 2 interceptions per game.
I’ve actually come to see higher interception numbers as a good sign, as they reflect anticipation and game reading more than reactive defending like tackles do. Given Bournemouth’s transitional, high-pressing style of play, it makes sense that he scores highly in this statistic, often pushing up to intercept balls.
This is where it gets interesting...
When I ran my own similarity calculation (not cosine similarity) based on Van Dijk’s exact per 90 stats, Huijsen and Marcos Alonso were the only two out of 796 defenders to show an overall similarity score of 75% or higher.
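The article doesn't specify how this alternative similarity score is computed, so the following is only one hypothetical way to build a stat-by-stat score with a 75% cut-off. It assumes non-negative per-90 stats; the function names, stat names and values are made up for illustration.

```python
def stat_similarity(a: float, b: float) -> float:
    """Hypothetical per-stat score: 1.0 when equal, shrinking as values diverge."""
    if a == b:
        return 1.0  # also covers both values being 0
    return min(a, b) / max(a, b)  # assumes non-negative per-90 stats

def overall_similarity(target: dict[str, float], other: dict[str, float]) -> float:
    """Average per-stat score across the target's stats, as a percentage."""
    scores = [stat_similarity(target[stat], other[stat]) for stat in target]
    return 100 * sum(scores) / len(scores)

THRESHOLD = 75.0  # the cut-off mentioned in the text

# Hypothetical per-90 profiles
target = {"passes_into_final_third": 8.0, "interceptions": 1.0, "aerials_won": 4.0}
candidate = {"passes_into_final_third": 6.0, "interceptions": 1.0, "aerials_won": 2.0}

score = overall_similarity(target, candidate)
print(score, score >= THRESHOLD)  # 75.0 True
```

Because a plain average weights every stat equally, the choice of stats does much of the work in an approach like this.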
I’m not claiming this is a perfect method, but I feel confident in the stat selection and in this alternative approach to comparing defenders. Supporting this, Alexsandro Ribeiro was a standout performer in both similarity models.
I’ll end this series with Huijsen’s radar chart, hoping that he picks the right team!
Final Remarks
I had been thinking about doing a piece on Van Dijk for a while, but little did I know it would turn into a full three-part series. Sometimes the data just pulls you into a rabbit hole, and you can’t help but keep digging.
If you made it through the entire Decoding Van Dijk series, thank you so much for reading and engaging. It really means a lot.
Also, a big shoutout to Marwane for providing the statistical foundation for this final piece 📊
📧 If you enjoyed this post, I’d appreciate it if you would share it with a friend 🌟
🔗 I post regularly on my LinkedIn as well, so feel free to connect with me there 🤝