Football (Soccer) Datasets

Football datasets feature match-by-match historical soccer stats for the English Premier League, German Bundesliga, Italian Serie A, French Ligue 1, and Spanish La Liga, delivered as Excel spreadsheets. Sort or filter by class, category, or in-season plan. View samples to see which columns are included, and see the metadata for column descriptions.

Discounts for Historical Datasets

(1) BULK PURCHASE DISCOUNT: You get a bigger discount as you add more seasons to your cart.
    • 2 to 5 seasons → 5% OFF
    • 6 to 10 seasons → 10% OFF
    • 11 seasons and more → 15% OFF

(2) MEMBER DISCOUNT: Log in, or sign up for a free BigDataBall account, to get a 30% member discount on historical datasets in addition to the bulk purchase discount.

Discounts for Season Pass Plans

(1) BUNDLE PURCHASE DISCOUNT: You get a bigger discount as you add more season passes to your cart.
    • 2 season passes → 5% OFF
    • 3 season passes → 10% OFF
    • 4 season passes and more → 15% OFF

(2) SUBSCRIBER DISCOUNT: If you have previously purchased a “season pass” for any sport, you get a 15% subscriber discount in addition to the bundle purchase discount.

| Football Datasets & Plans | View & Download Sample | Price |
| --- | --- | --- |
| ENG Historical Football Dataset - 2019-2020 | EPL Team Sample | $25 |
| ENG Historical Football Dataset - 2020-2021 | EPL Team Sample | $25 |
| ENG Historical Football Dataset - 2021-2022 | EPL Team Sample | $25 |
| ENG Historical Football Dataset - 2022-2023 | EPL Team Sample | $25 |
| ESP Historical Football Dataset - 2019-2020 | ESP Team Sample | $25 |
| ESP Historical Football Dataset - 2020-2021 | ESP Team Sample | $25 |
| ESP Historical Football Dataset - 2021-2022 | ESP Team Sample | $25 |
| ESP Historical Football Dataset - 2022-2023 | ESP Team Sample | $25 |
| FRA Historical Football Dataset - 2019-2020 | FRA Team Sample | $25 |
| FRA Historical Football Dataset - 2020-2021 | FRA Team Sample | $25 |
| FRA Historical Football Dataset - 2021-2022 | FRA Team Sample | $25 |
| FRA Historical Football Dataset - 2022-2023 | FRA Team Sample | $25 |
| GER Historical Football Dataset - 2019-2020 | GER Team Sample | $25 |
| GER Historical Football Dataset - 2020-2021 | GER Team Sample | $25 |
| GER Historical Football Dataset - 2021-2022 | GER Team Sample | $25 |
| GER Historical Football Dataset - 2022-2023 | GER Team Sample | $25 |
| ITA Historical Football Dataset - 2019-2020 | ITA Team Sample | $25 |
| ITA Historical Football Dataset - 2020-2021 | ITA Team Sample | $25 |
| ITA Historical Football Dataset - 2021-2022 | ITA Team Sample | $25 |
| ITA Historical Football Dataset - 2022-2023 | ITA Team Sample | $25 |

Football / Soccer Data in Excel Spreadsheets

1. Which leagues are covered in the football/soccer datasets at BigDataBall?

Our dataset currently focuses on the “Big 5” European leagues: the English Premier League, German Bundesliga, Italian Serie A, French Ligue 1, and Spanish La Liga. It includes detailed team-level match statistics exclusively for league matches in these top-tier leagues.

2. What type of data is included in the dataset?

Our datasets include all main statistics for each football match in the covered leagues, starting from the 2019-2020 season. This encompasses offense and defense stats such as goals scored, total shots, expected goals (xG), tackles, interceptions, cards, and many more. Additionally, it features betting-related information such as moneyline odds, over/unders, and Asian handicaps, alongside Elo ratings for each team. Finally, it has match-specific information such as team lineups, referee, and venue.

3. How are the betting odds in the dataset calculated?

The odds indicated in our dataset represent the closing odds and are calculated as the average of all odds from top bookmakers.
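
As a rough illustration of that averaging step, the sketch below takes closing odds from several bookmakers and returns the mean for each outcome. The bookmaker names, the decimal odds format, and the function name `average_closing_odds` are assumptions made for this example; the actual bookmaker list and odds format behind the dataset are not specified here.

```python
from statistics import mean

def average_closing_odds(odds_by_bookmaker):
    """Average the closing odds per outcome across all bookmakers."""
    outcomes = ("home", "draw", "away")
    return {
        outcome: mean(book[outcome] for book in odds_by_bookmaker.values())
        for outcome in outcomes
    }

# Hypothetical closing odds (decimal format) from three made-up bookmakers.
closing = {
    "book_a": {"home": 2.00, "draw": 3.40, "away": 3.80},
    "book_b": {"home": 2.10, "draw": 3.30, "away": 3.70},
    "book_c": {"home": 2.05, "draw": 3.50, "away": 3.60},
}
avg = average_closing_odds(closing)
```

Here `avg` holds one averaged price per outcome, which is the shape of value a single odds column in a spreadsheet would carry.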

4. What is the Elo rating mentioned in the dataset?

The Elo rating in our dataset represents the team’s Elo rating at the start of each match. It’s a widely recognized metric used to measure the strength of a team based on their past game results, valuable for understanding team performance trends and extensively used in sports analytics and betting.
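
For readers new to Elo, the textbook formulas can be sketched as below. This is the standard Elo expected-score and update rule, not necessarily the exact variant used to produce the ratings in the dataset: the K-factor of 20 and the absence of home-advantage or goal-margin adjustments are assumptions for illustration.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for team A (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_ratings(rating_a, rating_b, score_a, k=20.0):
    """Return post-match ratings; score_a is 1 (win), 0.5 (draw) or 0 (loss).

    k=20 is an assumed K-factor chosen for this example.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    return rating_a + delta, rating_b - delta

# Two evenly rated teams: the winner gains exactly what the loser gives up.
new_a, new_b = update_ratings(1500.0, 1500.0, score_a=1.0)
```

Because the update is zero-sum, the two ratings always move by the same amount in opposite directions.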

5. What do “Expected Goals (xG)” and “Expected Assists (xA)” mean in the dataset?

Expected Goals (xG) is a statistical measure used to assess the quality of scoring opportunities. It assigns a probability to each goal-scoring chance, indicating how likely it is that the chance would be scored. A high xG value suggests a high likelihood of scoring. This metric helps understand how many goals a team or player should have scored on average, given the quality and quantity of the shots taken.
Expected Assists (xA) measures the likelihood that a given pass will become an assist. It considers factors such as the type of pass, the location from where it was made, and the subsequent actions of the receiver. xA provides insight into a player’s playmaking abilities, indicating their effectiveness in creating goal-scoring opportunities for teammates.
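
To make the xG definition concrete, the sketch below adds up per-shot scoring probabilities to get a team's expected goals for a match. The shot values are invented for illustration, and the "at least one goal" figure additionally assumes the shots are independent, which is a simplification.

```python
from math import prod

def total_xg(shot_probabilities):
    """Match xG is the sum of the scoring probabilities of every shot."""
    return sum(shot_probabilities)

def prob_at_least_one_goal(shot_probabilities):
    """Chance of scoring at least once, assuming independent shots."""
    return 1.0 - prod(1.0 - p for p in shot_probabilities)

# Hypothetical shots: one big chance, two speculative efforts, one decent look.
shots = [0.76, 0.10, 0.05, 0.30]
team_xg = total_xg(shots)
p_score = prob_at_least_one_goal(shots)
```

With these made-up shots the team "should" have scored a little over one goal on average, even though any single match can end with zero or several.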

6. What can you do with BigDataBall’s football datasets?

Predict match outcomes or scores:

* Train machine learning models such as random forests or neural networks, using features like shots, possession, xG, and past results, to predict the match outcome (win/lose/draw)
* Tune models using training and test sets to identify best parameters and features
* Evaluate model accuracy on unseen data to test predictive ability
* Extend models to predict exact scorelines using similar features
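
The tune-then-evaluate workflow above can be sketched without any machine learning library. The example below is a deliberately simple stand-in for a real model: it predicts the side with the higher pre-match Elo, tunes one parameter (a home-advantage offset) on the training matches, and then measures accuracy on held-out matches. The field names, match values, and candidate offsets are all hypothetical.

```python
def predict(match, home_advantage):
    """Toy rule: pick the side with the higher (adjusted) Elo; never a draw."""
    diff = match["home_elo"] + home_advantage - match["away_elo"]
    return "H" if diff >= 0 else "A"

def accuracy(matches, home_advantage):
    """Share of matches whose result the toy rule gets right."""
    hits = sum(predict(m, home_advantage) == m["result"] for m in matches)
    return hits / len(matches)

# Hypothetical matches in chronological order (H = home win, D = draw, A = away win).
matches = [
    {"home_elo": 1500, "away_elo": 1550, "result": "H"},
    {"home_elo": 1600, "away_elo": 1500, "result": "H"},
    {"home_elo": 1450, "away_elo": 1600, "result": "A"},
    {"home_elo": 1520, "away_elo": 1560, "result": "H"},
    {"home_elo": 1480, "away_elo": 1700, "result": "A"},
    {"home_elo": 1550, "away_elo": 1580, "result": "H"},
    {"home_elo": 1400, "away_elo": 1650, "result": "A"},
    {"home_elo": 1610, "away_elo": 1590, "result": "D"},
]

# Chronological split: tune on earlier matches, evaluate on later ones.
train, test = matches[:5], matches[5:]
candidates = [0, 25, 50, 75, 100]
best_offset = max(candidates, key=lambda h: accuracy(train, h))
test_accuracy = accuracy(test, best_offset)
```

Splitting by date rather than at random keeps future matches out of the training data, which matters when features such as Elo depend on past results.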

Analyze team strengths/weaknesses:

* Aggregate stats like shots, tackles, possession over a season by team
* Compare averages vs opponents to identify strengths/weaknesses
* See if teams excel at shooting, keeping possession, set pieces, etc.
* Identify areas for improvement based on weaker metrics
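
A minimal sketch of the aggregation step, assuming one row per team per match with hypothetical column names (`team`, `shots`, `tackles`). It compares each team's season average with the mean across teams; comparing against specific opponents would follow the same pattern.

```python
from collections import defaultdict

def season_averages(rows, stat):
    """Average a per-match stat over all of each team's matches."""
    totals = defaultdict(float)
    games = defaultdict(int)
    for row in rows:
        totals[row["team"]] += row[stat]
        games[row["team"]] += 1
    return {team: totals[team] / games[team] for team in totals}

# Hypothetical team-level match rows.
rows = [
    {"team": "Team A", "shots": 15, "tackles": 18},
    {"team": "Team B", "shots": 9, "tackles": 22},
    {"team": "Team A", "shots": 11, "tackles": 14},
    {"team": "Team B", "shots": 13, "tackles": 20},
]

avg_shots = season_averages(rows, "shots")
league_avg = sum(avg_shots.values()) / len(avg_shots)
# Positive values mark a relative strength, negative values a weakness.
shots_vs_league = {team: avg - league_avg for team, avg in avg_shots.items()}
```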

Evaluate player performance:

* Track stats like goals, assists, pass %, tackles for each player
* Rank players within position or across league based on contribution
* Compare players to teammates in similar positions to quantify impact
* Build player ratings/indexes based on key stats that define good performance

Simulate seasons:

* Use Elo or other team ratings to represent team strengths
* Simulate matches between teams based on ratings and historical match data
* Update ratings after each simulated match
* Run hundreds of simulations and track final table, points, and milestones
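
The steps above can be sketched as a small Monte Carlo loop. Everything specific here is an assumption for illustration: the four teams and their ratings are made up, the K-factor is 20, every match has a fixed 25% draw probability, and no home advantage is modeled. A real simulation would calibrate these against historical results.

```python
import random
from collections import Counter
from itertools import permutations

def expected_score(r_a, r_b):
    """Standard Elo expected score for the first team."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def simulate_season(ratings, rng, k=20.0, p_draw=0.25):
    """Play a double round-robin once and return points per team."""
    ratings = dict(ratings)  # copy so every season starts from the same ratings
    points = Counter({team: 0 for team in ratings})
    for home, away in permutations(ratings, 2):  # each home/away pairing once
        exp_home = expected_score(ratings[home], ratings[away])
        # Split the non-draw probability so the expected score stays near exp_home.
        p_home = min(max(exp_home - p_draw / 2, 0.0), 1.0 - p_draw)
        roll = rng.random()
        if roll < p_draw:
            score_home = 0.5
            points[home] += 1
            points[away] += 1
        elif roll < p_draw + p_home:
            score_home = 1.0
            points[home] += 3
        else:
            score_home = 0.0
            points[away] += 3
        # Update ratings after each simulated match.
        delta = k * (score_home - exp_home)
        ratings[home] += delta
        ratings[away] -= delta
    return points

def title_odds(ratings, n_sims=500, seed=0):
    """Share of simulated seasons in which each team finishes top."""
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_sims):
        points = simulate_season(ratings, rng)
        champion = max(points, key=points.get)  # ties go to the first-listed team
        titles[champion] += 1
    return {team: titles[team] / n_sims for team in ratings}

ratings = {"Team A": 1700, "Team B": 1600, "Team C": 1500, "Team D": 1400}
odds = title_odds(ratings)
```

The same loop can track other milestones, such as points totals or bottom-of-table finishes, by recording more than just the champion from each simulated season.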

Study betting odds:

* Gather odds data, match stats, and actual results
* Analyze market movements compared to team metrics
* Build models to predict odds or find mispriced bets
* Backtest models historically to evaluate profitability
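
A flat-stake backtest is one simple way to carry out the last step. The sketch below bets one unit whenever a model's probability implies positive expected value at the quoted price, then reports return on investment. The decimal odds format, the field names, and the bet records are hypothetical.

```python
def backtest(bets, stake=1.0):
    """Flat-stake backtest; returns (bets placed, profit, ROI)."""
    placed = 0
    profit = 0.0
    for bet in bets:
        # Only bet when the model sees value: probability * decimal odds > 1.
        if bet["model_prob"] * bet["odds"] <= 1.0:
            continue
        placed += 1
        if bet["won"]:
            profit += stake * (bet["odds"] - 1.0)
        else:
            profit -= stake
    roi = profit / (placed * stake) if placed else 0.0
    return placed, profit, roi

# Hypothetical historical opportunities with model probabilities and outcomes.
history = [
    {"model_prob": 0.55, "odds": 2.00, "won": True},
    {"model_prob": 0.30, "odds": 3.00, "won": False},  # no value, skipped
    {"model_prob": 0.40, "odds": 3.00, "won": False},
    {"model_prob": 0.60, "odds": 1.90, "won": True},
]
placed, profit, roi = backtest(history)
```

A sample this small says nothing about real profitability; a meaningful backtest would use several seasons of matches that the model never saw during training.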

Gain tactical insights:

* Look at lineup data like formations, positions played over time
* Identify patterns in setups, roles, and relationships
* See how tactics have evolved in terms of style, shape, personnel
* Relate to match stats to quantify tactical impact

Analyze referee performance:

* Track referee assignments and match stats like cards, penalties
* Compare rates of disciplinary actions to averages
* Assess if certain refs have biases or affect outcomes
* Relate to team playstyles to see if mismatch affects decisions