NBA Play-by-Play FAQ - BigDataBall

We introduced the new format of our NBA play-by-play dataset a few years ago, but we still use the old one. We treat the new one as more of a backup and are comfortable keeping the old one for as long as the raw data is available.

CHANGES IN THE NEW NBA Play-by-Play FORMAT

(1) SUBSTITUTIONS
In the new format, a substitution is shown in two separate rows: one for the player entering the game and one for the player leaving it. In the old format, both players appeared in the same row.

(2) PLAYER IDs
The player column now lists the player ID and player name together, like `1626167/Myles Turner`.
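
The combined value can be split back into its two parts. The following is a minimal sketch, assuming the ID is numeric and always precedes the first slash; the helper name `split_player` is hypothetical and not part of the dataset.

```python
def split_player(value):
    """Split an 'id/name' string into (player_id, player_name)."""
    # Split on the first slash only, so any later slash stays in the name.
    player_id, _, name = value.partition("/")
    # Assumption: the ID part is always an integer; int() raises ValueError otherwise.
    return int(player_id), name.strip()


print(split_player("1626167/Myles Turner"))  # (1626167, 'Myles Turner')
```

A value without a slash raises `ValueError` here, which may be preferable to silently accepting a row in the old format.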

(3) NEW FIELDS
We have enriched the play-by-play data by appending new fields after the columns you already know from the old format. This way, the new dataset remains compatible with the old format. The newly added fields are:

home_team
away_team
team_possession
time_actual
qualifiers1
qualifiers2
qualifiers3
qualifiers4
area
area_detail
official_id

These new fields provide more detailed context about each play, enhancing your analysis capabilities without disrupting existing workflows.

Field	Description	Purpose	Example Value
`home_team`	The name or abbreviation of the home team	Identifies the home team in the game	`BOS` (Boston Celtics)
`away_team`	The name or abbreviation of the away team	Identifies the away team in the game	`IND` (Indiana Pacers)
`team_possession`	Indicates which team has possession of the ball	Provides information on current ball possession	`BOS` (Boston Celtics)
`time_actual`	The exact timestamp of the event in ISO 8601 format	Offers precise timing for the event	`2024-05-26T00:40:56.0Z`
`qualifiers1`	Additional qualifier for the event	Provides extra context or specific details about the play	`pointsinthepaint`
`qualifiers2`	Another qualifier for the event	Adds further detail about the play	`left`
`qualifiers3`	An additional qualifier for the event	Offers more granularity about the event	`0-8 Center`
`qualifiers4`	Yet another qualifier for the event	Provides comprehensive details about the play	`Above the Break 3`
`area`	The general area of the court where the event took place	Indicates the broad location of the court	`right`
`area_detail`	More specific area details within the `area`	Provides detailed spatial information	`24+ Right Center`
`official_id`	Identifier for the official (referee) involved in the event	Tracks which official was present or made a call during the play
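
Because `time_actual` is a plain string in the file, it usually needs to be parsed before any time arithmetic. Here is a minimal sketch using only the Python standard library. It assumes every value has the same shape as the example above (a fractional-seconds part followed by `Z`); the function name `parse_time_actual` is hypothetical.

```python
from datetime import datetime, timezone


def parse_time_actual(value):
    """Parse a time_actual string such as '2024-05-26T00:40:56.0Z' into an aware datetime."""
    # %f accepts one to six fractional digits, so ".0" is valid input.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    # The trailing "Z" means UTC; it was matched as a literal, so attach the zone explicitly.
    return parsed.replace(tzinfo=timezone.utc)


ts = parse_time_actual("2024-05-26T00:40:56.0Z")
print(ts.isoformat())  # 2024-05-26T00:40:56+00:00
```

Subtracting two parsed values gives a `timedelta`, which is one way to measure the real-world time between consecutive events.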

(4) DEPRECATED FIELDS
converted_x and converted_y are no longer needed because the new original_x and original_y values range from 0 to 100 and are easy to interpret. Here’s a sample of how the new coordinates translate spatially to the court.
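
If you prefer distances in feet, the 0 to 100 values can be rescaled. The sketch below is only an illustration under a loud assumption: that original_x is a percentage of the court's length and original_y a percentage of its width. The text above does not state the axis orientation, so check it against the sample before relying on this. The function name `to_feet` is hypothetical; the 94 ft by 50 ft dimensions are those of a regulation NBA court.

```python
COURT_LENGTH_FT = 94.0  # regulation NBA court length
COURT_WIDTH_FT = 50.0   # regulation NBA court width


def to_feet(original_x, original_y):
    """Rescale 0-100 coordinates to feet.

    Assumption (not confirmed by the dataset documentation): x runs along
    the court's length and y along its width.
    """
    x_ft = original_x / 100.0 * COURT_LENGTH_FT
    y_ft = original_y / 100.0 * COURT_WIDTH_FT
    return x_ft, y_ft


print(to_feet(50, 50))  # (47.0, 25.0)
```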

Frequently Asked Questions for Play-by-Play Dataset

What does the play-by-play data include?

Each season’s NBA play-by-play dataset comes with two types of files: (1) individual CSV files for every game played in the regular season and the playoffs, and (2) a season-to-date CSV file that combines all of the game files, so you can analyze the whole season’s stats in one sheet. In brief, our database-friendly log (one play per row) covers every in-game event, including active players on the court, event time (remaining and elapsed), play length and ID, activity type (substitution, shot, free throw, turnover, foul committed and drawn, rebound, assist, jump ball, etc.), shot location, and shot coordinates.
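
The season-to-date file is already provided, but you may want to build a combined file of your own, for example from a subset of games. The following is a minimal sketch using Python's standard `csv` module. It assumes all game files sit in one folder and share the same columns in the same order; the function name `combine_game_logs` and the folder layout are hypothetical.

```python
import csv
from pathlib import Path


def combine_game_logs(game_dir, output_path):
    """Concatenate per-game CSV files into one file with a single header row.

    Returns the number of play rows written (header excluded).
    """
    output_path = Path(output_path)
    # Collect the inputs up front and skip the output file if it lives in the same folder.
    game_files = sorted(
        p for p in Path(game_dir).glob("*.csv")
        if p.resolve() != output_path.resolve()
    )
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = None
        for path in game_files:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if writer is None:
                    # Write the header once, taken from the first non-empty file.
                    writer = csv.writer(out)
                    writer.writerow(header)
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

The sketch does not verify that headers match across files. If you mix old-format and new-format logs, compare the headers first, since the new format adds columns.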

Download the sample dataset and open the Excel file, which provides descriptions for all columns. Those descriptions do not appear in the season game logs, so we recommend keeping the sample file within easy reach until you are familiar with the play-by-play fields.

How large is a season’s play-by-play dataset?

How is an individual game log named?

How are the five-man lineups determined in the play-by-play logs?

We have developed a proprietary algorithm that considers substitutions and the game events assigned to players (points, fouls committed and drawn, assists, steals, etc.). Note that in the very rare case where a player records nothing and takes part in no game event while he is on the court, the algorithm may not be 100% accurate.
