Using MySQL and Python to create a database of the Fifa World Cup 2022 in Google Cloud Platform

View the code on Github

In this project, I have used MySQL and mysql package in python to build a relational database model that stores the players, teams, stadiums, referees, staffs and matches. After building the database, we perform various queries in the database to extract data about the Fifa world cup 2022.

Creating the database

The Entity Relation model for the database is given in the following image:

Team is the central part of this model since four other entities share a relationship with team entity. Since, each team only has one head coach manager it is linked to teams with one-to-one relationship and team attribute of the head coach manager table is made as a primary key to prevent data with two head coaches in the same team. This model is designed for group matches and assumes that each team will play against its opponent only one time. Hence the combination of team_1 and team_2 is made as a primary key in order to prevent duplicate entries such as same opponents playing two matches.

The code for creating the database is given in the github page mentioned above. Example of the columns in players table is given below:

Data Insertion

After creating the table I inserted the data into the database using Beautiful Soup web scrapper where I scrape the data from the wikipedia website and store it in a pandas dataframe. This data is then converted into tuples. After making the connection to the database in GCP using the mysql package, I used the MySQL commands to insert the data into the database.

Querying the database

After the data insertion, I have performed various queries in the database using SQL commands. These can be viewed in the Github page of this project. An example query is given below:

Export data to find the youngest manager.

Using Python to execute queries

Establishing the connection to the GCP cloud

import mysql.connector
from mysql.connector import Error
     

def create_db_connection(host_name, user_name, user_password, db_name):
    connection = None
    try:
        connection = mysql.connector.connect(
            host=host_name,
            user=user_name,
            passwd=user_password,
            database=db_name
        )
        print("MySQL Database connection successful")
    except Error as err:
        print(f"Error: '{err}'")

    return connection

host = "35.225.150.37"
user = "root"
password = "1234"
database = "fifa_2022"

connection = create_db_connection(host, user, password,database)

Querying the database

An example of using python to execute the SQL queries is given below:

query = """
SELECT game.team_1, game.team_2, game.group_name, game.date, stadium.name AS stadium
FROM game, stadium
WHERE stadium.stadium_id = game.stadium_id
ORDER BY game.date;
"""
cursor.execute(query)
results = cursor.fetchall()
pd.set_option('max_rows', 100)
df = pd.DataFrame(results, columns = ['Team 1', 'Team 2', 'Group', 'Date', 'Stadium'])
df