View the code on Github
In this project, I have used MySQL and mysql package in python to build a relational database model that stores the players, teams, stadiums, referees, staffs and matches. After building the database, we perform various queries in the database to extract data about the Fifa world cup 2022.
Creating the database
The Entity Relation model for the database is given in the following image:
Team is the central part of this model since four other entities share a relationship with team entity. Since, each team only has one head coach manager it is linked to teams with one-to-one relationship and team attribute of the head coach manager table is made as a primary key to prevent data with two head coaches in the same team. This model is designed for group matches and assumes that each team will play against its opponent only one time. Hence the combination of team_1 and team_2 is made as a primary key in order to prevent duplicate entries such as same opponents playing two matches.
The code for creating the database is given in the github page mentioned above. Example of the columns in players table is given below:
Data Insertion
After creating the table I inserted the data into the database using Beautiful Soup web scrapper where I scrape the data from the wikipedia website and store it in a pandas dataframe. This data is then converted into tuples. After making the connection to the database in GCP using the mysql package, I used the MySQL commands to insert the data into the database.
Querying the database
After the data insertion, I have performed various queries in the database using SQL commands. These can be viewed in the Github page of this project. An example query is given below:
Export data to find the youngest manager.
Using Python to execute queries
Establishing the connection to the GCP cloud
import mysql.connector
from mysql.connector import Error
def create_db_connection(host_name, user_name, user_password, db_name):
connection = None
try:
connection = mysql.connector.connect(
host=host_name,
user=user_name,
passwd=user_password,
database=db_name
)
print("MySQL Database connection successful")
except Error as err:
print(f"Error: '{err}'")
return connection
host = "35.225.150.37"
user = "root"
password = "1234"
database = "fifa_2022"
connection = create_db_connection(host, user, password,database)
Querying the database
An example of using python to execute the SQL queries is given below:
query = """
SELECT game.team_1, game.team_2, game.group_name, game.date, stadium.name AS stadium
FROM game, stadium
WHERE stadium.stadium_id = game.stadium_id
ORDER BY game.date;
"""
cursor.execute(query)
results = cursor.fetchall()
pd.set_option('max_rows', 100)
df = pd.DataFrame(results, columns = ['Team 1', 'Team 2', 'Group', 'Date', 'Stadium'])
df