Problem
You have a dataset containing movie titles, release years, and user ratings (1-5 stars). You want to analyze the data to calculate average ratings per movie and identify movies with consistently high or low ratings.
Data Structure:
- Tuple: Each data point can be stored as a tuple containing the movie title (string), release year (integer), and average rating (float). This structure ensures data integrity during calculations.
Algorithm:
- Iterate and accumulate: Loop through the dataset (list of tuples) and accumulate the total rating and number of ratings for each unique movie title encountered.
Implementation
Movies data (a list of tuples)
movie_data = [
("The Shawshank Redemption", 1994, 4.8),
("The Godfather", 1972, 4.9),
("The Dark Knight", 2008, 4.8),
("The Godfather: Part II", 1974, 4.7),
("Pulp Fiction", 1994, 4.6),
("Parasite", 2019, 4.3),
("Taxi Driver", 2021, 4.1),
("Oppenheimer", 2023, 4.2),
("Taxi Driver", 2021, 4.3), # Duplicate entry for testing
("Taxi Driver", 2021, 4.0), # Duplicate entry for testing
("The Godfather", 1972, 4.8),# Duplicate entry for testing
("Oppenheimer", 2023, 4.3), # Duplicate entry for testing
("Oppenheimer", 2023, 4.7), # Duplicate entry for testing
("Oppenheimer", 2023, 4.8), # Duplicate entry for testing
("The Shawshank Redemption", 1994, 5.0), # Duplicate entry for testing
]
Analyze movie ratting
def analyze_movie_ratings(data):
ratings_by_movie = {} # Dictionary to store movie titles and their rating statistics
for movie_title, release_year, rating in data:
if movie_title not in ratings_by_movie:
ratings_by_movie[movie_title] = {"total_rating": 0, "num_ratings": 0}
ratings_by_movie[movie_title]["total_rating"] += rating
ratings_by_movie[movie_title]["num_ratings"] += 1
for movie_title, stats in ratings_by_movie.items():
if stats["num_ratings"] > 0: # Avoid division by zero
stats["average_rating"] = stats["total_rating"] / stats["num_ratings"]
return ratings_by_movie
Analyze and print the result
movie_ratings = analyze_movie_ratings(movie_data)
for movie_title, stats in movie_ratings.items():
print(f"{movie_title}: Average Rating - {stats['average_rating']:.2f} ({stats['num_ratings']} ratings)")
The result
Benefits of using Tuples:
- Immutability: Tuples ensure the data (movie title, release year) remains unchanged during analysis, preventing accidental modifications that could affect calculations.
- Readability: Tuples clearly represent fixed data points within each entry.
- Memory Efficiency: Tuples are lightweight compared to lists, as they don’t require additional overhead for mutability.