Data Structures in Action: Using Tuples to Analyze Movie Ratings


Problem

You have a dataset containing movie titles, release years, and user ratings (1-5 stars). You want to analyze the data to calculate average ratings per movie and identify movies with consistently high or low ratings.

Data Structure:

  • Tuple: Each data point is stored as a tuple containing the movie title (string), release year (integer), and a single user rating (float). Because tuples are immutable, a record cannot be altered by accident while the calculations run.
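As a small illustration of that layout, here is a minimal sketch of reading fields from such records. The variable names and the two sample records are chosen only for this example.

```python
# One rating record in (title, release year, rating) order.
record = ("Parasite", 2019, 4.3)

# Tuple unpacking gives each field a readable name.
title, year, rating = record
print(f"{title} ({year}) was rated {rating}")

# Fields can also be read by position.
assert record[0] == title

# The same unpacking works inside a loop or comprehension over many records.
records = [record, ("Pulp Fiction", 1994, 4.6)]
titles = [t for t, _year, _rating in records]
print(titles)
```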

Algorithm:

  • Iterate and accumulate: Loop through the dataset (a list of tuples) and accumulate the total rating and the number of ratings for each unique movie title encountered. Dividing each total by its count then gives the average rating per movie.
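The accumulate step can be sketched compactly with collections.defaultdict. This is one possible variant, not necessarily how a full implementation would be organized; the titles "Movie A" and "Movie B" are placeholders.

```python
from collections import defaultdict

# Placeholder records in (title, release year, rating) order.
sample = [
    ("Movie A", 2000, 4.0),
    ("Movie B", 2001, 3.0),
    ("Movie A", 2000, 5.0),
]

# Each title maps to a [running total, count] pair, created on first access.
totals = defaultdict(lambda: [0.0, 0])
for title, _year, rating in sample:
    totals[title][0] += rating
    totals[title][1] += 1

# Divide each total by its count to get the average per title.
averages = {title: total / count for title, (total, count) in totals.items()}
print(averages)
```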

Implementation

Movie data (a list of tuples)

movie_data = [
    ("The Shawshank Redemption", 1994, 4.8),
    ("The Godfather", 1972, 4.9),
    ("The Dark Knight", 2008, 4.8),
    ("The Godfather: Part II", 1974, 4.7),
    ("Pulp Fiction", 1994, 4.6),
    ("Parasite", 2019, 4.3),
    ("Taxi Driver", 1976, 4.1),
    ("Oppenheimer", 2023, 4.2),
    ("Taxi Driver", 1976, 4.3),  # Duplicate entry for testing
    ("Taxi Driver", 1976, 4.0),  # Duplicate entry for testing
    ("The Godfather", 1972, 4.8),  # Duplicate entry for testing
    ("Oppenheimer", 2023, 4.3),  # Duplicate entry for testing
    ("Oppenheimer", 2023, 4.7),  # Duplicate entry for testing
    ("Oppenheimer", 2023, 4.8),  # Duplicate entry for testing
    ("The Shawshank Redemption", 1994, 5.0),  # Duplicate entry for testing
]

Analyze movie ratings

def analyze_movie_ratings(data):
  """Return rating statistics per movie for a list of (title, year, rating) tuples."""
  ratings_by_movie = {}  # Maps each movie title to its rating statistics
  for movie_title, _release_year, rating in data:
    # Create the entry the first time a title is seen
    if movie_title not in ratings_by_movie:
      ratings_by_movie[movie_title] = {"total_rating": 0, "num_ratings": 0}
    ratings_by_movie[movie_title]["total_rating"] += rating
    ratings_by_movie[movie_title]["num_ratings"] += 1

  # Every stored title has at least one rating, so the division is safe
  for stats in ratings_by_movie.values():
    stats["average_rating"] = stats["total_rating"] / stats["num_ratings"]

  return ratings_by_movie

Analyze and print the result

movie_ratings = analyze_movie_ratings(movie_data)
for movie_title, stats in movie_ratings.items():
  print(f"{movie_title}: Average Rating - {stats['average_rating']:.2f} ({stats['num_ratings']} ratings)")
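The problem statement also asks for movies with consistently high or low ratings. One way to approach that is sketched below. The function name classify_consistency, the thresholds 4.5 and 2.5, and the sample titles are all assumptions made for this example, not part of the code above.

```python
from collections import defaultdict

def classify_consistency(data, high=4.5, low=2.5):
    """Label each title by whether all of its ratings sit above or below a threshold.

    The default thresholds are illustrative assumptions.
    """
    ratings = defaultdict(list)
    for title, _year, rating in data:
        ratings[title].append(rating)

    labels = {}
    for title, values in ratings.items():
        if min(values) >= high:
            labels[title] = "consistently high"
        elif max(values) <= low:
            labels[title] = "consistently low"
        else:
            labels[title] = "mixed"
    return labels

# Placeholder records in (title, release year, rating) order.
sample = [
    ("Movie A", 2000, 4.8), ("Movie A", 2000, 4.6),
    ("Movie B", 2001, 2.0), ("Movie B", 2001, 2.5),
    ("Movie C", 2002, 4.9), ("Movie C", 2002, 3.0),
]
labels = classify_consistency(sample)
for title, label in labels.items():
    print(f"{title}: {label}")
```

Checking the minimum and maximum rating, rather than the average, keeps a single outlier from being hidden by otherwise uniform ratings.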

The result

[Image: console output listing each movie's average rating and number of ratings]

Benefits of using Tuples:

  • Immutability: A tuple cannot be changed after it is created, so each record (movie title, release year, rating) stays intact during analysis and accidental modifications cannot skew the calculations.
  • Readability: A tuple clearly represents one fixed-size record, with each field in a known position.
  • Memory Efficiency: A tuple usually takes less memory than a list holding the same items, because it has a fixed size and does not need the extra bookkeeping a list keeps so it can grow.
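The immutability and memory points can be checked with a quick sketch. The exact sizes reported by sys.getsizeof depend on the Python implementation and version; the comparison below assumes CPython, and the sample record is illustrative.

```python
import sys

record_tuple = ("Parasite", 2019, 4.3)
record_list = ["Parasite", 2019, 4.3]

# Item assignment on a tuple raises TypeError, so the record stays intact.
try:
    record_tuple[2] = 5.0
    modified = True
except TypeError:
    modified = False
print("Tuple was modified:", modified)

# getsizeof reports the container's own size, not the items it refers to.
tuple_size = sys.getsizeof(record_tuple)
list_size = sys.getsizeof(record_list)
print(f"tuple: {tuple_size} bytes, list: {list_size} bytes")
```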
