Data Structures

By the end of this lesson, you'll be able to build a music playlist manager that organizes songs, removes duplicates, counts plays, and finds your top tracks using Python's powerful data structures.

What You'll Build Today

You're going to create a small music streaming backend that manages playlists, tracks song statistics, and finds patterns in listening habits. Services like Spotify and Apple Music rely on the same basic ideas at a much larger scale. You'll use lists for playlists, sets to remove duplicate songs, tuples for song details, and dictionaries to track how many times each song has been played.

Python - Data Structures

# Final music streaming app
playlist = ["Blinding Lights", "Shape of You", "Levitating"]
song_details = ("Blinding Lights", "The Weeknd", 2019)
play_count = {"Blinding Lights": 47, "Shape of You": 32, "Levitating": 61}
unique_artists = {"The Weeknd", "Ed Sheeran", "Dua Lipa"}

top_song = max(play_count, key=play_count.get)  # find most played song
print(f"Your top track: {top_song} with {play_count[top_song]} plays!")
print(f"Total unique artists: {len(unique_artists)}")
# Output: Your top track: Levitating with 61 plays!
# Output: Total unique artists: 3

1. Lists - Your First Collection

A list is like a playlist on your phone - it keeps things in order, can have duplicates, and you can add or remove items anytime. Lists use square brackets and are the most flexible data structure in Python. Social media feeds, game inventories, and shopping carts are all naturally modeled as lists.

Python - Data Structures

# Creating a game inventory list
backpack = ["sword", "health potion", "shield", "health potion"]
backpack.append("magic scroll")  # add item to the end
backpack.remove("health potion")  # remove first matching item
first_item = backpack[0]  # access by position (indexing starts at 0)

print(f"Inventory: {backpack}")
print(f"First item equipped: {first_item}")
# Output: Inventory: ['sword', 'shield', 'health potion', 'magic scroll']
# Output: First item equipped: sword

Notice how lists keep items in the exact order you added them, and you can have duplicates like two health potions. The number in square brackets is the index - Python always starts counting from 0, not 1.

Think of a list like a numbered row of lockers - each locker has a number (index) and can hold anything inside, even duplicates of the same item.

Accessing List Items

You can grab items from anywhere in a list using their position number (index), or even count backwards from the end using negative numbers.

Python - Data Structures

# Accessing songs in different ways
favorite_songs = ["Bohemian Rhapsody", "Stairway to Heaven", "Hotel California", "Imagine"]
first_song = favorite_songs[0]  # get the first song
last_song = favorite_songs[-1]  # negative index counts from the end
middle_songs = favorite_songs[1:3]  # slice from index 1 up to (not including) 3

print(f"Opening track: {first_song}")
print(f"Closing track: {last_song}")
print(f"Middle section: {middle_songs}")
# Output: Opening track: Bohemian Rhapsody
# Output: Closing track: Imagine
# Output: Middle section: ['Stairway to Heaven', 'Hotel California']

The slice notation [1:3] is like saying "give me items starting at position 1, up to but not including position 3". It's one of Python's most powerful features for grabbing chunks of data.

Remember: my_list[0] is the first item, my_list[-1] is the last item, and my_list[start:end] grabs a range but doesn't include the end position.

Modifying Lists

Lists are mutable, meaning you can change them after creating them - add items, remove items, or swap items around completely.

Python - Data Structures

# Managing a pizza order system
pizza_orders = ["Margherita", "Pepperoni", "Hawaiian"]
pizza_orders.insert(1, "BBQ Chicken")  # add at specific position
pizza_orders[2] = "Veggie Supreme"  # replace item at index 2
cancelled_order = pizza_orders.pop()  # remove and return last item
total_orders = len(pizza_orders)  # count how many items

print(f"Current orders: {pizza_orders}")
print(f"Cancelled: {cancelled_order}")
print(f"Total orders: {total_orders}")
# Output: Current orders: ['Margherita', 'BBQ Chicken', 'Veggie Supreme']
# Output: Cancelled: Hawaiian
# Output: Total orders: 3

The insert() method lets you squeeze items into any position, while pop() removes and gives you back the last item (or any position if you specify). The len() function counts how many items are in any collection.

Pro tip: use append() when adding to the end, insert() when you need a specific position, and pop() when you want to remove and use that item.
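
As a small sketch of the point that pop() also accepts a position, the snippet below removes items from the front and the end of a list. The list name `queue` and its contents are made up for illustration.

```python
# Hypothetical order queue, used only for illustration
queue = ["Margherita", "BBQ Chicken", "Veggie Supreme"]

served_first = queue.pop(0)  # remove and return the item at index 0
served_last = queue.pop()    # no argument: remove and return the last item

print(served_first)  # Margherita
print(served_last)   # Veggie Supreme
print(queue)         # ['BBQ Chicken']
```

Calling pop() on an empty list raises an IndexError, so check len() first if the list might be empty.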

List Methods You'll Use Every Day

Lists come with built-in superpowers called methods - functions that belong to lists and help you sort, reverse, count, and organize data instantly.

Python - Data Structures

# Analyzing quiz scores
quiz_scores = [78, 92, 85, 78, 90, 88, 78]
quiz_scores.sort()  # arrange from lowest to highest
quiz_scores.reverse()  # flip the order to highest first
times_got_78 = quiz_scores.count(78)  # count occurrences
position_of_92 = quiz_scores.index(92)  # find where 92 is located

print(f"Scores high to low: {quiz_scores}")
print(f"Got 78 points: {times_got_78} times")
print(f"Score 92 is at position: {position_of_92}")
# Output: Scores high to low: [92, 90, 88, 85, 78, 78, 78]
# Output: Got 78 points: 3 times
# Output: Score 92 is at position: 0

Notice how sort() and reverse() change the list in place, while count() and index() just look at the data without changing it. These methods are incredibly useful when analyzing datasets.
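
To make the "changes the list" point concrete, here is a minimal sketch contrasting the sort() method with the built-in sorted() function. The `scores` list is a made-up example.

```python
scores = [78, 92, 85]

ordered_copy = sorted(scores)  # returns a new list; scores is untouched
print(ordered_copy)            # [78, 85, 92]
print(scores)                  # [78, 92, 85]

result = scores.sort()         # sorts in place and returns None
print(scores)                  # [78, 85, 92]
print(result)                  # None
```

A common slip is writing scores = scores.sort(), which replaces the list with None.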

You just learned the core list operations data scientists use every single day! Lists are the foundation of data analysis in Python.

2. Tuples - Collections That Never Change

A tuple is like a sealed envelope - once you create it, you can't change what's inside. Tuples use parentheses instead of square brackets and are perfect for data that should never be modified, like GPS coordinates, dates of birth, or product barcodes. They use slightly less memory than lists and protect your data from accidental changes.

Python - Data Structures

# Storing permanent game character stats
player_character = ("Shadowblade", 100, 45, "Warrior")  # name, health, attack, class
character_name = player_character[0]  # access by index just like lists
health_points = player_character[1]
attack_power = player_character[2]

print(f"Character: {character_name}")
print(f"HP: {health_points}, ATK: {attack_power}")
print(f"Total stats: {len(player_character)}")
# Output: Character: Shadowblade
# Output: HP: 100, ATK: 45
# Output: Total stats: 4

Tuples look almost identical to lists when you access them, but try to change a value and Python will throw an error - that's the point! This immutability prevents bugs where data gets accidentally modified.

Use tuples for data that represents a single "thing" with multiple properties - like a song with (title, artist, duration) or a date with (year, month, day).
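
The error mentioned above can be seen directly. This is a minimal sketch that catches the TypeError raised on assignment and then builds a new tuple instead; the name `healed_character` is introduced here only for illustration.

```python
player_character = ("Shadowblade", 100, 45, "Warrior")

try:
    player_character[1] = 120  # attempt to change the health value
except TypeError as error:
    message = str(error)
    print(f"Not allowed: {message}")

# To "change" a tuple, build a new one instead
healed_character = (player_character[0], 120) + player_character[2:]
print(healed_character)  # ('Shadowblade', 120, 45, 'Warrior')
```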

Tuple Unpacking

Unpacking is a super elegant Python trick that lets you split a tuple into separate variables in one line - it makes your code clean and readable.

Python - Data Structures

# Unpacking song information
song_info = ("Bohemian Rhapsody", "Queen", 354)  # title, artist, duration in seconds
title, artist, duration = song_info  # split tuple into 3 variables at once
minutes = duration // 60  # convert seconds to minutes
seconds = duration % 60  # get remaining seconds

print(f"Now playing: {title} by {artist}")
print(f"Duration: {minutes}m {seconds}s")
# Output: Now playing: Bohemian Rhapsody by Queen
# Output: Duration: 5m 54s

This unpacking technique is incredibly common in data science when functions return multiple values at once. It saves you from writing song_info[0], song_info[1], song_info[2] everywhere.

Tuple unpacking is like opening a gift box and instantly sorting each item into its own labeled container - fast and elegant!
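
The remark about functions returning multiple values can be sketched as follows. The helper `split_duration` is a hypothetical function written for this example, not part of any library.

```python
def split_duration(total_seconds):
    """Return (minutes, seconds) for a duration given in seconds."""
    return total_seconds // 60, total_seconds % 60  # packed into a tuple

minutes, seconds = split_duration(354)  # unpack the returned tuple
print(f"{minutes}m {seconds}s")         # 5m 54s
```

The number of variables on the left must match the number of items in the tuple, otherwise Python raises a ValueError.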

When to Use Tuples vs Lists

The rule is simple: use tuples for fixed data that shouldn't change, and lists for collections that will grow, shrink, or be modified.

Python - Data Structures

# Comparing tuples and lists in a weather app
city_location = (51.5074, -0.1278)  # latitude, longitude never change
daily_temperatures = [18, 21, 19, 22, 20]  # temperatures update daily

daily_temperatures.append(23)  # add today's temperature - works!
daily_temperatures[0] = 17  # correct the first reading - works!
# city_location[0] = 52  # ERROR! tuples cannot be modified

print(f"City coordinates: {city_location}")
print(f"This week's temps: {daily_temperatures}")
# Output: City coordinates: (51.5074, -0.1278)
# Output: This week's temps: [17, 21, 19, 22, 20, 23]

Notice how the temperature list changes throughout the week, but GPS coordinates are permanent. This is exactly when you'd choose each data structure in real applications.

You've now mastered both flexible lists and secure tuples - two of Python's most essential data structures!

3. Sets - Collections with No Duplicates

A set is like a VIP club - each member can only appear once, no duplicates allowed, and the order doesn't matter. Sets use curly braces and automatically remove any repeated items. They're perfect for tracking unique visitors, removing duplicate emails, or finding what two groups have in common.

Python - Data Structures

# Tracking unique visitors to a website
visitors_today = {"Alice", "Bob", "Charlie", "Alice", "David", "Bob"}
# set automatically removes duplicate Alice and Bob
visitors_today.add("Emma")  # add new visitor
visitors_today.remove("Charlie")  # visitor left the site
unique_visitor_count = len(visitors_today)

print(f"Unique visitors: {visitors_today}")
print(f"Total unique visitors: {unique_visitor_count}")
# Output: Unique visitors: {'Alice', 'Bob', 'David', 'Emma'}  (order may vary)
# Output: Total unique visitors: 4

Sets don't remember the order you added items - they optimize for fast lookups instead, so the order in which a set prints can differ from run to run. You can't use indexing like visitors_today[0] because there's no "first" item in a set.

Think of sets as a bag of unique items - you can reach in and grab any item, but you can't have two identical items in the bag.
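
A short sketch of both points: indexing a set fails, while membership checks with `in` are what sets are built for. The flag `indexing_works` is a name introduced only for this example.

```python
visitors_today = {"Alice", "Bob", "David", "Emma"}

indexing_works = True
try:
    visitors_today[0]  # sets have no positions
except TypeError:
    indexing_works = False
    print("Sets cannot be indexed")

# What sets are good at: fast membership checks
print("Alice" in visitors_today)  # True
print("Zoe" in visitors_today)    # False
```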

Set Operations - Finding Relationships

Sets have magical powers for comparing groups: union combines everything, intersection finds what's in both, and difference shows what's unique to one group.

Python - Data Structures

# Comparing streaming platform subscriptions
netflix_users = {"Alice", "Bob", "Charlie", "David"}
spotify_users = {"Charlie", "David", "Emma", "Frank"}
both_platforms = netflix_users & spotify_users  # intersection: who has both
either_platform = netflix_users | spotify_users  # union: everyone combined
netflix_only = netflix_users - spotify_users  # difference: Netflix exclusive

print(f"Have both subscriptions: {both_platforms}")
print(f"Total users: {len(either_platform)}")
print(f"Only Netflix: {netflix_only}")
# Output: Have both subscriptions: {'Charlie', 'David'}  (order may vary)
# Output: Total users: 6
# Output: Only Netflix: {'Alice', 'Bob'}  (order may vary)

These set operations stay fast even on very large sets, which is why databases and search engines rely on similar ideas. The & symbol means "and" (in both), | means "or" (in either), and - means "in the first set but not the second" (difference).

Set operations are like Venn diagrams in code - you can instantly answer "who's in both groups?" or "who's unique to one group?"
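
Each operator also has a method form, and there is a fourth operation, symmetric difference (^), for items in exactly one of the two sets. The sketch below reuses the subscription example; sorted() is used only to make the printed order predictable.

```python
netflix_users = {"Alice", "Bob", "Charlie", "David"}
spotify_users = {"Charlie", "David", "Emma", "Frank"}

# Method forms of the operators
both = netflix_users.intersection(spotify_users)        # same as &
everyone = netflix_users.union(spotify_users)           # same as |
netflix_only = netflix_users.difference(spotify_users)  # same as -

# Symmetric difference: users with exactly one of the two subscriptions
exactly_one = netflix_users ^ spotify_users

print(sorted(both))         # ['Charlie', 'David']
print(sorted(exactly_one))  # ['Alice', 'Bob', 'Emma', 'Frank']
```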

Removing Duplicates from Lists

One of the most common uses of sets is cleaning up messy data by converting a list to a set and back again.

Python - Data Structures

# Cleaning up duplicate game scores
all_scores = [100, 85, 92, 100, 78, 85, 100, 92, 88]
unique_scores = set(all_scores)  # convert to set to remove duplicates
sorted_scores = sorted(unique_scores, reverse=True)  # convert back and sort
highest_score = sorted_scores[0]
number_of_unique = len(unique_scores)

print(f"Unique scores: {sorted_scores}")
print(f"High score: {highest_score}")
print(f"Different scores achieved: {number_of_unique}")
# Output: Unique scores: [100, 92, 88, 85, 78]
# Output: High score: 100
# Output: Different scores achieved: 5

This pattern - list to set to remove duplicates, then back to list for ordering - is used in data cleaning constantly. The sorted() function works on any collection and returns a list.
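
Because a set does not keep the original order, the pattern above only works when you re-sort afterwards. If you instead need the items in first-seen order, one alternative (a different technique from the set approach, shown here as a sketch) is dict.fromkeys(), since dictionary keys are unique and keep insertion order in Python 3.7+.

```python
all_scores = [100, 85, 92, 100, 78, 85, 100, 92, 88]

# dict keys are unique and keep insertion order (Python 3.7+)
first_seen_order = list(dict.fromkeys(all_scores))
print(first_seen_order)  # [100, 85, 92, 78, 88]
```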

Amazing work! You can now clean messy data and find relationships between groups using sets - a crucial skill in data science.

4. Dictionaries - Storing Key-Value Pairs

A dictionary is like a real dictionary or contact list - you look up information using a key instead of a number. Dictionaries store key-value pairs in curly braces and are perfect for representing real-world objects, storing settings, or counting things. Game characters, user profiles, and database records are all commonly represented as dictionaries.

Python - Data Structures

# Creating a student profile
student = {
    "name": "Maya Rodriguez",
    "age": 14,
    "grade": 9,
    "favorite_subject": "Science",
    "gpa": 3.8
}
student_name = student["name"]  # look up value using key
student["gpa"] = 3.9  # update existing value
student["clubs"] = ["Robotics", "Chess"]  # add new key-value pair

print(f"Student: {student_name}, GPA: {student['gpa']}")
print(f"Clubs: {student['clubs']}")
# Output: Student: Maya Rodriguez, GPA: 3.9
# Output: Clubs: ['Robotics', 'Chess']

Dictionaries use meaningful keys like "name" and "age" instead of numbers, making your code self-documenting and readable. You can store any type of value - numbers, strings, lists, even other dictionaries!

Dictionaries are one of the most important data structures for working with real-world data - JSON objects, for example, load into Python as dictionaries.

Dictionary Methods

Dictionaries have powerful methods for accessing keys, values, or both together, plus safe ways to look up values that might not exist.

Python - Data Structures

# Managing a game inventory with quantities
inventory = {"gold_coins": 150, "health_potions": 5, "mana_potions": 3}
all_items = inventory.keys()  # get all item names
all_quantities = inventory.values()  # get all quantities
total_items = sum(inventory.values())  # add up all quantities

# safe lookup that won't crash if item doesn't exist
arrows = inventory.get("arrows", 0)  # returns 0 if "arrows" not found

print(f"Items: {list(all_items)}")
print(f"Total items in inventory: {total_items}")
print(f"Arrows: {arrows}")
# Output: Items: ['gold_coins', 'health_potions', 'mana_potions']
# Output: Total items in inventory: 158
# Output: Arrows: 0

The get() method is safer than using square brackets because it won't crash your program if the key doesn't exist - it just returns a default value instead. This prevents errors when working with unpredictable data.

Always use .get() when you're not sure if a key exists - it's the professional way to handle dictionaries safely.
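
To see the difference in behaviour, the sketch below looks up a missing key both ways. The `inventory` contents are a shortened version of the example above.

```python
inventory = {"gold_coins": 150, "health_potions": 5}

try:
    arrows = inventory["arrows"]  # key is missing
except KeyError:
    arrows = None
    print("Square brackets raised KeyError")

print(inventory.get("arrows"))     # None (no default given)
print(inventory.get("arrows", 0))  # 0
print("arrows" in inventory)       # False
```

Note that get() does not add the key to the dictionary; it only reports what is there.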

Looping Through Dictionaries

You can loop through dictionaries to process all keys, all values, or both together using the items() method.

Python - Data Structures

# Calculating total cost of shopping cart
shopping_cart = {"laptop": 899, "mouse": 25, "keyboard": 75, "monitor": 299}
total_cost = 0
for item_name, item_price in shopping_cart.items():  # loop through both key and value
    total_cost += item_price  # add each price to running total
    print(f"{item_name}: ${item_price}")
print(f"\nTotal cart value: ${total_cost}")
# Output: laptop: $899
# Output: mouse: $25
# Output: keyboard: $75
# Output: monitor: $299
# Output: Total cart value: $1298

The items() method gives you both the key and value in each loop, which is perfect for processing paired data. You can also use .keys() or .values() if you only need one or the other.

You've mastered dictionaries - the data structure that makes Python perfect for handling real-world information!

5. Iteration Techniques - Smart Ways to Loop

Python gives you supercharged looping tools that make processing data fast, elegant, and readable. These techniques are what separate beginners from professionals - they let you transform, filter, and combine collections in single lines of code. Every data scientist uses these patterns daily.

Python - Data Structures

# Processing a list of temperatures from Celsius to Fahrenheit
temperatures_celsius = [0, 10, 20, 30, 40]

# list comprehension: create new list by transforming each item
temperatures_fahrenheit = [(temp * 9/5) + 32 for temp in temperatures_celsius]

# enumerate gives you both index and value
for position, temp_f in enumerate(temperatures_fahrenheit):
    print(f"Reading {position + 1}: {temp_f}°F")
# Output: Reading 1: 32.0°F
# Output: Reading 2: 50.0°F
# Output: Reading 3: 68.0°F
# Output: Reading 4: 86.0°F
# Output: Reading 5: 104.0°F

List comprehensions are Python's way of saying "create a new list by doing something to each item in an old list" - all in one line. The enumerate() function is brilliant for when you need both the position number and the value.

List comprehensions are like magic - they turn 5 lines of loop code into 1 elegant line that's faster and easier to read.
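
For comparison, here is a sketch of the same conversion written as an explicit loop next to the comprehension. The variable names `fahrenheit_loop` and `fahrenheit_comp` are chosen for this example only.

```python
temperatures_celsius = [0, 10, 20, 30, 40]

# Explicit loop version
fahrenheit_loop = []
for temp in temperatures_celsius:
    fahrenheit_loop.append((temp * 9/5) + 32)

# Comprehension version
fahrenheit_comp = [(temp * 9/5) + 32 for temp in temperatures_celsius]

print(fahrenheit_loop == fahrenheit_comp)  # True
```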

Filtering with Comprehensions

You can add conditions to comprehensions to filter out items you don't want, creating powerful one-line data filters.

Python - Data Structures

# Finding high-scoring students
test_scores = [45, 78, 92, 65, 88, 55, 90, 72]

# comprehension with condition: only keep scores 80 and above
passing_grades = [score for score in test_scores if score >= 80]

# count how many students passed
pass_count = len(passing_grades)
pass_rate = (pass_count / len(test_scores)) * 100

print(f"High scores (80+): {passing_grades}")
print(f"Pass rate: {pass_rate:.1f}%")
# Output: High scores (80+): [92, 88, 90]
# Output: Pass rate: 37.5%

The if condition at the end of the comprehension acts as a filter - only items that pass the test make it into the new list. This is incredibly common for cleaning datasets.

Filtering with comprehensions is how data scientists clean millions of records - it's fast, readable, and powerful.

The Zip Function

The zip() function pairs up items from multiple lists, which is perfect for combining related data from different sources.

Python - Data Structures

# Combining parallel lists of game data
player_names = ["Phoenix", "Shadow", "Blaze", "Storm"]
player_scores = [1500, 2100, 1800, 1950]
player_levels = [12, 18, 14, 16]

# zip combines three lists into tuples
for name, score, level in zip(player_names, player_scores, player_levels):
    print(f"{name} (Lvl {level}): {score} points")
# Output: Phoenix (Lvl 12): 1500 points
# Output: Shadow (Lvl 18): 2100 points
# Output: Blaze (Lvl 14): 1800 points
# Output: Storm (Lvl 16): 1950 points

Zip is like a zipper on a jacket - it pairs up corresponding items from multiple sequences. This is essential when working with datasets where related information is stored in separate columns.

Zip is your best friend when different pieces of information about the same thing are stored in separate lists!
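
Two details of zip() are worth a quick sketch: it stops at the shortest input, and its pairs can be passed straight to dict(). The shortened `player_scores` list below is deliberately missing one entry to show the first point.

```python
player_names = ["Phoenix", "Shadow", "Blaze", "Storm"]
player_scores = [1500, 2100, 1800]  # one score missing on purpose

# zip stops at the shortest input, so "Storm" is silently dropped
pairs = list(zip(player_names, player_scores))
print(pairs)  # [('Phoenix', 1500), ('Shadow', 2100), ('Blaze', 1800)]

# dict() accepts an iterable of (key, value) pairs
score_by_name = dict(zip(player_names, player_scores))
print(score_by_name["Shadow"])  # 2100
```

If your lists might have different lengths, compare len() of each before zipping so that missing data does not go unnoticed.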

The Range Function

The range() function generates sequences of numbers, which is essential for counting, indexing, and creating numerical patterns.

Python - Data Structures

# Creating a countdown timer
for seconds_left in range(10, 0, -1):  # start, stop, step
    print(f"Launching in {seconds_left} seconds...")
print("Blast off!")

# generate list of even numbers
even_numbers = list(range(0, 20, 2))  # start at 0, stop before 20, step by 2
print(f"Even numbers: {even_numbers}")
# Output: Launching in 10 seconds...
# Output: Launching in 9 seconds...
# Output: ... (counts down to 1)
# Output: Blast off!
# Output: Even numbers: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

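Range takes up to three arguments: start, stop (exclusive), and step. With a single argument, range(n) counts from 0 up to n - 1; with two, it counts from start up to stop - 1. A negative step counts backwards, and you can convert range() to a list to see all the numbers it generates.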

Range is how you create numbered sequences without typing them all out - essential for loops and generating data patterns!
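
The three call forms of range() can be compared side by side. The variable names below are chosen for this sketch only.

```python
one_arg = list(range(5))            # stop only: starts at 0
two_args = list(range(2, 6))        # start and stop
three_args = list(range(10, 0, -3)) # start, stop, negative step

print(one_arg)     # [0, 1, 2, 3, 4]
print(two_args)    # [2, 3, 4, 5]
print(three_args)  # [10, 7, 4, 1]
```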

6. Functional Tools - Map, Filter, and Lambda

Functional programming tools let you apply operations to entire collections at once without writing explicit loops. Map transforms every item, filter selects items matching a condition, and lambda creates tiny throwaway functions. These tools make data processing code incredibly compact and expressive.

Python - Data Structures

# Processing a batch of quiz scores
raw_scores = [85, 92, 78, 88, 95, 73, 89]

# lambda creates a mini function: add 5 bonus points to each score
add_bonus = lambda score: score + 5
adjusted_scores = list(map(add_bonus, raw_scores))

# filter keeps only scores that pass the test
high_scores = list(filter(lambda score: score >= 90, adjusted_scores))

print(f"After bonus: {adjusted_scores}")
print(f"Excellent scores (90+): {high_scores}")
# Output: After bonus: [90, 97, 83, 93, 100, 78, 94]
# Output: Excellent scores (90+): [90, 97, 93, 100, 94]

Lambda functions are anonymous (unnamed) mini functions perfect for simple operations you only need once. Map applies a function to every item, and filter keeps only items where the function returns True.

Lambda, map, and filter are the professional's toolkit for processing data efficiently - once you master them, you'll use them everywhere!
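
Map and filter overlap with the comprehensions from the previous section. As a rough sketch, the two versions below produce the same list, so which one to use is largely a matter of readability.

```python
raw_scores = [85, 92, 78, 88, 95, 73, 89]

# map + filter version
with_tools = list(filter(lambda s: s >= 90, map(lambda s: s + 5, raw_scores)))

# comprehension version
with_comprehension = [s + 5 for s in raw_scores if s + 5 >= 90]

print(with_tools)                        # [90, 97, 93, 100, 94]
print(with_tools == with_comprehension)  # True
```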

Lambda Functions Explained

Lambda functions are one-line mini functions perfect for simple operations - they save you from writing full function definitions for tiny tasks.

Python - Data Structures

# Different ways to create functions

# Traditional function
def calculate_area(radius):
    return 3.14159 * radius ** 2

# Lambda version: same function in one line
calculate_area_lambda = lambda radius: 3.14159 * radius ** 2

circle_radii = [3, 5, 7, 10]
circle_areas = list(map(calculate_area_lambda, circle_radii))

# round for display, since floating-point results can carry tiny errors
rounded_areas = [round(area, 2) for area in circle_areas]
print(f"Circle areas: {rounded_areas}")
# Output: Circle areas: [28.27, 78.54, 153.94, 314.16]

Lambda syntax is: lambda arguments: expression. Whatever comes after the colon is automatically returned. Use lambdas for simple operations inside map, filter, or sorted; use regular functions for anything complex.

Think of lambda as a quick sticky note function - perfect for small tasks you'll only use once and don't need to name.

Sorting with Key Functions

The sorted() function becomes incredibly powerful when you tell it exactly how to compare items using a key function.

Python - Data Structures

# Sorting students by different criteria
students = [
    {"name": "Alex", "grade": 85, "age": 14},
    {"name": "Jordan", "grade": 92, "age": 13},
    {"name": "Taylor", "grade": 88, "age": 15}
]

# sort by grade (highest first)
by_grade = sorted(students, key=lambda student: student["grade"], reverse=True)
top_student = by_grade[0]

print(f"Top student: {top_student['name']} with {top_student['grade']}%")
print(f"Rankings: {[s['name'] for s in by_grade]}")
# Output: Top student: Jordan with 92%
# Output: Rankings: ['Jordan', 'Taylor', 'Alex']

The key parameter tells sorted() what value to use for comparison. Here, we're extracting the "grade" from each dictionary to determine sort order. This pattern works for sorting any complex data.

Sorting with key functions is how you organize complex real-world data - essential for rankings, leaderboards, and data analysis!

7. Working with Collections for Data Science

Real data science work involves combining multiple data structures together, transforming messy data into clean insights, and answering questions by processing collections efficiently. These patterns are what you'll use in actual data analysis projects to find trends, calculate statistics, and discover patterns.

Python - Data Structures

# Analyzing streaming service watch history
watch_history = [
    {"title": "Stranger Things", "genre": "Sci-Fi", "minutes": 485},
    {"title": "The Crown", "genre": "Drama", "minutes": 320},
    {"title": "Black Mirror", "genre": "Sci-Fi", "minutes": 180},
    {"title": "Breaking Bad", "genre": "Drama", "minutes": 540}
]

# calculate total watch time per genre
genre_totals = {}
for show in watch_history:
    genre = show["genre"]
    genre_totals[genre] = genre_totals.get(genre, 0) + show["minutes"]

print(f"Watch time by genre: {genre_totals}")
# Output: Watch time by genre: {'Sci-Fi': 665, 'Drama': 860}

This pattern - accumulating totals in a dictionary - is one of the most common data analysis operations. You start with an empty dictionary and build it up as you process each record.

Grouping and aggregating data like this is the foundation of data science - you're doing real analytics work!

Counting Frequencies

Counting how often items appear is a fundamental data analysis task - Python makes it elegant with the get() method pattern.

Python - Data Structures

# Analyzing word frequency in song lyrics
lyrics = ["love", "baby", "love", "yeah", "baby", "love", "tonight", "baby"]
word_counts = {}
for word in lyrics:
    word_counts[word] = word_counts.get(word, 0) + 1  # increment count

# find most common word (on a tie, max() returns the first one it sees)
most_common = max(word_counts, key=word_counts.get)
most_common_count = word_counts[most_common]

print(f"Word frequencies: {word_counts}")
print(f"Most repeated: '{most_common}' appears {most_common_count} times")
# Output: Word frequencies: {'love': 3, 'baby': 3, 'yeah': 1, 'tonight': 1}
# Output: Most repeated: 'love' appears 3 times

This frequency counting pattern is used everywhere - analyzing survey responses, counting website clicks, finding trending hashtags. The get(word, 0) means "give me the current count, or 0 if this is the first time seeing this word".

Congratulations! Counting frequencies is one of the most valuable skills in data analysis - you'll use this pattern constantly!
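
The standard library also ships a ready-made tool for this pattern: collections.Counter. It is an alternative to the get() approach rather than part of this lesson's method; the sketch below reruns the lyrics example with it.

```python
from collections import Counter

lyrics = ["love", "baby", "love", "yeah", "baby", "love", "tonight", "baby"]
word_counts = Counter(lyrics)

print(word_counts["love"])         # 3
print(word_counts["missing"])      # 0 (missing keys count as zero)
print(word_counts.most_common(2))  # [('love', 3), ('baby', 3)]
```

Writing the loop by hand first is still worthwhile, because the same accumulate-in-a-dictionary idea applies to sums and other totals that Counter does not cover.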

Nested Data Structures

Real-world data often has structures inside structures - lists of dictionaries, dictionaries containing lists, or even deeper nesting.

Python - Data Structures

# Managing a music streaming service database
users_playlists = {
    "user_001": ["Levitating", "Blinding Lights", "Peaches"],
    "user_002": ["Levitating", "Good 4 U", "Montero"],
    "user_003": ["Blinding Lights", "Levitating", "Peaches"]
}

# find which songs appear in multiple playlists
all_songs = []
for playlist in users_playlists.values():
    all_songs.extend(playlist)  # combine all playlists into one list

song_popularity = {}
for song in all_songs:
    song_popularity[song] = song_popularity.get(song, 0) + 1

print(f"Song popularity: {song_popularity}")
# Output: Song popularity: {'Levitating': 3, 'Blinding Lights': 2, 'Peaches': 2, 'Good 4 U': 1, 'Montero': 1}

Nested structures let you represent complex relationships - users have playlists, playlists contain songs, songs have properties. Processing them requires looping through the outer structure, then processing each inner structure.

You're now working with real database-style data structures - this is exactly how professional applications store and process information!

Mini-Project: Music Streaming Analytics Dashboard

You're going to build a complete music analytics system that tracks songs, counts plays, finds top artists, and manages playlists - just like Spotify's backend! This project combines every data structure you've learned to create a real working application.

Step 1 - Create the Song Database

Build a dictionary of songs where each song has detailed information stored as a nested dictionary.

Python - Data Structures

# Song database with nested dictionaries
music_database = {
    "song_001": {"title": "Blinding Lights", "artist": "The Weeknd", "plays": 47, "duration": 200},
    "song_002": {"title": "Shape of You", "artist": "Ed Sheeran", "plays": 32, "duration": 234},
    "song_003": {"title": "Levitating", "artist": "Dua Lipa", "plays": 61, "duration": 203},
    "song_004": {"title": "Starboy", "artist": "The Weeknd", "plays": 28, "duration": 230}
}
total_songs = len(music_database)

print(f"Music Database: {total_songs} songs loaded")
print(f"Sample: {music_database['song_001']['title']} by {music_database['song_001']['artist']}")
# Output: Music Database: 4 songs loaded
# Output: Sample: Blinding Lights by The Weeknd

Step 2 - Calculate Total Plays and Find Top Song

Loop through the database to sum all plays and identify the most popular song using max().

Python - Data Structures

# Song database (same as step 1)
music_database = {
    "song_001": {"title": "Blinding Lights", "artist": "The Weeknd", "plays": 47, "duration": 200},
    "song_002": {"title": "Shape of You", "artist": "Ed Sheeran", "plays": 32, "duration": 234},
    "song_003": {"title": "Levitating", "artist": "Dua Lipa", "plays": 61, "duration": 203},
    "song_004": {"title": "Starboy", "artist": "The Weeknd", "plays": 28, "duration": 230}
}

total_plays = sum(song["plays"] for song in music_database.values())
# song_id (not id) avoids shadowing Python's built-in id() function
top_song_id = max(music_database, key=lambda song_id: music_database[song_id]["plays"])
top_song = music_database[top_song_id]

print(f"Total streams: {total_plays}")
print(f"Top track: {top_song['title']} with {top_song['plays']} plays")
# Output: Total streams: 168
# Output: Top track: Levitating with 61 plays

Step 3 - Count Plays by Artist

Group songs by artist and calculate total plays per artist using dictionary accumulation.

Python - Data Structures

# Song database (same as step 1)
music_database = {
    "song_001": {"title": "Blinding Lights", "artist": "The Weeknd", "plays": 47, "duration": 200},
    "song_002": {"title": "Shape of You", "artist": "Ed Sheeran", "plays": 32, "duration": 234},
    "song_003": {"title": "Levitating", "artist": "Dua Lipa", "plays": 61, "duration": 203},
    "song_004": {"title": "Starboy", "artist": "The Weeknd", "plays": 28, "duration": 230}
}

artist_plays = {}
for song in music_database.values():
    artist = song["artist"]
    artist_plays[artist] = artist_plays.get(artist, 0) + song["plays"]

print(f"Plays by artist: {artist_plays}")
top_artist = max(artist_plays, key=artist_plays.get)
print(f"Top artist: {top_artist} with {artist_plays[top_artist]} total plays")
# Output: Plays by artist: {'The Weeknd': 75, 'Ed Sheeran': 32, 'Dua Lipa': 61}
# Output: Top artist: The Weeknd with 75 total plays

Step 4 - Create User Playlists and Find Popular Songs

Build user playlists as lists, find unique songs across all users, and identify which songs appear in multiple playlists.

Python - Data Structures

# Complete music analytics system
music_database = {
    "song_001": {"title": "Blinding Lights", "artist": "The Weeknd", "plays": 47},
    "song_002": {"title": "Shape of You", "artist": "Ed Sheeran", "plays": 32},
    "song_003": {"title": "Levitating", "artist": "Dua Lipa", "plays": 61},
    "song_004": {"title": "Starboy", "artist": "The Weeknd", "plays": 28}
}
user_playlists = {
    "Alex": ["song_001", "song_003", "song_004"],
    "Jordan": ["song_001", "song_002"],
    "Taylor": ["song_003", "song_001"]
}

all_playlist_songs = []
for playlist in user_playlists.values():
    all_playlist_songs.extend(playlist)

unique_songs = set(all_playlist_songs)
song_frequency = {song: all_playlist_songs.count(song) for song in unique_songs}
most_added = max(song_frequency, key=song_frequency.get)

print(f"Most added to playlists: {music_database[most_added]['title']}")
print(f"Added {song_frequency[most_added]} times")
# Output: Most added to playlists: Blinding Lights
# Output: Added 3 times

Incredible work! You just built a real analytics dashboard that combines dictionaries, lists, sets, loops, and comprehensions - this is professional-level data science code!

Quick Reference - Data Structures

Python - Data Structures

# ─── LISTS ───
playlist = ["song1", "song2", "song3"]
playlist.append("song4")       # add to end
playlist.insert(1, "new")      # add at position
playlist.remove("song2")       # remove first match
playlist.pop()                 # remove and return last item
playlist[0]                    # access by index
playlist[1:3]                  # slice from 1 to 3 (not including 3)
len(playlist)                  # count items
playlist.sort()                # sort in place
sorted(playlist)               # return new sorted list

# ─── TUPLES ───
coordinates = (51.5, -0.1)     # immutable, uses ()
lat, lon = coordinates         # unpack into variables
coordinates[0]                 # access by index (but cannot modify)

# ─── SETS ───
unique_users = {"Alice", "Bob", "Charlie"}
unique_users.add("David")      # add item
unique_users.remove("Bob")     # remove item
set1 & set2                    # intersection (items in both)
set1 | set2                    # union (all items combined)
set1 - set2                    # difference (items only in set1)

# ─── DICTIONARIES ───
student = {"name": "Alex", "age": 14, "grade": 9}
student["name"]                # access value by key
student["gpa"] = 3.8           # add or update key-value pair
student.get("clubs", [])       # safe lookup with default
student.keys()                 # get all keys
student.values()               # get all values
student.items()                # get key-value pairs

# ─── ITERATION ───
[x * 2 for x in numbers]                # list comprehension
[x for x in numbers if x > 10]          # comprehension with filter
for i, item in enumerate(my_list):      # loop with index
for a, b in zip(list1, list2):          # pair up two lists
for key, value in my_dict.items():      # loop through dictionary

# ─── FUNCTIONAL TOOLS ───
lambda x: x * 2                                # anonymous function
list(map(function, collection))                # apply function to all items
list(filter(function, collection))             # keep items where function returns True
sorted(collection, key=lambda x: x["score"])   # sort by custom criteria

Try It Yourself

Exercise 1: Recall - Create a Game Inventory

Create a dictionary representing a game character's inventory with at least 4 items and their quantities. Then add a new item and update the quantity of an existing item.

Python - Data Structures

# Create your inventory dictionary here
inventory = {
    "gold_coins": ___,
    "health_potions": ___,
    ___: ___,
    ___: ___
}
# Add a new item
inventory[___] = ___
# Update existing item quantity
inventory["health_potions"] = ___
print(inventory)

Exercise 2: Apply - Remove Duplicate Songs

You have a list of songs that contains duplicates. Convert it to a set to remove duplicates, then convert back to a sorted list and print how many unique songs there are.

Python - Data Structures

song_list = ["Levitating", "Peaches", "Levitating", "Good 4 U", "Peaches", "Montero"]
# Remove duplicates using a set
unique_songs = ___(song_list)
# Convert back to sorted list
sorted_songs = ___(unique_songs)
# Count unique songs
unique_count = ___(unique_songs)
print(f"Unique songs: {___}")
print(f"Total unique: {___}")

Exercise 3: Analyse - Debug the Frequency Counter

This code is supposed to count how many times each word appears, but it has bugs. Find and explain the three errors.

Python - Data Structures

words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
word_count = []  # Bug 1: wrong data structure
for word in words:
    word_count[word] = word_count[word] + 1  # Bug 2: missing something
most_common = max(word_count)  # Bug 3: wrong max usage
print(f"Most common word: {most_common}")

Exercise 4: Challenge - Student Grade Analyzer

Create a program that stores student data (name, scores list) in a dictionary, calculates each student's average, finds the top student, and filters students with averages above 80.

Python - Data Structures

students = {
    "Alex": [85, 92, 78, 88],
    "Jordan": [90, 95, 87, 92],
    "Taylor": [76, 82, 79, 81]
}
# Calculate average for each student
averages = {}
for name, scores in students.items():
    averages[name] = ___(scores) / ___(scores)
# Find student with highest average
top_student = ___(averages, key=___)
# Filter students with average > 80
high_performers = [name for name, avg in averages.items() if ___]
print(f"Averages: {averages}")
print(f"Top student: {___} with average {___:.1f}")
print(f"High performers: {___}")

What You Learned

  • Lists - ordered, mutable collections accessed by index using square brackets, with methods like append(), remove(), sort()
  • Tuples - immutable collections using parentheses, perfect for fixed data and unpacking values into variables
  • Sets - unordered collections with no duplicates using curly braces, with operations like union, intersection, and difference
  • Dictionaries - key-value pairs in curly braces, accessed by meaningful keys instead of numbers, using get(), keys(), values(), items()
  • List comprehensions - creating new lists by transforming or filtering existing collections in one elegant line
  • Iteration tools - enumerate() for index+value, zip() for pairing lists, range() for number sequences
  • Functional programming - map() to transform all items, filter() to select items, lambda for quick anonymous functions
  • Data analysis patterns - counting frequencies with dictionaries, grouping data, finding maximums, and working with nested structures
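The last bullet mentions grouping data and working with nested structures. As a rough sketch of that pattern, the code below reuses the song records from this lesson's analytics example and totals plays per artist. The names plays_by_artist and top_artist are introduced here for illustration only:

```python
# Song records reused from the analytics example in this lesson
music_database = {
    "song_001": {"title": "Blinding Lights", "artist": "The Weeknd", "plays": 47},
    "song_002": {"title": "Shape of You", "artist": "Ed Sheeran", "plays": 32},
    "song_003": {"title": "Levitating", "artist": "Dua Lipa", "plays": 61},
    "song_004": {"title": "Starboy", "artist": "The Weeknd", "plays": 28},
}

# Group by artist: get() supplies 0 the first time an artist is seen
plays_by_artist = {}
for song in music_database.values():
    artist = song["artist"]
    plays_by_artist[artist] = plays_by_artist.get(artist, 0) + song["plays"]

# Find the artist with the highest combined play count
top_artist = max(plays_by_artist, key=plays_by_artist.get)

print(plays_by_artist)
print(f"Top artist: {top_artist}")
# Output: {'The Weeknd': 75, 'Ed Sheeran': 32, 'Dua Lipa': 61}
# Output: Top artist: The Weeknd
```

Note that Levitating is the most played single song, but The Weeknd comes out on top once plays are grouped by artist.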

Next Chapter → File Handling and Data Import/Export - learn to read CSV files, write JSON data, and process real datasets from files
