CSE 30332 - HW3

Programming Paradigms


In this assignment you will use Flask and SQLAlchemy to build a basic web API. Building on HW2, you will use these tools to wrap your Reddit objects and expose them over the network.

Reddit


It has come to my attention that many of you do not understand how Reddit is structured. I recommend you spend several minutes poking around the site. Try to think about the relationship between subreddits, posts, and comments, and how you would structure them in a UML diagram. This should be easier now that you have finished HW2.

Some subreddits are classified as Not Safe For Work (NSFW). For this assignment, make sure you ignore subreddits that are tagged NSFW and only consider ones that are Safe For Work (SFW); otherwise my mom will get mad. Every subreddit in our list of possible subreddits has a row in the CSV file that indicates whether or not it is NSFW. You can download the CSV file from the link below.
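
One way to pick out the SFW subreddits is sketched below. The column names `subreddit` and `nsfw`, and the `true`/`false` values, are assumptions made for illustration; check the header row of the actual CSV file and adjust them.

```python
import csv
import io


def sfw_subreddits(csv_file, limit=1000):
    """Return up to `limit` subreddit names whose NSFW flag is false.

    The column names 'subreddit' and 'nsfw' are hypothetical; check the
    header row of the real CSV and rename them as needed.
    """
    names = []
    for row in csv.DictReader(csv_file):
        # Treat anything other than an explicit "false" as NSFW, to be safe.
        if row['nsfw'].strip().lower() != 'false':
            continue
        names.append(row['subreddit'].strip())
        if len(names) == limit:
            break
    return names


# Small in-memory stand-in for the real file.
sample = io.StringIO(
    "subreddit,nsfw\n"
    "python,false\n"
    "spicy,true\n"
    "cats,False\n"
)
sfw = sfw_subreddits(sample)
```

With a real file, the same function could be called on an open file handle, for example `open(path, newline='')`.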

Subreddits CSV

Command Line Arguments


Your code will no longer take command line arguments. Instead, 3rd parties will be able to send POST requests to set the parameters. This is what reddit_setter.py will be used for.

Modifying your classes

reddit_classes.py


In order to use our classes with a database and SQLAlchemy, we need to make some slight modifications to them. Several attributes of the Subreddit object now make sense to represent as columns in a database, in particular the URL attribute.

Additionally, you will implement a new class: Settings. This class will hold all of the settings for the program that a user may want to set. Instead of having a single regex, number, and attr, your new Settings object should have subreddit-, title-, and comment-specific regexes, numbers, and attrs.

Now, instead of instantiating a Subreddit or Post object with all of these different attributes, you will pass a single Settings object to the object you're creating.
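
As a rough illustration of the idea, the sketch below groups the per-level options into one object and hands that object to a subreddit-like class. It uses a plain dataclass rather than the SQLAlchemy model the assignment asks for, and `DemoSettings` and `DemoSubreddit` are hypothetical stand-ins; the field names and defaults mirror the `Settings.__init__` signature in the scaffold.

```python
import re
from dataclasses import dataclass


@dataclass
class DemoSettings:
    """Plain-Python stand-in for Settings (not a SQLAlchemy model)."""
    sub_regex: str = '.*'
    title_regex: str = '.*'
    comment_regex: str = '.*'
    sub_num: int = 25
    title_num: int = 25
    comment_num: int = 25
    sub_reverse: bool = False
    title_reverse: bool = False
    comment_reverse: bool = False
    title_attr: str = 'score'
    comment_attr: str = 'score'


class DemoSubreddit:
    """Hypothetical class that takes one settings object, not many arguments."""

    def __init__(self, url: str, settings: DemoSettings) -> None:
        self.url = url
        self.settings = settings

    def filter(self) -> bool:
        # Read the regex from the shared settings object at call time, so a
        # later change to the settings is picked up automatically.
        return re.search(self.settings.sub_regex, self.url) is not None


shared = DemoSettings(sub_regex='py')
sub = DemoSubreddit('https://www.reddit.com/r/python/', shared)
matched_before = sub.filter()
shared.sub_regex = 'cats'  # change the setting at runtime
matched_after = sub.filter()
```

Because every object holds a reference to the same settings instance, a change made at runtime is visible everywhere without re-creating the objects.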

Furthermore, you will need to update your display functions to return a single string that will then be passed on to the 3rd party's client (usually a web browser). To add a newline to your string, use an HTML <br> tag instead of the newline character, as the final string will be viewed by a web browser and thus can contain HTML.
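
A minimal sketch of building such a string is shown below; `join_for_browser` is a hypothetical helper name, and the sample lines are made up.

```python
def join_for_browser(lines):
    """Join display lines with <br> so a browser shows one item per line."""
    return '<br>'.join(lines)


page = join_for_browser(['0: r/python', '1: r/cats'])
```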

Preparing the database

db_manager.py


Ideally you create your database a single time, and then every time afterwards you use the already instantiated database. Therefore we will have a helper function named init_db() that will create the database for us and fill it with our starting data (Subreddit objects using the subreddits found in the CSV file linked above).

To create your database properly, run your init_db() function from the python3 interactive prompt on the command line.

Setting up the API

reddit_api.py


Your API will have 4 endpoints:

  • The landing page will display the subreddit links.
  • The second endpoint will allow the user to add a single number to the URL to see the post titles for the subreddit associated with the number they added to the URL.
  • The third endpoint will allow the user to add another number to the URL in addition to the first. This second number will be associated with a post title from the subreddit associated with the first number. This endpoint will display the comments for the post.
  • Finally, the fourth endpoint will be one accepting a POST request that will allow the requester to change attributes of the Settings object during runtime.

    Changing settings

    reddit_setter.py


    Using reddit_setter.py you can change the items in the Settings object during runtime. To do so you can make a POST request to one of your endpoints with a JSON dict giving the items you wish to change in the Settings object.
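
The server-side half of this exchange can be sketched without Flask. The helper below, `apply_settings`, is a hypothetical name; it copies only recognised keys from a decoded JSON dict onto a settings object, and `SimpleNamespace` stands in for the real Settings object.

```python
import json
from types import SimpleNamespace


def apply_settings(settings, payload: dict) -> list:
    """Copy known keys from payload onto settings; return the ignored keys."""
    ignored = []
    for key, value in payload.items():
        if hasattr(settings, key):
            setattr(settings, key, value)
        else:
            # Unknown keys are skipped rather than creating new attributes.
            ignored.append(key)
    return ignored


# Stand-in for the Settings object, with two of its attributes.
current = SimpleNamespace(sub_regex='.*', title_num=25)

# What the body of a POST request might decode to.
body = json.loads('{"sub_regex": "^py", "title_num": 5, "bogus": 1}')
skipped = apply_settings(current, body)
```

In a Flask view, the decoded dict would typically come from `request.get_json()`.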

    Running a Flask server


    To run a Flask server you first need to set the FLASK_APP environment variable in your shell. After doing so, you can run your server using the command:
    python3 -m flask run -p [PORT]
    where [PORT] is a number between 9000 and 9999.

    Code Overview and Scaffold


    To help you get a sense of the approximate length of each part of the code, I have included the number of lines (LOC) used for each section in my solution. I will of course probably implement my code slightly differently than you do, so take these as a relative reference only. It is completely reasonable for your code to be several lines shorter or longer. You should only start to worry if yours is significantly different from mine.
    reddit_api.py
                            
    from flask import Flask, request
    from flask_sqlalchemy import SQLAlchemy
    from db_manager import db_session
    from reddit_classes import Settings, Subreddit, Post
    
    app = Flask(__name__)
    
    setr = None
    subs = None
    
    def check_globals() -> None: # 8 LOC
        global setr
        global subs
    
        # create the settings object if it doesn't exist
    
        # get the subreddit objects from the database and add the settings object
        # if the subreddit objects don't already exist
    
    @app.route('/')
    def display_subreddits() -> str: # 6 LOC
        check_globals()
    
        # return a string containing all of the subreddit links
    
    @app.route('/<int:sub_id>/')
    def display_post_titles(sub_id: int) -> str: # 2 LOC
        check_globals()
    
        # return a string containing all of the titles for the subreddit specified
    
    
    @app.route('/<int:sub_id>/<int:post_id>/')
    def display_post_comments(sub_id: int, post_id: int) -> str: # 6 LOC
        check_globals()
    
        # return a string containing all of the comments for the subreddit and post specified
    
    @app.route('/settings/', methods=['POST'])
    def settings() -> None: # 12 LOC
        global setr
    
        # update the Settings object with the new items from the POST request JSON
    
        # return a status code for a successful POST request
    
    @app.teardown_appcontext
    def shutdown_session(exception=None):
        db_session.remove()
                            
                        
    reddit_classes.py
                            
    import re
    import os
    import requests
    from db_manager import Base
    from sqlalchemy import Column, Integer, String, Boolean
    
    class Settings(Base):
        __tablename__ = 'settings'
        user_id = Column(Integer, primary_key=True)
    
        # add all of the Columns for the settings the program will have
        # 11 LOC
    
        def __init__(self, sub_regex='.*', title_regex='.*', comment_regex='.*',
                            sub_num=25, title_num=25, comment_num=25,
                            sub_reverse=False, title_reverse=False, comment_reverse=False,
                            title_attr='score', comment_attr='score'):
    
            # set all of the attributes for the Settings object
            # 11 LOC
    
        def __repr__(self) -> str:
            return super().__repr__()
    
    class Subreddit(Base):
        __tablename__ = 'subreddits'
        id = Column(Integer, primary_key=True)
    
        # add a column for the subreddit URL
        # 1 LOC
    
        def __init__(self, url: str, settings: Settings) -> None: # 2 LOC
            # set the two Subreddit attributes
    
        def scrape(self) -> None: # 4 LOC
            # scrape the Subreddit URL and instantiate a list of Post objects
    
        def display(self, loc: int, titles: bool = False) -> str: # 8 LOC
            # create a single string containing all of the items you want to display
            # if titles is True, then scrape the subreddit
            
        def filter(self) -> bool: # 3 LOC
            # Check if the URL of the subreddit matches the regex for subreddits
    
        def __repr__(self) -> str:
            return super().__repr__()
    
    class Post(Base):
        __tablename__ = 'posts'
        id = Column(Integer, primary_key=True)
        # add a column for the post URL
        # 1 LOC
    
        def __init__(self, url: str, settings: Settings) -> None: # 6 LOC
            # set the Post's attributes and try to scrape the Post's data
    
        def scrape(self) -> None: # 3 LOC
            # scrape the Post's URL
    
        def display(self, loc: int, comments: bool = False) -> str: # 10 LOC
            # return a single string containing the Post's title and possibly all its comments as well
    
        def display_comment_tree(self, reply_dict: dict, depth: int) -> str: # 12 LOC
            # return a single string containing all of the comments in the comment tree for the comment in reply_dict using recursion
    
        def filter(self, item, comments: bool = False) -> bool: # 6 LOC
            # return True or False depending on whether or not the item matches its regex
    
        def __repr__(self) -> str:
            return super().__repr__()
                            
                        
    db_manager.py
                            
    import csv
    from sqlalchemy import create_engine
    from sqlalchemy.orm import scoped_session, sessionmaker, declarative_base
    
    # create an engine for your DB using SQLite, storing it in a file named reddit.sqlite
    
    db_session = scoped_session(sessionmaker(autocommit=False, autoflush=False, bind=engine))
    
    Base = declarative_base()
    Base.query = db_session.query_property()
    
    def init_db(): # 15 LOC
        '''Create the database and fill it with the first 1000 SFW Subreddits
    
        Args:
            None
    
        Returns:
            None
        '''
    
        # import your classes that represent tables in the DB and then create_all of the tables
    
        # read in the subreddit lists from the given CSV and add the first 1000 SFW subreddits to your database by creating Subreddit objects
    
        # save the database
                            
                        
    reddit_setter.py
                            
    import requests
    
    settings_dict = {
        'sub_regex': '.*',
    
        # enter values for every attribute in the Settings object
        # 10 LOC
    }
    
    # use the requests module to make a post request to the reddit API settings endpoint
    
    print(r.status_code)
                            
                        

    Example Output


    Here are some screenshots of browsing your reddit API:





    We'll do it live

    Note, since we are pulling data from an active website, the articles may change between runs.

    Reddit API Oddities


    There have been several oddities in the reddit API that have been brought to my attention. Firstly, image posts are causing lots of problems for students and secondly, every once in a while you'll get a replies dictionary that is not empty but also not formatted in the standard manner. To handle these I recommend you make use of try/except blocks. The cats subreddit especially has proven problematic and as such we will be lenient when grading in relation to this specific subreddit.
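
A minimal sketch of that defensive style follows. It assumes a comment's `replies` value is either an empty string or a listing shaped like `{'data': {'children': [...]}}`; `safe_children` is a hypothetical helper name.

```python
def safe_children(replies):
    """Return the list of child replies, or [] if the value is malformed."""
    try:
        children = replies['data']['children']
    except (TypeError, KeyError):
        # TypeError: replies was a string (e.g. '') or None.
        # KeyError: replies was a dict missing the expected keys.
        return []
    return children if isinstance(children, list) else []


well_formed = {'data': {'children': [{'data': {'body': 'hi'}}]}}
results = [
    len(safe_children(well_formed)),
    len(safe_children('')),
    len(safe_children({'unexpected': True})),
]
```

Returning an empty list for malformed input lets a recursive display function treat an odd replies value the same way as a comment with no replies.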

    Submission Instructions


    This assignment is due by 11:59 PM on Monday, March 27th (03/27). To submit, please create a folder named HW3 in your dropbox. Then put your python files, named reddit_api.py, reddit_classes.py, db_manager.py, and reddit_setter.py into this folder. Assignments are programmatically collected at the due date.

    Extension Policy


    The course's late work policy is that late work receives no credit. However, we all live busy and active lives, and should you feel like you cannot finish the assignment by the due date, please email me and ask for an extension. Given that the email is polite, I will almost certainly grant the extension.

    Grading Rubric


    Component                                                                     Points
    API follows guidelines:                                                       10
        - Subreddit objects loaded in only once                                   2
        - Landing page displays subreddit links                                   2
        - Subreddit endpoint displays subreddit's post titles                     2
        - Post endpoint displays a post's comments                                2
        - Settings endpoint modifies the Settings object                          2
    Settings class follows guidelines:                                            10
        - properly creates SQLAlchemy table and columns                           5
        - init function properly sets class's data                                5
    Subreddit class follows guidelines:                                           10
        - properly creates SQLAlchemy table and columns                           2
        - init gets data dict for posts during scrape                             2
        - contains a list of Post objects as an attribute                         3
        - display prints post titles using Post's display method                  3
    Post class follows guidelines:                                                10
        - properly creates SQLAlchemy table and columns                           2
        - Use polymorphism for scrape function                                    2
        - Display differentiates between post title and post comments level       3
        - display_comment_tree implemented recursively                            3
    Database manager follows guidelines:                                          5
        - Correctly instantiates database                                         2
        - Fills database with subreddit objects                                   3
    Settings modifier follows guidelines:                                         5
        - Sends a POST request to your reddit API with a new settings JSON dict   5
    Code output correct given reasonable inputs                                   20
    Code style                                                                    5
    Total                                                                         75