CSE 30332 - HW3
Programming Paradigms
In this assignment you will use Flask and SQLAlchemy to build a basic web API. Building on top of HW2, you will use these
tools to wrap your Reddit objects and expose them over the network.
Reddit
It has come to my attention that many of you do not understand how reddit is structured. I recommend you spend several
minutes poking around the site. Try to think about the relationship between subreddits, posts, and comments, and how they
would be structured in a UML diagram. This should be easier now that you have finished HW2.
Some subreddits are classified as Not Safe For Work (NSFW). For this assignment, make sure you ignore any subreddit
tagged NSFW and only consider ones that are Safe For Work (SFW); otherwise my mom will get mad. Every subreddit in our
list of possible subreddits has a row in the CSV file that indicates whether or not it is NSFW. You can download the
CSV file from the link below.
Subreddits CSV
Command Line Arguments
Your code will no longer take command line arguments. Instead, 3rd parties will be able to send POST requests to set
the parameters; this is what reddit_setter.py is for.
Modifying your classes
reddit_classes.py
In order to use our classes with a database and SQLAlchemy, we need to make some slight modifications to them. Several
attributes of the Subreddit object now make sense to represent as columns in a database, specifically the URL attribute.
Additionally, you will implement a new class: Settings. This class will hold all of the settings for the program that a
user may want to set. Instead of having a single regex, number, and attr, your new Settings object should have
subreddit-, title-, and comment-specific regexes, numbers, and attrs.
Now, instead of instantiating a Subreddit or Post object with all of these different attributes, you will pass a single
Settings object to the object you're creating.
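To see why a single shared Settings object is convenient, here is a plain-Python sketch. It deliberately leaves out SQLAlchemy: DemoSettings and DemoSubreddit are hypothetical stand-ins for your Settings and Subreddit classes, and using re.search is an assumption about how your HW2 filter matched URLs. Because every object holds a reference to the same settings instance, changing one field at runtime (as the settings endpoint will) is immediately visible to all of them.

```python
import re
from dataclasses import dataclass


@dataclass
class DemoSettings:
    # Hypothetical stand-in for Settings: one field per level
    # (subreddit, title, comment) instead of a single regex/number.
    sub_regex: str = '.*'
    title_regex: str = '.*'
    comment_regex: str = '.*'
    sub_num: int = 25
    title_num: int = 25
    comment_num: int = 25


class DemoSubreddit:
    # Hypothetical stand-in for Subreddit: it receives the whole settings object.
    def __init__(self, url: str, settings: DemoSettings) -> None:
        self.url = url
        self.settings = settings  # a shared reference, not a copy

    def filter(self) -> bool:
        # Only the subreddit-level regex matters at this level.
        return re.search(self.settings.sub_regex, self.url) is not None


shared = DemoSettings(sub_regex='python')
subs = [DemoSubreddit('https://www.reddit.com/r/python/', shared),
        DemoSubreddit('https://www.reddit.com/r/cats/', shared)]
print([s.url for s in subs if s.filter()])
shared.sub_regex = 'cats'  # one change is seen by every object
print([s.url for s in subs if s.filter()])
```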
Furthermore, you will need to update your display functions to return a single string, which will then be passed on to
the 3rd party's client (usually a web browser). To add a newline to your string, use the HTML line-break tag <br>
instead of the newline character, since the final string will be viewed in a web browser and can therefore contain HTML.
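As an illustration of the string-building this implies, the sketch below joins numbered post titles with <br>. display_titles is a hypothetical helper, and the 0-based numbering is an assumption chosen to match list indices. Escaping each title with html.escape is a suggestion rather than a requirement: titles can contain characters such as & or < that a browser would otherwise interpret as HTML.

```python
from html import escape


def display_titles(titles: list[str]) -> str:
    # Hypothetical helper: one numbered title per line, separated by <br>
    # instead of '\n'; each title is HTML-escaped before insertion.
    lines = [f'{i}. {escape(title)}' for i, title in enumerate(titles)]
    return '<br>'.join(lines)


print(display_titles(['Cats & dogs', '<b>not bold</b>']))
```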
Preparing the database
db_manager.py
Ideally you create your database a single time, and every time afterwards you use the already-existing database.
Therefore we will have a helper function named init_db() that will create the database for us and fill it with our
starting data (Subreddit objects built from the subreddits found in the CSV file linked above).
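The CSV-reading part of init_db() can be sketched with the standard csv module. The column names subreddit and nsfw, and the spelling of the NSFW flag (true/false), are assumptions: check the header row of the actual Subreddits CSV and adjust. The 1000-subreddit limit comes from the init_db docstring in the scaffold. The function only returns names; turning them into Subreddit objects and adding them to the session is left to your init_db().

```python
import csv
from typing import Iterable


def first_sfw_subreddits(lines: Iterable[str], limit: int = 1000) -> list[str]:
    # ASSUMPTION: a header row with 'subreddit' and 'nsfw' columns,
    # where the nsfw column holds 'true' or 'false' (any case).
    names = []
    for row in csv.DictReader(lines):
        if row['nsfw'].strip().lower() == 'true':
            continue  # skip NSFW subreddits
        names.append(row['subreddit'])
        if len(names) == limit:
            break  # stop once we have enough SFW subreddits
    return names

# Usage with a real file (the filename here is hypothetical):
# with open('subreddits.csv', newline='') as f:
#     names = first_sfw_subreddits(f)
```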
To create your database, run your init_db function once from the python3 interactive prompt on the command line
(for example, from db_manager import init_db followed by init_db()).
Setting up the API
reddit_api.py
Your API will have 4 endpoints:
- The landing page (/) will display the subreddit links.
- The second endpoint (/<sub_id>/) will let the user add a single number to the URL to see the post titles for the
subreddit associated with that number.
- The third endpoint (/<sub_id>/<post_id>/) will let the user add a second number to the URL, after the first. This
second number is associated with a post title from the subreddit selected by the first number, and the endpoint will
display the comments for that post.
- Finally, the fourth endpoint (/settings/) will accept a POST request that allows the requester to change attributes
of the Settings object during runtime.
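For the landing page, one possible approach is to turn each subreddit into an HTML link that points at the second endpoint, so the user can click through instead of typing numbers. The sketch assumes the /<sub_id>/ route from the scaffold and 0-based indices into your list of SFW subreddits; landing_page is a hypothetical helper that only builds the string your view would return.

```python
from html import escape


def landing_page(urls: list[str]) -> str:
    # Each link targets the '/<int:sub_id>/' route; the list index is the sub_id.
    links = [f'<a href="/{i}/">{escape(url)}</a>' for i, url in enumerate(urls)]
    return '<br>'.join(links)


print(landing_page(['https://www.reddit.com/r/python/',
                    'https://www.reddit.com/r/cats/']))
```

Inside a Flask view you could build the href with flask.url_for('display_post_titles', sub_id=i) instead of hard-coding the path, so the links keep working if the route changes.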
Changing settings
reddit_setter.py
Using reddit_setter.py you can change the items in the Settings object during runtime. To do so, make a POST request
to your settings endpoint with a JSON dict containing the items you wish to change in the Settings object.
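On the server side, the settings view has to copy the JSON items onto the Settings object. A minimal sketch, assuming the view has already decoded the request body (in Flask, request.get_json() returns the parsed dict): apply_settings is a hypothetical helper, and the allowed keys are taken from the Settings.__init__ parameters in the scaffold. An explicit allow-list, rather than calling setattr for any key, keeps a request from overwriting unrelated attributes such as the primary key. Values are not type-checked here.

```python
# Keys mirror the parameters of Settings.__init__ in the scaffold.
SETTING_KEYS = {
    'sub_regex', 'title_regex', 'comment_regex',
    'sub_num', 'title_num', 'comment_num',
    'sub_reverse', 'title_reverse', 'comment_reverse',
    'title_attr', 'comment_attr',
}


def apply_settings(settings: object, data: dict) -> list[str]:
    """Copy recognised items from data onto settings; return the ignored keys."""
    ignored = []
    for key, value in data.items():
        if key in SETTING_KEYS:
            setattr(settings, key, value)
        else:
            ignored.append(key)  # unknown key: leave the object untouched
    return ignored
```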
Running a Flask server
To run a Flask server, you first need to set the environment variable FLASK_APP in your shell to the file containing
your app (e.g. export FLASK_APP=reddit_api.py in bash). After doing so, you can run your server using the command:
python3 -m flask run -p [PORT]
where [PORT] is a number between 9000 and 9999.
Code Overview and Scaffold
To help you get some sense of the approximate length of each part of the code, I have included the number of lines used
in each section of my solution. I will, of course, probably implement my code slightly differently than you. Therefore
you should take these as a relative reference only; it is completely reasonable for your code to be several lines
shorter or longer. You should only start to worry if yours is significantly different from mine.
reddit_api.py
from flask import Flask, request
from db_manager import db_session
from reddit_classes import Settings, Subreddit, Post
app = Flask(__name__)
setr = None  # the shared Settings object
subs = None  # the list of Subreddit objects loaded from the database
def check_globals() -> None:  # 8 LOC
    global setr
    global subs
    # create the settings object if it doesn't exist
    # get the subreddit objects from the database and add the settings object
    # if the subreddit objects don't already exist
@app.route('/')
def display_subreddits() -> str:  # 6 LOC
    check_globals()
    # return a string containing all of the subreddit links
@app.route('/<int:sub_id>/')
def display_post_titles(sub_id: int) -> str:  # 2 LOC
    check_globals()
    # return a string containing all of the titles for the subreddit specified
@app.route('/<int:sub_id>/<int:post_id>/')
def display_post_comments(sub_id: int, post_id: int) -> str:  # 6 LOC
    check_globals()
    # return a string containing all of the comments for the subreddit and post specified
@app.route('/settings/', methods=['POST'])
def settings() -> tuple:  # 12 LOC
    global setr
    # update the Settings object with the new items from the POST request JSON
    # return a status code for a successful POST request
@app.teardown_appcontext
def shutdown_session(exception=None):
    db_session.remove()
reddit_classes.py
import re
import os
import requests
from db_manager import Base
from sqlalchemy import Column, Integer, String, Boolean
class Settings(Base):
    __tablename__ = 'settings'
    user_id = Column(Integer, primary_key=True)
    # add all of the Columns for the settings the program will have
    # 11 LOC
    def __init__(self, sub_regex='.*', title_regex='.*', comment_regex='.*',
                 sub_num=25, title_num=25, comment_num=25,
                 sub_reverse=False, title_reverse=False, comment_reverse=False,
                 title_attr='score', comment_attr='score'):
        # set all of the attributes for the Settings object
        # 11 LOC
        ...
    def __repr__(self) -> str:
        return super().__repr__()
class Subreddit(Base):
    __tablename__ = 'subreddits'
    id = Column(Integer, primary_key=True)
    # add a column for the subreddit URL
    # 1 LOC
    def __init__(self, url: str, settings: Settings) -> None:  # 2 LOC
        # set the two Subreddit attributes
        ...
    def scrape(self) -> None:  # 4 LOC
        # scrape the Subreddit URL and instantiate a list of Post objects
        ...
    def display(self, loc: int, titles: bool = False) -> str:  # 8 LOC
        # create a single string containing all of the items you want to display
        # if titles is True, then scrape the subreddit
        ...
    def filter(self) -> bool:  # 3 LOC
        # check if the URL of the subreddit matches the regex for subreddits
        ...
    def __repr__(self) -> str:
        return super().__repr__()
class Post(Base):
    __tablename__ = 'posts'
    id = Column(Integer, primary_key=True)
    # add a column for the post URL
    # 1 LOC
    def __init__(self, url: str, settings: Settings) -> None:  # 6 LOC
        # set the Post's attributes and try to scrape the Post's data
        ...
    def scrape(self) -> None:  # 3 LOC
        # scrape the Post's URL
        ...
    def display(self, loc: int, comments: bool = False) -> str:  # 10 LOC
        # return a single string containing the Post's title and possibly all of its comments as well
        ...
    def display_comment_tree(self, reply_dict: dict, depth: int) -> str:  # 12 LOC
        # return a single string containing all of the comments in the comment tree for the comment in reply_dict, using recursion
        ...
    def filter(self, item, comments: bool = False) -> bool:  # 6 LOC
        # return True or False depending on whether or not the item matches its regex
        ...
    def __repr__(self) -> str:
        return super().__repr__()
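The recursion in display_comment_tree can be sketched as a standalone function. The dictionary shape is an assumption based on reddit's public .json listings: each comment looks like {'data': {'body': ..., 'replies': ...}}, where replies is an empty string when there are none and otherwise a listing whose data.children holds more comments. render_comment_tree is a hypothetical name, and indenting with &nbsp; per depth level is just one way to show nesting in a browser.

```python
from html import escape


def render_comment_tree(comment: dict, depth: int = 0) -> str:
    data = comment.get('data', {})
    body = data.get('body')
    if body is None:
        return ''  # e.g. a "load more comments" stub, which has no body
    # Render this comment, indented by four &nbsp; per level of depth.
    out = '&nbsp;' * (4 * depth) + escape(body) + '<br>'
    replies = data.get('replies')
    if isinstance(replies, dict):  # '' means "no replies": recursion stops
        for child in replies.get('data', {}).get('children', []):
            out += render_comment_tree(child, depth + 1)  # recursive case
    return out
```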
db_manager.py
import csv
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker, declarative_base
# create an engine for your DB using sqlite and storing it in a file named reddit.sqlite
db_session = scoped_session(sessionmaker(autocommit=False, autoflush=False, bind=engine))
Base = declarative_base()
Base.query = db_session.query_property()
def init_db():  # 15 LOC
    '''Create the database and fill it with the first 1000 SFW Subreddits
    Args:
        None
    Returns:
        None
    '''
    # import your classes that represent tables in the DB and then create_all of the tables
    # read in the subreddit lists from the given CSV and add the first 1000 SFW subreddits to your database by creating Subreddit objects
    # save the database
reddit_setter.py
import requests
settings_dict = {
    'sub_regex': '.*',
    # enter values for every attribute in the Settings object
    # 10 LOC
}
# use the requests module to make a post request to the reddit API settings endpoint
print(r.status_code)
We'll do it live
Note: since we are pulling data from an active website, the posts and comments may change between runs.
Reddit API Oddities
There have been several oddities in the reddit API that have been brought to my attention. First, image posts are
causing lots of problems for students; second, every once in a while you'll get a replies dictionary that is not empty
but also not formatted in the standard manner. To handle these, I recommend you make use of try/except blocks.
The cats subreddit in particular has proven problematic, so we will be lenient when grading in relation to this
specific subreddit.
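One way to follow the try/except advice for oddly formatted replies values is to funnel every lookup through a small helper that falls back to an empty list. reply_children is a hypothetical name, and the sketch assumes the usual reddit listing shape, replies['data']['children']; anything else is treated as "no replies" instead of crashing the page.

```python
def reply_children(replies) -> list:
    """Return the child comments in a replies value, or [] if there are none
    or the value is not in the expected listing shape."""
    if not replies:
        return []  # '' (or None) means the comment has no replies
    try:
        children = replies['data']['children']
    except (KeyError, TypeError):
        return []  # unexpected shape: treat it as "no replies"
    return children if isinstance(children, list) else []
```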
Submission Instructions
This assignment is due by 11:59 PM on Monday, March 27th (03/27).
To submit, please create a folder named HW3 in your dropbox. Then put your Python files, named reddit_api.py,
reddit_classes.py, db_manager.py, and reddit_setter.py, into this folder.
Assignments are programmatically collected at the due date.
Extension Policy
The course's late work policy is that late work receives no credit. However, we all live busy and active lives, and
should you feel that you cannot finish the assignment by the due date, please email me and ask for an extension.
Provided the email is polite, I will almost certainly grant the extension.
Grading Rubric
Component | Points
API follows guidelines | 10
  - Subreddit objects loaded in only once | 2
  - Landing page displays subreddit links | 2
  - Subreddit endpoint displays the subreddit's post titles | 2
  - Post endpoint displays a post's comments | 2
  - Settings endpoint modifies the Settings object | 2
Settings class follows guidelines | 10
  - properly creates SQLAlchemy table and columns | 5
  - init function properly sets the class's data | 5
Subreddit class follows guidelines | 10
  - properly creates SQLAlchemy table and columns | 2
  - init gets data dict for posts during scrape | 2
  - contains a list of Post objects as an attribute | 3
  - display prints post titles using the Post display method | 3
Post class follows guidelines | 10
  - properly creates SQLAlchemy table and columns | 2
  - uses polymorphism for the scrape function | 2
  - display differentiates between post title and post comments level | 3
  - display_comment_tree implemented recursively | 3
Database manager follows guidelines | 5
  - correctly instantiates database | 2
  - fills database with Subreddit objects | 3
Settings modifier follows guidelines | 5
  - sends a POST request to your reddit API with a new settings JSON dict | 5
Code output correct given reasonable inputs | 20
Code style | 5
Total | 75