CSE 30332 - HW1
Programming Paradigms

In this assignment you will use functional programming tools in Python
such as map, list comprehensions, lambda functions, and others to write a
command line tool to scrape Reddit.
Reddit Review

Peter likes to sit in the back of the class.  It has its perks:

- He can beat the rush out the door when class ends.

- He can see everyone browsing Facebook, playing video games, watching
  YouTube, or doing homework.

- He feels safe from being called upon by the instructor... except when he
  does that strange thing where he goes around the class and tries to talk to
  people.  Totally weird.

That said, sitting in the back has its downsides:

- He can never see what the instructor is writing because he has terrible
  handwriting and always writes too small.

- He is prone to falling asleep because the instructor is really boring and
  the class is not as interesting as his other computer science courses.

To combat his boredom, Peter typically just browses Reddit.  His favorite
subreddits are AdviceAnimals, aww, todayilearned, and of course
UnixPorn.  Lately, however, Peter has grown paranoid that his web browser
is leaking information about him, and so he wants to be able to get the
latest links from Reddit directly in his terminal.
                 
                
Requests Module

Fortunately for Peter, Reddit provides a JSON feed for every subreddit.
You simply need to append .json to the end of each subreddit's URL.  For
instance, the JSON feed for todayilearned can be found here:
https://www.reddit.com/r/todayilearned/.json
 
To fetch that data, Peter uses the Requests package in Python to access
the JSON data:
r = requests.get('https://www.reddit.com/r/todayilearned/.json')
print(r.json())
 
 429 Too Many Requests
Reddit tries to prevent bots from accessing its website too often.  To work
around any 429: Too Many Requests errors, we can trick Reddit by
specifying our own user agent:
headers  = {'user-agent': 'reddit-{}'.format(os.environ.get('USER', 'cse-30332-sp23'))}
response = requests.get(url, headers=headers)
 
This should allow you to make requests without getting the dreaded 429
error.
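
Putting the two snippets together, a minimal sketch of the full fetch might
look like the following (the user agent string is the one shown above;
raise_for_status() is the standard Requests call that turns a 429 or any
other HTTP error into an exception):

import os
import requests

url      = 'https://www.reddit.com/r/todayilearned/.json'
headers  = {'user-agent': 'reddit-{}'.format(os.environ.get('USER', 'cse-30332-sp23'))}
response = requests.get(url, headers=headers)
response.raise_for_status()   # raise an exception on 429 or any other HTTP error
data     = response.json()    # parse the JSON body into Python dicts and lists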
                 
The code above would output something like the following:

{"kind": "Listing", "data": {"modhash": "g8n3uwtdj363d5abd2cbdf61ed1aef6e2825c29dae8c9fa113", "children": [{"kind": "t3", "data": ...

Looking through that stream of text, Peter sees that the JSON data is a
collection of structured or hierarchical dictionaries and lists.  This
looks a bit complex to him, so he wants you to help him complete the
reddit.py script which fetches the JSON data for a specified
subreddit or URL and allows the user to sort the articles by various
fields, restrict the number of items displayed, and even shorten the URLs
of each article.
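
For a sense of how that nesting unpacks, here is a short sketch assuming the
listing layout shown above (data -> children, where each child wraps its own
data dict holding fields such as title and score):

listing = r.json()                                                   # the dict printed above
posts   = [child['data'] for child in listing['data']['children']]   # one dict per post
titles  = [post['title'] for post in posts]                          # e.g. pull out the titles
print(titles[:3])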
                
Command Line Arguments

The reddit.py script takes the following arguments:

    --subreddits SUB1,SUB2,SUB3,...SUBN   The list of subreddits to scrape, delimited by commas
    --num        LIMIT                    Number of articles to display per subreddit (default: 5)
    --regex      REGEX                    A regex to use to filter posts
    --attr       ATTR                     Field to sort articles by (default: score)
    --reverse                             Include this flag to reverse the output

- The --subreddits flag specifies the list of subreddits to scrape; it is comma delimited and can be of variable length.
- The --num flag specifies the number of titles to display per subreddit; the default is 5.
- The --regex flag specifies the regex used to filter the titles.
- The --attr flag specifies the field to sort the titles by.
- The --reverse flag specifies whether the output should be printed in reverse order; a sketch of declaring these flags with argparse is shown below.
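
Here is a minimal sketch of how these flags might be declared with argparse;
the scaffold below already includes the first one, and the '.*' default for
--regex is an assumption rather than something the assignment specifies:

parser = argparse.ArgumentParser()
parser.add_argument('--subreddits',                    help='Comma separated list of subreddits to scrape')
parser.add_argument('--num',     type=int, default=5,  help='Number of articles to display per subreddit')
parser.add_argument('--regex',   default='.*',         help='Regex used to filter post titles')
parser.add_argument('--attr',    default='score',      help='Field to sort articles by')
parser.add_argument('--reverse', action='store_true',  help='Reverse the output')
args = parser.parse_args()
subreddits = args.subreddits.split(',')                # variable-length, comma delimited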
                    
                 
                
                    
Code Overview and Scaffold

import requests
import os
import re
import argparse
from functools import partial
from typing import Generator


def scraper(sub: str) -> list:
    '''Use the Reddit API to get the JSON for a single subreddit.

    Args:
        sub (str): a subreddit name in the form of a string

    Returns:
        list: A list of dicts containing the posts on the subreddit
    '''


def searcher(num: int, regex: str, post_list: list) -> list:  # 1 line
    '''Use the supplied regex to filter the titles of posts in the subreddit.

    Args:
        num (int): the number of posts to return out of the filtered set
        regex (str): the regex with which to filter the post titles
        post_list (list): a list of dicts for each post in the subreddit

    Returns:
        list: A list of NUM dicts for posts on the subreddit with titles
              matching REGEX
    '''


def sorter(attr: str, dir: bool, post_list: list) -> list:  # 1 line
    '''Sort the filtered posts based on ATTR.

    Args:
        attr (str): The dictionary key on which to sort the posts
        dir (bool): A boolean for the direction in which to sort (asc/desc)
        post_list (list): A filtered list of posts in the subreddit

    Returns:
        list: A list of dicts containing the posts on the subreddit
              sorted by ATTR
    '''


def formatter(post_list: list) -> Generator[str, None, None]:  # 1 line
    '''Return a nicely formatted string for each remaining post in your list.

    Args:
        post_list (list): The list of posts that have been filtered and sorted

    Returns:
        Generator: A generator yielding the strings
    '''


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--subreddits", help="A comma separated list of subreddits to scrape")
    # Parse the rest of your arguments here
    # Use partial to create closures for two of your functions
    # Use nested maps to call your functions
    # Print out your formatted posts
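
As a hint for the comments in the main block, here is a toy illustration of
how functools.partial builds closures and how nested map calls compose; scale
and shift are made-up helpers used only to show the pattern, not part of the
assignment:

from functools import partial

def scale(factor, x):
    return factor * x

def shift(offset, x):
    return offset + x

double  = partial(scale, 2)              # closure fixing the first argument
add_ten = partial(shift, 10)

# nested maps: the inner map feeds the outer map lazily, one item at a time
results = map(add_ten, map(double, [1, 2, 3]))
print(list(results))                     # [12, 14, 16]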
                        
                    
                 
                
Example Output

Here are some examples of reddit.py in action:

# Show posts matching 'cat' from several subreddits
hw1 ➜ python3 reddit_scrape.py --subreddits ultimate,houseplants,cats --num 5 --regex '.*cat.*'
Show me your most adorable pictures of your cat/cats -- 0.96
Hi! This is stray cat I've made friends with this summer, now it's colder so I let him stay at home. He often sleeps like this, face down, is this normal? Looks both depressing but also kinda cute. -- 0.98
I'm sick and was taking a nap. Woke up to this. I don't have a cat. -- 0.98
Here are some great resources for answering common questions about feline aggression. And remember, it's always best to talk with your veterinarian about specific issues regarding your cat(s)!🐱❤️ -- 0.99
Stinko's Tumor Battle [UPDATE] I made a video of his journey for the reddit cat community! -- 1.0
 
We'll do it live

Note: since we are pulling data from an active website, the articles may
change between runs.
 
                 
                
                    
Submission Instructions

This assignment is due by 11:59 PM on Monday, February 6th (02/06).
To submit, please create a folder named HW1 in your dropbox.  Then put
your Python file, named reddit_scrape.py, into this folder.  Assignments
are programmatically collected at the due date.
                        
                     
                 
                
                    
Grading Rubric

| Component                                                                   | Points |
|-----------------------------------------------------------------------------|--------|
| Scraper function follows guidelines: Requests module                        | 5      |
| Searcher function follows guidelines: list comprehension, regex, 1 LOC      | 5      |
| Sorter function follows guidelines: lambda function, 1 LOC                  | 5      |
| Formatter function follows guidelines: generator, 1 LOC                     | 5      |
| Main function follows guidelines: argument parsing, map calls               | 5      |
| Code runs without errors using different arguments and inputs               | 25     |
| Code output correct given reasonable inputs                                 | 20     |
| Code style                                                                   | 5      |
| Total                                                                        | 75     |