Reddit Comment Summarizer - A LangChain + ChatGPT API Project
Introduction
Hate reading through all the comments on Reddit? Want to know quickly what people are discussing?
I used Reddit's API and LangChain to build a Reddit post comment summarizer. This web application lets you input any Reddit post and summarize the content of its comment threads with a click of a button. The goal of this project is to promote information literacy and help users be smart, critical, and unbiased when consuming online information.
Working principle
The comments scraped from Reddit are filtered and formatted into a prompt string, which is then sent to OpenAI's ChatGPT API through the LangChain framework. The prompt contains all the comment content of interest, and GPT produces a targeted summary guided by prompt engineering that I designed. GPT's output is then sent back to the web app and displayed.
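The pipeline can be sketched end to end as below, with a stubbed LLM call. The helper names (`filter_comments`, `build_prompt`, `call_llm`) are illustrative, not the project's actual function names; the real app sends the prompt to OpenAI's ChatGPT API through LangChain instead of the stub shown here.

```python
# Sketch of the scrape -> filter -> prompt -> summarize flow (hypothetical names).

def filter_comments(comments, max_chars=4000):
    """Drop empty or deleted comments and cap the total prompt size."""
    kept, total = [], 0
    for body in comments:
        body = body.strip()
        if not body or body in ("[deleted]", "[removed]"):
            continue
        if total + len(body) > max_chars:
            break
        kept.append(body)
        total += len(body)
    return kept

def build_prompt(title, comments):
    """Number the comments and join them into one prompt string."""
    discussion = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(comments))
    return f'Summarize the comments on the Reddit post "{title}":\n{discussion}'

def call_llm(prompt):
    # Stub: the real implementation sends the prompt to GPT via LangChain.
    return f"(summary of a {len(prompt)}-character prompt)"

comments = ["Great point!", "[deleted]", "I disagree, because..."]
prompt = build_prompt("Example post", filter_comments(comments))
summary = call_llm(prompt)
```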
Demo
Python files: Reddit summarizer
The algorithm behind it
First, I use Reddit's API to scrape comment text, performing a nested search on each comment thread. The algorithm can be configured to scrape to a specific depth of comment threads, a specific number of comment threads, or by specific commenters.
Here is the function that scrapes the comments:
```python
def scrape_comments(self):
    dp = DebugPrinter(working=self.debug)
    submission = self.submission
    if submission is None:  # was `type(submission) == None`, which is always False
        return "No post submission retrieved!"
    self.post_title = submission.title
    # ------ scrape post
    terminate = False
    total_token_count = 0
    # self.token_limit defaults to 7000, leaving room under the ~8000-token limit
    queue = []
    queue_set_pointer = 1  # scrape first-layer comments in sets of num_comment_layer[0]
    back_up_queue = []
    scraped_comments = []
    # self.num_comment_layer defaults to [10, 10, 3, 3, 3, 3, 10, 10, 10, 10]
    # self.max_depth defaults to 7
    depth = 0
    while not terminate:
        dp.dprint(f"depth: {depth}, queue len: {len(queue)}")
        indent = "| " * depth
        next_queue = []
        if len(queue) == 0:
            if len(back_up_queue) > 0:
                dp.dprint("#\n" * 3 + "." * 10 + f"scraping backup queue #{len(back_up_queue)}")
                queue = back_up_queue
                back_up_queue = []
            else:
                # refill the queue with the next set of top-level comments
                depth = 0
                indent = "| " * depth
                for i in range((queue_set_pointer - 1) * self.num_comment_layer[0],
                               queue_set_pointer * self.num_comment_layer[0]):
                    if i >= len(submission.comments):
                        terminate = True
                        break
                    comment = self.MyComment(submission.comments[i], [i + 1], depth)
                    queue.append(comment)
                queue_set_pointer += 1
        for i, que in enumerate(queue):
            # scrape queued comments
            try:
                body = que.comment.body
            except AttributeError:  # e.g. a MoreComments stub with no body
                dp.dprint("")
                continue
            this_comment = que.comment
            dp.dprint(f"{indent} this comment pos: {que.pos}, total_token_before: {total_token_count}")
            dp.dprint(f"{indent} comment: {body[:20]}...")
            scraped_comments.append(que)
            token_count = str_token_count(this_comment.body)
            total_token_count += token_count
            if total_token_count > self.token_limit:
                terminate = True
                break
            try:
                _ = this_comment.replies[0]  # probe: does this comment have replies?
            except (IndexError, AttributeError):
                dp.dprint("")
                continue
            if depth >= self.max_depth:
                dp.dprint("")
                continue
            # add new comments to the queue
            if type(this_comment.replies[0]) == praw.models.reddit.more.MoreComments:
                replies = this_comment.replies[0].comments(0)
            else:
                replies = this_comment.replies
            dp.dprint(f"reply type: {type(this_comment.replies[0])}")
            for j, reply in enumerate(replies):
                dp.dprint(f"{indent} reply # {j + 1}")
                reply_pos = que.pos.copy()
                reply_pos.append(j + 1)
                queued_comment = self.MyComment(reply, reply_pos, depth)
                if j + 1 <= self.num_comment_layer[depth + 1]:
                    dp.dprint(" *queued")
                    next_queue.append(queued_comment)
                else:
                    dp.dprint("")
                    back_up_queue.append(queued_comment)
            dp.dprint("")
        queue = next_queue
        depth += 1
    return scraped_comments
```
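The token-budget check above relies on a `str_token_count` helper whose implementation isn't shown. A precise version would use OpenAI's tiktoken tokenizer; the stand-in below uses the common heuristic of roughly four characters per token for English text, which is enough to reproduce the budgeting logic:

```python
# Rough stand-in for the str_token_count helper used by scrape_comments.
# A precise count would come from a tokenizer (e.g. OpenAI's tiktoken);
# this uses the ~4-characters-per-token rule of thumb.
def str_token_count(text: str) -> int:
    return max(1, len(text) // 4)
```

The scraper stops queueing comments once the running total exceeds `self.token_limit`, leaving headroom under the model's ~8000-token context window.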
Then, I filter and format the scraped information to be sent to GPT. The comments can be grouped by author, by index, or by a range of top-level ("prime") comments.
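A grouping step like this can be sketched as follows. Here the scraped comments are simplified to `(author, body)` pairs; the real code works with the `MyComment` objects returned by the scraper.

```python
from collections import defaultdict

def group_by_author(pairs):
    """Group (author, body) pairs into {author: [bodies...]}."""
    groups = defaultdict(list)
    for author, body in pairs:
        groups[author].append(body)
    return dict(groups)

def format_discussion(groups):
    """Render grouped comments as the text inserted at {discussion}."""
    lines = []
    for author, bodies in groups.items():
        lines.append(f"{author}:")
        lines.extend(f"  - {body}" for body in bodies)
    return "\n".join(lines)

groups = group_by_author([("alice", "First!"), ("bob", "Nice post."), ("alice", "Also this.")])
```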
The Prompt
The scraped comments are inserted into the prompt as a chunk of text at the {discussion} placeholder. The prompt is then sent to GPT to generate the summary.
Below is the prompt structure.
```python
template_3 = """Generate an information summary of a Reddit post called "{title}" based on all the comments.
The overview should include a summary and a reflection.
For the summary, write a point-form summary of the discussion.
- Help me get a comprehensive understanding of the key points of discussion.
- When you mention post-specific nouns and words, explain clearly what they mean in context.
- Reference the discussion number where possible.
For the reflection, write a reflection after reading the discussion.
- Be a reading mentor for me. Analyze the discussion critically.
- Provide unique and insightful opinions on the discussion. Critique biases and highlight high-quality arguments.
Here are the comments:
{discussion}
Output the following response in markdown format.
- summary:
- reflection:"""
```
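Filling the template is a plain string substitution on the {title} and {discussion} placeholders. The project does this through LangChain's prompt-template machinery, but Python's `str.format` shows the same mechanics. The template below is an abbreviated copy of `template_3`, and the title and discussion values are made-up examples.

```python
# Abbreviated copy of template_3; the full version adds the
# summary/reflection instructions between the first and last lines.
template = """Generate an information summary of a Reddit post called "{title}" based on all the comments.
Here are the comments:
{discussion}"""

prompt = template.format(
    title="What laptop should I buy?",  # made-up example title
    discussion="1. Get a ThinkPad.\n2. Battery life matters most.",
)
```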