How to feed Reddit sentiment directly into Pinecone and LangChain
A practical pattern for converting subreddit posts into compact sentiment documents with source URLs, then loading them into LangChain and Pinecone.
What this pipeline does
This guide turns Reddit posts into retrieval documents that can power product research, buyer objection mining, and market sentiment agents. The HarvestLab Reddit actor returns source URLs, engagement metrics, author context, and post text in a flat schema.
The pipeline extracts posts from selected communities, maps each post into a compact document, embeds the useful text, and stores the result in Pinecone through LangChain.
Document shape
{
  "id": "reddit:LocalLLaMA:abc123",
  "text": "Title and post body prepared for embedding.",
  "metadata": {
    "subreddit": "LocalLLaMA",
    "score": 144,
    "num_comments": 38,
    "source_url": "https://reddit.com/r/LocalLLaMA/comments/abc123",
    "captured_at": "2026-05-20T18:00:00Z"
  }
}
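The shape above can be written down as a TypeScript type with a small mapper. This is a minimal sketch, not the actor's documented schema: the input field names (`id`, `title`, `selftext`, `subreddit`, `score`, `num_comments`, `permalink`) are assumed from Reddit's usual naming and should be checked against real output, and `toSentimentDocument` is a hypothetical helper.

```typescript
// Assumed input shape; verify field names against the actor's real output.
interface RedditItem {
  id: string; // assumed: Reddit post id such as "abc123"
  title: string;
  selftext?: string | null;
  subreddit: string;
  score: number;
  num_comments: number;
  permalink: string; // assumed: a full URL or a path like "/r/<sub>/comments/<id>"
}

interface SentimentDocument {
  id: string;
  text: string;
  metadata: {
    subreddit: string;
    score: number;
    num_comments: number;
    source_url: string;
    captured_at: string;
  };
}

// Hypothetical helper: maps one post to the document shape shown above.
function toSentimentDocument(
  item: RedditItem,
  capturedAt: Date = new Date(),
): SentimentDocument {
  const body = (item.selftext ?? "").trim();
  // Keep full URLs as they are; prefix relative paths with the Reddit host.
  const sourceUrl = item.permalink.startsWith("http")
    ? item.permalink
    : `https://reddit.com${item.permalink}`;
  return {
    // Stable id, so re-capturing the same post maps to the same record.
    id: `reddit:${item.subreddit}:${item.id}`,
    // Link posts have no body, so fall back to the title alone.
    text: body ? `${item.title}\n\n${body}` : item.title,
    metadata: {
      subreddit: item.subreddit,
      score: item.score,
      num_comments: item.num_comments,
      source_url: sourceUrl,
      // ISO 8601 in UTC, with milliseconds dropped to match the example.
      captured_at: capturedAt.toISOString().replace(/\.\d{3}Z$/, "Z"),
    },
  };
}
```

Deriving the `id` from the subreddit and post id keeps it deterministic, which is useful later if the same thread is captured more than once.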
LangChain loading pattern
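// Map each Reddit item to a LangChain-style document: text to embed plus citation metadata.
const documents = redditItems.map((item) => ({
  // Title plus body; selftext can be empty for link posts, so trim the result.
  pageContent: `${item.title}\n\n${item.selftext ?? ""}`.trim(),
  metadata: {
    subreddit: item.subreddit,
    score: item.score,
    num_comments: item.num_comments, // same key as in the document shape
    // Assumes permalink may be a relative path, as in Reddit's own API; full URLs pass through.
    source_url: item.permalink.startsWith("http")
      ? item.permalink
      : `https://reddit.com${item.permalink}`,
  },
}));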
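Once the documents exist, they still have to be written to the vector store. The sketch below shows one way to do that in batches with stable ids. `VectorStoreLike` is a hypothetical structural type assumed to mirror the `addDocuments(documents, { ids })` method on LangChain vector stores, and `loadDocuments` and the batch size of 100 are illustrative choices; verify the signature against your installed LangChain version.

```typescript
// Minimal structural types so the sketch runs without installing LangChain.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Assumption: mirrors addDocuments(documents, { ids }) on LangChain vector stores.
interface VectorStoreLike {
  addDocuments(documents: Doc[], options?: { ids?: string[] }): Promise<unknown>;
}

// Hypothetical helper: write documents in batches under caller-supplied ids.
// With a store that upserts by id, re-running the pipeline overwrites a post
// instead of creating a duplicate vector.
async function loadDocuments(
  store: VectorStoreLike,
  documents: Doc[],
  ids: string[],
  batchSize = 100,
): Promise<number> {
  if (documents.length !== ids.length) {
    throw new Error("documents and ids must have the same length");
  }
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  for (let start = 0; start < documents.length; start += batchSize) {
    const end = start + batchSize;
    // Slice documents and ids with the same bounds so they stay aligned.
    await store.addDocuments(documents.slice(start, end), {
      ids: ids.slice(start, end),
    });
  }
  return documents.length;
}
```

In a real pipeline, `store` would be a Pinecone-backed LangChain vector store (for example a `PineconeStore` from `@langchain/pinecone` built with an embeddings instance and a Pinecone index), and `ids` would follow a stable scheme such as `reddit:<subreddit>:<post id>`.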
Retrieval notes
Keep the raw Reddit URL in metadata so the agent can cite the original thread. Store engagement metrics (score and comment count) in metadata alongside the embedded text so downstream ranking can prefer high-signal discussions over low-activity posts.
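One way to use the stored engagement metrics is to re-rank retrieved results after the similarity search. The sketch below assumes results arrive as `[document, similarity]` pairs where higher similarity is better, and that metadata keys follow the document shape (`score`, `num_comments`, `source_url`). `engagementBoost`, `rerank`, and the weights are hypothetical and untuned.

```typescript
interface RetrievedDoc {
  pageContent: string;
  metadata: { score: number; num_comments: number; source_url: string };
}

// Hypothetical scoring: log-damped engagement, so one viral thread does not
// drown out relevance. Comments are weighted above upvotes as an illustration.
function engagementBoost(meta: { score: number; num_comments: number }): number {
  const upvotes = Math.max(0, meta.score); // downvoted posts get no boost
  const comments = Math.max(0, meta.num_comments);
  return Math.log1p(upvotes) + 2 * Math.log1p(comments);
}

// Re-rank [document, similarity] pairs by similarity plus a weighted boost.
// Returns a new array sorted from highest to lowest combined score.
function rerank(
  results: Array<[RetrievedDoc, number]>,
  engagementWeight = 0.05,
): Array<[RetrievedDoc, number]> {
  return results
    .map(([doc, similarity]): [RetrievedDoc, number] => [
      doc,
      similarity + engagementWeight * engagementBoost(doc.metadata),
    ])
    .sort((a, b) => b[1] - a[1]);
}
```

Because `source_url` travels with each document, the agent can still cite the original thread after re-ranking. Setting `engagementWeight` to 0 falls back to pure similarity order.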