Markdown files scraped from the web are often masses of noise, filled with advertisements and menu bars. When this kind of text is mixed in, RAG (Retrieval-Augmented Generation) performance drops noticeably. While unrefined data distracts the model's attention, cleanly organized Markdown significantly improves search accuracy. Cutting out unnecessary text also reduces the tokens consumed by local LLMs by over 30%, so you won't be wasting money.
You should use Python's BeautifulSoup library to strip away this noise.
- Run `pip install beautifulsoup4 lxml` in the terminal.
- Use the `.decompose()` method to remove elements with unnecessary CSS classes like `.ad-container` or `.nav-menu` entirely.
- Parse the page with the `lxml` parser, save it as Markdown, and move it into your Obsidian vault.

When files grow into the hundreds, even the smartest models lose context. Instead of just piling files up, divide areas based on the status of the information. I use a three-stage structure (`01_Raw_Inbox`, `02_Processed_Wiki`, `03_Project_Action`) adapted from the PARA framework. This gives Claude Code a physical guideline for which information it should trust and reference.
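To connect the cleanup steps with this folder layout, here is a minimal sketch, not a definitive implementation. It removes the noisy elements with `decompose()`, keeps only headings and paragraphs as very basic Markdown, and writes the result into `01_Raw_Inbox`. The function names `html_to_markdown` and `save_to_inbox` are hypothetical, the heading-and-paragraph conversion is a simplification I am assuming (the text does not name a converter), and the fallback to `html.parser` is only there in case `lxml` is not installed.

```python
from pathlib import Path

from bs4 import BeautifulSoup, FeatureNotFound

# Class selectors taken from the text; adjust them for each site you scrape.
NOISE_SELECTORS = [".ad-container", ".nav-menu"]


def html_to_markdown(html: str) -> str:
    """Strip noisy elements, then emit headings and paragraphs as Markdown."""
    try:
        soup = BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # Assumption: fall back to the stdlib parser if lxml is missing.
        soup = BeautifulSoup(html, "html.parser")

    # decompose() removes the element and everything inside it from the tree.
    for selector in NOISE_SELECTORS:
        for node in soup.select(selector):
            node.decompose()

    blocks = []
    for el in soup.find_all(["h1", "h2", "h3", "p"]):
        text = el.get_text(" ", strip=True)
        if not text:
            continue
        if el.name == "p":
            blocks.append(text)
        else:
            # "h2" -> "## ", using the digit in the tag name as the level.
            blocks.append("#" * int(el.name[1]) + " " + text)
    return "\n\n".join(blocks) + "\n"


def save_to_inbox(markdown: str, vault: Path, file_name: str) -> Path:
    """Write the cleaned note into the vault's 01_Raw_Inbox folder."""
    inbox = vault / "01_Raw_Inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    target = inbox / file_name
    target.write_text(markdown, encoding="utf-8")
    return target
```

Lists, links, and tables are dropped by this sketch, so for real pages you would likely swap the conversion step for a dedicated HTML-to-Markdown converter.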
Use file names and terminal options to prevent the agent from wandering.
- Prefix file names with `YYYY-MM-DD` to let the model know how fresh the information is.
- Use the `--newer-than` option to make it read only files changed within the last 24 hours.
- [...] `03_Project_Action` folder.

Simple text searches cannot distinguish whether a document is "important" or a "completed task." This is why you should include YAML frontmatter at the top of your documents. With metadata, you can give Claude Code much more precise commands.
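The naming and recency rules from the list above can be sketched in Python as follows. This does not model the `--newer-than` option itself; it only illustrates the same idea using file modification times. The helper names `dated_name` and `recent_notes`, and the underscore between date and title, are my own assumptions.

```python
import re
import time
from datetime import date
from pathlib import Path


def dated_name(title: str, day: date) -> str:
    """Build a file name with a YYYY-MM-DD prefix and a lowercase slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{day.isoformat()}_{slug}.md"


def recent_notes(folder: Path, max_age_hours: float = 24.0) -> list[Path]:
    """Return Markdown files under `folder` modified within the time window."""
    cutoff = time.time() - max_age_hours * 3600
    return sorted(
        path for path in folder.rglob("*.md")
        if path.stat().st_mtime >= cutoff
    )
```

Calling `recent_notes(Path("03_Project_Action"))` would then give you the short list of files worth handing to the agent, instead of the whole vault.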
For a knowledge entrepreneur's work engine, three fields are sufficient:
- Add `topic` for categorization, `source_importance` for priority, and `status` for task state at the top of the note.

The terminal-based Claude Code truly shows its power when paired with shell scripts. With a single command when you start work, you get an engine that analyzes what you studied yesterday and even drafts the emails you need to send today. There is no need to waste energy wondering what to do first every morning.
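To show how the three frontmatter fields could drive note selection, here is a rough sketch under a few assumptions. The parser only handles flat `key: value` lines between `---` delimiters (a real YAML library would be more robust), and the values `high` and `done` are hypothetical, since the text does not specify which values each field takes.

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Parse a flat `key: value` frontmatter block delimited by --- lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("\"'")
    return {}  # no closing delimiter, so treat the note as having no frontmatter


def is_priority_task(fields: dict[str, str]) -> bool:
    """Assumed rule: high-importance notes that are not finished yet."""
    return (
        fields.get("source_importance") == "high"
        and fields.get("status") != "done"
    )


note = """---
topic: rag
source_importance: high
status: in_progress
---
Chunking experiments for the wiki.
"""

fields = read_frontmatter(note)
print(fields["topic"], is_priority_task(fields))
```

With fields like these in place, an instruction such as "summarize only the unfinished high-importance notes" becomes something the agent can check mechanically rather than guess from the body text.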
Set up the automation routine as follows:
- Create a script file (`.sh` or `.bat`) containing the `claude --bare` command to speed up initial startup.
- Add the `find -mtime -1` command to the script to pass only the notes modified within the last day to Claude.

Once files exceed a thousand, even a 200k-token context window fills up quickly. From this point, instead of having it read every file, use a two-step method in which it first looks at `master_index.md`, which acts as a master map, and only then opens the files it needs. This method reduces the number of API calls by nearly 60%.
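One way the two-step lookup might work is sketched below. The index format (one `path | first line` row per note) and the function names `build_master_index` and `lookup` are assumptions for illustration; the text does not describe what `master_index.md` actually contains.

```python
from pathlib import Path


def build_master_index(vault: Path) -> str:
    """Build one `relative/path.md | first non-empty line` row per note."""
    rows = []
    for note in sorted(vault.rglob("*.md")):
        if note.name == "master_index.md":
            continue  # do not index the index itself
        content = note.read_text(encoding="utf-8")
        first = next(
            (line.strip() for line in content.splitlines() if line.strip()), ""
        )
        # Drop leading heading markers so the row holds just the title text.
        title = first.lstrip("# ")
        rows.append(f"{note.relative_to(vault).as_posix()} | {title}")
    return "\n".join(rows) + "\n"


def lookup(index_text: str, keyword: str) -> list[str]:
    """Step 1: scan only the index and return paths of matching rows."""
    needle = keyword.lower()
    return [
        row.split(" | ", 1)[0]
        for row in index_text.splitlines()
        if needle in row.lower()
    ]
```

Step 2 is then to open only the paths that `lookup` returns, so the bulk of the vault never enters the context window. Notes that start with frontmatter would need the title taken from a later line, which this sketch does not handle.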
To maintain performance, you must manage context smartly.
- Use the `/compact` command to summarize history if the conversation gets long.