Create a personal search engine

Creating a personal search engine can range from a simple script that searches through a set of your files or favorite websites, to a more advanced system with indexing, ranking, and a user-friendly interface. Below is a basic but functional guide to building a personal search engine using Python and Flask. It will index and search content from websites or text files.


Prerequisites:

  • Basic Python knowledge

  • Python 3.x installed

  • Libraries: Flask, beautifulsoup4, requests, whoosh (for indexing and searching)


🧠 Step-by-Step Guide


Step 1: Install Required Libraries

bash
pip install flask beautifulsoup4 requests whoosh

Step 2: Web Crawler & Content Indexer

This will scrape content from a set of predefined websites and index it.

python
# crawler.py
import os

import requests
from bs4 import BeautifulSoup
from whoosh.index import create_in
from whoosh.fields import Schema, TEXT, ID


def get_content(url):
    """Fetch a page and return its title and concatenated paragraph text."""
    try:
        res = requests.get(url, timeout=10)
        res.raise_for_status()
        soup = BeautifulSoup(res.text, 'html.parser')
        title = soup.title.string if soup.title else url
        body = ' '.join(p.text for p in soup.find_all('p'))
        return title, body
    except requests.RequestException:
        return "", ""


def build_index():
    schema = Schema(title=TEXT(stored=True), content=TEXT,
                    url=ID(stored=True, unique=True))
    if not os.path.exists("indexdir"):
        os.mkdir("indexdir")
    ix = create_in("indexdir", schema)
    writer = ix.writer()
    urls = [
        'https://en.wikipedia.org/wiki/Web_search_engine',
        'https://realpython.com/',
        # Add more URLs here
    ]
    for url in urls:
        title, content = get_content(url)
        if content:
            writer.add_document(title=title, content=content, url=url)
    writer.commit()


if __name__ == '__main__':
    build_index()

Step 3: Flask-Based Search Interface

python
# app.py
from flask import Flask, request, render_template_string
from whoosh.index import open_dir
from whoosh.qparser import QueryParser

app = Flask(__name__)

TEMPLATE = """
<!doctype html>
<title>My Personal Search Engine</title>
<h2>Search</h2>
<form method=post>
  <input type=text name=query size=50>
  <input type=submit value=Search>
</form>
{% if results %}
  <h3>Results for: {{ query }}</h3>
  <ul>
  {% for r in results %}
    <li><a href="{{ r['url'] }}">{{ r['title'] }}</a></li>
  {% endfor %}
  </ul>
{% endif %}
"""


@app.route('/', methods=['GET', 'POST'])
def search():
    results = []
    query = ""
    if request.method == 'POST':
        query = request.form['query']
        ix = open_dir("indexdir")
        with ix.searcher() as searcher:
            qp = QueryParser("content", schema=ix.schema)
            q = qp.parse(query)
            hits = searcher.search(q, limit=10)
            results = [{'title': h['title'], 'url': h['url']} for h in hits]
    return render_template_string(TEMPLATE, results=results, query=query)


if __name__ == '__main__':
    app.run(debug=True)

Step 4: Build Index and Run Server

bash
# Build the index first, then start the local search engine
python crawler.py
python app.py

⚙️ Optional Features to Add

  1. File System Search: Index .txt, .pdf, or .docx files from local folders.

  2. Natural Language Query Expansion: Integrate with NLP tools like spaCy or GPT for smarter queries.

  3. Ranking Algorithm: Score results based on frequency, recency, or manual weights.

  4. Bookmarking & Annotation: Let users save or annotate search results.

  5. Authentication: Secure your engine with a login.
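The first idea, file system search, can be prototyped without Whoosh at all. Below is a minimal stand-in using only the standard library (the `.txt`-only filter and the simple word tokenizer are illustrative assumptions, not part of the guide above):

```python
# file_search.py - toy local-file search using an in-memory inverted index
import os
import re
from collections import defaultdict


def build_file_index(folder):
    """Map each lowercase word to the set of .txt files containing it."""
    index = defaultdict(set)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if not name.endswith(".txt"):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                for word in re.findall(r"[a-z0-9]+", f.read().lower()):
                    index[word].add(path)
    return index


def search_files(index, query):
    """Return the files containing every query word (AND semantics)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for w in words[1:]:
        results &= index.get(w, set())
    return results
```

To support .pdf or .docx as well, the same structure works once a text extractor (e.g. a PDF parsing library) replaces the plain `open()` call.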


🔐 Security Note

If you deploy the engine online, add security layers such as authentication, rate limiting, and input validation so it is not left open to abuse or used to crawl on someone else's behalf.
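One such layer is a shared-secret token check before running any query. A minimal stdlib sketch (the token value and the idea of reading it from a request header are placeholders, not prescribed by this guide); note that token comparison should be constant-time:

```python
# auth_check.py - constant-time shared-secret check (illustrative)
import hmac

API_TOKEN = "change-me"  # placeholder; load from an environment variable in practice


def is_authorized(presented_token):
    """Compare tokens in constant time to resist timing attacks."""
    return hmac.compare_digest(presented_token, API_TOKEN)
```

In the Flask app this could guard the search route by checking a header such as `Authorization` before the query is parsed.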


This simple search engine can be expanded into a more powerful tool with additional data sources, metadata indexing, and AI-driven ranking.
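As a taste of the frequency-plus-manual-weights ranking mentioned in the optional features, results can be re-scored outside Whoosh with a toy scorer. The per-site boost values below are made-up assumptions for illustration:

```python
# rank.py - toy scorer: query-term frequency times a manual source weight
import re

# Hypothetical per-site boosts; tune to taste.
SOURCE_WEIGHTS = {"en.wikipedia.org": 1.5, "realpython.com": 1.2}


def score(doc_text, doc_url, query):
    """Count query-term occurrences in the document, then boost trusted sources."""
    words = re.findall(r"[a-z0-9]+", doc_text.lower())
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    tf = sum(1 for w in words if w in terms)
    weight = next((v for host, v in SOURCE_WEIGHTS.items() if host in doc_url), 1.0)
    return tf * weight


def rank(docs, query):
    """docs: list of (text, url) pairs; returns URLs best-first."""
    return [url for _, url in
            sorted(((score(text, url, query), url) for text, url in docs),
                   reverse=True)]
```

A production ranker would normalize for document length and add recency, but the shape of the solution, score each hit and sort, stays the same.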
