The Palos Publishing Company


Build a suggestion engine from past data

To build a suggestion engine from past data, follow this structured approach. This guide outlines the architecture, data requirements, model choices, and implementation strategy. The focus is on a content-based or collaborative filtering engine, depending on your data structure and goals.


1. Define Objective

  • What are you suggesting? (e.g., products, articles, movies)

  • Who are the users? (e.g., customers, readers, viewers)

  • What defines success? (e.g., clicks, purchases, time spent)


2. Gather and Prepare Data

Essential Data Types:

  • User Data: ID, demographics, preferences

  • Item Data: ID, attributes (e.g., category, tags, price, brand)

  • Interaction Data: User-item interactions such as:

    • Views

    • Clicks

    • Ratings

    • Purchases

    • Time spent

Preprocessing Steps:

  • Clean missing or inconsistent values

  • Normalize/standardize numerical features

  • Encode categorical variables (e.g., one-hot or label encoding)

  • Convert timestamps to datetime formats

  • Aggregate interaction metrics (e.g., total views or ratings per item)
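The preprocessing steps above can be sketched with pandas; the column names and sample values below are hypothetical, chosen only to illustrate each step:

python
import pandas as pd

# Hypothetical raw interaction log (schema and values are illustrative only)
df = pd.DataFrame({
    'user_id':   [1, 1, 2, 2, 3],
    'item_id':   ['a', 'b', 'a', 'c', 'b'],
    'category':  ['toys', 'books', 'toys', None, 'books'],
    'rating':    [5.0, 3.0, 4.0, None, 2.0],
    'timestamp': ['2024-01-01', '2024-01-02', '2024-01-02',
                  '2024-01-03', '2024-01-04'],
})

# Clean missing or inconsistent values
df['rating'] = df['rating'].fillna(df['rating'].mean())
df['category'] = df['category'].fillna('unknown')

# Standardize the numerical feature
df['rating_z'] = (df['rating'] - df['rating'].mean()) / df['rating'].std()

# One-hot encode the categorical variable
df = pd.get_dummies(df, columns=['category'])

# Convert timestamps to datetime
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Aggregate interaction metrics per item (count and mean rating)
item_stats = df.groupby('item_id')['rating'].agg(['count', 'mean'])
print(item_stats)
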


3. Choose Recommendation Approach

A. Content-Based Filtering (CBF)

  • Recommends similar items based on item features and user’s past preferences

  • Works well when user-item interaction is limited

Techniques:

  • Cosine similarity (TF-IDF for text-based data)

  • KNN on item embeddings

  • NLP models (for textual attributes like product descriptions)

B. Collaborative Filtering (CF)

  • Learns from user-item interaction patterns

  • Needs more interaction data

Techniques:

  • Memory-based CF: User-user or item-item similarity

  • Model-based CF: Matrix factorization (e.g., SVD, ALS)

  • Neural CF: Embedding layers + dense networks
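Memory-based item-item CF can be sketched with plain NumPy. The rating matrix below is a toy example (0 means no interaction); a real system would build it from the interaction data gathered in step 2:

python
import numpy as np

# Toy user-item rating matrix (rows: users, cols: items); 0 = no interaction.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

def item_item_similarity(R):
    """Cosine similarity between item columns (memory-based CF)."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1e-9  # guard against all-zero columns
    return (R.T @ R) / np.outer(norms, norms)

sim = item_item_similarity(R)

def predict_rating(R, sim, user, item):
    """Score an item as the similarity-weighted average of the user's ratings."""
    rated = R[user] > 0
    weights = sim[item, rated]
    if weights.sum() == 0:
        return 0.0
    return float(R[user, rated] @ weights / weights.sum())

print(predict_rating(R, sim, user=1, item=1))

Because the prediction is a weighted average, it always stays within the range of ratings the user has already given.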

C. Hybrid Systems

  • Combine CBF and CF

  • Blend predictions or stack models (e.g., meta-learning)
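The simplest hybrid is a weighted blend of the two models' per-item scores; the score vectors below are made up for illustration, and alpha is a tunable weight:

python
import numpy as np

# Hypothetical per-item scores from two models for one user (illustrative values)
cf_scores = np.array([0.9, 0.2, 0.5, 0.7])   # collaborative filtering
cbf_scores = np.array([0.3, 0.8, 0.6, 0.4])  # content-based filtering

def blend(cf, cbf, alpha=0.6):
    """Weighted blend: alpha controls how much weight CF gets over CBF."""
    return alpha * cf + (1 - alpha) * cbf

hybrid = blend(cf_scores, cbf_scores)
ranking = np.argsort(hybrid)[::-1]  # item indices, best first
print(ranking)

In practice alpha is tuned on held-out data, or replaced by a meta-model that learns how to weight each base model per user.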


4. Model Building

Example: Matrix Factorization with SVD (Collaborative Filtering)

python
# Assumes df is a pandas DataFrame with columns ['user_id', 'item_id', 'rating']
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split
from surprise import accuracy

# Load data
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['user_id', 'item_id', 'rating']], reader)

# Train-test split
trainset, testset = train_test_split(data, test_size=0.2)

# Train SVD model
model = SVD()
model.fit(trainset)

# Predictions and evaluation
predictions = model.test(testset)
print("RMSE:", accuracy.rmse(predictions))

Example: Content-Based Filtering with Cosine Similarity

python
# Assumes df is a pandas DataFrame with an 'item_description' text column
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(stop_words='english')
item_features = tfidf.fit_transform(df['item_description'])
cos_sim = cosine_similarity(item_features, item_features)

# Get recommendations
def get_similar_items(item_index, top_n=5):
    sim_scores = list(enumerate(cos_sim[item_index]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    return [i[0] for i in sim_scores[1:top_n + 1]]

5. Evaluate the Model

Metrics:

  • Offline:

    • Precision@K, Recall@K

    • Mean Average Precision (MAP)

    • Root Mean Squared Error (RMSE, for rating prediction)

  • Online:

    • Click-Through Rate (CTR)

    • Conversion Rate

    • A/B testing
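Precision@K and Recall@K are straightforward to compute offline. A sketch for a single user (the item IDs below are illustrative; in practice you would average these metrics over all users):

python
def precision_recall_at_k(recommended, relevant, k=5):
    """Compute Precision@K and Recall@K for one user.

    recommended: ranked list of item IDs produced by the model
    relevant: set of item IDs the user actually interacted with
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Illustrative example: 2 of the 5 recommendations ('b' and 'd') are relevant
recommended = ['a', 'b', 'c', 'd', 'e']
relevant = {'b', 'd', 'x'}
p, r = precision_recall_at_k(recommended, relevant, k=5)
print(p, r)  # precision 2/5 = 0.4, recall 2/3
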


6. Serve Recommendations (Production)

  • Use a web API (e.g., Flask/FastAPI) to serve real-time suggestions

  • Cache frequent queries using Redis or Memcached

  • Use a database (e.g., PostgreSQL, MongoDB) for storing user history

Example: Flask-based API

python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/recommend', methods=['GET'])
def recommend():
    user_id = request.args.get('user_id')
    # get top 5 item IDs for this user
    # (get_recommendations is your trained model's lookup, defined elsewhere)
    recommended_items = get_recommendations(user_id)
    return jsonify(recommended_items)

7. Personalize Suggestions Over Time

  • Track user activity (views, likes, purchases)

  • Store updated interaction data

  • Retrain model on a regular basis (batch/real-time)

  • Consider reinforcement learning for adapting suggestions dynamically


8. Advanced Enhancements

  • Embedding models: Use deep learning for richer representations (Word2Vec, BERT for textual data, or autoencoders)

  • Knowledge Graphs: Add contextual relationships between items

  • Session-based Recommender Systems: Use RNNs for sequential user behavior


9. Tools and Libraries

  • Surprise – Simple CF algorithms

  • LightFM – Hybrid recommendation models (CBF + CF)

  • Implicit – Matrix factorization for implicit-feedback datasets

  • Scikit-learn – Similarity models, clustering

  • TensorFlow / PyTorch – Deep learning recommendations

  • Faiss – Fast nearest-neighbor search for large vector datasets
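As a minimal sketch of the Faiss entry above (assuming the faiss-cpu package is installed; the dimension and vectors are arbitrary toy data):

python
import numpy as np
import faiss  # pip install faiss-cpu

d = 32                                    # embedding dimension
rng = np.random.default_rng(0)
item_vecs = rng.random((1000, d)).astype('float32')  # toy item embeddings

index = faiss.IndexFlatL2(d)              # exact L2 search; use IVF/HNSW at scale
index.add(item_vecs)

# Query with the first 5 items; each item's nearest neighbor is itself
distances, ids = index.search(item_vecs[:5], 3)
print(ids[:, 0])

For large catalogs, the exact IndexFlatL2 is typically swapped for an approximate index so lookups stay fast as the item count grows.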


10. Scalability Considerations

  • Batch preprocessing for large datasets (Apache Spark, Dask)

  • Use vector databases (e.g., Pinecone, Weaviate) or libraries like FAISS for fast nearest-neighbor lookup

  • Shard models/data for distributed inference


This approach creates a foundation for building a robust, personalized suggestion engine driven by past data. You can expand or fine-tune depending on your domain, such as e-commerce, media, education, or content platforms.
