## Architecture

```mermaid
---
config:
  flowchart:
    curve: basis
---
flowchart TB
    __start__(["<p>__start__</p>"]) --> analyze_query("analyze_query")
    analyze_query -. &nbsp;exit&nbsp; .-> __end__(["<p>__end__</p>"])
    analyze_query -. &nbsp;seeking_information or comparison or recommendation&nbsp; .-> check_profile_completeness("check_profile_completeness")
    analyze_query -. &nbsp;smalltalk or greeting&nbsp; .-> handle_greeting("handle_greeting")
    build_retrieval_filter("build_retrieval_filter") --> retrieve_documents("retrieve_documents")
    check_profile_completeness -. &nbsp;missing&nbsp; .-> ask_clarifying_questions("ask_clarifying_questions")
    ask_clarifying_questions --> analyze_query
    check_profile_completeness -. &nbsp;complete&nbsp; .-> build_retrieval_filter
    retrieve_documents --> generate_answer("generate_answer")
    generate_answer --> __end__
    handle_greeting --> __end__
    __start__:::first
    __end__:::last
    classDef default fill:#f2f0ff,line-height:1.2
    classDef first fill-opacity:0
    classDef last fill:#bfb6fc
```
---

> **Note:** The `ask_clarifying_questions` node triggers an interrupt that prompts the user to supply the missing profile fields; once the input is received, execution loops back to `analyze_query`.
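The branching after `analyze_query` can be sketched in plain Python. The node names below come from the diagram; the function itself and the exact intent strings are illustrative, not the project's actual routing code:

```python
def route_after_analysis(intent: str) -> str:
    """Pick the next node based on the classified intent.

    Mirrors the conditional edges leaving `analyze_query` in the
    diagram above (node names from the diagram; logic illustrative).
    """
    if intent in {"seeking_information", "comparison", "recommendation"}:
        return "check_profile_completeness"
    if intent in {"smalltalk", "greeting"}:
        return "handle_greeting"
    # Anything else falls through to the "exit" edge.
    return "__end__"
```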

## Key Components

* **Query Analysis**
  Classifies user intent (information-seeking, comparison, recommendation, greeting, etc.) and extracts relevant constraints in a single LLM call.

* **Human-in-the-Loop**
  If required profile constraints are missing, the system pauses execution and asks targeted clarifying questions before proceeding.

* **Retrieval (RAG)**
  Uses vector search to retrieve relevant chunks from:

  * University JSON datasets
  * Blogs
  * Excel files

* **Generation**
  The LLM synthesizes the final response using:

  * User profile constraints
  * Retrieved documents
  * Conversation history
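How these three inputs come together can be pictured with a minimal sketch. Field names and formatting here are assumptions for illustration, not the project's actual prompt template:

```python
def assemble_context(profile: dict, documents: list, history: list) -> str:
    """Combine profile constraints, retrieved documents, and history
    into a single context string passed to the LLM (illustrative)."""
    constraints = ", ".join(f"{k}={v}" for k, v in profile.items())
    docs = "\n".join(f"- {d}" for d in documents)
    turns = "\n".join(history)
    return (
        f"User constraints: {constraints}\n"
        f"Retrieved documents:\n{docs}\n"
        f"Conversation so far:\n{turns}"
    )
```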

---

## Setup

This project uses **[uv](https://github.com/astral-sh/uv)** for Python version management and dependency resolution.

### Prerequisites

* `uv` installed
* PostgreSQL available (credentials must be configured manually)

### Run the setup script

```sh
$ chmod +x setup.sh
$ ./setup.sh
```

This will:

* Ensure the correct Python version (from `.python-version`)
* Install dependencies from `pyproject.toml` / `uv.lock`
* Create a `.env` file if it does not already exist
* Prepare local directories
* Run initial data ingestion scripts

> **Note**: PostgreSQL credentials must be configured separately.

---

## Data Ingestion

Before running the chatbot, populate the vector database.

> **Important**:
>
> * Ensure university JSON data is available and configured via `UNIVERSITY_DATA_PATH` in `app/config.py`
> * Ensure Excel files are available at `EXCEL_FILE_DIR`
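A minimal sketch of what these settings might look like in `app/config.py` (the paths are placeholders; substitute the locations used in your environment):

```python
# app/config.py (illustrative excerpt -- values are placeholders)
from pathlib import Path

UNIVERSITY_DATA_PATH = Path("data/universities")  # directory of university JSON datasets
EXCEL_FILE_DIR = Path("data/excel")               # directory of Excel source files
```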

Run the ingestion scripts:

```sh
$ uv run python -m scripts.sync_blogs
$ uv run python -m scripts.sync_universities
$ uv run python -m scripts.sync_excel_files
```

---

## Usage

Start the Streamlit frontend:

```sh
$ uv run streamlit run ui.py --server.port 8000
```
