Skip to content

Collections

Overview

Collections are a fundamental component of a RAG system, serving as containers for storing and managing documents. The Brevia library provides a robust and flexible framework for working with collections, enabling efficient indexing, querying, and retrieval of documents.

Key Features:

Flexible Data Model: BrevIA utilizes a JSON-based metadata schema, allowing for flexible storage of collection-specific information and document-level metadata. This schema can be easily extended to accommodate custom requirements.

Powerful Indexing: Documents within a collection are indexed using vector representations, facilitating efficient and accurate retrieval based on semantic similarity.

Intuitive API: Brevia provides a comprehensive API for managing collections, including creation, retrieval, modification, and deletion.

Collections in Brevia

Each collection in Brevia is represented by two tables in the underlying database:

langchain_pg_collection: Stores general collection information, including its name, creation timestamp, and metadata.

langchain_pg_embeddings: Contains the vector representations for all documents within the collection. The cmetadata field in both tables plays a crucial role, allowing for the storage of various types of information:

Collection-level metadata: This can include general information about the collection, such as its purpose, creator, or creation date.

Document-level metadata: This can include specific details about each document, such as its author, title, publication date, or any custom attributes relevant to the application.

Collection Management API

Brevia provides a RESTful API for managing collections, empowering developers to perform CRUD operations and access collection information programmatically.

Endpoints:

GET /collections Retrieves a list of all existing collections.

GET /collections?name={{collection_name}} Retrieves a specific collection by its name.

GET /collections/{{collection_id}} Retrieves a specific collection by its ID.

POST /collections Creates a new collection.

Example:

{
  "name": "test_collection",
  "cmetadata": {
    "note": "Sample collection for testing purposes",
    "description": "This collection contains documents related to AI and NLP"
  }
}

PATCH /collections/{{collection_id}} Updates an existing collection.

Example:

{
  "name": "updated_collection",
  "cmetadata": {
    "note": "Updated collection metadata",
    "description": "This collection now includes additional information about documents",
    "documents_metadata": {
      "$schema": "http://json-schema.org/draft-06/schema#",
      "type": "object",
      "properties": {
        "valid_from": {
          "type": "string",
          "format": "date",
          "description": "Document validity start date"
        },
        "valid_to": {
          "type": "string",
          "format": "date",
          "description": "Document validity end date"
        },
        "integration": {
          "type": "boolean",
          "description": "Integration flag"
        },
        "category": {
          "type": "string",
          "description": "Document category",
          "enum": ["cat1", "cat2", "cat3"]
        }
      }
    }
  }
}

In this example, we update the collection metadata to include additional information about the documents, such as their validity period, integration status, and category.

DELETE /collections/{{collection_id}} Deletes a collection.

Further Resources

Indexing: link to indexing documentation Database: link to database documentation