
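Weaviate: Vector Database

Introduction

Weaviate is an open-source vector database that supports hybrid search (vector plus keyword) and ships with several built-in vectorizer modules.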

bash
pip install weaviate-client

Quick Start

python
import weaviate
from weaviate.classes.config import Configure, Property, DataType

client = weaviate.connect_to_local()

# Create the collection
client.collections.create(
    "FinanceDoc",
    vectorizer_config=Configure.Vectorizer.none(),  # no built-in vectorizer; vectors are supplied manually
    properties=[
        Property(name="content", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
    ]
)

collection = client.collections.get("FinanceDoc")

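# Insert an object through a dynamically sized batch
with collection.batch.dynamic() as batch:
    batch.add_object(
        # content: "definition of the non-performing loan ratio"; source: "risk control handbook"
        properties={"content": "不良贷款率定义", "source": "风控手册"},
        vector=[0.1] * 512  # placeholder; in practice, generate this with an embedding model
    )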

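# Vector search
results = collection.query.near_vector(
    near_vector=[0.1] * 512,
    limit=3,
    return_properties=["content", "source"]
)
# Each hit exposes the requested properties as a dict
for obj in results.objects:
    print(obj.properties)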

# Hybrid search (vector + BM25)
results = collection.query.hybrid(
    query="不良贷款",  # keyword query: "non-performing loans"
    vector=[0.1] * 512,
    limit=3,
    alpha=0.5  # 0 = pure BM25, 1 = pure vector
)

client.close()
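
To make the `alpha` parameter more concrete, the following is a simplified sketch of how a weighted blend of vector and keyword scores can work. It is an illustration only, loosely modeled on score-based fusion, and not Weaviate's actual implementation. The function names `min_max_normalize` and `fuse_scores`, the document IDs, and all score values are hypothetical.

```python
def min_max_normalize(scores):
    """Scale a {doc_id: score} mapping into [0, 1]; identical scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(vector_scores, keyword_scores, alpha=0.5):
    """Blend two score sets: alpha weights the vector side, (1 - alpha) the keyword side."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    vec = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    # A document missing from one side contributes 0 for that side.
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(vec) | set(kw)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores (higher is better on both sides).
vector_scores = {"doc_a": 0.92, "doc_b": 0.80, "doc_c": 0.50}
keyword_scores = {"doc_b": 9.4, "doc_c": 7.1}

ranking = fuse_scores(vector_scores, keyword_scores, alpha=0.5)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

With these made-up numbers, `doc_a` ranks first at `alpha=1.0` because it has the best vector score, while `doc_b` overtakes it at `alpha=0.5` because it also matches the keyword query.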

LangChain Integration

python
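# Note: this community wrapper is built on the v3 client API (weaviate.Client);
# for the v4 client used above, see the separate langchain-weaviate package.
from langchain_community.vectorstores import Weaviate

# `chunks` (a list of split Documents) and `embeddings` (an Embeddings instance)
# are assumed to be defined earlier in your pipeline.
vectorstore = Weaviate.from_documents(
    documents=chunks,
    embedding=embeddings,
    weaviate_url="http://localhost:8080",
    by_text=False  # search with embedding vectors instead of Weaviate's near_text
)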
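
Once a vector store exists, querying it usually goes through LangChain's standard `similarity_search` method. The helper below is a minimal sketch of that step. The name `top_k_sources` is hypothetical, and it assumes the indexed documents carry a `source` key in their metadata.

```python
def top_k_sources(vectorstore, query, k=3):
    """Run a similarity search and return (text, source) pairs for the top k hits."""
    docs = vectorstore.similarity_search(query, k=k)
    # "source" is an assumed metadata key; .get() returns None when it is absent.
    return [(doc.page_content, doc.metadata.get("source")) for doc in docs]
```

Calling `top_k_sources(vectorstore, "不良贷款")` would return up to three `(text, source)` tuples from the store built above.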

Content on this site is compiled and written by 褚成志 and is intended for study and reference only.