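Weaviate: Vector Database
Introduction
Weaviate is an open-source vector database that supports hybrid search (vector plus keyword) and ships with several built-in vectorizer modules.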
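
To make "hybrid search" more concrete, here is a minimal sketch of how a vector-similarity score and a keyword (BM25-style) score can be blended with a single weight. It is an illustration only, not Weaviate's internal implementation: the helper names `min_max_normalize` and `fuse_scores` and the sample scores are hypothetical, and the weight simply follows the convention of Weaviate's `alpha` parameter (0 = keyword only, 1 = vector only).

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every document as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(vector_scores, keyword_scores, alpha=0.5):
    """Blend two score sets: alpha * vector + (1 - alpha) * keyword.

    Documents missing from one result set contribute 0 for that side.
    Returns (doc_id, fused_score) pairs, best first; ties break by doc_id.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    v = min_max_normalize(vector_scores)
    k = min_max_normalize(keyword_scores)
    fused = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * k.get(doc_id, 0.0)
        for doc_id in set(v) | set(k)
    }
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores for four documents.
vector_scores = {"a": 1.0, "b": 0.75, "c": 0.5}
keyword_scores = {"b": 4.0, "c": 2.0, "d": 0.0}

ranking = fuse_scores(vector_scores, keyword_scores, alpha=0.5)
print([doc_id for doc_id, _ in ranking])  # ['b', 'a', 'c', 'd']
```

With `alpha=0.5`, document `b` ranks first because it scores well on both sides, while `a` appears only in the vector results and `d` only in the keyword results.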
bash
pip install weaviate-client
Quick Start
python
import weaviate
from weaviate.classes.config import Configure, Property, DataType
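# Connect to a local Weaviate instance (by default HTTP on localhost:8080, gRPC on 50051)
client = weaviate.connect_to_local()

# Create a collection
client.collections.create(
    "FinanceDoc",
    vectorizer_config=Configure.Vectorizer.none(),  # no vectorizer module: vectors are supplied by the caller
    properties=[
        Property(name="content", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
    ],
)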
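# Get a handle to the collection
collection = client.collections.get("FinanceDoc")

# Insert objects in a batch
with collection.batch.dynamic() as batch:
    batch.add_object(
        properties={"content": "Definition of the non-performing loan ratio", "source": "Risk control handbook"},
        vector=[0.1] * 512,  # placeholder; in practice, generate this with an embedding model
    )

# Vector search
results = collection.query.near_vector(
    near_vector=[0.1] * 512,
    limit=3,
    return_properties=["content", "source"],
)
for obj in results.objects:
    print(obj.properties)

# Hybrid search (vector + BM25)
results = collection.query.hybrid(
    query="non-performing loan",
    vector=[0.1] * 512,
    limit=3,
    alpha=0.5,  # 0 = pure BM25, 1 = pure vector
)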

client.close()  # release the connection when done
Integrating with LangChain
python
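# Legacy LangChain integration; it is built on the v3 weaviate-client API.
# For the v4 client used above, the separate langchain-weaviate package is the usual route.
# `chunks` (a list of LangChain Document objects) and `embeddings` (an embedding model)
# are assumed to be defined elsewhere.
from langchain_community.vectorstores import Weaviate

vectorstore = Weaviate.from_documents(
    documents=chunks,
    embedding=embeddings,
    weaviate_url="http://localhost:8080",
    by_text=False,  # query by vector instead of Weaviate's text-based search
)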
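
Once the store is built, retrieval typically goes through LangChain's generic `similarity_search(query, k=...)` method, which returns `Document` objects carrying `page_content` and `metadata`. The following is a minimal sketch rather than part of the original example: `top_k_contents` is a hypothetical helper, and `FakeDoc` / `FakeStore` are stand-ins with made-up sample texts so the snippet runs without a Weaviate server. In practice you would pass the `vectorstore` created above instead of `store`.

```python
from dataclasses import dataclass, field


def top_k_contents(vectorstore, query, k=3):
    """Return the text of the k documents most similar to `query`.

    Works with any object exposing LangChain's `similarity_search(query, k=...)`.
    """
    docs = vectorstore.similarity_search(query, k=k)
    return [doc.page_content for doc in docs]


# --- Stand-ins for demonstration only (hypothetical, not LangChain classes) ---
@dataclass
class FakeDoc:
    page_content: str
    metadata: dict = field(default_factory=dict)


class FakeStore:
    """Mimics the `similarity_search` interface with a naive substring match."""

    def __init__(self, texts):
        self._docs = [FakeDoc(t) for t in texts]

    def similarity_search(self, query, k=4):
        hits = [d for d in self._docs if query.lower() in d.page_content.lower()]
        return hits[:k]


store = FakeStore([
    "Definition of the non-performing loan ratio",
    "Capital adequacy requirements",
    "Non-performing loan classification rules",
])
print(top_k_contents(store, "non-performing loan", k=2))
```

Keeping the helper duck-typed means the same function can be exercised in unit tests with a stub and then pointed at the real store unchanged.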