LangChain Without the OpenAI API
(1) Loading Documents for QA
from langchain.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load every PDF file in ./pdf/
# loader = TextLoader("./data/texts")
loader = DirectoryLoader('./pdf/', glob="./*.pdf", loader_cls=PyPDFLoader)
documents = loader.load()
# Split the documents into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)
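To make chunk_size=1000 and chunk_overlap=200 concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification and not how RecursiveCharacterTextSplitter is implemented, since the real splitter prefers to break on separators such as paragraph and line breaks. The function name split_with_overlap is hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Cut text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# A 2500-character dummy text stands in for a loaded PDF page.
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.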
(2) Embedding
# HuggingFaceInstructEmbeddings
from langchain.embeddings import HuggingFaceInstructEmbeddings
instructor_embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-xl",
    model_kwargs={"device": "cpu"},  # set to "cuda" to run on a GPU
)
Use HuggingFace embeddings instead of OpenAIEmbeddings.
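An embedding model turns each chunk into a vector, and a retriever returns the chunks whose vectors lie closest to the query vector. The sketch below illustrates that idea with cosine similarity over hand-written toy vectors. The vectors, the chunk labels, and the function names are all hypothetical and do not come from instructor-xl or from LangChain.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], chunk_vecs: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the k chunk labels most similar to the query, best first."""
    ranked = sorted(
        chunk_vecs,
        key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
toy_chunks = {
    "chunk about BERT": [0.9, 0.1, 0.0],
    "chunk about GPT": [0.6, 0.6, 0.1],
    "chunk about cooking": [0.0, 0.1, 0.9],
}
query_vector = [1.0, 0.2, 0.0]
print(top_k(query_vector, toy_chunks, k=2))
```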
(3) FastChat Model Import
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained('lmsys/fastchat-t5-3b-v1.0')
model = AutoModelForSeq2SeqLM.from_pretrained('lmsys/fastchat-t5-3b-v1.0')
https://github.com/lm-sys/FastChat/blob/main/LICENSE
FastChat can be used commercially.
(4) pipeline task config
from transformers import pipeline
from langchain.llms import HuggingFacePipeline
pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=256
)
local_llm = HuggingFacePipeline(pipeline=pipe)
(5) RetrievalQA llm config
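from langchain.chains import RetrievalQA

# `retriever` is not defined in this post. It is assumed to come from a vector store
# built over `texts`, e.g. Chroma.from_documents(texts, instructor_embeddings).as_retriever().
# create the chain to answer questions
qa_chain = RetrievalQA.from_chain_type(
    llm=local_llm,
    chain_type="stuff",
    retriever=retriever,
    # return_source_documents=True,
)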
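The "stuff" chain type places all retrieved chunks into a single prompt together with the question. The sketch below shows that idea only. The function name build_stuff_prompt and the prompt wording are hypothetical and are not LangChain's actual template.

```python
from typing import List


def build_stuff_prompt(question: str, retrieved_chunks: List[str]) -> str:
    """Concatenate every retrieved chunk into one context block, then append the question."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "What is BERT language model?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(prompt)
```

Because everything goes into one prompt, the retrieved chunks together have to fit within the model's input limit, which is one reason to keep the chunks fairly small.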
(6) Test
query = "What is BERT language model?"
llm_response = qa_chain(query)
# process_llm_response is a display helper that is not defined in this post
process_llm_response(llm_response)
<pad> BERT (Boolean Recurrent Entity Recognition) is a language model that is used to train and evaluate
natural language processing (NLP) systems. It is a probabilistic model that uses a combination of a
probabilistic representation of the input and a generative model to generate a sequence of recurrent units
(recurrent units). The BERT model is a subset of the generative model and is used to train NLP systems that
are trained to recognize and generate human language.
Note that this answer is not reliable: BERT stands for Bidirectional Encoder Representations from Transformers, not "Boolean Recurrent Entity Recognition".
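The process_llm_response helper called in the test is not shown in this post. Below is a hypothetical version, assuming the chain returns a dict whose "result" key holds the answer text. It also strips the <pad> special token that the T5 tokenizer leaves at the start of the decoded output. The cleaning and wrapping choices are assumptions.

```python
import textwrap


def process_llm_response(llm_response: dict, width: int = 110) -> str:
    """Remove the leading <pad> token, collapse whitespace, wrap the answer, and print it."""
    raw = llm_response["result"]
    text = raw.replace("<pad>", "")
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    wrapped = textwrap.fill(text, width=width)
    print(wrapped)
    return wrapped


# A hand-made response dict stands in for the output of qa_chain(query).
fake_response = {
    "query": "What is BERT language model?",
    "result": "<pad> BERT  is a language model.\n",
}
cleaned = process_llm_response(fake_response)
```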