Unverified commit 2eaadf39 authored by Aryan Utkarsh, committed by GitHub

Merge branch 'main' into main

Showing with 108 additions and 58 deletions
# ALL
*.dev
# for server
server/models/*
!server/models/download.sh
......
@@ -5,7 +5,7 @@
## Updates
+ [2023.04.03] We added the CLI mode and provided parameters for configuring the scale of local endpoints.
+ You can enjoy a lightweight experience with Jarvis without deploying the models locally. See <a href="#Configuration">here</a>.
+ Just run `python awesome_chat.py` to experience it
+ Just run `python awesome_chat.py --config light.yaml` to experience it.
+ [2023.04.01] We released an updated version of the code for building.
## Overview
@@ -32,11 +32,10 @@ We introduce a collaborative system that consists of **an LLM as the controller*
## Quick Start
First replace `openai.key` and `huggingface.cookie` in `server/config.yaml` with **your personal key** and **your cookies at huggingface.co**.
First replace `openai.key` and `huggingface.cookie` in `server/config.yaml` with **your personal key** and **your cookies at huggingface.co**. Then run the following commands:
> The absence of the HuggingFace cookie may result in the error message: `Rate limit reached. Please log in or use your apiToken`.
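For reference, these two fields live under the `openai` and `huggingface` sections of `server/config.yaml` (the full file appears later in this diff); a minimal sketch with placeholder values:

```yaml
openai:
  key: your_personal_key    # your OpenAI API key, or `gradio`
huggingface:
  cookie: your_cookie_value # cookie from a logged-in huggingface.co session, required for huggingface inference
```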
Then run the following commands:
### For server:
@@ -54,8 +53,8 @@ sh download.sh
# run server
cd ..
python models_server.py
python bot_server.py --config config.yaml # for text-davinci-003
python models_server.py --config config.yaml # required when `inference_mode` is `local` or `hybrid`
python awesome_chat.py --config config.yaml --mode server # for text-davinci-003
```
Now you can access Jarvis' services through the Web API. For example:
@@ -73,29 +72,13 @@ curl --location 'http://localhost:8004/hugginggpt' \
}'
```
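The hunk above only shows the tail of the request body. A sketch of a complete call, assuming the default `httpserver` host and port from `config.yaml` and the `messages` format accepted by the `/hugginggpt` route shown later in this diff (the request text itself is just an illustration):

```bash
curl --location 'http://localhost:8004/hugginggpt' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {
            "role": "user",
            "content": "please generate a video based on \"Spiderman is surfing\""
        }
    ]
}'
```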
### For web:
We provide a user-friendly web page. You can run the commands to communicate with Jarvis in your browser:
```bash
cd web
npm install
npm run dev
```
Note that in order to display the video properly in HTML, you need to compile `ffmpeg` manually with H.264 support:
```bash
# This command needs to be executed without errors.
LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/ffmpeg -i input.mp4 -vcodec libx264 output.mp4
```
### For CLI:
You can also run Jarvis more easily in chatbot console mode:
```bash
cd server
python awesome_chat.py
python awesome_chat.py --config config.yaml --mode cli
```
Examples of CLI mode:
@@ -110,6 +93,22 @@ Welcome to Jarvis! A collaborative system that consists of an LLM as the control
[ Jarvis ]: Sure. I have generated a canny image based on /examples/savanna.jpg. To do this, I first used the image-to-text model nlpconnect/vit-gpt2-image-captioning to generate the text description of the image, which is "a herd of giraffes and zebras grazing in a field". Then I used the canny-control model to generate the canny image of the original image, which is stored in the path "/images/1ea2.png". Finally, I used the canny-text-to-image model lllyasviel/sd-controlnet-canny to generate the canny image based on the text description and the canny image, which is stored in the path "/images/ba66.png". Do you need anything else?
```
### For web:
We provide a user-friendly web page. After starting `awesome_chat.py` in server mode, you can run the following commands to communicate with Jarvis in your browser:
```bash
cd web
npm install
npm run dev
```
Note that in order to display the video properly in HTML, you need to compile `ffmpeg` manually with H.264 support:
```bash
# This command needs to be executed without errors.
LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/ffmpeg -i input.mp4 -vcodec libx264 output.mp4
```
## Configuration
The server-side configuration file is `server/config.yaml`, and some parameters are presented as follows:
......
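The collapsed parameter list corresponds to the `config.yaml` shown in full later in this diff; the settings most relevant to the lightweight setup mentioned in the Updates are roughly:

```yaml
inference_mode: hybrid    # local, huggingface or hybrid; `huggingface` avoids deploying models locally
local_deployment: minimal # no, minimal, standard or full
num_candidate_models: 5
```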
@@ -17,10 +17,15 @@ from PIL import Image, ImageDraw
from diffusers.utils import load_image
from pydub import AudioSegment
import multiprocessing
import flask
from flask import request, jsonify
import waitress
from flask_cors import CORS
from get_token_ids import get_token_ids_for_task_parsing, get_token_ids_for_choose_model, count_tokens, get_max_context_length
parser = argparse.ArgumentParser()
parser.add_argument("--config", type=str, default="config.yaml")
parser.add_argument("--mode", type=str, default="cli")
args = parser.parse_args()
config = yaml.load(open(args.config, "r"), Loader=yaml.FullLoader)
@@ -87,8 +92,16 @@ inference_mode = config["inference_mode"]
HTTP_Server = "http://" + config["httpserver"]["host"] + ":" + str(config["httpserver"]["port"])
Model_Server = "http://" + config["modelserver"]["host"] + ":" + str(config["modelserver"]["port"])
if inference_mode!="huggingface" and requests.get(Model_Server + "/running").status_code != 200:
    raise ValueError("Model Server is not running")
# check the HTTP_Server
if inference_mode!="huggingface":
    message = "The server of local inference endpoints is not running, please start it first. (or using `inference_mode: huggingface` in config.yaml for a feature-limited experience)"
    try:
        r = requests.get(Model_Server + "/running")
        if r.status_code != 200:
            raise ValueError(message)
    except:
        raise ValueError(message)
parse_task_demos_or_presteps = open(config["demos_or_presteps"]["parse_task"], "r").read()
choose_model_demos_or_presteps = open(config["demos_or_presteps"]["choose_model"], "r").read()
@@ -886,7 +899,7 @@ def test():
    ]
    chat_huggingface(messages)
def cli_chat():
def cli():
    handler.setLevel(logging.WARNING)
    messages = []
    print("Welcome to Jarvis! A collaborative system that consists of an LLM as the controller and numerous expert models as collaborative executors. Jarvis can plan tasks, schedule Hugging Face models, generate friendly responses based on your requests, and help you with many things. Please enter your request (`exit` to exit).")
@@ -899,5 +912,29 @@ def cli_chat():
print("[ Jarvis ]: ", answer["message"])
messages.append({"role": "assistant", "content": answer["message"]})
def server():
    handler.setLevel(logging.CRITICAL)
    httpserver = config["httpserver"]
    host = httpserver["host"]
    port = httpserver["port"]
    app = flask.Flask(__name__, static_folder="public", static_url_path="/")
    app.config['DEBUG'] = False
    CORS(app)
    @app.route('/hugginggpt', methods=['POST'])
    def chat():
        data = request.get_json()
        messages = data["messages"]
        response = chat_huggingface(messages)
        return jsonify(response)
    waitress.serve(app, host=host, port=port)

if __name__ == "__main__":
    cli_chat()
\ No newline at end of file
if args.mode == "test":
test()
elif args.mode == "server":
server()
elif args.mode == "cli":
cli()
\ No newline at end of file
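With this dispatch, the same script now covers the test, server, and CLI entry points. A usage sketch, assuming the default ports from the `config.yaml` shown later in this diff (`modelserver` on localhost:8005, `httpserver` on localhost:8004):

```bash
# start the local model endpoints first (only needed when inference_mode is local or hybrid)
python models_server.py --config config.yaml

# the health check performed by awesome_chat.py; it expects HTTP 200 from this endpoint
curl http://localhost:8005/running

# then pick a mode
python awesome_chat.py --config config.yaml --mode cli     # interactive console
python awesome_chat.py --config config.yaml --mode server  # Flask app served by waitress on port 8004
```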
import argparse
import flask
from flask import request, jsonify
import waitress
from flask_cors import CORS
from awesome_chat import chat_huggingface
import yaml
parser = argparse.ArgumentParser()
parser.add_argument("--config", type=str, default="config.yaml")
args = parser.parse_args()
config = yaml.load(open(args.config, "r"), Loader=yaml.FullLoader)
httpserver = config["httpserver"]
host = httpserver["host"]
port = httpserver["port"]
app = flask.Flask(__name__, static_folder="public", static_url_path="/")
CORS(app)
@app.route('/hugginggpt', methods=['POST'])
def chat():
    data = request.get_json()
    messages = data["messages"]
    response = chat_huggingface(messages)
    return jsonify(response)
if __name__ == '__main__':
    waitress.serve(app, host=host, port=port)
@@ -9,7 +9,7 @@ debug: false
log_file: logs/debug.log
model: text-davinci-003 # text-davinci-003
use_completion: true
inference_mode: huggingface # local, huggingface or hybrid
inference_mode: hybrid # local, huggingface or hybrid
local_deployment: minimal # no, minimal, standard or full
num_candidate_models: 5
max_description_length: 100
......
openai:
  key: your_personal_key # gradio, your_personal_key
huggingface:
  cookie: # required for huggingface inference
local: # ignore: just for development
  endpoint: http://localhost:8003
dev: false
debug: false
log_file: logs/debug.log
model: text-davinci-003 # text-davinci-003
use_completion: true
inference_mode: huggingface # local, huggingface or hybrid
local_deployment: minimal # no, minimal, standard or full
num_candidate_models: 5
max_description_length: 100
proxy:
httpserver:
  host: localhost
  port: 8004
modelserver:
  host: localhost
  port: 8005
logit_bias:
  parse_task: 0.1
  choose_model: 5
tprompt:
  parse_task: >-
    #1 Task Planning Stage: The AI assistant can parse user input to several tasks: [{"task": task, "id": task_id, "dep": dependency_task_id, "args": {"text": text or <GENERATED>-dep_id, "image": image_url or <GENERATED>-dep_id, "audio": audio_url or <GENERATED>-dep_id}}]. The special tag "<GENERATED>-dep_id" refers to the generated text/image/audio in the dependency task (please consider whether the dependency task generates resources of this type) and "dep_id" must be in the "dep" list. The "dep" field denotes the ids of the previous prerequisite tasks which generate a new resource that the current task relies on. The "args" field must be in ["text", "image", "audio"], nothing else. The task MUST be selected from the following options: "token-classification", "text2text-generation", "summarization", "translation", "question-answering", "conversational", "text-generation", "sentence-similarity", "tabular-classification", "object-detection", "image-classification", "image-to-image", "image-to-text", "text-to-image", "text-to-video", "visual-question-answering", "document-question-answering", "image-segmentation", "depth-estimation", "text-to-speech", "automatic-speech-recognition", "audio-to-audio", "audio-classification", "canny-control", "hed-control", "mlsd-control", "normal-control", "openpose-control", "canny-text-to-image", "depth-text-to-image", "hed-text-to-image", "mlsd-text-to-image", "normal-text-to-image", "openpose-text-to-image", "seg-text-to-image". There may be multiple tasks of the same type. Think step by step about all the tasks needed to resolve the user's request. Parse out as few tasks as possible while ensuring that the user request can be resolved. Pay attention to the dependencies and order among tasks. If the user input can't be parsed, you need to reply with an empty JSON [].
  choose_model: >-
    #2 Model Selection Stage: Given the user request and the parsed tasks, the AI assistant helps the user to select a suitable model from a list of models to process the user request. The assistant should focus more on the description of the model and find the model that has the most potential to solve requests and tasks. Also, prefer models with local inference endpoints for speed and stability.
  response_results: >-
    #4 Response Generation Stage: With the task execution logs, the AI assistant needs to describe the process and inference results.
demos_or_presteps:
  parse_task: demos/demo_parse_task.json
  choose_model: demos/demo_choose_model.json
  response_results: demos/demo_response_results.json
prompt:
  parse_task: The chat log [ {{context}} ] may contain the resources I mentioned. Now I input { {{input}} }, please parse out as many tasks as required to solve my request in a JSON format.
  choose_model: >-
    Please choose the most suitable model from {{metas}} for the task {{task}}. The output must be in a strict JSON format: {"id": "id", "reason": "your detail reasons for the choice"}.
  response_results: >-
    Yes. You must first answer my request directly. Please think step by step about my request based on the inference results of the models. Then please detail your workflow step by step, including the used models and all inference results for my request, in your friendly tone. If there are any generated files of images, audio, or video in the inference results, you must tell me the complete path. If there is nothing in the results, please tell me you can't make it.
\ No newline at end of file
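To make the `parse_task` schema above concrete, here is a hypothetical plan mirroring the canny-image example from the CLI transcript earlier in this diff. The empty `dep` list for independent tasks and the exact field encoding are illustrative assumptions, not taken from the demo files referenced above:

```json
[
  {"task": "image-to-text",       "id": 0, "dep": [],     "args": {"image": "/examples/savanna.jpg"}},
  {"task": "canny-control",       "id": 1, "dep": [],     "args": {"image": "/examples/savanna.jpg"}},
  {"task": "canny-text-to-image", "id": 2, "dep": [0, 1], "args": {"text": "<GENERATED>-0", "image": "<GENERATED>-1"}}
]
```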