+ [2023.04.03] We added the CLI mode and provided parameters for configuring the scale of local endpoints.
+ You can enjoy a lightweight experience with Jarvis without deploying the models locally. See <a href="#Configuration">here</a>.
+ Just run `python awesome_chat.py --config light.yaml` to experience it.
+ [2023.04.01] We released an updated version of the code for building.
## Overview
...
...
We introduce a collaborative system that consists of **an LLM as the controller** and **numerous expert models as collaborative executors** (from the Hugging Face Hub).
## Quick Start
First replace `openai.key` and `huggingface.cookie` in `server/config.yaml` with **your personal key** and **your cookies at huggingface.co**.
> The absence of the Hugging Face cookie may result in the error message: `Rate limit reached. Please log in or use your apiToken`.
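
If you want to sanity-check these two fields before starting the servers, a minimal sketch like the following may help. It assumes the credentials are stored as nested `openai`/`key` and `huggingface`/`cookie` entries (matching the dotted names above) and that PyYAML is available; your config layout may differ.

```python
# Minimal sanity-check sketch: confirm the credentials in server/config.yaml are set.
# The nested key layout is an assumption based on the dotted names openai.key and
# huggingface.cookie shown above; adjust if your config differs.
import yaml

with open("server/config.yaml") as f:
    config = yaml.safe_load(f)

for section, field in (("openai", "key"), ("huggingface", "cookie")):
    value = (config.get(section) or {}).get(field)
    print(f"{section}.{field}: {'set' if value else 'MISSING'}")
```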
Then run the following commands:
### For server:
...
...
sh download.sh
# run server
cd ..
python models_server.py --config config.yaml # required when `inference_mode` is `local` or `hybrid`
python awesome_chat.py --config config.yaml --mode server # for text-davinci-003
```
Now you can access Jarvis' services via the Web API. For example:
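
The sketch below sends one request from Python with `requests`. It is only a sketch: the endpoint path (`/hugginggpt`) and the OpenAI-style `messages` payload are assumptions rather than the repository's documented API, while the host and port follow the `httpserver` section of `config.yaml`.

```python
# Sketch of calling the Jarvis HTTP server from Python. The endpoint path and
# payload shape are assumptions; host and port follow the `httpserver` section
# of server/config.yaml (localhost:8004).
import requests

url = "http://localhost:8004/hugginggpt"  # assumed endpoint name
payload = {
    "messages": [
        {"role": "user", "content": "Please generate a canny image based on /examples/savanna.jpg"}
    ]
}

response = requests.post(url, json=payload, timeout=300)
print(response.json())
```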
### For CLI:
You can also communicate with Jarvis in CLI mode:
```
Welcome to Jarvis! A collaborative system that consists of an LLM as the controller and numerous expert models as collaborative executors. Jarvis can plan tasks, schedule Hugging Face models, generate friendly responses based on your requests, and help you with many things. Please enter your request (`exit` to exit).
[ User ]: Please generate a canny image based on /examples/savanna.jpg
[ Jarvis ]: Sure. I have generated a canny image based on /examples/savanna.jpg. To do this, I first used the image-to-text model nlpconnect/vit-gpt2-image-captioning to generate the text description of the image, which is "a herd of giraffes and zebras grazing in a field". Then I used the canny-control model to generate the canny image of the original image, which is stored in the path "/images/1ea2.png". Finally, I used the canny-text-to-image model lllyasviel/sd-controlnet-canny to generate the canny image based on the text description and the canny image, which is stored in the path "/images/ba66.png". Do you need anything else?
```
### For web:
We provide a user-friendly web page. After starting `awesome_chat.py` in server mode, you can run the following commands to communicate with Jarvis in your browser:
```bash
cd web
npm install
npm run dev
```
Note that in order to display the video properly in HTML, you need to compile `ffmpeg` manually with H.264 support.
message="The server of local inference endpoints is not running, please start it first. (or using `inference_mode: huggingface` in config.yaml for a feature-limited experience)"
print("Welcome to Jarvis! A collaborative system that consists of an LLM as the controller and numerous expert models as collaborative executors. Jarvis can plan tasks, schedule Hugging Face models, generate friendly responses based on your requests, and help you with many things. Please enter your request (`exit` to exit).")
## Configuration

The server-side configuration file is `server/config.yaml`. Its main fields are shown below:

```yaml
inference_mode: huggingface # local, huggingface or hybrid
local_deployment: minimal # no, minimal, standard or full
num_candidate_models: 5
max_description_length: 100
proxy:
httpserver:
  host: localhost
  port: 8004
modelserver:
  host: localhost
  port: 8005
logit_bias:
  parse_task: 0.1
  choose_model: 5
tprompt:
  parse_task: >-
    #1 Task Planning Stage: The AI assistant can parse user input to several tasks: [{"task": task, "id": task_id, "dep": dependency_task_id, "args": {"text": text or <GENERATED>-dep_id, "image": image_url or <GENERATED>-dep_id, "audio": audio_url or <GENERATED>-dep_id}}]. The special tag "<GENERATED>-dep_id" refers to the text/image/audio generated in the dependency task (please consider whether the dependency task generates a resource of this type), and "dep_id" must be in the "dep" list. The "dep" field denotes the ids of the previous prerequisite tasks which generate a new resource that the current task relies on. The "args" field must be in ["text", "image", "audio"], nothing else. The task MUST be selected from the following options: "token-classification", "text2text-generation", "summarization", "translation", "question-answering", "conversational", "text-generation", "sentence-similarity", "tabular-classification", "object-detection", "image-classification", "image-to-image", "image-to-text", "text-to-image", "text-to-video", "visual-question-answering", "document-question-answering", "image-segmentation", "depth-estimation", "text-to-speech", "automatic-speech-recognition", "audio-to-audio", "audio-classification", "canny-control", "hed-control", "mlsd-control", "normal-control", "openpose-control", "canny-text-to-image", "depth-text-to-image", "hed-text-to-image", "mlsd-text-to-image", "normal-text-to-image", "openpose-text-to-image", "seg-text-to-image". There may be multiple tasks of the same type. Think step by step about all the tasks needed to resolve the user's request. Parse out as few tasks as possible while ensuring that the user request can be resolved. Pay attention to the dependencies and order among tasks. If the user input can't be parsed, you need to reply with an empty JSON [].
  choose_model: >-
    #2 Model Selection Stage: Given the user request and the parsed tasks, the AI assistant helps the user to select a suitable model from a list of models to process the user request. The assistant should focus more on the description of the model and find the model that has the most potential to solve requests and tasks. Also, prefer models with local inference endpoints for speed and stability.
  response_results: >-
    #4 Response Generation Stage: With the task execution logs, the AI assistant needs to describe the process and inference results.
demos_or_presteps:
  parse_task: demos/demo_parse_task.json
  choose_model: demos/demo_choose_model.json
  response_results: demos/demo_response_results.json
prompt:
  parse_task: The chat log [ {{context}} ] may contain the resources I mentioned. Now I input { {{input}} }. Please parse out the required tasks to solve my request in a JSON format.
  choose_model: >-
    Please choose the most suitable model from {{metas}} for the task {{task}}. The output must be in a strict JSON format: {"id": "id", "reason": "your detailed reasons for the choice"}.
  response_results: >-
    Yes. You must first answer my request directly. Please think step by step about my request based on the inference results of the models. Then please detail your workflow step by step, including the used models and all inference results for my request, in a friendly tone. If there are any generated files of images, audio or video in the inference results, you must tell me the complete path. If there is nothing in the results, please tell me you can't make it.
```
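
To make the task format and the `<GENERATED>-dep_id` convention in `parse_task` concrete, here is an illustrative Python sketch. It is not the repository's implementation: the example task list, the `run_model` stub, and the id-ordered execution are assumptions, used only to show how placeholders could be resolved against the outputs of dependency tasks.

```python
# Illustrative sketch (not the repository's code): execute a parsed task list in
# id order, replacing "<GENERATED>-dep_id" placeholders with the resources
# produced by the corresponding dependency tasks.
import re

# Example plan for "generate a canny image based on /examples/savanna.jpg";
# the concrete dep/args values here are illustrative.
tasks = [
    {"task": "image-to-text", "id": 0, "dep": [], "args": {"image": "/examples/savanna.jpg"}},
    {"task": "canny-control", "id": 1, "dep": [], "args": {"image": "/examples/savanna.jpg"}},
    {"task": "canny-text-to-image", "id": 2, "dep": [0, 1],
     "args": {"text": "<GENERATED>-0", "image": "<GENERATED>-1"}},
]

def run_model(task):
    # Stand-in for real model inference; returns a fake resource for the demo.
    return f"<output of task {task['id']} ({task['task']})>"

results = {}  # task id -> generated resource
for task in sorted(tasks, key=lambda t: t["id"]):  # dep ids point to earlier tasks
    for key, value in task["args"].items():
        match = re.fullmatch(r"<GENERATED>-(\d+)", str(value))
        if match:  # substitute the dependency's generated resource
            task["args"][key] = results[int(match.group(1))]
    results[task["id"]] = run_model(task)

print(results)
```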