+ [2023.04.03] We added the CLI mode and provided parameters for configuring the scale of local endpoints.
+ You can enjoy a lightweight experience with Jarvis without deploying the models locally. See <a href="#Configuration">here</a>.
+ Just run `python awesome_chat.py --config light.yaml` to experience it.
+ [2023.04.01] We released an updated version of the code for building.
## Overview
...
...
We introduce a collaborative system that consists of **an LLM as the controller** and **numerous expert models as collaborative executors** (from the Hugging Face Hub).
## Quick Start
First replace `openai.key` and `huggingface.cookie` in `server/config.yaml` with **your personal key** and **your cookies at huggingface.co**.
> The absence of the Hugging Face cookie may result in the error message: `Rate limit reached. Please log in or use your apiToken`.
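
If you want to sanity-check these two fields before starting the servers, a minimal sketch like the following may help. It assumes the credentials are stored as nested `openai`/`key` and `huggingface`/`cookie` entries (matching the dotted names above) and that PyYAML is available; your config layout may differ.

```python
# Minimal sanity-check sketch: confirm the credentials in server/config.yaml are set.
# The nested key layout is an assumption based on the dotted names openai.key and
# huggingface.cookie shown above; adjust if your config differs.
import yaml

with open("server/config.yaml") as f:
    config = yaml.safe_load(f)

for section, field in (("openai", "key"), ("huggingface", "cookie")):
    value = (config.get(section) or {}).get(field)
    print(f"{section}.{field}: {'set' if value else 'MISSING'}")
```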
Then run the following commands:
### For server:
...
...
sh download.sh
# run server
cd ..
python models_server.py --config config.yaml # required when `inference_mode` is `local` or `hybrid`
python awesome_chat.py --config config.yaml --mode server # for text-davinci-003
```
Now you can access Jarvis' services via the Web API. For example:
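
The sketch below sends one request from Python with `requests`. It is only a sketch: the endpoint path (`/hugginggpt`) and the OpenAI-style `messages` payload are assumptions rather than the repository's documented API, while the host and port follow the `httpserver` section of `config.yaml`.

```python
# Sketch of calling the Jarvis HTTP server from Python. The endpoint path and
# payload shape are assumptions; host and port follow the `httpserver` section
# of server/config.yaml (localhost:8004).
import requests

url = "http://localhost:8004/hugginggpt"  # assumed endpoint name
payload = {
    "messages": [
        {"role": "user", "content": "Please generate a canny image based on /examples/savanna.jpg"}
    ]
}

response = requests.post(url, json=payload, timeout=300)
print(response.json())
```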
### For CLI:
You can also communicate with Jarvis in CLI mode:
```
Welcome to Jarvis! A collaborative system that consists of an LLM as the controller and numerous expert models as collaborative executors. Jarvis can plan tasks, schedule Hugging Face models, generate friendly responses based on your requests, and help you with many things. Please enter your request (`exit` to exit).
[ User ]: Please generate a canny image based on /examples/savanna.jpg
[ Jarvis ]: Sure. I have generated a canny image based on /examples/savanna.jpg. To do this, I first used the image-to-text model nlpconnect/vit-gpt2-image-captioning to generate the text description of the image, which is "a herd of giraffes and zebras grazing in a field". Then I used the canny-control model to generate the canny image of the original image, which is stored in the path "/images/1ea2.png". Finally, I used the canny-text-to-image model lllyasviel/sd-controlnet-canny to generate the canny image based on the text description and the canny image, which is stored in the path "/images/ba66.png". Do you need anything else?
```
### For web:
We provide a user-friendly web page. After starting `awesome_chat.py` in server mode, you can run the following commands to communicate with Jarvis in your browser:
```bash
cd web
npm install
npm run dev
```
Note that in order to display the video properly in HTML, you need to compile `ffmpeg` manually with H.264 support.
message="The server of local inference endpoints is not running, please start it first. (or using `inference_mode: huggingface` in config.yaml for a feature-limited experience)"
print("Welcome to Jarvis! A collaborative system that consists of an LLM as the controller and numerous expert models as collaborative executors. Jarvis can plan tasks, schedule Hugging Face models, generate friendly responses based on your requests, and help you with many things. Please enter your request (`exit` to exit).")
## Configuration

The server-side configuration file is `server/config.yaml`. Its main fields are shown below:

```yaml
inference_mode: huggingface # local, huggingface or hybrid
local_deployment: minimal # no, minimal, standard or full
num_candidate_models: 5
max_description_length: 100
proxy:
httpserver:
  host: localhost
  port: 8004
modelserver:
  host: localhost
  port: 8005
logit_bias:
  parse_task: 0.1
  choose_model: 5
tprompt:
  parse_task: >-
    #1 Task Planning Stage: The AI assistant can parse user input to several tasks: [{"task": task, "id": task_id, "dep": dependency_task_id, "args": {"text": text or <GENERATED>-dep_id, "image": image_url or <GENERATED>-dep_id, "audio": audio_url or <GENERATED>-dep_id}}]. The special tag "<GENERATED>-dep_id" refers to the text/image/audio generated in the dependency task (please consider whether the dependency task generates a resource of this type), and "dep_id" must be in the "dep" list. The "dep" field denotes the ids of the previous prerequisite tasks which generate a new resource that the current task relies on. The "args" field must be in ["text", "image", "audio"], nothing else. The task MUST be selected from the following options: "token-classification", "text2text-generation", "summarization", "translation", "question-answering", "conversational", "text-generation", "sentence-similarity", "tabular-classification", "object-detection", "image-classification", "image-to-image", "image-to-text", "text-to-image", "text-to-video", "visual-question-answering", "document-question-answering", "image-segmentation", "depth-estimation", "text-to-speech", "automatic-speech-recognition", "audio-to-audio", "audio-classification", "canny-control", "hed-control", "mlsd-control", "normal-control", "openpose-control", "canny-text-to-image", "depth-text-to-image", "hed-text-to-image", "mlsd-text-to-image", "normal-text-to-image", "openpose-text-to-image", "seg-text-to-image". There may be multiple tasks of the same type. Think step by step about all the tasks needed to resolve the user's request. Parse out as few tasks as possible while ensuring that the user request can be resolved. Pay attention to the dependencies and order among tasks. If the user input can't be parsed, you need to reply with an empty JSON [].
  choose_model: >-
    #2 Model Selection Stage: Given the user request and the parsed tasks, the AI assistant helps the user to select a suitable model from a list of models to process the user request. The assistant should focus more on the description of the model and find the model that has the most potential to solve requests and tasks. Also, prefer models with local inference endpoints for speed and stability.
  response_results: >-
    #4 Response Generation Stage: With the task execution logs, the AI assistant needs to describe the process and inference results.
demos_or_presteps:
  parse_task: demos/demo_parse_task.json
  choose_model: demos/demo_choose_model.json
  response_results: demos/demo_response_results.json
prompt:
  parse_task: The chat log [ {{context}} ] may contain the resources I mentioned. Now I input { {{input}} }. Please parse out the required tasks to solve my request in a JSON format.
  choose_model: >-
    Please choose the most suitable model from {{metas}} for the task {{task}}. The output must be in a strict JSON format: {"id": "id", "reason": "your detailed reasons for the choice"}.
  response_results: >-
    Yes. You must first answer my request directly. Please think step by step about my request based on the inference results of the models. Then please detail your workflow step by step, including the used models and all inference results for my request, in a friendly tone. If there are any generated files of images, audio or video in the inference results, you must tell me the complete path. If there is nothing in the results, please tell me you can't make it.
```
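
To make the task format and the `<GENERATED>-dep_id` convention in `parse_task` concrete, here is an illustrative Python sketch. It is not the repository's implementation: the example task list, the `run_model` stub, and the id-ordered execution are assumptions, used only to show how placeholders could be resolved against the outputs of dependency tasks.

```python
# Illustrative sketch (not the repository's code): execute a parsed task list in
# id order, replacing "<GENERATED>-dep_id" placeholders with the resources
# produced by the corresponding dependency tasks.
import re

# Example plan for "generate a canny image based on /examples/savanna.jpg";
# the concrete dep/args values here are illustrative.
tasks = [
    {"task": "image-to-text", "id": 0, "dep": [], "args": {"image": "/examples/savanna.jpg"}},
    {"task": "canny-control", "id": 1, "dep": [], "args": {"image": "/examples/savanna.jpg"}},
    {"task": "canny-text-to-image", "id": 2, "dep": [0, 1],
     "args": {"text": "<GENERATED>-0", "image": "<GENERATED>-1"}},
]

def run_model(task):
    # Stand-in for real model inference; returns a fake resource for the demo.
    return f"<output of task {task['id']} ({task['task']})>"

results = {}  # task id -> generated resource
for task in sorted(tasks, key=lambda t: t["id"]):  # dep ids point to earlier tasks
    for key, value in task["args"].items():
        match = re.fullmatch(r"<GENERATED>-(\d+)", str(value))
        if match:  # substitute the dependency's generated resource
            task["args"][key] = results[int(match.group(1))]
    results[task["id"]] = run_model(task)

print(results)
```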