pytorch · msaroufim · Jun 27, 2023 · Jun 25, 2023 · Jun 25, 2023 · Jun 26, 2023
diff --git a/docs/internals.md b/docs/internals.md
@@ -38,7 +38,7 @@ And backend is the Python code (most Pytorch specific stuff)
 
 ### Backend (Python)
 
-https://github.com/pytorch/serve/blob/master/ts/arg_parser.py#L64
+https://github.com/pytorch/serve/blob/master/ts/arg_parser.py
 
 * Arg parser controls config/not workflow and can also setup a model service worker with a custom socket
 

diff --git a/examples/instruction_embedding/README.md b/examples/instruction_embedding/README.md
@@ -0,0 +1,87 @@
+# A TorchServe handler for Instructor Embedding models
+
+A simple handler that you can use to serve [Instructor Embedding models](https://instructor-embedding.github.io/) with TorchServe, supporting both single inference and batch inference.
+
+## Setup:
+
+**1.** [Download an Instructor model (e.g. Instructor-XL)](https://huggingface.co/hkunlp/instructor-xl/tree/main?clone=true) from Hugging Face into a model store directory of your choice. Copy `instructor-embedding-handler.py` into that same directory, alongside the newly downloaded directory that contains the model files.
+
+**2.** Create the .MAR Model Archive using [`torch-model-archiver`](https://github.com/pytorch/serve/blob/master/model-archiver/README.md):
+
+```bash
+torch-model-archiver --model-name <YOUR_MODEL_NAME_OF_CHOOSING> --version 1.0 --handler PATH/TO/instructor-embedding-handler.py --extra-files <DOWNLOADED_MODEL_DIR> --serialized-file <DOWNLOADED_MODEL_DIR>/pytorch_model.bin --force
+```
+
+**3.** Use [TorchServe](https://pytorch.org/serve/server.html) to start the server and deploy the Instructor Embedding model you downloaded.
+
+**Note:** Instructor Embedding models are roughly 4 GB. By default, TorchServe will autoscale workers (each with a loaded copy of the model). [At present](https://github.com/pytorch/serve/issues/2432), if you have memory concerns, you have to use the [Management API](https://pytorch.org/serve/management_api.html) to bring up the server and deploy your model.
+
+
+## Performing Inference
+To perform inference for an instruction and corresponding sentence, use the following format for the request body: 
+```json
+{
+    "inputs": [INSTRUCTION, SENTENCE]
+}
+```
+
+To perform batch inference, use the following format for the request body:
+```json
+{
+    "inputs": [
+        [INSTRUCTION_1, SENTENCE_1],
+        [INSTRUCTION_2, SENTENCE_2],
+        ...
+    ]
+}
+```
+
+## Example: Single Inference 
+Request Endpoint: `/predictions/<model_name>`
+
+Request Body:
+```json
+{
+    "inputs": ["Represent the Science title:", "3D ActionSLAM: wearable person tracking in multi-floor environments"]
+}
+```
+
+### Response:
+```json
+[
+  0.010738617740571499,
+  ...
+  0.10961631685495377
+]
+```
+
+## Example: Batch Inference 
+Request Endpoint: `/predictions/<model_name>`
+
+Request Body:
+```json
+{
+    "inputs": [
+        ["Represent the Science title:", "3D ActionSLAM: wearable person tracking in multi-floor environments"],
+        ["Represent the Medicine sentence for retrieving a duplicate sentence:", "Recent studies have suggested that statins, an established drug group in the prevention of cardiovascular mortality, could delay or prevent breast cancer recurrence but the effect on disease-specific mortality remains unclear."]
+    ]
+}
+```
+
+### Response:
+```json
+[
+  [
+    0.010738617740571499,
+    ...
+    0.10961631685495377
+  ],
+  [
+    0.014582153409719467,
+    ...
+    0.08006688207387924
+  ]
+]
+```
+
+**Note:** The above request example was for batch inference on 2 distinct instruction/sentence pairs.
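
The request formats documented in this README can be exercised from a small client. The following is a minimal sketch, not part of the PR: the helper names `build_request_body` and `request_embeddings` are hypothetical, the base URL assumes TorchServe's default inference address (`http://localhost:8080`), and the `Content-Type: application/json` header is assumed to be what lets the handler read the body as a dictionary.

```python
import json
import urllib.request

# Assumption: TorchServe is listening on its default inference address.
INFERENCE_URL = "http://localhost:8080"


def build_request_body(pairs):
    """Encode one or more (instruction, sentence) pairs as a JSON request body.

    Hypothetical helper. A single pair is sent flat, as in the single-inference
    example; several pairs are sent as a list of pairs, as in the batch example.
    """
    pairs = [list(pair) for pair in pairs]
    if not pairs:
        raise ValueError("at least one (instruction, sentence) pair is required")
    inputs = pairs[0] if len(pairs) == 1 else pairs
    return json.dumps({"inputs": inputs}).encode("utf-8")


def request_embeddings(model_name, pairs, base_url=INFERENCE_URL):
    """POST the pairs to /predictions/<model_name> and return the parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/predictions/{model_name}",
        data=build_request_body(pairs),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With a server running and a model registered under a name of your choosing, a call such as `request_embeddings("<model_name>", [("Represent the Science title:", "3D ActionSLAM: wearable person tracking in multi-floor environments")])` would send the single-inference body shown above.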
diff --git a/examples/instruction_embedding/instructor-embedding-handler.py b/examples/instruction_embedding/instructor-embedding-handler.py
@@ -0,0 +1,26 @@
+from InstructorEmbedding import INSTRUCTOR
+from ts.torch_handler.base_handler import BaseHandler
+import logging
+
+logger = logging.getLogger(__name__)
+
+class InstructorEmbeddingHandler(BaseHandler):
+    def __init__(self):
+        super().__init__()
+        self.initialized = False
+        self.model = None
+
+    def initialize(self, context):
+        properties = context.system_properties
+        logger.info("Initializing Instructor Embedding model...")
+        model_dir = properties.get("model_dir")
+        self.model = INSTRUCTOR(model_dir)
+        self.initialized = True
+
+    def handle(self, data, context):
+        inputs = data[0].get("body").get("inputs")
+        if isinstance(inputs[0], str):
+            # single inference
+            inputs = [inputs]
+        pred_embeddings = self.model.encode(inputs)
+        return [pred_embeddings.tolist()]
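
The single-versus-batch branch in `handle` could be factored into a helper that also rejects malformed input before it reaches `encode`. This is only a sketch of that idea: `normalize_inputs` is a hypothetical name, and the validation rules (exactly two strings per pair) are an assumption drawn from the README's request formats, not something the handler in this PR enforces.

```python
def normalize_inputs(inputs):
    """Return a list of [instruction, sentence] pairs.

    Hypothetical helper. Accepts either one flat pair (single inference)
    or a list of pairs (batch inference), as described in the README.
    """
    if not isinstance(inputs, (list, tuple)) or len(inputs) == 0:
        raise ValueError("'inputs' must be a non-empty list")

    # Single inference: a flat [instruction, sentence] pair gets wrapped.
    if isinstance(inputs[0], str):
        candidates = [inputs]
    else:
        candidates = inputs

    pairs = []
    for pair in candidates:
        is_pair = (
            isinstance(pair, (list, tuple))
            and len(pair) == 2
            and all(isinstance(item, str) for item in pair)
        )
        if not is_pair:
            raise ValueError("each input must be [instruction, sentence]")
        pairs.append(list(pair))
    return pairs
```

Inside `handle`, the result of `normalize_inputs(inputs)` would then be passed to `self.model.encode` in place of the inline type check.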
diff --git a/examples/instruction_embedding/requirements.txt b/examples/instruction_embedding/requirements.txt
@@ -0,0 +1 @@
+InstructorEmbedding