Triton Inference Server Customization

GitHub repository: T5 Python backend Triton server


This repository provides an example of customizing a Python backend for the Triton Inference Server. It demonstrates how to plug your own inference logic into Triton so that the server supports a specific, efficient deep learning inference workflow.


Getting Started


Ensure you have the following dependencies installed:


Clone the repository:

git clone
cd triton-test-t5

Running the Triton Server

Build the Docker image, then start the Triton server with the custom model:

docker build -t tritonserver-custom .

docker run --gpus=all -it --shm-size=256m --rm -p8000:8000 -p8001:8001 -p8002:8002 -v ${PWD}:/workspace/ -v ${PWD}/model_repository:/models tritonserver-custom

Testing the Inference

You can send health check requests using curl:

curl --location 'http://localhost:8000/v2/health/ready'
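
The same readiness check can be scripted, for example to wait for the server before sending requests. The following is a minimal Python sketch that uses only the standard library. The function name is_server_ready and the timeout value are illustrative assumptions, not part of this repository; the base URL is the one from the curl command above.

```python
import urllib.request


def is_server_ready(base_url="http://localhost:8000", timeout=2.0):
    """Return True if GET <base_url>/v2/health/ready answers with HTTP 200."""
    url = base_url.rstrip("/") + "/v2/health/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses, so this covers both
        # connection failures and non-2xx replies: treat them as "not ready".
        return False
```

Calling is_server_ready() in a loop with a short sleep between attempts is one simple way to block until the container has finished loading the model.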

Then send an inference request:

curl --location '' \
--header 'Content-Type: application/json' \
--data '{
        "inputs": [
                {
                        "name": "input_text",
                        "shape": [1],
                        "datatype": "BYTES",
                        "data": ["abc"]
                }
        ]
}'

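The request can also be sent from Python. The sketch below builds the same JSON body as the curl command and posts it with the standard library. The helper names build_infer_payload and infer are hypothetical. The URL is left as a parameter because it is not given above; in Triton's HTTP API, inference endpoints have the form /v2/models/<model_name>/infer, so the value depends on the model name in your model repository.

```python
import json
import urllib.request


def build_infer_payload(text):
    """Build the JSON body for a single BYTES input named "input_text"."""
    return {
        "inputs": [
            {
                "name": "input_text",
                "shape": [1],
                "datatype": "BYTES",
                "data": [text],
            }
        ]
    }


def infer(url, text, timeout=30.0):
    """POST the payload to `url` and return the decoded JSON response."""
    body = json.dumps(build_infer_payload(text)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the payload construction separate from the network call makes it easy to check the body against the model's expected input name, shape, and datatype before sending anything.
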
Modifying the Custom Model

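To change the inference logic, edit the model script in the model repository. Make sure the script follows the Triton Python backend model structure: a TritonPythonModel class with an execute method, plus optional initialize and finalize methods.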
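
As a rough illustration of that structure, here is a minimal sketch of a Python backend model script. It is not the code from this repository: the output tensor name output_text and the placeholder function run_inference (which only upper-cases its input) are assumptions made for the example, and the real T5 call would go in its place. The input name input_text matches the request shown earlier.

```python
try:
    # Both modules are available inside the Triton Python backend environment.
    import numpy as np
    import triton_python_backend_utils as pb_utils
except ImportError:
    # Lets the file be imported outside the server, e.g. to unit-test helpers.
    np = None
    pb_utils = None


def run_inference(texts):
    """Hypothetical stand-in for the real model call: upper-cases each string."""
    return [t.upper() for t in texts]


class TritonPythonModel:
    def initialize(self, args):
        # Called once when the model is loaded; load weights or tokenizers here.
        self.model_name = args["model_name"]

    def execute(self, requests):
        # Triton may pass several requests at once; return one response each.
        responses = []
        for request in requests:
            in_tensor = pb_utils.get_input_tensor_by_name(request, "input_text")
            # BYTES tensors arrive as a numpy object array of bytes values.
            texts = [b.decode("utf-8") for b in in_tensor.as_numpy().reshape(-1)]
            outputs = run_inference(texts)
            out_array = np.array([o.encode("utf-8") for o in outputs], dtype=object)
            out_tensor = pb_utils.Tensor("output_text", out_array)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses

    def finalize(self):
        # Called once when the model is unloaded; release resources here.
        pass
```

The tensor names and datatypes used in execute have to agree with the model's configuration in the model repository, otherwise Triton rejects the request or the response.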

I hope you find this helpful.