```bash
git clone https://github.com/atoma-network/atoma-node.git
cd atoma-node
```
Configure the environment variables by creating a `.env` file. Start by running:

```bash
cp .env.example .env
```
You should then see a file of the form:
```bash
# Hugging Face Configuration
HF_CACHE_PATH=~/.cache/huggingface
HF_TOKEN= # Required for gated models

# Inference Server Configuration
INFERENCE_SERVER_PORT=50000 # External port for vLLM service
MODEL=meta-llama/Llama-3.1-70B-Instruct
MAX_MODEL_LEN=4096 # Context length
GPU_COUNT=1 # Number of GPUs to use
TENSOR_PARALLEL_SIZE=1 # Should be equal to GPU_COUNT

# Sui Configuration
SUI_CONFIG_PATH=~/.sui/sui_config

# Atoma Node Service Configuration
ATOMA_SERVICE_PORT=3000 # External port for Atoma service
```
You need to fill in the `HF_TOKEN` variable with your Hugging Face API key. See the official [documentation](https://huggingface.co/docs/hub/security-tokens) for more information on how to create an HF API token.
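Optionally, you can sanity-check the token before starting any containers by calling Hugging Face's public `whoami-v2` endpoint; the token value below is a placeholder for whatever you put in `.env`:

```bash
# A valid token returns your account details; an invalid one returns an error
curl -s -H "Authorization: Bearer <your-hf-token>" https://huggingface.co/api/whoami-v2
```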
Configure `config.toml`, using `config.example.toml` as a template, by running:

```bash
cp config.example.toml config.toml
```

You should now have a `config.toml` file with the following contents:
```toml
[atoma-service]
inference_service_url = "http://vllm:8000" # Internal Docker network URL for inference service
embeddings_service_url = ""
multimodal_service_url = ""
models = ["meta-llama/Llama-3.1-70B-Instruct"] # Replace with the list of models you want to deploy
revisions = [""]
service_bind_address = "0.0.0.0:3000" # Bind to all interfaces

[atoma-sui]
http_rpc_node_addr = ""
atoma_db = ""
atoma_package_id = ""
toma_package_id = ""
request_timeout = { secs = 300, nanos = 0 }
max_concurrent_requests = 10
limit = 100
node_small_ids = [0, 1, 2] # List of node IDs under control
task_small_ids = [] # List of task IDs under control
sui_config_path = "~/.sui/sui_config/client.yaml"
sui_keystore_path = "~/.sui/sui_config/sui.keystore"

[atoma-state]
database_url = "sqlite:///app/data/atoma.db"
```
You can enable multiple services for your node, such as inference, embeddings, and multi-modal, by setting the corresponding service URL for each one in the `[atoma-service]` section.
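Before starting the node, it is worth confirming that the Sui client files referenced above actually exist on the host. Assuming the `sui` CLI is installed, a quick check might look like this:

```bash
# The paths referenced in config.toml should exist on the host
ls ~/.sui/sui_config/client.yaml ~/.sui/sui_config/sui.keystore

# Show the environments and active address known to the local Sui client
sui client envs
sui client active-address
```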
Create required directories:

```bash
mkdir -p data logs
```
Start the containers
If you plan to run a chat completions service:
```bash
# Build and start all services
COMPOSE_PROFILES=chat_completions_vllm docker compose up --build

# Or run in detached mode
COMPOSE_PROFILES=chat_completions_vllm docker compose up -d --build
```
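Once the chat completions profile is running, a quick way to check that vLLM has loaded the model is to query its OpenAI-compatible `/v1/models` route. This sketch assumes the service is published on the host at the `INFERENCE_SERVER_PORT` value from `.env` (50000 in the example above):

```bash
# Lists the model(s) currently served by the vLLM OpenAI-compatible server
curl -s http://localhost:50000/v1/models
```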
For text embeddings:
```bash
# Build and start all services
COMPOSE_PROFILES=embeddings_tei docker compose up --build

# Or run in detached mode
COMPOSE_PROFILES=embeddings_tei docker compose up -d --build
```
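Similarly, the embeddings service can be smoke-tested through the Text Embeddings Inference (TEI) `/embed` route. The host port below is a placeholder rather than a value from this guide; check `docker-compose.yml` for the port that the `embeddings_tei` profile actually publishes:

```bash
# Replace <TEI_PORT> with the host port mapped to the TEI container
curl -s http://localhost:<TEI_PORT>/embed \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Hello from Atoma"}'
```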
For image generation:
```bash
# Build and start all services
COMPOSE_PROFILES=image_generations_mistral docker compose up --build

# Or run in detached mode
COMPOSE_PROFILES=image_generations_mistral docker compose up -d --build
```
It is possible to run any combination of the above, provided the node has enough GPU compute available. For example, to run all services simultaneously, run:
```bash
# Build and start all services
COMPOSE_PROFILES=chat_completions_vllm,embeddings_tei,image_generations_mistral docker compose up --build

# Or run in detached mode
COMPOSE_PROFILES=chat_completions_vllm,embeddings_tei,image_generations_mistral docker compose up -d --build
```
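Whichever profiles you start, you can confirm that the expected containers are up and inspect their logs with standard Docker Compose commands; the service names come from the repository's `docker-compose.yml`:

```bash
# List the containers started by this compose project and their status
docker compose ps

# Follow the logs of all services (pass a service name to narrow it down)
docker compose logs -f
```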
Container Architecture
The deployment consists of two main services:
- vLLM Service: Handles the AI model inference
- Atoma Node: Manages the node operations and connects to the Atoma Network
Service URLs
- vLLM Service: `http://localhost:50000` (configured via `INFERENCE_SERVER_PORT`)
- Atoma Node: `http://localhost:3000` (configured via `ATOMA_SERVICE_PORT`)
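As a final end-to-end sketch, the request below exercises the vLLM service directly through its OpenAI-compatible API on the port listed above. It assumes the `chat_completions_vllm` profile is running and that the model name matches the `MODEL` value from `.env`; the Atoma Node's own API on port 3000 may expose different routes, so consult the repository documentation for those.

```bash
# Minimal chat completion request against the vLLM OpenAI-compatible endpoint
curl -s http://localhost:50000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-70B-Instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 32
      }'
```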