vLLM Inference Server with Intel Gaudi

The following sections describe how to set up and use vLLM with the Intel® Gaudi® AI accelerator. They cover troubleshooting tips, performance tuning guidelines, warmup time optimization strategies, FP8 calibration and inference, vLLM container deployment, and profiling methods. Answers to frequently asked questions are also provided to help you get started.