Author: Het Trivedi
A practical guide on using cutting-edge optimization techniques to speed up inference
7 min read
A tutorial on using rerankers to improve your RAG pipeline
11 min read
A guide on accelerating inference performance
16 min read
Real-world benchmarks for Llama-2 13B
7 min read
Running Falcon-7B in the cloud as a microservice
18 min read