OSDI '24 - Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
Published 2 months ago • 581 plays • Length 15:36
Similar videos
- 15:41 • OSDI '24 - ServerlessLLM: Low-Latency Serverless Inference for Large Language Models
- 16:01 • OSDI '24 - Llumnix: Dynamic Scheduling for Large Language Model Serving
- 16:12 • OSDI '24 - Harvesting Memory-Bound CPU Stall Cycles in Software with MSH
- 16:02 • OSDI '24 - Fairness in Serving Large Language Models
- 14:14 • OSDI '24 - nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training
- 14:34 • OSDI '24 - dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving
- 15:45 • OSDI '24 - Microkernel Goes General: Performance and Compatibility in the HongMeng Production...
- 1:01:36 • USENIX ATC '24 and OSDI '24 - Joint Keynote Address: Scaling AI Sustainably: An Uncharted Territory
- 3:59 • #OCPSummit24: AI Workload Emulation
- 15:33 • OSDI '24 - ServiceLab: Preventing Tiny Performance Regressions at Hyperscale Through...
- 19:19 • OSDI '24 - When Will My ML Job Finish? Toward Providing Completion Time Estimates Through...
- 14:10 • OSDI '24 - Performance Interfaces for Hardware Accelerators
- 20:19 • USENIX ATC '24 - OPER: Optimality-Guided Embedding Table Parallelization for Large-Scale...
- 15:49 • NSDI '24 - Automatic Parallelization of Software Network Functions
- 14:27 • OSDI '24 - High-Throughput and Flexible Host Networking for Accelerated Computing
- 22:17 • OSDI '24 - MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference...
- 11:39 • USENIX Security '23 - ELASM: Error-Latency-Aware Scale Management for Fully Homomorphic Encryption
- 14:52 • OSDI '24 - DistServe: Disaggregating Prefill and Decoding for Goodput-Optimized Large Language...