
Interesting. I got very different results depending on how I ran the model, so I'll definitely give this a try!

edit: Actually, could you share how long a query took? One of our issues is that we need it to respond within a fast time frame.



I checked some logs from my past experiments: decoding ran at about 400 tokens/s over a ~3k-token query, so roughly 7 seconds to process the prompt, and then the generation speed was about 28 tokens/s.
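The arithmetic behind those figures can be sketched as a simple latency estimate: prompt processing time plus generation time, each at its own throughput. The function and its default rates are assumptions taken from the numbers above, not part of any real benchmark harness.

```python
def estimated_latency(prompt_tokens: int, output_tokens: int,
                      prefill_tps: float = 400.0,
                      decode_tps: float = 28.0) -> float:
    """Seconds to process the prompt plus generate the output.

    Assumed rates: ~400 tok/s prompt decoding, ~28 tok/s generation,
    matching the logs described above.
    """
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# A ~3k-token prompt takes ~7.5 s before the first output token appears.
print(f"{estimated_latency(3000, 0):.1f}")    # -> 7.5 (time to first token)
print(f"{estimated_latency(3000, 200):.1f}")  # -> 14.6 (with a 200-token reply)
```

Note that the two phases scale differently: prompt processing is fast per token, so response latency is usually dominated by the length of the generated answer.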



