modelPath: Path to the model on the filesystem.
batchSize (optional): Prompt processing batch size.
contextSize (optional): Text context size.
embedding (optional): Embedding mode only.
f16Kv (optional): Use fp16 for the KV cache.
gbnf (optional): GBNF string used to format output. Also known as a grammar.
gpuLayers (optional): Number of layers to store in VRAM.
jsonSchema (optional): JSON schema used to format output. Also known as a grammar.
logitsAll (optional): The llama_eval() call computes all logits, not just the last one.
maxTokens (optional): Maximum number of tokens to generate.
prependBos (optional): Add the beginning-of-sentence token.
seed (optional): If null, a random seed will be used.
temperature (optional): The randomness of the responses; e.g. 0.1 is deterministic, 1.5 is creative, 0.8 is balanced, and 0 disables randomness.
threads (optional): Number of threads to use to evaluate tokens.
topK (optional): Consider only the n most likely tokens, where n ranges from 1 to the vocabulary size; 0 disables (uses the full vocabulary). Note: only applies when temperature > 0.
topP (optional): Select the smallest token set whose cumulative probability exceeds P, where P is between 0 and 1; 1 disables. Note: only applies when temperature > 0.
trimWhitespaceSuffix (optional): Trim whitespace from the end of the generated text. Disabled by default.
useMlock (optional): Force the system to keep the model in RAM.
useMmap (optional): Use mmap if possible.
vocabOnly (optional): Only load the vocabulary, no weights.
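As a minimal sketch of how these options fit together, the example below constructs a model with a handful of them. The import path and LlamaCpp class name are assumptions modeled on common llama.cpp JavaScript bindings, not taken from this reference; adjust them to your setup.

```typescript
// Hypothetical import; package and class name are assumptions.
import { LlamaCpp } from "@langchain/community/llms/llama_cpp";

const model = new LlamaCpp({
  modelPath: "/models/llama-2-7b.Q4_K_M.gguf", // the only required option
  contextSize: 2048, // text context size
  batchSize: 512, // prompt processing batch size
  gpuLayers: 32, // layers offloaded to VRAM
  temperature: 0.8, // balanced randomness
  topK: 40, // sample from the 40 most likely tokens
  topP: 0.9, // nucleus sampling cutoff
  trimWhitespaceSuffix: true, // strip trailing whitespace from output
});
```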
Note that modelPath is the only required parameter. For testing, you can set it via the
LLAMA_PATH environment variable.
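For example, a test might resolve the model path like this (a sketch reusing the hypothetical LlamaCpp class from above; only the LLAMA_PATH variable name comes from the note):

```typescript
// Fall back to LLAMA_PATH when no explicit path is provided.
const modelPath = process.env.LLAMA_PATH;
if (modelPath === undefined) {
  throw new Error("Set LLAMA_PATH to your model file for testing.");
}
const model = new LlamaCpp({ modelPath });
```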