Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

mookiezi/interface

Folders and files

NameName
Last commit message
Last commit date

Latest commit

History

41 Commits

Repository files navigation

Discord-Micae Model Chat Interface

A Python-based interactive CLI interface for chatting with Hugging Face language models, optimized for casual, Discord-style conversation using ChatML. Supports both quantized and full-precision models, live token streaming with color formatting, and dynamic generation parameter adjustment.


Interface Screenshot


Features

  • Multiple Model Formats

    • Hugging Face Transformers (AutoModelForCausalLM)
    • GGUF (llama.cpp) backend
    • LoRA adapter loading
    • 4-bit / 8-bit quantization with bitsandbytes
  • Custom Prompt Controls

    • Chain-of-Thought context management
    • Raw blank mode, no system prompts, or assistant-only modes
    • DeepHermes and ChatML formatting options
    • Optional code detection and filtering
  • Interactive Chat

    • Multi-line input with prompt_toolkit
    • Persistent conversation history (/back, /clear)
    • Runtime parameter adjustment (/min, /max, /temp, /p, /k, /r, /rh)
  • Streaming Output

    • Token-by-token display with Rich coloring
    • Emoji filtering and cleanup
    • Automatic lowercasing rules
    • EOS-Aware Extension: starts with a short randomized budget (40–75 tokens), then automatically extends generation in steps (64 tokens) until <|im_end|> or EOS is reached, a hard cap (1024 tokens), or manual /stop is triggered

Installation

Install with requirements.txt:

pip install -r requirements.txt

Or install manually:

pip install torch transformers peft bitsandbytes prompt_toolkit rich

Optional dependencies

If using GGUF (llama.cpp models):

pip install llama-cpp-python

CLI Arguments (with defaults)

usage: interface.py [-h] [-c] [-m MODEL]
 [--deephermes] [--gguf] [--gguf-chat-format FORMAT]
 [--blank] [--assistant-system-combo] [--assistant-system]
 [--just-system-prompt] [--no-system-prompt]
 [--no-assistant-prompt] [--code-check]
 [--quantization] [--bnb-4bit] [--bnb-8bit]
 [--custom-tokens]
optional arguments:
 -h, --help Show this help message and exit
 -m MODEL, --model MODEL Model path or Hugging Face repo ID
 (default: mookiezii/Discord-Hermes-3-8B)
Feature toggles:
 -m, --model Model path or Hugging Face repo ID (default: mookiezii/Discord-Hermes-3-8B)
 -q, --quant Quantization mode: 4 or 8 (default: off). Use `-q` (no value) for 4-bit, or `-q 8` for 8-bit
 -fl, --frozen-lora Model path or Hugging Face repo ID of the base LoRa adapter to load and freeze
 -c, --checkpoint Model path or Hugging Face repo ID of the LoRa adapter to load
 -chs, --checkpoint-subfolder Subfolder of the path or Hugging Face repo ID of the LoRa adapter to load
 --deephermes Enable DeepHermes formatting instead of ChatML
 --gguf Use GGUF model format with llama.cpp backend
 --gguf-chat-format Chat format for GGUF models (default: "chatml")
 --blank Raw user input only, no prompts/system context
 -asc, --assistant-system-combo Include both system and assistant system prompts
 -as, --assistant-system Use assistant system prompt instead of standard
 --just-system-prompt Use only the system prompt with user input
 --no-system-prompt Do not include system prompt
 --no-assistant-prompt Do not include assistant prompt
 --code-check Enable code detection and filtering via classifier
 -au, --auto Run preset inputs (hello → what do you do → wow tell me more) 5 times with /clear in between, then exit

Default Parameters

  • MIN_NEW_TOKENS = 1
  • MAX_NEW_TOKENS = random.randint(40, 75)
  • TEMPERATURE = random.uniform(0.5, 0.9)
  • TOP_P = random.uniform(0.7, 0.9)
  • TOP_K = random.randint(40, 75)
  • MIN_P = 0.08
  • NO_REPEAT_NGRAM_SIZE = 3
  • REPETITION_PENALTY = 1.2
  • EOS Handling = <|im_end|> and tokenizer.eos_token_id (extension continues until one is reached, or hard cap of 1024 tokens)

Commands

Command Description
/clear /reset /c Clear conversation history
/back /b Undo last user+assistant exchange and preview recent history
/h VAL Enable Chain-of-Thought with last VAL exchanges (default: all available)
/d Disable Chain-of-Thought
/min VAL Set min_new_tokens to VALb
/max VAL Set max_new_tokens to VAL
/temp VAL or /t VAL Set temperature to VAL
/p VAL Set top_p to VAL
/k VAL Set top_k to VAL
/params /settings Show current generation parameters
/r Randomize parameters (short-range defaults)
/rh Randomize parameters with high variance (wider temp/top_p/top_k ranges)
/stop Toggle extension ON/OFF (controls continuation beyond initial budget)

License

MIT License

About

A Python-based interactive CLI interface for chatting with Hugging Face language models, optimized for casual, Discord-style conversation using ChatML. Supports both quantized and full-precision models, live token streaming with color formatting, and dynamic generation parameter adjustment.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

Contributors

Languages

AltStyle によって変換されたページ (->オリジナル) /