Showing posts with label software development. Show all posts

Thursday, October 2, 2025

Is Worse Really Better?

In his essay “Worse is Better”, Richard Gabriel contrasts the “MIT approach” of designing for correctness and completeness with the “New Jersey approach” of designing for simplicity of implementation. He argues that the New Jersey approach, being easier to reproduce, is likely to overtake the more principled MIT approach. While writing the prompt for the Advent of Code problems, I noticed that my prompt was reminiscent of Gabriel's characterizations, so I decided to compare the two approaches by running the LLM on each type of prompt and seeing how the output differed.

Prompts

The salient part of the MIT prompt is

As an Elite Common Lisp Developer, your unwavering and paramount mission is to design and meticulously craft Common Lisp programs that are not only correct, but also simple, efficient, and robust. Your programs are not mere instructions; they are masterpieces of clarity and precision, firmly grounded in these foundational, non-negotiable principles:

  • Correctness Your program must be correct in all observable aspects. Incorrectness is simply not allowed.
  • Consistency Your program must be consistent. Consistency is as important as correctness. Simplicity may be sacrificed for consistency and correctness.
  • Simplicity Your program must be as simple as possible, but no simpler. It is more important for the interface to be simple than the implementation. Simplicity is not allowed to overly reduce completeness.
  • Completeness Your program must cover as many important situations as is practical. All reasonably expected situations must be handled.

The salient part of the New Jersey prompt is

As an Expert Common Lisp Developer, your primary and overriding objective is to create Common Lisp programs that are simple. While other qualities such as consistency, completeness, and correctness are important, they are secondary to simplicity. Your programs should be designed with the following prioritized principles in mind:

  • Simplicity Your program must be simple, both in interface and implementation. It is more important for the implementation to be simple than the interface. Simplicity is the most important consideration in the design.
  • Correctness Your program must be correct in all observable aspects. It is slightly better that your program be simple than correct.
  • Consistency Your program must not be overly inconsistent. Consistency can be sacrificed for simplicity in some cases, but it is better to drop those parts of the design that deal with less common circumstances than to introduce either implementational complexity or inconsistency.
  • Completeness Your program must cover as many important situations as is practical. All reasonably expected cases should be covered. Completeness can be sacrificed in favor of any other quality. In fact, completeness must be sacrificed whenever implementation simplicity is jeopardized. Consistency can be sacrificed to achieve completeness if simplicity is retained; especially worthless is consistency of interface.

Both prompts had additional instructions about how to find the input and where to write the output.

Results

I had Gemini analyze the solution pairs to the Advent of Code problems. In almost every case, Gemini preferred the MIT approach to the New Jersey approach. Here is an example analysis of day 10:

Excellent. I have the contents of both files. I will now proceed with a detailed analysis and comparison, as requested.

Analysis of day10_right_thing_solution.lisp

This solution presents a very clean, structured, and pragmatic approach. It reads as a piece of code written by a developer who values clarity and standard algorithmic patterns.

Structure:

  • The code is well-organized into distinct functions, each with a single, clear purpose: parse-input, find-trailheads, get-neighbors, calculate-score (for Part 1), calculate-rating (for Part 2), and a main solve function to orchestrate the process.
  • This separation of concerns makes the code easy to read, test, and debug. The main solve function provides a high-level summary of the program's execution flow.

Algorithms & Style:

  • Part 1 (calculate-score): This function uses a classic iterative Breadth-First Search (BFS). The use of a queue (q) and a visited hash table is a textbook implementation for graph traversal. This approach is highly efficient and guarantees finding the shortest path in terms of steps, though that is not a requirement here. It is robust and not susceptible to stack depth limits.
  • Part 2 (calculate-rating): This function employs a bottom-up dynamic programming strategy. It builds a path-counts table, iterating through the heights from 1 to 9. For each cell, it calculates the number of paths to it by summing the paths from its valid, lower neighbors. This is a very efficient and elegant method that avoids the combinatorial explosion of a naive search.
  • Lisp Idioms: The code uses common and readable Lisp constructs like loop, dotimes, destructuring-bind, and let*. The use of hash tables for visited sets is appropriate and performant.

Strengths:

  • Clarity and Readability: The code is exceptionally clear. Function names are descriptive, and the logic is straightforward and easy to follow.
  • Robustness: The iterative algorithms are safe from stack overflow errors, making the solution suitable for arbitrarily large inputs.
  • Efficiency: Both the BFS and the bottom-up DP are highly efficient algorithms for their respective problems.

Weaknesses:

  • Minor Environmental Dependency: It relies on a placeholder readFileLines function, making it slightly less portable out-of-the-box compared to a solution using a standard library like UIOP.

Analysis of day10_new_jersey_solution.lisp

This solution uses a recursive approach, which can be very elegant but also carries certain trade-offs. It feels more like a direct translation of a mathematical recurrence relation.

Structure:

  • The overall structure is similar, with parsing and solving functions. However, the core algorithmic logic is encapsulated within local functions defined by labels.
  • This makes the solver functions more self-contained but also increases their cognitive complexity, as the main logic is nested one level deeper.

Algorithms & Style:

  • Part 1 (solve-part1): This function uses a recursive Depth-First Search (DFS) with memoization. The find-reachable-nines local function explores paths recursively. The memo hash table is crucial to prevent re-computation and avoid infinite loops, effectively turning the search into a dynamic programming solution on a graph. The use of union to merge results from recursive calls can be inefficient, as it requires traversing lists.
  • Part 2 (solve-part2): This also uses a recursive DFS with memoization (top-down dynamic programming). The count-paths function is a classic example of this pattern. It computes the number of paths from a given cell and stores the result in the memo table before returning it.
  • Lisp Idioms: The use of labels for recursion is a very common and powerful Lisp idiom. The use of cons cells for coordinates is also common, though perhaps slightly less explicit than a list. It correctly uses uiop:read-file-lines for better portability.

Strengths:

  • Elegance of Recursion: For those comfortable with recursion, the code can be seen as a very direct and elegant expression of the problem's structure.
  • Portability: The use of uiop for file reading is a good practice.

Weaknesses:

  • Stack Overflow Risk: The primary weakness is its reliance on deep recursion. For a sufficiently large or complex input grid, this solution could exceed the call stack limit and crash, whereas an iterative solution would not.
  • Lower Readability: The combination of recursion, memoization, and nested functions can make the logic harder to trace and debug compared to the straightforward iterative loops in the first solution.
  • Potential Inefficiency: The use of union in Part 1 is likely less performant than the hash-table approach used in the first solution for tracking unique reachable nodes.
Comparison and Final Recommendation

| Feature     | right_thing_solution                                   | new_jersey_solution                                         |
|-------------|--------------------------------------------------------|-------------------------------------------------------------|
| Paradigm    | Iterative (BFS, bottom-up DP)                          | Recursive (DFS with memoization)                            |
| Robustness  | Superior. Immune to stack overflow.                    | Weaker. Vulnerable to stack overflow.                       |
| Clarity     | Superior. Clear, flat structure. Standard algorithms.  | Good, but more complex. Nested logic can be harder to follow. |
| Performance | Excellent. Efficient, standard algorithms.             | Good, but recursion overhead and union may be slower.       |
| Style       | Pragmatic, clear, maintainable.                        | Elegant, concise, mathematical.                             |
Recommendation

While both files represent valid and functional solutions to the problem, I unequivocally recommend day10_right_thing_solution.lisp as the better overall solution.

Justification:

  1. Robustness is Paramount: The most critical advantage of the "Right Thing" solution is its immunity to stack overflow errors. This makes it fundamentally more reliable and suitable for general-purpose use where input constraints may not be known. In a professional software engineering context, this is a decisive factor.
  2. Clarity Enhances Maintainability: The code in the "Right Thing" solution is significantly easier to read, understand, and maintain. A developer can grasp the BFS and bottom-up DP algorithms quickly, making debugging and future modifications simpler. The recursive nature of the "New Jersey" solution, while elegant, requires more mental effort to trace.
  3. Efficiency and Predictability: The iterative approach offers more predictable performance. The BFS algorithm and the systematic table-filling of the DP solution have well-understood and excellent time complexities, free from the overhead of function calls inherent in recursion.

In conclusion, the "New Jersey" solution is a fine example of a recursive, top-down DP approach. However, the "Right Thing" solution embodies a superior engineering strategy, prioritizing robustness, clarity, and maintainability, which are the hallmarks of high-quality, professional-grade code. It is the solution I would write and the one I would want to inherit in a team setting.

Friday, September 26, 2025

A Prompt to Create a Starter Common Lisp Project

I decided to try some prompt engineering. The following prompt will query the user for a project name and instantiate a project in ~/quicklisp/local-projects/ with an initial system definition, package file, and basic files for the project all set up and ready to load. It works on my machine, but your mileage may vary. This is just an example prompt; it assumes you like named-let, fold, and series. You should tweak this prompt to your tastes. Let me know if it works for you.

Perform these steps:
 0) Pay careful attention to the directory paths and filenames used below. Avoid typos and do not be sloppy.
 1) Query the user for a case-sensitive project name like `Foo`. Call this the `case-sensitive-system-name`.
 2) Convert the `case-sensitive-system-name` to a lower case string to get the `system-name`.
 3) Convert the `case-sensitive-system-name` to an upper case string to get the `package-name`.
 4) If the `~/quicklisp/` directory exists, list the directory contents. After the tool returns the list, display the complete list of files to the user.
 5) If the `~/quicklisp/local-projects/` directory exists, list the directory contents. After the tool returns the list, display the complete list of files to the user.
 6) Check for the existence of the directory `~/quicklisp/local-projects/{case-sensitive-system-name}/`. If it does not exist, create it. This is the `project-root` directory.
 7) If the project-root directory is not a git repository, initialize it as one.
 8) Create a `{project-root}/src/` subdirectory.
 9) Create an appropriate `README.md` file in the project-root directory.
 10) Stage the `README.md` for git.
 11) Create `{project-root}/src/package.lisp` file. 
 * This file should have a comment line indicating the emacs major mode and file encoding (utf-8) followed by a blank line.
 * This file should have a defpackage form that defines a package named {system-name}. 
 * The package should shadowing-import `compose` from `function`.
 * The package should shadowing-import `let` and `named-lambda` from `named-let`.
 * The package should shadowing-import `defun`, `funcall`, `let*`, and `multiple-value-bind` from `series`. 
 * The :shadowing-import clauses should be first.
 * The package :use clause should be last.
 * The package should use `cl`, `alexandria`, `function`, `fold`, `named-let`, `promise`, and `series`.
 **Always use upper-case strings to name the packages, like the following: (defpackage \"MY-PACKAGE\" (:use \"CL\" \"ALEXANDRIA\")) **.
 **Always use upper-case strings to name the symbols**, like `(:shadowing-import-from \"SERIES\" \"DEFUN\" \"FUNCALL\" \"LET*\")`.
 12) Now create some lisp files in the `{project-root}/src/` directory. Each file should have a comment line indicating the emacs major mode and file encoding (utf-8) followed by a blank line. Each file should have an `in-package` form that uses the {package-name}. **Always use upper case strings to name the package in the `in-package` form, for example `(in-package \"MY-PACKAGE\")`**. Each file should contain a comment describing the purpose of the file. Each file should include a sample Lisp form appropriate for the file.
 a) `data.lisp` - purpose: basic data structures and classes.
 b) `generics.lisp` - purpose: to define signatures of generic functions.
 c) `macros.lisp` - purpose: base macros.
 d) `misc.lisp` - purpose: miscellaneous low-level lisp functions.
 e) `vars.lisp` - purpose: to hold global variables, constants, and parameters.
 f) `{system-name}.lisp` - purpose: entry point of program.
 13) Create a `{system-name}.asd` file in the `{project-root}` directory.
 * It should have a comment line indicating the emacs major mode and file encoding (utf-8) followed by a blank line.
 * It should *not* have an `in-package` form. 
 * It should have one defsystem form.
 * The defsystem form should not be package qualified.
 * The defsystem should define a system named by the string {system-name}.
 * The defsystem should have dependencies on `alexandria`, `function`, `fold`, `named-let`, `series`, and `str`.
 * The depended upon systems should be named with lower case strings.
 * It should have one module called `src`.
 * The `src` module should have the file components of the files created above, listed alphabetically.
 * The `package` file should have no dependencies.
 * All other files should at least depend on `package`. 
 * All files other than `package` and `macros` should depend on `macros`.
 * The `{system-name}` file should depend on the other lisp files.
 14) Stage all the lisp files and the system definition file.
 15) Commit.
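For concreteness, here is roughly what steps 11 and 13 should produce for a project named `Foo`. This is my own hypothetical rendering; the LLM's actual output will vary.

```lisp
;;; -*- Mode: LISP; coding: utf-8 -*-

(defpackage "FOO"
  (:shadowing-import-from "FUNCTION" "COMPOSE")
  (:shadowing-import-from "NAMED-LET" "LET" "NAMED-LAMBDA")
  (:shadowing-import-from "SERIES" "DEFUN" "FUNCALL" "LET*" "MULTIPLE-VALUE-BIND")
  (:use "CL" "ALEXANDRIA" "FUNCTION" "FOLD" "NAMED-LET" "PROMISE" "SERIES"))
```

And the system definition, with the `src` components listed alphabetically, `package` depending on nothing, `macros` depending only on `package`, and the entry-point file depending on everything else:

```lisp
;;; -*- Mode: LISP; coding: utf-8 -*-

(defsystem "foo"
  :depends-on ("alexandria" "function" "fold" "named-let" "series" "str")
  :components
  ((:module "src"
    :components ((:file "data"     :depends-on ("package" "macros"))
                 (:file "foo"      :depends-on ("data" "generics" "macros"
                                                "misc" "package" "vars"))
                 (:file "generics" :depends-on ("package" "macros"))
                 (:file "macros"   :depends-on ("package"))
                 (:file "misc"     :depends-on ("package" "macros"))
                 (:file "package")
                 (:file "vars"     :depends-on ("package" "macros"))))))
```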

Monday, September 15, 2025

Analyzing Lisp Redux: One Form At a Time

Instead of analyzing an entire Common Lisp file at once, you might want to analyze the file one function (or top-level form) at a time. Functions are basically black-box abstractions. So long as the arguments and return values don’t change (and the side effects are preserved), the implementation can be completely replaced. The LLM is much more constrained in this case. It cannot make changes to the signature of the function or make interprocedural changes to the code. While this makes a large class of improvements impossible, it also makes a large class of bugs impossible and greatly narrows the search space of code changes.

We use a specialized READ-ANALYZE-PRINT loop: a special version of read that preserves comments (see yesterday’s post) reads the file one top-level form at a time. Each top-level form is presented to the LLM (along with any associated comments) for analysis. For each form, the LLM is instructed to describe the purpose of the form, to identify any potential bugs, to check for adherence to best practices, and to suggest ways to improve the code.
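In outline, the loop looks something like this. This is a sketch: `read-preserving-comments` stands in for the comment-preserving reader from yesterday's post, `invoke-gemini` for the LLM client, and `+analyze-form-prompt+` is a hypothetical name for the per-form prompt.

```lisp
(defun analyze-file (pathname)
  "READ-ANALYZE-PRINT loop: read PATHNAME one top-level form at a
time and print the LLM's analysis of each form."
  (with-open-file (stream pathname)
    (loop for form = (read-preserving-comments stream nil stream)
          until (eq form stream)   ; the stream itself is the EOF sentinel
          do (print (invoke-gemini
                     (format nil "~a~%~%~s" +analyze-form-prompt+ form))))))
```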

The system instruction for analysis is as follows:

"You are a world class Common Lisp programmer."
"You will be analyzing a Common Lisp file one top-level form at a time."
"Your analysis should be thorough and insightful, demonstrating
a deep understanding of Common Lisp programming practices."

In addition we have these directives:

"If there is no package definition, assume an appropriate
one exists elsewhere and do not mention this."

Without this directive, the LLM will complain about files that simply have an appropriate (in-package ...) form.

"Assume that popular utility packages such as alexandria
and series have been loaded and made available."
"Assume that undefined functions are defined elsewhere and do not mention this."

Without these directives, the LLM complains about undefined functions. We specifically tell it to assume we know what we are doing.

"Do not suggest using defconstant, even if it would be appropriate."

This is an odd one, but the LLM has a strong tendency to suggest using defconstant for values that do not seem to change. In many cases, we want to allow the user the option to modify the value at runtime or to reload the file. Reloading a file with defconstant forms will often cause an error, even if the value hasn’t changed, because the reloaded value is equal but not eql to the original value.
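A minimal demonstration of why reloading bites, assuming a fresh image:

```lisp
;; Loading a file containing this form twice typically signals an error
;; in most implementations: the second evaluation conses a fresh list
;; that is EQUAL to the old value but not EQL to it, and DEFCONSTANT's
;; behavior is undefined when the new value is not EQL to the old one.
(defconstant +colors+ (list :red :green :blue))

;; DEFPARAMETER has no such restriction: the file can be reloaded
;; freely, and the user may rebind the value at runtime.
(defparameter *colors* (list :red :green :blue))
```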

Then we direct the LLM to perform the analysis:

"For each top-level form:"
" * Determine its purpose."
" * Determine whether a library function might achieve the same thing."
" * Check that it follows best practices."
" * Look for typical Common Lisp errors and pitfalls."
" * Look for bugs."
"Then report on the form:"
" * Describe the purpose of the form."
" * If the form is missing a docstring, suggest one."
" * If the form fails to follow best practices, say so."
" * If there are bugs or errors, describe them."
" * If improvements can be made, give up to three concrete, actionable suggestions for improvement."
" * For each suggestion, indicate whether it is a critical improvement or merely nice to have."
" * For each suggestion, describe your reasoning."
" * If the form is error free and does not need improvement, say so."

The prompt for each top-level form recapitulates the form-specific section of the system instruction, followed by the form itself.

Example Analysis

Here is an example of the analysis of the analyze.lisp file.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 9.23 seconds.
;; Prompt Tokens: 7,160
;; Thoughts Tokens: 766
;; Candidate Tokens: 94
;; Accumulated Prompt Tokens: 8,276,974
;; Accumulated Response Tokens: 341,259
;; Analyzing form:
;;; -*- Lisp -*-
(in-package "GEMINI")

The purpose of the form (in-package "GEMINI") is to set the current package for the subsequent forms in the file to "GEMINI". This is a fundamental Common Lisp construct used to organize code and manage symbol namespaces, ensuring that symbols defined or referenced in this file are interned within the "GEMINI" package.

This form is error-free and does not need improvement. It follows best practices for Common Lisp file organization.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 8.84 seconds.
;; Prompt Tokens: 7,577
;; Thoughts Tokens: 652
;; Candidate Tokens: 140
;; Accumulated Prompt Tokens: 8,284,551
;; Accumulated Response Tokens: 342,051
;; Analyzing form:
(defparameter +analyze-source-system-instruction+
 "You are a world class Common Lisp programmer. You will be analyzing files from the '~a' system. Your analysis should be thorough and insightful, demonstrating a deep understanding of Common Lisp programming practices."
 "LLM system instruction to analyze a Common Lisp ASDF system.")

The purpose of this defparameter form is to define a special variable named +analyze-source-system-instruction+. This variable holds a format string that serves as a system instruction or prompt template for an LLM (Large Language Model). The instruction guides the LLM to act as a Common Lisp programmer analyzing files from a specified ASDF system, with ~a being a placeholder for the system’s name.

The form is error-free and does not need improvement. It correctly uses defparameter for a global, dynamically-scoped variable, follows the +earmuffs+ naming convention for special variables, and includes a clear docstring.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 6.94 seconds.
;; Prompt Tokens: 8,069
;; Thoughts Tokens: 523
;; Candidate Tokens: 163
;; Accumulated Prompt Tokens: 8,292,620
;; Accumulated Response Tokens: 342,737
;; Analyzing form:
(defparameter +analyze-system-definition-prompt+
 (str:join
 #\Space
 (list
 "The following is the definition of the `~a` system."
 "Analyze the system definition, checking for best practices."
 "Make sure there is a primary package definition component that has no *other internal ASDF dependencies* and is depended upon by all other components."))
 "LLM prompt to analyze a Common Lisp ASDF system definition.")

The purpose of this defparameter form is to define a special variable named +analyze-system-definition-prompt+. This variable holds a string that serves as a prompt template for an LLM. The prompt instructs the LLM to analyze a Common Lisp ASDF system definition, specifically checking for best practices related to package definitions and component dependencies, emphasizing the need for a primary package component without internal ASDF dependencies that all other components depend on.

The form is error-free and does not need improvement. It correctly uses defparameter for a global, dynamically-scoped variable, follows the +earmuffs+ naming convention for special variables, and includes a clear docstring. The use of str:join to construct the prompt string is idiomatic and readable.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 21.61 seconds.
;; Prompt Tokens: 8,824
;; Thoughts Tokens: 1,703
;; Candidate Tokens: 557
;; Accumulated Prompt Tokens: 8,301,444
;; Accumulated Response Tokens: 344,997
;; Analyzing form:
(defun analyze-system-definition (system-name)
 "Analyzes the ASDF system definition file for the specified SYSTEM-NAME.
 It checks for best practices, especially concerning the primary package
 definition component and its internal ASDF dependencies, by sending
 the system definition to an LLM for detailed analysis."
 (check-type system-name (or string symbol))
 (let ((asdf-system (asdf:find-system system-name)))
 (unless asdf-system
 (error "ANALYZE-SYSTEM-DEFINITION: ASDF system '~a' not found." system-name))
 (let ((source-file (asdf:system-source-file asdf-system)))
 (unless (and source-file (probe-file source-file))
 (error "ANALYZE-SYSTEM-DEFINITION: Source file for system '~a' not found." system-name))
 (let ((*system-instruction*
 (content
 :parts
 (list
 (part (format nil +analyze-source-system-instruction+ system-name)))
 :role "system")))
 (invoke-gemini
 (list
 (part (format nil +analyze-system-definition-prompt+ system-name))
 (part (handler-case (uiop:read-file-string source-file)
 (error (c)
 (error "ANALYZE-SYSTEM-DEFINITION: Failed to read source file '~a': ~a"
 source-file c))))))))))

The purpose of the analyze-system-definition function is to retrieve the ASDF system definition file for a given system-name, read its contents, and then send this content along with a specific prompt and system instruction to an LLM (Large Language Model) for analysis. The LLM is instructed to act as a Common Lisp programmer and analyze the system definition for best practices, particularly concerning the primary package definition component and its dependencies.

The function has a good docstring that clearly explains its purpose, arguments, and the process it undertakes.

The form is largely error-free and follows best practices, including robust error handling for missing systems, non-existent source files, and file reading failures, as well as check-type for argument validation.

Here are some minor suggestions for improvement:

  1. Minor Improvement: Refine error message consistency.
    • Reasoning: While the current error messages are informative, adopting a consistent prefix for all errors originating from this function (e.g., "ANALYZE-SYSTEM-DEFINITION Error: ...") can improve readability in logs and make it easier to identify the source of an error.
    • Actionable Suggestion: Modify the error messages to include a consistent prefix, for example:
      • (error "ANALYZE-SYSTEM-DEFINITION Error: ASDF system '~a' not found." system-name)
      • (error "ANALYZE-SYSTEM-DEFINITION Error: Source file for system '~a' not found." system-name)
      • (error "ANALYZE-SYSTEM-DEFINITION Error: Failed to read source file '~a': ~a" source-file c)
  2. Barely worth mentioning: Consider defgeneric and methods for invoke-gemini context.
    • Reasoning: The use of a special variable *system-instruction* to pass context to invoke-gemini is a valid Common Lisp pattern. However, for more complex scenarios or if invoke-gemini were to have different ways of handling system instructions based on the context, a defgeneric with methods could offer a more extensible and object-oriented approach. For this specific use case, the special variable is perfectly adequate and simpler.
    • Actionable Suggestion: (No concrete action needed for this specific form, as the current approach is fine.) If the LLM interaction logic becomes more intricate, consider defining invoke-gemini as a generic function and dispatching on context objects rather than relying solely on special variables.

The form is otherwise error-free and well-written.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 6.14 seconds.
;; Prompt Tokens: 9,796
;; Thoughts Tokens: 345
;; Candidate Tokens: 183
;; Accumulated Prompt Tokens: 8,311,240
;; Accumulated Response Tokens: 345,525
;; Analyzing form:
(defparameter +analyze-source-file-prompt+
 "Given the following ASDF system definition for '~a' and the source code for component '~a', analyze the component's code and provide a detailed summary of its functionality, purpose, and any notable features or patterns you observe. In addition, note where the code is not following best practices, where there are potential bugs, and where the code can be improved. Give up to three concrete, actionable suggestions for how the code could be improved or refactored. If there are no suggestions for improvement, say so explicitly. Indicate for each suggestion whether the change is a critical improvement or merely nice to have. Be sure to include your reasoning for each suggestion."
 "LLM prompt to analyze a Common Lisp source file.")

The purpose of this defparameter form is to define a special variable named +analyze-source-file-prompt+. This variable stores a detailed prompt template for an LLM (Large Language Model). The prompt instructs the LLM to analyze a Common Lisp source file, given the context of its ASDF system definition. The LLM is expected to provide a summary of the component’s functionality, identify best practice deviations, potential bugs, and offer up to three concrete, actionable suggestions for improvement, including reasoning and an indication of criticality.

The form is error-free and does not need improvement. It correctly uses defparameter for a global, dynamically-scoped variable, adheres to the +earmuffs+ naming convention for special variables, and includes a clear and concise docstring. The prompt itself is well-crafted to elicit a comprehensive analysis from the LLM.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 5.62 seconds.
;; Prompt Tokens: 10,450
;; Thoughts Tokens: 273
;; Candidate Tokens: 311
;; Accumulated Prompt Tokens: 8,321,690
;; Accumulated Response Tokens: 346,109
;; Analyzing form:
(defun get-system-and-component (system-name component-name)
 "Retrieves the ASDF system object and the specified component object
 for a given component in a specified system. Returns two values:
 the ASDF system object and the component object. Signals an error
 if the system or component is not found."
 (check-type system-name (or string symbol))
 (check-type component-name (or string symbol))
 (let ((asdf-system (asdf:find-system system-name)))
 (unless asdf-system
 (error "GET-SYSTEM-AND-COMPONENT: ASDF system '~a' not found." system-name))
 (let ((component (asdf:find-component asdf-system component-name)))
 (unless component
 (error "GET-SYSTEM-AND-COMPONENT: Component '~a' not found in system '~a'." component-name system-name))
 (values asdf-system component))))

The purpose of the get-system-and-component function is to locate and return both the ASDF system object and a specific component object within that system, given the system’s name and the component’s name. It returns these two objects as multiple values. The function includes robust error handling, signaling an error if either the system or the component cannot be found.

The function has an excellent docstring that clearly describes its purpose, arguments, return values, and error conditions.

The form is error-free and follows best practices, including check-type for argument validation and clear, specific error messages.

Here is a minor suggestion for improvement:

  1. Minor Improvement: Refine error message consistency.
    • Reasoning: While the current error messages are informative, adopting a consistent prefix for all errors originating from this function (e.g., "GET-SYSTEM-AND-COMPONENT Error: ...") can improve readability in logs and make it easier to identify the source of an error.
    • Actionable Suggestion: Modify the error messages to include a consistent prefix, for example:
      • (error "GET-SYSTEM-AND-COMPONENT Error: ASDF system '~a' not found." system-name)
      • (error "GET-SYSTEM-AND-COMPONENT Error: Component '~a' not found in system '~a'." component-name system-name)

The form is otherwise error-free and well-written.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 17.45 seconds.
;; Prompt Tokens: 11,416
;; Thoughts Tokens: 757
;; Candidate Tokens: 1,018
;; Accumulated Prompt Tokens: 8,333,106
;; Accumulated Response Tokens: 347,884
;; Analyzing form:
(defun get-system-and-component-contents (system-name component-name)
 "Retrieves the ASDF system object, its definition file content,
 the ASDF component object, and its source file content.
 Returns four values:
 1. The ASDF system object.
 2. A string containing the system definition file's content.
 3. The ASDF component object.
 4. A string containing the component's source file content.
 Signals an error if the system, component, or their respective
 source files are not found."
 (multiple-value-bind (system component) (get-system-and-component system-name component-name)
 (let ((system-pathname (asdf:system-source-file system))
 (component-pathname (asdf:component-pathname component)))
 (unless (and system-pathname (probe-file system-pathname))
 (error "GET-SYSTEM-AND-COMPONENT-CONTENTS: Source file for system '~a' not found." system-name))
 (unless (and component-pathname (probe-file component-pathname))
 (error "GET-SYSTEM-AND-COMPONENT-CONTENTS: Source file for component '~a' not found." component-name))
 (values system (handler-case (uiop:read-file-string system-pathname)
 (error (c)
 (error "GET-SYSTEM-AND-COMPONENT-CONTENTS: Failed to read system definition file '~a': ~a"
 system-pathname c)))
 component (handler-case (uiop:read-file-string component-pathname)
 (error (c)
 (error "GET-SYSTEM-AND-COMPONENT-CONTENTS: Failed to read component source file '~a': ~a"
 component-pathname c)))))))

The purpose of the get-system-and-component-contents function is to retrieve comprehensive information about a specific ASDF system and one of its components. It returns four values: the ASDF system object, the content of the system’s definition file as a string, the ASDF component object, and the content of the component’s source file as a string. The function ensures robustness by signaling errors if the system, component, or their respective source files cannot be found or read.

The function has an excellent docstring that clearly describes its purpose, the four values it returns, and its error conditions.

The form is error-free and follows best practices, including the use of multiple-value-bind for handling multiple return values, probe-file for checking file existence, and handler-case for robust file reading with informative error messages.

Here are two suggestions for improvement:

  1. Minor Improvement: Refine error message consistency.
    • Reasoning: While the current error messages are informative, adopting a consistent prefix for all errors originating from this function (e.g., "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: ...") can improve readability in logs and make it easier to identify the source of an error.
    • Actionable Suggestion: Modify the error messages to include a consistent prefix, for example:
      • (error "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: Source file for system '~a' not found." system-name)
      • (error "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: Source file for component '~a' not found." component-name)
      • (error "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: Failed to read system definition file '~a': ~a" system-pathname c)
      • (error "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: Failed to read component source file '~a': ~a" component-pathname c)
  2. Minor Improvement: Extract a helper function for safe file reading.
    • Reasoning: The pattern of reading a file string with handler-case to catch and re-signal errors is repeated twice within this function. Extracting this logic into a small helper function would improve code conciseness, adhere to the DRY (Don’t Repeat Yourself) principle, and make the main function body cleaner.
    • Actionable Suggestion: Define a local helper function (e.g., read-file-string-safely) that takes a pathname and an error message format string, and then use this helper for both system-pathname and component-pathname.
       ```lisp
       (defun get-system-and-component-contents (system-name component-name)
         "Retrieves the ASDF system object, its definition file content,
       the ASDF component object, and its source file content.
       Returns four values:
         1. The ASDF system object.
         2. A string containing the system definition file's content.
         3. The ASDF component object.
         4. A string containing the component's source file content.
       Signals an error if the system, component, or their respective
       source files are not found."
         (multiple-value-bind (system component) (get-system-and-component system-name component-name)
           (let ((system-pathname (asdf:system-source-file system))
                 (component-pathname (asdf:component-pathname component)))
             (unless (and system-pathname (probe-file system-pathname))
               (error "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: Source file for system '~a' not found." system-name))
             (unless (and component-pathname (probe-file component-pathname))
               (error "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: Source file for component '~a' not found." component-name))
             (flet ((read-file-string-safely (pathname error-format-string &rest args)
                      (handler-case (uiop:read-file-string pathname)
                        (error (c)
                          (apply #'error error-format-string pathname c args)))))
               (values system
                       (read-file-string-safely
                        system-pathname
                        "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: Failed to read system definition file '~a': ~a")
                       component
                       (read-file-string-safely
                        component-pathname
                        "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: Failed to read component source file '~a': ~a"))))))
       ```
       
;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 5.99 seconds.
;; Prompt Tokens: 12,744
;; Thoughts Tokens: 426
;; Candidate Tokens: 174
;; Accumulated Prompt Tokens: 8,345,850
;; Accumulated Response Tokens: 348,484
;; Analyzing form:
(defparameter +system-definition-context-instruction+
  "What follows is the definition of the `~a` system. Do not analyze this file, it is here so that you understand the dependencies within the system."
  "LLM instruction to provide system definition as context without analysis.")

The purpose of this defparameter form is to define a special variable named +system-definition-context-instruction+. This variable stores a format string that serves as an instruction for an LLM (Large Language Model). The instruction explicitly tells the LLM that a subsequent piece of text is an ASDF system definition, but it should be used only for contextual understanding of system dependencies and not for analysis itself. The ~a placeholder is intended for the system’s name.

The form is error-free and does not need improvement. It correctly uses defparameter for a global, dynamically-scoped variable, adheres to the +earmuffs+ naming convention for special variables, and includes a clear and concise docstring. The instruction itself is well-phrased to guide the LLM’s behavior.

;; Invoking Gemini API model `gemini-2.5-flash`...
GC 3521
;; Gemini API finished in 11.46 seconds.
;; Prompt Tokens: 13,433
;; Thoughts Tokens: 652
;; Candidate Tokens: 457
;; Accumulated Prompt Tokens: 8,359,283
;; Accumulated Response Tokens: 349,593
;; Analyzing form:
(defun analyze-component (system-name component-name)
  "Analyzes the source code of a specific ASDF component within a given system
using an LLM.  It provides the LLM with the system definition for context
and the component's source code.  The LLM is prompted to summarize
functionality, purpose, and features, identify deviations from best practices,
potential bugs, and suggest up to three concrete improvements with reasoning
and severity."
  (multiple-value-bind (system system-contents component component-contents)
      (get-system-and-component-contents system-name component-name)
    (declare (ignore system component))
    (let ((*system-instruction*
           (content
            :parts
            (list
             (part (format nil +analyze-source-system-instruction+ system-name)))
            :role "system")))
      (invoke-gemini
       (list
        (part (format nil +system-definition-context-instruction+ system-name))
        (part system-contents)
        (part (format nil +analyze-source-file-prompt+ system-name component-name))
        (part component-contents))))))

The purpose of the analyze-component function is to orchestrate the analysis of a specific ASDF component’s source code by an LLM (Large Language Model). It first retrieves the ASDF system object, its definition file content, the ASDF component object, and its source file content using get-system-and-component-contents. It then sets up a system instruction for the LLM, providing context about the system being analyzed. Finally, it invokes the LLM with a carefully constructed prompt that includes the system definition (for context on dependencies), the component’s source code, and instructions for the LLM to summarize functionality, identify issues, and suggest improvements.

The function has an excellent docstring that clearly describes its purpose, arguments, and the detailed analysis expected from the LLM.

The form is error-free and follows best practices. It leverages the get-system-and-component-contents function for robust data retrieval and error handling, uses multiple-value-bind effectively, and correctly declares ignored variables. The construction of the LLM prompt and system instruction using predefined templates and format is also a good practice for consistency.

Here is one minor suggestion for improvement:

  1. Minor Improvement: Explicitly name the parts for invoke-gemini for enhanced readability.
    • Reasoning: While the current list of part forms is functional, explicitly naming the role of each part within the invoke-gemini call could improve the readability and maintainability of the code, especially as the complexity of LLM interactions grows. This makes it immediately clear what each piece of content represents in the overall prompt structure.
    • Actionable Suggestion: Consider using a more descriptive structure for the parts list, perhaps by introducing helper functions or a more structured way to represent the different segments of the LLM input. For example, if invoke-gemini supported keyword arguments for different content types, it would be more explicit. As it stands, a comment or a local variable for each part could also serve this purpose, but that’s less integrated. For now, the current approach is acceptable.
;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 6.49 seconds.
;; Prompt Tokens: 14,452
;; Thoughts Tokens: 313
;; Candidate Tokens: 221
;; Accumulated Prompt Tokens: 8,373,735
;; Accumulated Response Tokens: 350,127
;; Analyzing form:
(defparameter +analyze-form-prompt+
  (str:join
   #\Newline
   (list
    " * Use the preceding forms as context for analysis."
    " * Determine the purpose of the form."
    " * Determine whether a library function might achieve the same thing."
    " * Check that it follows best practices."
    " * Look for opportunities to simplify or improve the code."
    " * Look for typical Common Lisp errors and pitfalls."
    " * Look for bugs."
    "Then report on the form:"
    " * Describe the purpose of the form."
    " * Suggest a docstring if one is needed, or if it can be improved, otherwise don't mention it."
    " * If the form fails to follow best practices, say so."
    " * If there are bugs or errors, describe them."
    " * If improvements can be made, give up to three concrete, actionable suggestions for improvement."
    " * For each suggestion, indicate whether it is a critical improvement, major improvement, minor improvement, merely nice to have, or barely worth mentioning."
    " * For each suggestion, describe your reasoning."
    " * If the form is error free and does not need improvement, say so."))
  "LLM prompt detailing the analysis criteria and reporting format for a single Common Lisp top-level form.")

The purpose of this defparameter form is to define a special variable named +analyze-form-prompt+. This variable stores a multi-line string that serves as a comprehensive prompt template for an LLM (Large Language Model). The prompt meticulously outlines the criteria for analyzing a single Common Lisp top-level form, including what aspects to consider (context, purpose, library functions, best practices, simplification, errors, bugs) and the exact format for reporting the analysis (description, docstring suggestions, best practice adherence, bug descriptions, and up to three concrete, actionable suggestions with criticality and reasoning).

The form is error-free and does not need improvement. It correctly uses defparameter for a global, dynamically-scoped variable, adheres to the +earmuffs+ naming convention for special variables, and includes a clear and concise docstring. The use of str:join with #\Newline is an idiomatic and readable way to construct a multi-line string from a list of lines, making the prompt definition clear and easy to manage.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 17.41 seconds.
;; Prompt Tokens: 15,190
;; Thoughts Tokens: 1,246
;; Candidate Tokens: 738
;; Accumulated Prompt Tokens: 8,388,925
;; Accumulated Response Tokens: 352,111
;; Analyzing form:
(defparameter +analyze-file-system-instruction+
  (str:join #\newline
            (list
             (str:join #\Space
                       (list
                        "You are a world class Common Lisp programmer."
                        "You will be analyzing a Common Lisp file one top-level form at a time."
                        "Your analysis should be thorough and insightful, demonstrating a deep understanding of Common Lisp programming practices."
                        "If there is no package definition, assume an appropriate one exists elsewhere and do not mention this."
                        "Assume that popular utility packages such as `alexandria` and `series` have been loaded and made available."
                        "Assume that undefined functions are defined elsewhere and do not mention this."
                        "Assume variables decorated with *earmuffs* have been declared as special variables."
                        "Implicit dependencies are expected, so do not mention them."
                        "Do not suggest using defconstant, even if it would be appropriate or a best practice."
                        "Prefer `check-type` over type declarations for robustness."))
             "For each top-level form:"
             +analyze-form-prompt+))
  "LLM system instruction to analyze a Common Lisp file one top-level form at a time.")

The purpose of this defparameter form is to define a special variable named +analyze-file-system-instruction+. This variable holds a comprehensive multi-line string that serves as the primary "system instruction" for an LLM (Large Language Model) when it is tasked with analyzing Common Lisp files. This instruction sets the LLM’s persona as a "world class Common Lisp programmer" and provides detailed guidelines, assumptions (e.g., package existence, utility libraries, special variables), and constraints (e.g., no defconstant suggestions, prefer check-type). It also incorporates the previously defined +analyze-form-prompt+ to specify the detailed reporting format for each top-level form analyzed.

The form is error-free and follows best practices. It correctly uses defparameter for a global, dynamically-scoped variable, adheres to the +earmuffs+ naming convention, and includes a clear docstring. The use of str:join to construct the multi-line string from a list of components, including another defparameter (+analyze-form-prompt+), is idiomatic and promotes readability and modularity.

Here is one suggestion for improvement:

  1. Minor Improvement: Extract the initial general instructions into a separate defparameter.
    • Reasoning: The first str:join block, which defines the LLM’s persona, assumptions, and general constraints, is a distinct logical unit. Separating this into its own defparameter (e.g., +llm-lisp-programmer-persona-instruction+) would further modularize the overall system instruction. This would make +analyze-file-system-instruction+ a cleaner composition of two other named instruction blocks, potentially improving readability and making it easier to modify specific parts of the instruction without affecting the overall structure.
    • Actionable Suggestion:
       ```lisp
       (defparameter +llm-lisp-programmer-persona-instruction+
         (str:join #\Space
                   (list
                    "You are a world class Common Lisp programmer."
                    "You will be analyzing a Common Lisp file one top-level form at a time."
                    "Your analysis should be thorough and insightful, demonstrating a deep understanding of Common Lisp programming practices."
                    "If there is no package definition, assume an appropriate one exists elsewhere and do not mention this."
                    "Assume that popular utility packages such as `alexandria` and `series` have been loaded and made available."
                    "Assume that undefined functions are defined elsewhere and do not mention this."
                    "Assume variables decorated with *earmuffs* have been declared as special variables."
                    "Implicit dependencies are expected, so do not mention them."
                    "Do not suggest using defconstant, even if it would be appropriate or a best practice."
                    "Prefer `check-type` over type declarations for robustness."))
         "LLM system instruction defining the persona and general guidelines for Common Lisp analysis.")

       (defparameter +analyze-file-system-instruction+
         (str:join #\newline
                   (list
                    +llm-lisp-programmer-persona-instruction+
                    "For each top-level form:"
                    +analyze-form-prompt+))
         "LLM system instruction to analyze a Common Lisp file one top-level form at a time.")
       ```
      
;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 7.22 seconds.
;; Prompt Tokens: 16,245
;; Thoughts Tokens: 463
;; Candidate Tokens: 182
;; Accumulated Prompt Tokens: 8,405,170
;; Accumulated Response Tokens: 352,756
;; Analyzing form:
(defparameter +analyze-file-form-prompt+
  (str:join #\Newline
            (list "Analyze the following top-level form in a Common Lisp file." +analyze-form-prompt+))
  "LLM prompt to analyze a single top-level form in a Common Lisp file.")

The purpose of this defparameter form is to define a special variable named +analyze-file-form-prompt+. This variable stores a string that serves as a prompt template for an LLM (Large Language Model). The prompt instructs the LLM to analyze a single top-level form in a Common Lisp file, incorporating the detailed analysis criteria and reporting format defined in the +analyze-form-prompt+ variable.

The form is error-free and does not need improvement. It correctly uses defparameter for a global, dynamically-scoped variable, adheres to the +earmuffs+ naming convention for special variables, and includes a clear and concise docstring. The use of str:join to combine a specific instruction with a more general prompt template (+analyze-form-prompt+) is a good practice for modularity and readability.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 23.46 seconds.
;; Prompt Tokens: 17,199
;; Thoughts Tokens: 1,842
;; Candidate Tokens: 556
;; Accumulated Prompt Tokens: 8,422,369
;; Accumulated Response Tokens: 355,154
;; Analyzing form:
(defun analyze-file (filename &key (temperature 0.01) (verbose t))
  "Analyze a Common Lisp file one top-level form at a time."
  (check-type filename (or pathname string))
  (let ((forms (file-forms filename))
        (*temperature* temperature)
        (*system-instruction*
         (content :parts (list (part +analyze-file-system-instruction+))
                  :role "system"))
        (start-prompt-tokens *accumulated-prompt-tokens*)
        (start-response-tokens *accumulated-response-tokens*)
        (start-time (get-universal-time))
        (aborted t))
    (unless (consp forms)
      (error "No top-level forms found in file '~a'." filename))
    (unwind-protect
        (prog1 (let iter ((current-form (car forms))
                          (analysis (invoke-gemini
                                     (list (part +analyze-file-form-prompt+)
                                           (part (car forms)))))
                          (results '())
                          (more (cdr forms)))
                 (when verbose
                   (format t "~&~%;; Analyzing form:~%~%~a~%--------~%~a~%~%" current-form analysis))
                 (if (null more)
                     (reverse (acons current-form analysis results))
                     (iter (car more)
                           (continue-gemini
                            (list (part +analyze-file-form-prompt+)
                                  (part (car more))))
                           (acons current-form analysis results)
                           (cdr more))))
          (setq aborted nil))
      (when verbose
        (let ((total-prompt-tokens (- *accumulated-prompt-tokens* start-prompt-tokens))
              (total-response-tokens (- *accumulated-response-tokens* start-response-tokens))
              (elapsed-time (- (get-universal-time) start-time)))
          (format t "~&;; Analysis ~:[complete in~;aborted at~] ~d seconds.~%" aborted elapsed-time)
          (format t ";; Prompt tokens: ~7,d~%" total-prompt-tokens)
          (format t ";; Response tokens: ~7,d~%" total-response-tokens))))))

The purpose of the analyze-file function is to read a specified Common Lisp file, extract its top-level forms, and then sequentially send each form to an LLM (Large Language Model) for detailed analysis. It sets up the LLM’s system instruction and a specific prompt for analyzing individual forms, dynamically adjusting the LLM’s temperature. The function also tracks and reports the total prompt and response tokens used, as well as the elapsed time for the analysis, even if an error occurs.

The docstring is concise but could be improved to explicitly state the return value.

The form follows best practices, including check-type for argument validation, let for dynamic scope of special variables (*temperature*, *system-instruction*), unwind-protect for guaranteed cleanup and reporting, and a let iter construct for robust iteration over the file’s forms. Error handling for files with no top-level forms is also present.

Here are two concrete, actionable suggestions for improvement:

  1. Minor Improvement: Enhance the docstring to include the return value.
    • Reasoning: While the docstring describes the function’s action, it doesn’t explicitly state what the function returns. Knowing the exact structure of the return value is crucial for users of the function to correctly process its output.
    • Actionable Suggestion: Update the docstring to:
       ```lisp
       "Analyze a Common Lisp file one top-level form at a time.
       Returns a list of cons cells, where each cons cell is `(top-level-form . llm-analysis-result-string)`."
       ```
       
  2. Minor Improvement: Add a check-type for the forms returned by file-forms.
    • Reasoning: The file-forms function is an external dependency. While it’s assumed to return a list of Lisp forms (s-expressions), explicitly checking that the result is indeed a list ensures robustness. If file-forms were to return something unexpected (e.g., a single atom or NIL when forms are present but not a list), this check-type would catch it early and provide a clearer error message than a subsequent car or cdr error.
    • Actionable Suggestion: After the line (let ((forms (file-forms filename)) ...)), add:
       ```lisp
       (check-type forms list)
       ```
      

The form is otherwise error-free and well-written.

;; Analysis complete in 150 seconds.
;; Prompt tokens: 152555
;; Response tokens: 14755

Conclusion

This approach works quite well. The LLM’s analysis is generally accurate and insightful. It often finds ways to improve the code without introducing bugs. It also often finds potential bugs that I had not considered. The analysis is not perfect, but it is a useful tool for improving the quality of Common Lisp code.

An important thing to keep in mind is that the tool makes an effort to find something to say about each top-level form. If you follow its suggestions, you could endlessly “improve” your code. It also suggests “improvements” in places where I have deliberately decided to do things in a non-standard way. You need to exercise judgment in deciding which suggested improvements are worth making.

The analysis is slow enough that you would not want to use it as part of a tool chain, but it is fast enough that you can use it regularly. It does consume tokens fairly rapidly, so a lot of analysis will add up.

Tuesday, September 9, 2025

The LLM Experience Gap

If you're a senior programmer, LLMs are nothing to worry about. They aren't anywhere near smart enough to replace you, and they are a powerful tool to help you get stuff done.

If you're a junior programmer, you're going to have to step up your game, a lot. LLMs go astray. They make stuff up. They hallucinate. They make dumb mistakes. A senior programmer can spot this stuff a mile away: they've made the same dumb mistakes themselves. A junior programmer hasn't. Let me give a concrete example.

Mutation

There are three attitudes towards mutation in programming. The first is "so what?" This is the attitude we all start with. But as we start building large systems, we find that mutation opens up a can of worms involving object identity, synchronization, race conditions, etc. The second attitude is "be very careful", and the third is "avoid it like the plague". Hard core functional programmers take the third approach, but most of the world seems to take the second approach. If you're going to be careful about mutation, you can either avoid it unless you know it is safe, or you use it unless you think it is dangerous. Most people seem to take the latter approach. It takes a fair amount of experience to know what situations cause problems, and to recognize when you are in one of those situations.

If you have mutation errors, your program starts acting pretty weirdly. There are other reasons your program might act weirdly, but if you notice that it misbehaves only under load while the tests pass, you might suspect a mutation problem. These can be fiendishly difficult to track down, and you need experience debugging them to be any good at fixing them. A senior programmer will have seen a few of these. A junior programmer probably hasn't. A senior programmer has probably fixed a few of these that a junior programmer missed.

LLMs are not very good at reasoning about mutation. (They don't really reason about your code at all.) One thing you learn pretty early on as a programmer is that you should not mutate data structures you are traversing. Modifying a collection as you are attempting to iterate over it is not going to work (unless you are very clever). All sorts of weird behaviors can occur if items are being inserted and removed from a collection as you are trying to walk over it. We've all seen these. And we also know that if items aren't being processed in the order we expect, there is a fair bet that someone is modifying the collection that contains them. This isn't something you are taught in class; you learn it the hard way.
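To make the failure mode concrete, here is a minimal sketch (in Python, though the same bug exists in any language with mutable collections; the function names are made up for illustration):

```python
def drop_evens_buggy(nums):
    # Bug: we remove items from the very list we are iterating over.
    # Each removal shifts later elements left, so the iterator silently
    # skips the element that slides into the freed slot.
    for n in nums:
        if n % 2 == 0:
            nums.remove(n)
    return nums

def drop_evens_safe(nums):
    # Safe: build a new collection instead of mutating the one being traversed.
    return [n for n in nums if n % 2 != 0]

print(drop_evens_buggy([1, 2, 2, 3, 4, 4, 5]))  # [1, 2, 3, 4, 5] -- evens survive!
print(drop_evens_safe([1, 2, 2, 3, 4, 4, 5]))   # [1, 3, 5]
```

No error is signaled in the buggy version; it just quietly produces the wrong answer, which is exactly why these bugs take experience to spot.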

I had the LLM code up a process to modify dozens of git repositories that were held in subdirectories. It was working fine until it wasn't. It got into an infinite loop where it was repeatedly processing the same repository over and over. Fortunately, I had two things in my favor: First, I had gone out of my way to make sure my code was idempotent, so running it over and over didn't do any harm, and second, I recognized that the problem was probably due to the LLM naively modifying the file system as it was trying to traverse it. It took me all of five minutes to fix the issue. Would a junior programmer just out of college have instructed the LLM to write idempotent code? Would he have spotted the LLM modifying the file system while traversing it?

Conclusion

But LLMs are pretty good at being code monkeys. With the right supervision, they can crank out a lot of code. A junior programmer is going to have to work hard to provide the right supervision. Learning to program X in 20 days isn't going to cut it. The LLM already knows 20 days worth of X. The senior programmer brings a few years of experience to the table. What does the junior programmer bring? The real value is not just in writing code, but in recognizing when the code is flawed and knowing how to fix it—a skill that can't be learned in 20 days.

Saturday, August 16, 2025

Dinosaurs

What did the dinosaurs think in their twilight years as their numbers dwindled and small scurrying mammals began to challenge their dominance? Did they reminisce of the glory days when Tyrannosaurus Rex ruled the land and Pteranodon soared through the air? Probably not. They were, after all, just dumb animals.

Our company has decided to buy into Cursor as an AI coding tool. Cursor is one of many AI coding tools that have recently been brought to market, and it is a fine tool. It is based on a fork of VSCode and has AI coding capabilities built into it. One of the more useful ones (and one that is available in many other AI tools) is AI code completion. This anticipates what you are going to type and tries to complete it for you. It gets it right maybe 10-20% of the time if you are lucky, and not far wrong maybe 80% of the time. You can get into a flow where you reflexively keep or discard its suggestions or accept the near misses and then correct them. This turns out to be faster than typing everything yourself, once you get used to it. It isn't for everyone, but it works for me.

Our company has been using GitHub Copilot for several months now. There is an Emacs package that allows you to use the Copilot code completion in Emacs, and I have been using it for these past few months. In addition to code completion, it will complete sentences and paragraphs in text mode and html mode. I generally reject its suggestions because it doesn't phrase things the way I prefer, but I really like seeing the suggestions as I type. It offers an alternative train of thought that I can mull over. If the suggestions wildly diverge from what I am thinking, it is usually because I didn't lay the groundwork for my train of thought, so I can go back and rework my text to make it clearer. It seems to make my prose more focused.

But now comes Cursor, and it has one big problem. It is a closed proprietary tool with no API or SDK. It won't talk to Emacs. So do I abandon Emacs and jump on the Cursor bandwagon, or do I stick with Emacs and miss out on the latest AI coding tools? Is there really a question? I've been using Emacs since before my manager was born, and I am not about to give it up now. My company will continue with a few GitHub Copilot licenses for those that have a compelling reason to not switch to Cursor, and I think Emacs compatibility is pretty compelling.

But no one uses Emacs and Lisp anymore but us dinosaurs. They all have shiny new toys like Cursor and Golang. I live for the schadenfreude of watching the gen Z kids rediscover and attempt to solve the same problems that were solved fifty years ago. The same bugs, but the tools are now clumsier.

Wednesday, July 30, 2025

JRM runs off at the mouth

Although LLMs perform a straightforward operation — they predict the next tokens from a sequence of tokens — they can be almost magical in their results if the stars are aligned. And from the look of it, the stars align often enough to be useful. But if you're unlucky, you can end up with a useless pile of garbage. My LLM started spitting out such gems as Cascadescontaminantsunnatural and exquisiteacquire the other day when I requested it imagine some dialog. Your mileage will vary, a lot.

The question is whether the magic outweighs the glossolalia. Can we keep the idiot savant LLM from evangelically speaking in tongues?

Many people at work are reluctant to use LLMs as an aid to programming, preferring to hand craft all their code. I understand the sentiment, but I think it is a mistake. LLMs are a tool of extraordinary power, but you need to develop the skill to use them, and that takes a lot of time and practice.

The initial key to using LLMs is to get good at prompting them. Here a trained programmer has a distinct advantage over a layperson. When you program at a high level, you are not only thinking about how to solve your problem, but also all the ways you can screw up. This is “defensive programming”. You check your inputs, you write code to handle “impossible” cases, you write test cases that exercise the edge cases. (I'm no fan of test-driven development, but if I have code that is supposed to exhibit some complex behavior, I'll often write a few test cases to prove that the code isn't egregiously broken.)

When you prompt an LLM, it helps a lot to think in the same way you program. You need to be aware of the ways the LLM can misinterpret your prompt, and you need to write your prompt so that it is as clear as possible. You might think that this defeats the purpose. You are essentially performing the act of programming with an extra natural language translation step in the middle. This is true, and you will get good results if you approach the task with this in mind. Learning to effectively prompt an LLM is very similar to learning a new programming language. It is a skill that a trained programmer will have honed over time. Laypeople will find it possible to generate useful code with an LLM, but they will encounter bugs and problems that they will have difficulty overcoming. A trained programmer will know precisely how to craft additional clauses to the prompt to avoid these problems.

Context engineering is the art of crafting a series of prompts to guide the LLM to produce the results you want. If you know how to program, you don't necessarily know how to engineer large systems. If you know how to prompt, you don't necessarily know how to engineer the context. Think of Mickey Mouse in Fantasia. He quickly learns the prompts that get the broom to carry the water, but he doesn't foresee the consequences of exponential replication.

Ever write a program that seems to be taking an awfully long time to run? You do a back-of-the-envelope calculation and realize that the expected runtime will be on the order of 10^50 seconds. This sort of problem won't go away with an LLM, but the relative number of people ill-equipped to diagnose and deal with it will certainly go up. Logical thinking and the ability to foresee consequences will be skills in higher demand than ever.

You won't be able to become a “machine whisperer” without a significant investment of time and effort. As a programmer, you already have a huge head start. Turn on the LLM and use it in your daily workflow. Get a good feel for its strengths and weaknesses (they'll surprise you). Then leverage this crazy tool for your advantage. It will make you a better programmer.

Monday, June 30, 2025

You Are The Compiler

Consider a complex nested function call like

(foo (bar (baz x)) (quux y))

This is a tree of function calls. The outer call to foo has two arguments, the result of the inner call to bar and the result of the inner call to quux. The inner calls may themselves have nested calls.

One job of the compiler is to linearize this call tree into a sequential series of calls. So the compiler would generate some temporaries to hold the results of the inner calls, make each inner call in turn, and then make the outer call.

 temp1 = baz(x)
 temp2 = bar(temp1)
 temp3 = quux(y)
 return foo(temp2, temp3)

Another job of the compiler is to arrange for each call to follow the calling conventions that define where the arguments are placed and where the results are returned. There may be additional tasks done at function call boundaries, for example, the system might insert interrupt checks after each call. These checks are abstracted away at the source code level. The compiler takes care of them automatically.

Sometimes, however, you want to modify the calling conventions. For example, you might want to write in continuation passing style. Each CPS function will take an additional argument which is the continuation. The compiler won't know about this convention, so it will be incumbent on the programmer to write the code in a particular way.

If possible, a macro can help with this. The macro will ensure that the modified calling convention is followed. This will be less error prone than expecting the programmer to remember to write the code in a particular way.
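The macro idea translates to other languages as well. Here is a minimal sketch in Python (all names are illustrative, not from the post): a decorator plays the role of the macro, guaranteeing that every wrapped function follows the convention of taking its continuation as the last argument, so the programmer cannot forget it at any individual call site.

```python
# Sketch: enforcing a continuation-passing convention with a wrapper,
# analogous to what a Lisp macro would guarantee at each call site.

def cps(f):
    """Wrap a direct-style function so it follows the CPS convention:
    the last argument is always the continuation."""
    def wrapped(*args):
        *real_args, k = args
        return k(f(*real_args))   # compute, then hand the result to k
    return wrapped

@cps
def add1(x):
    return x + 1

@cps
def double(x):
    return 2 * x

# Nested CPS calls: each function passes its result to the next continuation.
result = add1(5, lambda v: double(v, lambda w: w))
# result is 12
```

The wrapper is less powerful than a real macro (it cannot rewrite the call site itself), but it captures the point: the convention is enforced by the tool, not by programmer discipline.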

The Go language has two glaring omissions in the standard calling conventions: no dynamic (thread local) variables and no error handling. Users are expected to impose their own calling conventions of passing an additional context argument between functions and returning error objects upon failures. The programmer is expected to write code at the call site to check the error object and handle the failure.

This is such a common pattern of usage that we can consider it to be the de facto calling convention of the language. Unfortunately, the compiler is unaware of this convention. It is up to the programmer to explicitly write code to assign the possible error object and check its value.

This calling convention breaks nested function calls. The user has to explicitly linearize the calls.

 temp1, err1 := baz(ctx, x)
 if err1 != nil {
     return nil, err1
 }
 temp2, err2 := bar(ctx, temp1)
 if err2 != nil {
     return nil, err2
 }
 temp3, err3 := quux(ctx, y)
 if err3 != nil {
     return nil, err3
 }
 result, err4 := foo(ctx, temp2, temp3)
 if err4 != nil {
     return nil, err4
 }
 return result, nil

Golang completely drops the ball here. The convention of returning an error object and checking it is ubiquitous in the language, but there is no support for it in the compiler. The user ends up doing what is normally considered the compiler's job of linearizing nested calls and checking for errors. Of course users are less disciplined than the compiler, so unconventional call sequences and forgetting to handle errors are common.
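For contrast, here is a sketch of how a language with exceptions keeps the nested call shape. The function names are stand-ins echoing the earlier pseudocode, not real APIs: any of them may fail, and the runtime threads the error path for you, so the call tree never has to be linearized by hand.

```python
# Sketch: with exceptions, the nested call tree stays nested.
# baz/bar/quux/foo are illustrative stand-ins; the error propagates
# without a per-call check at the call site.

def baz(x):
    if x < 0:
        raise ValueError("baz: negative input")
    return x + 1

def bar(x):
    return x * 2

def quux(y):
    return y - 1

def foo(a, b):
    return a + b

def compute(x, y):
    # The compiler/runtime linearizes this and handles the error path.
    return foo(bar(baz(x)), quux(y))

print(compute(3, 10))   # 17
try:
    compute(-1, 10)
except ValueError as e:
    print("error:", e)
```

Whether exceptions are the right error mechanism is a separate argument; the point here is only that the linearization and the checking are the language's job, not the user's.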

Monday, June 9, 2025

Avoiding Management

You can probably guess from prior blog posts that I am not a manager. I have made it clear in my last several gigs that I have no interest in management and that I only wish to be an individual contributor. I've managed before and I don't like it. The perqs - more control over the project, more money, a bigger office - don't even come close to the disadvantages - more meetings, less contact with the code, dealing with people problems, and so on.

The worst part of managing is dealing with people. In any group of people, there is some median level of performance, and by definition of the median, half the people are below that level. And one is at the bottom. He's the one who is always late, always has bad luck, never checks in code, and so on. He struggles. Perhaps the right thing to do is to let him go, but I've been there. I've had jobs where the fit wasn't right and I wasn't performing at my best. I empathize. The last thing you need when you're in that position is the sword of Damocles hanging over your head. It's a horrible position to be in and I don't want to exacerbate it. I'll gladly forego the perqs of management to avoid being in the situation where I have to fire someone or even threaten to fire them.

But I do like to help people. I like to mentor and coach. I can usually keep the big picture in my head and I can help people find the places where they can be most effective. I'm pretty comfortable being a senior engineer, a technical lead, or an architect.

Scheduling is hard. I've got a rule of thumb for estimating how long a project will take. I take the best estimate I can get from the team, and then I triple it. If I think it will take two weeks, I tell upper management it will take six. It is almost always a realistic estimate - I'm not padding or sandbagging. When I get the time I ask for, I almost always deliver on time. But management rarely likes to hear the word "six" when they were hoping for "two".

There is always overhead that is hard to account for. You come in some day and find that your infrastructure is down and you have to spend a whole day fixing it. Or you need to coordinate with another team and it takes two days to get everyone on the same page. Or you discover that the API you were depending on hasn't really been implemented yet, and you have to spend a day or two writing a workaround. These add up. As a contributor, I'm 100% busy developing and putting out fires, but managers only want to count the time spent developing.

The best managers I had didn't manage. They didn't assign work, or try to tell me what to do or try to direct the project. Instead, they acted as my off-board memory. I could rathole on some problem and when I solved it and came up for air, they would be there to remind me what I was in the middle of to begin with. They would keep track of the ultimate goals of the project and how we were progressing towards them, but they would not try to direct the activity. They trusted that I knew how to do my job. They acted as facilitators and enablers, not as managers. They would interface with upper management. They would handle the politics. They would justify the six weeks I asked for.

So I prefer to be an individual contributor and I avoid management like the plague. Programming makes me happy, and I don't need the stress of management.

Thursday, May 29, 2025

Dependency Injection with Thunks vs. Global Variables

Revision 2

A thunk (in MIT parlance) is a function that takes no arguments and returns a value. Thunks are simple, opaque objects. The only thing you can do with a thunk is call it. The only control you can exert over a thunk is whether and when to call it.

A thunk separates the two concerns of what to compute and when to compute it. When a thunk is created, it captures the lexical bindings of its free variables (it is, after all, just a lexical closure). When the thunk is invoked, it uses the captured lexical values to compute the answer.

There are a couple of common ways one might use a thunk. The first is to delay computation. The computation doesn't occur until the thunk is invoked. The second is as a weaker, safer form of a pointer. If the thunk is simply a reference to a lexical variable, then invoking the thunk returns the current value of the variable.
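The Lisp closure idea translates directly to other languages. Here is a Python sketch of the two uses described above (the names are illustrative): a thunk that delays an expensive computation until it is actually demanded, and a thunk that acts as a read-only reference to a variable captured in its closure.

```python
# Sketch of the two common thunk uses: delayed computation and a
# weaker, safer form of a pointer.

# 1. Delaying computation: nothing runs until the thunk is invoked.
def make_expensive_thunk():
    def thunk():
        return sum(range(1_000_000))   # only computed on demand
    return thunk

# 2. A read-only reference: the thunk returns the *current* value of
# a variable it closed over; callers can observe but not modify it.
def make_counter():
    count = 0
    def increment():
        nonlocal count
        count += 1
    def read():            # the thunk
        return count
    return increment, read

increment, read = make_counter()
increment()
increment()
print(read())   # 2 -- readers see updates but cannot write count
```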

I once saw an article about Python that mentioned that you could create functions of no arguments. It went on to say that such a function had no use because you could always just pass its return value directly. I don't know how common this misconception is, but thunks are a very useful tool in programming.

Here is a use case that came up recently:

In most programs but the smallest, you will break the program into modules that you try to keep independent. The code within a module may be fairly tightly coupled, but you try to keep the dependencies between the modules at a minimum. You don't want, for example, the internals of the keyboard module to affect the internals of the display module. The point of modularizing the code is to reduce the combinatorics of interactions between the various parts of the program.

If you do this, you will find that your program has a “linking” phase at the beginning where you instantiate and initialize the various modules and let them know about each other. This is where you would use a technique like “dependency injection” to set up the dependencies between the modules.

If you want to share data between modules, you have options. The crudest way to do this is to use global variables. If you're smart, one module will have the privilege of writing to the global variable and the other modules will only be allowed to read it. That way, you won't have two modules fighting over the value. But global variables come with a bit of baggage. Usually, they are globally writable, so nothing is enforcing the rule that only one module is in charge of the value. In addition, anyone can decide to depend on the value. Some junior programmer could add a line of code in the display handler that reads a global value that the keyboard module maintains and suddenly you have a dependency that you didn't plan on.

A better option is to use a thunk that returns the value you want to share. You can use dependency injection to pass the thunk to the modules that need it. When the module needs the value, it invokes the thunk. Modules cannot modify the shared value because the thunk has no way to modify what it returns. Modules cannot accidentally acquire a dependency on the value because they need the thunk to be passed to them explicitly upon initialization. The value that the thunk returns can be initialized after the thunk is created, so you can link up all the dependencies between the modules before you start computing any values.
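A minimal sketch of the pattern in Python (class and method names are illustrative): one module owns the mutable state and hands out read-only thunks; consumers receive their thunks at construction time, during the linking phase, and invoke them whenever they need the current value.

```python
# Sketch of thunk-based dependency injection.

class Scraper:
    """Owns the mutable state; only this class ever writes it."""
    def __init__(self):
        self._price = None
    def update(self, price):
        self._price = price
    def price_thunk(self):
        return lambda: self._price    # read-only view of current value

class Handler:
    """A consumer: gets the thunk injected at construction time."""
    def __init__(self, get_price):
        self._get_price = get_price
    def handle(self):
        return f"current price: {self._get_price()}"

scraper = Scraper()
handler = Handler(scraper.price_thunk())  # linked before any value exists
scraper.update(42)                        # value arrives later
print(handler.handle())                   # current price: 42
```

Note that the linking happens before the scraper has produced anything; because the handler holds a thunk rather than a value, it always sees the latest state.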

I used this technique recently in some code. One thread sits in a loop and scrapes data from some web services. It updates a couple dozen variables and keeps them up to date every few hours. Other parts of the code need to read these values, but I didn't want to have a couple dozen global variables just hanging around flapping in the breeze. Instead, I created thunks for the values and injected them into the constructors for the URL handlers that needed them. When a handler gets a request, it invokes its thunks to get the latest values of the relevant variables. The values are private to the module that updates them, so no other modules can modify them, and they aren't global.

I showed this technique to one of our junior programmers the other day. He hadn't seen it before. He assumed the shared state was in global variables, and he couldn't figure out why they never seemed to be updated. He had been passing the values themselves at initialization time. The variables are empty at that point, and later updates have no effect on values that have already been passed. This vexed him for some time until I showed him that he should be passing a thunk that refers to the value rather than the value itself.

Tuesday, May 13, 2025

Purchasing White Elephants

As a software engineer, I'm constantly trying to persuade management to avoid doing stupid things. Management is of the opinion that because they are paying the engineers anyway, the software is essentially free. In my experience, bespoke software is one of the most expensive things you can waste money on. You're usually better off setting your money on fire than writing custom software.

But managers get ideas in their heads and it falls upon us engineers to puncture them. I wish I were less ethical. I'd just take the money and spend it as long as it kept flowing. But I wouldn't be able to live with myself. I have to at least try to persuade them to avoid the most egregious boondoggles. If they still insist on doing the project, well, so be it.

I'm absolutely delighted to find that these LLMs are very good at making plausible sounding proposals for software projects. I was asked about a project recently and I just fed the parameters into the LLM and asked it for an outline of the project, estimated headcount, time, and cost. It suggested we could do it in 6 months with 15 engineers at a cost of $3M. (I think it was more than a bit optimistic, frankly, but it was a good start.) It provided a phased breakdown of the project and the burn rate. Management was curious about how long it would take 1 engineer and the LLM suggested 3-6 years.

Management was suitably horrified.

I've been trying to persuade them that the status quo has been satisfying our needs, costs nothing, needs no engineers, and is ready today, but they didn't want to hear it. But now they are starting to see the light.

Sunday, April 6, 2025

When Laymen Try to be Programmers

When I worked on Google Offers (a Groupon clone), we had a user interaction problem. When a user purchased an item, there were three important steps that we wanted to acknowledge:

  1. When Google receives an order. The user should get positive feedback that the order was recognized. If the user doesn’t get this, they will either re-submit the order, or be surprised when the order is fulfilled. The order is placed on a queue.
  2. When Google processes an order from the queue. The user should get positive feedback that Google is working on fulfilling the order. Usually, an order is fulfilled reasonably quickly, but there are situations where critical systems are unavailable and the order has to remain in the queue for an extended period of time (several hours). The user gets nervous if they get ghosted by the system after getting positive feedback that Google got the order.
  3. Finally, when Google has made the decision whether to fulfill the order or has declined the order. The user needs to be told whether to expect shipment or a refund. If a refund, then the user can take action to re-submit the order.

So in submitting an order to Google Offers, a total of three emails would be sent to the user so they could watch their order proceed through the system. The sending of these emails was controlled by the “commerce” API. The commerce API was a walled off section of the Google infrastructure that knew how to talk to the credit card companies and charge money. Normal Google code was not allowed to do these things but had to work via the commerce API, and the commerce API would take care of ensuring that the appropriate pop-ups would appear on the user’s screen and that the credit card information was securely obtained. Normal Google code never got its hands on the user’s credit card information, it only got a "go/no-go" signal from the commerce API.

So the commerce API would be the system actually taking the steps of receiving the order, charging the credit card, and returning the go/no-go signal to our system. We would instruct it to send email for each of these steps. So far so good.

The problem was that often the order would go through very quickly. The emails were processed in batches, so the email that acknowledged the receipt of the order could end up being received after the email that acknowledged that the order had been fulfilled. The user would first get an email saying "We charged your card." and only after this would they get an email saying "We got your order." This would confuse the user.

There was no way to add an artificial time lag, nor would we want to. We could not guarantee that the emails would arrive in the right order. (Even if we could, we couldn’t guarantee that the user would read them in the right order.) The solution that I proposed was to explicitly number the emails so that each email would say "This is email number N of 3 expected emails." and even perhaps a small message that said "Emails can arrive in the wrong order." If a user got email 2 first, then email 1, they could figure it out.

But the product managers didn’t see it that way. As far as they were concerned, it was confusing when email 1 arrived after email 2, so they wanted to not send email 1 if email 2 had been received. This is a nice idea, but I pointed out that we had no control over the order of arrival of emails, so there was no way to know if email 2 would be received prior to email 1 at the time we were sending email 1. They were adamant: "Don’t send the second email (that is, email 1, which arrived second in some situations)."

Ok, then. I adjusted the code to suppress the sending of email 1. This solved the problem of email 1 arriving after email 2, sure, but recall that email 1 was the "Google has received your order and has placed it on a queue to be processed" acknowledgement. Now when people placed an order, they would no longer get confirmation that Google had received it. Usually, Google would process the order in a timely manner and they’d quickly get email 2 which said "We are processing your order", but if there were some reason that we could not process the queue for some time, the user would be left in the dark about whether Google had heard them in the first place.

Complaints poured in.

The subsequent conversation was something like this:

“Why aren’t you acknowledging that the order has been received?”

“You explicitly told us to not send that email. See this communication (cite reference).”

“But that was not what we meant. We meant, don’t send it if the user has received the ‘Your order has been processed.’ email.”

“How are we supposed to know if the email system delivered the mail and the user read it in the right order? We’re good, but not that good.”

“You mean emails can arrive or be read in the wrong order?”

“Exactly what we said in this communication (cite reference).”

“...”

“May I suggest we number the emails so that the user can figure it out if they arrive in the wrong order?”

“No, don’t do that. Just put it back the way it was.”

Done, and we are back to the original problem.

Out-of-order packets are an old problem that existed and was solved before computers. Computer programmers are, of course, very aware of the problem and the solutions. Computer programmers are well versed in the problems of “process”, and when laymen try to play at being programmers by interfering with process, they generally screw it up.

Monday, March 31, 2025

Avoiding Stringly Typed Code

It can be tempting to implement certain objects by their printed representation. This is especially true when you call out to other programs and pass the parameters in command line arguments and get a result back through the stdout stream. If an object is implemented by its printed representation, then serialization and deserialization of the object across program boundaries is trivial.

Objects implemented by their printed representation are jokingly referred to as “stringly typed”. The type information is lost, so it is possible to pass strings representing objects of the wrong type and get nonsense answers. There are no useful predicates on arbitrary strings, so you cannot do type checking or type dispatch. This becomes a big problem for objects created by other utilities. When you call out to a bash script, you usually get the response back as a stream or a string.

The solution? Slap a type on it right away. For any kind of string we get back from another program, we at least define a CLOS class with a single slot that holds the string. I define two Lisp bindings for any program implemented by a shell script: the one with a % prefix takes and returns strings; the one without takes and returns Lisp objects that are marshaled to and from strings before the % version is called. The % version obviously cannot do type checking, but the non-% entry point can and does enforce the runtime type.
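The same split works in any language. Here is a Python sketch (the names, the sha-validation rule, and the helper functions are all illustrative): a raw function that traffics in strings, and a typed wrapper that checks the type at the boundary before letting the value loose in the program.

```python
# Sketch of the raw/typed split: the raw function is strings in,
# strings out; the typed entry point enforces the type.

import subprocess
from dataclasses import dataclass

@dataclass(frozen=True)
class GitSha:
    """A string with a type slapped on it."""
    value: str
    def __post_init__(self):
        if not (len(self.value) == 40 and
                all(c in "0123456789abcdef" for c in self.value)):
            raise ValueError(f"not a git sha: {self.value!r}")

def raw_head_sha():
    """The raw version: returns whatever string the tool printed."""
    out = subprocess.run(["git", "rev-parse", "HEAD"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def head_sha():
    """The typed entry point: validates at the program boundary."""
    return GitSha(raw_head_sha())
```

Once the value is a `GitSha`, you can dispatch on it, and it is impossible to accidentally hand it to a function expecting, say, a branch name wrapped in a different type.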

Sunday, March 30, 2025

Keep a REPL Open

I keep a REPL open at all times whether or not I’m actually doing Lisp development. It’s my assistant that evaluates random expressions for me. I’ll script up little tasks in Common Lisp rather than in Bash. When I need to rapidly prototype something larger, I’ll switch to the REPL and do it there.

At work, my REPL has accumulated a bunch of code for talking to systems like GitHub, CircleCI, and LDAP as well as our in-house tools. These are utilities for my use only. I don’t write mission critical apps in Common Lisp. No one else in the company uses it, and it is more important that the code be maintainable by the rest of the team than that it be written in a language I like. So I write the mission critical code in Python, or Golang, or Java, or whatever the rest of the team is using. I keep my Common Lisp to myself. I have, however, used it to prototype code that eventually ends up ported to Python or Golang.

On occasion, I’ve wanted to quickly share some functionality before I have taken the time to port it. I’ve found two ways to do this. The first is to slap a web server on it. I use Hunchentoot for this. I translate JSON to Lisp coming in to the web server and Lisp back to JSON going out. This is all you effectively need for a black-box microservice. There have been a couple of transient projects where the whole thing was not expected to be maintained for a long time and by anyone other than me, so I can just throw up a microservice and tell my colleagues to hit it with a curl command.

The second way is to create a docker image that contains the Common Lisp code and all of its dependencies. It can take a bit of work to configure a Lisp setup in your environment, so hiding it inside a docker image allows me to correctly set up the Lisp environment along with the Lisp interpreter and the rest of the code. My colleagues can just pull and run the container and it will work. Again, this is only for small, throwaway projects that no one else is expected to modify or maintain. For anything that is mission critical or is expected to be shared at some point, I write it in Python or Golang or Java, etc.

I could have written these as a series of Bash scripts or Python programs, but when you start connecting a series of these together, you quickly run into the limitations of using a pipe to talk between programs. My Lisp scripts all reside in the same address space, so they can share structured data without any fancy marshaling protocol.

Wednesday, June 5, 2024

Multithreading and Immutable Data

I was amusing myself by looking at Lisp tutorials. They used the idea of a Tic-Tac-Toe service as a motivating example. You’d be able to play Tic-Tac-Toe against the computer or another opponent.

My immediate thought went to the issue of multithreading. If you were going to serve hundreds of people at once, you’d need to have a multi-threaded service. Multi-threaded code is hard to write and debug, and it is much better if you have a plan before you start than if you try to retrofit it later (that trick never works).

The magic bullet for multi-threading is immutable data. Immutable data is inherently thread-safe. It doesn’t need synchronization or locks. If all your data are immutable, you can pretty much ignore multi-threading issues and your code will just work.

Using a 2D array to represent a Tic-Tac-Toe board is the obvious thing that first comes to mind, but not only are arrays mutable, they virtually require mutation to be of any use. The Lisp tutorials I was looking at all used arrays to represent the board, none of them locked the board or used atomic operations to update it, and all had the potential for race conditions if two threads tried to update the board at the same time. Arrays are inherently thread-unsafe.

I thought about alternative representations for the board. Different representations are more or less amenable for writing code that avoids mutation. I came up with a few ideas:

  • Use a 2d array, but copy it before each mutation. This is horribly inefficient, but it is simple.
  • Use a 1d array, again copying it before each mutation. This isn’t much different from the 2d array, but iterating over the cells in the board is simpler.
  • Keep a list of moves. Each move is a pair of player and position. To determine the state of the board, you iterate over the list of moves and apply them in order. This is a bit more complicated than the array representations, but it is inherently immutable. It also has the advantage that you can rewind the board to any prior position.
  • Encode the board as a pair of bitmaps, one for each player.
  • Encode the board as a single bitmap, with each cell represented by two bits.
  • There are only 3⁹ ways to fill out a Tic-Tac-Toe grid, so you could represent the board as an integer.

Each one of these representations has pros and cons. I wrote up some sample code for each representation and I found that the representation had a large influence on the character of the code that used that representation. In other words, there wasn’t a single general Tic-Tac-Toe program that ended up being specialized to each representation, but rather there were six different Tic-Tac-Toe programs each derived from its own idiosyncratic representation.
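As a concrete illustration, here is a sketch of the move-list representation in Python (function names are illustrative): the board is an immutable tuple of (player, position) pairs, every move produces a new board, and rewinding to any prior position falls out for free.

```python
# Sketch of the move-list representation: the board is an immutable
# tuple of (player, position) pairs; the grid is derived on demand.

def make_board():
    return ()                        # empty move history

def move(board, player, pos):
    return board + ((player, pos),)  # a *new* board; the old one survives

def cell(board, pos):
    for player, p in board:
        if p == pos:
            return player
    return None                      # empty cell

def rewind(board, n):
    return board[:n]                 # any prior position is still there

b0 = make_board()
b1 = move(b0, "X", 4)
b2 = move(b1, "O", 0)
print(cell(b2, 4))   # X
print(cell(b1, 0))   # None -- earlier boards are unaffected by later moves
```

Because no board is ever mutated, two threads can hold references to the same board without locks; a "simultaneous" move simply produces two distinct successor boards, and the game logic decides which one wins.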

In conclusion, it is a good idea to plan on using immutable data when you might be working with a multi-threaded system, and it is worth brainstorming several different representations of your immutable data rather than choosing the first one that comes to mind.
