Abstract Heresies: GitHub


Wednesday, September 3, 2025

Google API

To experiment with LLMs I chose to use Gemini. The cost is modest and I expect Google to regularly improve the model so that it remains competitive. I wrote a Gemini client in Common Lisp. Some people have already forked the repo, but I've refactored things recently so I thought I'd give a quick rundown of the code.

google repository

https://github.com/jrm-code-project/google

As I add clients for more Google APIs, I intend to put them in this repository. It is pretty empty so far, but this is where they will go. Contributions welcome.

Basic Requirements

There are some basic libraries you can load with Zach Beane's Quicklisp: alexandria for some selected Common Lisp extensions, cl-json for JSON parsing, dexador for HTTP requests, and str for string manipulation. I also use series (available from SourceForge) and my own fold, function, named-let, and jsonx libraries (available from GitHub).

fold gives you fold-left for list reduction. function gives you compose. jsonx modifies cl-json to remove the ambiguity between JSON objects and nested arrays. named-let gives you the named let syntax from Scheme for local iteration and recursion.

Google API

The google repository contains the beginnings of a Common Lisp client to Google services. It currently only supports an interface to Blogger that can return your blog content as a series of posts and an interface to Google's Custom Search Engine API. You create a Google services account (there is a free tier, and a paid tier if you want more). You create a project (you'll want at least a default project), and within that project you enable the Google services that you want that project to have access to.

API Keys and Config Files

For each service you can get an API key. In your XDG config directory (usually "~/.config/") you create a googleapis directory and subdirectories for each project you have created. The ~/.config/googleapis/default-project file contains the name of your default project. In the examples below, it contains the string my-project.

Within each project directory you create subdirectories for each service you want to use. In each service directory you create a file to hold the API key and possibly other files to hold IDs, options, and other config information. For example, for Blogger you create ~/.config/googleapis/my-project/Blogger/apikey to hold your Blogger API key and ~/.config/googleapis/my-project/Blogger/blog-id to hold your blog ID.

Here is a sample directory structure:

 ~/.config/googleapis/
 ├── my-project/
 │   ├── Blogger/
 │   │   ├── apikey
 │   │   └── blog-id
 │   ├── CustomSearchEngine/
 │   │   ├── apikey
 │   │   ├── hyperspec-id
 │   │   └── id
 │   └── Gemini/
 │       └── apikey
 └── default-project
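
As a rough illustration of how this layout can be read from Lisp, here is a minimal sketch using only standard Common Lisp. The function names (googleapis-config-directory, read-first-line, default-project, service-config-file, service-api-key) are hypothetical and are not necessarily the ones the google repository uses. The sketch also assumes the config directory is ~/.config/ rather than consulting XDG_CONFIG_HOME.

```lisp
;; Hypothetical helpers for illustration; not taken from the google repository.

(defun googleapis-config-directory ()
  "Root of the config tree. Assumes the XDG config directory is ~/.config/."
  (merge-pathnames
   (make-pathname :directory '(:relative ".config" "googleapis"))
   (user-homedir-pathname)))

(defun read-first-line (pathname)
  "Return the first line of the file at PATHNAME with surrounding whitespace removed."
  (with-open-file (stream pathname :direction :input)
    (string-trim '(#\Space #\Tab #\Return #\Newline) (read-line stream))))

(defun default-project (&optional (root (googleapis-config-directory)))
  "Read the project name stored in the default-project file."
  (read-first-line (merge-pathnames (make-pathname :name "default-project") root)))

(defun service-config-file (service name
                            &key (root (googleapis-config-directory))
                                 (project (default-project root)))
  "Pathname of the file NAME in the SERVICE subdirectory of PROJECT."
  (merge-pathnames
   (make-pathname :directory (list :relative project service) :name name)
   root))

(defun service-api-key (service &rest keys)
  "Read the apikey file for SERVICE. KEYS are passed on to SERVICE-CONFIG-FILE."
  (read-first-line (apply #'service-config-file service "apikey" keys)))
```

With the sample tree above, (service-api-key "Blogger") would read ~/.config/googleapis/my-project/Blogger/apikey, and (read-first-line (service-config-file "Blogger" "blog-id")) would read the blog ID.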

When you create the API keys for the various services, it is a good idea to restrict each API key to only the service it is for. That way, if an API key is compromised, the damage is limited to a single service.

Blogger

The Blogger API will eventually have more features, but it currently only allows you to get the posts from a blog as a series.

scan-blogger-posts blog-id
Returns a series of posts from the blog with the given ID. Each post comes back as a JSON object (hash-table). The :content field contains the HTML content of the post.

Custom Search Engine

The Custom Search Engine API allows you to search a set of web pages. You create a search engine on Google and pass its ID to custom-search. It is assumed that the default CustomSearchEngine ID (the id file) is a vanilla web search, and that the hyperspec ID (the hyperspec-id file) searches the Common Lisp Hyperspec.

In each search, be sure to replace the spaces in the query with plus signs (+) before searching.
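
A one-line helper can do the substitution. This is only a sketch; plusify-query is a made-up name, not a function from the google repository.

```lisp
;; Hypothetical helper: prepare a query string for the search functions.
(defun plusify-query (query)
  "Return a copy of QUERY with every space replaced by a plus sign."
  (substitute #\+ #\Space query))

(plusify-query "with-open-file direction")
;; => "with-open-file+direction"
```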

custom-search query &key custom-search-engine-id
Returns search results for query. Each search result comes back as a JSON object (hash-table).

web-search query
Search the web and return search results for query. Each search result comes back as a JSON object (hash-table).

hyperspec-search query
Search the Common Lisp Hyperspec and return search results for query. Each search result comes back as a JSON object (hash-table).

Posted by Joe Marshall at 12:00 AM 1 comment:


Labels: Common Lisp, GitHub, Google

Sunday, January 5, 2025

GitHub glitch bites hard (and update)

Update: Possible rogue process

GitHub reports that the call that removed the users was not the Copilot API but rather a call to the org membership API made by one of our bots.

We have a cron job that runs daily and keeps GitHub in sync with our internal databases. When GitHub and our internal databases disagree, the cron job makes API calls to reconcile the difference. It has the ability to remove users if it thinks they are no longer supposed to be members of the org.

It seems to have erroneously removed a large number of members. It was pure coincidence that I was editing Copilot licenses at around the same time.

The question now is why? My hypothesis is that a query to our internal database only produced a partial result. The number of people determined to be valid users was far fewer than it should have been, and the cron job acted (correctly) and removed the users that were not verified by the database query. But it is hard to say for sure. I’ll need to check the cron job logs to see if I can determine what went wrong. It is very unusual, though. I’ve been here for years and I’ve never seen the cron job glitch out before. This is my working hypothesis for the moment. Perhaps it was some other error that made it think that the membership was greatly reduced.
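
To make the hypothesis concrete, here is a toy sketch of that kind of reconciliation. It is not the cron job's actual code, which I haven't shown here. The function name and the logins are invented for illustration.

```lisp
;; Toy model of the reconciliation step; names and data are invented.
(defun members-to-remove (org-members valid-users)
  "Return the members of ORG-MEMBERS that do not appear in VALID-USERS."
  (set-difference org-members valid-users :test #'string=))

(defparameter *org-members* '("alice" "bob" "carol" "dave" "erin"))

;; A complete database result: only the one stale account is removed.
(members-to-remove *org-members* '("alice" "bob" "carol" "dave"))
;; => ("erin")

;; A truncated database result: the same logic removes almost everyone.
(length (members-to-remove *org-members* '("alice")))
;; => 4
```

The removal logic is doing exactly what it was told in both cases. The only difference is the input.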

I got bit hard by a GitHub bug last week.

Now GitHub has “organizations” which are owners of groups of repositories. GitHub carefully handles organization membership. You cannot directly join an organization, you must be invited by the organization. This gives the organization control over who can join the organization. But an organization also cannot directly add you as a member. It can invite you to join, but you must choose to accept the invitation. This gives you control over which organizations you are associated with. Membership in an organization is jointly controlled by the organization and the member. There is no way to bypass this.

This is a source of friction in the onboarding process at our company. We have a few repositories on GitHub that are owned by the company. When a new hire joins the company, we want to make them members of the organization. GitHub does not provide any way to automate this. Instead, we direct new hires to an internal web site that will authenticate and authorize them and then let them issue an invitation to join the organization. GitHub won’t give them access until they accept the invitation. This is a manual process that is error prone and places the burden of doing it correctly on the new hire. We often have to intervene and walk them through the process.

Keep this in mind.

Our company provides GitHub Copilot to our developers. Some developers like it, but many of our developers choose not to use it. While Copilot licenses are cheap, there is no point in paying for a license that is not used. The UI for GitHub Copilot will display the last time a person used Copilot. It is easy to see a small set of our users who have never logged on to Copilot. We decided to save a few bucks by revoking unused Copilot licenses. We reasoned that we could always turn it back on for them if they wanted to use it.

To test this out, I selected a few of the users who had never logged in to Copilot. I turned off the checkbox next to their names in the Copilot UI and clicked the save button. It appeared to work.

Within an hour I started getting complaints. People who claimed to be active Copilot users were getting messages that their Copilot access was revoked. It seems that the UI had listed several active users as “never logged in” and I had just revoked their access.

It got worse. I had only revoked a few licenses, but dozens of people had had their access revoked. It seems that GitHub had eagerly revoked the licenses of far more people than I had selected.

It got even worse. I have a list of everyone who should have access, so I know who to re-enable. But I cannot re-enable them. It seems that in addition to revoking their Copilot access, GitHub had taken the extra step of removing their membership in the organization. I cannot restore their membership because of the way GitHub handles organization membership, so until they visit our internal web site and re-issue the invitation to the organization, I cannot restore their Copilot access. This has been a monumental headache.

I’ve spent the week trying to explain to people why their Copilot access and organization membership were revoked, what steps they need to take to restore them, and why I cannot restore them myself.

It looks like I’m going to be spending a lot of time on this next week as well.

GitHub has an enterprise offering that allows you to automate account creation and organization membership. We've been considering this for a while. Unfortunately, you cannot mix legacy accounts with enterprise accounts, so we would have to atomically migrate the entire company and all the accounts to the enterprise offering. This would be a risky endeavor for only a little gain in convenience.

Posted by Joe Marshall at 9:38 AM 4 comments:


Labels: Copilot, GitHub

Friday, November 24, 2023

GitHub Co-pilot Review

I recently tried out GitHub Co-pilot. It is a system that uses generative AI to help you write code.

The tool interfaces to your IDE — I used VSCode — and acts as an autocomplete on steroids … or acid. Suggested comments and code appear as you move the cursor and you can often choose from a couple of different completions. The way to get it to write code was to simply document what you wanted it to write in a comment. (There is a chat interface where you can give it more directions, but I did not play with that.)

I decided to give it my standard interview question: write a simple TicTacToe class, include a method to detect a winner. The tool spit out a method that checked an array for three in a row horizontally, vertically, and along the two diagonals. Almost correct. While it would detect three ‘X’s or ‘O’s, it also would detect three nulls in a row and declare null the winner.

I went into the class definition and simply typed a comment character. It suggested an __init__ method. It decided on a board representation of a 1-dimensional array of 9 characters, ‘X’ or ‘O’ (or null), and a character that determined whose turn it was. Simply by moving the cursor down I was able to get it to suggest methods to return the board array, return the current turn, list the valid moves, and make a move. The suggested code was straightforward and didn’t have bugs.
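
To illustrate the flaw in the winner check, here is a small sketch in Common Lisp. It is not the Python that Co-pilot generated, which isn't reproduced here. It assumes the same representation (a 9-element vector holding X, O, or nothing), and all the names are made up.

```lisp
;; Illustrative only: a Lisp rendition of the idea, not Co-pilot's output.
(defparameter *winning-lines*
  '((0 1 2) (3 4 5) (6 7 8)    ; rows
    (0 3 6) (1 4 7) (2 5 8)    ; columns
    (0 4 8) (2 4 6))           ; diagonals
  "Index triples into a 9-element board vector.")

(defun three-equal-p (board line)
  "The flawed test: true for any three equal cells, even three empty ones."
  (destructuring-bind (a b c) line
    (and (eql (aref board a) (aref board b))
         (eql (aref board b) (aref board c)))))

(defun winner (board)
  "Return the mark (X or O) that fills a line, or NIL if no mark does."
  (loop for (a b c) in *winning-lines*
        for mark = (aref board a)
        when (and mark                        ; an empty cell cannot win
                  (eql mark (aref board b))
                  (eql mark (aref board c)))
          return mark))

(defparameter *empty-board* (make-array 9 :initial-element nil))

(three-equal-p *empty-board* '(0 1 2))   ; => T, three empties "match"
(winner *empty-board*)                   ; => NIL
(winner (vector #\X #\X #\X nil #\O #\O nil nil nil))   ; => #\X
```

The fix is the single extra test that the first cell of the line is actually occupied.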

I then decided to try it out on something more realistic. I have a linear fractional transform library I wrote in Common Lisp and I tried porting it to Python. Co-pilot made numerous suggestions as I was porting, with varying degrees of success. It was able to complete the equations for a 2x2 matrix multiply, but it got hopelessly confused on higher order matrices. For the print method of a linear fractional transform, it produced many lines of plausible looking code. Unfortunately, the code has to be better than “plausible looking” in order to run.

As a completion tool, Co-pilot muddled its way along. Occasionally, it would get a completion impressively right, but just as frequently, or more often, it would get the completion wrong, either grossly or subtly. It is the latter that made me nervous. Co-pilot would produce code that looked plausible, but it required a careful reading to determine if it was correct. It would be all too easy to be careless and accept buggy code.

The code Co-pilot produced was serviceable and pedestrian, but often not what I would have written. I consider myself a “mostly functional” programmer. I use mutation sparingly, and prefer to code by specifying mappings and transformations rather than sequential steps. Co-pilot, drawing from a large amount of code written by a variety of authors, seems to prefer to program sequentially and imperatively. This isn’t surprising, but it isn’t helpful, either.

Co-pilot is not going to put any programmers out of work. It simply isn’t anywhere near good enough. It doesn’t understand what you are attempting to accomplish with your program, it just pattern matches against other code. A fair amount of code is full of patterns and the pattern matching does a fair job. But exceptions are the norm, and Co-pilot won’t handle edge cases unless the edge case is extremely common.

I found myself accepting Co-pilot’s suggestions on occasion. Often I’d accept an obviously wrong suggestion because it was close enough that editing it seemed like less work than typing the code myself. But I always had to guard against code that seemed plausible but was not correct. I found that I spent a lot of time reading and considering the code suggestions. Any time saved by generating these suggestions was used up in vetting them.

One danger of Co-pilot is using it as a coding standard. It produces “lowest common denominator” code — code that an undergraduate that hadn’t completed the course might produce. For those of us that think the current standard of coding is woefully inadequate, Co-pilot just reinforces this style of coding.

Co-pilot is kind of fun to use, but I don’t think it helps me be more productive. It is a bit quicker than looking things up on stackoverflow, but its results have less context. You wouldn’t go to stackoverflow and just copy code blindly. Co-pilot isn’t quite that — it will at least rename the variables — but it produces code that is more likely buggy than not.

Posted by Joe Marshall at 7:50 AM 3 comments:


Labels: Co-pilot, GitHub, software development

Thursday, June 8, 2023

Lisp Essential, But Not Required

Here’s a weird little success story involving Lisp. The code doesn’t rely on anything specific to Lisp. It could be rewritten in any language. Yet it wouldn’t have been written in the first place if it weren’t for Lisp.

I like to keep a Lisp REPL open in my Emacs for tinkering around with programming ideas. It only takes a moment to hook up a REST API or scrape some subprocess output, so I have a library of primitives that can talk to our internal build tools and other auxiliary tools such as GitHub or CircleCI. This comes in handy for random ad hoc scripting.

I found out that CircleCI is written in Clojure, and if you connect to your local CircleCI server, you can start a REPL and run queries on the internal CircleCI database. Naturally, I hooked up my local REPL to the Clojure REPL so I could send expressions over to be evaluated. We had multiple CircleCI servers running, so I could use my local Lisp to coordinate activity between the several CircleCI REPLs.

Then a need arose to transfer projects from one CircleCI server to another. My library had all the core capabilities, so I soon had a script for transferring projects. But after transferring a project, we had to fix up the branch protection in GitHub. The GitHub primitives came in handy. Of course our internal systems had to be informed that the project moved, but I had scripting primitives for that system as well.

More requirements arose: package the tool into a docker image, deploy it as a microservice, launch it as a kubernetes batch job, etc. At each point, the existing body of code was 90% of the solution, so it only required small changes to the code to handle the new requirements. As of now, the CircleCI migration tool is deployed as a service used by dozens of our engineers.

Now Lisp isn’t directly necessary for this project. It could easily (for some definitions of easy) be rewritten in another language. But the initial idea of connecting to a Clojure REPL from another Lisp is an obvious thing to try out and only takes moments to code up. If I were coding in another language, I could connect to the REPL, but then I’d have to translate between my other language and Lisp. It’s not an obvious thing to try out and would take a long time to code up. So while this project could be written in another language, it never would have been. And Lisp’s flexibility meant that there was never a reason for a rewrite, even as the requirements were changing.

Posted by Joe Marshall at 12:42 PM 1 comment:


Labels: Common Lisp, GitHub

Saturday, July 16, 2022

Let's talk to GitHub

Let's teach Common Lisp to talk to GitHub.

We'll need an API token. I like to put these sorts of things in config files. This makes it easier to configure scripts that are deployed to containers. You simply make the config files available through a mount point when starting the container. That way, you can avoid baking credentials into the script.

(defun config-directory ()
  (merge-pathnames
   (make-pathname :directory '(:relative ".config" "github"))
   (user-homedir-pathname)))

(defun config-file (&rest keyargs)
  (merge-pathnames (apply #'make-pathname keyargs) (config-directory)))

(defun load-token (pathname)
  (with-open-file (stream pathname :direction :input)
    (str:trim (read-line stream))))

(defun github-api-token ()
  (load-token (config-file :name "api-token")))

We'll make a lot of use of miscellaneous, ad hoc CLOS objects. It is so common for these things to have names that it is worth its own mixin.

(defgeneric get-name (object))

(defclass named-object-mixin ()
  ((name :initarg :name
         :initform (require-initarg :name)
         :reader get-name
         :type string)))
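
require-initarg is not defined in this post. Here is a minimal sketch of what such a helper might look like, assuming all it has to do is signal an error. The initform is only evaluated when the caller did not supply the initarg, so reaching it means the required argument is missing.

```lisp
;; A guess at a suitable helper; the real definition is not shown in this post.
(defun require-initarg (initarg)
  "Signal an error naming the INITARG that the caller failed to supply."
  (error "Required initarg ~s was not supplied." initarg))
```

With this in place, (make-instance 'named-object-mixin) signals an error, while (make-instance 'named-object-mixin :name "foo") succeeds.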

And we'll define a default print-object method. Classes that use this mixin and don't provide their own print-object method will get this one by default.

(defmethod print-object ((obj named-object-mixin) stream)
  (print-unreadable-object (obj stream :identity t :type t)
    (format stream "~a" (slot-value obj 'name))))

We'll make an object to represent GitHub and put the API token in there.

(defclass github (named-object-mixin)
  ((api-token :initarg :api-token
              :initform (require-initarg :api-token)
              :reader get-api-token)))

(defparameter +github+ nil)

(defun github ()
  (unless (and (boundp '+github+)
               (symbol-value '+github+))
    (setf (symbol-value '+github+)
          (make-instance 'github
                         :name "GitHub"
                         :api-token (github-api-token))))
  (symbol-value '+github+))

To authenticate to GitHub, we need to pass the API token in the HTTP request headers.

(defun authorization-header (github)
  (cons "Authorization" (format nil "token ~a" (get-api-token github))))

So let's make a request:

* (dex:get "https://api.github.com/user"
           :headers (list (authorization-header (github))
                          '("Accept" . "application/vnd.github.v3+json")))
"{"login":"joseph-marshall69","id":60371090,"node_id":"MDQ6VXNlcjYwMzcxMDkw","avatar_url":"https://avatars.githubusercontent.com/u/60371090?v=4","gravatar_id":"","url":"https://api.github.com/users/jos...[sly-elided string of length 1535]"
200 (8 bits, #xC8, #o310, #b11001000)
#<HASH-TABLE :TEST EQUAL :COUNT 26 {1002A9B2A3}>
#<QURI.URI.HTTP:URI-HTTPS https://api.github.com/user>
#<CL+SSL::SSL-STREAM for #<FD-STREAM for "socket 172.26.126.123:33674, peer: 192.30.255.116:443" {1002A96333}>>

Success! But we got back the string representation of a JSON object. We'll instead request a stream as a return value and pass it to a JSON parser:

* (json:decode-json
   (dex:get "https://api.github.com/user"
            :headers (list (authorization-header (github))
                           '("Accept" . "application/vnd.github.v3+json"))
            :want-stream t))
((:LOGIN . "jrm-code-project") (:ID . 51824598)
 (:NODE--ID . "MDQ6VXNlcjUxODI0NTk4")
 (:AVATAR--URL . "https://avatars.githubusercontent.com/u/51824598?v=4")
 (:GRAVATAR--ID . "") (:URL . "https://api.github.com/users/jrm-code-project")
 (:HTML--URL . "https://github.com/jrm-code-project")
 (:FOLLOWERS--URL . "https://api.github.com/users/jrm-code-project/followers")
 (:FOLLOWING--URL
 . "https://api.github.com/users/jrm-code-project/following{/other_user}")
 (:GISTS--URL
 . "https://api.github.com/users/jrm-code-project/gists{/gist_id}")
 (:STARRED--URL
 . "https://api.github.com/users/jrm-code-project/starred{/owner}{/repo}")
 (:SUBSCRIPTIONS--URL
 . "https://api.github.com/users/jrm-code-project/subscriptions")
 (:ORGANIZATIONS--URL . "https://api.github.com/users/jrm-code-project/orgs")
 (:REPOS--URL . "https://api.github.com/users/jrm-code-project/repos")
 (:EVENTS--URL
 . "https://api.github.com/users/jrm-code-project/events{/privacy}")
 (:RECEIVED--EVENTS--URL
 . "https://api.github.com/users/jrm-code-project/received_events")
 (:TYPE . "User") (:SITE--ADMIN) (:NAME . "Joe Marshall") (:COMPANY)
 (:BLOG . "https://sites.google.com/site/evalapply/")
 (:LOCATION . "Seattle, WA") (:EMAIL) (:HIREABLE) (:BIO) (:TWITTER--USERNAME)
 (:PUBLIC--REPOS . 9) (:PUBLIC--GISTS . 0) (:FOLLOWERS . 21) (:FOLLOWING . 0)
 (:CREATED--AT . "2019-06-14T12:33:06Z")
 (:UPDATED--AT . "2022-03-15T15:18:03Z") (:PRIVATE--GISTS . 0)
 (:TOTAL--PRIVATE--REPOS . 0) (:OWNED--PRIVATE--REPOS . 0)
 (:DISK--USAGE . 44815) (:COLLABORATORS . 0) (:TWO--FACTOR--AUTHENTICATION)
 (:PLAN (:NAME . "free") (:SPACE . 976562499) (:COLLABORATORS . 0)
 (:PRIVATE--REPOS . 10000)))

JSON objects are mapped to alists. The keys look a little funny because of how the JSON parser translates JSON keys that contain underscores: node_id becomes :NODE--ID, with a double hyphen.

An alist is sort of a poor man's object. The problem with an alist is that there is no type associated with it. We know the slots in our poor man's object, but we don't know the class. Without the class information, we don't have a predicate or a way to dispatch to methods. We should create a real CLOS object from this JSON.

(defclass user (named-object-mixin)
  ((login :initarg :login)
   (id :initarg :id)
   (node-id :initarg :node--id)))

(defun json->user-instance (json)
  (apply #'make-instance 'user
         :allow-other-keys t
         (alist->plist json)))
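
alist->plist is a small utility that isn't shown in this post. A minimal sketch of one way to write it follows. Alexandria's alist-plist does the same job.

```lisp
;; A possible definition of the utility used by json->user-instance.
(defun alist->plist (alist)
  "Flatten ((k1 . v1) (k2 . v2) ...) into (k1 v1 k2 v2 ...)."
  (loop for (key . value) in alist
        collect key
        collect value))

(alist->plist '((:login . "jrm-code-project") (:site--admin) (:id . 51824598)))
;; => (:LOGIN "jrm-code-project" :SITE--ADMIN NIL :ID 51824598)
```

An entry with no value, like (:site--admin), comes through as a key followed by NIL, so the result is always a well-formed plist.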

Should we want to bring more fields into Lisp, we need simply add slots with the right initargs to the user class.

Now we can write

(defun get-self (github)
  (json->user-instance
   (json:decode-json
    (dex:get "https://api.github.com/user"
             :headers (list (authorization-header github)
                            '("Accept" . "application/vnd.github.v3+json"))
             :want-stream t))))

* (get-self (github))
#<USER Joe Marshall {1002BE7013}>
* (inspect *)
The object is a STANDARD-OBJECT of type USER.
0. NAME: "Joe Marshall"
1. LOGIN: "jrm-code-project"
2. ID: 51824598
3. NODE-ID: "MDQ6VXNlcjUxODI0NTk4"

GitHub is moving to a GraphQL API. That's easy to handle.

(defun graphql-query (github query &rest variables)
  (let ((content (json:encode-json-to-string
                  `((query . ,query)
                    (variables . ,(plist->alist variables))))))
    (let* ((json (json:decode-json
                  (dex:post "https://api.github.com/graphql"
                            :headers (list (authorization-header github)
                                           '("Accept" . "application/vnd.github.v3+json")
                                           '("Content-Type" . "application/json"))
                            :content content
                            :want-stream t)))
           (errors (cdr (assoc :errors json))))
      (when errors
        (let ((first-error (car errors)))
          ;; Pass the message as a format argument so that any tilde
          ;; in it is not treated as a format directive.
          (error "~a" (cdr (assoc :message first-error)))))
      (cdr (assoc :data json)))))

(defparameter +get-user-by-login-query+
  "query ($login: String!) {
     user (login: $login) {
       databaseId
       login
       name
     }
   }")
 
* (graphql-query (github) +get-user-by-login-query+ :login "jrm-code-project")
((:USER (:DATABASE-ID . 51824598) (:LOGIN . "jrm-code-project")
 (:NAME . "Joe Marshall")))
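
plist->alist, which graphql-query uses to turn its &rest arguments into the variables object, is another utility that isn't shown here. This is one possible definition, offered as a sketch. Alexandria's plist-alist is equivalent.

```lisp
;; A possible definition of the utility used by graphql-query.
(defun plist->alist (plist)
  "Pair up (k1 v1 k2 v2 ...) into ((k1 . v1) (k2 . v2) ...)."
  (loop for (key value) on plist by #'cddr
        collect (cons key value)))

(plist->alist '(:login "jrm-code-project"))
;; => ((:LOGIN . "jrm-code-project"))
```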

And you can use the above technique to turn this JSON into a CLOS instance.
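
For instance, here is a self-contained sketch of that conversion. The class graphql-user, its readers, and data->graphql-user are names invented for this example. In practice you would probably reuse or extend the user class instead.

```lisp
;; Illustrative names; a standalone class so the example runs on its own.
(defclass graphql-user ()
  ((database-id :initarg :database-id :reader user-database-id)
   (login :initarg :login :reader user-login)
   (name :initarg :name :reader user-name)))

(defun data->graphql-user (data)
  "DATA is the alist returned by graphql-query for the query above."
  (apply #'make-instance 'graphql-user
         :allow-other-keys t
         ;; Flatten the (key . value) pairs under :USER into initargs.
         (loop for (key . value) in (cdr (assoc :user data))
               collect key
               collect value)))

(user-login
 (data->graphql-user
  '((:user (:database-id . 51824598)
           (:login . "jrm-code-project")
           (:name . "Joe Marshall")))))
;; => "jrm-code-project"
```

Because the parser turns databaseId into :DATABASE-ID, the initarg lines up with the slot name without any extra mapping.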

At this point we're cooking. We can call GitHub from Common Lisp and get CLOS objects in return. Of course we need more calls than just fetching a user, but it's more of the same. With this layer as our basis, it is straightforward to script GitHub.

Posted by Joe Marshall at 9:17 AM 2 comments:


Labels: CLOS, Common Lisp, GitHub