Better type safety schemas #232

DadiBit started this conversation in General
Aug 9, 2025 · 1 comments · 3 replies

Greetings! I just started exploring Arrow and DuckDB, and I must admit it's quite interesting. However, it bothers me that I can't get full type safety from TypeScript IntelliSense in VSCodium. I saw #192, which looks somewhat related.
When I request the fields, I just end up with an array of fields. Having a typed object would be much nicer, wouldn't it?

Here's how I partially implemented my type safety:

helper.ts allows me to infer the constructor params as tuples:
import { Schema, Field } from "@apache-arrow/ts";
// See: https://stackoverflow.com/a/55344772
type Tail<T extends any[]> = T extends [infer A, ...infer R] ? R : never;
export type Fields = { [name: string]: Tail<ConstructorParameters<typeof Field>> };
type FieldsTypeMap<T extends Fields> = { [K in keyof T]: T[K][0] };
// Factory to generate typed schema
export function schemaFactory<T extends Fields>(fields: T) {
 const _fields = Object.entries(fields).map(([name, field]) => new Field(name, ...field));
 return new Schema<FieldsTypeMap<T>>(_fields);
}
student.ts, to check what IntelliSense shows:
import { Uint8, Utf8 } from "@apache-arrow/ts";
import { Fields, schemaFactory } from "./helper";
export const fields = {
 // here it fails to help me with names, but types are checked, that's something!
 'name': [new Utf8(), true],
 'age': [new Uint8(), true],
} satisfies Fields; // `satisfies` is very important: it doesn't erase the keys!
// Now `schema` technically carries a generic specific to my type.
// When using `.fields` you get something out of it,
// but it's just Field<Utf8 | Uint8>[]. I lose some information, namely the key!
export const schema = schemaFactory(fields);

So I would like to ask whether there's any chance of modifying the library to return more meaningful types. This would break some code, since the returned types would change and some developers might depend on the current type definitions (although they are not super useful).
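
To make the request concrete, here is a minimal sketch of what key-preserving types could look like. It uses stand-in `DataType`, `Utf8`, `Uint8`, and `Field` definitions instead of the real Arrow ones so that it runs on its own, and `TypedSchema` and its `field()` method are hypothetical names, not part of the library.

```typescript
// Stand-ins for the Arrow classes (assumption: simplified, not the real API).
interface DataType { readonly typeName: string; }
class Utf8 implements DataType { readonly typeName = "Utf8" as const; }
class Uint8 implements DataType { readonly typeName = "Uint8" as const; }

class Field<T extends DataType = DataType> {
  constructor(
    public readonly name: string,
    public readonly type: T,
    public readonly nullable = false,
  ) {}
}

type TypeMap = Record<string, DataType>;

// Hypothetical schema that remembers which key maps to which type.
class TypedSchema<T extends TypeMap> {
  constructor(public readonly fields: Field<T[keyof T]>[]) {}

  // Key-aware lookup: the return type depends on the name passed in.
  field<K extends keyof T & string>(name: K): Field<T[K]> {
    const found = this.fields.find((f) => f.name === name);
    if (!found) throw new Error(`no field named ${name}`);
    return found as Field<T[K]>;
  }
}

const schema = new TypedSchema<{ name: Utf8; age: Uint8 }>([
  new Field("name", new Utf8(), true),
  new Field("age", new Uint8(), true),
]);

const age = schema.field("age"); // inferred as Field<Uint8>
// schema.field("height");       // compile error: "height" is not a known key
console.log(age.type.typeName);
```

The array of fields is still there, so existing code that iterates over `.fields` would keep working; only the lookup by name gains a narrower return type.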

We regularly make new major releases and stricter types are usually good as they avoid bugs.

kou Aug 9, 2025
Collaborator

FYI: We can change the release cycle because we split the JS implementation out of apache/arrow. We can release a new major version only when we need one, and minor/patch versions whenever we need them.

First of all, thanks for the crazy quick reply: it's the first time that has happened to me in an open source project!

I like that DuckDB allows me to access static datasets; however, as domortiz stated, the moment I started working on my frontend I realized I had no TypeScript checking (just the good old tsc blindly using the any type).

I have very little knowledge of the underlying library structure, but I'm pretty sure some refactoring will be required. I would start by proposing a change that shouldn't break anything and is super easy to implement: let the Schema constructor accept a typed dictionary like:

const schema = new Schema({
 'name': new AnonymousField(...args...),
 'age': new AnonymousField(...args...),
}, ...args...)

This would simply call new Field(key, ...args...) under the hood, just like my example.
This is simple and won't need much testing, but the next part is better if it gets properly tested (as it's a bit more complex).
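
The dictionary idea could be prototyped outside the library first. The sketch below uses stand-in `DataType` and `Field` classes so that it runs by itself; `AnonymousField` comes from the proposal above and `fieldsFromDict` is a hypothetical helper name, and neither exists in Arrow today.

```typescript
// Stand-ins for the Arrow classes (assumption: simplified, not the real API).
class DataType {
  constructor(public readonly typeName: string) {}
}

class Field<T extends DataType = DataType> {
  constructor(
    public readonly name: string,
    public readonly type: T,
    public readonly nullable = false,
    public readonly metadata: Map<string, string> = new Map(),
  ) {}
}

// Hypothetical: a field description without a name; the dictionary key supplies it.
class AnonymousField<T extends DataType = DataType> {
  constructor(
    public readonly type: T,
    public readonly nullable = false,
    public readonly metadata?: Map<string, string> | null,
  ) {}
}

type AnonFields = Record<string, AnonymousField>;

// Hypothetical helper: turns the dictionary into the Field[] a Schema expects.
function fieldsFromDict<T extends AnonFields>(dict: T): Field[] {
  return Object.keys(dict).map((key) => {
    const anon = dict[key];
    return new Field(key, anon.type, anon.nullable, anon.metadata ?? new Map());
  });
}

const fields = fieldsFromDict({
  name: new AnonymousField(new DataType("Utf8"), true),
  age: new AnonymousField(new DataType("Uint8"), true),
});

console.log(fields.map((f) => `${f.name}: ${f.type.typeName}`));
```

Because the helper only maps keys to `new Field(key, ...)`, the array-based constructor would stay untouched and existing callers should not notice the addition.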


Checking the Field class:

arrow-js/src/schema.ts

Lines 106 to 146 in 90c1db1

export class Field<T extends DataType = any> {

    public static new<T extends DataType = any>(props: { name: string | number; type: T; nullable?: boolean; metadata?: Map<string, string> | null }): Field<T>;
    public static new<T extends DataType = any>(name: string | number | Field<T>, type: T, nullable?: boolean, metadata?: Map<string, string> | null): Field<T>;
    /** @nocollapse */
    public static new<T extends DataType = any>(...args: any[]) {
        let [name, type, nullable, metadata] = args;
        if (args[0] && typeof args[0] === 'object') {
            ({ name } = args[0]);
            (type === undefined) && (type = args[0].type);
            (nullable === undefined) && (nullable = args[0].nullable);
            (metadata === undefined) && (metadata = args[0].metadata);
        }
        return new Field<T>(`${name}`, type, nullable, metadata);
    }

    public readonly type: T;
    public readonly name: string;
    public readonly nullable: boolean;
    public readonly metadata: Map<string, string>;

    constructor(name: string, type: T, nullable = false, metadata?: Map<string, string> | null) {
        this.name = name;
        this.type = type;
        this.nullable = nullable;
        this.metadata = metadata || new Map();
    }

    public get typeId() { return this.type.typeId; }
    public get [Symbol.toStringTag]() { return 'Field'; }
    public toString() { return `${this.name}: ${this.type}`; }
    public clone<R extends DataType = T>(props: { name?: string | number; type?: R; nullable?: boolean; metadata?: Map<string, string> | null }): Field<R>;
    public clone<R extends DataType = T>(name?: string | number | Field<T>, type?: R, nullable?: boolean, metadata?: Map<string, string> | null): Field<R>;
    public clone<R extends DataType = T>(...args: any[]) {
        let [name, type, nullable, metadata] = args;
        (!args[0] || typeof args[0] !== 'object')
            ? ([name = this.name, type = this.type, nullable = this.nullable, metadata = this.metadata] = args)
            : ({ name = this.name, type = this.type, nullable = this.nullable, metadata = this.metadata } = args[0]);
        return Field.new<R>(name, type, nullable, metadata);
    }
}

It looks to me like we could get away with a simple dictionary of {...args...} as const objects that satisfy an IAnonField interface:

import * as arrow from "apache-arrow";
// No `name` here: the dictionary key supplies it.
interface IAnonField<T extends arrow.DataType = arrow.DataType> {
 type: T,
 nullable?: boolean,
 // This is also a bit sus: why allow null *and* undefined? Is it intended? Idk...
 metadata?: Map<string, string> | null
}
const x = { type: new arrow.Int32() } as const satisfies IAnonField;
x.type; // this should show arrow.Int32 when hovering over it in vscodium

This means we let typescript know the field type, which is actually a value available at runtime in the javascript engine. I'll try working on this tomorrow instead, looks super promising 😄
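
For anyone following along, here is a small sketch of why `satisfies` (TypeScript 4.9 or later) matters for this idea. It uses stand-in `Int32` and `Utf8` classes and a hypothetical `AnonFieldLike` interface so it can run without Arrow; the point is only the difference between a type annotation and `satisfies`.

```typescript
// Stand-ins for Arrow data types (assumption: simplified, not the real classes).
class Int32 { readonly typeName = "Int32" as const; }
class Utf8 { readonly typeName = "Utf8" as const; }
type AnyType = Int32 | Utf8;

// Hypothetical shape of an anonymous field description.
interface AnonFieldLike<T extends AnyType = AnyType> {
  type: T;
  nullable?: boolean;
}

// A type annotation widens: `annotated.type` is Int32 | Utf8.
const annotated: AnonFieldLike = { type: new Int32() };

// `satisfies` only checks the shape: `checked.type` stays Int32.
const checked = { type: new Int32(), nullable: true } as const satisfies AnonFieldLike;

// Compiles only because the specific type survived the check.
const onlyInt32: Int32 = checked.type;
// const broken: Int32 = annotated.type; // compile error: Utf8 is not assignable to Int32

console.log(onlyInt32.typeName, annotated.type.typeName);
```

The same mechanism is what lets a dictionary of fields keep both its keys and each field's specific type.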

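Note: JavaScript objects do not preserve insertion order when key kinds are mixed: integer-like keys (such as 1 or "0") are enumerated first, in ascending numeric order, and the remaining string keys follow in insertion order. I see that both strings and numbers are allowed as names in the same schema, e.g. a field named 1 and one named "name".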
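
A quick way to see this ordering rule without Arrow at all is the sketch below. The column names are just examples, and the `Map` is shown only as a contrast, not as a proposal for the library.

```typescript
// Integer-like keys are enumerated first, in ascending numeric order;
// other string keys follow in insertion order.
const columns: Record<string, string> = {};
columns["precipitations"] = "Float32";
columns["10"] = "Uint8";
columns["0"] = "Uint8";
columns["age"] = "Uint8";

const objectOrder = Object.keys(columns);

// A Map keeps insertion order for every key, whatever it looks like.
const ordered = new Map<string, string>([
  ["precipitations", "Float32"],
  ["10", "Uint8"],
  ["0", "Uint8"],
  ["age", "Uint8"],
]);
const mapOrder = Array.from(ordered.keys());

console.log(objectOrder); // "0", "10", "precipitations", "age"
console.log(mapOrder);    // "precipitations", "10", "0", "age"
```

So a dictionary-based schema API would reorder any field whose name looks like an integer, which is why the array of fields probably has to stay the source of truth for column order.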

Example of a writer that uses both strings and numbers as keys (Symbols are not supported, so we don't even need to handle them):

// write.js
import { tableToIPC, Schema, Field, Uint8, Float32, Table, makeData, vectorFromArray } from 'apache-arrow';
import * as fs from 'fs';
const schema = new Schema([
 new Field('precipitations', new Float32()),
 new Field('0', new Uint8()),
]);
const table = new Table(schema, {
 precipitations: makeData({ type: new Float32(), length: 3, data: vectorFromArray([1, 2, 3.14]) }),
 0: makeData({ type: new Uint8(), length: 3, data: vectorFromArray([7, 8, 9]) }),
});
fs.writeFileSync('simple.arrow', tableToIPC(table, 'file'));

And then, if we read the file back, we can see that the column order is retained:

// read.js
import { tableFromIPC } from 'apache-arrow';
import * as fs from 'fs';
const table = tableFromIPC(fs.readFileSync('simple.arrow'));
console.log(table.schema.fields);
// You should now see that the field '0' comes after 'precipitations': not the standard object key order, but the expected behavior IMHO

With that said, if we look at the friendlier makeTable & co., objects are used from the get-go. So field order isn't crucial, from my understanding; however, the underlying structure should probably respect it.


See PR #233
