Greetings, I just started exploring Arrow and DuckDB, and I must admit it's quite interesting. However, it hurt a bit not to get full type safety with TypeScript IntelliSense in VSCodium. I saw #192, and it looks somewhat related.
When I request the fields I just end up with an array of fields... Meh, having a typed object would be much cooler, wouldn't it?
Here's how I partially implemented my type safety:
helper.ts allows me to infer the constructor params as tuples:
import { Schema, Field } from "@apache-arrow/ts";

// See: https://stackoverflow.com/a/55344772
type Tail<T extends any[]> = T extends [infer A, ...infer R] ? R : never;

export type Fields = { [name: string]: Tail<ConstructorParameters<typeof Field>> };
type FieldsTypeMap<T extends Fields> = { [K in keyof T]: T[K][0] };

// Factory to generate typed schema
export function schemaFactory<T extends Fields>(fields: T) {
  const _fields = Object.entries(fields).map(([name, field]) => new Field(name, ...field));
  return new Schema<FieldsTypeMap<T>>(_fields);
}
student.ts to check out intellisense
import { Uint8, Utf8 } from "@apache-arrow/ts";
import { Fields, schemaFactory } from "./helper";

export const fields = {
  // here it fails to help me with names, but types are checked, that's something!
  'name': [new Utf8(), true],
  'age': [new Uint8(), true],
} satisfies Fields; // satisfies is very important: it doesn't erase the keys!

// Now, `schema` technically has the generic specific to my type.
// When using `.fields` you get something out of it:
// But it's just Field<Utf8 | Uint8>[]... What the hell, I lose some information, namely the key!
export const schema = schemaFactory(fields);
So I would like to ask if there's any chance of modifying the library to return more meaningful types. This would break some code, since the returned types would change and some devs might depend on the current type definitions (although they are not super useful).
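To make the request a bit more concrete, here is a minimal sketch of what key-preserving types could look like. It uses tiny stand-in classes (`DataType`, `Utf8`, `Uint8`, `Field`) instead of the real Arrow ones so that it runs on its own, and `fieldsByName` is a hypothetical helper, not an existing Arrow API.

```typescript
// Stand-ins for the Arrow classes, only so the sketch is self-contained.
abstract class DataType {
  abstract readonly typeName: string;
}
class Utf8 extends DataType {
  readonly typeName = "Utf8" as const;
}
class Uint8 extends DataType {
  readonly typeName = "Uint8" as const;
}

class Field<T extends DataType = DataType> {
  constructor(
    public readonly name: string,
    public readonly type: T,
    public readonly nullable = false,
  ) {}
}

// Maps { name: Utf8, age: Uint8 } to { name: Field<Utf8>, age: Field<Uint8> }.
type FieldsByName<T extends Record<string, DataType>> = {
  [K in keyof T]: Field<T[K]>;
};

// Hypothetical helper: builds one Field per key and keeps the key in the type.
function fieldsByName<T extends Record<string, DataType>>(types: T): FieldsByName<T> {
  const out: Record<string, Field> = {};
  for (const [name, type] of Object.entries(types)) {
    out[name] = new Field(name, type, true);
  }
  return out as unknown as FieldsByName<T>;
}

const student = fieldsByName({ name: new Utf8(), age: new Uint8() });

// `student.age` is Field<Uint8>, not Field<Utf8 | Uint8>.
const ageType: Uint8 = student.age.type;
// student.grade; // compile error: property 'grade' does not exist
```

The point is only the mapped type: once the keys survive in the type, the editor can complete field names and report the exact data type per field.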
-
We regularly make new major releases and stricter types are usually good as they avoid bugs.
-
FYI: We can change the release cycle because we split the JS implementation from apache/arrow. We can release a new major version only when we need one, and minor/patch versions whenever we need them.
-
First of all, thanks for the crazy quick reply: first time that happens to me in an open source project!
I like that DuckDB allows me to access static datasets, however as domortiz stated, the moment I started working on my frontend I realized I had no typescript checking (just the good old tsc blindly using any type).
I have very little knowledge of the underlying library structure, but I'm pretty sure some refactoring will be required. I would start by proposing a change that shouldn't break anything and is super easy to implement: let the Schema constructor accept a typed dictionary like:
const schema = new Schema({
  'name': new AnonymousField(...args...),
  'age': new AnonymousField(...args...),
}, ...args...)
This would simply call new Field(key, ...args...) under the hood, just like my example.
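A rough sketch of that idea, just to show the runtime side. `AnonymousField` does not exist in Arrow, and `DataType`, `Field`, and `Schema` below are simplified stand-ins written for this example, so treat every name and signature here as an assumption.

```typescript
// Stand-ins for Arrow's classes so the sketch runs without the library.
class DataType {
  constructor(public readonly typeName: string) {}
}

class Field {
  constructor(
    public readonly name: string,
    public readonly type: DataType,
    public readonly nullable = false,
    public readonly metadata: Map<string, string> | null = null,
  ) {}
}

// Hypothetical: everything a Field needs except its name.
class AnonymousField {
  constructor(
    public readonly type: DataType,
    public readonly nullable = false,
    public readonly metadata: Map<string, string> | null = null,
  ) {}
}

type FieldDict = Record<string, AnonymousField>;

class Schema<T extends FieldDict = FieldDict> {
  readonly fields: Field[];

  constructor(dict: T) {
    // The dictionary key becomes the field name.
    this.fields = Object.entries(dict).map(
      ([name, f]) => new Field(name, f.type, f.nullable, f.metadata),
    );
  }

  // Accepts only keys of the dictionary, so a typo fails at compile time.
  field(name: keyof T & string): Field | undefined {
    return this.fields.find((f) => f.name === name);
  }
}

const schema = new Schema({
  name: new AnonymousField(new DataType("Utf8"), true),
  age: new AnonymousField(new DataType("Uint8"), true),
});

console.log(schema.fields.map((f) => f.name)); // [ 'name', 'age' ]
// schema.field("agee"); // compile error: not a key of the dictionary
```

Since the existing array-based constructor could stay as an overload, this kind of change should be additive rather than breaking.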
This is simple and won't need much testing, but the next part is better if it gets properly tested (as it's a bit more complex).
Checking the Field class:
Lines 106 to 146 in 90c1db1
It looks to me we could get away with a simple dictionary with a {...args...} as const as IAnonField:
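import * as arrow from "apache-arrow";

interface IAnonField<T extends arrow.DataType = arrow.DataType> {
  // No `name` here: an anonymous field gets its name from the dictionary key.
  type: T,
  nullable?: boolean,
  // This also is a bit sus. Why allow null *and* undefined? Is it intended? Idk...
  metadata?: Map<string, string> | null
}
// `type` holds a DataType instance, so the class has to be instantiated.
const x = { type: new arrow.Int32() } as const satisfies IAnonField;
x.type; // this should show arrow.Int32 when hovering over it in VSCodium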
This means we let typescript know the field type, which is actually a value available at runtime in the javascript engine. I'll try working on this tomorrow instead, looks super promising 😄
-
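Note: JavaScript objects do not keep pure insertion order when key kinds are mixed: integer-like keys are enumerated first, in ascending numeric order, and the remaining string keys follow in insertion order. (I see both strings and numbers are allowed in the same schema, like a field named `1` and one named `"name"`.)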
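A quick way to see that ordering rule without Arrow at all (plain objects versus a `Map`, with made-up column names and type labels):

```typescript
// Integer-like keys come first in ascending numeric order;
// the remaining string keys follow in insertion order.
const columns = { precipitations: "Float32", 10: "Uint8", 0: "Uint8" };
console.log(Object.keys(columns)); // [ '0', '10', 'precipitations' ]

// A Map keeps pure insertion order, whatever the keys look like.
const ordered = new Map<string, string>([
  ["precipitations", "Float32"],
  ["10", "Uint8"],
  ["0", "Uint8"],
]);
console.log(Array.from(ordered.keys())); // [ 'precipitations', '10', '0' ]
```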
Example of writer that uses both strings and numbers as keys (Symbols are not supported, so we don't even need to work with them):
// write.js
import { tableToIPC, Schema, Field, Uint8, Float32, Table, makeData, vectorFromArray } from 'apache-arrow';
import * as fs from 'fs';

const schema = new Schema([
  new Field('precipitations', new Float32()),
  new Field('0', new Uint8()),
]);
const table = new Table(schema, {
  precipitations: makeData({ type: new Float32(), length: 3, data: vectorFromArray([1, 2, 3.14]) }),
  0: makeData({ type: new Uint8(), length: 3, data: vectorFromArray([7, 8, 9]) }),
});
fs.writeFileSync('simple.arrow', tableToIPC(table, 'file'));
And then, if we read we can see column order retention:
// read.js
import { tableFromIPC } from 'apache-arrow';
import * as fs from 'fs';

const table = tableFromIPC(fs.readFileSync('simple.arrow'));
console.log(table.schema.fields);
// Now you should see that the field '0' comes after 'precipitations': "not standard", but expected behavior IMHO
With that said, if we look into the friendlier makeTable & Co. the result is that objects are used from the get-go. So field order isn't crucial, from my understanding, however the underlying structure should probably respect it.
See PR #233
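For illustration only (this is not a description of what the PR does): one way to make an object of columns follow the schema is to iterate the schema's field names instead of the object's own keys. `orderBySchema` is a hypothetical helper, and plain arrays stand in for Arrow vectors.

```typescript
// Returns the columns in the order given by the schema's field names,
// ignoring the enumeration order of the object itself.
function orderBySchema<V>(fieldNames: readonly string[], columns: Record<string, V>): V[] {
  return fieldNames.map((name) => {
    if (!Object.prototype.hasOwnProperty.call(columns, name)) {
      throw new Error(`Missing column for field "${name}"`);
    }
    return columns[name];
  });
}

const fieldNames = ["precipitations", "0"];
const columns: Record<string, number[]> = {
  precipitations: [1, 2, 3.14],
  0: [7, 8, 9],
};

console.log(Object.keys(columns)); // [ '0', 'precipitations' ]
console.log(orderBySchema(fieldNames, columns)); // [ [ 1, 2, 3.14 ], [ 7, 8, 9 ] ]
```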