I've a table which contains text data containing a bunch of translations for different language. Each translation is for a specific label.
I need to generate a pivot table so to get quickly what's missing.
An example of the records is
1, en, hello
1, fr, bonjour
1, es, hola
2, en, how are you
2, fr,
3, es, come es stas
Although translations should be always there for all language, I'm not 100% sure this is the case. So missing fields have to be considered.
The desired outcome is this
|ID|EN|FR|ES|
|1|hello|bonjour|hola|
|2|how are you| |come es stas|
The challenge I have is that the column order may not always be the same when the database is populated, so in theory I should have a dynamic list of fields.
There is not a direct PIVOT function in SQLite, so I started experimenting with the group_concat obtaining a comma separated string.
SELECT DISTINCT language, group_concat(word, ',') OVER (PARTITION BY language) AS group_concat
FROM vocabulary;
I can ran past the results later on in Python if needed; the issue is that any missing value is not appending an empty item thus shifting all the concatenation by n, thus making this solution not valid.
I have attempted also to use the filter clause in the select predicate (though this mean hardcoding the columns), but I was not able to succeed.
Any idea on how this can be achieved?
-
1This enterprising fellow has implemented pivot as an sqlite extenison.Jeremy Field– Jeremy Field2022年04月03日 02:07:27 +00:00Commented Apr 3, 2022 at 2:07
-
Sounds promising, though the *_id might be confusing at firstAndrea Moro– Andrea Moro2022年04月04日 06:55:20 +00:00Commented Apr 4, 2022 at 6:55
3 Answers 3
I was able to arrive to a semi-satisfactory solution, though the only thing missing is a dynamic column selection.
SELECT id,
MAX(CASE WHEN "language" == 'it' THEN word END) as 'it',
MAX(CASE WHEN "language" == 'en' THEN word END) as 'en',
MAX(CASE WHEN "language" == 'ru' THEN word END) as 'ru'
FROM (select t.*,
row_number() over (partition by language order by id) as seq
from vocabulary t
) t
GROUP BY t.seq;
Regrettably SQLite doesn't support dynamic queries, so these have to be constructed via the callee, in my case a Python script.
Yes, you can have dynamic columns in a pivot table in SQLite. You'll have to use extensions, though.
The first option is to use dynamic SQL and the eval(sql)
function from the define extension.
Create the schema and load data:
create table phrases(phrase_id integer, lang_code text, phrase text);
insert into phrases(phrase_id, lang_code, phrase)
values
(1, 'en', 'hello'),
(1, 'fr', 'bonjour'),
(1, 'es', 'hola'),
(2, 'en', 'how are you'),
(2, 'fr', null),
(2, 'es', 'come es stas');
Build and execute a dynamic query that creates the v_phrases
view:
with langs as (
select distinct lang_code as code
from phrases
),
lines as (
select 'drop view if exists v_phrases; ' as part
union all
select 'create view v_phrases as '
union all
select 'select phrase_id ' as part
union all
select ', max(phrase) filter (where lang_code = "' || code || '") as "' || code || '" '
from langs
union all
select 'from phrases group by phrase_id order by phrase_id;'
)
select eval(group_concat(part, ''))
from lines;
Query the view:
select * from v_phrases;
┌───────────┬─────────────┬─────────┬──────────────┐
│ phrase_id │ en │ fr │ es │
├───────────┼─────────────┼─────────┼──────────────┤
│ 1 │ hello │ bonjour │ hola │
│ 2 │ how are you │ │ come es stas │
└───────────┴─────────────┴─────────┴──────────────┘
The second (and probably simpler) option is to use the pivot
extension. See Building a Pivot Table in SQLite article for details.
The following query will return all IDs that do not have all translations:
SELECT id, COUNT(*) AS count
FROM vocabulary
GROUP BY id
HAVING count < (SELECT COUNT(DISTINCT language)
FROM vocabulary);
If you want to know which languages are missing for each ID, use a compound query to look up those languages that are in the list of all languages but not in those for this ID:
WITH missing_ids AS (
SELECT id, COUNT(*) AS count
FROM vocabulary
GROUP BY id
HAVING count < (SELECT COUNT(DISTINCT language)
FROM vocabulary)
)
SELECT id,
(SELECT group_concat(language)
FROM (SELECT DISTINCT language FROM vocabulary
EXCEPT
SELECT language FROM vocabulary WHERE id = missing_ids.id)
) AS missing_languages
FROM missing_ids;
-
Hi Thanks for this. This actually start solving the problem. I assume that having a separate column for each of the items like in a pivot table is not possible, right?Andrea Moro– Andrea Moro2020年05月07日 07:30:45 +00:00Commented May 7, 2020 at 7:30
-
-
Any chance I can create a list of columns with the
with
statement and attach to the solution I posted above?Andrea Moro– Andrea Moro2020年05月07日 09:28:28 +00:00Commented May 7, 2020 at 9:28 -
2No; SQLite cannot execute any kind of dynamic SQL; you'd have to construct the query in Python.CL.– CL.2020年05月07日 09:37:59 +00:00Commented May 7, 2020 at 9:37