7

I've a table which contains text data containing a bunch of translations for different language. Each translation is for a specific label.

I need to generate a pivot table so to get quickly what's missing.

An example of the records is

1, en, hello
1, fr, bonjour
1, es, hola
2, en, how are you
2, fr, 
3, es, come es stas

Although translations should be always there for all language, I'm not 100% sure this is the case. So missing fields have to be considered.

The desired outcome is this

|ID|EN|FR|ES|
|1|hello|bonjour|hola|
|2|how are you| |come es stas|

The challenge I have is that the column order may not always be the same when the database is populated, so in theory I should have a dynamic list of fields.

There is not a direct PIVOT function in SQLite, so I started experimenting with the group_concat obtaining a comma separated string.

SELECT DISTINCT language, group_concat(word, ',') OVER (PARTITION BY language) AS group_concat
FROM vocabulary;

I can ran past the results later on in Python if needed; the issue is that any missing value is not appending an empty item thus shifting all the concatenation by n, thus making this solution not valid.

I have attempted also to use the filter clause in the select predicate (though this mean hardcoding the columns), but I was not able to succeed.

Any idea on how this can be achieved?

asked May 6, 2020 at 17:00
2
  • 1
    This enterprising fellow has implemented pivot as an sqlite extenison. Commented Apr 3, 2022 at 2:07
  • Sounds promising, though the *_id might be confusing at first Commented Apr 4, 2022 at 6:55

3 Answers 3

3

I was able to arrive to a semi-satisfactory solution, though the only thing missing is a dynamic column selection.

SELECT id, 
 MAX(CASE WHEN "language" == 'it' THEN word END) as 'it',
 MAX(CASE WHEN "language" == 'en' THEN word END) as 'en',
 MAX(CASE WHEN "language" == 'ru' THEN word END) as 'ru'
FROM (select t.*,
 row_number() over (partition by language order by id) as seq
 from vocabulary t
 ) t
GROUP BY t.seq;

Regrettably SQLite doesn't support dynamic queries, so these have to be constructed via the callee, in my case a Python script.

answered May 7, 2020 at 8:14
2

Yes, you can have dynamic columns in a pivot table in SQLite. You'll have to use extensions, though.

The first option is to use dynamic SQL and the eval(sql) function from the define extension.

Create the schema and load data:

create table phrases(phrase_id integer, lang_code text, phrase text);
insert into phrases(phrase_id, lang_code, phrase)
values
(1, 'en', 'hello'),
(1, 'fr', 'bonjour'),
(1, 'es', 'hola'),
(2, 'en', 'how are you'),
(2, 'fr', null),
(2, 'es', 'come es stas');

Build and execute a dynamic query that creates the v_phrases view:

with langs as (
 select distinct lang_code as code
 from phrases
),
lines as (
 select 'drop view if exists v_phrases; ' as part
 union all
 select 'create view v_phrases as '
 union all
 select 'select phrase_id ' as part
 union all
 select ', max(phrase) filter (where lang_code = "' || code || '") as "' || code || '" '
 from langs
 union all
 select 'from phrases group by phrase_id order by phrase_id;'
)
select eval(group_concat(part, ''))
from lines;

Query the view:

select * from v_phrases;
┌───────────┬─────────────┬─────────┬──────────────┐
│ phrase_id │ en │ fr │ es │
├───────────┼─────────────┼─────────┼──────────────┤
│ 1 │ hello │ bonjour │ hola │
│ 2 │ how are you │ │ come es stas │
└───────────┴─────────────┴─────────┴──────────────┘

The second (and probably simpler) option is to use the pivot extension. See Building a Pivot Table in SQLite article for details.

answered Feb 9, 2023 at 21:25
1

The following query will return all IDs that do not have all translations:

SELECT id, COUNT(*) AS count
FROM vocabulary
GROUP BY id
HAVING count < (SELECT COUNT(DISTINCT language)
 FROM vocabulary);

If you want to know which languages are missing for each ID, use a compound query to look up those languages that are in the list of all languages but not in those for this ID:

WITH missing_ids AS (
 SELECT id, COUNT(*) AS count
 FROM vocabulary
 GROUP BY id
 HAVING count < (SELECT COUNT(DISTINCT language)
 FROM vocabulary)
)
SELECT id,
 (SELECT group_concat(language)
 FROM (SELECT DISTINCT language FROM vocabulary
 EXCEPT
 SELECT language FROM vocabulary WHERE id = missing_ids.id)
 ) AS missing_languages
FROM missing_ids;
answered May 7, 2020 at 7:23
4
  • Hi Thanks for this. This actually start solving the problem. I assume that having a separate column for each of the items like in a pivot table is not possible, right? Commented May 7, 2020 at 7:30
  • Not automatically. Commented May 7, 2020 at 9:05
  • Any chance I can create a list of columns with the with statement and attach to the solution I posted above? Commented May 7, 2020 at 9:28
  • 2
    No; SQLite cannot execute any kind of dynamic SQL; you'd have to construct the query in Python. Commented May 7, 2020 at 9:37

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.