Designing DB with AI
Designing PostgreSQL Database Using AI: Why You Should Avoid ENUMs & How MySQL Offers More Storage Options
AI-powered tools like ChatGPT, Gemini, and DeepSeek have changed how developers approach database design. Today, teams routinely use AI to generate schemas, validate ideas, and build quick prototypes. But how good are these AI-generated schemas in real-world applications?
As we experimented with AI for designing our PostgreSQL database, we noticed that AI often missed crucial best practices. In this post, we will share some real-world scenarios where AI could have done a better job, and we will also compare PostgreSQL and MySQL in terms of storage optimization and data type flexibility.
When we started using it, we expected AI not only to generate the schema quickly but also, by default, to:
  • Implement the best practices and rules of thumb of database design.
  • Infer relationships and suggest appropriate constraints.
  • Choose the correct data type for each column.
  • Optimize for storage and query performance.
The reality was quite different, and in this post we present a few scenarios where the team thought AI could have done a better job on the first attempt.
Reality check #1: Lack of foreign key constraints
In the initial script generated by AI, there were no foreign key constraints defined at all. Defining foreign keys is a basic database design principle, and we thought our initial prompts were explicit enough for AI to generate the schema with them. We had to tell the AI a few times before it started doing it.
AI generated schema without foreign keys

                        CREATE TABLE foobar (
                            id SERIAL PRIMARY KEY,
                            user_id INTEGER,  -- Missing foreign key constraint
                            amount DECIMAL(10,2)
                        );
                    
Corrected Schema (With Foreign Key)


                        CREATE TABLE foobar (
                            id SERIAL PRIMARY KEY,
                            user_id INTEGER REFERENCES users(id),
                            amount DECIMAL(10,2)
                        );
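
To see what the constraint buys us, here is a minimal sketch (assuming a users table like the one below, which must exist before foobar is created): the second insert into foobar is rejected with a foreign key violation because user 42 does not exist.

                        CREATE TABLE users (
                            id SERIAL PRIMARY KEY,
                            name TEXT
                        );
                        INSERT INTO users (name) VALUES ('alice');               -- gets id = 1

                        INSERT INTO foobar (user_id, amount) VALUES (1, 9.99);   -- OK
                        INSERT INTO foobar (user_id, amount) VALUES (42, 9.99);  -- rejected: no users row with id = 42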

                    
Reality check #2: Choosing sub-optimal field types
Consider the following table, where the difficulty_level column can have values like 'Easy', 'Medium', and 'Hard'. AI proposed a VARCHAR column, with the recommendation that we store the literal strings 'Easy', 'Medium', 'Hard' in it.

                        CREATE TABLE questions (
                            id SERIAL PRIMARY KEY,
                            difficulty_level VARCHAR(20)
                                CHECK (difficulty_level IN ('Easy', 'Medium', 'Hard')),
                            tags JSONB DEFAULT '{}'::JSONB
                        );

                    
Anyone who has worked with databases for a while would probably suggest keeping the difficulty_level column as a SMALLINT, storing values like 1, 2, 3 in it, and defining a difficulty_levels lookup table. The final schema looks something like this:

                        CREATE TABLE difficulty_levels (
                            id SMALLINT PRIMARY KEY,
                            level_name TEXT UNIQUE
                        );
                        
                        INSERT INTO difficulty_levels (id, level_name) 
                        VALUES (1, 'Easy'), (2, 'Medium'), (3, 'Hard');
                        
                        CREATE TABLE questions (
                            id SERIAL PRIMARY KEY,
                            difficulty_level SMALLINT REFERENCES difficulty_levels(id)
                        );
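
With the lookup table in place, human-readable labels come back through a simple join; a usage sketch:

                        SELECT q.id, dl.level_name
                        FROM questions q
                        JOIN difficulty_levels dl ON dl.id = q.difficulty_level
                        WHERE dl.level_name = 'Hard';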

                    
Initially, AI defined difficulty_level as INTEGER. We anticipate millions of rows in this table, so row storage, index storage efficiency, and query performance are prime concerns for us. INTEGER's 4 bytes are overkill for storing the values 1, 2, 3, so the options were SMALLINT or ENUM. ENUM looked attractive at first, but in PostgreSQL an enum value actually occupies 4 bytes on disk, so it saves nothing over INTEGER, and if the column is part of an index, that index ends up taking more space than one over a SMALLINT column. This is because
  • An ENUM value is stored as a 4-byte OID that maps to its text label in the pg_enum system catalog, so rows and index entries carry 4 bytes per value, with the label mapping kept on the side.
  • SMALLINT is a fixed 2 bytes, making B-tree indexes smaller and faster to traverse.
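
The sizes are easy to verify with PostgreSQL's pg_column_size function; the difficulty enum type below is hypothetical, created only for this comparison:

                        CREATE TYPE difficulty AS ENUM ('Easy', 'Medium', 'Hard');

                        SELECT pg_column_size('Easy'::difficulty) AS enum_bytes,      -- 4
                               pg_column_size(1::SMALLINT)        AS smallint_bytes,  -- 2
                               pg_column_size(1::INTEGER)         AS integer_bytes;   -- 4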

Also, from a query performance standpoint, ENUMs don't do as well as SMALLINT because:
  • In PostgreSQL, ENUM is a custom data type whose values are OIDs mapped to text labels in the pg_enum catalog, not plain integers.
  • Comparisons must follow the enum's declared sort order; when labels have been inserted out of order with ALTER TYPE ... ADD VALUE BEFORE/AFTER, resolving that order can require extra catalog lookups, whereas SMALLINT comparisons are direct integer comparisons.

In summary, as the following table indicates, it is better to stick with SMALLINT than to go with ENUM:
| Data Type | Storage per Row | Index Efficiency | Flexibility | Lookup Performance |
| --- | --- | --- | --- | --- |
| ENUM | ❌ 4 bytes (OID on disk) | ❌ Larger entries plus catalog mapping | ❌ Fixed set, hard to change | ❌ Sort-order/catalog lookup overhead |
| SMALLINT | ✅ 2 bytes, fixed width | ✅ Smaller, B-tree optimized | ✅ Values can be modified easily | ✅ Direct integer comparison |
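
The flexibility point deserves emphasis: PostgreSQL can append labels to an existing enum, but there is no ALTER TYPE ... DROP VALUE, so removing or reordering labels means recreating the type. With the lookup table it is plain DML. A short sketch, reusing the hypothetical difficulty type from above:

                        -- Adding a label to an enum is supported:
                        ALTER TYPE difficulty ADD VALUE 'Expert' AFTER 'Hard';

                        -- Removing a label is not (there is no DROP VALUE).
                        -- With the SMALLINT lookup table, changes are plain DML:
                        INSERT INTO difficulty_levels (id, level_name) VALUES (4, 'Expert');
                        DELETE FROM difficulty_levels WHERE level_name = 'Expert';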
Reality check #3: Missing the perfect use case for a JSONB column
We wanted to tag each row in the foobar table with a variable number of tags. AI in this case ended up suggesting one column for each tag, mirroring the examples we had given in the prompt.

                        CREATE TABLE foobar (
                            id SERIAL PRIMARY KEY,
                            tag1 VARCHAR(50),
                            tag2 VARCHAR(50),
                            tag3 VARCHAR(50)
                        );
                    

This will not work for us because the number of tags is not fixed; a single JSONB column named tags is the perfect fit here:
                        
                            CREATE TABLE questions (
                                id SERIAL PRIMARY KEY,
                                tags JSONB DEFAULT '{}'::JSONB
                            );
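
The JSONB column also stays efficiently queryable: a GIN index supports containment searches over the tags (a usage sketch; the tag keys shown are hypothetical):

                        CREATE INDEX idx_questions_tags ON questions USING GIN (tags);

                        -- Find questions whose tags contain topic = "sql":
                        SELECT id
                        FROM questions
                        WHERE tags @> '{"topic": "sql"}'::JSONB;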
                        
                    
Conclusion: AI is a Tool, Not a Replacement for Thoughtful Database Design

AI has undoubtedly made database schema generation more accessible, but it’s crucial to approach it with realistic expectations. While AI can assist in structuring schemas, human expertise remains irreplaceable in optimizing for space efficiency, indexing strategies, and query performance.

When designing a PostgreSQL database, avoiding ENUMs can prevent long-term migration and maintenance challenges, even though they might seem efficient in the short term.

On the other hand, MySQL provides more flexibility in storage choices, offering a wider range of integer types (TINYINT, MEDIUMINT) to fine-tune storage and performance based on specific needs.

| MySQL Data Type | PostgreSQL Equivalent | Storage (Bytes) | Best Use Case |
| --- | --- | --- | --- |
| TINYINT | ❌ Not available | 1 byte | Flags, small counters |
| SMALLINT | SMALLINT | 2 bytes | Small numeric values |
| MEDIUMINT | ❌ Not available | 3 bytes | Compact storage for moderate ranges |
| INT | INTEGER | 4 bytes | Default choice for numeric IDs |
| BIGINT | BIGINT | 8 bytes | Large numeric values (e.g., timestamps) |
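
Since PostgreSQL offers no 1-byte or 3-byte integer, a common workaround when a narrower range must be enforced is SMALLINT plus a CHECK constraint; a sketch (the table and column names are made up for illustration):

                        -- Emulating MySQL's unsigned TINYINT (0-255) in PostgreSQL:
                        CREATE TABLE counters (
                            id SERIAL PRIMARY KEY,
                            hits SMALLINT NOT NULL CHECK (hits BETWEEN 0 AND 255)
                        );

Note that this enforces the range but still occupies 2 bytes per value; the CHECK buys correctness, not storage.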

Ultimately, good database design is about trade-offs — balancing readability, scalability, and storage efficiency. AI can be a helpful assistant in this process, but the final decisions should always be informed by a deep understanding of how databases actually work.