SQLite might be all you need
Most software engineers overcomplicate their architecture thinking they are doing the right thing: building a more robust application, or making sure they can scale to millions of users, even when they’re just working on a side project that might never see the light of day. That is what we call overengineering.
Granted, SQLite has obvious limitations that make it unsuitable for plenty of use cases. But when they don’t apply to your situation, which is often the case when building a new MVP or a simple app, it might be just enough and save you a lot of overhead! Since your database is just a simple file sitting next to your app, its most popular use case is the mobile app, where new data comes from a single user, avoiding concurrent-write issues.
The not so good about SQLite #
Let’s go through the cases in which you do not want to use SQLite:
The lack of server-client architecture: SQLite is a self-contained database that lives as a simple file on your hardware, usually alongside the application’s code. It does not natively offer any form of server-client architecture, and while multiple processes can read the file at once, writes are serialized behind a single lock, so it handles simultaneous access poorly.
No user management: SQLite does not natively support any kind of user management, which makes it unsuitable for any application that requires role-based access or user authentication at the database level.
The limited data types: When working with an SQLite db, you are limited to just four primitive data types (plus NULL):
- INTEGER
- REAL (equivalent to python’s float)
- TEXT
- BLOB (equivalent to python’s bytes)
Of course, you can compose other languages’ data types out of these; for instance, .NET’s DateTime would be stored as TEXT. But the lack of some common data types, such as a native JSON or array type, can be quite inconvenient (see the sketch after this list).
The limited scalability: The most obvious point. SQLite is not suitable for applications that require high concurrency. Its self-contained architecture makes it struggle especially with heavy writes.
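As a minimal sketch of that composition, here is a datetime and a JSON document both round-tripping through TEXT (this example is my own; it assumes an SQLite build with the json1 functions, which most modern builds include):
import json
import sqlite3
from datetime import datetime

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE events (created_at TEXT, payload TEXT)')

# Both values are serialized to TEXT by hand before insertion
conn.execute('INSERT INTO events VALUES (?, ?)',
             (datetime.now().isoformat(), json.dumps({'action': 'login'})))

# Reading them back means parsing TEXT again on the application side
created_at, payload = conn.execute('SELECT created_at, payload FROM events').fetchone()
print(datetime.fromisoformat(created_at), json.loads(payload))

# json1 can query inside the TEXT, but the stored type is still TEXT
print(conn.execute("SELECT json_extract(payload, '$.action') FROM events").fetchone())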
The last point is, I think, the one for which most people ditch SQLite when planning their application’s architecture.
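To make that limitation concrete, here is a minimal sketch (my own, not part of the benchmarks below) of two connections competing for the write lock; timeout=0 makes the second writer fail immediately instead of retrying:
import sqlite3

# isolation_level=None lets us manage transactions explicitly;
# timeout=0 fails immediately instead of retrying for the default 5 seconds
conn1 = sqlite3.connect('demo.db', timeout=0, isolation_level=None)
conn2 = sqlite3.connect('demo.db', timeout=0, isolation_level=None)
conn1.execute('CREATE TABLE IF NOT EXISTS t (x INTEGER)')

conn1.execute('BEGIN IMMEDIATE')      # first writer takes the write lock

try:
    conn2.execute('BEGIN IMMEDIATE')  # second writer cannot acquire it
except sqlite3.OperationalError as e:
    print(e)                          # prints: database is locked

conn1.execute('ROLLBACK')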
But let’s be real. When we imagine that our new side project or startup idea is going to scale to thousands, or even millions, of users, either we will never reach that level (if we even launch the product), or, if we really do outgrow what SQLite can handle, we will have had the time to validate the app’s market and can allocate more time and resources to building a more robust architecture.
Using SQLite lightens the workload of building the application by almost nullifying the database setup time, which helps a lot when you want to test the application quickly against real users.
SQLite performance trials #
Let’s run some tests to evaluate SQLite’s performance.
Writing performance #
Let’s suppose we are writing batches of 1000 rows. In an oversimplified case, how many rows can we write in one second to our SQLite db?
import sqlite3
import time

conn = sqlite3.connect('test.db')
conn.execute('CREATE TABLE IF NOT EXISTS test_table (id INTEGER PRIMARY KEY, data TEXT NOT NULL)')

batch_size = 1000
rows_inserted = 0
start_time = time.time()

# Insert batches of 1000 rows for one second
while time.time() - start_time < 1.0:
    rows = [('test data',) for _ in range(batch_size)]
    conn.executemany('INSERT INTO test_table (data) VALUES (?)', rows)
    rows_inserted += batch_size

# Commit so the inserted rows survive conn.close() for the read test below
conn.commit()
print('Rows written in 1 second: {}'.format(rows_inserted))
conn.close()
This code returns Rows written in 1 second: 1874000. It would be slower in a real-world example, but the writing capability still looks more than enough for many simple applications.
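If you want numbers closer to a real deployment, the journal mode matters a lot; a common tuning sketch (my assumption, not something the benchmark above used) looks like this:
conn.execute('PRAGMA journal_mode=WAL')    # write-ahead log: readers no longer block the writer
conn.execute('PRAGMA synchronous=NORMAL')  # fewer fsyncs per transaction, slightly weaker durability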
Reading performance #
Let’s now try to read as many rows as possible from the previously created database. We will search for a random id, one row at a time:
import random
import sqlite3
import time

conn = sqlite3.connect('test.db')
cur = conn.cursor()

rows_read = 0
start_time = time.time()

# Look up one row by a random id for one second
while time.time() - start_time < 1.0:
    random_int = random.randint(0, 10000)
    cur.execute('SELECT * FROM test_table WHERE id = (?)', (random_int,))
    cur.fetchone()  # actually pull the row back into Python
    rows_read += 1

print('Rows read in 1 second: {}'.format(rows_read))
cur.close()
conn.close()
This returns Rows read in 1 second: 263745, which is decent performance for reading one row at a time, considering we’re going through Python and issuing a separate query per row. (Since id is the INTEGER PRIMARY KEY, each lookup is an indexed rowid search.) Python’s per-call overhead is what limits the performance the most here.
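As a quick sanity check of that claim (my own sketch; numbers will vary by machine), fetching many rows with a single query pays the Python round-trip cost only once:
import sqlite3
import time

conn = sqlite3.connect('test.db')
start = time.time()
# One query, many rows: the per-query overhead is paid only once
rows = conn.execute('SELECT * FROM test_table LIMIT 100000').fetchall()
print('Fetched {} rows in {:.3f}s'.format(len(rows), time.time() - start))
conn.close()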
Updating performance #
Now, we run almost the same query as we did for reading, but we update the rows instead:
# This line
cur.execute('SELECT * FROM test_table WHERE id = (?)', (random_int,))
# becomes this line
cur.execute('UPDATE test_table SET data = (?) WHERE id = (?)', ('new_text', random_int))
The result on my machine is Rows updated in 1 second: 731407.
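For reference, the full update loop mirrors the read benchmark (a sketch; as with the inserts, a final commit is needed for the updates to persist):
import random
import sqlite3
import time

conn = sqlite3.connect('test.db')
cur = conn.cursor()

rows_updated = 0
start_time = time.time()

# Update one row by a random id for one second
while time.time() - start_time < 1.0:
    random_int = random.randint(0, 10000)
    cur.execute('UPDATE test_table SET data = (?) WHERE id = (?)', ('new_text', random_int))
    rows_updated += 1

conn.commit()
print('Rows updated in 1 second: {}'.format(rows_updated))
cur.close()
conn.close()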
If that is still not enough, and you need more performance, scalability, user management, etc., you might want to use PostgreSQL 💖🐘
SQLite does not compete with client/server databases. SQLite competes with fopen(). (From the SQLite documentation.)