We're excited to announce that we are adding support for vector storage and search into MySQL and PlanetScale. Soon, you'll be able to use PlanetScale as a vector database for all of your AI needs without needing to adopt a second tool.
You can sign up to be notified of release at
A vector is a one-dimensional array of real numbers: [1, 1] is a vector, as is [1.5, 8.889, 9.234]. Each element is the value along one dimension, and taken together the elements describe a point in space. In three-dimensional space, a vector of [2, 3, 5] would have 2 as the x-coordinate, 3 as the y-coordinate and 5 as the z-coordinate. In real-world applications of artificial intelligence, vectors have significantly higher dimensionality, often well above 1,000!
Modern databases are already very good at storing lists of numbers. On its own, a vector stored as a raw datatype is not interesting, any more than bare points on a globe or squares on a chessboard. Vectors become useful when applied, like a latitude and longitude, or the positions of a queen and the opponent's rook.
What makes vectors useful is a technique called embedding, which uses machine learning to transform arbitrary data like a picture, song, or sensor data into a vector. This creates a uniform numerical representation of that data which can be transmitted over the network and stored. Once stored, those vectors can be compared to the vectors of other transformed data and analyzed for similarity, using mathematical operations like cosine similarity.
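To make the comparison step concrete, here is a minimal sketch of cosine similarity in plain Python. The three vectors are made-up stand-ins for embeddings (real ones come from a model and have far more dimensions), and the function name is just for illustration.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example.
doc_a = [2.0, 3.0, 5.0]
doc_b = [4.0, 6.0, 10.0]  # points in the same direction as doc_a
doc_c = [-3.0, 2.0, 0.0]  # perpendicular to doc_a

same_direction = cosine_similarity(doc_a, doc_b)  # very close to 1.0
perpendicular = cosine_similarity(doc_a, doc_c)   # 0.0
```

A score near 1 means the two vectors point the same way, which is read as "similar"; a score near 0 means they are unrelated. Note that the length of the vector does not matter, only its direction.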
For a long read that covers this and a lot more, I recommend Stephen Wolfram's "What Is ChatGPT Doing … and Why Does It Work?". The whole piece is worth your time, but you can skip to the section titled "The Concept of Embeddings", which does a great job of explaining what embeddings are and how they are applied.
We know what a vector is, and storing the data is still straightforward: you can use a BLOB type and start writing arrays into MySQL today! So what extra support does MySQL need anyway?
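As a rough sketch of the BLOB approach, the snippet below packs a vector into bytes that could be written to a BLOB column and unpacks it again. The byte layout (little-endian 32-bit floats) and the function names are assumptions made for this example, not a format defined by MySQL or PlanetScale.

```python
import struct

FLOAT_SIZE = 4  # bytes per 32-bit float (assumed layout for this sketch)

def vector_to_blob(vector):
    """Pack a list of numbers into little-endian 32-bit floats."""
    return struct.pack(f"<{len(vector)}f", *vector)

def blob_to_vector(blob):
    """Unpack bytes produced by vector_to_blob back into a list of floats."""
    if len(blob) % FLOAT_SIZE != 0:
        raise ValueError("blob length is not a multiple of 4 bytes")
    count = len(blob) // FLOAT_SIZE
    return list(struct.unpack(f"<{count}f", blob))

vector = [1.5, 8.889, 9.234]
blob = vector_to_blob(vector)      # 12 bytes, ready for a BLOB column
restored = blob_to_vector(blob)    # approximately the original values
```

The values that come back are only approximately equal to the originals, because 32-bit floats cannot represent numbers like 8.889 exactly. More importantly, storing and reading back is all this gives you: nothing here helps the database find vectors that are similar to one another.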
That's where vector-specific indexing comes into play. This is what we're adding to MySQL, along with a first-class vector data type. Specifically, we'll be implementing the state-of-the-art Hierarchical Navigable Small World (HNSW) algorithm, which constructs optimized graph structures that make it efficient to search for similar vectors in large datasets.
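To give a feel for why a graph helps, here is a heavily simplified sketch of the basic move behind navigable graph search: start at an entry point and keep stepping to whichever neighbor is closest to the query. This is not HNSW itself, which layers several graphs and tracks a list of candidates rather than a single node, and it is not PlanetScale's implementation. The tiny hand-built graph, the Euclidean distance, and all names are assumptions for illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(query, vectors, neighbors, entry_point):
    """Walk the graph toward the query; stop when no neighbor is closer."""
    current = entry_point
    current_dist = euclidean(query, vectors[current])
    while True:
        best, best_dist = current, current_dist
        for candidate in neighbors[current]:
            dist = euclidean(query, vectors[candidate])
            if dist < best_dist:
                best, best_dist = candidate, dist
        if best == current:
            return current  # local minimum: the approximate nearest vector
        current, current_dist = best, best_dist

# Ten hypothetical vectors laid out along a line.
vectors = [[float(i), 0.0] for i in range(10)]

# Each node links to its immediate neighbors...
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
# ...plus one long-range link that lets the walk skip ahead.
neighbors[0].append(5)
neighbors[5].append(0)

nearest = greedy_search([7.2, 0.0], vectors, neighbors, entry_point=0)
```

Starting from node 0, the walk jumps across the long-range link to node 5, then steps through 6 to node 7, without ever looking at nodes 2, 3, or 9. On a graph this small the savings are trivial, but the same idea is what lets a graph index avoid touching most of a large dataset.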
Imagine a database containing a record for every document written at your company, each one embedded by a machine learning model that captures attributes like what project it's for, what team wrote it, and other useful information. If a user opens up one of the documents, a common task would be to find everything similar: documents for the same project, written by the same team, or that cover the same workstreams.
Without an index, you would have to iterate over every document's vector in the database and compare them for similarity. At scale, this could take a while, and the performance would be awful! Using an index, you can efficiently traverse the graphs of vectors, and quickly present the user their meeting notes from the status meeting last week, or the design document for one component of their project. This is 'vector search' in a nutshell.
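The unindexed approach described above can be sketched as a full scan: score every stored vector against the query and keep the best few. The document names and three-dimensional vectors below are invented for the example; the point is that the work grows with the number of rows.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query, rows, k):
    """Compare the query against every row and return the k most similar ids."""
    scored = (
        (cosine_similarity(query, vector), doc_id)
        for doc_id, vector in rows.items()
    )
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]

# Hypothetical document embeddings keyed by document id.
documents = {
    "status-meeting-notes": [0.9, 0.1, 0.0],
    "design-doc": [0.8, 0.3, 0.1],
    "lunch-menu": [0.0, 0.1, 0.9],
    "holiday-schedule": [0.1, 0.0, 0.8],
}

query = [1.0, 0.2, 0.0]  # embedding of the document the user has open
top_two = brute_force_search(query, documents, k=2)
```

Here `top_two` comes back as the meeting notes followed by the design document. With four rows that is instant; with millions of rows and over a thousand dimensions each, every query repeats millions of these comparisons, which is the cost an index is meant to avoid.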
PlanetScale already maintains a fork of MySQL and we'll be adding vector types and indexes to it. When released, we'll run that MySQL fork in PlanetScale as we do today. We will publish packages and containers for our PlanetScale-flavored MySQL that will allow users to test and develop locally.
If you're a current PlanetScale customer, this will be transparent: one day you'll automatically gain the ability to do vector storage and retrieval.
This is for AI/ML apps that want to harness the power, stability, reliability, and scalability of MySQL. Instead of adopting a second database just for vectors, you'll be able to do the same storage and retrieval right in PlanetScale, reducing cost and operational burden significantly.
It's exciting to see vector workloads working on MySQL. We are committed to maintaining a stable, reliable, and highly available product. We will continue to test our new vector support under rigorous workloads to ensure it meets our high standards before release.