Shard (database architecture)
A database shard, or simply a shard, is a horizontal partition of data within a database or search engine. Each shard may be held on a separate database server instance in order to spread across multiple servers.
Some data in a database may remain present in all shards,[a] while other data is stored in only one shard. In such cases, each shard acts as the single source for its subset of data.[1]
Database architecture
Horizontal partitioning is a database design principle whereby rows of a database table are held separately, rather than being split into columns (as in normalization and vertical partitioning, to varying degrees ). Each partition forms part of a shard, which may in turn be located on a separate database server or in a separate physical location.
There are numerous advantages to the horizontal partitioning of data. Since tables are divided and distributed into multiple servers, the total number of rows in each table in each database is reduced. This reduces index size, which generally improves search performance. A database shard can be placed on separate hardware, and multiple shards can be placed on multiple machines. This enables the distribution of a database across a large number of machines, which can significantly improve performance. In addition, if the database shard is based on some real-world segmentation of the data (e.g., European customers v. American customers) it may be possible to infer the appropriate shard membership easily and automatically, and to query only the relevant shard.[2]
In practice, sharding is complex. Although it has long been implemented through manual coding (especially where rows have an obvious grouping, as in the customer region example above), this approach is often inflexible. There is a desire to support sharding automatically, both in terms of adding code support for it, and for identifying candidates to be sharded separately. Consistent hashing is a technique used in sharding to distribute large loads across multiple smaller services and servers.[3]
Where distributed computing is used to separate load between multiple servers (either for performance or reliability reasons), a sharding approach may also be useful. In the 2010s, sharding of execution capacity, as well as the more traditional sharding of data, emerged as a potential approach to address performance and scalability challenges in blockchains.[4][5]
Recent academic work has proposed protocols such as Cerberus to address cross-shard atomicity by braiding consensus across multiple shards, allowing transactions to affect multiple partitions simultaneously without requiring a global lock.[6]
Compared to horizontal partitioning
Horizontal partitioning splits one or more tables by row, usually within a single instance of a schema and a database server. It may offer an advantage by reducing index size (and thus search effort), provided there is an obvious, robust, and implicit way to identify in which partition a particular row will be found, without having to first search the index; for example, the classic case of the 'CustomersEast' and 'CustomersWest' tables, where a ZIP code already indicates where a row will be found.
Sharding extends this approach. It partitions the relevant table or tables in the same way, but does so across potentially multiple instances of the schema. An advantage is that the search load for the large partitioned table can be distributed across multiple servers (logical or physical), rather than only across multiple indexes on a same logical server.
Splitting shards across multiple isolated instances requires more than simple horizontal partitioning. The expected gains in efficiency would be reduced if querying the database required multiple instances to be accessed just to retrieve a simple dimension table. Beyond partitioning, sharding therefore involves distributing large, partitionable tables across servers, while smaller tables are replicated in full on each server.[7]
This is also why sharding is related to a shared-nothing architecture—once sharded, each shard can reside in a separate logical schema instance, physical database server, data center, or geographic region. Sharding is intended to minimize the need for cross-shard access by partitioning data across independent shards.[8]
This makes replication across multiple servers easier (simple horizontal partitioning does not). It is also useful for worldwide distribution of applications, where communications links between data centers might otherwise become a bottleneck.[9]
There is also a requirement for some notification and replication mechanism between schema instances, so that unpartitioned tables remain as closely synchronized as the application requires. This is a complex architectural choice in sharded systems: approaches range from making these tables effectively read-only (with updates that are rare and batched), to dynamically replicated tables (at the cost of reducing some of the distribution benefits of sharding), and many options in between.[10]
Implementations
- Altibase provides a combined (client-side and server-side) sharding architecture transparent to client applications.
- Apache HBase supports automatic sharding.[11]
- Azure SQL Database Elastic Database tools support support sharding to enable scaling out and in of an application’s data tier.[12]
- ClickHouse, an open-source OLAP database management system, supports sharding.
- Couchbase supports automatic and transparent sharding.
- CUBRID has supported sharding since version 9.0.
- Db2 Data Partitioning Feature (MPP), a shared-nothing database partitioning feature, runs on separate nodes.
- DRDS (Distributed Relational Database Service) of Alibaba Cloud supports database and table sharding,[13] and has been used for large-scale events such as Singles' Day.[14]
- Elasticsearch, an enterprise search server, supports sharding.[15]
- eXtreme Scale is a cross-process in-memory key/value data store (a NoSQL data store) that uses sharding to achieve scalability across processes for both data and MapReduce-style parallel processing.[16]
- Hibernate supports sharding, but has seen little development since 2007.[17][18]
- IBM Informix has supported sharding since version 12.1 xC1 as part of the MACH11 technology. Informix 12.10 xC2 added full compatibility with MongoDB drivers, allowing a mix of regular relational tables and NoSQL collections while retaining sharding, fail-over, and ACID properties.[19][20]
- Kdb+ has supported sharding since version 2.0.
- MariaDB Spider, a storage engine, supports table federation, sharding, XA transactions, and ODBC data sources. It has been included in MariaDB server since version 10.0.4.[21]
- MonetDB, an open-source column-store, introduced read-only sharding in its July 2015 release.[22]
- MongoDB has supported sharding since version 1.6.[23]
- MySQL Cluster supports automatic and transparent sharding across commodity nodes, allowing scaling of read and write queries, without requiring application changes.[24]
- MySQL Fabric (part of MySQL utilities) supports sharding.[25]
- Oracle Database shards since 12c Release 2 and in one liner: Combination of sharding advantages with well-known capabilities of enterprise ready multi-model Oracle Database.[26]
- Oracle NoSQL Database supports automatic sharding and elastic, online expansion of clusters.
- OrientDB has supported sharding since version 1.7.
- Solr, an enterprise search platform, supports sharding.[27]
- ScyllaDB uses per-core sharding within a server and across all nodes in a cluster.
- Spanner, a distributed database developed by Google, shards across multiple Paxos state machines to scale to large numbers of machines, data centers, and rows.[28]
- SQLAlchemy ORM, a data-mapper for the Python programming language shards.[29]
- SQL Server has supported sharding since SQL Server 2005 with the use of 3rd party tools.[30]
- Teradata markets a massive parallel database management system as a data warehouse.
- Vault, a cryptocurrency design, uses sharding to reduce the data required to join the network and verify transactions, improving scalability.[31]
- Vitess, an open-source database clustering system, supports sharding for MySQL and is a Cloud Native Computing Foundation project.[32]
- ShardingSphere is a database clustering system that provides data sharding, distributed transactions, and distributed database management, and is an Apache Software Foundation (ASF) project.[33]
Disadvantages
Sharding a database table before it has been optimized locally can introduce unnecessary complexity. Sharding is generally recommended when other optimization strategies have proven insufficient. The added complexity of database sharding can result in several potential challenges.[34]
- SQL complexity: Developers may need to to write more complex SQL queries to handle sharding logic
- Additional software requirements: Software that partitions, balances, coordinates, and maintains data integrity can fail or introduce errors.
- Single point of failure: Corruption or failure of one shard due to network, hardware, or system issues can affect the integrity of the entire dataset.
- Fail-over server complexity: Fail-over servers must maintain copies of all database shards.
- Backups complexity: Database backups of the individual shards must be coordinated with the backups of the other shards.
- Operational complexity: Tasks such as adding or removing indexes, modifying columns, or altering the schema become more difficult in a sharded environment.
Etymology
In a database context, the term "shard" is believed to derive from one of two sources: Computer Corporation of America's "A System for Highly Available Replicated Data,"[35] which used redundant hardware to facilitate data replication rather than horizontal partitioning, or the 1997 MMORPG Ultima Online.[36][37]
Richard Garriott, creator of Ultima Online, has recalled that the term originated during the production of the game when the team attempted to create a self-regulating, virtual ecology system. Players were able to interact and harvest in-game resources via the internet, which disrupted the balance of the system.[37] To address this, the development team separated the global player base into multiple sessions and introduced part of Ultima Online's fictional connection to the end of Ultima I: The First Age of Darkness, where the defeat of its antagonist Mondain also led to the creation of multiverse "shards." This modification provided Garriott's team with the fictional basis needed to justify creating copies of the virtual environment. The feature was later removed after several months of testing.[37]
Today, the term "shard" refers to the deployment and use of redundant hardware across database systems.
See also
Notes
- ^ Typically supporting data such as dimension tables.
References
- ^ Sadalage, Pramod J.; Fowler, Martin (2012). "4: Distribution Models". NoSQL Distilled. Pearson Education. ISBN 978-0321826626.
- ^ Rahul Roy (July 28, 2008). "Shard - A Database Design".
- ^ Ries, Eric. "Sharding for Startups".
- ^ Wang, Gang; Shi, Zhijie Jerry; Nixon, Mark; Han, Song (21 October 2019). "SoK". Proceedings of the 1st ACM Conference on Advances in Financial Technologies. pp. 41–61. doi:10.1145/3318041.3355457. ISBN 9781450367325. S2CID 204749727.
- ^ Yu, Mingchao; Sahraei, Saeid; Nixon, Mark; Han, Song (18 July 2020). "SoK: Sharding on Blockchain". Proceedings of the 1st ACM Conference on Advances in Financial Technologies. pp. 114–134. doi:10.1145/3318041.3355457. ISBN 9781450367325. S2CID 204749727.
- ^ Hellings, Jelle; Sadoghi, Mohammad (2021). "Cerberus: Minimalistic Multi-shard Byzantine-resilient Transaction Processing" (PDF). Proceedings of the VLDB Endowment. 14 (11): 2230–2243. doi:10.14778/3476249.3476274.
- ^ "Database Sharding: Concepts & Examples". MongoDB. Retrieved 2026-03-20.
- ^ "Understanding Database Sharding". DigitalOcean Community Tutorials. 2022-03-16. Retrieved 2025-10-09.
Database shards exemplify a shared-nothing architecture. This means that the shards are autonomous; they don't share any of the same data or resources.
- ^ "A Guide To Horizontal Vs Vertical Scaling". MongoDB. Retrieved 2026-03-20.
- ^ "Sharding - Database Manual - MongoDB Docs". www.mongodb.com. Retrieved 2026-03-20.
- ^ "Apache HBase – Apache HBase™ Home". hbase.apache.org.
- ^ "Introducing Elastic Scale preview for Azure SQL Database". azure.microsoft.com. 2 October 2014.
- ^ "Alibaba Cloud Help Center - Cloud Definition and Explanation of Cloud Based Services - Alibaba Cloud". www.alibabacloud.com.
- ^ "Focuses on Large-Scale Online Databases - Alibaba Cloud". www.alibabacloud.com.
- ^ "Index Shard Allocation | Elasticsearch Guide [7.13] | Elastic". www.elastic.co.
- ^ "IBM Docs".
- ^ "Hibernate Shards". 2007-02-08.
- ^ "Hibernate Shards". Archived from the original on 2008-12-16. Retrieved 2011-03-30.
- ^ "New Grid queries for Informix". Archived from the original on 2015-06-10. Retrieved 2013-10-07.
- ^ "NoSQL support in Informix (JSON storage, Mongo DB API)". September 24, 2013.
- ^ "Spider". MariaDB KnowledgeBase. Retrieved 2022-12-20.
- ^ "MonetDB July2015 Released". 31 August 2015.
- ^ "MongoDB Sharding". MongoDB. Retrieved 2026-03-20.
- ^ "MySQL Cluster Features & Benefits". 2012-11-23.
- ^ "MySQL Fabric sharding quick start guide".
- ^ "Oracle Sharding". Oracle. 2018-05-24. Retrieved 2021-07-10.
- ^ "DistributedSearch - SOLR - Apache Software Foundation". cwiki.apache.org.
- ^ Corbett, James C; Dean, Jeffrey; Epstein, Michael; Fikes, Andrew; Frost, Christopher; Furman, JJ; Ghemawat, Sanjay; Gubarev, Andrey; Heiser, Christopher; Hochschild, Peter; Hsieh, Wilson; Kanthak, Sebastian; Kogan, Eugene; Li, Hongyi; Lloyd, Alexander; Melnik, Sergey; Mwaura, David; Nagle, David; Quinlan, Sean; Rao, Rajesh; Rolig, Lindsay; Saito, Yasushi; Szymaniak, Michal; Taylor, Christopher; Wang, Ruth; Woodford, Dale. "Spanner: Google's Globally-Distributed Database" (PDF). Proceedings of OSDI 2012. Retrieved 24 February 2014.
- ^ "sqlalchemy/sqlalchemy". July 9, 2021 – via GitHub.
- ^ "Partitioning and Sharding Options for SQL Server and SQL Azure". infoq.com.
- ^ "A faster, more efficient cryptocurrency". MIT News. 24 January 2019. Retrieved 2019-01-30.
- ^ "Vitess". vitess.io.
- ^ "ShardingSphere". shardingsphere.apache.org.
- ^ "Database Sharding: Concepts & Examples". MongoDB. Retrieved 2026-03-20.
- ^ Sarin, DeWitt & Rosenberg, Overview of SHARD: A System for Highly Available Replicated Data, Technical Report CCA-88-01, Computer Corporation of America, May 1988
- ^ Koster, Raph (2009-01-08). "Database "sharding" came from UO?". Raph Koster's Website. Retrieved 2015-01-17.
- ^ a b c "Ultima Online: The Virtual Ecology | War Stories". Ars Technica Videos. 21 December 2017.