# PFS - PCloud File System
## Overview
PFS is a core building block of PCloud. It provides distributed blob storage with replication for redundancy and high availability.
It is designed along the lines of GFS (Google File System): <https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf>
PFS consists of two main components: the controller and chunk servers. Chunk servers store the actual data, while the controller maintains a global view of the cluster and acts on changes in it.

## Goals & Requirements
* It must be easy to add new chunk servers. The controller should automatically pick up a new server and start using it for storage.
* Taking out a chunk server must trigger re-replication of the chunks stored on it.
* The controller must assign chunks so that load is distributed evenly among chunk servers, which improves throughput.
* Blob sizes are known at the time of creation. This simplifies the design but can be reconsidered.

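The even-distribution goal above can be met in several ways. The following is a minimal sketch of one of them, assuming the controller knows how many chunks each server currently holds and places every new chunk on the least loaded servers. `ServerLoad`, `AssignChunk`, and the server addresses are hypothetical names used for illustration only; they are not part of the design.

```golang
package main

import (
	"fmt"
	"sort"
)

// ServerLoad is a hypothetical summary of one chunk server as seen by the controller.
type ServerLoad struct {
	Address   string
	NumChunks int // number of chunks currently assigned to this server
}

// AssignChunk returns the addresses of the `replicas` least loaded servers.
// It returns an error when the requested replica count cannot be satisfied.
func AssignChunk(servers []ServerLoad, replicas int) ([]string, error) {
	if replicas <= 0 || replicas > len(servers) {
		return nil, fmt.Errorf("cannot place %d replicas on %d servers", replicas, len(servers))
	}
	// Sort a copy so that the caller's slice is left untouched.
	sorted := append([]ServerLoad(nil), servers...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].NumChunks < sorted[j].NumChunks
	})
	assigned := make([]string, 0, replicas)
	for _, s := range sorted[:replicas] {
		assigned = append(assigned, s.Address)
	}
	return assigned, nil
}

func main() {
	servers := []ServerLoad{{"cs-a:4000", 12}, {"cs-b:4000", 3}, {"cs-c:4000", 7}}
	assigned, err := AssignChunk(servers, 2)
	fmt.Println(assigned, err)
}
```

Counting chunks is the simplest possible load metric. A real controller would probably also weigh free disk space and recent traffic.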
## Concepts used in the document:
* Blob: represents a single file. Blobs have globally unique ids.
* Chunk: blobs are split into one or more chunks of equal size. The last chunk might be smaller than the others. Chunks have globally unique ids.
* Chunk server: RPC server storing chunks.
* Controller: RPC server coordinating blob/chunk creation and the assignment of chunks to chunk servers.
* Chunk replica: the same chunk might be stored on multiple chunk servers to achieve high availability. Such copies are called chunk replicas.
* Chunk assignment: list of chunk servers storing a particular chunk.
* Primary replica: when a new chunk is uploaded, one of the replicas acts as primary and receives data from the client.
* Secondary replica: all non-primary replicas are secondary. They replicate data from the primary replica.

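Because blob sizes are known at creation time, the chunk layout of a blob can be computed up front. The sketch below illustrates the blob-to-chunk split described above. The function name `ChunkSizes` and the chunk size used in `main` are assumptions, since the document does not fix a chunk size.

```golang
package main

import "fmt"

// ChunkSizes returns the size of every chunk of a blob of blobSize bytes
// when a chunk holds at most chunkSize bytes. All chunks are equal in size
// except possibly the last one, which holds the remainder.
func ChunkSizes(blobSize, chunkSize int64) []int64 {
	if blobSize <= 0 || chunkSize <= 0 {
		return nil
	}
	sizes := make([]int64, 0, (blobSize+chunkSize-1)/chunkSize)
	for remaining := blobSize; remaining > 0; remaining -= chunkSize {
		if remaining < chunkSize {
			sizes = append(sizes, remaining)
		} else {
			sizes = append(sizes, chunkSize)
		}
	}
	return sizes
}

func main() {
	// A 10-byte blob with 4-byte chunks (toy numbers) yields [4 4 2].
	fmt.Println(ChunkSizes(10, 4))
}
```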
## Detailed design
Chunk servers maintain a list of the chunks they store. Actual chunk payloads will be stored on local disk using the OS-provided file system. All metadata a chunk server needs to maintain its state must be periodically persisted to disk so that the chunk server can quickly recover after a failure.

Chunk ids will be represented as [RFC 4122](https://tools.ietf.org/html/rfc4122) compliant 128-bit UUIDs.
Chunk metadata will consist of:
```golang
type ChunkInfo struct {
	// Status of the chunk: NEW, CREATED, ..., READY
	Status ChunkStatus
	// Total size of chunk in bytes
	Size int
	// Number of bytes committed to disk
	Committed int
}
```
Assuming each of the three fields is stored as a 32-bit value, a total of 128 + 3 * 32 = 224 bits (28 bytes) is needed to store the metadata of a single chunk, including its 128-bit id. On top of this, a thread-safe, hash-map-backed ChunkInfoStore structure will be built with two methods: Load and Store. The Store method will update the in-memory hash map and also append the update to the transaction log. A background process will periodically compact the transaction log and persist the full hash map contents to disk.

The controller will not persist any data locally. Instead, it will periodically receive the state of the chunk servers through heartbeats. This makes it easier to keep the metadata held by the controller consistent with that of the chunk servers.

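A minimal sketch of how a controller might rebuild its view purely from heartbeats is shown below. The `Heartbeat` message shape, the `Controller` type, and its methods are hypothetical, as the document does not define the heartbeat protocol. The sketch is not thread-safe and omits detection of missed heartbeats.

```golang
package main

import (
	"fmt"
	"sort"
)

// Heartbeat is a hypothetical message a chunk server sends periodically,
// listing every chunk it currently stores.
type Heartbeat struct {
	Server string
	Chunks []string
}

// Controller keeps only in-memory state derived from heartbeats.
type Controller struct {
	// chunksByServer holds the chunk list from the latest heartbeat of each server.
	chunksByServer map[string][]string
}

func NewController() *Controller {
	return &Controller{chunksByServer: make(map[string][]string)}
}

// HandleHeartbeat replaces whatever was known about the sending server.
// A server seen for the first time is added automatically.
func (c *Controller) HandleHeartbeat(hb Heartbeat) {
	c.chunksByServer[hb.Server] = append([]string(nil), hb.Chunks...)
}

// Assignment returns the sorted list of servers that reported the chunk.
func (c *Controller) Assignment(chunkID string) []string {
	var servers []string
	for server, chunks := range c.chunksByServer {
		for _, id := range chunks {
			if id == chunkID {
				servers = append(servers, server)
				break
			}
		}
	}
	sort.Strings(servers)
	return servers
}

func main() {
	c := NewController()
	c.HandleHeartbeat(Heartbeat{Server: "cs-a:4000", Chunks: []string{"chunk-1", "chunk-2"}})
	c.HandleHeartbeat(Heartbeat{Server: "cs-b:4000", Chunks: []string{"chunk-2"}})
	fmt.Println(c.Assignment("chunk-2"))
}
```

Because each heartbeat overwrites the previous report from the same server, a restarted controller converges to the correct view once every chunk server has reported in.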