Migrate contract storage data when upgrading data structures

When a contract is upgraded and a stored data structure gains new fields, the data already written to the ledger still uses the old layout. Naively reading those old entries with the new type causes the host to trap. This guide explains why that happens, introduces the version marker pattern as the correct solution, and covers lazy versus eager migration strategies and how to test them.

Why intuitive approaches fail

Suppose a contract stores DataV1 entries and is upgraded to use DataV2, which adds an optional field c:

#[contracttype]
pub struct DataV1 { a: i64, b: i64 }

#[contracttype]
pub struct DataV2 { a: i64, b: i64, c: Option<i64> }

Approach 1: Read old entries directly with the new type

The most natural approach is to read the stored bytes directly as DataV2 and expect c to default to None:

// Reading a DataV1 entry with the DataV2 type.
// A developer might expect c = None for old entries — but this traps.
let data: DataV2 = env.storage().persistent().get(&key).unwrap();
// Error(Object, UnexpectedSize)

This traps with Error(Object, UnexpectedSize). The Soroban host validates the field count of the XDR-encoded value against the type definition before returning anything to the contract. Because DataV1 has two fields and DataV2 has three, the host rejects the entry before the SDK can handle it.

Approach 2: Use try_from_val as a fallback

Another approach is to use try_from_val expecting to catch a deserialization error and recover:

let raw: Val = env.storage().persistent().get(&key).unwrap();
if let Ok(v2) = DataV2::try_from_val(&env, &raw) {
    v2
} else {
    // This branch is never reached — the host traps before returning Err.
    let v1 = DataV1::try_from_val(&env, &raw).unwrap();
    DataV2 { a: v1.a, b: v1.b, c: None }
}

This also traps at the host level. The field count validation happens in the host environment during deserialization — it does not produce a Rust Err that the SDK can intercept. There is no way to catch or recover from the mismatch at the contract level.

The root issue is that a contract cannot determine which type an existing storage entry was written as just by reading it. That information must be stored explicitly.

Version marker pattern

The solution is to store a version number alongside each data entry, keyed by the same identifier. The contract reads the version first, then branches on the result to decode the payload with the correct type.

Key layout

Define two variants in your key enum — one for the version marker and one for the payload — both keyed by the same id:

#[contracttype]
pub enum DataKey {
    DataVersion(u32), // version marker keyed by id
    Data(u32),        // data keyed by id
}

Each logical record occupies two storage slots. Because the version is stored per-record rather than globally, each entry is independently versioned. There is no all-or-nothing upgrade requirement.

Reading with version awareness

Before decoding a storage entry, read its version marker. Use unwrap_or(1) to handle entries that were written before versioning was introduced — the absence of a version key is itself a signal that the entry is version 1:

fn read_data(env: &Env, id: u32) -> DataV2 {
    let version: u32 = env.storage().persistent()
        .get(&DataKey::DataVersion(id))
        .unwrap_or(1); // default to v1 for entries without version marker

    match version {
        1 => {
            let v1: DataV1 = env.storage().persistent().get(&DataKey::Data(id)).unwrap();
            DataV2 { a: v1.a, b: v1.b, c: None }
        }
        _ => env.storage().persistent().get(&DataKey::Data(id)).unwrap(),
    }
}

Writing always uses the current version

Every write stamps the entry with the current version number. An entry that was originally DataV1 will carry a DataVersion marker of 2 the next time it is written back:

fn write_data(env: &Env, id: u32, data: &DataV2) {
    env.storage().persistent().set(&DataKey::DataVersion(id), &2u32);
    env.storage().persistent().set(&DataKey::Data(id), data);
}

Lazy vs eager migration

Once version-aware read/write logic is in place, there are two strategies for converting old entries.

Lazy migration (convert on read)

In lazy migration, old entries are left untouched on the ledger. When a record is read, its version is detected and it is up-converted in memory. When that record is later written back, it is stamped with the new version. No explicit migration step is needed — conversion happens as records are accessed in normal contract use.

Lazy migration is generally preferred on blockchains. Leaving old entries untouched has no upfront cost and no risk of hitting instruction or ledger-entry limits at upgrade time. Records that are never accessed again are never migrated, which is usually acceptable.

The read_data function shown above already implements lazy migration. Each time an old DataV1 entry is read and then passed to write_data, the entry is silently upgraded in place.
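To see the round trip in isolation, here is a minimal plain-Rust model of the pattern — no Soroban SDK, with a pair of HashMaps standing in for the two persistent storage slots per record. The `MockStorage` type and its fields are illustrative stand-ins, not part of the contract above:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct DataV1 { a: i64, b: i64 }

#[derive(Clone, Debug, PartialEq)]
struct DataV2 { a: i64, b: i64, c: Option<i64> }

// Two slots per record, mirroring DataKey::DataVersion(id) and DataKey::Data(id).
// Real storage holds one encoded payload per key; two typed maps keep the model simple.
#[derive(Default)]
struct MockStorage {
    versions: HashMap<u32, u32>,
    v1: HashMap<u32, DataV1>,
    v2: HashMap<u32, DataV2>,
}

impl MockStorage {
    // Version-aware read: an absent marker means the entry predates versioning (v1).
    fn read(&self, id: u32) -> DataV2 {
        match self.versions.get(&id).copied().unwrap_or(1) {
            1 => {
                let v1 = self.v1.get(&id).expect("missing entry");
                DataV2 { a: v1.a, b: v1.b, c: None } // up-convert in memory
            }
            _ => self.v2.get(&id).expect("missing entry").clone(),
        }
    }

    // Writes always stamp the current version.
    fn write(&mut self, id: u32, data: DataV2) {
        self.versions.insert(id, 2);
        self.v2.insert(id, data);
    }
}

fn main() {
    let mut store = MockStorage::default();
    // Legacy entry: payload only, no version marker.
    store.v1.insert(7, DataV1 { a: 5, b: 6 });

    let migrated = store.read(7); // lazy up-conversion in memory
    assert_eq!(migrated, DataV2 { a: 5, b: 6, c: None });

    store.write(7, migrated); // the round trip stamps version 2
    assert_eq!(store.versions[&7], 2);
}
```

The control flow is the same as in `read_data`/`write_data` above; only the storage backend differs.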

Eager migration (batch conversion)

In eager migration, an explicit admin function iterates all known records and rewrites them in the new format immediately after the upgrade is deployed:

pub fn migrate_all(env: &Env, ids: Vec<u32>) {
    // Caller should be an authorized admin.
    for id in ids.iter() {
        let version: u32 = env.storage().persistent()
            .get(&DataKey::DataVersion(id))
            .unwrap_or(1);

        if version < 2 {
            // read_data up-converts to DataV2 in memory.
            let migrated = read_data(env, id);
            // write_data stamps the entry as version 2.
            write_data(env, id, &migrated);
        }
    }
}

Eager migration is rarely practical for large datasets on Soroban. Each rewrite consumes fees and burns instructions, and a single transaction cannot migrate an unbounded number of records — the contract will hit instruction or ledger-entry limits. If the batch must span multiple transactions, the contract is in a mixed-version state throughout the window, which means version-aware read logic is still required anyway.

Eager migration is occasionally appropriate when the total number of records is small and known in advance (for example, a fixed registry of a few dozen entries), or when you need to permanently drop old version branches from the read path.
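When an eager migration must span multiple transactions, the usual shape is a cursor-based batch function: each call migrates at most `limit` records, persists the cursor, and reports whether the range is exhausted. Below is a plain-Rust sketch of that control flow under simplifying assumptions — ids are dense in `[0, total)`, a HashMap stands in for persistent storage, and `migrate` stands in for the `read_data`/`write_data` round trip; none of these names come from the contract above:

```rust
use std::collections::HashMap;

struct Migrator {
    versions: HashMap<u32, u32>, // id -> stored version (absent = v1)
    cursor: u32,                 // persisted in contract storage in a real contract
    total: u32,                  // number of records, ids 0..total
}

impl Migrator {
    fn new(total: u32) -> Self {
        Migrator { versions: HashMap::new(), cursor: 0, total }
    }

    // Stand-in for the read_data + write_data round trip.
    fn migrate(&mut self, id: u32) {
        self.versions.insert(id, 2);
    }

    // Migrates at most `limit` records per invocation; returns true when done.
    // An admin calls this repeatedly until it reports completion.
    fn migrate_batch(&mut self, limit: u32) -> bool {
        let end = (self.cursor + limit).min(self.total);
        for id in self.cursor..end {
            if self.versions.get(&id).copied().unwrap_or(1) < 2 {
                self.migrate(id);
            }
        }
        self.cursor = end;
        self.cursor >= self.total
    }
}

fn main() {
    let mut m = Migrator::new(10);
    assert!(!m.migrate_batch(4)); // ids 0..4
    assert!(!m.migrate_batch(4)); // ids 4..8
    assert!(m.migrate_batch(4));  // ids 8..10, done
    assert!((0..10).all(|id| m.versions[&id] == 2));
}
```

Note that until `migrate_batch` reports completion, reads must still go through the version-aware path — the cursor pattern reduces per-transaction cost but does not remove the mixed-version window.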

caution

Never remove a version branch from read_data while old entries of that version can still exist on the ledger. Doing so will cause any remaining old entries to trap when accessed.

Testing migrations

Testing data migration requires simulating state written by an old contract version and verifying that the new contract reads it correctly.

The Soroban test environment allows you to set storage state directly. Use this to write DataV1 entries (without a DataVersion key) and verify that read_data up-converts them correctly:

#![cfg(test)]

use super::*;
use soroban_sdk::Env;

#[test]
fn test_reads_v1_entry_as_v2() {
    let env = Env::default();
    let id: u32 = 42;
    let contract_id = env.register(Contract, ());

    // Simulate what the old contract wrote: a DataV1 payload,
    // no DataVersion entry (old contracts did not write one).
    let v1_data = DataV1 { a: 10, b: 20 };
    env.as_contract(&contract_id, || {
        env.storage().persistent().set(&DataKey::Data(id), &v1_data);
    });

    // Storage access requires a contract context, so read inside as_contract.
    let result = env.as_contract(&contract_id, || read_data(&env, id));

    assert_eq!(result.a, 10);
    assert_eq!(result.b, 20);
    assert_eq!(result.c, None);
}

#[test]
fn test_reads_v2_entry_correctly() {
    let env = Env::default();
    let id: u32 = 99;
    let contract_id = env.register(Contract, ());

    let v2_data = DataV2 { a: 1, b: 2, c: Some(3) };
    env.as_contract(&contract_id, || write_data(&env, id, &v2_data));

    let result = env.as_contract(&contract_id, || read_data(&env, id));

    assert_eq!(result.a, 1);
    assert_eq!(result.b, 2);
    assert_eq!(result.c, Some(3));
}

#[test]
fn test_write_upgrades_v1_entry_to_v2() {
    let env = Env::default();
    let id: u32 = 7;
    let contract_id = env.register(Contract, ());

    // Write a v1 entry directly, as the old contract would have.
    let v1_data = DataV1 { a: 5, b: 6 };
    env.as_contract(&contract_id, || {
        env.storage().persistent().set(&DataKey::Data(id), &v1_data);
    });

    // Read it — lazy migration produces a DataV2 in memory.
    let migrated = env.as_contract(&contract_id, || read_data(&env, id));
    assert_eq!(migrated.c, None);

    // Write it back — this stamps the entry as version 2.
    env.as_contract(&contract_id, || write_data(&env, id, &migrated));

    let stored_version: u32 = env.as_contract(&contract_id, || {
        env.storage().persistent()
            .get(&DataKey::DataVersion(id))
            .unwrap()
    });
    assert_eq!(stored_version, 2);

    // Subsequent reads should take the v2 branch.
    let result = env.as_contract(&contract_id, || read_data(&env, id));
    assert_eq!(result.a, 5);
    assert_eq!(result.b, 6);
    assert_eq!(result.c, None);
}

The three test cases cover the three states a record can be in after an upgrade:

  • A DataV1 entry with no version marker (pre-versioning era records)
  • A DataV2 entry written by the new contract
  • A DataV1 entry that is read and then written back (the lazy migration round-trip)

Versioned enum pattern

Another approach is to implement a versioned enum that can hold either a V1 or V2 data struct.

#[contracttype]
pub enum Data {
    V1(DataV1),
    V2(DataV2),
}

#[contracttype]
pub enum DataKey {
    Data(u32),
}

Migration logic

The migration logic matches on the two variants. A V1 value has its a and b fields copied over and the new c field (added in V2) set to None; a V2 value passes through unchanged. This is a lazy migration — old data is upgraded on read, not in a bulk pass.

impl Data {
    pub fn into_v2(self) -> DataV2 {
        match self {
            Data::V1(v1) => DataV2 { a: v1.a, b: v1.b, c: None },
            Data::V2(v2) => v2,
        }
    }
}

Reading with version awareness

The value is read from storage and then into_v2() ensures that the returned value is in the V2 format.

pub fn read_data(e: Env, id: u32) -> Option<DataV2> {
    let data_enum: Data = e.storage().persistent().get(&DataKey::Data(id))?;
    Some(data_enum.into_v2())
}

Writing always uses the current version

The write function write_data() takes a data argument in the DataV2 format.

pub fn write_data(e: Env, id: u32, data: DataV2) {
    e.storage().persistent().set(&DataKey::Data(id), &Data::V2(data));
}

Testing migrations

Testing data migration requires simulating state written by an old contract version and verifying that the new contract reads it correctly.

In this test, data in the V1 format is stored first. It is then read with read_data, which converts V1 data to the V2 format via into_v2() before returning the result. The result is checked with assert_eq! and written back under the same id, so the V1-formatted entry is overwritten with the same data in V2 format.

The stored enum is then read directly to verify that the V2 variant is now on the ledger, and finally read_data() is called once more to confirm that the read path also returns the data in the V2 format.

#[test]
fn test_write_upgrades_v1_entry_to_v2_1() {
    let env = Env::default();
    let id: u32 = 7;
    let contract_id = env.register(Contract, ());
    let client = ContractClient::new(&env, &contract_id);

    // Inject a V1 entry directly, simulating legacy on-chain state.
    env.as_contract(&contract_id, || {
        env.storage()
            .persistent()
            .set(&DataKey::Data(id), &Data::V1(DataV1 { a: 5, b: 6 }));
    });

    // Read it — into_v2() migrates lazily; c must be None.
    let migrated = client.read_data(&id).unwrap();
    assert_eq!(migrated.a, 5);
    assert_eq!(migrated.b, 6);
    assert_eq!(migrated.c, None);

    // Write it back — write_data always stores Data::V2(...).
    client.write_data(&id, &migrated);

    // Confirm the stored enum variant is now V2, not V1.
    let stored: Data = env.as_contract(&contract_id, || {
        env.storage().persistent().get(&DataKey::Data(id))
    })
    .unwrap();

    match stored {
        Data::V2(v2) => {
            assert_eq!(v2.a, 5);
            assert_eq!(v2.b, 6);
            assert_eq!(v2.c, None);
        }
        Data::V1(_) => panic!("expected Data::V2 after write_data, found Data::V1"),
    }

    // Subsequent reads go through the V2 branch and return identical values.
    let result = client.read_data(&id).unwrap();
    assert_eq!(result.a, 5);
    assert_eq!(result.b, 6);
    assert_eq!(result.c, None);
}