Schema Mapping

This page describes how to configure schema mapping in a Restore.

Schema mapping lets you migrate data between environments that use different schema registries.

For example, you can copy data from a production environment to a QA environment that has its own schema registry.

The restore process maps the schema IDs of the source environment to the schema IDs of the target environment. When restoring a message, it looks up the message's schema ID in the schema mapping and replaces it with the mapped ID. In the Confluent wire format, this schema ID is stored in the four bytes that follow the magic byte at the start of the message.
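To make the ID replacement concrete, here is a minimal Python sketch. It assumes the serialized data uses the Confluent wire format (a 0x00 magic byte followed by a 4-byte big-endian schema ID) and that the mapping is a plain dict of source ID to target ID. The function remap_schema_id is hypothetical and is not part of the operator or its plugins.

```python
import struct
from typing import Dict

MAGIC_BYTE = 0x00
HEADER_SIZE = 5  # 1 magic byte + 4-byte big-endian schema ID


def remap_schema_id(data: bytes, mapping: Dict[int, int]) -> bytes:
    """Replace the schema ID in a Confluent-framed message using `mapping`.

    Data that is not framed as expected, or whose schema ID has no entry
    in `mapping`, is returned unchanged.
    """
    if len(data) < HEADER_SIZE or data[0] != MAGIC_BYTE:
        return data
    (source_id,) = struct.unpack(">I", data[1:HEADER_SIZE])
    target_id = mapping.get(source_id)
    if target_id is None:
        return data  # no mapping for this ID: keep the message as-is
    # Rebuild the header with the target ID; the encoded data is copied unchanged.
    return bytes([MAGIC_BYTE]) + struct.pack(">I", target_id) + data[HEADER_SIZE:]


message = bytes([MAGIC_BYTE]) + struct.pack(">I", 10001) + b"<avro data>"
restored = remap_schema_id(message, {10001: 10002})
print(struct.unpack(">I", restored[1:HEADER_SIZE])[0])  # 10002
```

Only the four ID bytes change and the encoded data after the header is copied byte for byte, which is why a mapping should only pair schemas that are equivalent.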

To use schema mapping, you need to:

Define or generate a schema mapping
Create a ConfigMap with the schema mapping
Reference the ConfigMap in the Restore for schema mapping

The operator automatically configures both the Payload Schema Mapping plugin and the Key Schema Mapping plugin with this schema mapping.

Generating a schema mapping using SAME

To generate a schema mapping, you can use our Schema Automated Mapping Engine (SAME). SAME generates a schema mapping between two schema registries: it downloads the schemas from the source and target registries, indexes them by fingerprint (a hash of each schema), and compares the fingerprints to pair each source schema with its target counterpart. SAME is still at an early stage and currently supports only Avro schemas.
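The fingerprint-and-compare approach can be illustrated with a small Python sketch. This is not SAME's implementation: the schemas are passed in as dicts of schema ID to schema JSON instead of being downloaded from a registry, and the normalization (JSON with sorted keys) and hash (SHA-256) are assumptions chosen for illustration. The functions fingerprint and build_mapping are hypothetical.

```python
import hashlib
import json
from typing import Dict


def fingerprint(schema: str) -> str:
    """Hash a schema after a simplified, illustrative normalization.

    The schema is parsed as JSON and re-serialized with sorted keys and no
    whitespace, so formatting differences do not change the hash.
    """
    normalized = json.dumps(json.loads(schema), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_mapping(source: Dict[int, str], target: Dict[int, str]) -> Dict[int, int]:
    """Map source schema IDs to target schema IDs with the same fingerprint."""
    # Index the target registry by fingerprint.
    # If several target schemas share a fingerprint, the last one wins.
    target_index = {fingerprint(s): schema_id for schema_id, s in target.items()}
    mapping = {}
    for schema_id, schema in source.items():
        match = target_index.get(fingerprint(schema))
        if match is not None:  # schemas without a match are left out
            mapping[schema_id] = match
    return mapping


source = {10001: '{"type": "record", "name": "User", "fields": [{"name": "id", "type": "long"}]}'}
target = {10002: '{"name":"User","type":"record","fields":[{"name":"id","type":"long"}]}'}
print(build_mapping(source, target))  # {10001: 10002}
```

A production-grade Avro fingerprint would normally be computed over the Parsing Canonical Form from the Avro specification, which also drops attributes such as doc that do not affect the binary encoding; the sorted-JSON normalization above treats such schemas as different.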

First, create a file named registries.yaml in the current directory that lists the schema registries you want to map:

registries:
- name: source
  url: https://aaaa-1234.schema-registry.com
  username: <API KEY> # Optional
  password: <API SECRET> # Optional
- name: sink
  url: https://bbbb-4567.schema-registry.com
  username: <API KEY> # Optional
  password: <API SECRET> # Optional

Then generate the schema mapping by running the quay.io/kannika/same Docker image from that directory:

$ docker run \
  -v .:/usr/var/same \
  quay.io/kannika/same:0.2.1 \
  -v \
  map \
  --from=source \
  --to=sink \
  --ignore-indexing-errors \
  -o /usr/var/same/mapping.yaml \
  --registries /usr/var/same/registries.yaml

In this example:

-v .:/usr/var/same mounts the current directory into the /usr/var/same directory in the container
quay.io/kannika/same:0.2.1 is the Docker image
-v (after the image name) enables SAME's verbose mode
map is the command that maps the schemas
--from specifies the source schema registry
--to specifies the target schema registry
--ignore-indexing-errors ignores indexing errors
-o /usr/var/same/mapping.yaml specifies the output file for the schema mapping
--registries /usr/var/same/registries.yaml specifies the input file with the schema registries

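After the command finishes, the current directory contains a mapping.yaml file in which each entry maps a schema ID in the source registry to the corresponding schema ID in the target registry. Example: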

mapping:
  10001: 10002

Configuring a Restore to use schema mapping

First, create a ConfigMap with the schema mapping:

$ kubectl create configmap mapping --from-file=schema-mapping=mapping.yaml

This creates a ConfigMap equivalent to the following (the comment is added for illustration):

apiVersion: v1
kind: ConfigMap
metadata:
  name: mapping
data:
  schema-mapping: |
    mapping:
      # Map from schema 10001 to schema 10002
      10001: 10002

Finally, reference the schema mapping in the Restore: set the .spec.config.schemaMappingFrom field to a configMapKeyRef (a ConfigMapKeySelector) that points to the ConfigMap key holding the mapping.

apiVersion: kannika.io/v1alpha
kind: Restore
metadata:
  name: restore
spec:
  sink: "sink"
  source: "source"
  config:
    schemaMappingFrom:
      configMapKeyRef:
        name: mapping # Name of the ConfigMap
        key: schema-mapping # Name of the field in the ConfigMap

The operator validates the schema mapping. To see whether it is valid, inspect the SchemaMappingValidated condition in the status of the Restore resource:

$ kubectl get restore [NAME] -o jsonpath='{.status.conditions[?(@.type=="SchemaMappingValidated")]}'

What if no schema mapping is found for a schema ID?

If the Restore encounters a schema ID that is not in the schema mapping, it leaves that schema ID unchanged and restores the data as-is.

What if the key or payload has no value?

If a record has a null payload (for example, a tombstone record), it is restored as-is without a payload, and no schema mapping is applied to the payload.

If a record has a null key, it is restored as-is without a key, and no schema mapping is applied to the key.
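The per-record behavior described in these answers can be summarized in a short Python sketch. It is an illustration, not the operator's code: the Record type and the remap and restore_record functions are hypothetical, the same mapping is applied to the key and the payload (mirroring how the operator configures both the Key and Payload Schema Mapping plugins), and the Confluent wire format is assumed for the serialized data.

```python
import struct
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Record:
    key: Optional[bytes]    # None means a null key
    value: Optional[bytes]  # None means a null payload, e.g. a tombstone


def remap(data: bytes, mapping: Dict[int, int]) -> bytes:
    # Assumes the Confluent wire format: 0x00 magic byte + 4-byte big-endian ID.
    if len(data) < 5 or data[0] != 0:
        return data
    schema_id = struct.unpack(">I", data[1:5])[0]
    if schema_id not in mapping:
        return data  # unknown schema ID: restored as-is
    return data[:1] + struct.pack(">I", mapping[schema_id]) + data[5:]


def restore_record(record: Record, mapping: Dict[int, int]) -> Record:
    """Map the key and the payload independently; null parts are passed through."""
    key = None if record.key is None else remap(record.key, mapping)
    value = None if record.value is None else remap(record.value, mapping)
    return Record(key=key, value=value)


# A tombstone: the key is mapped, the null payload stays null.
encoded_key = b"\x00" + struct.pack(">I", 10001) + b"k"
tombstone = Record(key=encoded_key, value=None)
restored = restore_record(tombstone, {10001: 10002})
print(struct.unpack(">I", restored.key[1:5])[0], restored.value)  # 10002 None
```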