Schema Mapping

This page describes how to configure schema mapping in a Restore.

Schema mapping lets you migrate data from one environment to another when both environments use schema registries.

With schema mapping, you can, for example, copy data from a production environment to a QA environment where there are different schema registries.

The restore process maps the schema IDs from the source environment to the schema IDs of the target environment. It does so by looking up each schema ID in a lookup table and, when restoring the data, replacing the schema ID that follows the magic byte in the message.
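The lookup-and-replace step can be sketched in Python as follows. This is a minimal illustration, not the Restore's actual implementation. It assumes the Confluent wire format (one magic byte `0x00` followed by a 4-byte big-endian schema ID), and `remap_schema_id` is a hypothetical helper name.

```python
import struct
from typing import Mapping

MAGIC_BYTE = 0   # assumption: Confluent wire format starts with a zero byte
HEADER_SIZE = 5  # 1 magic byte + 4-byte big-endian schema ID


def remap_schema_id(message: bytes, mapping: Mapping[int, int]) -> bytes:
    """Return the message with its schema ID replaced via the lookup table."""
    if len(message) < HEADER_SIZE or message[0] != MAGIC_BYTE:
        return message  # not in the assumed wire format: leave untouched
    (source_id,) = struct.unpack(">I", message[1:HEADER_SIZE])
    # IDs without an entry in the lookup table are kept unchanged.
    target_id = mapping.get(source_id, source_id)
    return message[:1] + struct.pack(">I", target_id) + message[HEADER_SIZE:]


# Build a sample message that carries schema ID 10001.
original = bytes([MAGIC_BYTE]) + struct.pack(">I", 10001) + b"avro-body"
restored = remap_schema_id(original, {10001: 10002})
print(struct.unpack(">I", restored[1:5])[0])  # 10002
```

Only the header bytes change; the encoded body after the schema ID is copied through as-is.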

To use schema mapping, you need to:

  • Define or generate a schema mapping
  • Create a ConfigMap with the schema mapping
  • Reference the ConfigMap in the Restore for schema mapping

The operator automatically configures both the Payload Schema Mapping plugin and the Key Schema Mapping plugin with the schema mapping.

To generate a schema mapping, you can use our Schema Automated Mapping Engine (SAME) tool. SAME is a tool that can generate schema mappings between two schema registries. It does this by downloading the schemas from the source and target schema registries, indexing them using fingerprinting (hashing), and then comparing the fingerprints to generate a mapping. The tool is still in its early stages, and currently only supports Avro.
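The fingerprint-and-compare idea can be illustrated with a short Python sketch. It is not SAME's implementation: the SHA-256 hash over key-sorted JSON is an assumed stand-in for whatever fingerprinting SAME uses, the function names are hypothetical, and keeping the lowest target ID when several target schemas share a fingerprint is only one possible way to resolve such a conflict.

```python
import hashlib
import json
from typing import Dict


def fingerprint(schema_json: str) -> str:
    """Hash a schema after normalising whitespace and object key order."""
    normalised = json.dumps(
        json.loads(schema_json), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def generate_mapping(source: Dict[int, str], target: Dict[int, str]) -> Dict[int, int]:
    """Map each source schema ID to a target schema ID with the same fingerprint."""
    index: Dict[str, int] = {}
    for schema_id in sorted(target):
        # setdefault keeps the first (lowest) ID seen for a fingerprint.
        index.setdefault(fingerprint(target[schema_id]), schema_id)
    mapping: Dict[int, int] = {}
    for schema_id, schema in source.items():
        match = index.get(fingerprint(schema))
        if match is not None:
            mapping[schema_id] = match
    return mapping


# The same Avro schema, registered under different IDs and formatted differently.
source = {10001: '{"type": "record", "name": "User", "fields": [{"name": "id", "type": "long"}]}'}
target = {10002: '{"name":"User","type":"record","fields":[{"name":"id","type":"long"}]}'}
print(generate_mapping(source, target))  # {10001: 10002}
```

Source schemas with no matching fingerprint in the target simply get no entry in the resulting mapping.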

First, create a YAML file with the schema registries you want to map:

registries.yaml
registries:
  - name: source
    url: https://aaaa-1234.schema-registry.com
    username: <API KEY> # Optional
    password: <API SECRET> # Optional
  - name: sink
    url: https://bbbb-4567.schema-registry.com
    username: <API KEY> # Optional
    password: <API SECRET> # Optional

Then, to generate a schema mapping, run the following command using the quay.io/kannika/same Docker image:

$ docker run \
    -v .:/usr/var/same \
    quay.io/kannika/same:0.4.0 \
    -v \
    map \
    --from=source \
    --to=sink \
    --ignore-indexing-errors \
    --on-conflict=pick-first \
    -o /usr/var/same/mapping.yaml \
    --registries /usr/var/same/registries.yaml

In this example:

  • -v .:/usr/var/same mounts the current directory to the /usr/var/same directory in the container
  • quay.io/kannika/same:0.4.0 is the Docker image
  • -v enables verbose mode
  • map is the command that maps the schemas
  • --from specifies the source schema registry
  • --to specifies the target schema registry
  • --ignore-indexing-errors ignores indexing errors
  • -o /usr/var/same/mapping.yaml specifies the output file for the schema mapping
  • --registries /usr/var/same/registries.yaml specifies the input file with the schema registries

After running the command, you will have a mapping.yaml file with the schema mapping. Example:

mapping.yaml
mapping:
  10001: 10002

Configuring a Restore to use schema mapping

First, create a ConfigMap with the schema mapping:

$ kubectl create configmap mapping --from-file=schema-mapping=mapping.yaml

This will create a ConfigMap with the following content:

schema-mapping-config-map.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mapping
data:
  schema-mapping: |
    mapping:
      # Map from schema 10001 to schema 10002
      10001: 10002

Finally, set the schema mapping in the Restore by setting the .spec.config.schemaMappingFrom field to load the mapping from the field in the ConfigMap, using a ConfigMapKeySelector.

apiVersion: kannika.io/v1alpha
kind: Restore
metadata:
  name: restore
spec:
  sink: "sink"
  source: "source"
  config:
    schemaMappingFrom:
      configMapKeyRef:
        name: mapping # Name of the ConfigMap
        key: schema-mapping # Name of the field in the ConfigMap

The operator will validate the schema mapping. To check if the schema mapping is valid, check the SchemaMappingValidated condition in the status of the Restore resource.

$ kubectl get restore [NAME] -o jsonpath='{.status.conditions[?(@.type=="SchemaMappingValidated")]}'
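If you want to check the result in a script, the JSON printed by the command above can be interpreted with a few lines of Python. This sketch assumes the condition follows the usual Kubernetes convention of a `status` field set to "True", "False", or "Unknown"; the helper name and the sample output are hypothetical.

```python
import json


def schema_mapping_is_valid(condition_json: str) -> bool:
    """Return True when the condition reports status "True"."""
    if not condition_json.strip():
        return False  # empty output: the condition has not been set yet
    condition = json.loads(condition_json)
    return condition.get("status") == "True"


# Hypothetical sample output; real conditions carry more fields.
sample = '{"type": "SchemaMappingValidated", "status": "True"}'
print(schema_mapping_is_valid(sample))  # True
```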

What if no schema mapping is found for a schema ID?

If the Restore encounters a schema ID that is not in the schema mapping, it skips schema mapping for that schema ID and restores the data as-is.

When the record has a null payload (e.g. tombstone records), the record will be restored as-is without a payload and no schema mapping will be performed on the payload.

When the record has a null key, the record will be restored as-is without a key and no schema mapping will be performed on the key.
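These fallback rules can be summarised in a small Python sketch. It works on schema IDs rather than raw bytes, uses `None` to stand for a null key or null payload, and `map_schema_id` is a hypothetical name used only for illustration.

```python
from typing import Mapping, Optional


def map_schema_id(schema_id: Optional[int], mapping: Mapping[int, int]) -> Optional[int]:
    """Apply the schema mapping, falling back to the original value."""
    if schema_id is None:
        return None  # null key or payload (e.g. tombstone): nothing to map
    # Unmapped schema IDs are passed through unchanged.
    return mapping.get(schema_id, schema_id)


mapping = {10001: 10002}
print(map_schema_id(10001, mapping))  # 10002
print(map_schema_id(42, mapping))     # 42
print(map_schema_id(None, mapping))   # None
```

The key and the payload are handled independently, so a record with a mapped key and a null payload still has its key mapped.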