    Schema Mapping

    This page describes how to configure schema mapping in a Restore.

    Schema mapping allows you to migrate data between environments that use different schema registries.

    With schema mapping, you can, for example, copy data from a production environment to a QA environment where there are different schema registries.

    The restore process maps schema IDs from the source environment to the corresponding schema IDs of the target environment. It looks up each schema ID in the schema mapping and, when restoring the data, replaces the schema ID that is encoded after the magic byte at the start of the message.
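
As a rough illustration of the ID rewrite described above, the sketch below assumes messages use the Confluent wire format: one magic byte (`0x00`) followed by a 4-byte big-endian schema ID. That framing is an assumption, and `remap_schema_id` is a hypothetical helper; it does not show how the plugins are actually implemented.

```python
import struct

MAGIC_BYTE = 0  # Assumption: Confluent wire format, first byte is zero


def remap_schema_id(message: bytes, mapping: dict[int, int]) -> bytes:
    """Return the message with its schema ID replaced according to mapping."""
    # Messages that are too short or lack the magic byte are left untouched.
    if len(message) < 5 or message[0] != MAGIC_BYTE:
        return message
    # Bytes 1-4 hold the schema ID as an unsigned big-endian integer.
    (source_id,) = struct.unpack(">I", message[1:5])
    target_id = mapping.get(source_id)
    if target_id is None:
        return message  # No mapping for this ID: keep the data as is
    return message[:1] + struct.pack(">I", target_id) + message[5:]


original = bytes([MAGIC_BYTE]) + struct.pack(">I", 10001) + b"payload"
restored = remap_schema_id(original, {10001: 10002})
print(struct.unpack(">I", restored[1:5])[0])  # prints 10002
```

Only the 4-byte ID changes; the magic byte and the serialized payload are copied through unchanged.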

    To use schema mapping, you need to:

    • Define or generate a schema mapping
    • Create a ConfigMap with the schema mapping
    • Reference the ConfigMap in the Restore for schema mapping

    The operator automatically configures both the Payload Schema Mapping plugin and the Key Schema Mapping plugin with the schema mapping.

    Generating a schema mapping using SAME

    To generate a schema mapping, you can use our Schema Automated Mapping Engine (SAME) tool, which generates schema mappings between two schema registries. It downloads the schemas from the source and target schema registries, indexes them using fingerprinting (hashing), and then compares the fingerprints to generate a mapping. The tool is still in its early stages and currently supports only Avro.
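
The fingerprint-and-compare approach can be sketched as follows. This is a simplified illustration, not SAME's actual algorithm: it assumes a fingerprint is a SHA-256 hash of the schema's JSON with keys sorted and whitespace removed, whereas a real tool would more likely rely on Avro's Parsing Canonical Form. The function names and sample schemas are hypothetical.

```python
import hashlib
import json


def fingerprint(schema_json: str) -> str:
    """Hash a schema after normalizing key order and whitespace."""
    normalized = json.dumps(
        json.loads(schema_json), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def generate_mapping(source: dict[int, str], target: dict[int, str]) -> dict[int, int]:
    """Map each source schema ID to a target ID with the same fingerprint."""
    # Index the target registry by fingerprint; on duplicates the lowest ID wins.
    target_index: dict[str, int] = {}
    for schema_id, schema in sorted(target.items()):
        target_index.setdefault(fingerprint(schema), schema_id)

    mapping: dict[int, int] = {}
    for schema_id, schema in sorted(source.items()):
        match = target_index.get(fingerprint(schema))
        if match is not None:
            mapping[schema_id] = match
    return mapping


source = {10001: '{"type": "record", "name": "User", "fields": [{"name": "id", "type": "long"}]}'}
target = {10002: '{"name":"User","type":"record","fields":[{"name":"id","type":"long"}]}'}
print(generate_mapping(source, target))  # prints {10001: 10002}
```

Source schemas without a matching fingerprint in the target registry are simply left out of the result.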

    First, create a YAML file with the schema registries you want to map:

    registries.yaml
    registries:
      - name: source
        url: https://aaaa-1234.schema-registry.com
        username: <API KEY> # Optional
        password: <API SECRET> # Optional
      - name: sink
        url: https://bbbb-4567.schema-registry.com
        username: <API KEY> # Optional
        password: <API SECRET> # Optional

    Then, to generate a schema mapping, run the following command using the quay.io/kannika/same Docker image:

    $ docker run \
        -v .:/usr/var/same \
        quay.io/kannika/same:0.2.1 \
        -v \
        map \
        --from=source \
        --to=sink \
        --ignore-indexing-errors \
        -o /usr/var/same/mapping.yaml \
        --registries /usr/var/same/registries.yaml

    In this example:

    • -v .:/usr/var/same (the first -v, a docker run option) mounts the current directory at /usr/var/same in the container
    • quay.io/kannika/same:0.2.1 is the Docker image
    • -v (the second -v, after the image name) enables verbose mode in SAME
    • map is the command that maps the schemas
    • --from specifies the source schema registry
    • --to specifies the target schema registry
    • --ignore-indexing-errors ignores indexing errors
    • -o /usr/var/same/mapping.yaml specifies the output file for the schema mapping
    • --registries /usr/var/same/registries.yaml specifies the input file with the schema registries

    After running the command, you will have a mapping.yaml file with the schema mapping. Example:

    mapping.yaml
    mapping:
      10001: 10002

    Configuring a Restore to use schema mapping

    First, create a ConfigMap with the schema mapping:

    $ kubectl create configmap mapping --from-file=schema-mapping=mapping.yaml

    This will create a ConfigMap with the following content:

    schema-mapping-config-map.yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: mapping
    data:
      schema-mapping: |
        mapping:
          # Map from schema 10001 to schema 10002
          10001: 10002

    Finally, reference the schema mapping in the Restore: set the .spec.config.schemaMappingFrom field to a ConfigMapKeySelector that points to the ConfigMap and to the field that holds the mapping.

    apiVersion: kannika.io/v1alpha
    kind: Restore
    metadata:
      name: restore
    spec:
      sink: "sink"
      source: "source"
      config:
        schemaMappingFrom:
          configMapKeyRef:
            name: mapping # Name of the ConfigMap
            key: schema-mapping # Name of the field in the ConfigMap

    The operator validates the schema mapping. To see whether the mapping is valid, check the SchemaMappingValidated condition in the status of the Restore resource:

    $ kubectl get restore [NAME] -o jsonpath='{.status.conditions[?(@.type=="SchemaMappingValidated")]}'

    What if no schema mapping is found for a schema ID?

    If the Restore encounters a schema ID that is not in the schema mapping, it skips the mapping for that schema ID and restores the data as is.
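
Because unmapped IDs are restored unchanged, it can be worth checking a mapping for gaps before starting a Restore. A minimal sketch, assuming you already know which schema IDs occur in the source data (how `observed_ids` is obtained is left open, and `find_unmapped_ids` is a hypothetical helper):

```python
def find_unmapped_ids(observed_ids, mapping):
    """Return the sorted schema IDs that have no entry in the mapping."""
    return sorted(set(observed_ids) - set(mapping))


mapping = {10001: 10002}
print(find_unmapped_ids([10001, 10003, 10003], mapping))  # prints [10003]
```

Messages that carry one of the reported IDs would keep their source schema ID in the target environment.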