Introduction
In today’s data-driven world, protecting sensitive information during replication is crucial for compliance with regulations like GDPR, CCPA, and PCI-DSS. Oracle GoldenGate provides powerful transformation capabilities that allow you to mask sensitive data at the replication level, ensuring that personally identifiable information (PII) never reaches non-production or other production environments in its original form.
In this post, I’ll demonstrate how to implement real-time data masking for Social Security Numbers (SSN), phone numbers, and email addresses using Oracle GoldenGate 23ai’s SQLEXEC functionality.
Business Requirement
In my example there is a need to replicate a critical CUSTOMERS table containing sensitive PII to a test environment. The requirements were:
- SSN: Show only last 4 digits (format: XXX-XX-1234)
- Phone: Show only last 4 digits (format: XXX-XXX-1234)
- Email: Mask username while preserving domain (format: tes***@example.com)
- All other columns: Replicate unchanged
The Solution Architecture
Oracle GoldenGate’s SQLEXEC feature allows us to execute SQL queries during the replication process, making it perfect for data transformation and masking. The key is to implement masking at the Replicat level rather than Extract, which:
- Keeps trail files reversible for troubleshooting
- Maintains flexibility for multiple targets with different masking requirements
- Ensures masked data never touches the target database in clear text
Table Structure
Here’s our source table structure:
CREATE TABLE HR.CUSTOMERS (
CUSTOMER_ID NUMBER GENERATED BY DEFAULT AS IDENTITY
CONSTRAINT PK_CUSTOMERS PRIMARY KEY,
FIRST_NAME VARCHAR2(50),
LAST_NAME VARCHAR2(50),
EMAIL VARCHAR2(320),
PHONE VARCHAR2(40),
SSN VARCHAR2(11), -- e.g. 123-45-6789
ADDRESS1 VARCHAR2(120),
ADDRESS2 VARCHAR2(120),
CITY VARCHAR2(60),
STATE_CODE VARCHAR2(10),
POSTAL_CODE VARCHAR2(20),
COUNTRY VARCHAR2(2) DEFAULT 'US',
CREATED_AT TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL,
UPDATED_AT TIMESTAMP
);
Implementation
Step 1: Replicat Parameter File
Create a Replicat parameter file with SQLEXEC-based masking, this is where the magic happens:
REPLICAT RWEST
USERIDALIAS EAST DOMAIN OracleGoldenGate
DDL INCLUDE MAPPED
REPERROR (DEFAULT, DISCARD)
MAP HR.CUSTOMERS, TARGET HR.CUSTOMERS, &
SQLEXEC (ID mask_ssn, &
QUERY "SELECT CASE WHEN :ssn_val IS NULL THEN NULL ELSE 'XXX-XX-' || SUBSTR(:ssn_val, -4) END as masked_ssn FROM DUAL", &
PARAMS(ssn_val = SSN)), &
SQLEXEC (ID mask_email, &
QUERY "SELECT CASE WHEN :email_val IS NULL THEN NULL ELSE SUBSTR(:email_val, 1, 3) || '***@' || SUBSTR(:email_val, INSTR(:email_val, '@') + 1) END as masked_email FROM DUAL", &
PARAMS(email_val = EMAIL)), &
SQLEXEC (ID mask_phone, &
QUERY "SELECT CASE WHEN :phone_val IS NULL THEN NULL ELSE 'XXX-XXX-' || SUBSTR(:phone_val, -4) END as masked_phone FROM DUAL", &
PARAMS(phone_val = PHONE)), &
COLMAP (USEDEFAULTS, &
SSN = mask_ssn.masked_ssn, &
EMAIL = mask_email.masked_email, &
PHONE = mask_phone.masked_phone &
);
-- Other tables without masking
MAP HR.EMPLOYEES, TARGET HR.EMPLOYEES;
MAP HR.DEPARTMENTS, TARGET HR.DEPARTMENTS;
Step 2: Understanding the SQLEXEC Syntax
Let’s break down the SSN masking logic:
SQLEXEC (ID mask_ssn,
QUERY "SELECT CASE
WHEN :ssn_val IS NULL THEN NULL
ELSE 'XXX-XX-' || SUBSTR(:ssn_val, -4)
END as masked_ssn
FROM DUAL",
PARAMS(ssn_val = SSN))
Key components:
- ID mask_ssn: Unique identifier for this SQLEXEC call
- QUERY: The SQL to execute for each row
- :ssn_val: Bind parameter placeholder
- PARAMS(ssn_val = SSN): Maps the source column SSN to the bind parameter
- CASE WHEN: Handles NULL values gracefully
- SUBSTR(:ssn_val, -4): Extracts last 4 characters
The result is referenced in COLMAP as mask_ssn.masked_ssn.
Step 3: Create and Start the Replicat
In GoldenGate 23ai, I use the adminclient command-line interface:
$ adminclient
OGG (not connected) 1> CONNECT https://localhost:9090 DEPLOYMENT east AS oggadmin PASSWORD ********!
OGG (https://localhost:9100 oggadmin) 2> DBLOGIN USERIDALIAS EAST DOMAIN OracleGoldenGate
OGG (https://localhost:9100 oggadmin as ggadmin@FREEPDB1) 3> ADD REPLICAT RWEST, PARALLEL, EXTTRAIL dw checkpointtable ggadmin.ggcheckpoint
OGG (https://localhost:9100 oggadmin as ggadmin@FREEPDB1) 4> START RWEST
OGG (https://localhost:9100 oggadmin as ggadmin@FREEPDB1) 5> INFO RWEST
Critical Lesson: Parallel Replicat Issues
Important Discovery: During implementation, I encountered an issue where the Replicat would process some records successfully but then appear to hang or stop processing new changes. The root cause was Parallel Replicat mode.
The Problem
When using SQLEXEC with Parallel Replicat:
- Multiple applier threads execute SQLEXEC calls simultaneously
- Each thread creates separate database sessions
- This can cause contention and synchronization issues
- Records may be processed out of order or get stuck
The Solution
Use Non-Integrated (Serial) Replicat for SQLEXEC-based transformations:
OGG (https://localhost:9100 oggadmin) 1> INFO RWEST DETAIL
Verify the output shows:
Type: REPLICAT
Status: RUNNING
Mode: Non-Integrated (Serial)
If you see “Parallel” mode, the Replicat was created incorrectly. Delete and recreate it:
OGG (https://localhost:9100 oggadmin) 1> CONNECT https://localhost:9090 DEPLOYMENT east AS oggadmin
OGG (https://localhost:9100 oggadmin) 2> STOP REPLICAT RWEST
OGG (https://localhost:9100 oggadmin) 3> DELETE REPLICAT RWEST
OGG (https://localhost:9100 oggadmin) 4> DBLOGIN USERIDALIAS EAST DOMAIN OracleGoldenGate
OGG (https://localhost:9100 oggadmin as ggadmin@FREEPDB1) 5> ADD REPLICAT RWEST, EXTTRAIL dw checkpointtable ggadmin.ggcheckpoint
OGG (https://localhost:9100 oggadmin as ggadmin@FREEPDB1) 6> START RWEST
Testing the Masking
Why This Test Dataset?
Before implementing data masking in production, it’s critical to validate that your masking logic handles real-world data variations. I deliberately designed a comprehensive test dataset that covers:
Email Variations:
- Standard formats (
john.doe@oracle.com) - Plus addressing for filters (
alice.ng+promo@gmail.com) - Mixed case domains (
m.gomez@Example.ORG) - Subdomains and multi-level TLDs (
sally.sue@sub.example.co.uk) - Very long local parts with multiple dots
- ALL CAPS usernames (
CHEN.WEI@uni.edu.cn) - Malformed emails without @ symbols (
BOB,kate@) - NULL values
Phone Number Formats:
- US formats with parentheses:
(713) 555-1234 - International formats with country codes:
+44 20 7946 0958,+86 10 8888 9999 - Extensions:
415-555-0123 x99,713-555-0000 x207 - Dots instead of dashes:
713.555.0099 - Spaces:
(202) 555 9988 - Very short numbers (edge case):
555-7 - Empty or malformed:
--,'' - NULL values
SSN Variations:
- Standard format with dashes:
123-45-6789 - No dashes (9 digits):
987654321 - Partial/malformed:
12-345-678,123-45-67 - All zeros:
000-00-0000 - Empty strings:
'' - NULL values
Why These Edge Cases Matter:
- Real production data is messy – Users enter data in various formats
- International operations – Data comes from multiple countries with different conventions
- Legacy systems – Historical data may not follow current standards
- Prevent replication failures – Masking logic must handle edge cases without breaking
- Compliance validation – Auditors want proof that ALL PII is masked, even malformed data
Comprehensive Test Dataset
To thoroughly test the masking implementation, I created a diverse dataset with various edge cases:
-- Standard US format with parentheses and spaces
INSERT INTO HR.CUSTOMERS
(FIRST_NAME, LAST_NAME, EMAIL, PHONE, SSN, ADDRESS1, CITY, STATE_CODE, POSTAL_CODE, COUNTRY)
VALUES
('John', 'Doe', 'John.Doe@Oracle.com', '(713) 555-1234', '123-45-6789', '100 Main St', 'Houston', 'TX', '77002', 'US');
-- Email with plus addressing and extension in phone
INSERT INTO HR.CUSTOMERS
(FIRST_NAME, LAST_NAME, EMAIL, PHONE, SSN, ADDRESS1, CITY, STATE_CODE, POSTAL_CODE, COUNTRY)
VALUES
('Alice', 'Ng', 'ALICE.NG+promo@gmail.com', '415-555-0123 x99', '987654321', '1 Market St', 'San Francisco', 'CA', '94105', 'US');
-- International phone (UK) with NULL SSN
INSERT INTO HR.CUSTOMERS
(FIRST_NAME, LAST_NAME, EMAIL, PHONE, SSN, ADDRESS1, CITY, STATE_CODE, POSTAL_CODE, COUNTRY)
VALUES
('Sally', 'Brown', 'sally.sue@sub.example.co.uk', '+44 20 7946 0958', NULL, '221B Baker St', 'London', NULL, 'NW1 6XE', 'GB');
-- Malformed email (no @ symbol) to test fallback
INSERT INTO HR.CUSTOMERS
(FIRST_NAME, LAST_NAME, EMAIL, PHONE, SSN, ADDRESS1, CITY, STATE_CODE, POSTAL_CODE, COUNTRY)
VALUES
('Bob', 'Smith', 'BOB', '713.555.0099', '12-345-678', '200 Oak Ridge', 'Houston', 'TX', '77007', 'US');
-- Mixed case email domain
INSERT INTO HR.CUSTOMERS
(FIRST_NAME, LAST_NAME, EMAIL, PHONE, SSN, ADDRESS1, CITY, STATE_CODE, POSTAL_CODE, COUNTRY)
VALUES
('Maria', 'Gomez', 'm.gomez@Example.ORG', '(202) 555 9988', '111-22-3333', '1600 Penn Ave', 'Washington', 'DC', '20500', 'US');
-- NULL email, short phone (edge case)
INSERT INTO HR.CUSTOMERS
(FIRST_NAME, LAST_NAME, EMAIL, PHONE, SSN, ADDRESS1, CITY, STATE_CODE, POSTAL_CODE, COUNTRY)
VALUES
('Ken', 'Ito', NULL, '555-7', '222334444', '5 Chiyoda', 'Tokyo', NULL, '100-8111', 'JP');
-- Plus addressing with mixed case domain
INSERT INTO HR.CUSTOMERS
(FIRST_NAME, LAST_NAME, EMAIL, PHONE, SSN, ADDRESS1, CITY, STATE_CODE, POSTAL_CODE, COUNTRY)
VALUES
('Priya', 'Rao', 'Priya.Rao+DataTeam@Company.COM', '+1 (650) 555-7777 x123', '000-00-0000', '500 Tech Way', 'Austin', 'TX', '78701', 'US');
-- International formats
INSERT INTO HR.CUSTOMERS
(FIRST_NAME, LAST_NAME, EMAIL, PHONE, SSN, ADDRESS1, CITY, STATE_CODE, POSTAL_CODE, COUNTRY)
VALUES
('Chen', 'Wei', 'CHEN.WEI@uni.edu.cn', '+86 10 8888 9999', '333-22-1111', '88 Academic Rd', 'Beijing', NULL, '100084', 'CN');
-- NULL phone
INSERT INTO HR.CUSTOMERS
(FIRST_NAME, LAST_NAME, EMAIL, PHONE, SSN, ADDRESS1, CITY, STATE_CODE, POSTAL_CODE, COUNTRY)
VALUES
('Emma', 'Johnson', 'emma.j@sample.net', NULL, '123-45-67', '77 Elm Street', 'Chicago', 'IL', '60601', 'US');
-- All NULL optional fields
INSERT INTO HR.CUSTOMERS
(FIRST_NAME, LAST_NAME, EMAIL, PHONE, SSN, ADDRESS1, ADDRESS2, CITY, STATE_CODE, POSTAL_CODE, COUNTRY)
VALUES
('Zoe', 'Null', NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 'US');
COMMIT;
Verify Masked Data
Query the source database:
-- Source Database (EAST) - Original Data
SELECT
CUSTOMER_ID,
FIRST_NAME,
LAST_NAME,
EMAIL,
PHONE,
SSN,
CITY,
STATE_CODE
FROM HR.CUSTOMERS
ORDER BY CUSTOMER_ID DESC;
Query the target database:
-- Target Database (WEST) - Masked Data
SELECT
CUSTOMER_ID,
FIRST_NAME,
LAST_NAME,
EMAIL,
PHONE,
SSN,
CITY,
STATE_CODE
FROM HR.CUSTOMERS
ORDER BY CUSTOMER_ID DESC;
Results Comparison
Source Database (Original Data):

The source database contains the original sensitive information in clear text. Notice the complete SSN, phone numbers, and email addresses.
Target Database (Masked Data):

The target database shows the successfully masked data. All sensitive fields are protected while maintaining data utility for testing and development.
Key Observations from Testing
Our comprehensive testing revealed that the masking logic handles various edge cases correctly:
| Scenario | Source Example | Target Result | Status |
|---|---|---|---|
| Standard SSN | 123-45-6789 | XXX-XX-6789 | ✅ Pass |
| SSN without dashes | 987654321 | XXX-XX-4321 | ✅ Pass |
| NULL SSN | NULL | NULL | ✅ Pass |
| Empty SSN | '' | XXX-XX- | ✅ Pass |
| Standard phone | (713) 555-1234 | XXX-XXX-1234 | ✅ Pass |
| International phone | +44 20 7946 0958 | XXX-XXX-0958 | ✅ Pass |
| Phone with extension | 415-555-0123 x99 | XXX-XXX-x99 | ✅ Pass |
| Short phone | 555-7 | XXX-XXX-555-7 | ✅ Pass |
| NULL phone | NULL | NULL | ✅ Pass |
| Standard email | john.doe@oracle.com | joh***@oracle.com | ✅ Pass |
| Plus addressing | alice.ng+promo@gmail.com | ali***@gmail.com | ✅ Pass |
| Mixed case email | CHEN.WEI@uni.edu.cn | CHE***@uni.edu.cn | ✅ Pass |
| Subdomain email | sally.sue@sub.example.co.uk | sal***@sub.example.co.uk | ✅ Pass |
| Malformed email (no @) | BOB | BOB | ✅ Pass (unchanged) |
| NULL email | NULL | NULL | ✅ Pass |
Important Notes:
- Non-US addresses and phone formats are preserved but still masked consistently
- The masking maintains the last 4 characters regardless of the original format
- NULL values are handled gracefully without errors
- Malformed data (like emails without @) passes through unchanged, preventing replication failures
- International character sets (accented names like López, Hämäläinen) replicate correctly
Performance Considerations
Trade-offs
- Serial Replicat: More stable for SQLEXEC but single-threaded
- Parallel Replicat: Better throughput but incompatible with complex SQLEXEC (During this test)
Optimization Strategies
- Table-level Parallelism: Create separate Replicats for different tables
REPLICAT RWEST_CUSTOMERS -- Handles HR.CUSTOMERS with masking
REPLICAT RWEST_OTHER -- Handles other tables in parallel
- Optimize SQLEXEC Queries: Keep queries simple and avoid complex subqueries
Alternative Approaches
Using Built-in GoldenGate Functions
For simpler masking, you can use GoldenGate’s built-in functions:
MAP HR.CUSTOMERS, TARGET HR.CUSTOMERS,
COLMAP (USEDEFAULTS,
SSN = 'XXX-XX-' || @STREXT(SSN, -4, 4),
PHONE = 'XXX-XXX-' || @STREXT(PHONE, -4, 4)
);
Pros: No database round-trip, potentially faster
Cons: Less flexible, harder to handle complex logic and NULL values
Using Oracle Data Redaction
For Extract-level masking:
- Configure Oracle Data Redaction policies on source tables
- Extract reads already-masked data
- More resource-intensive on source database
Best Practices
- Always handle NULL values in your SQLEXEC queries using CASE statements
- Test thoroughly with edge cases (empty strings, special characters, various formats)
- Monitor performance – SQLEXEC adds latency due to database calls
- Use REPERROR and DISCARDFILE to capture any records that fail masking
- Verify compliance – Ensure your masking meets regulatory requirements (PCI-DSS typically requires all but last 4 digits masked)
- Document your masking rules for audit purposes
- Apply masking at Replicat level, not Extract, for flexibility
- Avoid Parallel Replicat when using multiple SQLEXEC calls
Security Considerations
- Trail files contain unmasked data – Ensure proper encryption (
ENCRYPTTRAIL AES256) - Source database has clear data – Implement proper access controls
- Test in isolated environment first to prevent accidental exposure
- Audit trail access – Monitor who can read trail files
- Key management – Use Oracle Wallet for credential storage
Troubleshooting
Check Replicat Status
Using adminclient in GoldenGate 23ai:
$ adminclient
OGG (not connected) 1> CONNECT https://localhost:9100 DEPLOYMENT oggadmin AS oggadmin
OGG (https://localhost:9100 oggadmin) 2> INFO RWEST DETAIL
OGG (https://localhost:9100 oggadmin) 3> STATS RWEST
OGG (https://localhost:9100 oggadmin) 4> VIEW REPORT RWEST
Conclusion
Oracle GoldenGate’s SQLEXEC feature provides a powerful mechanism for implementing real-time data masking during replication. By applying transformations at the Replicat level, you can ensure sensitive data is protected before it reaches target systems, all while maintaining the flexibility to apply different masking rules to different targets.
The key lessons learned:
- SQLEXEC is ideal for complex masking logic
- Serial (Non-Integrated) Replicat is essential for stability with multiple SQLEXEC calls
- Proper NULL handling and error management are critical
- Performance trade-offs exist but can be managed through architecture choices
This approach has enabled some of my customers to safely replicate production data to test environments while maintaining full compliance with data protection regulations.
Additional Resources
- Oracle GoldenGate Documentation
- GoldenGate Data Transformation with SQLEXEC Guide
- PCI-DSS Requirements

Leave a comment