
Data discovery rules

Data discovery rules use regular expressions (regex) and keywords to define what type of sensitive content is to be located. You can either use the built-in data discovery rules or create new rules to locate credit card numbers, Social Security numbers, email addresses, and more.

There are two steps to locating sensitive data during scans:

  • Creating data discovery rules.
  • Mapping the rules to a data discovery policy.

Both steps are required: the rules define what data to look for, and the policy instructs DataSecurity Plus to scan for that data during the data discovery process.

Creating new data discovery rules

Default rules cannot be edited, but you can create new data discovery rules to locate organization- or industry-specific information using regex patterns or keyword sets. Use the Regular Expression option to locate files with content that matches a pattern of text, numbers, or special characters. Use the Keyword Set option when an exact phrase needs to be located within your data stores.
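To illustrate the difference between the two options, here is a minimal Python sketch. It is not how DataSecurity Plus implements matching: the sample text, the hypothetical EMP- employee ID pattern, and the keywords are made up, and the keyword check is a plain case-insensitive substring test, assumed here as a stand-in for the product's exact-phrase matching.

```python
import re

# Hypothetical sample content, standing in for text extracted from a scanned file.
content = "Employee ID: EMP-004512. Contains Project Falcon budget details."

# Regular Expression option: match a pattern of text, numbers, or special characters.
# Hypothetical pattern: "EMP-" followed by exactly six digits.
employee_id = re.compile(r"EMP-\d{6}")

# Keyword Set option: match exact phrases. Keyword sets are case-insensitive,
# so this sketch compares lowercased strings.
keywords = ["project falcon", "confidential"]


def regex_matches(pattern: re.Pattern, text: str) -> list[str]:
    """Return every substring of text that matches the pattern."""
    return pattern.findall(text)


def keyword_matches(phrases: list[str], text: str) -> list[str]:
    """Return the configured phrases that occur in text, ignoring case."""
    lowered = text.lower()
    return [p for p in phrases if p.lower() in lowered]


print(regex_matches(employee_id, content))  # ['EMP-004512']
print(keyword_matches(keywords, content))   # ['project falcon']
```

A pattern suits values that vary but follow a format (IDs, card numbers), while a keyword set suits fixed phrases such as project code names.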

To create a new data discovery rule, follow these steps:

  • Select Risk Analysis from the applications drop-down.
  • Go to Configuration > Data Discovery Settings > Rules.
  • Click the Add Rule button at the top-right corner.
  • Name the rule and include an appropriate description.
  • Choose from the Rule Match Type options. You can select either Regular Expression or Keyword Set.
  • Depending on the Rule Match Type, enter the respective details to identify your target sensitive data:
    • Regular Expression: Expand the Regular Expression field. Add the regex pattern and click the Add button. Repeat the step to add more regular expressions. Click Close to save the value.
    • Keyword Set: Expand the Keywords field. Select Add Keyword in the top-left corner. Add the keywords you wish to scan for and click Add. Click Close to save the values. Note that keyword sets are case-insensitive.
  • Note: Sensitive data is reported when any one of the configured regular expressions or keywords is detected; not all of them need to match.

    Best Practice: When defining regular expressions and keyword sets, keep the set of rule matches targeted and finite so that data discovery scans return precise results.

  • Under Threshold Value, specify the minimum number of times a rule has to be matched in a single file for that file to be reported. For example, if the threshold value in a rule is 5, DataSecurity Plus will report only files whose content matches that rule 5 or more times.
  • Specify rule tags to enable easy identification when setting policies or modifying rules.
    • Expand the Tags field to select tags from the available tag categories.
    • Note: The Configured Rules page and the rule catalog that pops up during policy configuration will display tag filters at the top for quick filtering.

    • Create a custom tag by clicking the Create Custom Tag option within the expanded Tags drop-down. Provide the Tag Name and click Add. You can view this tag in the Custom category in the Tags drop-down.
  • Note: You can only add up to 3 tags for a rule. You cannot modify the tags on default rules.

  • Click Save. Ensure that you map your rules to a policy to activate them in scans.
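The matching behavior configured in these steps (any single regex or keyword counts as a match, and a file is reported only once the Threshold Value is reached) can be modeled with the following Python sketch. The Rule class, count_matches, and is_reported are hypothetical names, the default threshold of 1 is chosen only for the sketch, and summing matches across all of a rule's regexes and keywords is an assumption, since this page does not state how the product totals matches.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical model of a custom data discovery rule."""
    name: str
    patterns: list[str] = field(default_factory=list)  # Regular Expression entries
    keywords: list[str] = field(default_factory=list)  # Keyword Set entries
    threshold: int = 1  # Threshold Value (default of 1 chosen for this sketch)
    tags: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Mirrors the documented limit of 3 tags per rule.
        if len(self.tags) > 3:
            raise ValueError("A rule can have at most 3 tags.")


def count_matches(rule: Rule, text: str) -> int:
    """Count matches of every regex and keyword in the rule.

    Summing across all entries is an assumption made for this sketch.
    """
    total = sum(len(re.findall(p, text)) for p in rule.patterns)
    lowered = text.lower()  # keyword sets are case-insensitive
    total += sum(lowered.count(k.lower()) for k in rule.keywords)
    return total


def is_reported(rule: Rule, text: str) -> bool:
    """Report the file only if the rule matched at least `threshold` times."""
    return count_matches(rule, text) >= rule.threshold


rule = Rule(
    name="Internal project codes",  # hypothetical custom rule
    patterns=[r"PRJ-\d{4}"],
    keywords=["restricted"],
    threshold=5,
    tags=["Custom"],
)
sample = "PRJ-0001 PRJ-0002 PRJ-0003 Restricted RESTRICTED"
print(count_matches(rule, sample))               # 5
print(is_reported(rule, sample))                 # True
print(is_reported(rule, "PRJ-0001 restricted"))  # False
```

With a threshold of 5, a file containing three project codes and two occurrences of the keyword is reported, while a file with only two matches is not.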

Constructing regular expressions

Regular expressions are useful to locate content that matches a pattern of text, numbers, and/or special characters. While DataSecurity Plus provides a built-in library of rules, you might need to construct your own regular expressions for custom rules.

Example: Regex to find URLs in files.

The pattern is built from the following conditions on the target string:

  • Starts with http or https. Pattern: https?
  • Followed by a colon and two forward slashes. Pattern: :\/\/
  • Then includes www and a period. Pattern: www\.
  • Followed by the domain name (for example, google) and a period. The domain name can be 2 to 253 characters long and can include alphanumeric and special characters. Pattern: [a-zA-Z0-9$\-_.+!*'(),]{2,253}\.
  • Ends with the top-level domain (for example, com), which can be 2 to 6 characters long and can include alphanumeric characters and a few specific, safe special characters. Pattern: [a-zA-Z0-9$\-_.+!*'(),]{2,6}

Combining these conditions gives the following regex, which matches URLs such as https://www.google.com:

https?:\/\/(www\.){1}[a-zA-Z0-9$\-_.+!*'(),]{2,253}\.[a-zA-Z0-9$\-_.+!*'(),]{2,6}
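Before adding a custom regex to a rule, it can help to test it against sample strings. This Python sketch checks the URL pattern above with Python's re module; the sample strings are made up, and Python's regex syntax may differ in edge cases from the engine DataSecurity Plus uses.

```python
import re

# The URL pattern from this page, written as a raw string so the
# backslashes reach the regex engine unchanged.
URL_PATTERN = re.compile(
    r"https?:\/\/(www\.){1}[a-zA-Z0-9$\-_.+!*'(),]{2,253}\.[a-zA-Z0-9$\-_.+!*'(),]{2,6}"
)

samples = [
    "https://www.google.com",
    "Visit http://www.example.org for details.",
    "https://google.com",     # no "www." so no match
    "ftp://www.example.com",  # not http or https so no match
]

for s in samples:
    # search() finds the pattern anywhere in the string, as a file scan would.
    m = URL_PATTERN.search(s)
    print(repr(s), "->", m.group(0) if m else None)
```

Because both character classes include a period, a sentence-ending period directly after a URL (as in "See https://www.google.com.") is included in the match. Testing against realistic sample text like this helps keep rule matches targeted.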

For more information on how to construct a regex, check out the regex guide.

Mapping rules to data discovery policies

Data discovery rules must be added to a policy to activate them in scans. Follow these steps to map rules to data discovery policies:

  • Go to Configuration > Policy. Select an existing policy or click Add Policy and proceed to create a new policy.
  • Click Add Rule.
  • In the Add Rule to the Policy popup, you can filter the list of available rules by the country they apply to, the type of data they identify, the compliance regulation they apply to, or by a custom tag.
  • Check the rules you want to add to the target policy and click Select to add them.

Note: When modifying an existing data discovery policy, initiate a rescan for the change to take effect. The steps to initiate a rescan are described on the scan configuration page.

Example: To map all credit card rules to a data discovery policy, follow these steps:

  • After providing the policy name and description, click Add Rule to open the rule catalog.
  • Click the Information type filter and select the Payment Card tag. All the rules tagged as payment card data will be displayed.
  • Select the credit card rules and click Select. Finally, click Save to map the rules to the policy.
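The tag-based selection in this example can be modeled in a few lines of Python. The CatalogRule and Policy classes and the rule names are hypothetical; they only illustrate the idea that a policy holds a set of rules picked from the catalog by filter, and do not reflect the product's internal data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogRule:
    """Hypothetical entry in the rule catalog."""
    name: str
    information_type: str  # e.g. "Payment Card"
    country: str = "Global"


@dataclass
class Policy:
    """Hypothetical data discovery policy holding mapped rules."""
    name: str
    description: str = ""
    rules: list[CatalogRule] = field(default_factory=list)

    def add_rules(self, selected: list[CatalogRule]) -> None:
        # Skip rules that are already mapped to this policy.
        for rule in selected:
            if rule not in self.rules:
                self.rules.append(rule)


# Hypothetical catalog entries, not actual built-in rule names.
catalog = [
    CatalogRule("Visa card number", "Payment Card"),
    CatalogRule("Mastercard number", "Payment Card"),
    CatalogRule("US Social Security number", "National ID", country="United States"),
]

# Equivalent of applying the Information type filter with the Payment Card tag.
payment_card_rules = [r for r in catalog if r.information_type == "Payment Card"]

policy = Policy("Cardholder data", "Locate credit card numbers")
policy.add_rules(payment_card_rules)
print([r.name for r in policy.rules])  # ['Visa card number', 'Mastercard number']
```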

Learn more about data discovery policies on the Policy Configuration help page.
