Data Dictionary:

Data assets:

smartinsider_YYYYMMDDHHMM.ttx - data files are delivered in a tab-delimited (.ttx) format. Initially, multiple files were sent per day. However, to help D&B combine the data into a single schema, SmartInsider has opted to provide a single daily refresh only.
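As a minimal sketch of how the timestamp in the file name could be read, the helper below parses the YYYYMMDDHHMM portion into a datetime. The function name parse_delivery_timestamp and the strict pattern are illustrative assumptions, not part of the SmartInsider delivery specification.

```python
import re
from datetime import datetime

# Assumed pattern: "smartinsider_" + 12 digits (YYYYMMDDHHMM) + ".ttx".
FILENAME_RE = re.compile(r"^smartinsider_(\d{12})\.ttx$")


def parse_delivery_timestamp(filename: str) -> datetime:
    """Return the timestamp embedded in a smartinsider_YYYYMMDDHHMM.ttx name."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    # strptime also rejects impossible values such as month 13.
    return datetime.strptime(match.group(1), "%Y%m%d%H%M")
```

For example, parse_delivery_timestamp("smartinsider_202108061530.ttx") gives 6 August 2021, 15:30. Rejecting unexpected names early may help catch stray files on the SFTP before they reach the single schema.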

Data Ingestion:

  • Data files contain historical data (from August 6, 2021 to present).

  • All files should be ingested and they all belong to a single schema.

  • The work_stream_name (for source record ID) is smartinsider_altdata

  • Update type: delta/incremental

  • Update frequency: daily

  • Data ingestion approach: TBD

    • Please skip the fields “LastSignalUpdate” and “DeliveryDateTime” if you see them in any raw data files. These fields should not be included in our Snowflake tables.

    • Files should be ingested in the order they appear on the SFTP.

    • As duplicates can still be found in the recent data files, we need to discuss whether duplicate records should be dropped before appending.
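The field-skipping rule above can be sketched as follows, assuming each raw record has already been read into a dict keyed by column name. The helper names are hypothetical. The duplicate handling is only one possible option, shown for discussion, since the approach is still TBD; it treats two records as duplicates only when every remaining field is identical.

```python
# Fields that must not reach the Snowflake tables (from the notes above).
EXCLUDED_FIELDS = {"LastSignalUpdate", "DeliveryDateTime"}


def drop_excluded_fields(row: dict) -> dict:
    """Return a copy of the row without the excluded fields, if present."""
    return {name: value for name, value in row.items() if name not in EXCLUDED_FIELDS}


def drop_exact_duplicates(rows):
    """Keep the first occurrence of each fully identical row (one option only)."""
    seen = set()
    unique = []
    for row in rows:
        # Sort by column name so the key does not depend on column order.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Dropping the excluded fields before the duplicate check means two records that differ only in “DeliveryDateTime” would count as duplicates, which may or may not be the desired behavior.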

Character encoding:

Files are UTF-8 encoded. Please make sure you read files with this encoding.

Escaping logic:

Text fields are wrapped in double quotes (“”). There is no escape logic.
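A minimal sketch of reading a file under these rules is shown below. Because there is no escape logic, the reader treats quote characters as ordinary text and only strips one wrapping pair per field. It assumes that the first row is a header, that the wrapping quotes are plain ASCII double quotes, and that fields contain no embedded tabs or line breaks; none of these is confirmed here. The function names are illustrative.

```python
import csv


def strip_wrapping_quotes(value: str) -> str:
    """Remove one pair of wrapping double quotes; inner quotes are kept as-is."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


def read_ttx(stream):
    """Yield one dict per data row from an open tab-delimited text stream."""
    # QUOTE_NONE: split on tabs only and do not interpret quote characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    header_row = next(reader, None)
    if header_row is None:  # empty file
        return
    header = [strip_wrapping_quotes(name) for name in header_row]
    for fields in reader:
        yield dict(zip(header, (strip_wrapping_quotes(f) for f in fields)))
```

A file would be opened with the encoding stated explicitly, for example open(path, encoding="utf-8", newline=""), and the resulting stream passed to read_ttx.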

Excel:

If you plan to open files in Excel and save them before ingestion (open → save → ingest), please configure Excel NOT to automatically detect data types, as Excel can silently convert values.

Match Logic:

  • Match approach: TBD. D&B mentions that the new match logic for SmartInsider will involve exclusion criteria (additional input parameters to be sent to the API).

Snowflake Tables to Share:

  • SMARTINSIDER_RAW - for pre-DUNS review only

  • SMARTINSIDER - final

    • This table should contain all columns (including raw, audit, and all duns/match columns).
