Data Dictionary:

Data assets:

smartinsider_YYYYMMDDHHMM.ttx - data files are delivered in a tab-delimited (.ttx) format. Initially, multiple files were sent per day. However, to help D&B combine the data into a single schema, SmartInsider has opted to provide a single daily refresh only.
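
The naming convention above embeds the delivery timestamp in the file name. A minimal sketch of extracting it is shown here; the function name parse_delivery_time is hypothetical, and the time zone of the timestamp is not stated in this page, so the result is left as a naive datetime.

```python
import re
from datetime import datetime

# Pattern derived from the naming convention smartinsider_YYYYMMDDHHMM.ttx.
FILENAME_RE = re.compile(r"^smartinsider_(\d{12})\.ttx$")


def parse_delivery_time(filename: str) -> datetime:
    """Return the timestamp embedded in a delivery file name (hypothetical helper)."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    # Time zone is unknown, so this is a naive datetime.
    return datetime.strptime(match.group(1), "%Y%m%d%H%M")
```

Files that do not follow the convention raise ValueError instead of being skipped silently, so an unexpected delivery is noticed early.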

Data Ingestion:

  • Data files contain historical data (from August 6, 2021 to present).

  • All files should be ingested; they all share a single schema.

  • The work_stream_name (for source record ID) is smartinsider_altdata

  • Update type: delta/incremental

  • Update frequency: daily

  • Data ingestion approach: TBD

    • Files should be ingested in the order they appear on the SFTP.

    • Duplicates can still be found in the recent data files, so we need to discuss whether to drop duplicate records before appending them.
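
Since the ingestion approach is still TBD, the following is only a sketch of one possible option, not an agreed design. It assumes that the order on the SFTP is the chronological order of the file names, and that a "duplicate" means a row identical in every column. Both are assumptions to confirm. The names ingestion_order and drop_seen_rows are hypothetical.

```python
from typing import Iterable, Iterator, List, Sequence, Set, Tuple

Row = Tuple[str, ...]


def ingestion_order(filenames: Iterable[str]) -> List[str]:
    """Sort file names so that older deliveries are ingested first."""
    # The fixed-width YYYYMMDDHHMM suffix sorts chronologically as plain text.
    return sorted(filenames)


def drop_seen_rows(rows: Iterable[Sequence[str]], seen: Set[Row]) -> Iterator[Row]:
    """Yield only rows not seen before, comparing on all columns (assumption)."""
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            yield key
```

The seen set is shared across files, so a row that was appended from an earlier file is dropped when it reappears in a later one. For the full history this set may become large; a key-based comparison in the database would be an alternative.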

Character encoding:

Files are UTF-8 encoded. Please make sure you read them with this encoding.

Escaping logic:

Text fields are wrapped in double quotes (“”). There is no escape logic, so a quote character inside a text field is not escaped.
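
A minimal sketch of reading a file under the encoding and quoting rules above. It assumes the wrapping character in the actual files is the ASCII double quote and that fields contain no tabs or line breaks; both should be verified against a real file. Because there is no escape logic, the reader is told not to interpret quotes at all, and the wrapping pair is stripped afterwards. The names read_ttx and strip_wrapping_quotes are hypothetical.

```python
import csv
from typing import Iterable, List


def strip_wrapping_quotes(value: str) -> str:
    """Remove one pair of wrapping double quotes, leaving inner quotes as-is."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


def read_ttx(lines: Iterable[str]) -> List[List[str]]:
    """Parse tab-delimited lines into rows of strings."""
    # QUOTE_NONE: quotes get no special treatment, since the files have no escape logic.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [[strip_wrapping_quotes(field) for field in row] for row in reader]


def read_ttx_file(path: str) -> List[List[str]]:
    """Open a file as UTF-8 and parse it."""
    with open(path, encoding="utf-8", newline="") as handle:
        return read_ttx(handle)
```

Every value is kept as a string, so no data types are inferred at read time.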

Excel:

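If you plan to open the files in Excel, save them, and ingest the saved copies, please configure Excel NOT to detect data types automatically (import the columns as text). Automatic detection can silently alter values, for example by stripping leading zeros or reformatting dates.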

Match Logic:

  • Match approach: TBD. D&B mentions that the new match logic for SmartInsider will involve exclusion criteria (additional input parameters to be sent to the API).

Snowflake Tables to Share:

  • SMARTINSIDER_RAW - for pre-DUNS review only

  • SMARTINSIDER - final

    • This table should contain all columns (including raw, audit, and all duns/match columns).
