SmartInsider - Global Insider Transaction Data
Data Dictionary:
Data assets:
smartinsider_YYYYMMDDHHMM.ttx - data files are delivered in a tab-delimited (.ttx) format. Initially, multiple files were sent per day; to help D&B combine the data into a single schema, SmartInsider now provides a single daily refresh.
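Since the delivery timestamp is embedded in the filename, it can be parsed directly. A minimal sketch in Python (the function name and regex are illustrative, not part of the spec):

```python
import re
from datetime import datetime

def parse_delivery_timestamp(filename: str) -> datetime:
    """Extract the YYYYMMDDHHMM delivery timestamp from a SmartInsider filename."""
    match = re.fullmatch(r"smartinsider_(\d{12})\.ttx", filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M")

# e.g. parse_delivery_timestamp("smartinsider_202108061200.ttx")
# -> datetime(2021, 8, 6, 12, 0)
```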
Data Ingestion:
Data files contain historical data (from August 6, 2021 to present).
All files should be ingested; they all belong to a single schema.
The work_stream_name (for source record ID) is smartinsider_altdata
Update type: delta/incremental
Update frequency: daily
Data ingestion approach: TBD
Please skip the fields “LastSignalUpdate” and “DeliveryDateTime” if you see them in any raw data files. These fields should not be included in our Snowflake tables.
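One way to enforce this exclusion during ingestion, sketched in Python (the helper name is illustrative):

```python
# Fields present in some raw files that must not reach the Snowflake tables.
SKIP_FIELDS = {"LastSignalUpdate", "DeliveryDateTime"}

def drop_skip_fields(row: dict) -> dict:
    """Return a copy of the row without the excluded fields."""
    return {field: value for field, value in row.items() if field not in SKIP_FIELDS}
```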
Files should be ingested in the order they appear on the SFTP.
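Because the embedded YYYYMMDDHHMM timestamp is zero-padded, lexicographic filename order matches chronological order, so a plain sort should reproduce the SFTP delivery order (assuming the SFTP listing follows the delivery timestamp):

```python
def order_for_ingestion(filenames: list[str]) -> list[str]:
    """Sort SmartInsider files into ingestion order.

    Zero-padded timestamps mean lexicographic order == chronological order.
    """
    return sorted(filenames)
```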
As duplicates can still be found in recent data files, we need to discuss whether records should be deduplicated before appending and, if so, which fields constitute a unique record.
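If deduplication is agreed on, a first-occurrence-wins pass over the rows could look like the sketch below. The key fields are a placeholder, since the set of fields that defines a unique record is still to be decided:

```python
def dedupe(rows: list[dict], key_fields: list[str]) -> list[dict]:
    """Keep the first occurrence of each key; drop later duplicates."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows
```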
Character encoding:
Files are UTF-8 encoded. Please make sure to read them with this encoding.
Escaping logic:
Text fields are wrapped in double quotes (“”). No escape logic is applied.
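The encoding and quoting rules above map directly onto Python's csv module: UTF-8, tab delimiter, double-quote quoting, and no escape character. A minimal reader sketch (the function name is illustrative):

```python
import csv
from typing import Iterator

def read_ttx(path: str) -> Iterator[dict]:
    """Stream rows from a tab-delimited .ttx file as dicts keyed by header."""
    with open(path, encoding="utf-8", newline="") as fh:
        # The feed applies no escape logic, so escapechar is left unset;
        # quotechar handles the double-quote wrapping of text fields.
        yield from csv.DictReader(fh, delimiter="\t", quotechar='"')
```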
Excel:
If you plan to open, save, and then re-ingest files via Excel, disable Excel’s automatic data-type detection, as it can unintentionally convert values.
Match Logic:
Match approach: TBD. D&B mentions that the new match logic for SmartInsider will involve exclusion criteria (additional input parameters to be sent to the API).
Snowflake Tables to Share:
SMARTINSIDER_RAW - for pre-DUNS review only
SMARTINSIDER - final
This table should contain all columns (including raw, audit, and all DUNS/match columns).