Two data assets:
DnB_BusIntel_*.csv.gz - BusIntel is budgets and tech totals.
TechInstall_*.csv.gz - Install data.
Data Update:
Update Frequency = Quarterly (next update date = around 08/15/2021)
Update type = Full refresh
File size:
BusIntel: 17,718,128 records; 21,077,678 KB uncompressed; 4,478,457 KB .zip compressed
TechInstall: 600,782,436 records; 84,363,824 KB uncompressed; 6,036,103 KB .zip compressed
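Because every delivery is a full refresh, the record counts above can double as a post-download sanity check. The following is a minimal sketch only. It assumes each file is a comma-delimited, gzip-compressed CSV with a single header row, and that the published counts exclude that header; neither is confirmed by D&B. The names `EXPECTED_RECORDS`, `count_records` and `check_count` are hypothetical.

```python
import csv
import gzip

# Record counts copied from the file-size notes above (2021 delivery).
EXPECTED_RECORDS = {
    "BusIntel": 17_718_128,
    "TechInstall": 600_782_436,
}


def count_records(path, has_header=True, encoding="latin-1"):
    """Count non-empty data records in a gzip-compressed CSV file.

    latin-1 is used only because it can decode any byte sequence, so
    counting never fails on special characters. It is NOT a claim about
    the real encoding of the files.
    """
    with gzip.open(path, mode="rt", encoding=encoding, newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # assumption: first row is a header
        return sum(1 for row in reader if row)


def check_count(asset, path):
    """Return (matches_expected, actual_count) for one asset."""
    actual = count_records(path)
    return actual == EXPECTED_RECORDS[asset], actual
```

A mismatch does not necessarily mean a bad file, since the counts change with each quarterly refresh; it is only a prompt to look closer.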
Character Encoding and Character Escaping:
Special characters exist in the data files. Please make sure you check for the correct character encoding before ingesting any data.
There is no character escaping in the data files.
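One way to check the encoding before ingestion is to strictly decode a sample of each file against a short list of candidates. This is a sketch under assumptions: the candidate list (`utf-8`, `cp1252`, `latin-1`) is a guess, not something D&B has confirmed, and `read_sample` and `detect_encoding` are hypothetical helper names.

```python
import codecs
import gzip

# Assumed candidates, tried in order. latin-1 accepts every byte, so it
# acts as a last-resort fallback rather than a positive identification.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def read_sample(path, size=1_000_000):
    """Read the first `size` decompressed bytes of a .gz file."""
    with gzip.open(path, "rb") as handle:
        return handle.read(size)


def detect_encoding(sample, candidates=CANDIDATE_ENCODINGS):
    """Return the first candidate that strictly decodes the sample.

    An incremental decoder with final=False is used so that a multi-byte
    character cut off at the end of the sample is not reported as an error.
    """
    for name in candidates:
        decoder = codecs.getincrementaldecoder(name)(errors="strict")
        try:
            decoder.decode(sample, final=False)
        except UnicodeDecodeError:
            continue
        return name
    return None
```

A clean decode of a sample only shows that the sample is consistent with that encoding; special characters deeper in the file can still fail, so the load step should also decode strictly.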
Data Dictionary:
D&B does not have a data dictionary for the actual datasets, because these layouts are specific to D&B. Here is the most recent data dictionary that Aberdeen had, which should serve as a reference for what MOST of the fields in the data files are:
Please see the last two tabs.
We do not need to merge the layouts. There will be two separate layouts/tables.
If a field is listed in the data dictionary but not in the data files, ignore it. We are only ingesting the fields that are present in the data files.
Also ignore the Field Order they used in the data dictionary.
If a field is listed in the data file but not in the data dictionary, we will need to keep it. However, we will need to estimate its data type from the values in the file.
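The rules above can be sketched as a small reconciliation step: keep every column in file order, flag the columns that have no dictionary entry, and guess a type for those from sampled values. This is illustrative only. The function names, the type labels (`INTEGER`, `FLOAT`, `STRING`) and the field names in the usage example are hypothetical, not taken from the real layouts.

```python
import re

_INT = re.compile(r"-?(0|[1-9]\d*)")
_FLOAT = re.compile(r"-?(0|[1-9]\d*)\.\d+")


def reconcile_fields(file_header, dictionary_fields):
    """Compare the file header with the data dictionary.

    Returns (ingest, undocumented, ignored):
      ingest       - every file column, in file order (dictionary order is ignored)
      undocumented - file columns missing from the dictionary (type must be guessed)
      ignored      - dictionary fields that do not appear in the file
    """
    known = set(dictionary_fields)
    present = set(file_header)
    ingest = list(file_header)
    undocumented = [name for name in file_header if name not in known]
    ignored = [name for name in dictionary_fields if name not in present]
    return ingest, undocumented, ignored


def guess_type(values):
    """Guess a column type from sample string values.

    Digit strings with a leading zero stay STRING on purpose, so that
    identifiers are not silently turned into numbers.
    """
    cleaned = [v.strip() for v in values if v is not None and v.strip()]
    if not cleaned:
        return "STRING"
    if all(_INT.fullmatch(v) for v in cleaned):
        return "INTEGER"
    if all(_INT.fullmatch(v) or _FLOAT.fullmatch(v) for v in cleaned):
        return "FLOAT"
    return "STRING"
```

For example, with a hypothetical header `["DUNS", "IT_BUDGET", "NEW_FLAG"]` and dictionary `["IT_BUDGET", "DUNS", "OLD_FIELD"]`, `NEW_FLAG` is reported as undocumented and `OLD_FIELD` as ignored. Guessed types should be reviewed by hand before the table definition is finalized.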
Match Logic:
No dunsification is required for this dataset as of 06/01/2021. All DUNS information is already IN the dataset. Explanation provided by D&B below: “Aberdeen is using an existing legacy process for DUNS Numbering their file. We expect this process to continue for the near future, and this DUNS numbered file will be shared with Knoema. We can defer match processing for Aberdeen to a future date - TBD.”
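Since matching is deferred and the DUNS values are taken as delivered, a light format check at load time may still be useful. A D-U-N-S number is a nine-digit identifier, so the sketch below normalizes raw values to nine digits and reports how many rows carry a usable value. Treating short digit strings as having lost leading zeros, and stripping hyphens, are assumptions; `normalize_duns` and `duns_fill_rate` are hypothetical names.

```python
import re

_UP_TO_NINE_DIGITS = re.compile(r"[0-9]{1,9}")


def normalize_duns(raw):
    """Return a nine-digit DUNS string, or None if the value is unusable.

    Assumption: values shorter than nine digits lost leading zeros
    upstream (for example by being parsed as integers), so they are
    zero-padded rather than rejected.
    """
    if raw is None:
        return None
    digits = raw.strip().replace("-", "")
    if not _UP_TO_NINE_DIGITS.fullmatch(digits):
        return None
    return digits.zfill(9)


def duns_fill_rate(values):
    """Fraction of values that normalize to a nine-digit DUNS."""
    values = list(values)
    if not values:
        return 0.0
    valid = sum(1 for v in values if normalize_duns(v) is not None)
    return valid / len(values)
```

This checks format only; it does not confirm that a number belongs to the right business, which is what the deferred match processing would cover.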