Loading Flat Files Into a Database For Analytic Insights
Here at Zuar, we help companies of all sizes and across all verticals stage data for analytics. Every company uses many different applications to manage their daily operations, but they also collect and store data in various manners. We encounter all sorts of file formats and structures out in the real world. Although they may seem simple to work with, flat files can often create interesting situations to overcome.
What is a flat file?
So let's start with some basics. What is a flat file? Here are some examples.
CSV
One of the most common types of flat file types is the CSV (Comma-Separated Values) file. In a CSV file, table data is represented by lines of ASCII text. The value of each table cell is separated by a delimiter (in this case a comma) and each row is represented by a new line.
Delimited
A delimited file is a sequential file with column delimiters. Each delimited file is a stream of records, which consists of fields that are ordered by column. Each record contains fields for one row. Within each row, individual fields are separated by column delimiters.
A delimiter is one or more characters that separate text strings. Common delimiters are commas (,), semicolon (;), quotes ( ", ' ), braces ({}), pipes (|), or slashes (/ \). When a program stores sequential or tabular data, it delimits each item of data with a predefined character.
Excel
Microsoft Excel is the leading spreadsheet software program, and can be a powerful data visualization and analysis tool. Excel spreadsheets let you create, view, edit and share your files with others quickly and easily.
JSON
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent, but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.
JSON is built on two structures:
- A collection of name/value pairs: In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
- An ordered list of values: In most languages, this is realized as an array, vector, list, or sequence.
XML
XML (Extensible Markup Language) is a markup language similar to HTML, but without predefined tags to use. Instead, you define your own tags designed specifically for your needs. This is a powerful way to store data in a format that can be stored, searched, and shared.
How to Extract Data From a Flat File
There are many different ways to import a CSV file, since each type of database is different. For example, here's how to import CSV into a PostgreSQL table. Below are more examples.
Using SQL Server Import and Export Wizard
To start the SQL Server Import and Export Wizard, use one of the following methods shown in this How to import/export data to SQL Server using the SQL Server Import and Export Wizard page.
Starting the SQL Server Import and Export Wizard from SQL Server Management Studio (SSMS)
- In SQL Server Management Studio, connect to an instance of the SQL Server Database Engine
- Expand 'Databases'
- Right-click a database
- Point to 'Tasks'
- Click one of the following options: 'Import Data' or 'Export Data'
Using an API
An API is a type of web service. When it uses REST API software architecture, the web service is called a RESTful web service.
Google Cloud APIs in the Google Cloud Platform use a JSON REST API interface or a gRPC interface to make API calls. You can also use HTML commands in a REST API.
Manually Import With an API
- Open the application from which to extract data.
- Select 'Consolidation', then 'Extract', and then 'Data Source'.
- For 'Extract Destination Type', select 'Flat File'.
- For 'Extract Format' select a format: 'Without Header' or 'With Header'. Then select the 'Delimiter' option.
Flat File Q&A
Why use a flat file at all?
Many software tools, like QuickBooks, export data in flat file formats. So utilizing the data saved within those systems will require the use of flat files when a more robust integration isn't in place.
Why are ANY of these various file formats differences important?
As companies expand, departments venture off into the own methods of data collection: HR has its own data sources, Accounting has others, Sales has it's own as well. Most CRM tools export into one of the above mentioned file formats, but what you have done ultimately is create a company-wide data silo issue. Each department has its own way of naming variables and how they format each of them. What if you needed combined analytics from HR, Accounting, Sales, etc... could you easily glean insight from this data?
McKinsey has reported the following, which speak to these issues...
- 97% of data leaders rely on multiple business intelligence (BI) and reporting apps; 1 in 4 have more than 10 apps
- 35% use five or more data warehouses
- Nearly a quarter utilize 10 or more data service providers
What are the dangers of not understanding data types and formats?
Selecting the right data types for your tables, stored procedures, and variables not only improves performance by ensuring a correct execution plan, but it also improves data integrity by ensuring that the correct data is stored within a database. Remember, we are collecting data in order to make decisions with these insights gleaned from that data, not just to collect data for the sake of having it. Decisions you make at the beginning of a data project will follow or haunt you for the lifetime of that data, do not become a data driven statistic.
Beyond Simple Flat File Loading
Gleaning insights from data, whether flat files or any other type of data, is critical to attain actionable information to run your organization successfully.
If your organization is having any difficulty getting a handle on its data strategy and analytics, it's NOT alone! Zuar's products and services can help make your life easier:
- Zuar Runner: Fully automate the regular import of your flat files and other data sources. Runner gets data flowing from hundreds of potential sources into a single destination for analytics.
- Zuar Portal: An easy way to provide branded Tableau dashboards. Monetize your data or provide secure access outside of corporate firewalls.
- Data Services: Zuar can help your organization formulate and implement a wholistic data strategy, taking you from raw data to robust analytics and visualization.