A brief explanation of data sources and answers to some common questions our clients typically have about them.
No matter what industry you’re in, chances are you’re generating data from multiple sources all over the place. That can be a major pain when you’re trying to gather the information you need to determine what’s working and what’s not, or simply to present results to the necessary stakeholders. Combining all of your different data sources into a central location is one of the biggest challenges businesses face today.
Your data sources can be on-prem or in the cloud and may include some combination of a finance and accounting system (QuickBooks, NetSuite, etc.), a CRM system (Salesforce, HubSpot, etc.), a manufacturing system (Katana, FactoryLogix, etc.), a POS system (Square, Heartland, etc.), an inventory management system (Fishbowl, Orderhive, etc.), Excel spreadsheets, and/or a variety of other types of software. These are all sources full of data, and your end goal is to bring all of that juicy data together and turn it into useful information that will ultimately guide you to the best business decisions possible.
We’re typically asked the same five questions about data sources and how they can be used:
- Can I get access to the data?
Usually you can, but how you get it depends on your source system. What you’re looking for is an access point. This could be as simple as downloading a CSV file onto a shared drive or into data lake storage. Or it could involve programmatic access through the source system’s application programming interface (API).
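To make the two kinds of access point a little more concrete, here’s a minimal Python sketch. Everything in it is made up for illustration: the invoice columns, the sample values, and the assumption that the system’s API returns a JSON list of records. A real source system will have its own export layout and its own API, so treat this as a sketch of the idea rather than how any particular product works.

```python
import csv
import io
import json

# Hypothetical CSV export, as it might look after landing on a shared drive.
CSV_EXPORT = """invoice_id,customer,amount
1001,Acme,250.00
1002,Globex,99.50
"""

# Hypothetical JSON body, as an API for the same system might return it.
API_RESPONSE = (
    '[{"invoice_id": "1001", "customer": "Acme", "amount": "250.00"},'
    ' {"invoice_id": "1002", "customer": "Globex", "amount": "99.50"}]'
)


def rows_from_csv(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_api(body):
    """Parse a JSON array of records into a list of dicts."""
    return json.loads(body)


csv_rows = rows_from_csv(CSV_EXPORT)
api_rows = rows_from_api(API_RESPONSE)

# Both access points end up delivering the same records.
print(csv_rows == api_rows)
```

Either way, the goal is the same: get the records out of the source system and into a shape you can work with.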
- Is the data of good quality?
Data quality (DQ, and no, I’m not talking about the ice cream 🍦😂) is an extremely important concept in data management. If the data is not in the correct format, or is missing values in certain rows or columns, for example, then it may need to go through cleansing or transformation steps before it can be reliably turned into robust, usable information.
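As a rough illustration of what a basic quality check can look like, here’s a small Python sketch that flags missing values and badly formatted dates. The column names, the sample rows, and the choice of `YYYY-MM-DD` as the “correct” date format are all assumptions made for this example; your own rules will depend on your data.

```python
from datetime import datetime

# Hypothetical rows; the column names and values are invented for this example.
rows = [
    {"order_id": "1", "order_date": "2023-01-15", "amount": "19.99"},
    {"order_id": "2", "order_date": "15/01/2023", "amount": "5.00"},  # wrong date format
    {"order_id": "3", "order_date": "2023-01-17", "amount": ""},  # missing amount
]


def find_issues(row):
    """Return a list of quality problems found in a single row."""
    issues = []
    # Flag any column whose value is absent or blank.
    for column, value in row.items():
        if value is None or value.strip() == "":
            issues.append(f"missing {column}")
    # Flag dates that are present but not in the assumed YYYY-MM-DD format.
    date_value = row.get("order_date") or ""
    if date_value.strip():
        try:
            datetime.strptime(date_value, "%Y-%m-%d")
        except ValueError:
            issues.append("bad order_date format")
    return issues


# Map each order to its list of problems, then keep only the clean rows.
report = {row["order_id"]: find_issues(row) for row in rows}
clean_rows = [row for row in rows if not report[row["order_id"]]]

print(report)
```

Rows that fail a check aren’t necessarily thrown away. They’re the ones that would go through a cleansing or transformation step first.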
- What’s the best way to give everyone access to the data?
This can be a tricky question, and the answer depends on your information technology (IT) architecture. If that architecture is too slow, nonexistent, or full of red tape, you’ll need a dependable self-service approach (especially for data sources that are in the cloud). File-sharing solutions like Dropbox, Box, or even a shared drive on your network can help. Worst-case scenario, there’s always email, but I think you can imagine the inherent problems that go along with that 😏
- Who owns the data?
It depends. Data can be owned at a departmental level, an enterprise level (meaning the entire organization), or an individual level (think of a business analyst). Not all data is meant for all audiences, but the originating source of the data should always have a responsible person who’s a subject matter expert on that data. The systems used to deliver, transform, and report on the data can then let the owner secure and distribute it as needed.
- What is data granularity?
Granularity is the level of detail at which the data is recorded. For example, data can be created at the day, week, month, quarter, or year level, depending on how often the data is needed. This is typical of most transactional data, like accounting and finance, but event-based data (like internet of things (IoT) data) can go much deeper, with events coming in at the second or even nanosecond level.
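Here’s a small Python sketch of what moving between levels of granularity looks like in practice: daily sales figures rolled up to monthly totals. The dates and amounts are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily-grain sales records: (day, amount).
daily_sales = [
    (date(2023, 1, 30), 120.0),
    (date(2023, 1, 31), 80.0),
    (date(2023, 2, 1), 50.0),
]


def roll_up_to_month(records):
    """Sum daily amounts into one total per (year, month)."""
    totals = defaultdict(float)
    for day, amount in records:
        totals[(day.year, day.month)] += amount
    return dict(totals)


monthly_sales = roll_up_to_month(daily_sales)
print(monthly_sales)
```

Notice that this only works in one direction. You can always roll fine-grained data up to a coarser level, but you can’t recover the daily figures from a monthly total, which is worth keeping in mind when deciding what level to capture data at.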
If you thought data sources were all just Excel files and CSVs (although these are very common data sources and structures), you were sorely mistaken. There are lots of more advanced data source types and structures, like JSON, SQL databases, Parquet, and Avro (don’t worry, we’ll go into these later in a more technical article).
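For a quick taste of how a structure like JSON differs from a flat spreadsheet, here’s a minimal Python sketch. The order document below is entirely hypothetical. It shows one order with nested line items, flattened into the kind of rows you’d see in a CSV.

```python
import json

# Hypothetical JSON document: one order with nested customer and line items.
ORDER_JSON = """
{
  "order_id": 42,
  "customer": {"name": "Acme", "region": "West"},
  "items": [
    {"sku": "A-1", "qty": 2},
    {"sku": "B-7", "qty": 1}
  ]
}
"""


def flatten_order(order):
    """Return one flat row per line item, repeating the order-level fields."""
    return [
        {
            "order_id": order["order_id"],
            "customer_name": order["customer"]["name"],
            "sku": item["sku"],
            "qty": item["qty"],
        }
        for item in order["items"]
    ]


flat_rows = flatten_order(json.loads(ORDER_JSON))
for row in flat_rows:
    print(row)
```

A single JSON document can hold what a spreadsheet would need several repeated rows (or several tabs) to express, which is part of why these formats show up so often in cloud systems.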
If you’d like to learn more about bringing all of your data sources together, contact one of our data experts.