
I have heard of DBAs but what’s a Database “Architect”?
A DBA traditionally handles administrative tasks such as creating security accounts to access and manipulate the data, handling data migration between different states (Development / Quality Assurance / Production) , and handling higher-level policies. A Data Architect is someone who knows the data intimately and builds the data system being used. In a growing number of businesses this is an official separation of duties due to laws such as Sarbanes Oxley.
The Data Arch builds the database tables and relationships which are normalized (3NF) for speed and efficiency, creates triggers for monitoring the specific actions on those tables and then provides stored procedures and views for the users to access the data.
Aside from being a senior programmer, I have spent years in the position of data architect providing all of the above services as well as creating and managing a data hub layer that provides a single source to a business-logic level of programming objects that can be accessed via direct instantiation or remotely and return WSDL, JSON, or simply pass through structured or queried data. This has helped prevent SQL injection attacks, siloed data systems to prevent attacks through our database to other connected systems, and provided a controlled means to access our data across our entire business platform while preserving the flexibility of making unlimited changes to the data structure as growth demands it.
That sounds like a snoozer of a job!
On the contrary, I love database work! The position of Data Arch which I have occupied has been unique in that for our public websites, a majority of our data is a reflection of business data from across several business divisions, acting as a data warehouse and using our websites to report on that information as it is needed.
While it is not an officially orchestrated data warehouse with data marts or snowflake data constructs, it has provided me the opportunity to perform data mining and extract information about our customer’s interests, locale, and a number trends that lend to larger business decisions and market predictions. It is challenging at times but deeply rewarding.
Why do websites need such complex data administration and handling?
When I first started programming I didn’t realize it at the time but I treated data like flat records. For example, a user table would have a first name, last name and phone number. But what I didn’t realize was that I was wasting space. What if the user didn’t have a phone number? Empty field. What if that user had more than one phone number? Do I keep adding phone number fields? What if another table called “Billing accounts” also needed a Name and address. Sure I could link the two but if the user’s information changed in one table, what would guarantee it would change in other tables? When the same data is repeated in multiple tables it becomes a nightmare to keep up with.
Then there was how the data was accessed. We would write adhoc calls directly to tables from within the code. What if a field property would change, be removed, or if other fields were added. Then I would have to hunt down, across dozens of sites, all the places where the table was being accessed. What a headache!
Data corruption, wasted space, highly inefficient data structures and a herding of adhoc queries was a way of life and it's effects were reflected back on the users of the websites supported by that data. On a small scale it is manageable but on a large scale it is nearly impossible to control. In recent years I have unified access to data into a programming hub with a business logic layer which then connects to views and stored procedures. This reinforces code-reuse and makes management of large data systems more centralized. The user accesses the data as if it was a method in a class passing arguments as parameters and getting back query structures. More importantly this allows me to grow and change the database and so long as the output to the user is the same, the users wouldn't need to be bothered with the changes. The structure of the database is transparent to the user and as a result, can be changed with relative ease.