Data analysis has become a twisted Möbius strip, looping back on itself to influence not only how we look at data but also how we handle the data itself. This makes enhancing content anything but easy.
Despite the rapid evolution of the processing and algorithmic tools available today, organizations struggle to harvest the insights these technologies are meant to generate. This is largely due to poor data management: data that has not been managed and cleaned is of little use for analysis, no matter how good the user interface is. Unstructured data, in particular, resists uniform management. Its many formats and its rapid, continuous generation produce an extremely diverse and fast-changing content ecosystem. Nevertheless, unstructured data is essential to daily business productivity and necessary to meet legal and regulatory requirements. It also has immense potential for business insight.
At the heart of this struggle are Records and Information Management (RIM) professionals, whose roles have changed rapidly as data volumes have grown and paper records have dwindled. A decade ago, legal changes prompted a cookie-cutter approach to information management, splitting data into separate systems as each new need arose. The paradoxical result of these data management silos is an environment that is even more expensive and harder to manage. With guidelines and objectives now cemented in the Managing Government Records Directive (M-12-18), agencies face the daunting task of implementing seemingly "simple" requirements that are actually quite complicated, given how disparate their data is.
We have entered a new era of RIM, with big data at the forefront. Paper is out; analytics are in. RIM professionals are eager, rather than apprehensive, to use partial automation to tackle volumes of data they could never classify by manual means alone. But there is always a layer of frustration. Senior organizational leaders often champion a holistic, unified data management strategy, but those in the trenches know that the nuts and bolts of implementation can become a nightmare, bringing upfront costs and delays.
Several major challenges were widely discussed at the recent ARMA 2015 conference for information governance professionals. A hot topic was RIM's role in organization-wide governance, i.e., whether and where RIM sits at the table, as a regular participant or as the host. I suspect the answer will depend on the dynamics of each organization: the influence RIM has established within it, how much RIM is willing to take on, and the extent to which the organization will allow RIM teams to lead such an effort.
Other challenges faced by real-world RIM teams also surfaced during the conference. The deletion debate remained heated, with data mining and cheap storage giving new vigor to the "hoarding" school of thought. Deletion was only one of the challenges discussed, however; several common themes emerged, all with big implications for RIM.
Challenge #1: Decide what to remove
The rise of analytical capabilities has created an ideological impasse between organizational units. More traditional, risk-averse units such as RIM and legal teams strive to remove outdated content as soon as it is legally permitted to do so. More proactive factions, such as marketing and management, want to retain as much data as possible to leverage in analytics: the “more is better” approach.
It is a paradox. Analytics practice seeks to amass data, while traditional data governance seeks to systematically weed out unnecessary content. Regardless of where an organization sits on the "risk tolerance" spectrum, the first step is to decide what to cut, and when. But to eliminate even some data, the organization must essentially "touch" every piece of content to decide whether to discard it or keep it.
The most elegant solution would be a single environment where all unstructured content is centrally managed: where company-wide retention policies can be scheduled and executed consistently, and no duplicate copies linger in the dark.
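To make the idea of a single, centrally managed environment more concrete, here is a minimal sketch, assuming a hypothetical catalog in which duplicates are detected by content hash and retention periods are keyed by record category. The category names, retention periods, and the `Item`, `build_catalog`, and `due_for_disposal` names are illustrative assumptions, not any particular product's API.

```python
from dataclasses import dataclass
from datetime import date, timedelta
import hashlib

# Hypothetical retention periods per record category (illustrative values only).
RETENTION = {
    "invoice": timedelta(days=7 * 365),
    "marketing": timedelta(days=2 * 365),
}

@dataclass
class Item:
    item_id: str
    category: str
    created: date
    content: bytes
    legal_hold: bool = False

def content_key(item: Item) -> str:
    # Identical bytes yield the same SHA-256 digest, so duplicates share one key.
    return hashlib.sha256(item.content).hexdigest()

def build_catalog(items):
    """Keep one master copy per unique content; return (masters, duplicate ids)."""
    masters, duplicates = {}, []
    for item in items:
        key = content_key(item)
        if key in masters:
            duplicates.append(item.item_id)
        else:
            masters[key] = item
    return list(masters.values()), duplicates

def due_for_disposal(items, today: date):
    """Return ids of master copies whose retention period has expired."""
    masters, _ = build_catalog(items)
    due = []
    for item in masters:
        period = RETENTION.get(item.category)
        if period is None or item.legal_hold:
            continue  # unknown category or legal hold: leave for human review
        if item.created + period <= today:
            due.append(item.item_id)
    return due

items = [
    Item("a", "marketing", date(2012, 1, 1), b"promo"),
    Item("b", "marketing", date(2012, 1, 1), b"promo"),  # duplicate of "a"
    Item("c", "invoice", date(2012, 1, 1), b"inv"),
    Item("d", "marketing", date(2012, 1, 1), b"x", legal_hold=True),
]
print(due_for_disposal(items, date(2015, 10, 1)))
```

Because every item passes through one catalog, the duplicate "b" never gets a separate retention decision, and a single schedule is applied to everything; in a siloed setup, each system would have to run (and agree on) its own version of this logic.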
Challenge #2: Designate consistent access privileges
The dispersion of data across departments and silos makes it hard to enforce consistent access privileges for users. Within a single application or silo, assigning the appropriate access rights is usually straightforward. This breaks down, however, as more silos are added. An individual who has the right level of access in one system may be completely locked out of a related platform or, conversely, may be granted far too broad access. Coordinating permissions across platforms often requires manual updates, which quickly become unworkable because of update lag and frequent human error. To complicate matters, permissions often depend on variables that change over time, such as timestamps, and the same document can carry inconsistent policies in different applications. In a siloed environment, access privileges mean little.
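One way to see why silos undermine access control is to compare permissions for the same document across systems. The sketch below assumes a hypothetical export format, `{silo: {doc_id: {user: level}}}`, and treats a user missing from a silo that holds the document as having no access ("none"). The silo names, access levels, and `find_conflicts` function are illustrative assumptions.

```python
def find_conflicts(silos):
    """Flag (user, doc) pairs whose access level differs between silos.

    silos: {silo_name: {doc_id: {user: level}}} (hypothetical export format).
    A user absent from a silo that holds the document is treated as "none".
    """
    conflicts = {}
    doc_ids = {doc for docs in silos.values() for doc in docs}
    for doc in sorted(doc_ids):
        # Only silos that actually hold a copy of this document are compared.
        holders = {name: docs[doc] for name, docs in silos.items() if doc in docs}
        users = {user for acl in holders.values() for user in acl}
        for user in sorted(users):
            levels = {name: acl.get(user, "none") for name, acl in holders.items()}
            if len(set(levels.values())) > 1:
                conflicts[(user, doc)] = levels
    return conflicts

silos = {
    "ecm": {"contract-17": {"alice": "read", "bob": "edit"}},
    "share": {"contract-17": {"alice": "edit"}},
}
for (user, doc), levels in find_conflicts(silos).items():
    print(user, doc, levels)
```

Here Alice is over-privileged in one system and Bob is locked out of the other. A check like this only detects drift after the fact; keeping permissions in one place avoids creating it.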
Challenge #3: Maintain complete audit trails
Many compliance and legal uses of data require a complete log of the changes, modifications, and ownership of a given piece of data, with no gaps in the chain of custody. These audit trails are essential for a legal defense, but they are nearly impossible to maintain when copies of data reside in multiple silos. An item's audit trail only covers actions taken on the platform that generates it. A standard Word document may be declared a record and copied into a separate records tool, such as an enterprise content management system; it now has multiple copies and identities. Inevitably, the audit trail for each copy will differ. To know the complete history of a piece of data definitively, there must be a single point of reference: one "master" environment holding a single authoritative copy of each relevant item, even if duplicates exist elsewhere.
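A minimal sketch of such a "master" environment, assuming a hypothetical registry in which copies held in other silos are registered as aliases of one master record, so that events logged against any copy land in a single append-only history. The `AuditRegistry` class, its methods, and the identifiers are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class MasterRecord:
    master_id: str
    aliases: set = field(default_factory=set)   # ids of copies held in other silos
    events: list = field(default_factory=list)  # (timestamp, actor, action, source_id)

class AuditRegistry:
    """Hypothetical single environment that owns one history per record."""

    def __init__(self):
        self._records = {}
        self._alias_to_master = {}

    def register(self, master_id):
        self._records[master_id] = MasterRecord(master_id)
        self._alias_to_master[master_id] = master_id

    def add_alias(self, master_id, copy_id):
        self._records[master_id].aliases.add(copy_id)
        self._alias_to_master[copy_id] = master_id

    def log(self, source_id, actor, action, when):
        # Events on any copy are appended to the master's history, not the copy's.
        master = self._records[self._alias_to_master[source_id]]
        master.events.append((when, actor, action, source_id))

    def history(self, any_id):
        # Any id (master or copy) resolves to the same chronological trail.
        master = self._records[self._alias_to_master[any_id]]
        return sorted(master.events)

reg = AuditRegistry()
reg.register("doc-1")
reg.add_alias("doc-1", "ecm-9")  # the copy declared as a record in a separate tool
reg.log("ecm-9", "carol", "declared record", datetime(2015, 9, 2))
reg.log("doc-1", "alice", "edited", datetime(2015, 9, 1))
for event in reg.history("ecm-9"):
    print(event)
```

Asking for the history of either identity returns the same combined trail, which is the property the separate per-silo logs cannot provide.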
Challenge #4: Scale up classification
Today’s volume of data simply cannot be efficiently classified manually. The ever-increasing volume of information, changing policies, and legal precedents that hold almost any data to be discoverable have completely redefined what constitutes a record. Even if a company establishes defensible policies for the expedited removal of “unwanted” content, the problem remains that every piece of data must be evaluated in some way to determine its status.
And since every piece of data in the organization needs to be "touched" at some point, a mix of automatic and manual classification is currently the only feasible approach. The exact division of roles between human and machine will depend on each organization's needs. For now, it is generally accepted that records professionals should do what they do best: assign policies to complicated or ambiguous items. Autoclassification can then handle easily identifiable items, such as those stored at specific URLs, those carrying certain metadata, or those created by predefined key people. For this to work, however, all data must flow through a single classification engine. Separate systems, each with its own classification capabilities, cannot produce consistent results, leaving irreconcilable gaps in accuracy.
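The division of labor described above can be sketched as a single rule-based engine: items matching simple rules (URL prefix, metadata value, or a key author) are classified automatically, and everything else is routed to a review queue for records professionals. The rules, example.com URLs, policy names, and functions below are hypothetical placeholders, not a real classification product's configuration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ContentItem:
    url: str
    author: str
    metadata: dict = field(default_factory=dict)

# Hypothetical rules; real ones would come from the organization's retention schedule.
URL_PREFIX_RULES = {"https://intranet.example.com/finance/": "financial-record"}
METADATA_RULES = {("doctype", "invoice"): "financial-record"}
KEY_PEOPLE = {"cfo@example.com": "executive-correspondence"}

def auto_classify(item: ContentItem) -> Optional[str]:
    """Return a policy for easily identifiable items, or None for human review."""
    for prefix, policy in URL_PREFIX_RULES.items():
        if item.url.startswith(prefix):
            return policy
    for (key, value), policy in METADATA_RULES.items():
        if item.metadata.get(key) == value:
            return policy
    return KEY_PEOPLE.get(item.author)  # None if the author is not a key person

def classify_all(items):
    """Single engine: every item passes through the same rules in the same order."""
    classified, review_queue = [], []
    for item in items:
        policy = auto_classify(item)
        if policy is None:
            review_queue.append(item)  # ambiguous: left to records professionals
        else:
            classified.append((item, policy))
    return classified, review_queue

items = [
    ContentItem("https://intranet.example.com/finance/q3.xlsx", "dave@example.com"),
    ContentItem("https://share.example.com/x.docx", "cfo@example.com"),
    ContentItem("https://share.example.com/y.docx", "erin@example.com", {"doctype": "memo"}),
]
classified, review_queue = classify_all(items)
print([policy for _, policy in classified], len(review_queue))
```

Because one function applies one ordered rule set, the same item always receives the same policy; two silos with their own rule sets could classify identical content differently.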