Organizational data has three primary areas: levels, formats, and granularities.

Data Marts

W.H. Inmon, Daniel Linstedt, in Data Architecture: A Primer for the Data Scientist, 2015

Relational Database Design

When the granular data is designed using the relational data model, the data warehouse is prepared to serve many different perspectives of data as depicted in Figure 3.4.2.

Figure 3.4.2.

From a practical standpoint, the granular data found in the data warehouse serves many purposes. But many users want the granular data to be summarized or otherwise aggregated in order to do their analysis. While the data warehouse serves as a foundation of data, in order to serve the different needs of the users, it is more convenient for end users to look at their data in a less granular manner. Furthermore, different users have different perspectives. Marketing wants to look at their data one way. Accounting wants to look at their data another way. Sales has yet a different understanding of data. And finance has their own unique understanding of data.
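As a minimal illustration of this idea (a hypothetical sales table and department views, not drawn from the chapter), the same granular records can be rolled up one way for marketing and another way for finance:

```python
import pandas as pd

# Hypothetical granular sales records as they might sit in the warehouse.
sales = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "region":   ["East", "East", "West", "West"],
    "product":  ["A", "B", "A", "A"],
    "month":    ["2024-01", "2024-01", "2024-01", "2024-02"],
    "revenue":  [120.0, 80.0, 200.0, 150.0],
    "cost":     [70.0, 50.0, 110.0, 90.0],
})

# Marketing view: revenue summarized by region and product.
marketing = sales.groupby(["region", "product"], as_index=False)["revenue"].sum()

# Finance view: monthly margin derived from the same granular rows.
finance = (sales.assign(margin=sales["revenue"] - sales["cost"])
                .groupby("month", as_index=False)["margin"].sum())

print(marketing)
print(finance)
```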

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9780128020449000180

Data Architecture: A High-Level Perspective

W.H. Inmon, ... Mary Levins, in Data Architecture (Second Edition), 2019

The System of Record

The integrity of the data in data architecture is established by what can be called the “system of record.” The system of record is the one place where the value of data is definitively established. Note that the system of record applies only to detailed granular data. The system of record does not apply to summarized or derived data.

In order to understand the system of record, think of a bank and your bank account balance. For every account in every bank, there is a single system of record for account balance. There is one and only one place where the account balance is established and managed. Your bank account balance may appear in many places throughout the bank. But there is only one place where the system of record is kept.

The system of record moves throughout the data architecture that has been described.

Fig. 8.4.2 depicts the movement of the system of record.

Fig. 8.4.2. The system of record.

Fig. 8.4.2 shows that as data are captured, especially in the online environment, the data have their first occurrence of the system of record. Location 1 shows that the system of record for current valued data is found in the online environment. You can think of calling the bank and asking for your account balance that exists right now, and the bank looks into its online transaction processing environment to find your account balance right now.

Then one day, you have an issue with a bank transaction that occurred 2 years ago. Your lawyer requires you to go back and prove that you made a payment 2 years ago. You can’t go to your online transaction processing environment. Instead, you go to your record in the data warehouse. As data age, the system of record for older data moves to the data warehouse. That is location 2 in the diagram.

Time passes and you get audited by the IRS. This time, you have to go back 10 years to prove what financial activity you had a decade ago. Now, you go to the archival store in big data. That is location 3 in the diagram.

So, as time passes, the system of record for data changes in data architecture.

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9780128169162000292

Operational Analytics

W.H. Inmon, ... Mary Levins, in Data Architecture (Second Edition), 2019

Data Marts

The way that the data warehouse serves the different communities is through the creation of data marts. Fig. 12.1.11 shows that the data warehouse serves as a basis for data in the data marts.

Fig. 12.1.11. Data marts are fed from the data warehouse.

In Fig. 12.1.11, it is seen that there are different data marts for different organizations. The data warehouse and its granular data serve as a basis for the data found in the data marts. The granular data in the data warehouse are summarized and otherwise aggregated into the form that each data mart requires. Note that each data mart and each organization will have their own way of summarizing and aggregating data. Stated differently, the data mart for finance will be different from the data mart for marketing.

Data marts are best based on the dimensional model, as seen in Fig. 12.1.12.

Fig. 12.1.12. Star joins.

In the dimensional model are found fact tables and dimension tables. Fact tables and dimension tables are joined together to form what is known as the “star” join. The star join is designed to be optimal for the informational needs of a department.
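A small sketch of a star join, using a hypothetical fact table and two dimension tables (pandas merges stand in for the SQL joins a data mart would normally use):

```python
import pandas as pd

# Dimension tables describe the "who/what/where" of each sale.
dim_store = pd.DataFrame({"store_id": [1, 2], "store_type": ["mall", "street"]})
dim_product = pd.DataFrame({"product_id": [10, 20], "category": ["grocery", "apparel"]})

# The fact table holds the granular, measurable events, keyed by dimension ids.
fact_sales = pd.DataFrame({
    "store_id":   [1, 1, 2],
    "product_id": [10, 20, 10],
    "amount":     [25.0, 60.0, 40.0],
})

# The star join: attach each dimension to the fact table by its key.
star = (fact_sales.merge(dim_store, on="store_id")
                  .merge(dim_product, on="product_id"))

# A department-level question: sales by store type and product category.
print(star.groupby(["store_type", "category"])["amount"].sum())
```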

The data marts and the data warehouse combine to form an architecture, as seen in Fig. 12.1.13.

Fig. 12.1.13. Data marts and the dimensional model.

In Fig. 12.1.13, it is seen that the integration of data occurs as data are placed in an integrated, historical fashion in the data warehouse. Once the foundation of data is built, the data are passed into the different data marts. As data are passed into the data marts, data are summarized or otherwise aggregated.

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9780128169162000371

Protecting Cardholder Data

Dr. Anton A. Chuvakin, Branden R. Williams, in PCI Compliance (Second Edition), 2010

File- or Folder-Level Encryption

File- or folder-level encryption (or file system level) is an encryption system where specific folders, files, or volumes are encrypted by a third-party software package or a feature of the file system itself.

Advantages:

More granular control over what specific information needs to be encrypted can be accomplished. Card data files that you need to encrypt can be stored in a particular folder or volume, and data that does not need to be protected can be stored elsewhere. For example, some smaller organizations that do periodic billing actually use this method to encrypt all the card numbers between billing runs, thus satisfying PCI DSS requirements.

Many file-level encryption products allow you to integrate access-level restrictions. This allows you to manage who has access to what. This helps satisfy data protection and access control.

Some file-level encryption systems offer the capability to track who attempts to access a file and when. File-level encryption products allow you to add granular data logging, which helps satisfy Requirement 10 about logging access to card data.

When there is a need to move the data, it can be encrypted at the file level and then moved off the storage location; one needs to make sure that the unencrypted copy is removed. This maintains the confidentiality of the data when it is moved to a backup tape. Backup tapes are known to have been used by attackers to compromise massive amounts of card data. Even accidental tape “loss” has caused companies embarrassment and triggered costly disclosure procedures.

File encryption is less invasive to a database than column-level encryption. The schema of the database does not need to be modified and the access of data by authorized personnel (based on access control) is not hindered when querying and other management activities take place. This is an aspect of availability, one of the three tenets of the CIA triad; even though PCI DSS does not contain availability requirements, your business clearly has them.

File-level encryption tends to introduce less resource overhead and thus has less impact on system performance. Modern operating systems can perform efficient file encryption on the fly.

Disadvantages:

Backup processes can suffer performance issues, especially with relational databases.

Extra resources for key management are required since more keys need to be managed.

The Windows Encrypting File System (EFS) on Microsoft operating systems is the primary example of such technology. Remember, if you deploy this type of encryption, you will need to ensure that the decrypting credentials are different from your standard Windows login credentials. Additional encryption products can be used as well. Here are some of the common free or open-source file encryption products found in wide use:

GNU Privacy Guard (GnuPG or GPG) from the Free Software Foundation can be found at www.gnupg.org. It performs efficient file encryption using symmetric and public key cryptography and works on Windows and Unix operating systems.

TrueCrypt is another free, open-source disk encryption tool for Windows, Linux, and even Mac OS X. It can be found at www.truecrypt.org. It can perform file, folder, and full-disk encryption.

AxCrypt (www.axantum.com/AxCrypt/) is another choice for Windows systems. It is also free and open-source.

Encrypting individual card data files is free and easy with the above tools. As with other domains, PCI DSS never mandates individual tools or vendors.
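As a rough illustration of file-level encryption in general (not of the specific products named above), the sketch below uses Python's cryptography package; the file names are hypothetical, and a real deployment would still need PCI DSS-compliant key management:

```python
# A minimal sketch of file-level encryption using Python's "cryptography"
# package -- an illustration only, not one of the products listed above.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, store and manage this key securely
cipher = Fernet(key)

with open("card_data.csv", "rb") as f:          # hypothetical cardholder data file
    plaintext = f.read()

with open("card_data.csv.enc", "wb") as f:      # write the encrypted copy
    f.write(cipher.encrypt(plaintext))

# The unencrypted original must then be securely removed before the encrypted
# copy is moved off the storage location (e.g., onto a backup tape).
```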

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9781597494991000118

Data Architecture – A High-Level Perspective

W.H. Inmon, Daniel Linstedt, in Data Architecture: A Primer for the Data Scientist, 2015

The System of Record

The integrity of the data in data architecture is established by what can be called the “system of record.” The system of record is the one place where the value of data is definitively and singularly established. Note that the system of record applies only to detailed granular data. The system of record does not apply to summarized or derived data.

In order to understand the system of record, think of a bank and your bank account balance. For every account in every bank, there is a system of record for account balance. There is one and only one place where the account balance is established and managed. Your bank account balance may appear in many places throughout the bank. But there is only one place where the system of record is kept.

The system of record moves throughout the data architecture that has been described and is depicted in Figure 6.4.2.

Figure 6.4.2.

Figure 6.4.2 shows that as data is captured, especially in the online environment, the data has its first system of record there. Location 1 shows that the system of record for current valued data is found in the online environment. You can think of calling the bank and asking for your account balance that exists right now, and the bank looks into its online transaction processing environment to find your account balance right now.

Then one day you have an issue with a bank transaction that occurred two years ago. Your lawyer requires you to go back and prove that you made a payment two years ago. You can’t go to your online transaction processing environment. Instead you go to your record in the data warehouse. As data ages, the system of record for older data moves to the data warehouse. That is location 2 in the diagram.

Time passes and you get audited by the Internal Revenue Service (IRS). You have to go back 10 years in time to prove what financial activity you had. Now you go to the archival store in Big Data. That is location 3 in the diagram.

So as time passes the system of record for data changes in data architecture.

Another way to look at the data found in data architecture is in terms of what types of questions are answered in different parts of the architecture. Figure 6.4.3 shows that different types of questions are answered in different parts of the architecture.

Figure 6.4.3.

Figure 6.4.3 shows that in location 1, detailed, up-to-the-second questions are answered. Here is where you ask for up-to-the-second, accurate account balance information.

Location 2 indicates that in the data warehouse you look at the historical activity that has passed through your bank account.

Location 3 is the ODS. In the ODS you find up-to-the-second, accurate, integrated information. In the ODS you look across all your account information – your loans, your savings accounts, your checking account, your IRA, and so forth.

In location 4 are the data marts. The data marts are where bank management combines your account information with thousands of other accounts and looks at the information from the perspective of a department. One department looks at the data in the data marts from an accounting perspective. Another department looks at the data from the perspective of marketing, and so forth.

There is yet another perspective of data afforded by the data found in location 5. Big Data is found in location 5. There is deep history there as well as a variety of other data. The kinds of analysis that can be done in location 5 are miscellaneous and diverse.

Of course, the data and the types of analysis that can be done differ from industry to industry. A bank has been used here for the purpose of making the example clear, but other industries have their own types of usage and information.

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9780128020449000349

Analytics Implementation Methodology

Nauman Sheikh, in Implementing Analytics, 2013

Analytics Variables

The variables are where the art and science come together. What creative variables can you invent that may have a high analytic value? There is a difference between variables and fields. In data warehouse systems, when an analyst goes to a source system to look for data that is needed by the business for reporting and analysis, two best-practice rules are followed:

The first rule deals with trying to get the most detailed level or atomic data possible. If the business has asked for daily sales data per customer per store, the source system usually has the line-item details of what the customer bought and how much it was. The most granular data would be at the line-item level in the basket of the customer, and that is recommended to be sourced and pulled into the data warehouse.

The second rule deals with a principle called triage when sourcing data. The data that is needed by the data warehouse, driven by business requirements, is priority 1. Then there are operationally important fields like CHNG_UID, which is a field that captures the user ID of the user who last changed a record; these are priority 3 fields. Everything in between is priority 2. The best practice is to pick up everything with priority 1 and 2 while you are in there building an extraction process. It may come in handy later.

These rules are why the data warehouse is supposed to have more fields than seem necessary for the analytics use. Going back later and getting the additional fields is far more difficult than keeping the additional data around just in case. The analytics project can actually get lost in this large river of fields and not know which ones would be valuable. Following are the four kinds of variables that help sort through the list of fields, along with an explanation of how to use these variables, because their treatment changes through the project life cycle.

Base Variables

Base variables are the important fields present in the data warehouse within the analytics project’s scope. If the project’s scope is to build a sales forecasting model, then the business context is sales, and therefore all the fields in the data warehouse within or linked to sales are potentially base variables. If the sales department’s users access certain dimensions and facts for their normal business activities, then everything in those dimensions and facts is potentially a base variable. Examples are store types, customer age, product weight, price, cash register type, employee code, etc.

Performance Variables

Performance variables are specialized variables created for analytics models. If the project’s problem statement is a predictive model that calculates the probability that a customer will use a coupon to buy something (propensity modeling) or the probability that an order will ship on time, then it would need the base variables (formatted and transformed) as well as some new, interesting, and innovative variables that may have a high predictability for the problem statement. The base variables are raw data and usually have continuous values like the age of a customer. The performance variables are preferred to be coded variables. So if the customer records have ages such as 28, 33, 21, 67, 76, 45, 55, 68, 23, etc., then coded values would replace the age variable: Code 1 (referring to ages less than 21), Code 2 (referring to the age range from 21 to 38), and so on, with the last one being Code n (age greater than 100). This way the age distribution, frequency, and its role in the predictive model can be analyzed and tuned.
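A brief sketch of that coding step, with hypothetical bin boundaries (only the "less than 21" and "21 to 38" ranges come from the example above):

```python
import pandas as pd

ages = pd.Series([28, 33, 21, 67, 76, 45, 55, 68, 23])

# Replace the continuous age values with coded ranges so the distribution
# and the variable's role in the model can be analyzed and tuned.
bins = [0, 21, 38, 55, 72, 100, 200]            # hypothetical boundaries
labels = [f"Code {i}" for i in range(1, len(bins))]
age_code = pd.cut(ages, bins=bins, labels=labels, right=False)

print(pd.concat([ages.rename("age"), age_code.rename("age_code")], axis=1))
```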

Other performance variables could be as follows:

Total sales of grocery items

Total number of year-to-date transactions

Percentage of payments made by credit card

Online user (yes/no)

At least one purchase of >$100 (yes/no)
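The sketch below derives performance variables like those listed above from a hypothetical transaction table; the field names and data are illustrative only:

```python
import pandas as pd

# Hypothetical transaction history for one customer.
txns = pd.DataFrame({
    "category": ["grocery", "grocery", "apparel", "grocery"],
    "amount":   [45.0, 130.0, 80.0, 25.0],
    "payment":  ["credit", "debit", "credit", "credit"],
    "channel":  ["store", "online", "store", "online"],
})

performance_vars = {
    "total_grocery_sales": txns.loc[txns["category"] == "grocery", "amount"].sum(),
    "ytd_transaction_count": len(txns),
    "pct_credit_card_payments": (txns["payment"] == "credit").mean() * 100,
    "online_user": "yes" if (txns["channel"] == "online").any() else "no",
    "purchase_over_100": "yes" if (txns["amount"] > 100).any() else "no",
}
print(performance_vars)
```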

These performance variables are designed looking at the problem statement. There is no well-defined method to what should be a performance variable. In established industries like customer analytics (within marketing) and consumer credit (risk management), the analytics experts know what performance variables are going to be important, but in other industries like education, shipping and logistics, state and local governments, etc., the performance variables have to be worked out through trial and error over time. Some may become useful and some may not have any predictive value. Chapter 11 deals with this in greater detail. However, they have to be designed so they can be implemented and therefore they are covered in the design stage.

Model Characteristics

The variables (base plus performance) that end up being used in the actual model get a promotion and are labeled characteristics, as they will get weights, scores, and probabilities assigned to them in the model. While the design may not know exactly which variables are going to become characteristics, there should be a provision to store some characteristics as input and then store the model’s output with those characteristics. This is important for tuning and tracking the results of the model. The design has to accommodate characteristics: their creation from transactional data and the analytic output (predicted, forecasted, optimized, or clustered).

Decision Variables

The decision variables were covered in detail in Chapter 5. These variables are used in decision strategies once the model output has been assigned to a record. So, in the case of manufacturing and the prediction of warranty claims, if the model output assigns a 63% chance of a warranty claim against a product just being shipped, there may be 10,000 products with 63% or higher, so not all can be kept from shipping and not all can be inspected. Therefore, additional filtering is needed to separate the 10,000 products into smaller, more meaningful chunks against which actionable strategies can be carried out. For example, if the product is part of a larger shipment, then let it go as long as the prediction is greater than 63% but not greater than 83%. This additional filtering is called segmentation in a decision strategy, and the variables used to apply it are called decision variables. The importance of decision variables lies in establishing their thresholds for actions. In this example we have stated that the product may be part of a larger shipment, but what does “larger” really mean: 100, 500, or 1,200? These thresholds cannot be set arbitrarily, and there is some structure to setting these segmentation values.
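A rough sketch of such a segmentation rule, with a hypothetical shipment-size threshold (only the 63% and 83% cut-offs come from the example in the text):

```python
# Map a model score plus decision variables to an actionable strategy.
def warranty_claim_action(claim_probability, shipment_size, large_shipment_threshold=500):
    """Decision-strategy segmentation for the warranty-claim example."""
    if claim_probability < 0.63:
        return "ship"                         # below the model cut-off: no action
    part_of_large_shipment = shipment_size >= large_shipment_threshold
    if part_of_large_shipment and claim_probability <= 0.83:
        return "ship"                         # let it go with the larger shipment
    return "inspect before shipping"          # high risk or small shipment: hold

print(warranty_claim_action(0.70, shipment_size=800))   # ship
print(warranty_claim_action(0.90, shipment_size=800))   # inspect before shipping
print(warranty_claim_action(0.70, shipment_size=50))    # inspect before shipping
```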

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9780124016965000098

Architecture Part 1

John Ladley, in Making EIM Enterprise Information Management Work for Business, 2010

Activity: Information Applications Frameworks

There will be various applications, or collections of software with a function, that will come under the umbrella of EIM. Most common are applications like a data warehouse or BI. The EIM program sets the standard for how these applications will interact and what the portfolio of these applications will look like. This activity defines the required elements that will specify a framework to deliver data and content to the business applications within EIM. This is also where the EIM team can design their “Corporate Information Factory”3 or “Bus Architecture,”4 or both if so inclined. We will identify and define master data or reference data capabilities if there is a business need to provide those frameworks as well.

We will be using the Metrics model, Data model, and any process-type work we have done, like the business actions or expanded levers. Analysis is performed to look for patterns. We defined many patterns in the prior activity, using the metrics to suggest batches of common requirements. These patterns point out the need to adapt the various architectural elements we require to the specific business challenges of the given situation. This is best explained via an example:

An airline is an interesting business in that most of its operational decisions and many of its tactical, or managerial, reports come from the same view of data: that of a record of a passenger on a flight. One person, one seat, one flight “leg.” Connecting to another city means another leg, hence another record. If we were developing the EIM program for this airline, we would examine the requirements expressed in our earlier work products. We would consider that there are an enormous number of operational decisions (e.g., how many bags of pretzels or meals do we need?), as well as analytical decisions (is this leg profitable enough to provide more than pretzels?), based on the same data. Therefore, we must take into account building applications that access large amounts of data over a long period of time. The operational reports need to happen fast; the data must “be there.” But the historical analysis can be done after a time and perhaps downstream from the hectic operational data. Therefore, we need a framework to provide operational reporting very fast, and we also need a structure to provide historical analysis. And both structures need to contain granular data.

The Metrics/BIR model we developed tells us this. The business actions tell us what will be done with the reports, and the business goals tell us how much the business will benefit. So further analysis of the requirements groupings and business needs has pointed us in the direction of two common elements—a historical collection of detailed data we need to use for analysis, and perhaps a higher-performance structure for reporting. We can also identify the requirements for when and how much data we need to move from one area to another, or from the operational ticket and reservation system to the reporting and analysis frameworks. Additionally, we have the details in the metrics and BIR characteristics to determine the nature of the file structures and even recommend the type of hardware and software to manage these applications.

OBSERVATION

All of this “stuff” really does fit together—it has to—because proposing something to the business such as IAM without being able to tie it all together and making it “auditable” is a rerun of the way it has been done for decades. Even if you are not going to delve into pro forma statements of information value, and only really need to get good results to support business drivers, all of this needs to tie together.

Analysis is performed to tell us how we will distribute data and content, how we will identify potential data issues, and where businesspeople can expect to have reliable, certified content. We will also be able to specify the rules for when and how business users can get to content for use within their areas, such as for product research, legal and compliance investigation, or actuarial analysis.

In addition, we will understand the kinds of control and facilities needed to ensure quality, so we will be examining where the “master” version of data will sit, where we desire certified sources to be placed, and how we want to maintain quality and integrity of content. We will also support controls (accuracy, privacy, and security) and compliance (think Sarbanes–Oxley) requirements.

Lastly, we will also address how all of our information assets will be moved around. Like our factory inventory metaphor, we will need to move material (data and content) from one work area to another. We may require special movement mechanisms or additional “wiring and plumbing.” We will also be developing an idea of capacity and performance required to support the business. (It is useless to perform IAM if the IAM elements do not allow content to get where it needs to be in a timely manner.)

Activity Summary (Table 23.7)

Table 23.7. Information Applications Frameworks Activity Summary

Objective: Identify and define the architecture elements of an EIM program that will deliver business applications.
Purpose: Produce a verifiable list of structures that need to be deployed over time. This allows for a more reasoned and accurate Road Map.
Inputs: The results of the Metrics Architecture step, as well as the various process and data models.
Tasks:
Refine the analysis of Metrics/BIR

Identify smaller groupings of requirements, and other characteristics deemed relevant:

Time characteristics, latencies

Other characteristics

Specify reporting, BI, and analytics frameworks

Generate spider charts or other graphic of framework characteristics

Identify reporting/BI framework elements (ODS, DW, DM, staging, etc.) from the spider charts

Map Business and Information Requirements for an Analytic framework

Develop framework presentation (spider charts)
Identify reference and master data needs from DQ survey, BIRs, and dimensions
Identify content management needs from BIRs and Business Model
Isolate the dimensions from ALL Metric/BIRs
List all dimensions and develop standard definitions
Merge results of DQ assessment with applications and technology requirements
Determine criteria to direct technical elements choices

Consider possible unique requirements, e.g., for Privacy, Security, Encryption, Web click capture, Device Interface, external data

Develop technology elements specification

Perform information interface and sizing analysis (see MART technique)

Identify gaps in current state technology

Determine new technology requirements, existing technology to leverage, and technology to phase out

Associate relevant technology categories with framework elements (technology classification; tools, data interchange, storage, content management, etc.)

Determine criteria for technical elements

Identify potential candidate software and services solution vendors

Short list technology (time permitting, or if within EIM scope)

Coordinate with Enterprise Architecture

Diagram and document all architectural elements
Techniques: There is an alignment technique called MART analysis (M—Message, A—Aggregate, R—Replicate, T—Transform) where we align business processes with the requirements to supply the processes with content and data in a timely and correct manner.
Tools: Again, the various enterprise architecture tools or Excel. Visio is preferred for the visual presentations.
Outputs: There is a robust set of deliverables:
The initial list of application elements for EIM.
The initial list of technical elements for EIM.
List of and justification for specific data and content movement facilities.
Merger of DQ requirements with the EIM architecture elements.
Outcome: The EIM team has completed the various work products and presented the EIM Architecture elements, and the EIM leadership, as well as other constituents (like IT operations, compliance, and business areas), understands the nature of the proposed elements.

Business Benefits and Ramifications

The business areas will begin to see that elusive “something” they have been looking for since the EIM program started: there is now some evidence of what EIM will deliver. There are other support areas in the enterprise which will need to view this activity’s results, as most organizations have architectural and technology standards that must be verified against the required elements for EIM.

Approach Considerations

The Metrics and BIR Architecture data and clusters and supporting models are ingredients in the soup that gets refined into architecture elements. These are not technology elements yet. We are specifying the types of applications the business will use. Applications have to be built or acquired, so this activity provides important support for the Road Map. At this point, you essentially have a collection of logically related batches of work. The Road Map will convert these into units of effort for deployment.

Depending on the complexity of your business and metrics model, you may have many groupings across the various characteristics. However, just being colocated does not mean that all of the metrics will exist in one type of application, or can be developed at the same time. The business members of the EIM team will need to look at the groups of requirements and ensure there is a common business direction.

Specific elements that may be defined include:

Reporting, business performance management, and analytical application framework

Management of the “golden copy” of data, also known as MDM

Management of other reference items, such as codes (everything like state codes, sales regions, and product types)

Management of those codes that are also important business dimensions, which are segments of the business you will want to sort and report by. (Sales by region, store profit by product class)

Content where security, privacy, compliance, and regulation are of concern to the business

Document management and workflow

Text requiring the ability to be searched and analyzed

Elements that will require data quality and data control consideration

Business functional areas where the IVC recommends process improvements

Business content integrity functions, like document destruction, data purging, backups, data retention, and archiving

Requirements for sending data outside of the organization and bringing external data into the organization.

The EIM team will finalize the various types of frameworks and components, i.e., does the organization require “middleware” to connect databases? Is there a need to have “gold copies” of customer or product data? Are there significant needs to manage documents and workflow? Are the BI needs best served by a “single source of the truth” of some sort?

The EIM team starts by refining the Metrics and BIR Architecture output. After latency and granularity, other characteristics can be reviewed. Perhaps response and follow-up times are important, especially if the IVC work produced juicy new ways to use data at touch points.

Characteristics that affect the reading and usage are analyzed to determine the delivery frameworks for data and content, e.g., is a data warehouse required? If you are revisiting this activity, you can use this step to verify the effectiveness of your current reporting and analysis environment. The output from this is usually produced in the form of a bar graph or plot. I prefer the radar plot, or “spider charts” as one client referred to them. Experimentation and analysis of the requirements by time period, latencies, and other characteristics deemed relevant are done using this technique. The EIM data architects will correlate the results of data movement (MART) or a similar technique to design a framework to move and manage information. Similar steps are taken to identify required master data or reference data frameworks.
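For illustration, a spider (radar) chart of one cluster's characteristic scores could be produced with a few lines of matplotlib; the labels and scores below are invented, not taken from the case studies:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical characteristic scores (0-5) for one cluster of requirements.
labels = ["Latency", "Granularity", "History depth", "Volume", "Update frequency"]
scores = [4, 5, 3, 2, 4]

# Close the polygon so the last point connects back to the first.
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()
angles += angles[:1]
values = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, values)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
ax.set_title("Requirements cluster: characteristic profile")
plt.show()
```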

One of the specific business-aligned techniques we deploy is called MART. This mnemonic stands for:

M—Message; notification of a content change or business event across the various EIM applications

A—Aggregate; summarize, boil down, or otherwise move data to a higher abstraction, usually for reporting or managerial purposes

R—Replicate; move data and content as is, make a copy

T—Transform; alter or change content or data so it can be repurposed, such as converting from a code in an old system to a code in a new system, or performing a minor data quality operation for “cleanup” of a data error.

We take all of our business actions, IVC steps, and/or processes we have defined (we can even use the original levers if time is short) and see how they need to acquire content and data or communicate with each other. An example is shown in Figure 23.4.

Figure 23.4. MART example

The EIM team will use the Metrics Architecture and MART analysis (and other techniques if desired) to develop a picture of how wide the pipes need to be and how fast the pumps must operate. Specific elements identified here may be facilities to:

Synchronize data and content across the enterprise due to timing, or required redundancy

Broadcast changes to critical data elements like reference and master data

Clean up, or transform data so it can be used effectively further down the IVC

Aggregate or restate data for historical use and management use.

There will be situations in some organizations where “plumbing” isn’t fast enough and the proposed frameworks will need to look more like a circuit diagram, with a “main bus” that manages the coming and going of business events. Technology-savvy EIM sponsors will recognize this as defining a need for service-oriented structures.

If the analysis of requirements indicates new technology needs, the EIM team is able to identify requirements and target specific brands and suppliers of technology at this point. If so inclined, they can develop lists of candidate suppliers and start procurement.

Sample Output

Here’s an example from the case studies to show what the deliverable may look like. Rather than reproduce an artifact, it is more effective to show the flow from artifacts to framework (Figure 23.5).

Figure 23.5. Farfel frameworks flow

I took Table 23.4 and produced a “spider chart” to show how the scores translate to a visual representation. Obviously, we would need to produce multiple charts based on our clustering (Figure 23.6).

Figure 23.6. Ubetcha sample spider chart

TIPS FOR SUCCESS

If you are going through this for the first time, several enterprise constituents will most likely express concerns that must be addressed.

1. The enterprise will have technology standards or standard technology (two different concepts—same result). The EIM elements may conflict with the official or de facto standards.

2. There will be projects funded and under way that will reflect EIM elements. The business must decide the extent to which the EIM program influences work that is in progress. There will always be ragged edges here—in a large organization, there is no way to catch stuff that is “in flight” and draw it back into the governed EIM program.

3. There will be shadow IT areas that will begin about now to announce they are not going to relinquish (insert their departmental product, tool, database, or reports here), and will begin a process of complaining and/or passive aggression. This is actually good news! First of all, you have their attention. Second, you have identified additional parties requiring EIM orientation and education. Lastly, they may have an EIM element in place that can actually be leveraged.

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9780123756954000230

Stochastic optimization approaches for elective surgery scheduling with downstream capacity constraints: Models, challenges, and opportunities

Karmel S. Shehadeh, Rema Padman, in Computers & Operations Research, 2022

5.1 Elective surgery databases

The availability of granular data on patients, resources, and clinic staff and their movements during pre-operative, surgical and recovery activities is essential for developing data-driven surgery scheduling and capacity planning approaches that better mimic reality. As pointed out by Wilson and Doyle (2008), the safe, efficient, and coordinated passage of a patient through the surgical and downstream recovery begins long before the patient arrives at the hospital for surgery. The journey starts when a patient’s health concern is recognized and a physician concurs that surgery is needed. From the time of seeking the first medical attention onward, multiple healthcare providers will collect hundreds of data elements and store them in various systems—electronic or paper, integrated or otherwise. Within this complex jungle of information repositories, some patient data will be redundant, some will be contradictory, some will be missing, some will pass from system to system, and some will reside only within a single database (Wilson and Doyle, 2008). Thus, when the data are integrated, pre-processed, and then stored in one database/system, it could be the key to creating new knowledge and evidence for practice innovation for OR stakeholders.

The first step toward developing integrated and standardized surgery databases is establishing a reliable and efficient data collection method with appropriate tools, technologies, and standards. There are numerous methods to collect healthcare data for research and hospital administrative purposes. For example, suppose we want to collect data on LOS in the hospital. As detailed in Sarkies et al. (2015), such data can be collected manually from ward-based resources such as nursing handover records, paper-based ward discharge/transfer records, paper-based inpatient medical records, direct observation by experienced personnel, etc. This is indeed a very time- and effort-intensive collection method that is subject to various human errors. Retrospective data extraction from administrative reports, scanned medical records, and electronic patient management systems is another method to obtain data. While this approach has been extensively used in healthcare research as a gold standard approach, retrieving medical records and transforming them into research data is resource-intensive and requires exceptional knowledge of the medical context and research skills (Hogan and Wagner, 1997; Maresh et al., 1986; Wilton and Pennisi, 1994).

Modern health information technologies, tools, and devices offer alternatives to the aforementioned traditional data collection methods. For example, as mentioned earlier, RTLS provides precise location-based information of tagged entities (people, equipment, etc.) within a defined area in real-time. Thus, such health ITs enable granular data collection and often yield massive data on tagged entities that can be used to build data-driven optimization approaches. Unfortunately, few health systems employ modern health ITs for recording and collecting health data, in part due to the prohibitive cost of health IT, the potential need for additional trained staff to ensure both compliance and careful collection of data, insufficient research about its benefits and unproven return on investment, apprehension about change and philosophical opposition to IT, and data privacy issues. Therefore, more research is needed to show the (financial and health) benefits of employing modern health ITs in guiding, optimizing, and changing operational activities in surgical suites. Moreover, there is a need for new technologies that can efficiently pull out, harmonize, and store collected data from different resources in an accessible manner that does not require advanced computer or IT skills. Government support and funding will encourage the implementation of new health IT in hospitals to collect data and conduct research.

Developing standardized surgical databases additionally requires more hospitals to adopt advanced health IT, as well as collective efforts from hospitals (including managers and clinical staff) at the national level to ensure a standardized collection of data. Therefore, policies and strategies for developing standards-compliant surgical databases are an important research direction and a pre-requisite to generalizable modeling and analytic studies. The data type may depend on the health system and surgical suite under study and the optimization objective/task. However, in most OR and surgery scheduling problems, we often need a combination of timestamps along different stages of the surgical and recovery processes (e.g., surgery start and end time, time in OR, admission and discharge times from post-operative recovery units, etc.), clinical data (e.g., medical history, surgery details, etc.), and real-time demand, supply, and capacity data (e.g., hourly ICU occupancy/bed capacity during the day, availability of clinical staff, availability of surgical supplies, etc.) to analyze and model key variables (e.g., surgery duration, LOS, clinical staff overtime in upstream and downstream units, patient waiting time on the day of surgery, idle time, number of blockings between pre-operative stages, number of transfers between post-operative stages, number of premature discharges from ICU, financial data associated with these tasks, and many others).
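As a sketch of what one record in such a database might hold (hypothetical field names drawn loosely from the categories above), consider:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# A rough sketch of a per-surgery record; fields are illustrative, not a standard.
@dataclass
class SurgeryRecord:
    patient_id: str
    procedure_code: str                              # clinical data: surgery details
    surgery_start: datetime                          # timestamps along the surgical process
    surgery_end: datetime
    recovery_admit: Optional[datetime] = None
    recovery_discharge: Optional[datetime] = None
    icu_beds_available_at_end: Optional[int] = None  # real-time capacity data

    @property
    def surgery_duration_minutes(self) -> float:
        """Key variable derived from the timestamps."""
        return (self.surgery_end - self.surgery_start).total_seconds() / 60.0

record = SurgeryRecord("p-001", "CPT-47562",
                       datetime(2022, 3, 1, 8, 30), datetime(2022, 3, 1, 10, 5))
print(record.surgery_duration_minutes)   # 95.0
```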

Read full article

URL: //www.sciencedirect.com/science/article/pii/S0305054821002628

Digital health and addiction

Lisa A. Marsch, in Current Opinion in Systems Biology, 2020

The promise of digital health

Digital technology is transforming our world and promises to change how we understand and promote health. The explosion of digital devices and ‘big data’ analytics allows for the unprecedented collection and interpretation of enormous amounts of granular data about everyday behaviors and the context in which they occur.

Digital health [1,2] refers to the use of data captured via digital technology (sometimes referred to as ‘digital exhaust’ data) [3] to both understand people's health-related behavior and provide personalized health care resources. Smartphones and smartwatches enable passive, unobtrusive ecological sensing that provides continuous measurement of individuals' behavior and physiology, such as sleep, social interactions, physical activity, electrodermal activity, and/or cardiac activity [4]. Individuals can provide consumer-generated data in response to queries on mobile devices (sometimes referred to as ‘ecological momentary assessment’ [EMA]) to reflect (timestamped) snapshots of their daily lives. Rich information can be gleaned via EMA about individuals' context, mood, behavior, social interactions, pain, sleep, and stress levels, among other factors (refer to Figure 1 for examples of EMA questions in a digital health app). And voluminous social media data allow for a rich understanding of individuals' social networks, communications, and behavior. These data sources (separately or in combination) may provide new insights into digital (behavioral or biological) markers of health-related phenomena and their evolution over time [5]. Indeed, methods such as digital phenotyping [6] or reality mining allow for moment-by-moment quantification of individual-level information collected via personal digital devices as individuals live their daily lives.

Figure 1. Sample EMA questions — types and formats. EMA, ecological momentary assessment.

In traditional medicine, clinical assessments are typically conducted by trained professionals using structured diagnostic instruments, medical devices, and clinically validated assessments. Digital health data enhance these conventional sources of clinical data with ecologically valid data captured in naturalistic contexts. And indeed, digitally derived data are increasingly considered an essential part of an ‘evolving health data ecosystem [7,8]’ and promise to markedly impact both discovery science and translational science.

Read full article

URL: //www.sciencedirect.com/science/article/pii/S2452310020300081

A survey on fog computing for the Internet of Things

Paolo Bellavista, ... Alessandro Zanni, in Pervasive and Mobile Computing, 2019

5.6.1 Data Analytics

Data Analytics is the application of advanced analytics techniques to data sets in order to identify specific situations [23]. By focusing on where data are analysed, we can divide this component into Big Data Analytics, Small Data Analytics and Hierarchical Data Analytics. Big Data Analytics relies on the computing and storage capabilities of cloud environments to execute complex analytics on big data sets [128]. Small Data Analytics refers to a limited quantity of highly granular data that usually provides valuable information for the system and is used to perform real-time decisions and actions; it is suited to being handled by edge or fog nodes. In Hierarchical Data Analytics, the edge and fog nodes store and analyse the gathered data. Then, the relevant and complex information can be aggregated and posted to other nodes with higher capabilities or even to the cloud environment to perform medium or long-term analysis [13].
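A toy sketch of the hierarchical pattern, with invented readings and summaries: each edge or fog node reduces its granular data locally and forwards only a compact summary upward for longer-term analysis:

```python
# Illustration only: hypothetical readings and a simple aggregation scheme.
def edge_node_summary(readings):
    """Small Data Analytics at the edge: reduce raw readings to a summary."""
    return {
        "count": len(readings),
        "mean": sum(readings) / len(readings),
        "max": max(readings),
    }

def cloud_aggregate(summaries):
    """Longer-term analytics in the cloud over the forwarded summaries."""
    total = sum(s["count"] for s in summaries)
    weighted_mean = sum(s["mean"] * s["count"] for s in summaries) / total
    return {"nodes": len(summaries),
            "overall_mean": weighted_mean,
            "overall_max": max(s["max"] for s in summaries)}

node_a = edge_node_summary([12.1, 12.4, 13.0])   # e.g. local traffic density samples
node_b = edge_node_summary([8.7, 9.2])
print(cloud_aggregate([node_a, node_b]))
```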

These methods are applied in the presented case studies. For example, in STL, small data analytics are applied to create an accurate picture of the current situation and to help the decision-maker component react in real-time. The system collects environmental information about traffic density, vehicle-specific data, movements of other vehicles, pedestrians, or bikers on the road, pre-emptive emergency routing, and so on. The fog node should store all this information and execute quick analytics techniques to identify certain movements on the road, trying to understand which movements vehicles are performing and then predict where they will probably move. In this sense, Hong et al. [24] introduced the MCEP system for monitoring traffic through several patterns (e.g. movement, acceleration).

In the Wind Farm case, Big Data Analytics is used more, since both real-time and long-term analytics are important. In particular, it is important to guarantee a certain accuracy level with wind forecasting techniques, especially short-term forecasting techniques, in order to improve the quality of wind power generators and to schedule appropriate operating levels according to the different regulation tasks [20].

In the Smart Grid, hierarchical data analytics models are fundamental to ensuring that the network operates correctly and to managing dynamic end-user demand and distributed generation sources, favouring prompt reactions in case of unexpected events. Data analytics are key to performing autonomous data control/selection in order to give consistent feedback on energy usage that can lead to behavioural changes by energy users [59]. In particular, hierarchical data analytics are central to facing the unpredictability of renewable energy supply, which may be highly variable in relation to weather conditions, since every intermediate node can act as an active control unit [129].

Read full article

URL: //www.sciencedirect.com/science/article/pii/S1574119218301111

What are the primary concepts of a relational database model?

In a relational database, all data is held in tables, which are made up of rows and columns. Each table has one or more columns, and each column is assigned a specific datatype, such as an integer number, a sequence of characters (for text), or a date. Each row in the table has a value for each column.
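A minimal sketch using Python's built-in sqlite3 module (hypothetical table and rows) shows these concepts in practice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each column is assigned a datatype; each row supplies a value per column.
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER,
        name        TEXT,
        signup_date DATE
    )
""")
conn.execute("INSERT INTO customer VALUES (?, ?, ?)", (1, "Ada Lovelace", "2024-01-15"))
conn.execute("INSERT INTO customer VALUES (?, ?, ?)", (2, "Alan Turing", "2024-02-03"))

for row in conn.execute("SELECT customer_id, name, signup_date FROM customer"):
    print(row)
```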

What is an organized collection of data?

A database is an organized collection of data. It is a collection of information that is organized so that it can be easily accessed, managed, and updated. Data can be organized into tables, rows, columns, and indexes to find the relevant information easily.

What are the four primary traits that help determine the value of data?

There are five traits that you'll find within data quality: accuracy, completeness, reliability, relevance, and timeliness.

What does information granularity mean?

Definition. Granularity concerns the ability to represent and operate on different levels of detail in data, information, and knowledge that are located at their appropriate level. The entities are described relative to that level, which may be more coarse-grained or concern fine-grained details.
