Data 0 - what data is

John Snow's map of the 1854 London cholera outbreak

What is data?

The famous map above presents data about where people died in the 1854 London cholera outbreak. This insightful compilation of data about who had died, overlaid on a map of where they had lived, provided the knowledge necessary to determine the root cause of the outbreak. John Snow, the creator of the map and often called the father of modern epidemiology, eventually refuted the prevailing idea at the time that "bad air" (miasma) was to blame. Rather, his map data indicated that the outbreak traced to a single water pump; removing its handle likely prevented many more deaths. As you can see, even simple data can be quite powerful when applied to human health.

In practice, data can be many things: a count of ill patients, a 1 or 0 (Boolean) in a medication administration record indicating whether a dose was administered, or even a lengthy piece of text, like a provider's note.

# python
# TEXT OR STRING DATA

data1 = "Meropenem"
print(data1)

#--> Meropenem

# NUMERIC DATA

data2 = 98.6
print(data2)

#--> 98.6
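
The other kinds of data mentioned earlier, a count and a Boolean flag, can be sketched the same way. The variable names here (`ill_patient_count`, `dose_administered`) are made up for illustration and do not come from any real record system.

```python
# python
# COUNT (INTEGER) AND BOOLEAN DATA
# Variable names are hypothetical, chosen only for this example.

ill_patient_count = 12      # a count of ill patients
dose_administered = True    # administered (1) or not (0)

print(type(ill_patient_count).__name__)

#--> int

print(type(dose_administered).__name__)

#--> bool

# A Boolean behaves like 1 or 0 when treated as a number.
print(int(dose_administered))

#--> 1
```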

You can interact with data to generate new data.

# python
# NUMERIC DATA

height_m = 1.8
weight_kg = 100.0
bmi = weight_kg / height_m ** 2  # BMI = weight (kg) / height (m) squared
print(round(bmi, 2))

#--> 30.86

You can also parse complex data to generate new data.

# perl

use strict;
use warnings;

my $data = "Clinical presentation of GERD, however biopsy 
indicates EoE given 17 eos/hpf on pathology report.";

# Capture one or more digits immediately before "eos/hpf".
if ($data =~ m{(\d+)\s+eos/hpf}i) {
    print "$1\n";
}

#--> 17

Data can be stored and queried.

# sql

INSERT INTO VITALS (COLLECTED_DT, COLLECTED_BY, VITAL, RECORDED_VAL)
VALUES ('2021-04-16 14:45:32', 'JAQUELINE YI', 'HR', '88');

SELECT COLLECTED_DT, VITAL, RECORDED_VAL
FROM VITALS
WHERE COLLECTED_BY = 'JAQUELINE YI';

#--> 2021-04-16 14:45:32, HR, 88
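
If you do not have a database server handy, the same store-and-query round trip can be tried from Python with the built-in `sqlite3` module. This is a minimal sketch: the table definition is an assumption (the statements above do not show one), and every column is stored as text to match the quoted values in the `INSERT`.

```python
# python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Assumed schema: the SQL above does not include a CREATE TABLE statement.
conn.execute(
    "CREATE TABLE VITALS ("
    "COLLECTED_DT TEXT, COLLECTED_BY TEXT, VITAL TEXT, RECORDED_VAL TEXT)"
)

# Store one heart-rate reading, passing the values as parameters.
conn.execute(
    "INSERT INTO VITALS (COLLECTED_DT, COLLECTED_BY, VITAL, RECORDED_VAL) "
    "VALUES (?, ?, ?, ?)",
    ("2021-04-16 14:45:32", "JAQUELINE YI", "HR", "88"),
)

# Query it back by collector.
rows = conn.execute(
    "SELECT COLLECTED_DT, VITAL, RECORDED_VAL FROM VITALS WHERE COLLECTED_BY = ?",
    ("JAQUELINE YI",),
).fetchall()

for row in rows:
    print(", ".join(row))

#--> 2021-04-16 14:45:32, HR, 88

conn.close()
```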

This is data...

And so is this...

So, how do you work with your data?

The first step is to understand what data type(s) you are working with. With numeric data, you might be able to run it directly through charting software to see the distribution of the values. Perhaps it also has a datetime component, which lets you do a time-series analysis. Maybe you have imaging data; depending on the modality, you might need a particular viewer, as with DICOM images. If your data is text, you might need any number of tools, including regular expressions, natural language processing, or machine learning. Chances are high that you are working with many types of data in healthcare, so you should be familiar with the intricacies of data types. Over the next several lessons, we'll cover most of the data types you might interact with.
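
As a small illustration of why the data type matters, here is a sketch that takes readings that arrive as text and converts them into a datetime and a number before summarizing them. The readings themselves are invented for this example.

```python
# python
from datetime import datetime

# Hypothetical heart-rate readings as (timestamp text, value text) pairs.
raw_readings = [
    ("2021-04-16 14:45:32", "88"),
    ("2021-04-16 15:45:32", "92"),
    ("2021-04-16 16:45:32", "84"),
]

# Everything arrives as text, so convert each field to the type
# it really represents: a datetime and an integer.
readings = [
    (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), int(val))
    for ts, val in raw_readings
]

# Numeric values can now be summarized: minimum, maximum, average.
values = [val for _, val in readings]
print(min(values), max(values), sum(values) / len(values))

#--> 84 92 88.0

# With real datetimes, time arithmetic becomes possible.
elapsed = readings[-1][0] - readings[0][0]
print(elapsed)

#--> 2:00:00
```

Left as text, neither the average nor the elapsed time could be computed directly.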

So what is data? In short, it is a tool that leads to knowledge. But it is only one piece of a larger pyramid that we will discuss soon.