across campus while inspiring
further advances in the techniques
of data science itself,” Hu also said
in 2015.
Big data has been the crux of
medical initiatives and disease
research,
the
uncovering
of
international corruption scandals
such as in the Panama Papers,
and worldwide technological and
economic development.
For example, access to massive
deposits of information related to
issues such as on-field behavior
can help predict athletic-related
injuries, teaching and learning
analytics can lead researchers
to
create
better
educational
techniques,
and
patterns
in
accumulated patient data could
help doctors discover preventative
measures and diagnose health
conditions.
Now celebrating two years at the
University, the Michigan Institute
for Data Science — an institute
under the umbrella of the DSI — has
expanded as a multidisciplinary
and
interdepartmental
sphere
for all research in data, alongside
the DSI’s two other sectors:
Advanced Research Computing –
Technology Services and Consulting
for Statistics, Computing and
Analytics Research.
Co-director of MIDAS Brian
Athey, a professor of computational
medicine and bioinformatics, said
innovations continue as the quest
for finding new ways to use big data
grows.
**********
Even before this influx of data-
driven institutions, researchers at
the University had been using these
large data sets to tackle issues such
as disease prediction and augment
the research of computer science.
Eric Michielssen, University
associate
vice
president
of
Advanced Research Computing –
Technology Services, said there is
a “tsunami” of data available for
use for these issues and more, but
he and Athey cited the importance
of a concept called the Four V’s of
Big Data.
Aside from simply the volume of
data, Michielssen said, researchers
consider velocity, or the ability to
receive vast amounts of data at
unprecedented speeds. They must
also note the variety of formats
available, not just the receipt of data
through structured spreadsheets,
and the veracity, or uncertainty
and trustworthiness of the data
received.
Michielssen,
whose
office
is home to MIDAS and the
overarching Data Science Institute,
said collecting and generating data
on a daily basis at unprecedented
speeds — particularly through
social media, financial transactions
and a newly coined “internet of
things,” where objects are given
the ability to share data through
sensors, wireless technology and
complex networks — requires
researchers to consider how to
leverage that data for the good of
society and science, making it an
innovative phenomenon.
“It’s nothing short of a game
changer
for
society,
as
well
as for science and education,”
Michielssen said. “Research in just
about every field is being affected
by this new phenomenon, this big
data phenomenon.”
According to Athey, the overall
significance of data science at a
university level — especially in the
context of the University’s top-
ranking research across the board
— is that much of the University’s
history
in
machine
learning,
data
analytics,
statistics
and
computation has already paved the
way to today’s methodologies.
“We had an ideal environment
to do this at U of M,” Athey said.
Noting an increase in the power
of computing and a decrease in its
cost, a trend often associated with
Moore’s law, Athey said, “Data science and
big data are transformational to
society and higher ed.”
The modern data scientist isn’t
surrounded by rows of outdated
computers that flash neon green
numbers,
MIDAS
Managing
Director Kevin Smith said.
Instead, Smith said, managing
data today requires a threefold
collaboration: MIDAS, the
University’s academic hub; data
science services such as consulting
through CSCAR; and a central
avenue for high-performance
computing infrastructure through
ARC-TS.
“This
is
really
thinking
holistically about how you take
data and be able to integrate and
aggregate it in a meaningful way
so that you can explore that data in
the context of whatever scientific
or business problems you’re trying
to solve so that the analysis, the
visualization, the exploration that
you’re doing is valid,” Smith said.
Transportation and Tech
In much of its data science
research, Michielssen said, the
University
focuses
mostly
on
application rather than theory
and method to yield more tangible
results,
especially
through
applications to policy, education
and infrastructure.
“In many ways, we’re trying to
frame this as an opportunity to
advance the field of data science
from a methodological perspective
but in the context of the application
of data science,” Smith added.
For example, researchers within
MIDAS’s centers for Data-Driven
Transportation Research and
Data-Intensive Learning Analytics
Research are using data to improve
automobile use and transportation
and to create new tools to examine
the modern learning process.
Carol Flannagan, an associate
research
scientist
for
the
University’s
Transportation
Research Institute, noted her
team’s application of big data to the
analysis and creation of simulations
of
driver
behavior,
traffic
regulations
and
transportation
systems,
all
with
traditional,
automated and connected vehicles.
“Transportation
data
is
changing so fast,” Flannagan said.
“It’s a really great area for novel
applications of existing methods
or even extensions of methods and
new methods.”
In addition to working to
expand
the
amount
of
data
available to researchers in the field
of transportation data analysis,
Flannagan and her team have been
successful in surveying motor
vehicle crashes and establishing
applications of crash-avoidance
technologies. These applications
can be transferred to vehicle
occupant protection and, ideally,
total crash prevention.
Through
visualization
tools
and a surplus of traffic crash data,
vehicle-centric
countermeasures
against
vehicle
crashes
—
particularly vehicle design — work
in conjunction with behavior-
centric measures, roadway design
and enforcement to produce new
options for policymakers and
infrastructure
designers.
This
helps optimize design and laws and
eliminate unnecessary crashes.
Taking a comprehensive view
of safety, making predictions and
taking into consideration laws that
are working well for some aspects
of driving allows researchers,
policymakers
and
others
to
integrate and focus improvements
elsewhere.
Flannagan said one thing that
poses difficulty is the quick
accessibility and shareability of
data.
“We are not freed from the
requirement to think first before
engaging with data analysis,”
Flannagan pointed out. “The
requirement to think first just
got harder.”
Data science research does not
come without other challenges,
however,
particularly
in
a
changing landscape of big data.
Social Science and Challenges
In May 2014, former President
Barack Obama’s Executive Office
released an official White House
report detailing the significance
of upholding privacy values,
responsible education in a digital
age and using data as a public
resource.
“Properly implemented, big
data will become an historic
driver of progress, helping our
nation perpetuate the civic and
economic dynamism that has
long been its hallmark,” the
report reads.
More recently, the nationwide
use of big data has been at
the heart of controversy
surrounding the 2016
presidential election; big
data was first seen as generally
predictive for the election’s
results, though others claim
now-President Trump’s initial
dismissal of the importance of
data was a mask for his use of
the information to target rural
voters.
The
work
of
research
professor
Michael
Traugott,
for example, could be applied
to ultimately help prevent the
effects of social media and news
content on election outcomes.
Traugott and his team — in
collaboration with Gallup and
Georgetown University — have
been collecting data to examine
political communication in the
2016
presidential
campaign.
By using computer software
to search for key topics and
sentiment
in
open-ended
question
responses
from
thousands
of
participants
throughout
the
election,
researchers can check the
representativeness of people’s
opinions about the news they’ve
seen on social media.
A
computerized
content
analysis of nine major U.S.
newspapers
can
also
be
compared
with
attitudes
and content in the tweets of
journalists, another indicator of
how news content is related to
citizen sentiment later.
“I have been surprised by the
emphasis on personality at the very
earliest stage of the campaign and
the disproportionate amount of
coverage that (Trump) received,”
Traugott said. “We hope to be able
to track the news content sentiment
against the favorability measures,
as well as the topics mentioned in
the open-ended (questions with
participants).”
Traugott will use this data
to establish whether traditional
framing and agenda-setting themes
are still present in contemporary
reporting, but in a social media
environment.
Because of the possibilities
big data holds in a politically
charged and competitive social
sphere, The Washington Post
has called for further privacy
considerations and tech policy.
It also noted that
an inherent bias in data’s original
collection — no matter how big
or small — will continue to have
consequences, resulting in what
some refer to as a dangerous and
ominous future.
Concerns for privacy and
confidentiality are other
unexpected hurdles University
researchers grapple with daily.
Transportation data, medical
records and social science
information are just three kinds
of data that need to be kept secure.
“Someone has to put in place
measures, software systems that
allow one to do research and
developments in this space while
at the same time, guaranteeing
the privacy of the person who gave
rights to the data in the first place,”
Michielssen said.
Fear of the unknown is another
complication, Athey said, as the
dearth of research in big data’s
abilities
makes
future
uses
frighteningly ambiguous.
“Data science could be used
to help society or frankly, could
be used to take advantage of
different groups within society or
manipulate things,” Athey said.
“Those that have the methodology
and the computing and the access to
the data have a distinct advantage
on everybody. That could be used
for good or for bad.”
Being data-illiterate comes with
consequences, he explained.
“If you’re uneducated about
data science and if you don’t know
how to practice it, you would be a
victim of it,” Athey said. “This is
the society that we’re living in —
Google and all these things — are
not going to go away.”
Likewise, Smith noted an issue
of ethics clouding global data
science.
The Michigan Daily — michigandaily.com
Tuesday, September 5, 2017 — 3A