Words Over Time / word page

data

A word that turns facts into infrastructure.

Given facts / collected records / processing systems / platform traces / training material / contested ground.

ngram 1630-2022 / index 1630-2026 / panels 2 / stems 32

entry note

Data begins as something given: facts, observations, premises for argument. It becomes something collected, stored, processed, mined, and used to train systems. This page traces that turn through four charts: a historical index, a platform-era social acceleration, a grammatical shift, and a map of contested pressures.

01 / historical index

A Historical Index of Data

Data did not become an infrastructural term all at once. This chart reads that history through a dual-panel timeline: long formation above, contemporary acceleration below. The split keeps recent density visible without letting it swallow four hundred years of systematic thinking about facts, evidence, and counted things.

Chart 1 / historical index

This index traces how data moves from given facts into systems of collection, processing, storage, governance, and training. The upper panel shows long formation; the lower panel expands the contemporary period where terms and relations accelerate.

display basis

2 panels / 32 stems

Time spacing is density-weighted; frequency remains background context.
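The page does not say how its density weighting is computed. A minimal sketch, assuming each era receives horizontal width in proportion to how many stems it holds (plus a floor so an empty era never collapses), is a piecewise-linear year-to-position scale. The helper `density_scale` and its `floor` parameter are hypothetical names, and the per-era stem counts below are hypothetical too; only the era boundaries come from Panel A.

```python
from bisect import bisect_right


def density_scale(breaks, counts, width=1000.0, floor=1.0):
    """Return a piecewise-linear mapping from year to x position.

    breaks: ascending era boundaries (one more value than counts).
    counts: number of stems plotted in each era.
    Each era gets horizontal space proportional to (count + floor), so
    crowded eras are stretched and sparse ones compressed; `floor`
    keeps an empty era from shrinking to zero width.
    """
    if len(breaks) != len(counts) + 1:
        raise ValueError("need exactly one more break than counts")
    weights = [c + floor for c in counts]
    total = sum(weights)
    edges = [0.0]
    for w in weights:
        edges.append(edges[-1] + width * w / total)

    def to_x(year):
        if not breaks[0] <= year <= breaks[-1]:
            raise ValueError(f"{year} is outside {breaks[0]}-{breaks[-1]}")
        # Index of the era containing `year`; the final boundary
        # belongs to the last era.
        i = min(bisect_right(breaks, year) - 1, len(counts) - 1)
        frac = (year - breaks[i]) / (breaks[i + 1] - breaks[i])
        return edges[i] + frac * (edges[i + 1] - edges[i])

    return to_x


# Era boundaries follow Panel A (pre-corpus 1630-1800, then the four
# named eras to 2005). The stem counts are hypothetical, chosen only
# so that they sum to Panel A's ten stems.
to_x = density_scale([1630, 1800, 1900, 1950, 1980, 2005], [2, 1, 2, 2, 3])
for year in (1630, 1800, 1900, 1950, 1980, 2005):
    print(year, round(to_x(year), 1))
```

Within each era, years stay evenly spaced, so chronological order and local proportions survive while crowded periods widen.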

Panel A: Long Formation, 1630-2005. Data develops from lexical attestation and given facts into statistical, administrative, machine-processable, and early database-era systems. Ngram coverage begins in 1800; earlier entries rest on dictionary attestation only.

Eras: given (1800-1900) / statistical (1900-1950) / processing and computing (1950-1980) / database and network (1980-2005). Lanes: evidence / transition / systems / analysis.

Terms: data (1630, dictionary attestation) / datum (1646, lexical origin) / statistical data (evidence, science) / data collection (administration, procedure) / data processing (computing) / data analysis (analysis, science) / database (systems, storage) / metadata (systems, classification) / personal data (privacy, governance) / data mining (commercial analytics)

Functions: evidence (facts, measurements, observations, empirical) / records (collected and organized institutional material) / processing (machine-readable input and output) / storage (structured, retrievable, managed content) / analysis (material for interpretation, modelling, and insight) / governance (protected, regulated, risky, or rights-bearing)

panel a ends / 2005 / panel b begins

Around 2005, data moves from database-era infrastructure into web-scale, platform, governance, and AI contexts.

Panel B: Contemporary Expansion, 2005-2026 (present). The recent scale expands data into platform traces, governance objects, commercial analytics, decision logic, and AI training material.

Eras: platform and analytics (2005-2012) / governance and AI (2012-2026). Lanes: decision / platform / analysis / governance / ethics and risk / AI and training.

Terms: data (headword, hinge) / big data (scale, analytics) / open data (public infrastructure) / personal data (privacy, governance) / user data (platform, trace) / data protection (governance, legal) / data portability (rights, mobility) / data privacy (privacy, governance) / data breach (risk, security) / data science (analysis, profession) / data-driven (decision logic) / data economy (economic, commodity) / data pipeline (infrastructure, ops) / data lake (architecture, storage) / data ethics (ethics, accountability) / data sovereignty (geopolitical, jurisdiction) / inference data (AI, output) / data poisoning (integrity, attack) / data annotation (labelling, labor) / training data (AI, ML) / data provenance (origin, accountability) / synthetic data (AI, generation)

Functions: platform (user traces, logs, interfaces, platform exhaust) / analysis (mining, science, modelling, large-scale interpretation) / governance (privacy, protection, breach, regulation) / decision (data as organizational or automated action logic) / training (model input, learning material, synthetic examples)

how to read

Each panel uses a density-weighted time scale, a lane system, and a function band. Stems connect terms to functions; arcs trace the relation paths between terms.

caveat

Frequency and visibility are based on printed-language and curated lexical evidence. The index shows historical relations, not strict causality.

02 / socialized generation

The Generation That Socialized Data

Data did not become social through AI alone. Before generative systems made data newly visible, a platform generation had already turned data into traces, profiles, public resources, private risks, and governed objects. This chart reads that generation at two scales: an outline from the 1990s to the 2020s, and an inner acceleration core from 2003 to 2013, the compressed decade in which much of the lasting vocabulary of social data took hold.

Chart 2 / semantic socialization

plate basis

2 panels / 18 terms / 7 anchors

The 2 AI-tail terms, training data and synthetic data, sit on the outer rim rather than in the overlap where the generation originates.

Chart 2 / A. Outline, 1990s-2020s: the larger generation in which data became networked, personal, public, commercial, and governable. Two overlapping zones, Trace & Circulation and Identity, Rights & Control; in the overlap, traces attach to people, value, institutions, and rights.

Terms: 01 metadata (trace) / 02 open data (trace) / 03 user data (overlap) / 04 clickstream data (trace) / 05 search data (trace) / 06 personal data (overlap) / 07 data protection (control) / 08 data privacy (control) / 09 data breach (control) / 10 data governance (control) / 11 data broker (control) / 12 data mining (overlap) / 13 big data (overlap) / 14 data science (overlap) / 15 data-driven (overlap) / 16 dataset (trace) / 17 training data (AI) / 18 synthetic data (AI)

Chart 2 / B. Inner core, 2003-2013: frequency register. Average printed-book frequency per million, Google Ngram (English), 2005-2012, grouped by semantic zone; arc length = share of zone frequency.

Labelled values: data mining 2.881 / personal data 2.167 / data-driven 0.510 / dataset 4.575 / metadata 3.926 / data protection 1.125 / data privacy 0.142 / training data 0.948 / synthetic data 0.158

Arc length shows each term's share of its zone's frequency in print, 2005-2012. Ring thickness reflects how loud each zone was on average.

dataset and metadata were the loudest of these terms in print in this decade; big data and data science were nearly absent.
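The arc and ring encodings reduce to simple arithmetic. A minimal sketch, assuming arc length is a term's frequency divided by its zone's total and that ring thickness encodes the zone's mean term frequency (the page says only "on average"). `zone_register` is a hypothetical helper. The inputs are the two AI-tail values from Panel B, assuming that zone holds only training data and synthetic data, as the outline plate lists.

```python
from collections import defaultdict


def zone_register(freqs, zones):
    """Compute arc shares and zone loudness for a frequency ring.

    freqs: term -> average frequency per million over the window.
    zones: term -> semantic zone name.
    Returns (shares, loudness): each term's fraction of its zone's total
    frequency (arc length), and each zone's mean term frequency (assumed
    here to be what ring thickness encodes).
    """
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for term, freq in freqs.items():
        totals[zones[term]] += freq
        sizes[zones[term]] += 1
    shares = {term: freq / totals[zones[term]] for term, freq in freqs.items()}
    loudness = {zone: totals[zone] / sizes[zone] for zone in totals}
    return shares, loudness


# The two AI-tail values shown in Panel B (per million, 2005-2012).
freqs = {"training data": 0.948, "synthetic data": 0.158}
zones = {"training data": "AI tail", "synthetic data": "AI tail"}
shares, loudness = zone_register(freqs, zones)
for term, share in shares.items():
    print(f"{term}: {share:.1%} of zone")
print(f"AI tail mean: {loudness['AI tail']:.3f} per million")
```

With these two values, training data takes about 85.7% of the AI-tail arc and synthetic data about 14.3%.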

semantic zones

Trace & Circulation

web traces / metadata / open data / analytics

Identity, Rights & Control

personal data / privacy / breach / governance

AI Amplification Tail

training data / synthetic data as later edge

register basis

Panel B is a frequency register, not a second Venn field: it compresses the 2005-2012 core by printed-book loudness inside each semantic zone.

reading tension

The terms that lead this decade are unglamorous ones such as dataset and metadata; they show data becoming social before the later, louder big-data and AI narratives.

how to read

The outline plate maps the semantic field with two overlapping circles. The inner ring registers 2005-2012 printed-book visibility by semantic zone, exposing which words were loud before the later big-data story.

caveat

Frequencies indicate printed-book visibility; relation paths are curated semantic links. The chart identifies a plausible generation of socialization, not a single causal origin.

03 / grammatical route

From Datum to Data

This chart follows a grammatical route. From datum as a singular item to data as a plural form, and from data are to data is, it traces how a language of countable facts became a language of mass infrastructure. The grammatical shift is not incidental: it records what the infrastructural turn felt like from inside the language.

Chart 3 / grammatical route

From datum as a singular given to data as a plural form, then into two coexisting routes: formal plural evidence and singular or mass infrastructure.

route basis

2 rails / 12 nodes / 8 major

Coordinates are curated; the route is conceptual, not a calendar axis.

Chart 3: From Datum to Data. Route map, not a timeline; the route is conceptual and not to scale, with frequency data in the evidence strip. Stages: singular origin / grammatical fork / infrastructural build / coexistence and tail (~2020).

Shared base: datum (one given item) and data (plural form, later mass).
Plural / evidentiary route (upper rail): these data (explicit plural framing) / data are (formal plural construction) / statistical data (counted or measured facts) / empirical data (observed evidence)
Singular / infrastructural route (lower rail): data processing (aggregation and machine handling) / database (stored and structured material) / this data (singular or mass demonstrative) / data is (singular or mass construction) / data as resource (material for systems and reuse) / training data (later model-input extension)

Central fork: data is the shared base, both historical plural form and later mass noun. Infrastructural mediation: processing and databases make data easier to treat as aggregated. Printed-book signal: data is overtakes data are around 2020 in this signal; plural usage remains. Persistence / expansion: the route ends as coexistence, with plural persistence and singular expansion.

Evidence strip: printed-book visibility (frequency per million), 1950-2022, for data are, data is, these data, and this data; crossover marked around 2020.

Read left to right. Upper rail: plural evidence; lower rail: singular infrastructure.

Caveat: This chart reads a usage shift, not a rule change. Printed-book frequencies indicate visibility and remain genre-sensitive.
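The reading that data is overtakes data are is a crossover between two frequency series. A minimal sketch of how such a crossover can be located, using a hypothetical helper `first_overtake` and made-up values that are not Ngram readings and say nothing about the year the page reports:

```python
def first_overtake(years, leader, challenger):
    """Return the first year in which `challenger` rises above `leader`
    after `leader` has been at least as frequent; None if that never happens.

    years must be ascending; the three sequences are parallel.
    """
    if not len(years) == len(leader) == len(challenger):
        raise ValueError("series must be the same length")
    leader_ahead = False
    for year, lead, chal in zip(years, leader, challenger):
        if lead >= chal:
            leader_ahead = True
        elif leader_ahead:
            return year
    return None


# Toy values only, not Ngram readings.
years = [2000, 2005, 2010, 2015]
data_are = [9.0, 8.0, 7.0, 6.0]
data_is = [5.0, 6.0, 7.5, 8.0]
print(first_overtake(years, data_are, data_is))  # 2010
```

Yearly values can cross and cross back, so a reading such as "around 2020" is safer taken from smoothed series than from a single year.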

once the grammar shifted, so did the stakes.

04 / cross-pressures

The Cross-Pressures of Data

Modern data is not pulled in only one direction. It can be attached to persons, bounded by control, mobilized as scientific evidence, and judged through ethical responsibility. These are not competing errors about what data really is. They are simultaneous functions that the word now carries, and this chart maps them as a field rather than a hierarchy.

Chart 4: Data as contested material. Four pressures:
Personal: data attached to persons, users, identities, and …
Private / control: data under protection, restriction, access
Science / evidence: data as observation, measurement, research
Ethics / governance: data as responsibility, accountability, provenance, …

Terms: data ethics / personal data / statistical data / sensitive data / data protection / scientific data / data breach / dataset / data set

Historical anchors: 1973-74 fair information / 1980-81 personal data / 2002-03 breach notification / 2008-10 PII / 2016-18 GDPR / 2020s governance

Reading key: x/y = semantic position; z = corpus frequency lift. Geometric note: strict ellipsoid projection. 12 measured; 19 policy/manual.

Semantic positions are curated. Quantitative marks appear only where corpus evidence is robust; many legal, policy, and ethics terms require other forms of evidence.

synthesis

The pressures mapped in this final chart do not resolve; that is the point. A word once used to record given facts now carries competing claims: personal attachment, institutional restriction, scientific mobilisation, ethical accountability. No single domain stabilises it. The cross-pressures field maps where the word now lives, and how thoroughly the three earlier histories have altered what it means to use it.

search summary / quick read

About data

Data is followed from given facts and counted observations into social traces, infrastructure, governance objects, and AI-era material.

This public page is the canonical entry for the data word study. For source boundaries, copyright notes, and the raw-data publication policy, use the methodology and rights page.

Methodology and rights