Understanding the characteristics of big data is essential for designing effective architectures, policies, and analytical processes. This article systematically explains the 10 characteristics (often referred to as the 10Vs), their implications for technology, examples of industry applications, and practical recommendations to help organizations extract value from large-scale data. The discussion is tailored for professional readers: data practitioners, IT architects, and decision-makers.
What are the "characteristics of big data"?
The characteristics of big data are a set of attributes that distinguish large-scale data from traditional data, beyond the mere quantity of data. These attributes range from technical aspects (e.g., processing speed) and quality (accuracy and relevance) to security (privacy and compliance) and the business side (the value generated). Establishing clear characteristics helps in choosing the right technology, processing pipeline, and governance policies.
Evolution of the concept from 3Vs to 10Vs
Initially, the concept of big data was defined by the 3Vs: Volume, Variety, and Velocity. As business practices and needs have evolved, this model has been expanded to include Veracity, Value, Visualization, Validity, Volatility, Variability, and Vulnerability, often referred to as the 10Vs. This expansion reflects the fact that big data solutions must address not only the quantity and format of data, but also its quality, security, temporal relevance, and the ability to extract value.
10 characteristics of Big Data and examples
Below, each V is provided with a brief definition, technical/operational implications, and practical examples.
1. Volume
Definition: The size or amount of data that must be stored and processed.
Implications: Requires distributed storage (data lake, object storage), a compression strategy, and an architecture that supports scale-out.
Examples: Transaction data, clickstream data, and log data amounting to terabytes or petabytes per day.
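As a minimal sketch (paths and column names are hypothetical), transaction data can be written as compressed, partitioned Parquet with pandas and pyarrow, which keeps storage costs down and makes single-day scans cheap on a data lake or object store:

```python
# Hypothetical example: compressed, date-partitioned Parquet for a data lake.
import pandas as pd

df = pd.DataFrame({
    "event_date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "user_id": [1, 2, 3],
    "amount": [19.99, 5.00, 42.50],
})

# Partition by date and compress with Snappy; each partition becomes its own
# directory, so queries on a single day only touch that partition.
df.to_parquet(
    "transactions/",            # hypothetical target directory
    engine="pyarrow",
    compression="snappy",
    partition_cols=["event_date"],
)
```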
2. Velocity
Definition: The rate at which data is generated, sent, and needs to be processed (real-time vs. batch).
Implications: Drives the need for stream processing, low-latency ingestion (e.g., Kafka, Flink), and pipeline designs that support both streaming and batch processing.
Examples: IoT sensor and clickstream data that must be analyzed as it arrives.
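A minimal sketch of low-latency ingestion, assuming a local Kafka broker and a hypothetical "clickstream" topic, using the kafka-python client; events are processed one by one rather than waiting for a batch window:

```python
# Hypothetical example: consuming a clickstream topic as events arrive.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "clickstream",                          # hypothetical topic name
    bootstrap_servers="localhost:9092",     # assumed broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="latest",
)

# Handle each event immediately instead of accumulating a batch.
for message in consumer:
    event = message.value
    print(event.get("user_id"), event.get("page"))
```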
3. Variety
Definition: Diversity of data formats: structured (table), semi-structured (JSON/XML), and unstructured (text, images, audio, video).
Implications: Requires tools that support multiple formats (NoSQL, object stores) and flexible ETL/ELT processes.
Examples: Combined transactions, server logs, customer reviews, and product images.
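To illustrate handling semi-structured data, a minimal sketch (field names are hypothetical) that flattens nested JSON, such as customer reviews from an API, into a tabular form that can sit alongside structured transaction data:

```python
# Hypothetical example: flattening semi-structured JSON into columns.
import pandas as pd

reviews = [
    {"review_id": 1, "rating": 5, "author": {"id": "u1", "country": "ID"}},
    {"review_id": 2, "rating": 3, "author": {"id": "u2", "country": "SG"}},
]

# json_normalize expands nested objects into columns such as "author.id".
df = pd.json_normalize(reviews)
print(df.columns.tolist())  # ['review_id', 'rating', 'author.id', 'author.country']
```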
4. Value
Definition: Business value that can be extracted from data through analytics and models.
Implications: Focus on clear use-cases and ROI metrics; not all data should be stored aimlessly.
Examples: Recommendation models that increase conversion rates and customer lifetime value.
5. Veracity
Definition: Reliability, accuracy, and noise/bias levels in the data.
Implications: Requires data quality frameworks, data cleansing, source verification, and strong metadata.
Examples: Sensor data that contains outliers, or medical records with incomplete entries.
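A minimal sketch of a data quality check on sensor readings (thresholds and column names are hypothetical): flag missing values and out-of-range readings before they reach downstream analytics.

```python
# Hypothetical example: simple quality report for sensor readings.
import pandas as pd

readings = pd.DataFrame({"sensor_id": [1, 1, 2], "temp_c": [21.5, None, 480.0]})

missing = readings["temp_c"].isna()
out_of_range = ~readings["temp_c"].between(-40, 85)   # assumed plausible sensor range

quality_report = readings.assign(missing=missing, out_of_range=out_of_range)
# Surface only the suspect rows for review or quarantine.
print(quality_report[quality_report["missing"] | quality_report["out_of_range"]])
```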
6. Validity
Definition: Data conformance to analytic definitions and needs (whether the data is valid for use).
Implications: Schema validation, business logic rules, and model testing to ensure the data remains relevant.
Examples: Demographic data that must meet the format and scope defined for campaign analytics.
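A minimal sketch of schema validation using the jsonschema package; the schema and record below are hypothetical stand-ins for demographic data used in campaign analytics:

```python
# Hypothetical example: validating a record against an agreed schema.
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "properties": {
        "age": {"type": "integer", "minimum": 0, "maximum": 120},
        "country": {"type": "string", "pattern": "^[A-Z]{2}$"},
    },
    "required": ["age", "country"],
}

record = {"age": 34, "country": "ID"}

try:
    validate(instance=record, schema=schema)
    print("record is valid")
except ValidationError as exc:
    print("invalid record:", exc.message)
```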
7. Variability
Definition: Contextual changes and semantic variations in data (value/format inconsistency between times).
Implications: The pipeline must be able to adapt to schema changes; monitoring is required to detect drift or inconsistencies.
Examples: Third-party API structure changes affecting field names / formats.
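A minimal sketch of detecting this kind of drift (field names are hypothetical): compare the fields actually received from a third-party API with the fields the pipeline expects, and surface anything missing or unexpected for review.

```python
# Hypothetical example: detecting schema drift in an API payload.
EXPECTED_FIELDS = {"order_id", "customer_id", "total", "currency"}

def detect_drift(payload: dict) -> dict:
    received = set(payload.keys())
    return {
        "missing": EXPECTED_FIELDS - received,    # fields dropped or renamed upstream
        "unexpected": received - EXPECTED_FIELDS,  # new fields to review
    }

print(detect_drift({"order_id": 7, "customer_id": 3, "amount": 9.9, "currency": "USD"}))
# {'missing': {'total'}, 'unexpected': {'amount'}}
```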
8. Volatility
Definition: How quickly data loses its relevance (retention period and decay).
Implications: Define retention, tiered storage, and aggregation policies for legacy data.
Examples: Real-time log data that is high-value for only a few days, after which it is aggregated or archived.
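A minimal sketch of a retention rule (the retention window is hypothetical): assign each log record to a hot or archive tier based on its age, which is the decision a tiered-storage or lifecycle policy automates.

```python
# Hypothetical example: choosing a storage tier by record age.
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=3)   # assumed hot-retention window

def tier_for(log_timestamp: datetime, now: datetime) -> str:
    """Return the storage tier a log record belongs to based on its age."""
    return "hot" if now - log_timestamp <= RETENTION else "archive"

now = datetime.now(timezone.utc)
print(tier_for(now - timedelta(days=1), now))   # hot
print(tier_for(now - timedelta(days=10), now))  # archive
```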
9. Visualization
Definition: The need to represent insights from large datasets so that they can be understood by business users.
Implications: Investing in scalable dashboarding and visualization, as well as data storytelling.
Examples: A real-time operations dashboard for monitoring SLAs or marketing campaign performance.
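As a toy illustration of the underlying idea (aggregate first, then present the summary), a minimal matplotlib sketch with hypothetical data; a real deployment would use a scalable dashboarding tool:

```python
# Hypothetical example: plotting hourly SLA compliance for business users.
import matplotlib.pyplot as plt

hours = ["08:00", "09:00", "10:00", "11:00"]
sla_compliance = [99.2, 98.7, 97.9, 99.5]   # percent of requests within SLA

plt.plot(hours, sla_compliance, marker="o")
plt.axhline(98.0, linestyle="--", label="SLA target")
plt.ylabel("Requests within SLA (%)")
plt.title("Hourly SLA compliance")
plt.legend()
plt.show()
```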
10. Vulnerability
Definition: Security, privacy and compliance risks inherent to data (leaks, unauthorized access).
Implications: Encryption, role-based access control, data masking, audit trails, and regulatory compliance (e.g., GDPR, PDPL).
Examples: Patient medical records and financial data that require extra protection.
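A minimal sketch of one masking technique (column names and the salt source are hypothetical): pseudonymize a patient identifier with a salted HMAC so analysts can still join records without ever seeing the raw ID.

```python
# Hypothetical example: pseudonymizing a patient ID before analytics.
import hashlib
import hmac

SECRET_SALT = b"load-from-a-secrets-manager"   # never hard-code a real salt

def mask_id(patient_id: str) -> str:
    # Keyed hash: deterministic (joinable) but not reversible without the salt.
    return hmac.new(SECRET_SALT, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"patient_id": "MRN-00123", "diagnosis": "J45"}
safe_record = {**record, "patient_id": mask_id(record["patient_id"])}
print(safe_record)
```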
Simple case studies of applying the 10Vs
- E-commerce: Volume from transactions & clickstream, Velocity for real-time recommendations, Value increased through personalization.
- Healthcare: Veracity (quality of medical records) and Vulnerability (patient privacy) are the priorities.
- Telecom & IoT: Velocity, Variability, and Volatility in sensor data and logs.
- Finance: Validity and Vulnerability are essential for fraud detection and compliance.
Implications for Architecture & Technology
The 10Vs require a holistic architecture: a data lake to address volume and variety; a combination of batch & stream processing to address velocity and variability; metadata and data governance to address veracity and validity; and encryption, access control, and auditing to address vulnerability. Technology selection must align with the V priorities of the use case (e.g., real-time analytics prioritizes latency and throughput).
Common challenges and practical recommendations for implementing Big Data
Challenges: data quality, infrastructure costs, heterogeneous source integration, skilled HR shortage, and regulatory compliance.
Brief recommendations:
- Start from a clear use-case (Value first).
- Prioritize the most impactful V for your business.
- Build a modular pipeline (ingest → storage → processing → serving); see the sketch after this list.
- Implement data governance, monitoring, and automated testing.
- Measure the ROI of each data initiative.
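A minimal sketch of the modular pipeline idea, with hypothetical stubbed stages: each stage (ingest, storage, processing, serving) is a separate step that can be swapped, tested, or scaled independently.

```python
# Hypothetical example: ingest -> storage -> processing -> serving as swappable stages.
from typing import Iterable

def ingest() -> Iterable[dict]:
    # e.g., read from Kafka or an API; stubbed here with a single event
    yield {"user_id": 1, "amount": 19.99}

def store(events: Iterable[dict]) -> list[dict]:
    # e.g., land raw events in a data lake; stubbed as an in-memory list
    return list(events)

def process(raw: list[dict]) -> dict:
    # e.g., aggregate for analytics
    return {"total_amount": sum(e["amount"] for e in raw)}

def serve(metrics: dict) -> None:
    # e.g., push to a dashboard or serving API
    print(metrics)

serve(process(store(ingest())))
```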
Conclusion
Mapping the characteristics of big data (the 10Vs) provides a practical framework for defining the technologies, processes, and policies needed. By understanding each V, from Volume and Velocity to Vulnerability, organizations can design pipelines that are efficient, secure, and focused on business value.
Want to see how the platform can help manage these aspects? Audithink provides integrated solutions for data, governance, and analytics pipelines designed to meet 10V challenges.
Try a free Audithink demo now to see how our platform helps turn big data into actionable business decisions.



