Dataset Profiler

CSV input

Dataset summary

Rows: 7 (data: 6)
Columns: 7
Missing (overall): 2.4%
Duplicate rows: 0
Key duplicates: 0 (set keys in Options)
UTF-8 BOM: none

Column profiles

| Column | Type | Non-null | Missing | Unique | Highlights |
| --- | --- | --- | --- | --- | --- |
| id | integer | 6 | 0 | 6 | min 1, p50 3.50, max 6, outliers 0 |
| name | text | 6 | 0 | 6 | len 4~8 (avg 5.3) |
| age | integer | 6 | 0 | 5 | min 27, p50 29.50, max 31, outliers 0 |
| city | text | 6 | 0 | 4 | len 6~13 (avg 8.2) |
| sales | float | 5 | 1 | 5 | min 75, p50 120.50, max 200, outliers 0 |
| joined_at | datetime | 6 | 0 | 5 | sample 2023-01-03 |
| active | boolean | 6 | 0 | 2 | sample true |

JSON report

{
  "dataset": {
    "rows": 7,
    "dataRows": 6,
    "cols": 7,
    "missingRatio": 0.023809523809523808,
    "duplicateRows": 0,
    "bom": false
  },
  "columns": [
    {
      "name": "id",
      "index": 0,
      "type": "integer",
      "nonNull": 6,
      "missing": 0,
      "unique": 6,
      "sample": "1",
      "isConstant": false,
      "isIDLike": false,
      "numeric": {
        "count": 6,
        "min": 1,
        "max": 6,
        "mean": 3.5,
        "std": 1.707825127659933,
        "p05": 1.25,
        "p25": 2.25,
        "p50": 3.5,
        "p75": 4.75,
        "p95": 5.75,
        "outliers": 0
      },
      "warnings": []
    },
    {
      "name": "name",
      "index": 1,
      "type": "text",
      "nonNull": 6,
      "missing": 0,
      "unique": 6,
      "sample": "Alice",
      "isConstant": false,
      "isIDLike": false,
      "text": {
        "lenMin": 4,
        "lenAvg": 5.333333333333333,
        "lenMax": 8
      },
      "warnings": []
    },
    {
      "name": "age",
      "index": 2,
      "type": "integer",
      "nonNull": 6,
      "missing": 0,
      "unique": 5,
      "sample": "30",
      "isConstant": false,
      "isIDLike": false,
      "numeric": {
        "count": 6,
        "min": 27,
        "max": 31,
        "mean": 29.166666666666668,
        "std": 1.343709624716425,
        "p05": 27.25,
        "p25": 28.25,
        "p50": 29.5,
        "p75": 30,
        "p95": 30.75,
        "outliers": 0
      },
      "warnings": []
    },
    {
      "name": "city",
      "index": 3,
      "type": "text",
      "nonNull": 6,
      "missing": 0,
      "unique": 4,
      "sample": "Seattle",
      "isConstant": false,
      "isIDLike": false,
      "text": {
        "lenMin": 6,
        "lenAvg": 8.166666666666666,
        "lenMax": 13
      },
      "warnings": []
    },
    {
      "name": "sales",
      "index": 4,
      "type": "float",
      "nonNull": 5,
      "missing": 1,
      "unique": 5,
      "sample": "120.5",
      "isConstant": false,
      "isIDLike": false,
      "numeric": {
        "count": 5,
        "min": 75,
        "max": 200,
        "mean": 127.12,
        "std": 44.698608479459395,
        "p05": 77.98,
        "p25": 89.9,
        "p50": 120.5,
        "p75": 150.2,
        "p95": 190.04,
        "outliers": 0
      },
      "warnings": []
    },
    {
      "name": "joined_at",
      "index": 5,
      "type": "datetime",
      "nonNull": 6,
      "missing": 0,
      "unique": 5,
      "sample": "2023-01-03",
      "isConstant": false,
      "isIDLike": false,
      "warnings": []
    },
    {
      "name": "active",
      "index": 6,
      "type": "boolean",
      "nonNull": 6,
      "missing": 0,
      "unique": 2,
      "sample": "true",
      "isConstant": false,
      "isIDLike": false,
      "warnings": []
    }
  ]
}
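
Because the report is plain JSON, it can be checked by a script. The following is a minimal sketch, not part of the tool: it embeds a trimmed copy of the report above (only the fields it reads) and lists the columns that have missing cells. The function name `columns_with_missing` is hypothetical, and the sketch assumes the per-column missing rate is `missing / dataRows`.

```python
import json

# Trimmed copy of the report above; only the fields this sketch reads.
report_text = """
{
  "dataset": {"rows": 7, "dataRows": 6, "cols": 7},
  "columns": [
    {"name": "id", "type": "integer", "nonNull": 6, "missing": 0, "unique": 6},
    {"name": "sales", "type": "float", "nonNull": 5, "missing": 1, "unique": 5},
    {"name": "active", "type": "boolean", "nonNull": 6, "missing": 0, "unique": 2}
  ]
}
"""

def columns_with_missing(report):
    """Return (name, missing_rate) for every column that has missing cells."""
    data_rows = report["dataset"]["dataRows"]
    flagged = []
    for col in report["columns"]:
        if col["missing"] > 0:
            # Assumption: the rate is relative to the number of data rows.
            flagged.append((col["name"], col["missing"] / data_rows))
    return flagged

report = json.loads(report_text)
for name, rate in columns_with_missing(report):
    print(f"{name}: {rate:.1%} missing")
```

For the sample report this flags only `sales`, which has 1 missing value in 6 data rows.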

Options

Parsing

Header row

Skip empty lines

Trim whitespace

Report

Keys

Case-insensitive keys

Enter key columns as a comma-separated list to check for duplicate keys.
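
A duplicate-key check of this kind can be sketched as follows. This is an illustration only, not the tool's code: the function `duplicate_keys` and the sample CSV are hypothetical, and the sketch assumes that "Case-insensitive keys" means key values are compared without regard to case.

```python
import csv
import io
from collections import Counter

def duplicate_keys(csv_text, key_columns, case_insensitive=False):
    """Return {key_tuple: count} for key combinations seen more than once."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        # Short rows yield None for absent fields; treat those as empty.
        key = tuple((row[c] or "").strip() for c in key_columns)
        if case_insensitive:
            key = tuple(part.casefold() for part in key)
        counts[key] += 1
    return {k: n for k, n in counts.items() if n > 1}

# Hypothetical sample: two emails that differ only by case.
sample = "id,email\n1,a@example.com\n2,A@example.com\n3,b@example.com\n"
exact = duplicate_keys(sample, ["email"])
folded = duplicate_keys(sample, ["email"], case_insensitive=True)
print(exact)
print(folded)
```

With exact matching the sample has no duplicate keys. With case-insensitive matching the first two rows collide on `a@example.com`.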

About Dataset Profiler

Dataset Profiler scans your CSV and builds a structured overview: dataset-level stats, per-column type inference, completeness, uniqueness, distributions, and data-quality warnings. Use it to quickly understand data shape and decide the next cleaning or analysis steps.

What you’ll see

Dataset: rows/columns, overall missing ratio, duplicate rows, BOM hint.
Columns: inferred type (integer/float/boolean/datetime/categorical/text), missing rate, unique count.
Numeric: min/max/mean/std and p05/p25/p50/p75/p95 with IQR-based outlier count.
Categorical: Top-N values with frequencies and a high-cardinality warning.
Text: length distribution and simple pattern hints (URL/email-like).
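
The numeric statistics listed above can be sketched in a few lines. The tool's implementation is not shown on this page, so the choices here are assumptions: percentiles use linear interpolation between closest ranks, the standard deviation is the population form (dividing by n), and outliers are values beyond 1.5 × IQR from the quartiles. The first two choices are consistent with the figures reported for the `id` column, whose six distinct integers between 1 and 6 must be 1 through 6. The function names are hypothetical.

```python
import math

def percentile(sorted_values, q):
    """Percentile by linear interpolation between closest ranks (q in [0, 1])."""
    if not sorted_values:
        raise ValueError("percentile of empty data")
    pos = (len(sorted_values) - 1) * q
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac

def numeric_summary(values):
    """Min/max/mean/std, selected percentiles, and an IQR-based outlier count."""
    xs = sorted(values)
    n = len(xs)
    mean = sum(xs) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    summary = {"count": n, "min": xs[0], "max": xs[-1], "mean": mean, "std": std}
    for label, q in [("p05", 0.05), ("p25", 0.25), ("p50", 0.5),
                     ("p75", 0.75), ("p95", 0.95)]:
        summary[label] = percentile(xs, q)
    # Assumption: conventional 1.5 * IQR fences around the quartiles.
    iqr = summary["p75"] - summary["p25"]
    low = summary["p25"] - 1.5 * iqr
    high = summary["p75"] + 1.5 * iqr
    summary["outliers"] = sum(1 for x in xs if x < low or x > high)
    return summary

id_summary = numeric_summary([1, 2, 3, 4, 5, 6])
print(id_summary)
```

Up to floating-point rounding, this should reproduce the `numeric` block reported for `id`: p05 1.25, p25 2.25, p50 3.5, p75 4.75, p95 5.75, std about 1.7078, and 0 outliers.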

Tips

Set key columns to detect duplicate keys before joins or merges.
Tune the categorical rule (by unique ratio or absolute count) for your dataset size.
Use the JSON report for downstream automation or to pipe into other tools.
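
To make the second tip concrete, here is a minimal sketch of what a categorical rule might look like. The function `is_categorical`, its parameter names, and the default thresholds (0.5 and 20) are hypothetical and are not taken from the tool.

```python
def is_categorical(unique, non_null, rule="ratio", max_ratio=0.5, max_count=20):
    """Decide whether a column's distinct values are few enough to be categorical.

    rule="ratio" compares unique / non_null against max_ratio;
    rule="count" compares unique against max_count.
    """
    if non_null == 0:
        return False
    if rule == "ratio":
        return unique / non_null <= max_ratio
    if rule == "count":
        return unique <= max_count
    raise ValueError(f"unknown rule: {rule!r}")

# The city column in the sample report: 4 unique values in 6 non-null cells.
print(is_categorical(4, 6, rule="ratio"))
print(is_categorical(4, 6, rule="count"))
```

Under these assumed thresholds the `city` column fails the ratio rule (4/6 is above 0.5) but passes the count rule. That difference is why the choice of rule matters: on a file this small a ratio is a noisy signal, while on a very large file an absolute count may be the more stable one.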
