Databases are bloat; it's better to use a plain text file and the Unix tools to search for information inside it.


Here's a practical, realistic guide to using a plain text file as a "database" for a school, using only grep, awk, sed, sort, cut, and other Unix tools. The key is a well-defined format: one record per line, with consistently separated fields.


1. Pick a simple, stable, easy-to-parse format

The best format for a "plain-text database" driven by Unix tools is:

Format: TSV (tab-separated values, TAB-delimited)

Why:

  • It doesn't collide with commas inside free text.
  • awk -F '\t' reads it perfectly.
  • grep works just the same.
  • Easy to edit in Vim.

2. A universal structure for any record type

Use a fixed "schema":

TYPE    ID      DATE        FIELD1      FIELD2     FIELD3     EXTRA

Recommended fields:

  • TYPE: the record type → DOCENTE, ALUMNO, ASIST, LICENCIA, AMON, TUTOR, etc.
  • ID: a unique identifier → DNI, file number (legajo), or a code.
  • DATE: standard format → 2025-03-04
  • FIELD1..FIELDn: depend on the type.
  • EXTRA: optional free text.

3. Real-world examples

Teachers

DOCENTE  20300300    2025-03-01  ingreso     Juan Perez     Matemática
DOCENTE  20300300    2025-03-15  licencia    enfermedad     2-dias

Students

ALUMNO   55022110    2025-03-01  ingreso     Lopez Maria    2B
TUTOR    55022110    2025-03-01  tutor       Carlos Lopez   padre

Attendance

ASIST    55022110    2025-03-04  presente    2B
ASIST    55022110    2025-03-05  ausente     2B

Reprimands

AMON     55022110    2025-03-05  leve        celular        profesorX
AMON     55022110    2025-03-10  grave       conducta       preceptorY

4. Quick queries with Unix

■ All records for one student:

grep -P "\t55022110\t" base2025.tsv

■ Attendance records only:

grep "^ASIST" base2025.tsv

■ One student's attendance:

awk -F '\t' '$1=="ASIST" && $2=="55022110"' base2025.tsv

■ Number of absences:

awk -F '\t' '$1=="ASIST" && $2=="55022110" && $4=="ausente" {c++} END {print c}' base2025.tsv

■ Serious reprimands (the sample AMON records don't carry the class, so filter on severity):

awk -F '\t' '$1=="AMON" && $4=="grave"' base2025.tsv

5. Adding a record from Bash

Make yourself a script:

add.sh

#!/bin/sh
# %b expands the \t escapes in the argument (echo -e is not portable under /bin/sh)
printf '%b\n' "$1" >> base2025.tsv

Usage:

./add.sh "ASIST\t55022110\t2025-03-06\tpresente\t2B"
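
A variant that takes the fields as separate arguments and joins them with real TABs, so you don't have to type the \t escapes (the add2.sh name is illustrative):

#!/bin/sh
# add2.sh: join all arguments with a literal TAB and append one record
IFS="$(printf '\t')"
printf '%s\n' "$*" >> base2025.tsv

./add2.sh ASIST 55022110 2025-03-06 presente 2B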

6. How to validate data

You can add checks with awk:

awk -F '\t' 'NF < 4 { print "Invalid record:", $0 }' base2025.tsv
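
A slightly stronger check, sketched against the schema above (known types, ISO date in field 3):

awk -F '\t' '
  NF < 4 { print "too few fields:", $0; next }
  $1 !~ /^(DOCENTE|ALUMNO|ASIST|LICENCIA|AMON|TUTOR)$/ { print "unknown type:", $0; next }
  $3 !~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ { print "bad date:", $0 }
' base2025.tsv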

7. Organization by year

My recommendation:

/datos-colegio/
    base2025.tsv
    base2026.tsv
    base2027.tsv

One big file per year is perfect.
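
Querying across years is then just a glob away; for example, one student's attendance over every year on file:

awk -F '\t' '$1=="ASIST" && $2=="55022110"' base*.tsv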


8. Optional: split into "virtual tables"

If you want to split things up a bit:

docentes2025.tsv
alumnos2025.tsv
asistencias2025.tsv
amonestaciones2025.tsv
tutores2025.tsv

But your single-file idea is perfectly valid: all you need is the TYPE field.
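
And if you ever change your mind, one awk pass splits the single file into per-type files (a sketch; the output names are illustrative):

# writes DOCENTE rows to docente2025.tsv, ASIST rows to asist2025.tsv, etc.
awk -F '\t' '{ print > (tolower($1) "2025.tsv") }' base2025.tsv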


⚡ Want me to generate a complete schema and a list of useful commands for each data type?

I can put together a "system manual" with everything predefined.


You can absolutely use a plain text file as a database in Unix. This is classic Unix philosophy: “Text is the universal interface.” Below are practical patterns depending on how you want to store and query data — all without jq, without SQLite, without JSON — just text + grep/awk/sed.


✅ 1. Choose a Format (custom plain text)

The simplest and most robust flat-file database formats are:

A) KEY=VALUE (one record per line)

id=1 name="John Doe" age=30
id=2 name="Alice" age=25

B) CSV (comma separated)

1,John Doe,30
2,Alice,25

C) Pipe/Tab separated

1|John Doe|30
2|Alice|25

D) One file per record (directory database)

db/
  1.txt
  2.txt

Each file contains arbitrary text.


✅ 2. How to Query the “Database”

🔍 Search rows by field

KEY=VALUE format

Find all entries where name contains “Alice”:

grep 'name="Alice"' db.txt

Find id=2:

grep '^id=2 ' db.txt

CSV example

Find records whose first column is “2”:

awk -F, '$1 == 2' db.csv
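
To pull a single field out of a KEY=VALUE record instead of the whole line, sed can extract it (a sketch, assuming the age=N field is present):

# print just the age of id=1
grep '^id=1 ' db.txt | sed 's/.*age=\([0-9]*\).*/\1/'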

🔄 3. Insert new record

Append:

echo 'id=3 name="Bob" age=50' >> db.txt

✏ 4. Update a record

Replace age for id=1:

sed -i 's/^id=1 .*/id=1 name="John Doe" age=31/' db.txt

❌ 5. Delete a record

Delete id=2:

sed -i '/^id=2 /d' db.txt
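
One portability note: the bare -i used above is the GNU sed form; BSD/macOS sed needs an explicit (possibly empty) backup suffix:

sed -i '' '/^id=2 /d' db.txt     # BSD/macOS
sed -i.bak '/^id=2 /d' db.txt    # works on both, keeps a .bak backup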

🔥 6. Example Flat-File Database Implementation

db.sh

#!/bin/sh

DB="db.txt"

# add <id> <name> <age>: append a new record
add() {
    echo "id=$1 name=\"$2\" age=$3" >> "$DB"
}

# get <id>: print the matching record
get() {
    grep "^id=$1 " "$DB"
}

# delete <id>: remove the matching record
delete() {
    sed -i "/^id=$1 /d" "$DB"
}

# update_age <id> <name> <age>: rewrites the whole line, so pass the name again too
update_age() {
    sed -i "s/^id=$1 .*/id=$1 name=\"$2\" age=$3/" "$DB"
}

# dispatch: the first argument names the function to call
"$@"

Usage:

./db.sh add 1 "John Doe" 30
./db.sh add 2 "Alice" 25
./db.sh get 2
./db.sh update_age 1 "John Doe" 31
./db.sh delete 2

📂 7. Alternative: directory per record

db/
  1
  2

Create:

echo "name=Alice" > db/2
echo "age=25" >> db/2

Search:

grep -R "Alice" db/

⭐ When is this useful?

✔ shell scripts
✔ embedded devices
✔ tiny systems
✔ config management
✔ low data volume (<50k records)
✔ no dependencies
✔ grep/awk/sed are extremely fast


⚠ When it falls apart

❌ hundreds of thousands of rows
❌ concurrent writes
❌ complex queries (joins, ordering, types)


If you want, I can:

  • design a full mini-database system in shell
  • give you CRUD helpers
  • make an “SQL-like” interface in bash
  • design indexing for fast lookups (using grep caches)

Just tell me your ideal workflow.


Nice — you want a practical, file-based “database” using only Unix tools. Below is a focused, pragmatic guide + ready-to-run shell toolkit you can copy-paste and use. It covers data layout choices, common operations (CRUD + queries), performance & safety tips, and a simple script (plaindb.sh) that implements insert/find/update/delete on newline-delimited JSON (NDJSON) and on delimited CSV/TSV. I thought deeper about scale, concurrency, indexing, and schema evolution so you don’t get burned later.

Why plain files (quick pros/cons)

Pros

  • Simple, transparent, easy to version with git.
  • No DB server, low dependency surface.
  • Fast for many read-heavy workloads using grep/rg/awk/jq.

Cons
  • No transactions unless you implement locking/atomic writes.
  • Harder to scale and do complex joins/queries compared to a DB.
  • Requires careful handling for concurrency, schema evolution, and backups.

Recommended file formats

  1. NDJSON (newline-delimited JSON) — each line is a JSON object. Best for structured records, works great with jq.

    • Example: users.ndjson
    • {"id":1,"name":"A","email":"a@x.com"}
  2. CSV/TSV — great for tabular data and fast parsing with awk, cut.

    • Use \t (TSV) to avoid comma-escaping headaches.
  3. Flat key-value: one key:value entry per line, for tiny lookup tables.

  4. Inverted index files — precomputed index files to support fast full-text lookups.

Tools to use (short)

  • grep / rg (ripgrep) — text search
  • awk — field processing, aggregations, joins-ish
  • sed — in-place edits / transforms
  • jq — JSON querying and mutation
  • cut, sort, uniq, join — selection and joins on sorted files
  • flock — file locking for safe concurrent writes
  • mv/cp technique — atomic replace (mv tmp file)
  • gzip/xz — compress cold data
  • git — version your files for history & lightweight rollback
  • fzf — interactive selection

Patterns & examples

1) NDJSON: append, find, update, delete

File: users.ndjson

{"id":1,"name":"Alice","email":"alice@example.com"}
{"id":2,"name":"Bob","email":"bob@example.com"}

Insert (append safely)

# create a new record and append atomically
record='{"id":3,"name":"Carol","email":"carol@example.com"}'
printf '%s\n' "$record" >> users.ndjson
# better: use flock for multi-writer safety (see script below)

Find records

  • Full-text with grep:
grep -i 'alice' users.ndjson
  • Field-level with jq:
jq -c 'select(.email=="alice@example.com")' users.ndjson

Select columns (project)

jq -r '.id, .name' users.ndjson    # prints each field on a new line (not ideal)
jq -r '. | [.id, .name] | @tsv' users.ndjson  # id<TAB>name

Update a record (idempotent pattern)

Can’t modify in-place reliably — create a new file then move (-c keeps the output one record per line):

jq -c 'if .id==2 then .email="bob@new.com" else . end' users.ndjson > users.ndjson.tmp
mv users.ndjson.tmp users.ndjson

If multiple writers exist, use flock to protect the critical section (script later).

Delete

jq -c 'select(.id != 2)' users.ndjson > users.ndjson.tmp && mv users.ndjson.tmp users.ndjson

2) CSV/TSV with awk

File: products.tsv (TAB-separated; header: id, name, price)

id	name	price
1	Widget	12.50
2	Gizmo	9.99

Find where price > 10

awk -F'\t' 'NR==1{print; next} $3+0 > 10' products.tsv

Group by and count

Count products by name prefix:

awk -F'\t' 'NR>1 {prefix=substr($2,1,3); counts[prefix]++} END{for (k in counts) print k, counts[k]}' products.tsv | sort -k2 -n -r

Join two files (both sorted by key)

join requires files sorted on join field:

# assume customers.tsv and orders.tsv keyed by customer_id in column1
join -t$'\t' -1 1 -2 1 <(sort -k1,1 customers.tsv) <(sort -k1,1 orders.tsv)

3) Indexing for speed (simple inverted index)

If you have lots of text and grep becomes slow / you need targeted search, create an index mapping token → list of record IDs.

Example pipeline to build a basic inverted index (tokenized, lowercased words; plain sort -u dedupes repeated word/id pairs while keeping each word's ids grouped):

# input: docs.ndjson each line {"id":123, "text":"..."}
jq -r '. | [.id, .text] | @tsv' docs.ndjson \
  | tr '[:upper:]' '[:lower:]' \
  | awk -F'\t' '{id=$1; text=$2; gsub(/[^a-z0-9]+/," ",text); split(text, a, " "); for(i in a) if(length(a[i])>1) print a[i] "\t" id}' \
  | sort -u \
  | awk -F'\t' '{word=$1; id=$2; ids[word]=ids[word]?ids[word] "," id : id} END{for (w in ids) print w "\t" ids[w]}' \
  > index.tsv

Lookup:

grep -P '^keyword\t' index.tsv
# parse list of ids and then pull records from ndjson using jq or grep -Ff
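
A sketch of that second step, assuming records carry a numeric "id" field as in docs.ndjson above:

# collect the ids for one keyword, one per line
grep -P '^keyword\t' index.tsv | cut -f2 | tr ',' '\n' > ids.txt
# turn each id into a fixed-string pattern and pull the matching records
sed 's/.*/"id":&,/' ids.txt | grep -Ff - docs.ndjson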

4) Transactions & concurrency (flock + atomic move)

Example pattern for safe writes:

(
  flock -x 200 || exit 1
  # make changes in a tmp file
  jq '...update...' users.ndjson > users.ndjson.tmp
  mv users.ndjson.tmp users.ndjson
) 200>users.ndjson.lock
  • flock -x grabs exclusive lock on file descriptor 200.
  • Always write to a .tmp and mv to replace atomically.

5) Backups & snapshots

  • Periodic snapshots: cp users.ndjson users.$(date +%Y%m%d%H%M).ndjson (rotation sketch after this list)
  • Lightweight: commit changes to git (git add *.ndjson && git commit -m "snapshot").
  • For big files, use incremental rsync to remote.
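
To keep snapshots from piling up, a small rotation sketch (assumes the timestamped naming in the first bullet; xargs -r is GNU):

# delete all but the 7 newest snapshots
ls -1t users.*.ndjson | tail -n +8 | xargs -r rm --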

6) Validation & schema evolution

  • Keep a small schema file schema.json that documents required fields and types.
  • Validate new records with jq filter before appending:
jq -e '(.id|type=="number") and (.email|test("@"))' <<<"$candidate"
  • For migration: write a one-shot jq or awk migration script that produces a new file, test it, then replace.
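
For example, a one-shot migration that backfills a missing field (the "status" field here is hypothetical):

# give every record a status, defaulting to "active" where absent
jq -c '.status //= "active"' users.ndjson > users.ndjson.tmp
mv users.ndjson.tmp users.ndjson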

Ready-to-use toolkit: plaindb.sh

Copy this script and put it in your $PATH (chmod +x plaindb.sh). It implements simple CRUD on an NDJSON file with locking.

#!/usr/bin/env bash
# plaindb.sh - minimal NDJSON "db" operations: insert, find, update, delete, list
# Usage: plaindb.sh <dbfile> <cmd> [args...]
# Requires: jq, flock, mktemp

DB="$1"; shift
CMD="$1"; shift

LOCK="${DB}.lock"

usage(){ cat <<EOF
Usage: $0 <dbfile> <cmd> [args...]
Commands:
  insert  <json>                 Append a JSON record (string)
  find    <jq-filter>            Print matching records (jq filter)
  list                           Print all records
  update  '<jq-update>'          Apply jq program to all records (e.g. 'if .id==2 then .name="X" else . end')
  delete  '<jq-condition>'       Delete records matching condition (jq select; e.g. '.id==2')
EOF
}

if [ -z "$DB" ] || [ -z "$CMD" ]; then usage; exit 1; fi
mkdir -p "$(dirname "$DB")" 2>/dev/null || true
: "${DB:=db.ndjson}"

with_lock() {
  # usage: with_lock <command...>
  exec 200>"$LOCK"
  flock -x 200
  "$@"
  flock -u 200
}

case "$CMD" in
  insert)
    json="$1"
    if ! echo "$json" | jq -e . >/dev/null 2>&1; then
      echo "invalid json" >&2; exit 2
    fi
    # append under lock; pass the data as arguments so quotes in the JSON can't break the shell
    with_lock sh -c 'printf "%s\n" "$1" >> "$2"' _ "$json" "$DB"
    ;;
  find)
    filter="$1"
    if [ -z "$filter" ]; then filter='.'; fi
    # safe read, no write locking necessary
    jq -c "$filter" "$DB"
    ;;
  list)
    jq -c '.' "$DB"
    ;;
  update)
    prog="$1"
    tmp="$(mktemp "${DB}.tmp.XXXX")"
    with_lock bash -c "jq -c '$prog' '$DB' > '$tmp' && mv '$tmp' '$DB'"
    ;;
  delete)
    cond="$1"
    tmp="$(mktemp "${DB}.tmp.XXXX")"
    with_lock bash -c "jq -c \"select( ( $cond ) | not )\" '$DB' > '$tmp' && mv '$tmp' '$DB'"
    ;;
  *)
    usage; exit 1
esac

Examples:

# insert
./plaindb.sh users.ndjson insert '{"id":1,"name":"Alice"}'
# find by id
./plaindb.sh users.ndjson find 'select(.id==1)'
# update
./plaindb.sh users.ndjson update 'if .id==1 then .email="a@x.com" else . end'
# delete
./plaindb.sh users.ndjson delete '.id==1'

Advanced tips (thinking deeper)

  • Search performance: rg (ripgrep) is much faster than grep. For enormous files, split data by sharding keys (year, prefix), or maintain indexes.
  • Atomic counters: maintain a small file next_id and flock it to safely increment IDs (see the sketch after this list).
  • Multi-field joins: sort files on join keys and use join. For complex joins, it’s often simpler to load small sets into memory with awk associative arrays.
  • Partial reads: store large blobs separately (e.g., blobs/<id>.dat) and keep metadata in the main NDJSON.
  • Full-text search: if you need advanced text queries (stemming, ranking), either maintain an index or use sqlite FTS (that’s a DB—but it’s a single file).
  • Monitoring & metrics: keep a metrics.log file with line-oriented events (timestamped) and roll it daily.
  • Immutable append-only log: prefer append-only files and periodic compaction—this makes crash recovery trivial. Compaction script reads file, removes tombstones, writes compacted file, swaps atomically.
  • Encryption: use gpg --encrypt for at-rest, gpg --decrypt when reading; or use age.
  • Testing & CI: add schema tests that jq-validate every new commit to the repo.

Quick cheatsheet

  • grep -i foo file — fast text search
  • jq -c 'select(.field=="x")' file — filter NDJSON
  • awk -F'\t' '$3>10' file.tsv — numeric filter on column 3
  • sort -k1,1 file | join -t$'\t' -1 1 -2 1 file2 — join on column1
  • flock -x 200; mv tmp file; — atomic replacement with lock
  • git add *.ndjson && git commit -m "snapshot" — snapshot your DB state

If you want, I can:

  • produce a slightly fancier CLI (with getopts, --where, --select) to make ad-hoc queries easier,
  • show how to build a tiny inverted index for full-text search and a lookup tool that returns top-N results,
  • or tailor the scripts for CSV/TSV only (no jq).

Which of those should I give you next?


If you want to use jq as your “database engine,” the best possible format is:

✅ NDJSON (Newline-Delimited JSON)

One JSON object per line:

users.ndjson:

{"id":1, "name":"Alice", "email":"alice@example.com"}
{"id":2, "name":"Bob", "email":"bob@example.com"}
{"id":3, "name":"Carol", "email":"carol@example.com"}

No commas, no arrays, no brackets — just one JSON object per line. This is the format used by big-data tools (Elasticsearch, Firehose, Hadoop, etc.).


✅ 1. FIND (query with jq)

List all records:

jq -c . users.ndjson

Find by name:

jq -c 'select(.name == "Alice")' users.ndjson

Case-insensitive:

jq -c 'select(.name | ascii_downcase == "alice")' users.ndjson

Find by ID:

jq -c 'select(.id == 2)' users.ndjson

Search email ending in .com:

jq -c 'select(.email | endswith(".com"))' users.ndjson

✅ 2. INSERT (append)

Append a new object:

echo '{"id":4, "name":"David", "email":"david@example.com"}' >> users.ndjson

✅ 3. DELETE a record

Delete ID=3:

jq -c 'select(.id != 3)' users.ndjson \
  > tmp && mv tmp users.ndjson

Delete every Bob:

jq -c 'select(.name != "Bob")' users.ndjson \
  > tmp && mv tmp users.ndjson

✅ 4. UPDATE a record

Example: change Bob’s email:

jq -c '
  if .id == 2
  then .email = "bob@newmail.com"
  else .
  end
' users.ndjson > tmp && mv tmp users.ndjson

Update multiple fields:

jq -c '
  if .id == 3
  then .name = "Carolina" | .email = "caro@example.com"
  else .
  end
' users.ndjson > tmp && mv tmp users.ndjson

✅ 5. SORT the “database”

Sort by ID (a quick hack; fragile, since it assumes "id" is the first key on every line):

jq -c . users.ndjson | sort -n -t: -k2

Better way (jq loads fully, sorts, outputs NDJSON):

jq -c -s 'sort_by(.id)[]' users.ndjson > tmp && mv tmp users.ndjson

(-s = “slurp” = load entire file)

Sort by name:

jq -c -s 'sort_by(.name)[]' users.ndjson > tmp && mv tmp users.ndjson

✅ 6. CREATE a reusable CLI tool

jsondb.sh:

#!/bin/sh
DB="$1"
CMD="$2"
if [ "$#" -lt 2 ]; then
  echo "Usage: $0 <dbfile> <command> [args...]" >&2
  exit 1
fi
shift 2

case "$CMD" in
  find)
    jq -c "select($1)" "$DB"
    ;;

  insert)
    echo "$1" >> "$DB"
    ;;

  delete)
    jq -c "select(($1) | not)" "$DB" > tmp && mv tmp "$DB"
    ;;

  update)
    filter="$1"   # e.g. '.id == 2'
    update="$2"   # e.g. '.email = "x@y.com"'
    jq -c "
      if ($filter) then $update else . end
    " "$DB" > tmp && mv tmp "$DB"
    ;;

  *)
    echo "Commands: find / insert / delete / update"
    ;;
esac

Example:

./jsondb.sh users.ndjson find '.id == 1'
./jsondb.sh users.ndjson insert '{"id":4,"name":"D","email":"d@mail"}'
./jsondb.sh users.ndjson delete '.id == 2'
./jsondb.sh users.ndjson update '.id == 3' '.name = "Carol X"'

Want more?

I can generate:

  • a full query language (where, select, limit, order by)
  • automatic primary key indexing
  • multi-table JSON structure
  • a “transaction log” version
  • CLI with subcommands and help text

Just tell me what you’d like to build next.