Chapter 10. Data management

Table of Contents

10.1. Sharing, copying, and archiving
10.1.1. Archive and compression tools
10.1.2. Copy and synchronization tools
10.1.3. Idioms for the archive
10.1.4. Idioms for the copy
10.1.5. Idioms for the selection of files
10.1.6. Archive media
10.1.7. Removable storage device
10.1.8. Filesystem choice for sharing data
10.1.9. Sharing data via network
10.2. Backup and recovery
10.2.1. Backup and recovery policy
10.2.2. Backup utility suites
10.2.3. Backup tips
10.2.3.1. GUI backup
10.2.3.2. Mount event triggered backup
10.2.3.3. Timer event triggered backup
10.3. Data security infrastructure
10.3.1. Key management for GnuPG
10.3.2. Using GnuPG on files
10.3.3. Using GnuPG with Mutt
10.3.4. Using GnuPG with Vim
10.3.5. The MD5 sum
10.3.6. Password keyring
10.4. Source code merge tools
10.4.1. Extracting differences for source files
10.4.2. Merging updates for source files
10.4.3. Interactive merge
10.5. Git
10.5.1. Configuration of Git client
10.5.2. Basic Git commands
10.5.3. Git tips
10.5.4. Git references
10.5.5. Other version control systems

This chapter describes tools and tips for managing binary and text data on the Debian system.

[Warning] Warning

Uncoordinated write access from multiple processes to actively accessed devices and files must be avoided to prevent race conditions. File locking mechanisms using flock(1) can be used to avoid them.
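
For example, two shell commands writing to the same file can be serialized with flock(1) as in the following minimal sketch (the lock file path "/tmp/shared.log.lock" is an arbitrary assumption):

$ flock /tmp/shared.log.lock -c 'echo "writer 1" >> /tmp/shared.log'
$ flock /tmp/shared.log.lock -c 'echo "writer 2" >> /tmp/shared.log'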

The security of the data and its controlled sharing have several aspects.

  • The creation of data archives

  • The remote storage access

  • The duplication

  • The tracking of the modification history

  • The facilitation of data sharing

  • The prevention of unauthorized file access

  • The detection of unauthorized file modification

These can be realized by using a combination of the following tools.

  • Archive and compression tools

  • Copy and synchronization tools

  • Network filesystems

  • Removable storage media

  • The secure shell

  • The authentication system

  • Version control system tools

  • Hash and cryptographic encryption tools

Here is a summary of archive and compression tools available on the Debian system.

Table 10.1. List of archive and compression tools

package popcon size extension command comment
tar V:902, I:999 3077 .tar tar(1) the standard archiver (de facto standard)
cpio V:440, I:998 1199 .cpio cpio(1) Unix System V style archiver, use with find(1)
binutils V:172, I:629 144 .ar ar(1) archiver for the creation of static libraries
fastjar V:1, I:13 183 .jar fastjar(1) archiver for Java (zip like)
pax V:8, I:14 170 .pax pax(1) new POSIX standard archiver, compromise between tar and cpio
gzip V:876, I:999 252 .gz gzip(1), zcat(1), … GNU LZ77 compression utility (de facto standard)
bzip2 V:166, I:970 112 .bz2 bzip2(1), bzcat(1), … Burrows-Wheeler block-sorting compression utility with higher compression ratio than gzip(1) (slower than gzip with similar syntax)
lzma V:1, I:16 149 .lzma lzma(1) LZMA compression utility with higher compression ratio than gzip(1) (deprecated)
xz-utils V:360, I:980 1203 .xz xz(1), xzdec(1), … XZ compression utility with higher compression ratio than bzip2(1) (slower than gzip but faster than bzip2; replacement for LZMA compression utility)
zstd V:193, I:481 2158 .zst zstd(1), zstdcat(1), … Zstandard fast lossless compression utility
p7zip V:20, I:463 8 .7z 7zr(1), p7zip(1) 7-Zip file archiver with high compression ratio (LZMA compression)
p7zip-full V:110, I:480 12 .7z 7z(1), 7za(1) 7-Zip file archiver with high compression ratio (LZMA compression and others)
lzop V:15, I:142 164 .lzo lzop(1) LZO compression utility with higher compression and decompression speed than gzip(1) (lower compression ratio than gzip with similar syntax)
zip V:48, I:380 616 .zip zip(1) InfoZIP: DOS archive and compression tool
unzip V:105, I:771 379 .zip unzip(1) InfoZIP: DOS unarchive and decompression tool

[Warning] Warning

Do not set the "$TAPE" variable unless you know what to expect. It changes tar(1) behavior.

Here are several ways to copy the entire content of the directory "./source" using different tools.

rsync(8):

# cd ./source; rsync -aHAXSv . /dest
# cd ./source; rsync -aHAXSv . [email protected]:/dest

You can alternatively use "a trailing slash on the source directory" syntax.

# rsync -aHAXSv ./source/ /dest
# rsync -aHAXSv ./source/ [email protected]:/dest

Alternatively, use the following.

# cd ./source; find . -print0 | rsync -aHAXSv0 --files-from=- . /dest
# cd ./source; find . -print0 | rsync -aHAXSv0 --files-from=- . [email protected]:/dest

GNU cp(1) and OpenSSH scp(1):

# cd ./source; cp -a . /dest
# cd ./source; scp -pr . [email protected]:/dest

GNU tar(1):

# (cd ./source && tar cf - . ) | (cd /dest && tar xvfp - )
# (cd ./source && tar cf - . ) | ssh [email protected] '(cd /dest && tar xvfp - )'

cpio(1):

# cd ./source; find . -print0 | cpio -pvdm --null --sparse /dest

In all the examples containing ".", you can substitute "foo" for "." to copy files from the "./source/foo" directory to the "/dest/foo" directory.

In all the examples containing ".", you can substitute the absolute path "/path/to/source/foo" for "." and drop "cd ./source;". This copies files to different locations depending on the tool used, as follows.

  • "/dest/foo": rsync(8), GNU cp(1), and scp(1)

  • "/dest/path/to/source/foo": GNU tar(1), and cpio(1)

[Tip] Tip

rsync(8) and GNU cp(1) have option "-u" to skip files that are newer on the receiver.

find(1) is used to select files for archive and copy commands (see Section 10.1.3, “Idioms for the archive” and Section 10.1.4, “Idioms for the copy”) or for xargs(1) (see Section 9.4.9, “Repeating a command looping over files”). Its behavior can be refined with its command arguments.

The basic syntax of find(1) can be summarized as follows.

  • Its conditional arguments are evaluated from left to right.

  • This evaluation stops once its outcome is determined.

  • "Logical OR" (specified by "-o" between conditionals) has lower precedence than "logical AND" (specified by "-a" or nothing between conditionals).

  • "Logical NOT" (specified by "!" before a conditional) has higher precedence than "logical AND".

  • "-prune" always returns logical TRUE and, if it is a directory, searching of file is stopped beyond this point.

  • "-name" matches the base of the filename with shell glob (see Section 1.5.6, “Shell glob”) but it also matches its initial "." with metacharacters such as "*" and "?". (New POSIX feature)

  • "-regex" matches the full path with emacs style BRE (see Section 1.6.2, “Regular expressions”) as default.

  • "-size" matches the file based on the file size (value precedented with "+" for larger, precedented with "-" for smaller)

  • "-newer" matches the file newer than the one specified in its argument.

  • "-print0" always returns logical TRUE and print the full filename (null terminated) on the standard output.

find(1) is often used in an idiomatic style as follows.

# find /path/to \
    -xdev -regextype posix-extended \
    -type f -regex ".*\.cpio|.*~" -prune -o \
    -type d -regex ".*/\.git" -prune -o \
    -type f -size +99M -prune -o \
    -type f -newer /path/to/timestamp -print0

This command performs the following actions.

  1. Search all files starting from "/path/to"

  2. Globally limit the search to the starting filesystem and use ERE (see Section 1.6.2, “Regular expressions”) instead

  3. Exclude files matching the regex ".*\.cpio" or ".*~" from the search by stopping processing

  4. Exclude directories matching the regex ".*/\.git" from the search by stopping processing

  5. Exclude files larger than 99 megabytes (units of 1048576 bytes) from the search by stopping processing

  6. Print filenames which satisfy the above search conditions and are newer than "/path/to/timestamp"

Please note the idiomatic use of "-prune -o" to exclude files in the above example.

[Note] Note

On non-Debian Unix-like systems, some options may not be supported by find(1). In such cases, consider adjusting the matching methods and replacing "-print0" with "-print". You may need to adjust related commands, too.

When choosing computer data storage media for important data archives, you should be careful about their limitations. For small personal data backups, I use CD-R and DVD-R media from a brand-name company and store them in a cool, shaded, dry, clean environment. (Tape archive media seem to be popular for professional use.)

[Note] Note

A fire-resistant safe is meant for paper documents. Most computer data storage media have less temperature tolerance than paper. I usually rely on multiple secure encrypted copies stored in multiple secure locations.

Optimistic storage life of archive media seen on the net (mostly from vendor info).

  • 100+ years : Acid free paper with ink

  • 100 years : Optical storage (CD/DVD, CD/DVD-R)

  • 30 years : Magnetic storage (tape, floppy)

  • 20 years : Phase change optical storage (CD-RW)

These figures do not account for mechanical failures due to handling etc.

Optimistic write cycle of archive media seen on the net (mostly from vendor info).

  • 250,000+ cycles : Harddisk drive

  • 10,000+ cycles : Flash memory

  • 1,000 cycles : CD/DVD-RW

  • 1 cycle : CD/DVD-R, paper

[Caution] Caution

Figures of storage life and write cycles here should not be used for decisions on any critical data storage. Please consult the specific product information provided by the manufacturer.

[Tip] Tip

Since CD/DVD-R and paper have only 1 write cycle, they inherently prevent accidental data loss by overwriting. This is an advantage!

[Tip] Tip

If you need fast and frequent backups of a large amount of data, a hard disk on a remote host linked by a fast network connection may be the only realistic option.

[Tip] Tip

If you use re-writable media for your backups, using a filesystem such as Btrfs or ZFS which supports read-only snapshots may be a good idea.
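
For example, a read-only snapshot of a Btrfs subvolume can be created by the following (a minimal sketch; "/backup/data" is a hypothetical subvolume path):

# btrfs subvolume snapshot -r /backup/data "/backup/data-snapshot-$(date -u --iso=min)"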

Removable storage devices may be flash memory devices, hard disks, or optical disc drives. They may be connected via an interface such as USB.

Modern desktop environments such as GNOME and KDE can mount these removable devices automatically without a matching "/etc/fstab" entry.

  • The udisks2 package provides a daemon and associated utilities to mount and unmount these devices.

  • D-Bus creates events to initiate automatic processes.

  • PolicyKit provides the required privileges.

[Tip] Tip

Automounted devices may have the "uhelper=" mount option which is used by umount(8).

[Tip] Tip

Automounting under modern desktop environments happens only when those removable media devices are not listed in "/etc/fstab".

The mount point under modern desktop environments is chosen as "/media/username/disk_label", and the label can be customized by the following tools (see the examples after this list).

  • mlabel(1) for FAT filesystem

  • genisoimage(1) with "-V" option for ISO9660 filesystem

  • tune2fs(1) with "-L" option for ext2/ext3/ext4 filesystem
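
For example, the label may be set as follows (a minimal sketch; the device names "/dev/sdx1" and "/dev/sdx2" and the label strings are hypothetical):

# mlabel -i /dev/sdx1 ::DISK_LABEL                         # FAT
$ genisoimage -V disk_label -r -o cd.iso source_directory  # ISO9660
# tune2fs -L disk_label /dev/sdx2                          # ext2/ext3/ext4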

[Tip] Tip

The choice of encoding may need to be provided as mount option (see Section 8.1.3, “Filename encoding”).

[Tip] Tip

The use of the GUI menu to unmount a filesystem may remove its dynamically generated device node such as "/dev/sdc". If you wish to keep its device node, unmount it with the umount(8) command from the shell prompt.

When sharing data with another system via a removable storage device, you should format it with a common filesystem supported by both systems. Here is a list of filesystem choices.


[Tip] Tip

See Section 9.9.1, “Removable disk encryption with dm-crypt/LUKS” for cross platform sharing of data using device level encryption.

The FAT filesystem is supported by almost all modern operating systems and is quite useful for data exchange via removable hard-disk-like media.

When formatting removable hard-disk-like devices for cross-platform sharing of data with the FAT filesystem, the following should be safe choices.
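
For example, a whole device may be partitioned and formatted as follows (a minimal sketch; "/dev/sdx" is a hypothetical device name and all data on it are erased):

# parted /dev/sdx -- mklabel msdos mkpart primary fat32 1MiB 100%
# mkfs.vfat -F 32 -n DISK_LABEL /dev/sdx1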

When using the FAT or ISO9660 filesystems for sharing data, the following should be safe considerations (see the sketch after this list).

  • Archiving files into an archive file first using tar(1) or cpio(1) to retain the long filenames, the symbolic links, the original Unix file permissions, and the owner information.

  • Splitting the archive file into less than 2 GiB chunks with the split(1) command to protect it from the file size limitation.

  • Encrypting the archive file to secure its contents from the unauthorized access.
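
For example, a directory may be archived, encrypted, and split into chunks which fit the FAT file size limit as follows (a minimal sketch; the filenames are arbitrary):

$ tar -czf - ./source | gpg -c > backup.tar.gz.gpg
$ split -b 1G backup.tar.gz.gpg backup.tar.gz.gpg.

The chunks can be reassembled and unpacked by the following.

$ cat backup.tar.gz.gpg.* | gpg -d | tar -xzf -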

[Note] Note

By design, the maximum file size of the FAT filesystem is (2^32 - 1) bytes = (4 GiB - 1 byte). For some applications on older 32-bit OSes, the maximum file size was even smaller: (2^31 - 1) bytes = (2 GiB - 1 byte). Debian does not suffer from the latter problem.

[Note] Note

Microsoft itself does not recommend using FAT for drives or partitions over 200 MB. Microsoft highlights its shortcomings, such as inefficient disk space usage, in their "Overview of FAT, HPFS, and NTFS File Systems". Of course, we should normally use the ext4 filesystem for Linux.

[Tip] Tip

For more on filesystems and accessing filesystems, please read "Filesystems HOWTO".

We all know that computers fail sometimes and that human errors cause system and data damage. Backup and recovery operations are an essential part of successful system administration. All possible failure modes will hit you some day.

[Tip] Tip

Keep your backup system simple and back up your system often. Having backup data is more important than how technically good your backup method is.

There are 3 key factors which determine the actual backup and recovery policy.

  1. Knowing what to back up and recover.

  2. Knowing how to back up and recover.

    • Secure storage of data: protection from overwrite and system failure

    • Frequent backup: scheduled backup

    • Redundant backup: data mirroring

    • Foolproof process: easy single-command backup

  3. Assessing risks and costs involved.

    • Risk of data when lost

      • Data should be at least on different disk partitions, preferably on different disks and machines, to withstand filesystem corruption. Important data are best stored on a read-only filesystem. [4]

    • Risk of data when breached

      • Sensitive identity data such as "/etc/ssh/ssh_host_*_key", "~/.gnupg/*", "~/.ssh/*", "~/.local/share/keyrings/*", "/etc/passwd", "/etc/shadow", "popularity-contest.conf", "/etc/ppp/pap-secrets", and "/etc/exim4/passwd.client" should be backed up as encrypted. [5] (See Section 9.9, “Data encryption tips”.)

      • Never hard-code the system login password nor the decryption passphrase in any script, even on a trusted system. (See Section 10.3.6, “Password keyring”.)

    • Failure modes and their possibilities

      • Hardware (especially HDD) will break

      • Filesystem may be corrupted and data in it may be lost

      • Remote storage systems can't be trusted against security breaches

      • Weak password protection can be easily compromised

      • File permission system may be compromised

    • Required resources for backup: human, hardware, software, …

      • Automatic scheduled backup with cron job or systemd timer job

[Tip] Tip

You can recover debconf configuration data with "debconf-set-selections debconf-selections" and dpkg selection data with "dpkg --set-selections <dpkg-selections.list".
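
The corresponding backup data can be created beforehand as follows (a minimal sketch; debconf-get-selections(1) is in the debconf-utils package):

# debconf-get-selections > debconf-selections
# dpkg --get-selections '*' > dpkg-selections.list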

[Note] Note

Do not back up the pseudo-filesystem contents found on /proc, /sys, /tmp, and /run (see Section 1.2.12, “procfs and sysfs” and Section 1.2.13, “tmpfs”). Unless you know exactly what you are doing, they are huge and useless to back up.

[Note] Note

You may wish to stop some application daemons, such as the MTA (see Section 6.2.4, “Mail transport agent (MTA)”), while backing up data.

Here is a select list of notable backup utility suites available on the Debian system.


Backup tools have their specialized focuses.

  • Mondo Rescue is a backup system to facilitate quick restoration of a complete system from backup CDs/DVDs etc. without going through normal system installation processes.

  • Bacula, Amanda, and BackupPC are full-featured backup suite utilities which are focused on regular backups over the network.

  • Duplicity and Borg are simpler backup utilities for typical workstations.

For a personal workstation, full-featured backup suite utilities designed for server environments may not serve well. At the same time, existing backup utilities for workstations may have some shortcomings.

Here are some tips to make backups easier with minimal user effort. These techniques may be used with any backup utility.

For demonstration purposes, let's assume the primary user and group name to be penguin and create a backup and snapshot script example "/usr/local/bin/bkss.sh" as:

#!/bin/sh -e
# bkss.sh: back up $1 into a btrfs subvolume and take a read-only snapshot
SRC="$1" # source data path
DSTFS="$2" # backup destination filesystem path
DSTSV="$3" # backup destination subvolume name
DSTSS="${DSTFS}/${DSTSV}-snapshot" # snapshot destination path
if [ "$(stat -f -c %T "$DSTFS")" != "btrfs" ]; then
  echo "E: $DSTFS needs to be formatted to btrfs" >&2 ; exit 1
fi
MSGID=$(notify-send -p "bkss.sh $DSTSV" "in progress ...")
if [ ! -d "$DSTFS/$DSTSV" ]; then
  btrfs subvolume create "$DSTFS/$DSTSV"
  mkdir -p "$DSTSS"
fi
rsync -aHxS --delete --mkpath "${SRC}/" "${DSTFS}/${DSTSV}"
btrfs subvolume snapshot -r "${DSTFS}/${DSTSV}" "${DSTSS}/$(date -u --iso=min)"
notify-send -r "$MSGID" "bkss.sh $DSTSV" "finished!"

Here, only the basic tool rsync(1) is used to facilitate system backup, and the storage space is used efficiently by Btrfs.
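
For example, the script may be invoked manually as follows (assuming a Btrfs-formatted USB storage device labeled "BKUP" mounted at "/media/penguin/BKUP"):

$ /usr/local/bin/bkss.sh ~/Documents /media/penguin/BKUP Documents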

[Tip] Tip

FYI: This author uses his own similar shell script "bss: Btrfs Subvolume Snapshot Utility" for his workstation.

Here is an example to set up a backup triggered by a single GUI click.
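
For example, you can place a launcher file such as "~/.local/share/applications/BKUP.desktop" (a minimal sketch; the absolute paths assume the user penguin and a USB storage device labeled "BKUP"):

[Desktop Entry]
Name=BKUP
Comment=Back up and snapshot ~/Documents
Exec=/usr/local/bin/bkss.sh /home/penguin/Documents /media/penguin/BKUP Documents
Type=Application
Terminal=false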

For each GUI click, your data is backed up from "~/Documents" to a USB storage device and a read-only snapshot is created.

Here is an example to set up an automatic backup triggered by the mount event.

  • Prepare a USB storage device to be used for backup as in Section 10.2.3.1, “GUI backup”.

  • Create a systemd service unit file "~/.config/systemd/user/bkup-BKUP.service" as:

    [Unit]
    Description=USB Disk backup
    Requires=media-%u-BKUP.mount
    After=media-%u-BKUP.mount
    
    [Service]
    ExecStart=/usr/local/bin/bkss.sh %h/Documents /media/%u/BKUP Documents
    StandardOutput=append:%h/.cache/systemd-snap.log
    StandardError=append:%h/.cache/systemd-snap.log
    
    [Install]
    WantedBy=media-%u-BKUP.mount
    
  • Enable this systemd unit configuration with the following:

     $ systemctl --user enable bkup-BKUP.service
    

For each mount event, your data is backed up from "~/Documents" to a USB storage device and a read-only snapshot is created.

Here, the names of the systemd mount units that systemd currently has in memory can be obtained from the service manager of the calling user with "systemctl --user list-units --type=mount".

Here is an example to set up an automatic backup triggered by the timer event.

  • Prepare a USB storage device to be used for backup as in Section 10.2.3.1, “GUI backup”.

  • Create a systemd timer unit file "~/.config/systemd/user/snap-Documents.timer" as:

    [Unit]
    Description=Run btrfs subvolume snapshot on timer
    Documentation=man:btrfs(1)
    
    [Timer]
    OnStartupSec=30
    OnUnitInactiveSec=900
    
    [Install]
    WantedBy=timers.target
    
  • Create a systemd service unit file "~/.config/systemd/user/snap-Documents.service" as:

    [Unit]
    Description=Run btrfs subvolume snapshot
    Documentation=man:btrfs(1)
    
    [Service]
    Type=oneshot
    Nice=15
    ExecStart=/usr/local/bin/bkss.sh %h/Documents /media/%u/BKUP Documents
    IOSchedulingClass=idle
    CPUSchedulingPolicy=idle
    StandardOutput=append:%h/.cache/systemd-snap.log
    StandardError=append:%h/.cache/systemd-snap.log
    
  • Enable this systemd unit configuration with the following:

     $ systemctl --user enable snap-Documents.timer
    

For each timer event, your data is backed up from "~/Documents" to a USB storage device and a read-only snapshot is created.

Here, the names of the systemd timer user units that systemd currently has in memory can be obtained from the service manager of the calling user with "systemctl --user list-units --type=timer".

For the modern desktop system, this systemd approach can offer more fine-grained control than the traditional Unix ones using at(1), cron(8), or anacron(8).

The data security infrastructure is provided by the combination of data encryption tool, message digest tool, and signature tool.


See Section 9.9, “Data encryption tips” on dm-crypt and fscrypt which implement automatic data encryption infrastructure via Linux kernel modules.

Here are GNU Privacy Guard commands for the basic key management.
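
For example (a few representative commands; see gpg(1) for the full list):

$ gpg --gen-key                   # generate a new key pair
$ gpg --list-keys                 # list public keys
$ gpg --fingerprint user_id       # show the fingerprint of a key
$ gpg --edit-key user_id          # edit a key interactively (sign, trust, ...)
$ gpg --export -a user_id         # export a public key in ASCII armor
$ gpg --import file               # import keys from a file
$ gpg --gen-revoke my_user_id     # generate a revocation certificate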


Here is the meaning of the trust code.
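
The trust code is a single letter, as documented in gpg(1):

  • "-" : no ownertrust assigned / not yet calculated

  • "e" : trust calculation failed

  • "q" : not enough information for calculation

  • "n" : never trust this key

  • "m" : marginally trusted

  • "f" : fully trusted

  • "u" : ultimately trusted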


The following uploads my key "1DD8D791" to the popular keyserver "hkp://keys.gnupg.net".

$ gpg --keyserver hkp://keys.gnupg.net --send-keys 1DD8D791

A good default keyserver set up in "~/.gnupg/gpg.conf" (or old location "~/.gnupg/options") contains the following.

keyserver hkp://keys.gnupg.net

The following obtains unknown keys from the keyserver.

$ gpg --list-sigs --with-colons | grep '^sig.*\[User ID not found\]' |\
          cut -d ':' -f 5 | sort | uniq | xargs gpg --recv-keys

There was a bug in the OpenPGP Public Key Server (pre version 0.9.6) which corrupted keys with more than 2 subkeys. Newer gnupg (>1.2.1-2) packages can handle these corrupted subkeys. See gpg(1) under the "--repair-pks-subkey-bug" option.

md5sum(1) provides a utility to make a digest file using the method described in RFC 1321 and to verify each file with it.

$ md5sum foo bar >baz.md5
$ cat baz.md5
d3b07384d113edec49eaa6238ad5ff00  foo
c157a79031e1c40f85931829bc5fc552  bar
$ md5sum -c baz.md5
foo: OK
bar: OK

[Note] Note

The computation for the MD5 sum is less CPU intensive than that for the cryptographic signature by GNU Privacy Guard (GnuPG). Usually, only the top-level digest file is cryptographically signed to ensure data integrity.
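
For example, the top-level digest file created above can be signed as follows (a minimal sketch; this produces the clear-signed file "baz.md5.asc"):

$ gpg --clearsign baz.md5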

There are many merge tools for source code. The following commands caught my eye.

Table 10.10. List of source code merge tools

package popcon size command description
patch V:97, I:700 248 patch(1) apply a diff file to an original
vim V:95, I:369 3743 vimdiff(1) compare 2 files side by side in vim
imediff V:0, I:0 200 imediff(1) interactive full screen 2/3-way merge tool
meld V:7, I:30 3536 meld(1) compare and merge files (GTK)
wiggle V:0, I:0 175 wiggle(1) apply rejected patches
diffutils V:862, I:996 1735 diff(1) compare files line by line
diffutils V:862, I:996 1735 diff3(1) compare and merge three files line by line
quilt V:2, I:22 871 quilt(1) manage series of patches
wdiff V:7, I:51 648 wdiff(1) display word differences between text files
diffstat V:13, I:121 74 diffstat(1) produce a histogram of changes by the diff
patchutils V:16, I:119 232 combinediff(1) create a cumulative patch from two incremental patches
patchutils V:16, I:119 232 dehtmldiff(1) extract a diff from an HTML page
patchutils V:16, I:119 232 filterdiff(1) extract or exclude diffs from a diff file
patchutils V:16, I:119 232 fixcvsdiff(1) fix diff files created by CVS that patch(1) mis-interprets
patchutils V:16, I:119 232 flipdiff(1) exchange the order of two patches
patchutils V:16, I:119 232 grepdiff(1) show which files are modified by a patch matching a regex
patchutils V:16, I:119 232 interdiff(1) show differences between two unified diff files
patchutils V:16, I:119 232 lsdiff(1) show which files are modified by a patch
patchutils V:16, I:119 232 recountdiff(1) recompute counts and offsets in unified context diffs
patchutils V:16, I:119 232 rediff(1) fix offsets and counts of a hand-edited diff
patchutils V:16, I:119 232 splitdiff(1) separate out incremental patches
patchutils V:16, I:119 232 unwrapdiff(1) demangle patches that have been word-wrapped
dirdiff V:0, I:1 167 dirdiff(1) display differences and merge changes between directory trees
docdiff V:0, I:0 553 docdiff(1) compare two files word by word / char by char
makepatch V:0, I:0 100 makepatch(1) generate extended patch files
makepatch V:0, I:0 100 applypatch(1) apply extended patch files

Git is the tool of choice these days for version control systems (VCS), since it can do everything for both local and remote source code management.

Debian provides free Git services via Debian Salsa service. Its documentation can be found at https://wiki.debian.org/Salsa .

Here are some Git related packages.


You may wish to set several global configuration options in "~/.gitconfig", such as your name and email address used by Git, by the following.

$ git config --global user.name "Name Surname"
$ git config --global user.email [email protected]

You may also customize the Git default behavior by the following.

$ git config --global init.defaultBranch main
$ git config --global pull.rebase true
$ git config --global push.default current

If you are too used to CVS or Subversion commands, you may wish to set several command aliases by the following.

$ git config --global alias.ci "commit -a"
$ git config --global alias.co checkout

You can check your global configuration by the following.

$ git config --global --list

Git operations involve several data stores.

  • The working tree which holds the user-facing files and to which you make changes.

    • The changes to be recorded must be explicitly selected and staged to the index. This is done with the git add and git rm commands.

  • The index which holds staged files.

    • Staged files are committed to the local repository upon the subsequent request. This is done with the git commit command.

  • The local repository which holds committed files.

    • Git records the linked history of the committed data and organizes them as branches in the repository.

    • The local repository can send data to the remote repository with the git push command.

    • The local repository can receive data from the remote repository with the git fetch and git pull commands.

      • The git pull command performs git merge or git rebase after the git fetch command.

      • Here, git merge combines two separate branches of history into one at a merge point. (This is the default of git pull without customization and may be good for upstream people who publish a branch to many people.)

      • Here, git rebase creates a single branch of sequential history with the remote branch commits followed by the local branch commits. (This is the pull.rebase true customization case and may be good for the rest of us.)

  • The remote repository which holds committed files.

    • The communication to the remote repository uses secure communication protocols such as SSH or HTTPS.

The working tree consists of the files outside of the .git/ directory. The files inside of the .git/ directory hold the index, the local repository data, and some git configuration text files.
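
For example, a typical cycle through these data stores may look as follows (a minimal sketch; the repository URL is hypothetical):

$ git clone https://example.org/foo/bar.git  # copy the remote repository
$ cd bar
$ vim somefile                               # edit the working tree
$ git add somefile                           # stage the change to the index
$ git commit -m "describe the change"        # record it to the local repository
$ git push                                   # send it to the remote repository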

Here is an overview of the main Git commands.


Here are some Git tips.

Table 10.13. Git tips

Git command line function
gitk --all see complete Git history and operate on them such as resetting HEAD to another commit, cherry-picking patches, creating tags and branches ...
git stash get the clean working tree without losing data
git remote -v check settings for remote
git branch -vv check settings for branch
git status show working tree status
git config -l list git settings
git reset --hard HEAD; git clean -x -d -f revert all working tree changes and clean them up completely
git rm --cached filename revert staged index changed by git add filename
git reflog get reference log (useful for recovering commits from the removed branch)
git branch new_branch_name HEAD@{6} create a new branch from reflog information
git remote add new_remote URL add a new_remote remote repository pointed by URL
git remote rename origin upstream rename the remote repository name from origin to upstream
git branch -u upstream/branch_name set the remote tracking to the remote repository upstream and its branch name branch_name.
git remote set-url origin https://foo/bar.git change URL of origin
git remote set-url --push upstream DISABLED disable push to upstream (Edit .git/config to re-enable)
git remote update upstream fetch updates of all remote branches in the upstream repository
git fetch upstream foo:upstream-foo create a local (possibly orphan) upstream-foo branch as a copy of foo branch in the upstream repository
git checkout -b topic_branch ; git push -u origin topic_branch make a new topic_branch and push it to origin
git branch -m oldname newname rename local branch name
git push -d origin branch_to_be_removed remove remote branch (new method)
git push origin :branch_to_be_removed remove remote branch (old method)
git checkout --orphan unconnected create a new unconnected branch
git rebase -i origin/main reorder/drop/squash commits from origin/main to clean branch history
git reset --soft HEAD^; git commit --amend squash last 2 commits into one
git checkout main ; git merge --squash topic_branch squash entire topic_branch into a commit on the main branch
git fetch --unshallow --update-head-ok origin '+refs/heads/*:refs/heads/*' convert a shallow clone to the full clone of all branches
git ime split the last commit into a series of file-by-file smaller commits etc. (imediff package required)
git repack -a -d; git prune repack the local repository into single pack (this may limit chance of lost data recovery from erased branch etc.)

[Warning] Warning

Do not use a tag string with spaces in it even if some tools such as gitk(1) allow you to use it. It may choke some other git commands.

[Caution] Caution

If a local branch which has been pushed to the remote repository is rebased or squashed, pushing this branch has risks and requires the --force option. This is usually not acceptable for the main branch but may be acceptable for a topic branch before merging to the main branch.
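
If such a forced push cannot be avoided, "--force-with-lease" may be a safer choice than "--force", since it refuses to overwrite remote commits which have not been fetched locally (see git-push(1)):

$ git push --force-with-lease origin topic_branch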

[Caution] Caution

Invoking a git subcommand directly as "git-xyz" from the command line has been deprecated since early 2006.

[Tip] Tip

If there is an executable file git-foo in the path specified by $PATH, entering "git foo" without a hyphen on the command line invokes this git-foo. This is a feature of the git command.

See the following.



[4] A write-once media such as CD/DVD-R can prevent overwrite accidents. (See Section 9.8, “The binary data” for how to write to the storage media from the shell commandline. GNOME desktop GUI environment gives you easy access via menu: "Places→CD/DVD Creator".)

[5] Some of these data cannot be regenerated by entering the same input string to the system.

[6] If you use "~/.vimrc" instead of "~/.vim/vimrc", please substitute accordingly.