
An Advanced Git Tutorial: Lessons from a Real-World Versioning Crisis

code-versioning

Amit Choudhary

The Setup

I was working on a large content repository on Windows, and I needed to version some new work — campaign assets, workshop content, LinkedIn job descriptions, and some file deletions. Simple enough, right? What followed was a two-day journey through some of Git's more obscure corners.

Chapter 1: The Selective Commit

My working directory had ~18 groups of changes — modified files, deleted files, and new untracked folders. I didn't want to commit everything. Some were temp files, some were scratch scripts, some weren't ready.

The principle: Never use git add -A or git add . blindly. I listed everything, categorized it, and picked exactly what to stage.

git status # See the full picture

git add file1 file2 dir/ # Stage only what's needed

Why this matters: In a shared repo, accidentally committing temp files, credentials, or half-finished work creates noise and potential security risks. Selective staging is a discipline, not a preference.
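The listing-and-categorizing step can be sketched in shell. This is a minimal illustration, not the exact process used here: the function name categorize_status is hypothetical, it reads the short "XY path" lines printed by git status --porcelain, and it ignores details such as rename entries and quoted paths.

```shell
# Group `git status --porcelain` lines by kind so each group can be
# reviewed before anything is staged. (categorize_status is a hypothetical helper.)
categorize_status() {
  awk '
    { code = substr($0, 1, 2); path = substr($0, 4) }
    code == "??" { print "untracked\t" path; next }
    code ~ /D/   { print "deleted\t" path; next }
                 { print "changed\t" path }
  '
}

# Usage inside a repository:
#   git status --porcelain | categorize_status
```

Reviewing the grouped list first makes it easier to pass only the intended paths to git add.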

Chapter 2: The Lock File Ambush

My first git add failed:

error: open("~$workshop-1-walkthrough.xlsx"): Permission denied

fatal: adding files failed

The ~$ prefix marks an Excel lock (owner) file: Excel creates it while the workbook is open and keeps it locked. Git tried to index it and couldn't.

The fix: Close Excel, then re-run the add. But there's a deeper lesson — these lock files should never be committed. A proper .gitignore would have:

~$*

Key insight: git add <directory>/ adds everything inside that directory recursively. If even one file in that tree is locked or problematic, the entire add fails. When adding directories, be aware of what's inside.
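One way to spot lock files before staging a directory is to search for the ~$ prefix. The sketch below assumes a POSIX find; the function name list_lock_files and the directory name in the usage comment are hypothetical.

```shell
# List Office lock files (names starting with "~$") under a directory,
# so they can be closed or ignored before running `git add <directory>/`.
# (list_lock_files is a hypothetical helper.)
list_lock_files() {
  find "$1" -type f -name '~$*'
}

# Usage (hypothetical directory name):
#   list_lock_files workshop-content/
```

If the command prints nothing, no lock file under that directory will trip up the add.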

Chapter 3: The Cross-Platform Filename Trap

After committing, I needed to sync with the remote. I ran:

git pull origin main

And got:

error: invalid path 'DBX*Enqurious Hackathon.png'

error: invalid path 'Enqurious * DBX Hackathon.png'

error: invalid path 'Production GenAI — CI:CD, Memory...png'

Merge with strategy ort failed.

What happened: Someone (likely on macOS) had pushed files with * and : in their names. macOS and Linux allow these characters. Windows does not: it reserves them in file names. Git on Windows refuses such paths ("invalid path") when the merge strategy (ort) tries to bring them into the working tree, so the entire merge aborts.

Critical detail: This isn't just a checkout problem. Both git pull (merge) and git rebase failed because both need to manipulate the working tree:

git pull origin main # Failed — can't create files

git stash && git rebase origin/main # Also failed — same reason

Even rebase, which I tried in the hope that it would behave differently, failed while detaching HEAD, because it first has to check out the remote's state.
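After a fetch, the remote's file list can be screened for names Windows will reject before attempting a merge. This is a minimal sketch, assuming the reserved characters * : " < > | ? \ plus trailing spaces and dots. The function name find_windows_invalid_paths is hypothetical, and the pattern does not cover reserved device names such as CON or NUL.

```shell
# Print every path read from stdin that contains a character Windows reserves,
# or a path component ending in a space or a dot.
# (find_windows_invalid_paths is a hypothetical helper.)
find_windows_invalid_paths() {
  grep -E '[*:"<>|?\\]|[ .](/|$)'
}

# Usage after `git fetch origin main`:
#   git -c core.quotePath=false ls-tree -r --name-only origin/main | find_windows_invalid_paths
```

Because git ls-tree only reads Git's object database, this check works even when the files themselves cannot be checked out.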

Chapter 4: The Sparse-Checkout Detour

I tried using sparse-checkout to exclude the problematic directories entirely:

git sparse-checkout init --cone

git sparse-checkout set '/*' '!gen-ai/gen-ai-hackahton/...'

This failed too. Cone mode (the modern, performant sparse-checkout mode) doesn't support negation patterns (!). It only works with directory inclusion — you tell it which directories you want, not which to exclude. For a repo with dozens of top-level directories, this was impractical.

I could have switched to non-cone mode (the older pattern-based mode), which does support negation:

git sparse-checkout init --no-cone

# Then edit .git/info/sparse-checkout with patterns

But at this point, the approach was getting complex and fragile. I abandoned it.

Lesson: Sparse-checkout is powerful for working with monorepos where you genuinely only need a subset of the code. It's not the right tool for "skip 3 problematic files during a merge." It changes how your entire working tree behaves and can create confusing states if not managed carefully.
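For reference, the non-cone pattern file uses gitignore-style lines: /* includes everything, and a leading ! excludes a path. The sketch below only generates such a file. The function name write_sparse_patterns and the directory names are hypothetical, and nothing here establishes that this approach would have avoided the invalid-path errors.

```shell
# Write non-cone sparse-checkout patterns: include everything,
# then exclude each directory passed as an argument.
# (write_sparse_patterns is a hypothetical helper.)
write_sparse_patterns() {
  pattern_file=$1
  shift
  {
    printf '/*\n'
    for dir in "$@"; do
      printf '!/%s/\n' "$dir"
    done
  } > "$pattern_file"
}

# Usage (hypothetical directory names), after `git sparse-checkout init --no-cone`:
#   write_sparse_patterns .git/info/sparse-checkout drafts assets/raw
#   git sparse-checkout reapply
```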

Chapter 5: The Option We Had (But Didn't Need)

There was always a nuclear option available:

git -c core.protectNTFS=false pull origin main

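core.protectNTFS is a Git safety feature, enabled by default, that refuses paths that would cause problems on NTFS, including names that are invalid on Windows. Passing it with -c turns the check off for that single command only, which tells Git: "Go ahead with the merge and record everything correctly in the object database, even if some files cannot be created on disk."

History stays intact; the affected files simply won't exist in your working directory. There are tradeoffs, though. The missing files show up as deleted in git status, so a careless git add -A would stage their removal. The check is also a security guard: on NTFS, a : in a name can address an alternate data stream rather than a plain file. Understandably, temporarily disabling a safety feature made us cautious.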

Chapter 6: The Clean Resolution

The real fix was the simplest: the teammate who pushed those files renamed them (removing * and : characters). After that:

git fetch origin main # Download remote state (safe, changes nothing locally)

git pull origin main # Merge — now works cleanly

git push origin main # Push our local commits

Why fetch before pull? git fetch never touches your working tree or local branches: it downloads objects and updates only the remote-tracking refs (such as origin/main). I used it throughout this process to inspect what the remote had without risking any local state changes. git pull is essentially git fetch followed by git merge (or git rebase, if configured that way).

Key Takeaways

1. git fetch is your best friend

When things are uncertain, fetch first, inspect, then decide. Fetching never modifies your working tree or local branches, so it is safe to run at any time.

git fetch origin main

git log --oneline HEAD..origin/main # What do they have that I don't?

git log --oneline origin/main..HEAD # What do I have that they don't?
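The two log commands above can be condensed into a pair of counts with git rev-list --left-right --count, which prints the number of local-only commits followed by the number of remote-only commits. The helper below is a small sketch that turns those two numbers into a sentence; the name describe_divergence is hypothetical.

```shell
# Describe the relationship between two branches from the two counts printed by
# `git rev-list --left-right --count HEAD...origin/main` (local-only, remote-only).
# (describe_divergence is a hypothetical helper.)
describe_divergence() {
  ahead=$1
  behind=$2
  if [ "$ahead" -gt 0 ] && [ "$behind" -gt 0 ]; then
    echo "diverged: $ahead local-only, $behind remote-only"
  elif [ "$ahead" -gt 0 ]; then
    echo "ahead by $ahead"
  elif [ "$behind" -gt 0 ]; then
    echo "behind by $behind"
  else
    echo "up to date"
  fi
}

# Usage inside a repository, after a fetch:
#   describe_divergence $(git rev-list --left-right --count HEAD...origin/main)
```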

2. Diverged branches are normal

"Your branch and remote have diverged" sounds scary but it's just: you have commits they don't, and they have commits you don't. A merge or rebase brings them together.

3. Cross-platform repos need filename discipline

If your team uses Windows, macOS, and Linux, establish naming rules. Avoid: * : " < > | ? \ and trailing spaces/dots. Add a pre-commit hook to enforce this:

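#!/bin/sh

# .githooks/pre-commit (enable with: git config core.hooksPath .githooks)

# Check only added, copied, or renamed paths, so deleting a badly named file is still allowed.

if git -c core.quotePath=false diff --cached --name-only --diff-filter=ACR | grep -E '[*:"<>|?\\]|[ .](/|$)'; then

echo "Error: Filenames contain characters invalid on Windows" >&2

exit 1

fi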

4. Know your escape hatches (and their tradeoffs)

Situation	Tool	Risk Level
Inspect remote	git fetch	None
Save work temporarily	git stash	Low
Skip NTFS checks	core.protectNTFS=false	Low to medium (history safe, files missing locally, safety check disabled)
Exclude directories	sparse-checkout	Medium (changes working tree behavior)
Overwrite remote	git push --force	High (destroys others' work)

5. Fix root causes, not symptoms

We could have hacked around the filename issue with config flags and sparse-checkout. But the correct fix was renaming the files. Every workaround adds complexity and potential for future confusion. When possible, fix the actual problem.
