Win32 path canonicalization refactoring #4852

Merged
ethomson merged 5 commits into master from ethomson/unc_paths
Oct 20, 2018

@ethomson
Member

PR #4825 fixes obviously incorrect behavior in our path canonicalization, taking namespace-formatted paths and formatting them into something that an end-user might want to see.

Reviewing that, I noticed that there was some rather confusing naming going on that could be clarified. For example, we had both a git_win32__canonicalize_path function and a git_win32_path_canonicalize function? Ouch.

I took #4825 and added some refactoring on top of that:

  • I added some additional unit tests. I wanted to be able to walk through the function and understand it again.
  • I renamed git_win32__canonicalize_path to git_win32_path_remove_namespace to reflect better what it actually does. (It removes the \\?\, \??\ or \\?\UNC\ prefixes.)
  • I also updated git_win32_path_remove_namespace to distinguish the two kinds of prefix: what gets removed is now called a "namespace", and the UNC prefix that may be added back in its place (\\) is called a "prefix".
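The prefix handling described in the list above can be sketched roughly as follows. This is a narrow-character illustration only, not the libgit2 implementation: the name `remove_namespace` and the use of `char` instead of wide characters are assumptions made for the example, and it handles just the three namespaces named in the list.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: strip a leading namespace from `path` in place
 * and return the new length. A UNC namespace is replaced by the plain
 * "\\" prefix so the result is still recognizable as a UNC path.
 */
static size_t remove_namespace(char *path)
{
	static const char dos_ns[] = "\\\\?\\";       /* the \\?\ namespace */
	static const char nt_ns[] = "\\??\\";         /* the \??\ namespace */
	static const char unc_ns[] = "\\\\?\\UNC\\";  /* the \\?\UNC\ namespace */
	static const char unc_prefix[] = "\\\\";      /* prefix added back for UNC */

	const char *rest = path;
	const char *prefix = "";
	size_t prefix_len, rest_len;

	/* Test the UNC namespace first, since it starts with the \\?\ one. */
	if (strncmp(path, unc_ns, sizeof(unc_ns) - 1) == 0) {
		rest = path + (sizeof(unc_ns) - 1);
		prefix = unc_prefix;
	} else if (strncmp(path, dos_ns, sizeof(dos_ns) - 1) == 0) {
		rest = path + (sizeof(dos_ns) - 1);
	} else if (strncmp(path, nt_ns, sizeof(nt_ns) - 1) == 0) {
		rest = path + (sizeof(nt_ns) - 1);
	}

	prefix_len = strlen(prefix);
	rest_len = strlen(rest);

	/* The result is never longer than the input, so it fits in place. */
	memmove(path + prefix_len, rest, rest_len + 1);
	memcpy(path, prefix, prefix_len);

	return prefix_len + rest_len;
}
```

Paths without a recognized namespace pass through unchanged, which keeps the helper safe to call on anything that comes back from the filesystem.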

I opened a PR for visibility, but I want to merge this quickly to alleviate the pain that this bug is causing.

Contributor

@tiennou left a comment

Just a stylistic point (which could be ignored IMHO). The rename sure makes sense, so I'm 👍.

I'm left with a confusing urge to learn more about those strange Windows pathnames. Or flee, I'm not sure which 😜.

break;

/* Don't trim backslashes from drive letter paths, which
* are 3 characters long and of the form C:\, D:\, etc. */
Contributor

Minor: given the previous hunks, you might want to indent 😉.

Member Author

Thanks - Visual Studio is not great at formatting pasted hunks. Fixed.

The internal API `git_win32__canonicalize_path` is far, far too easily
confused with the internal API `git_win32_path_canonicalize`.  The
former removes the namespace prefix from a path (eg, given
`\\?\C:\Temp\foo`, it returns `C:\Temp\foo`, and given
`\\?\UNC\server\share`, it returns `\\server\share`).  As such, rename
it to `git_win32_path_remove_namespace`.

`git_win32_path_canonicalize` remains unchanged.

Update `git_win32_path_remove_namespace` to disambiguate the prefix
being removed versus the prefix being added.  Now we remove the
"namespace", and (may) add a "prefix" in its place.  Eg, we remove the
`\\?\` namespace.  We remove the `\\?\UNC\` namespace, and replace it
with the `\\` prefix.  This aids readability somewhat.

Additionally, use pointer arithmetic instead of offsets, which seems to
also help readability.

@ethomson
Member Author

> I'm left with a confusing urge to learn more about those strange Windows pathnames. Or flee, I'm not sure which 😜.

They're a bit odd at first, but they're not actually that bad.

The paths you typically see (eg, C:\Foo\Bar) are a convenient fiction that keeps backward compatibility with DOS paths, but in fact belie a more modern filesystem.

We moved over to using the namespaced paths explicitly, converting "standard" paths like C:\Foo\bar into \\?\C:\Foo\bar, back when we were addressing CVE-2014-9390. As part of its DOS path backward compatibility, Windows performs a number of translation steps. DOS didn't allow filenames to end in a dot or a space, so it silently ate those characters if you tried to create such a file. So you could ask to create .git. or ".git " (with a trailing space), and it would happily write .git instead.
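To make the trailing dot and space problem concrete, here is a minimal sketch of the kind of check it implies. The helper names `dos_trimmed_len` and `resolves_to_dotgit` are hypothetical, and the trimming rule is a simplification of what the compatibility layer does (it ignores the special cases "." and "..", for example).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: length of a path component once trailing dots
 * and spaces are dropped, imitating the legacy normalization.
 */
static size_t dos_trimmed_len(const char *name)
{
	size_t len = strlen(name);

	while (len > 0 && (name[len - 1] == '.' || name[len - 1] == ' '))
		len--;

	return len;
}

/* Returns 1 if the component would land on ".git" after that trimming. */
static int resolves_to_dotgit(const char *name)
{
	static const char dotgit[] = ".git";
	size_t len = dos_trimmed_len(name);
	size_t i;

	if (len != sizeof(dotgit) - 1)
		return 0;

	/* Compare ignoring ASCII case; the filesystem is usually case-insensitive. */
	for (i = 0; i < len; i++) {
		if (tolower((unsigned char)name[i]) != dotgit[i])
			return 0;
	}

	return 1;
}
```

With this, `resolves_to_dotgit(".git.")` and `resolves_to_dotgit(".git ")` both report a match even though neither string is literally ".git".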

Similarly, when Windows introduced "long filenames" (ie, not 8.3) it supported a mapping from long filename to an 8.3 name for people stuck on old versions of DOS. So if you create longfilename.txt, Windows will create a mapping on-disk called LONGFI~1.TXT. Yes, still today, even on an NTFS volume (do a dir /x), even though that volume can't be read by any device that only understands short filenames. That's some intense backcompat. And it's troublesome for us, because .git will be given a shortname of GIT~1, which you could use to sneak files into .git.
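A check for that short name could look something like the sketch below. The function name `is_dotgit_shortname` is hypothetical, and the code assumes the short name really is GIT~1; the filesystem could assign a different number if another entry already holds that name, so this is an illustration rather than a complete defense.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if `name` is the assumed 8.3 short
 * name for ".git" ("GIT~1"), ignoring ASCII case.
 */
static int is_dotgit_shortname(const char *name)
{
	static const char shortname[] = "git~1";
	size_t i;

	if (strlen(name) != sizeof(shortname) - 1)
		return 0;

	for (i = 0; i < sizeof(shortname) - 1; i++) {
		if (tolower((unsigned char)name[i]) != shortname[i])
			return 0;
	}

	return 1;
}
```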

Anyway. The easiest way for us to feel confident avoiding these attacks is to skip the DOS compatibility layer entirely. We can do this by using namespaced paths. If we open \\?\C:\Foo\bar, the Win32 file APIs (eg, CreateFile) will see that path and hand us over to the NT APIs (eg, NtCreateFile) without doing any manipulation to remove trailing dots or spaces or short path conversion. (However, we're likely to get back those paths - if we do a directory listing, for example - so we need to do a two way conversion ourselves.)
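The "to namespaced" half of that two way conversion might be sketched as below. The function `add_namespace` is a hypothetical name, it works on narrow characters for brevity, and it assumes the input is already absolute and uses backslashes, since nothing will normalize the path once it carries the namespace.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: write the namespaced form of an absolute path
 * into `out`. Returns 0 on success, or -1 if the path is not a form
 * this sketch handles or `out` is too small.
 */
static int add_namespace(char *out, size_t out_size, const char *path)
{
	int n;

	if (strncmp(path, "\\\\.\\", 4) == 0) {
		/* Device paths (\\.\...) are out of scope for this sketch. */
		return -1;
	} else if (strncmp(path, "\\\\?\\", 4) == 0) {
		/* Already namespaced: copy as-is. */
		n = snprintf(out, out_size, "%s", path);
	} else if (strncmp(path, "\\\\", 2) == 0) {
		/* UNC path: \\server\share becomes \\?\UNC\server\share */
		n = snprintf(out, out_size, "\\\\?\\UNC\\%s", path + 2);
	} else if (isalpha((unsigned char)path[0]) &&
	           path[1] == ':' && path[2] == '\\') {
		/* Drive path: C:\Foo becomes \\?\C:\Foo */
		n = snprintf(out, out_size, "\\\\?\\%s", path);
	} else {
		/* Relative or otherwise unrecognized path. */
		return -1;
	}

	/* snprintf reports the length it wanted; treat truncation as failure. */
	return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Rejecting relative paths instead of guessing at them is deliberate here: a namespaced path is passed through untouched, so "." and ".." components would no longer be resolved for us.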

As best as I understand from authoritative sources, this is the only way to avoid this path manipulation. There's good news here, though: it also removes the 260-character MAX_PATH limitation. We have some places where we hardcode path lengths (for performance in UTF-8 <-> UTF-16 conversion), but this opened the door to supporting long paths for us.

We also examine \??\ as well as \\?\. I don't see the documentation mentioning this, and I've lost a bit of context due to rusty memory, but I'm pretty sure that either this was once documented or we simply saw paths coming back in that format from (say) enumerating directory contents or resolving symbolic links or junction points.

Note that there is still a translation layer, to get directly to the devices. Although it's uncommon, you can actually mount devices throughout the path space in NT. Recall that in DOS, you had drives named with letters, eg A:, B:, C:... That fiction persists in NT, where you might have a C: drive, and a second hard disk mounted at D:. If you want (and unless you have good reason, you probably shouldn't) you could mount your first hard disk as C: and then mount a second hard disk (or network volume, etc) as C:\Foo.

So ultimately, all your paths get translated into proc-like filesystem semantics. You could actually open \\.\Physicaldisk0\... or something like that. I only know that this sort of exists; I haven't spent any time learning about it. This is the point at which I decided to flee. 😀

3 participants