Filenames in xHTML output

Pavel Sanda sanda at lyx.org
Wed Jul 29 16:20:55 UTC 2020


On Sun, Jul 19, 2020 at 09:22:33PM -0400, Richard Kimberly Heck wrote:
> > when we export xHTML output, the exported images are in the form
> > "incrementnumber_path_to_the_image_imagefilename.png".
> >
> > I'm struggling with two issues:
> >
> > - the initial number is unstable - e.g. if you insert a new figure in the
> >   document, all subsequent ones suddenly get +1. After several exports you
> >   get a bunch of obsolete files which you need to delete manually after
> >   each update.
> >
> > - the filenames tend to unnecessarily disclose directory structures (from
> >   what I see, the full path is used, not just the relative one).
> >
> > Is there some shortcoming if the filenames were hashes of the pictures
> > (or filename+hash, so one can still make sense of the files)?
> > It would help with both problems.
> 
> I used mangled names just because it was relatively easy to do. We can
> change it to whatever we want, I suppose.

Attached is a patch that mangles graphic filenames into hashes (SHA-256 of the
absolute path, filename included, with the extension stripped).
It omits the counter prefix, as I could not figure out why we use a counter at all.

Is there a case in which xHTML meaningfully exports two pictures with the same
path+name but a different counter? (I checked the same file at two different
sizes, but that still exports as a single file.)

This should make the exported names stable and stop them from disclosing
absolute paths. The price is long, unreadable filenames (we could add the
original filename), e.g.
0d21b378304bfb8c834763b634ed37377e85f063bb1736138df93f3ee207c14c.jpg
and an astronomically small probability that two different files will
end up with the same hash.
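
To make the filename+hash variant concrete, here is a minimal standalone sketch of what such names could look like. It is not part of the attached patch: `fnv1aHex` and `stableExportName` are hypothetical helpers, and a 64-bit FNV-1a hash stands in for SHA-256 only so that the example needs nothing beyond the standard library. FNV-1a is not cryptographically strong and has a far higher collision probability, so real code would keep `QCryptographicHash` as in the patch. Unlike the patch, the sketch hashes the full absolute path with its extension and assumes `/` as the path separator.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a over the bytes of `text`, rendered as 16 lowercase hex digits.
// Stand-in for the SHA-256 call in the patch; NOT cryptographically strong.
std::string fnv1aHex(std::string const & text)
{
	std::uint64_t h = 0xcbf29ce484222325ULL; // FNV-1a offset basis
	for (unsigned char c : text) {
		h ^= c;
		h *= 0x100000001b3ULL; // FNV-1a prime
	}
	static char const digits[] = "0123456789abcdef";
	std::string hex(16, '0');
	for (int i = 15; i >= 0; --i) {
		hex[i] = digits[h & 0xf];
		h >>= 4;
	}
	return hex;
}

// Hypothetical helper: builds "<stem>_<hash of absolute path><ext>".
// The name depends only on the path, so it stays the same across exports
// and reveals the original filename but not its directory.
std::string stableExportName(std::string const & abs_path)
{
	std::string::size_type const slash = abs_path.find_last_of('/');
	std::string const base =
		(slash == std::string::npos) ? abs_path : abs_path.substr(slash + 1);

	// A leading dot (hidden file) is not treated as an extension.
	std::string::size_type const dot = base.find_last_of('.');
	bool const has_ext = dot != std::string::npos && dot != 0;
	std::string const stem = has_ext ? base.substr(0, dot) : base;
	std::string const ext = has_ext ? base.substr(dot) : std::string();

	return stem + "_" + fnv1aHex(abs_path) + ext;
}
```

Calling `stableExportName("/home/user/thesis/figs/plot.png")` would give a name of the form `plot_<16 hex digits>.png`. Inserting another figure earlier in the document does not change it, since no counter is involved.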

We could have this by default, have it as a pref, or I can just
add this to my private patchset.

Opinions?

Pavel
-------------- next part --------------
diff --git a/src/insets/InsetGraphics.cpp b/src/insets/InsetGraphics.cpp
index b4ddd77a1a..62efa84dc0 100644
--- a/src/insets/InsetGraphics.cpp
+++ b/src/insets/InsetGraphics.cpp
@@ -575,7 +575,7 @@ copyToDirIfNeeded(DocFileName const & file, string const & dir)
 	if (rtrim(only_path, "/") == rtrim(dir, "/"))
 		return make_pair(IDENTICAL_PATHS, FileName(file_in));
 
-	string mangled = file.mangledFileName();
+	string mangled = file.mangledFileName(empty_string(), false, true);
 	if (theFormats().isZippedFile(file)) {
 		// We need to change _eps.gz to .eps.gz. The mangled name is
 		// still unique because of the counter in mangledFileName().
diff --git a/src/support/FileName.cpp b/src/support/FileName.cpp
index 179fef46ee..307406e124 100644
--- a/src/support/FileName.cpp
+++ b/src/support/FileName.cpp
@@ -22,6 +22,7 @@
 #include "support/Package.h"
 #include "support/qstring_helpers.h"
 
+#include <QCryptographicHash>
 #include <QDateTime>
 #include <QDir>
 #include <QFile>
@@ -953,9 +954,13 @@ string DocFileName::outputFileName(string const & path) const
 	return save_abs_path_ ? absFileName() : relFileName(path);
 }
 
-
 string DocFileName::mangledFileName(string const & dir) const
 {
+	return mangledFileName(dir, true, false);
+}
+
+string DocFileName::mangledFileName(string const & dir, bool use_counter, bool encrypt_path) const
+{
 	// Concurrent access to these variables is possible.
 
 	// We need to make sure that every DocFileName instance for a given
@@ -970,8 +975,16 @@ string DocFileName::mangledFileName(string const & dir) const
 		return (*it).second;
 
 	string const name = absFileName();
+
 	// Now the real work. Remove the extension.
 	string mname = support::changeExtension(name, string());
+	if (encrypt_path) {
+		QString qname = QString::fromStdString(mname);
+		QByteArray hash = QCryptographicHash::hash(qname.toLocal8Bit(), QCryptographicHash::Sha256);
+		hash = hash.toHex();
+		mname = hash.toStdString();
+	}
+
 	// The mangled name must be a valid LaTeX name.
 	// The list of characters to keep is probably over-restrictive,
 	// but it is not really a problem.
@@ -991,9 +1004,12 @@ string DocFileName::mangledFileName(string const & dir) const
 	// Prepend a counter to the filename. This is necessary to make
 	// the mangled name unique.
 	static int counter = 0;
-	ostringstream s;
-	s << counter++ << mname;
-	mname = s.str();
+
+	if (use_counter) {
+		ostringstream s;
+		s << counter++ << mname;
+		mname = s.str();
+	}
 
 	// MiKTeX's YAP (version 2.4.1803) crashes if the file name
 	// is longer than about 160 characters. MiKTeX's pdflatex
diff --git a/src/support/FileName.h b/src/support/FileName.h
index ac351c2386..9920168140 100644
--- a/src/support/FileName.h
+++ b/src/support/FileName.h
@@ -290,6 +290,8 @@ public:
 	 */
 	std::string
 	mangledFileName(std::string const & dir = empty_string()) const;
+	std::string
+	mangledFileName(std::string const & dir, bool use_counter, bool encrypt_path) const;
 
 	/// \return the absolute file name without its .gz, .z, .Z extension
 	std::string unzippedFileName() const;

