GIS - TryCreateForInput

Dani_S 4,836 Reputation points
2025-12-11T17:53:15.8566667+00:00

Hi,

1.I used GIS API .

2.There are 15 formats:

EsriJson GeoJson GeoJsonSeq Kml/Kmz Shapefile OsmXml Gpx Gml FileGdb TopoJson MapInfoInterchange MapInfoTab Csv GeoPackage.

3.This is my main function: TryCreateForInput that gisInputFilePath-Path to a single GIS file or archive to inspect and returns IConverterFactory + detectReason-uman friendly reason describing how the converter was selected (useful for logging).

Since it critical in my app and difficult to hanle all cases , please can you look on the code

if it a good practice. If not can you make the fixes, the class not depend on other classes.

4.The summary of main function:

    /// <summary>

    /// Inspect the input path (file or archive) and attempt to resolve a converter from the factory.

    /// </summary>

    /// <param name="factory">Factory used to resolve a converter key to an <see cref="IConverter"/> instance.</param>

    /// <param name="gisInputFilePath">Path to a single GIS file or archive to inspect.</param>

    /// <param name="converter">Out parameter populated with the resolved converter when the method returns true.</param>

    /// <param name="detectReason">Human friendly reason describing how the converter was selected (useful for logging).</param>

    /// <returns>True when a converter was resolved; false when detection failed or result was ambiguous.</returns>

    /// <remarks>

    /// Behaviour details

    /// - Returns false and sets <paramref name="detectReason"/> when:

    ///   - The input path is invalid or cannot be inspected.

    ///   - Archive inspection is ambiguous (for example tied JSON votes).

    ///   - No matching converter mapping or requirement rules are found.

    /// - Detection steps (high-level):

    ///   1. If the input looks like an archive (see <see cref="ConverterUtils.IsArchiveFile"/>):

    ///      a. Use <see cref="ConverterUtils.TryListArchiveEntries"/> to obtain entry names (no extraction).

    ///      b. Build a set of discovered extensions / markers and apply fast "wins" (explicit .geojson/.esrijson).

    ///      c. Apply the KMZ guard (outer .kmz or top-level doc.kml => Kmz).

    ///      d. If archive contains only generic .json entries, open each .json entry and perform bounded header reads

    ///         (see <c>ReadEntryHeadUtf8</c>) and classify via <c>ClassifyJsonHeader</c>; then apply majority voting.

    ///      e. Otherwise apply strict requirement matching against <see cref="_s_archiveRequirements"/>.

    ///   2. For single-file inputs:

    ///      a. Use explicit extension mapping for .geojson and .esrijson.

    ///      b. For generic .json files invoke <c>JsonFormatDetector.DetectFromFile</c> (if available) then fall back

    ///         to a bounded header read + <c>ClassifyJsonHeader</c>.

    ///      c. Map the detected JSON format to a converter key (GeoJson / EsriJson / GeoJsonSeq / TopoJson).

    ///

    /// Safety and IO

    /// - This method avoids extracting archive contents. When entry-stream reads are required they are bounded

    ///   to <c>HeaderReadLimit</c> bytes and performed via streaming to minimize memory usage.

    /// - Unexpected exceptions are caught; the method logs details and returns false with a detect reason describing the problem.

    /// </remarks>

4.Attached files:

ConverterFactoryInputExtensions.txt

JsonFormatDetector.txt

ConverterFactoryInputExtensionsTests.txt

Thanks in advance,

Developer technologies | C#
Developer technologies | C#
An object-oriented and type-safe programming language that has its roots in the C family of languages and includes support for component-oriented programming.
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Michael Le (WICLOUD CORPORATION) 6,420 Reputation points Microsoft External Staff Moderator
    2025-12-12T07:38:33.1533333+00:00

    Hello @Dani_S ,

    You could use this file ConverterFactoryInputExtensions_refactored.txt as a reference. Please read and make the changes as per your requirements.

    Your overall approach is solid. However, I would like to point out a few issues that could cause problems:

    • Your current ClassifyJsonHeader uses simple substring searches on truncated headers. This might break when JSON is minified or has unusual whitespace. Also, the logic also doesn’t check whether enough bytes have been read to make a reliable determination.
    • When voting on JSON entries in an archive, tied results simply return false. Since you’re already doing the work to read those entries, implementing a simple tiebreaker (such as alphabetical ordering) would make detection more robust.
    • You have format-to-extension mappings in both _s_extensionToConverter and _s_archiveRequirements. Maintaining two separate dictionaries increases the risk of inconsistencies over time.
    • Some paths log and return false, others catch and swallow exceptions, and some let errors bubble up. Choose one pattern and apply it consistently throughout.
    • The method doesn’t check whether the file actually exists before trying to process it. Also consider what happens with zero‑byte files, files without extensions, or symlinks.

    Suggested refactoring

    I recommend consolidating your format definitions into a single descriptor class that holds file extensions, archive requirements, and other metadata:

    
    private static readonly Dictionary<string, FormatDescriptor> _s_formats = new Dictionary<string, FormatDescriptor>(StringComparer.OrdinalIgnoreCase)
    {
        { "GeoJson", new FormatDescriptor("GeoJson", new[] { ".geojson" }, new[] { ".geojson" }) },
        { "EsriJson", new FormatDescriptor("EsriJson", new[] { ".esrijson" }, new[] { ".esrijson" }) },
        { "GeoJsonSeq", new FormatDescriptor("GeoJsonSeq", new[] { ".jsonl", ".ndjson" }, Array.Empty<string>()) },
        { "TopoJson", new FormatDescriptor("TopoJson", new[] { ".topojson" }, Array.Empty<string>()) },
        { "Kml", new FormatDescriptor("Kml", new[] { ".kml" }, new[] { ".kml" }) },
        { "Kmz", new FormatDescriptor("Kmz", new[] { ".kmz" }, new[] { ".kml" }) }, // archive requirement is inner .kml
        { "Shapefile", new FormatDescriptor("Shapefile", new[] { ".shp" }, new[] { ".shp", ".shx", ".dbf" }) },
        { "Osm", new FormatDescriptor("Osm", new[] { ".osm" }, new[] { ".osm" }) },
        { "Gpx", new FormatDescriptor("Gpx", new[] { ".gpx" }, new[] { ".gpx" }) },
        { "Gml", new FormatDescriptor("Gml", new[] { ".gml" }, new[] { ".gml" }) },
        { "Gdb", new FormatDescriptor("Gdb", new[] { ".gdb" }, new[] { ".gdb" }) },
        { "MapInfoInterchange", new FormatDescriptor("MapInfoInterchange", new[] { ".mif" }, new[] { ".mif" }) },
        { "MapInfoTab", new FormatDescriptor("MapInfoTab", new[] { ".tab", ".map", ".dat", ".id" }, new[] { ".tab", ".dat", ".map", ".id" }) },
        { "Csv", new FormatDescriptor("Csv", new[] { ".csv" }, new[] { ".csv" }) },
        { "GeoPackage", new FormatDescriptor("GeoPackage", new[] { ".gpkg" }, new[] { ".gpkg" }) },
    };
    
    private class FormatDescriptor
    {
        public string Name { get; }
        public string[] FileExtensions { get; } // extensions that identify this format for single files
        public string[] ArchiveRequirements { get; } // extensions that MUST be present in an archive
        public FormatDescriptor(string name, string[] fileExts, string[] archiveReqs)
        {
            Name = name;
            FileExtensions = fileExts ?? Array.Empty<string>();
            ArchiveRequirements = archiveReqs ?? Array.Empty<string>();
        }
    }
    

    Reduce your header read limit from 64 KB to something smaller, such as 8 KB. Your classification logic doesn’t need that much data, and you’ll save memory when processing multiple archive entries.

    Instead of returning false on tied votes, you could do something like this:

    var maxVotes = votes.Values.Max();
    var winners = votes.Where(kv => kv.Value == maxVotes)
                       .Select(kv => kv.Key)
                       .OrderBy(k => k)  // tiebreaker
                       .ToArray();
    
    var winner = winners.First();
    detectReason = winners.Length > 1 
        ? $"{winner} selected from tie ({string.Join(", ", winners)})"
        : $"{winner} won with {maxVotes} votes";
    

    Split the main method into TryDetectArchiveFormat and TryDetectSingleFileFormat to separate concerns and make unit testing easier.

    Testing recommendations

    You should add tests for:

    • Empty or zero‑byte files
    • Files without extensions
    • Archives with all entries of unknown JSON type
    • Archives with tied JSON votes (to verify the tiebreaker)
    • Corrupted or truncated JSON files
    • Minified JSON (no whitespace)

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.