GNOME Bugzilla – Bug 570143
Read first 8 bytes of a file to identify file type
Last modified: 2012-09-10 07:51:04 UTC
It would be better to read the start of the file (the first 8 bytes) to identify it instead of relying on the extension. Ubuntu/Nautilus is using the extension... :( That way any weirdly named files are accessed properly without having to rename them. (CD/RO)

Other information:

function is_filebinary(gzipfile : string) : boolean;
var
  tb : boolean;
  maxtoread, i : byte;
  b : array [1..10] of byte;
  gzfile : file of byte;
begin
  is_filebinary := false;  { result if the file cannot be opened }
  for i := 1 to 10 do
    b[i] := 0;
  tb := false;
  assign(gzfile, gzipfile);
  {$I-} reset(gzfile); {$I+}
  if ioresult <> 0 then
    exit;
  { read at most the first 10 bytes }
  if filesize(gzfile) >= 10 then
    maxtoread := 10
  else
    maxtoread := filesize(gzfile);
  for i := 1 to maxtoread do
    read(gzfile, b[i]);
  close(gzfile);
  { any byte outside printable ASCII (32..126), other than LF or CR, means binary }
  for i := 1 to maxtoread do
    if ((b[i] < 32) or (b[i] > 126)) and not (b[i] in [10, 13]) then
      tb := true;
  is_filebinary := tb;
end;
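For comparison, a minimal C port of the is_filebinary heuristic, split so the byte check can be tested on an in-memory buffer. The names buffer_is_binary and file_is_binary are hypothetical. As in the Pascal code, only the first 10 bytes are examined and a tab (byte 9) counts as binary; returning false when the file cannot be opened is an assumption made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Any byte outside printable ASCII (32..126), other than LF (10) or
   CR (13), marks the data as binary. Tab is treated as binary too. */
bool buffer_is_binary(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if ((c < 32 || c > 126) && c != '\n' && c != '\r')
            return true;
    }
    return false;
}

/* Reads at most the first 10 bytes of the file and applies the check.
   An unopenable file is reported as not binary (assumption). */
bool file_is_binary(const char *path)
{
    unsigned char buf[10];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return false;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return buffer_is_binary(buf, n);
}
```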
Created attachment 127738
ftype.c util

Compile with: gcc ftype.c -o ftype
This version only detects COM/EXE/ELF and ASCII text.
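The attachment's code isn't reproduced in this report. As a hypothetical sketch of the same idea, magic-number checks for ELF (0x7F 'E' 'L' 'F') and DOS/Windows EXE ("MZ") headers, with a printable-ASCII fallback, might look like this; the enum and function names are invented for illustration. COM files have no header signature, so this sketch does not try to detect them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical file kinds for this sketch; not taken from ftype.c. */
enum file_kind { KIND_UNKNOWN, KIND_ELF, KIND_DOS_EXE, KIND_TEXT };

/* Classifies the leading bytes of a file by magic number, then falls
   back to a plain-ASCII check. */
enum file_kind sniff_kind(const unsigned char *buf, size_t len)
{
    if (len == 0)
        return KIND_UNKNOWN;
    /* ELF executables and objects start with 0x7F 'E' 'L' 'F'. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return KIND_ELF;
    /* DOS/Windows EXE files start with the "MZ" signature. */
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return KIND_DOS_EXE;
    /* ASCII text: printable characters plus tab, LF and CR. */
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if ((c < 32 || c > 126) && c != '\t' && c != '\n' && c != '\r')
            return KIND_UNKNOWN;
    }
    return KIND_TEXT;
}
```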
We don't do this in general, as it has several problems:
a) It is extremely expensive: one extra hard disk seek per file.
b) It's uncontrollable by the user, i.e. if it gets things wrong there is nothing the user can do.

We do, however, sniff the file type in various cases, such as when the extension is unknown or is known to be used by multiple types. This is all defined in the freedesktop.org MIME type specification, and any special cases, conflicts, priorities, etc. are in the freedesktop.org MIME type database.
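A minimal sketch of that policy, assuming a tiny hypothetical extension table in place of the real freedesktop.org database (which also has glob weights and magic priorities): a known, unambiguous extension is trusted without touching the disk, and the first 8 bytes are read only when the extension is missing, unknown, or shared by several types. The table entries, the function names, and the choice of PNG and gzip signatures are assumptions for illustration; the display name and path are passed separately only to make that point visible.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical extension table. A NULL type marks an extension shared
   by several formats, so the content has to be sniffed. */
struct ext_rule {
    const char *ext;
    const char *mime;
};

static const struct ext_rule ext_rules[] = {
    { "png", "image/png" },
    { "txt", "text/plain" },
    { "gz",  "application/gzip" },
    { "dat", NULL },
};

/* Returns the type implied by the extension, or NULL if the extension
   is missing, unknown, or ambiguous. Matching is case-sensitive here. */
const char *mime_from_extension(const char *name)
{
    const char *dot = strrchr(name, '.');
    if (dot == NULL)
        return NULL;
    for (size_t i = 0; i < sizeof ext_rules / sizeof ext_rules[0]; i++)
        if (strcmp(dot + 1, ext_rules[i].ext) == 0)
            return ext_rules[i].mime;
    return NULL;
}

/* Checks the leading bytes against two magic numbers. */
const char *mime_from_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char png_sig[8] =
        { 0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n' };
    if (len >= 8 && memcmp(buf, png_sig, 8) == 0)
        return "image/png";
    if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
        return "application/gzip";
    return "application/octet-stream";
}

/* Trusts an unambiguous extension with no disk access; otherwise opens
   the file and sniffs its first 8 bytes. */
const char *guess_mime(const char *name, const char *path)
{
    const char *mime = mime_from_extension(name);
    if (mime != NULL)
        return mime;

    unsigned char head[8];
    size_t n = 0;
    FILE *f = fopen(path, "rb");
    if (f != NULL) {
        n = fread(head, 1, sizeof head, f);
        fclose(f);
    }
    return mime_from_magic(head, n);
}
```

Because guess_mime("photo.png", path) never opens the file, it returns "image/png" even when path does not exist; that skipped read is the per-file seek the comment above wants to avoid.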