extract_msg.utils Module

Module contents

extract_msg.utils.addNumToDir(dirName: Path) → Path | None[source]: Attempt to create the directory with a ‘(n)’ appended.

extract_msg.utils.addNumToZipDir(dirName: Path, _zip) → Path | None[source]: Attempt to create the directory with a ‘(n)’ appended.

extract_msg.utils.bitwiseAdjust(inp: int, mask: int) → int[source]

Uses a given mask to adjust the location of bits after an operation like bitwise AND.

This is useful for things like flags where you are trying to get a small portion of a larger number. Say for example, you had the number 0xED (0b11101101) and you needed the adjusted result of the AND operation with 0x70 (0b01110000). The result of the AND operation (0b01100000) and the mask used to get it (0x70) are given to this function and the adjustment will be done automatically.

Parameters:: mask – MUST be greater than 0.
Raises:: ValueError – The mask is not greater than 0.
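
The worked example above can be reproduced with a minimal sketch. It assumes the adjustment is a right shift by the number of trailing zero bits in the mask; bitwise_adjust_sketch is a hypothetical stand-in, not the library's own code.

```python
def bitwise_adjust_sketch(inp: int, mask: int) -> int:
    """Shift inp right so the mask's lowest set bit lands on bit 0."""
    if mask < 1:
        raise ValueError('mask MUST be greater than 0.')
    # (mask & -mask) isolates the lowest set bit of the mask; its
    # bit_length minus one is the number of trailing zero bits.
    shift = (mask & -mask).bit_length() - 1
    return inp >> shift


masked = 0xED & 0x70                            # 0b01100000
adjusted = bitwise_adjust_sketch(masked, 0x70)  # 0b110
```

With the mask 0x70 the shift is 4 bits, so the three flag bits end up in the lowest positions of the result.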

extract_msg.utils.bitwiseAdjustedAnd(inp: int, mask: int) → int[source]

Performs the bitwise AND operation between :param inp: and :param mask: and adjusts the results based on the rules of bitwiseAdjust().

Raises:: ValueError – The mask is not greater than 0.

extract_msg.utils.bytesToGuid(bytesInput: bytes) → str[source]: Converts a bytes instance to a GUID.
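
A minimal standard-library sketch of this kind of conversion. It assumes the 16 input bytes use the mixed-endian layout of Microsoft GUID structures (first three fields little-endian). The exact string format returned by bytesToGuid (letter case, braces) is not stated here, so the output format below is illustrative only, and bytes_to_guid_sketch is a hypothetical name.

```python
import uuid


def bytes_to_guid_sketch(bytes_input: bytes) -> str:
    """Interpret 16 bytes as a little-endian GUID and format it as text."""
    # bytes_le tells uuid.UUID that the first three fields are stored
    # little-endian, as they are in Microsoft GUID structures.
    return str(uuid.UUID(bytes_le=bytes_input))


raw = bytes.fromhex('00112233445566778899aabbccddeeff')
guid_text = bytes_to_guid_sketch(raw)
```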

extract_msg.utils.ceilDiv(n: int, d: int) → int[source]

Returns the int from the ceiling division of n / d.

ONLY use ints as inputs to this function.

For ints, this is faster and more accurate for numbers outside the precision range of float.
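
The precision point can be illustrated with a short sketch. It assumes ceiling division is done with integer floor division only; ceil_div_sketch is a hypothetical stand-in rather than the library's code.

```python
import math


def ceil_div_sketch(n: int, d: int) -> int:
    """Ceiling division that never leaves integer arithmetic."""
    # Floor division rounds toward negative infinity, so negating the
    # divisor and then the result rounds toward positive infinity.
    return -(n // -d)


big = 2**53 + 1                  # not exactly representable as a float
exact = ceil_div_sketch(big, 1)  # stays exact
via_float = math.ceil(big / 1)   # goes through float and loses the low bit
```

Here exact equals big, while via_float does not, because big / 1 is rounded to the nearest representable float before the ceiling is taken.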

extract_msg.utils.cloneOleFile(sourcePath, outputPath) → None[source]

Uses the OleWriter class to clone the specified OLE file into a new location.

Mainly designed for testing.

extract_msg.utils.createZipOpen(func) → Callable[source]: Creates a wrapper for the open function of a ZipFile that will automatically set the modified time to the current time.
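
One way such a wrapper could look, as a sketch under stated assumptions: it only intervenes when a member is opened for writing by name, and it swaps the name for a ZipInfo carrying the current local time. create_zip_open_sketch is a hypothetical name and this is not necessarily how the library does it.

```python
import datetime
import io
import zipfile


def create_zip_open_sketch(func):
    """Wrap a ZipFile.open-style callable so new members get the current time."""
    def _open(name, mode='r', *args, **kwargs):
        if mode == 'w' and isinstance(name, str):
            # A ZipInfo built without a date_time defaults to 1980-01-01,
            # so build one explicitly stamped with the current time.
            now = datetime.datetime.now().timetuple()[:6]
            name = zipfile.ZipInfo(name, now)
        return func(name, mode, *args, **kwargs)
    return _open


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zip_open = create_zip_open_sketch(zf.open)
    with zip_open('note.txt', 'w') as member:
        member.write(b'hello')
    stamped = zf.getinfo('note.txt').date_time
```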

extract_msg.utils.decodeRfc2047(encoded: str) → str[source]: Decodes text encoded using the method specified in RFC 2047.
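
For illustration, RFC 2047 encoded words can be decoded with the standard library alone. This sketch assumes the input is a header-style string and uses email.header; decode_rfc2047_sketch is a hypothetical name and the library's own handling of edge cases may differ.

```python
from email.header import decode_header, make_header


def decode_rfc2047_sketch(encoded: str) -> str:
    """Decode RFC 2047 encoded words such as '=?UTF-8?B?...?='."""
    # decode_header splits the text into (data, charset) pairs and
    # make_header joins them back into a single unicode header value.
    return str(make_header(decode_header(encoded)))


decoded_b64 = decode_rfc2047_sketch('=?UTF-8?B?SGVsbG8gV29ybGQ=?=')
decoded_qp = decode_rfc2047_sketch('=?utf-8?q?Caf=C3=A9?=')
```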

extract_msg.utils.dictGetCasedKey(_dict: Dict[str, Any], key: str) → str[source]: Retrieves the key from the dictionary with the proper casing using a caseless key.
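
A minimal sketch of a caseless key lookup. It assumes a linear scan over the keys and that a missing key is reported with KeyError (an assumption, since the behavior for missing keys is not stated here); dict_get_cased_key_sketch is a hypothetical name.

```python
from typing import Any, Dict


def dict_get_cased_key_sketch(_dict: Dict[str, Any], key: str) -> str:
    """Return the key as it is actually cased in the dictionary."""
    wanted = key.lower()
    for cased in _dict:
        if cased.lower() == wanted:
            return cased
    # Assumption: a key with no caseless match raises KeyError.
    raise KeyError(key)


headers = {'Content-Type': 'text/plain'}
found = dict_get_cased_key_sketch(headers, 'content-type')
value = headers[found]
```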

extract_msg.utils.divide(string: AnyStr, length: int) → List[source]

Divides a string into multiple substrings of equal length.

If there is not enough for the last substring to be equal, it will simply use the rest of the string. Can also be used for things like lists and tuples.

Parameters:

string – The string to be divided.
length – The length of each division.

Returns:

list containing the divided strings.

Example:

>>> a = divide('Hello World!', 2)
>>> print(a)
['He', 'll', 'o ', 'Wo', 'rl', 'd!']
>>> a = divide('Hello World!', 5)
>>> print(a)
['Hello', ' Worl', 'd!']
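
The same chunking applied to a list can be sketched with plain slicing. This assumes only that the input supports len() and slicing; divide_sketch is a hypothetical stand-in for illustration.

```python
def divide_sketch(seq, length: int) -> list:
    """Split any sliceable sequence into chunks of at most length items."""
    # Slicing past the end simply returns what is left, which gives the
    # shorter final chunk described above.
    return [seq[i:i + length] for i in range(0, len(seq), length)]


chunks = divide_sketch([1, 2, 3, 4, 5], 2)
words = divide_sketch('Hello World!', 5)
```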

extract_msg.utils.filetimeToDatetime(rawTime: int) → datetime[source]

Converts a filetime into a datetime.

Some values have specialized meanings, listed below:

915151392000000000: December 31, 4500, representing a null time. Returns an instance of extract_msg.null_date.NullDate.
915046235400000000: 23:59 on August 31, 4500, representing a null time. Returns extract_msg.constants.NULL_DATE.

extract_msg.utils.filetimeToUtc(inp: int) → float[source]: Converts a FILETIME into a unix timestamp.
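
A FILETIME counts 100-nanosecond intervals since January 1, 1601 (UTC), so the conversion is an offset and a scale. The sketch below shows that arithmetic; filetime_to_unix_sketch and the constant names are hypothetical, and any special-case handling in the library is not reproduced.

```python
# Number of 100 ns ticks between 1601-01-01 and 1970-01-01 (UTC).
EPOCH_DIFFERENCE = 116444736000000000
TICKS_PER_SECOND = 10_000_000


def filetime_to_unix_sketch(inp: int) -> float:
    """Convert a FILETIME tick count into seconds since the Unix epoch."""
    return (inp - EPOCH_DIFFERENCE) / TICKS_PER_SECOND


at_epoch = filetime_to_unix_sketch(EPOCH_DIFFERENCE)
one_second_later = filetime_to_unix_sketch(EPOCH_DIFFERENCE + TICKS_PER_SECOND)
```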

extract_msg.utils.findWk(path=None)[source]

Attempt to find the path of the wkhtmltopdf executable.

Parameters:: path – If provided, the function will verify that it is executable and returns the path if it is.
Raises:: ExecutableNotFound – A valid executable could not be found.

extract_msg.utils.fromTimeStamp(stamp: float) → datetime[source]: Returns a datetime from the UTC timestamp given the current timezone.

extract_msg.utils.getCommandArgs(args: Sequence[str]) → Namespace[source]

Parse command-line arguments.

Raises:

IncompatibleOptionsError – Some options were provided that are incompatible.
ValueError – Something about the options was invalid. This could mean an option was specified that requires another option or it could mean that an option was looking for data that was not found.

extract_msg.utils.guessEncoding(msg: MSGFile) → str | None[source]

Analyzes the strings on an MSG file and attempts to form a consensus about the encoding based on the top-level strings.

Returns None if no consensus could be formed.

Raises:: DependencyError – chardet is not installed or could not be used properly.

extract_msg.utils.htmlSanitize(inp: str) → str[source]

Sanitizes the input for injection into an HTML string.

Converts characters into forms that will not be misinterpreted, if necessary.
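
As a rough illustration of this kind of escaping, the sketch below uses html.escape from the standard library. It is an assumption that entity escaping is the core of the conversion; the library may convert additional characters (for example whitespace or line breaks) that this sketch does not. html_sanitize_sketch is a hypothetical name.

```python
import html


def html_sanitize_sketch(inp: str) -> str:
    """Escape characters that would otherwise be read as HTML markup."""
    # html.escape replaces &, <, > and, by default, both quote characters.
    return html.escape(inp)


safe = html_sanitize_sketch('<b>Tom & Jerry</b>')
```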

extract_msg.utils.inputToBytes(obj: bytes | None | str | SupportsBytes, encoding: str) → bytes[source]

Converts the input into bytes.

Raises:

ConversionError – The input cannot be converted.
UnicodeEncodeError – The input was a str but the encoding was not valid.
TypeError – The input has a __bytes__ method, but it failed.
ValueError – Same as above.

extract_msg.utils.inputToMsgPath(inp: str | List[str] | Tuple[str]) → List[str][source]

Converts the input into an MSG path.

Raises:: ValueError – The path contains an illegal character.

extract_msg.utils.inputToString(bytesInputVar: str | bytes | None, encoding: str) → str[source]

Converts the input into a string.

Raises:: ConversionError – The input cannot be converted.

extract_msg.utils.isEncapsulatedRtf(inp: bytes) → bool[source]

Checks if the RTF data has encapsulated HTML.

Currently the detection is extremely basic, but this will work for now. In the future this will be fixed so that literal text in the body of a message won’t cause false detection.
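
To show why a basic check can misfire on literal text, here is a sketch that assumes detection is a plain substring search for the \fromhtml control word, which RTF-encapsulated HTML carries in its header. is_encapsulated_rtf_sketch is a hypothetical name and the library's actual test may differ.

```python
def is_encapsulated_rtf_sketch(inp: bytes) -> bool:
    """Report whether the RTF data appears to encapsulate HTML."""
    # Assumption: the presence of the control word is the whole test.
    return b'\\fromhtml' in inp


rtf_html = rb'{\rtf1\ansi\fromhtml1 \deff0 ...}'
rtf_plain = rb'{\rtf1\ansi\deff0 plain text}'
# Body text that merely mentions the control word (with an escaped
# backslash) still matches, which is the false detection described above.
rtf_literal = rb'{\rtf1\ansi\deff0 type \\fromhtml here}'

results = [is_encapsulated_rtf_sketch(data)
           for data in (rtf_html, rtf_plain, rtf_literal)]
```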

extract_msg.utils.makeWeakRef(obj: _T | None) → weakref.ReferenceType[_T] | None[source]: Attempts to return a weak reference to the object, returning None if not possible.
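
A minimal sketch of the described behavior. It assumes that "not possible" covers both a None input and objects whose type does not support weak references (which raise TypeError in weakref.ref); make_weak_ref_sketch is a hypothetical name.

```python
import weakref
from typing import Optional, TypeVar

_T = TypeVar('_T')


def make_weak_ref_sketch(obj: Optional[_T]) -> 'Optional[weakref.ReferenceType[_T]]':
    """Return a weak reference to obj, or None when one cannot be made."""
    if obj is None:
        return None
    try:
        return weakref.ref(obj)
    except TypeError:
        # Assumption: types such as int, which cannot be weakly
        # referenced, also yield None.
        return None


class Node:
    """Plain class whose instances support weak references."""


node = Node()
ref = make_weak_ref_sketch(node)
```

Calling ref() returns the original object for as long as something else keeps it alive.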

extract_msg.utils.msgPathToString(inp: str | Iterable[str]) → str[source]: Converts an MSG path (one of the internal paths inside an MSG file) into a string.

extract_msg.utils.parseType(_type: int, stream: int | bytes, encoding: str, extras: Sequence[bytes])[source]

Converts the data in :param stream: to a much more accurate type, specified by :param _type:.

Parameters:

_type – The data’s type.
stream – The data to be converted.
encoding – The encoding to be used for regular strings.
extras – Used in the case of types like PtypMultipleString. For that example, extras should be a list of the bytes from the rest of the streams.

Raises:

NotImplementedError – The type has no current support. Most of these types have no documentation in [MS-OXMSG].

extract_msg.utils.prepareFilename(filename: str) → str[source]: Adjusts :param filename: so that it can successfully be used as an actual file name.

extract_msg.utils.roundUp(inp: int, mult: int) → int[source]: Rounds :param inp: up to the nearest multiple of :param mult:.
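
Rounding up to a multiple can be sketched with integer arithmetic only. This assumes a positive mult and that values already on a multiple are returned unchanged; round_up_sketch is a hypothetical stand-in.

```python
def round_up_sketch(inp: int, mult: int) -> int:
    """Round inp up to the nearest multiple of mult."""
    # -inp % mult is the distance from inp to the next multiple, and is
    # zero when inp is already a multiple.
    return inp + (-inp % mult)


padded = round_up_sketch(10, 8)
unchanged = round_up_sketch(16, 8)
```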

extract_msg.utils.rtfSanitizeHtml(inp: str) → str[source]: Sanitizes input to an RTF stream that has encapsulated HTML.

extract_msg.utils.rtfSanitizePlain(inp: str) → str[source]: Sanitizes input to a plain RTF stream.

extract_msg.utils.setupLogging(defaultPath=None, defaultLevel=30, logfile=None, enableFileLogging: bool = False, env_key='EXTRACT_MSG_LOG_CFG') → bool[source]

Sets up the logging configuration.

Parameters:

defaultPath – Default path to use for the logging configuration file.
defaultLevel – Default logging level.
env_key – Environment variable name to search for, for setting logfile path.
enableFileLogging – Whether to use a file to log or not.

Returns:

True if the configuration file was found and applied, False otherwise.

extract_msg.utils.tryGetMimetype(att: AttachmentBase, mimetype: str | None) → str | None[source]

Uses an optional dependency to try and get the mimetype of an attachment.

If the mimetype has already been found, the optional dependency does not exist, or an error occurs in the optional dependency, then the provided mimetype is returned.

Parameters:

att – The attachment to use for getting the mimetype.
mimetype – The mimetype acquired directly from an attachment stream. If this value evaluates to False, the function will try to determine it.

extract_msg.utils.unsignedToSignedInt(uInt: int) → int[source]

Convert the bits of an unsigned int (32-bit) to a signed int.

Raises:: ValueError – The number was not valid.
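
The reinterpretation follows two's complement rules. The sketch below assumes that "not valid" means outside the unsigned 32-bit range; unsigned_to_signed_int_sketch is a hypothetical stand-in.

```python
def unsigned_to_signed_int_sketch(u_int: int) -> int:
    """Reinterpret the bits of an unsigned 32-bit int as a signed int."""
    if not 0 <= u_int <= 0xFFFFFFFF:
        # Assumption: values outside the 32-bit range are the invalid case.
        raise ValueError('Value is out of range for an unsigned 32-bit int.')
    # If the sign bit is set, the value wraps around by 2**32.
    return u_int - 0x100000000 if u_int & 0x80000000 else u_int


minus_one = unsigned_to_signed_int_sketch(0xFFFFFFFF)
largest = unsigned_to_signed_int_sketch(0x7FFFFFFF)
smallest = unsigned_to_signed_int_sketch(0x80000000)
```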

extract_msg.utils.unwrapMsg(msg: MSGFile) → Dict[str, List][source]

Takes a recursive message-attachment structure and unwraps it into a linear dictionary for easy iteration.

Dictionary contains 4 keys: “attachments” for main message attachments, not including embedded MSG files, “embedded” for attachments representing embedded MSG files, “msg” for all MSG files (including the original in the first index), and “raw_attachments” for raw attachments from signed messages.

extract_msg.utils.unwrapMultipart(mp: bytes | str | Message) → Dict[source]

Unwraps a recursive multipart structure into a dictionary of linear lists.

Similar to unwrapMsg, but for multipart. The dictionary contains 3 keys: “attachments” which contains a list of dicts containing processed attachment data as well as the Message instance associated with it, “plain_body” which contains the plain text body, and “html_body” which contains the HTML body.

For clarification, each instance of processed attachment data is a dict with keys identical to the args used for the SignedAttachment constructor. This makes it easy to expand for use in constructing a SignedAttachment. The only argument missing is “msg” to ensure this function will not require one.

Parameters:: mp – The bytes that make up a multipart, the string that makes up a multipart, or a Message instance from the email module created from the multipart data to unwrap. If providing a Message instance, prefer it to be an instance of EmailMessage. If you are doing so, make sure its policy is default.

extract_msg.utils.validateHtml(html: bytes) → bool[source]

Checks whether the HTML is considered valid.

To be valid, the HTML must, at minimum, contain an <html> tag, a <body> tag, and closing tags for each.
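
The minimum requirement can be sketched with the standard library's html.parser. This is an illustrative check only: it assumes UTF-8 input and merely records which tags were opened and closed. The library may use a different parser and stricter rules, and the names below are hypothetical.

```python
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Record the names of every start tag and end tag seen."""

    def __init__(self):
        super().__init__()
        self.opened = set()
        self.closed = set()

    def handle_starttag(self, tag, attrs):
        self.opened.add(tag)

    def handle_endtag(self, tag):
        self.closed.add(tag)


def validate_html_sketch(html: bytes) -> bool:
    """Check for <html> and <body> tags plus closing tags for each."""
    collector = _TagCollector()
    # Assumption: the data is UTF-8; undecodable bytes are replaced.
    collector.feed(html.decode('utf-8', errors='replace'))
    collector.close()
    required = {'html', 'body'}
    return required <= collector.opened and required <= collector.closed


complete = validate_html_sketch(b'<html><body>hi</body></html>')
fragment = validate_html_sketch(b'<p>hi</p>')
unclosed = validate_html_sketch(b'<html><body>hi')
```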

extract_msg.utils.verifyPropertyId(id: str) → None[source]

Determines whether a property ID is valid for certain functions.

Property IDs MUST be a 4 digit hexadecimal string. Property is valid if no exception is raised.

Raises:: InvalidPropertyIdError – The ID is not a 4 digit hexadecimal number.
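
A minimal sketch of the rule stated above. InvalidPropertyIdSketchError is a hypothetical local stand-in for the library's InvalidPropertyIdError, and verify_property_id_sketch is likewise a hypothetical name.

```python
import string


class InvalidPropertyIdSketchError(ValueError):
    """Hypothetical stand-in for the library's InvalidPropertyIdError."""


def verify_property_id_sketch(id_: str) -> None:
    """Raise unless id_ is exactly four hexadecimal digits."""
    if len(id_) != 4 or any(ch not in string.hexdigits for ch in id_):
        raise InvalidPropertyIdSketchError(
            'Property IDs MUST be a 4 digit hexadecimal string.')


# Returns None, which signals that the ID is valid.
outcome = verify_property_id_sketch('0E1D')
```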

extract_msg.utils.verifyType(_type: str | None) → None[source]

Verifies that the type is valid.

Raises an exception if it is not.

Raises:: UnknownTypeError – The type is not recognized.