Import

The Python import statement is the normal way in which a main program or a module gains access to objects defined in some other module.

Jython reproduces Python 2 import, but gives extended meaning to import statements in order to support Java packages. This chapter is devoted to the uneasy coexistence of Java packages, and the required semantics of Python import.

Python 2 has support for certain special cases built in, and hooks for user extension. Information on module import is somewhat scattered in Python 2 documentation, and it was written as the CPython reference implementation grew over many years and several PEPs.

Python 3 documentation is clearer on the required semantics, and the implementation makes more regular use of these hooks, but Python 3 documentation cannot be applied directly to Python 2.

Differences from CPython

CPython allows a module to be defined in a C extension. It will treat a shared library (or DLL) as the definition of a module. Jython does not support extensions in C.

Jython is able to import a Java package or class as a Python module.

Treatment of Java packages and classes

General objective

Jython allows us to use Java classes as types in Python programs. The fully-qualified (dotted) name of a Java package or class closely resembles to that of a Python package or module, both where it occurs in a Python (or Java) import statement, and when we refer to it. Both languages allow for binding just the short name, in Python by the form from a.b.c import D, and in Java because import a.b.c.D does that automatically. It is natural to use what seems to be a common idiom, and to expect it to mean the same thing.

There are some differences, however.

In Python, when we import a package or module, doing so initialises it, and all its ancestor packages. In Java we cannot import or initialise a package, and the import of a class merely declares its name. In Java a class is initialised on first use.

In Python, the import makes the top-level package appear in our namespace. Thus import a.b.c initilises objects a, a.b and a.b.c, and binds a variable a to package object a. In Java, a package is little more than a name grouping classes, not an object.

In Python, a is a package only if the special file a/__init__.py, or its compiled form, exists. This file is executed to initialise an object a, the first time it is imported into a given interpreter. Without that file, for all that a/b.py may exist in the flesystem, import a.b will fail. Java packages contain no such marker, yet Jython must recognise them as (something like Python) packages.

In Python import a.b.c.D expects D to be a module (which may be a package too). If a.b.c is a Java package, then import a.b.c.D in Jython may be expected to make D a module, but this is not so. D may indeed be intended as a Jython module, but it will appear as an ordinary class within a.b.c. (Examine for example the exact type of the math module.)

In Python import a.b.c could contain an object D (a class or anything else) accessible after import a.b.c as the expression a.b.c.D. However, if D is a module, it should not be accessible until explicitly imported. Indeed, a.b is only meaningful because a.b is imported on the way to a.b.c.

Basic behaviour of Jython

In order to deal with these differences, when Jython imports a Java package, it represents it as a particular type javapackage, distinct from module. It has different behaviour from module.

Importing a javapackage effectively enumerates the classes and packages within it. (This enumeration may in practice be read from a cache.) A plain import statement binds the first name in the path, and if this is a javapackage, that name is bound to a structure navigable by the dotted names as attributes, leading to successive packages and classes. Thus import java.lang binds java to a structure in which java.nio.file.FileSystem is automatically available.

A problem arises when a given dotted name defines both a Java package and a Python package. The package may be realised as a sequence of directories, somewhere on the module search paths, which is made a package by the presence of __init__.py modules, but also contains Java .class files. Which of the two behaviours should apply: that of module or that of javapackage?

Choices in this area have been made and reverted several times as the next sections show.

Behaviour of Jython 2.0, 2.5.2, 2.7.1

In these versions of Jython, in order to make D accessible as a.b.c.D after import a.b.c, when a.b.c is a Python package, the attribute access method of PyModule extends the normal look-up. In the evaluation of a.b.c.D, when Jython looks up attribute D on a.b.c, if it is missing, it tries to import a.b.c.D. If a file a/b/c/D.class exists, that class is loaded and becomes the meaning of a.b.c.D. At the Python level, D is a class (a type) defined within package a.b.c.

However, if a/b/c/D.py exists instead, that module is loaded and becomes the meaning of a.b.c.D, which is incorrect for Python without a preceding explicit import.

Behaviour of Jython 2.5.0, 2.5.3 and 2.7.0

In these versions the project has removed “the magic that automatically imports a module”. The Python behaviour is now correct with these mixed directories, but those users who find it convenient to put related Python modules and Java classes in a single namespace are not able to find the Java classes (and regard this as a regression).

Behaviour of Jython approaching 2.7.2

Currently (in Jython 2.7.1+) a compromise behaviour is implemented, under which a module imports a sub-module automatically only if it is a Java package or class.

Possible behaviour eventually

Jython implements the required treatment of Python packages and modules (approaching Jython 2.7.2), which is a top-down discovery guided by the presence of __init__.py files.

The way in which Jython accepts Java packages and classes contains some surprises. For example mixed Python and Java packages are supported, but not if the directory is empty of class files: classes in a sub-package are not found. A mixed package found first as a Java package may change to a Python one later. (Python 3 documentation says “Python has only one type of module object, and all modules are of this type, regardless of whether the module is implemented in Python, C, or something else”.)

One cannot say strictly that Jython’s behaviour is incorrect, since correct behaviour is not defined clearly for us. It is probably not as intended.

As for the implementation, Java-specific processing is attached to the Python top-down strategy, by inserting calls to the package manager. There are a lot of these additions, and yet it seems not quite to catch all intended cases.

Suggested behaviour (for some later Jython) is:

  • There is one module type that may have Python or Java package character, or both.
  • Python character is discovered top-down (as standard).
  • Java character is discovered bottom-up:
    • when the package contains a class directly (with a valid name), or
    • when the package contains a Java package (that is, a class recursively).
  • Java classes and sub-packages may be discovered on first reference (automagically as now).

In the implementation, the Python and Java search strategies would be separate, but the results for a given package fused the single module dictionary.

In order that classes and sub-packages be available as attributes, and the programmer have control over visibility, it is desirable that class and sub-package visibility be controlled using __all__. However, classes created or newly discoverable after the module initialisation pose a problem. (It may be only the same problem as Python modules created or visible after that time.)

Sequence of events in Jython 2.7.2a1+

The following notes are from studying the operation of import mechanisms in, to be precise, Jython 2.7.2a1+ at change set 70493dc7d396 2019-03-16 13:09:15.

A built-in module (exceptions)

The import of exceptions occurs naturally during the initialisation of PySytemState. Unlike sys or __builtins__ it is a regular built-in module. Action begins with a call to org.python.core.Py.initClassExceptions(PyObject):

static void initClassExceptions(PyObject dict) {
    PyObject exc = imp.load("exceptions");

imp.load takes the lock that protects the import system from concurrent modification, and calls import_first(name, new StringBuilder()). This is the short form of import_first. A longer form supports from ... import ..., but both call import_next to get their work done.

In import_next we at last encounter some real import logic:

  1. Check for the module (by its fully-qualified name) in sys.modules.
  2. Try to load the module via find_module or find it as an attribute of the parent module via its impAttr method.
  3. Try to load the module as a Java package.

The first that succeeds here gives us our result. exceptions has not already been imported into the Python interpreter, so it doea not appear in sys.modules. In our case, exceptions has no parent, and find_module will succeed.

find_module also contains some import logic we will visit often:

  1. Offer the fully qualified name to each importer on sys.meta_path.
  2. Attempt to load the module as a built-in (a Java class).
  3. Look along sys.path for a definition of the module.
  4. Consider whether fullName might be a Java package.

exceptions is a built-in module, so the second option will find it for us using loadBuiltin.

This is fairly straightforward, since initialisation in Setup has already created a map PySystemState.builtins, from the fully-qualified name of each built-in module to a class name for its implementation. "exceptions" is a key in that map, of course, for "org.python.core.exceptions".

loadBuiltin has only to load and initialise that Java class, via Py.findClassEx. This is not quite aa simple as it looks, since Py.findClassEx has a choice of class loaders. The normal choice seems to be to load via org.python.core.SyspathJavaLoader.

The last step is to register that class (since it is a PyObject) as a Python type through a call to PyType.fromClass(c). This type object is what we ultimately return from imp.load.

On the way out import_next is responsible for posting this in sys.modules, against the key "exceptions":

>>> import sys
>>> sys.modules["exceptions"]
<type 'exceptions'>

This is slightly at variance with CPython where it shows as <module 'exceptions' (built-in)> and is definitely of type module.

A Python module in a Python program

In a fresh interpreter session, with an appropriately prepared path down to mylib/a/b/c/m.py:

>>> import sys; sys.path[0] = 'mylib'
>>> import a.b.c.m

should (and does) result in:

  • the execution of mylib/a/__init__.py, mylib/a/b/__init__.py, mylib/a/b/c/__init__.py, and mylib/a/b/c/m.py,

  • the creation of module entries in sys.modules with keys "a", "a.b", "a.b.c", and "a.b.c.m". and

  • the binding of variable a to the module a in the interpreter, such that we have:

    >>> a.b.c.m
    <module 'a.b.c.m' from 'mylib\a\b\c\m.py'>
    

The tortuous logic for this may be traced in imp.java.

The original import a.b.c.m compiles to a call to imp.importOne("a.b.c.m", <current frame>, -1)) which calls the overridable built-in effectively as __import__("a.b.c.m", globals(), locals(), None, -1). The None here is the “from-list” (as this is not e.g. from a.b.c import m), and it means we shall eventually return the module a for binding. The -1 in the calls is the level argument, set by the compiler, signifying a Python 2 style of search for a: first relative to the current module, then as absolute. Action transfers now to import_module_level, with essentially these arguments: name = "a.b.c.m", top = true, modDict = globals(), fromlist = None, level = -1.

The first part of the logic is in helper get_parent(), which has access to the globals of the importing module and the level. Note that the “parent” in question is the package of the importing module, or some ancestor of it according to the level argument, and not of the imported module.

In this case, get_parent finds that the console session is in no __package__ and has the module __name__ = "__main__". __path__ is not set either (so it’s not a package). "__main__" is not a dotted module name and level = -1, so there is no parent name to return (return null). Relative import is not possible at the top level (as we are). By a side effect, the __package__ of the importing module is set here to None. On return to import_module_level we have pkgName = null and pkgMod = null, characterising top-level import.

First package

Import begins with an attempt at importing the first package in the name "a.b.c.m", at the fragment:

StringBuilder parentName =
        new StringBuilder(pkgMod != null ? pkgName : "");
PyObject topMod =
        import_next(pkgMod, parentName, firstName, name, fromlist);

In the example, name = "a.b.c.m" and firstName = "a". import_next has the side-effect of adding "a" to the parentName buffer. Within import_next we check to see if module a is already loaded in sys.modules, in which case we may return directly. If that is not the case, we have to load a. This is attempted via a call to find_module(fullName, name, null), where here fullName = name = "a".

find_module expresses the standard Python module import logic applied to one requested module. We have already described (in A built-in module (exceptions)) the logic:

  1. Offer the fully qualified name to each importer on sys.meta_path.
  2. Attempt to load the module as a built-in (a Java class).
  3. Look along sys.path for a definition of the module.
  4. Consider whether fullName might be a Java package.

In this case the third option is the operative one. We put mylib on sys.path at the start, and since it needs no special importer in sys.path_hooks, we land at the call:

ret = loadFromSource(sys, name, moduleName, Py.fileSystemDecode(p));

where name = moduleName = "a" and p = "mylib". sys.path entries like p are usually a PyString, so p needs to be decoded.

loadFromSource is not well named. It will look for any of:

  • mylib/a.py
  • mylib/a$py.class
  • mylib/a/__init__.py
  • mylib/a/__init__$py.class

It will prefer the compiled versions as long as the Mtime attribute in them, which preserves the last modified time of the source when it was compiled, matches that of the corresponding source. The approximate order of events in loadFromSource (for a package a) is:

  1. Decide that a is a package.
  2. Compile the source mylib/a/__init__.py to byte code for class a$py, or read the byte code from mylib/a/__init__$py.class (if current).
  3. Load (and Java-initialise) the class a$py into the JVM.
  4. Construct an instance of a a$py and call its PyRunnable.getMain() to obtain a PyCode for the main body of a.
  5. Create a module representing package a (with __path__ set to ["mylib/a"]).
  6. Execute the PyCode against the module’s __dict__ as both globals() and locals().

A lot of this activity is the responsibility of supporting methods we do not detail here.

Notice that the PyCode is not needed (becomes garbage) once it has been executed. The permanent results of loading are the changes made to the module __dict__. This may include the definition of Python functions and classes that have their own PyCode objects and other data as embedded values. After this, we return the module a from import_next into import_module_level, and this establishes the “top module” of the import.

Subsequent packages

In the example of import a.b.c.m, we have imported a, but we have a long way still to go to before we reach m.py. The next significant call in import_module_level is:

mod = import_logic(topMod, parentName,
        name.substring(dot + 1), name, fromlist);

where dot is the position of the first dot in the full name. import_logic has responsibility for completing the import of the module chain down to m. The signature is:

static PyObject import_logic(PyObject mod, StringBuilder parentName,
        String restOfName, String fullName, PyObject fromlist)

Internally import_logic loops over the elements of the module path, loading each package, ending with the module defined by mylib/b/c/m.py. We enter with mod = <module 'a' from 'mylib/a/__init__.py'>, parentName = "a", restOfName = "b.c.m", fullName = "a.b.c.m", and fromlist = None.

During the iteration, import_next, already discussed above, is repeatedly called, as:

mod = import_next(mod, parentName, name, fullName, fromlist);

and as we move down the chain from one module to the next, parentName becomes the name of the enclosing module (first time "a"), and name is the simple name of the next module sought (first time "b").

As previously, import_next looks for the target by its full name (here "a.b") in sys.modules. In this call, mod is not null as it was in the top-level, but is the module a, and so import_next will look up b via mod.impAttr(name.intern()).

PyModule.impAttr deduces the full name "a.b" (in getFullName) and using findSubModule tries to find it as a Python sub-module. findSubModule checks sys.modules for it (again), gets the package __path__ (or makes an empty one), and then seeks the package via essentially attr = imp.find_module("b", "a.b", ['mylib/a']). This differs from the call import_next might have made, only in the non-null “path”.

Note

The actions of PyModule.impAttr appear largely to duplicate those in import_next around where impAttr() is called. A check is made for the module in sys.modules duplicating the one before the call. impAttr checks to see if the module sought is a Java package, which import_next also does (although the call is to JavaImportHelper.tryAddPackage). Any entry made in sys.modules during the import is allowed to supersede the result, first in impAttr, then again in import_next.

The search strategy of find_module has already been described (try sys.meta_path, built-in modules, the path and path hooks) except that in this case, the parent package’s __path__ is used, not sys.path. There is no path hook corresponding to mylib/a, so loadFromSource, correctly deducing a.b to be a package, ends up compiling and executing mylib/a/b/__init__.py, or reading and executing mylib/a/b/__init__$py.class. The module this creates is eventually returned to the loop within import_logic.

The next pass of that loop imports a.b.c by an exactly parallel process.

The import of a.b.c.m is almost the same except that the module is not a package. This final import_next gets us out of the loop in import_logic, which returns the module a.b.c.m. However, import_module_level has remembered that we wanted the first in the chain, ultimately because there is no “from-list”, and finally we return through importOne into the compiled instruction with <module 'a' from 'mylib\a\__init__.py'> for assignment to a.

Note

The Jython versions of get_parent and import_module_level differ from the CPython ones. The difference in signatures may not be that significant, representing the choice to return a String rather than update a char * buffer. Differences in structure and logic may be significant.

A Python module by relative import

Suppose we have a module file mylib/a/b/m3.py like this:

print "Executed: ", __file__
import c.m
print repr(c.m)

where, as previously, the path down to mylib/a/b/c/m.py is prepared with __init__.py files to create packages a, a.b and a.b.c. From the form of the import (in Python 2) we should expect the imported module c.m to be defined either in mylib/a/b/c/m.py or in mylib/c/m.py (or an equivalent to mylib elsewhere on sys.path). We know it will be found at mylib/a/b/c/m.py, but the compiler didn’t, and so this has to be a run-time discovery.

In a fresh interpreter session, with an appropriately prepared path down to mylib/a/b/c/m.py:

>>> import sys; sys.path[0] = 'mylib'
>>> import a.b.m3
Executed:  mylib\a\__init__$py.class
Executed:  mylib\a\b\__init__$py.class
Executed:  mylib\a\b\m3.py
Executed:  mylib\a\b\c\__init__$py.class
Executed:  mylib\a\b\c\m$py.class
<module 'a.b.c.m' from 'mylib\a\b\c\m$py.class'>

The action begins with an absolute import as already studied (in A Python module in a Python program), but as the body of m3 is executed, during createFromCode, we reach the implicitly relative import c.m. We take up the story there.

The compiler has translated import c.m into imp.importOne("c.m", <current frame>, -1), just as before. Action transfers now to import_module_level, with essentially these arguments: name = "c.m", top = true, modDict = a.b.m3.globals(), fromlist = None, level = -1.

The first part of the logic is in helper get_parent, which has to work out the parent module name using globals() and level as input. get_parent works out that the package name of the importing module a.b.m3 is a.b. (It sets the module’s __package__ to this by side-effect, since it was missing.)

If a.b.m3 had been a package itself, its name would be the package name. If level had been a positive number, signifying that the compiler saw so-many dots before the name in the import statement, get_parent would have stripped a further level-1 elements from the package name. It would be an error at this point for "a.b" not to be a key in sys.modules.

From this point, import_module_level is able to call the equivalent of topMod = import_next(<module 'a.b'>, "a.b", "c", "c.m") which succeeds in importing c relative to a.b. Previously we encountered almost exactly this call as part of a top-level import a.b.c.m, in the loop of import_logic, so we don’t need to walk through it again. However, notice that c is going to become the head module returned by import_module_level, and ultimately returned to the body code of m3, to be assigned to a.b.m3.__dict__['c'].

A Python module by absolute import

Suppose we have a module file mylib/a/b/m4.py like this:

print "Executed: ", __file__
import sys
print repr(sys)

It is perfectly obvious to any Python programmer that we are importing the top-level sys module. But the form of the import is exactly the same as it was for an implicit relative import. The import mechanism has to be ready for the possibility that a.b.sys is the module intended.

In a fresh interpreter session, with an appropriately prepared path down to mylib/a/b/c/m.py:

>>> import sys; sys.path[0] = 'mylib'
>>> import a.b.m4
Executed:  mylib\a\__init__$py.class
Executed:  mylib\a\b\__init__$py.class
Executed:  mylib\a\b\m4.py
<module 'sys' (built-in)>

The action begins with an absolute import as already studied (in A Python module in a Python program), but as the body of m4 is executed, during createFromCode, we reach the (possibly) implicitly relative import sys. We take up the story there.

The compiler has translated import sys into imp.importOne("sys", <current frame>, -1). We arrive in import_module_level, with essentially these arguments: name = "sys", top = true, modDict = a.b.m4.globals(), fromlist = None, level = -1. The helper get_parent works out that the package name of the importing module a.b.m4 is a.b.

The relative import attempt

import_module_level now calls the equivalent of topMod = import_next(<module 'a.b'>, "a.b", "sys", "sys"), in order to search out a module called a.b.sys. Unlike previous examples, this is not going to succeed, and it is worth following the steps by which Jython decides no such module exists.

import_next first checks sys.modules for the key a.b.sys. No such module has been imported, and so it moves on.

Because there is an enclosing module, import_next calls PyModule.impAttr. In impAttr, the check in sys.modules fails (again), so it calls effectively attr = imp.find_module("sys", "a.b.sys", ['mylib/a/b']).

Within find_module, there is nothing on sys.meta_path, and loadBuiltin doesn’t find "a.b.sys" as a key in the built-in module table, so it searches the path of the parent module. a.b.__path__ == ['mylib/a/b']. The attempt in loadFromSource fails to find any of:

  • mylib/a/b/sys.py
  • mylib/a/b/sys$py.class
  • mylib/a/b/sys/__init__.py
  • mylib/a/b/sys/__init__$py.class

and so find_module returns null to impAttr.

Now, impAttr tries to find a.b.sys as a Java class or package:

attr = PySystemState.packageManager.lookupName(fullName);

This begins by looking up "a" in the nameless top-level Java package. PySystemState.packageManager keeps a tree-like index built by scanning the Java run-time and JARs on the class path, with the nameless package at its root. The nodes are PyJavaPackage objects (a subclass of PyObject), and so in looking up a as an attribute, we land at PyJavaPackage.__findattr_ex__. There is no matching key in the __dict__ of the root PyJavaPackage, but each PyJavaPackage has a PackageManager __mgr__ member that will search for a Java package by that name, via a call to __mgr__.packageExists.

In this case packageExists looks in the current working directory but there is no directory ./a. (If there were, the manager would add it to the index.) It then moves on to do the same for entries along sys.path. mylib/a exists, but contains no (non-Python) Java class files. (Interestingly, having found mylib/a, the manager does not look for any further places on sys.path that might be Java packages.)

PyJavaPackage.__findattr_ex__ then consults __mgr__.findClass to see if "a" designates a Java class (rather than a package). Going via Py.findClass, and Py.loadAndInitClass, it tries to load and initialise a class "a".

It doesn’t exist, so __findattr__ returns empty-handed in lookupName, as does impAttr to import_next. Then in import_next, the module a.b has no sys attribute.

import_next will now try to find "sys" as a Java package through a call that is effectively JavaImportHelper.tryAddPackage("sys", Py.None).

Note

It seems odd that at this point, while import_module_level is still following the hypothesis that import sys refers to fullname = "a.b.sys", that we should suddenly go looking for the absolute outerFullName = "sys".

The operation of tryAddPackage in this case falls through to looking for sys as a package in the JVM. For this purpose it asks for a list of all packages currently known to the JVM and builds a TreeMap with the packages, and their containing packages (e.g. {java, java.lang, java.lang.invoke}). If "sys" were among the keys, tryAddPackage would try to add a module to sys.modules for it.

There isn’t one, of course, which finally allows import_next to conclude that there is no a.b.sys module, and report this back into import_module_level.

The absolute import attempt

An implicit relative import having failed, import_module_level decides that the proper parentName is an empty string. It now calls the equivalent of topMod = import_first("sys", "", "sys", None), in order to search out a top-level module called sys.

import_first delegates to import_next which quickly finds sys in sys.modules. If we had chosen in m4.py to import a module not already loaded, a chain of events would unfold like that already described for A Python module in a Python program.

Recall this import of sys occurred during exewcution of a.b.m4. In that module, variable sys becomes bound to <module 'sys' (built-in)>, execution of the module completes, and then import of a.b.m4 itself completes, returning a to the console session.

A Python module by from-import

Suppose we have a module file mylib/a/b/m5.py like this:

print "Executed: ", __file__
from a.b.c import m, n1, n2
print repr(m)
print n1, n2

The interesting thing about this import is that n1 and n2 are conventional attributes of package a.b.c, while m is a module that only becomes an attribute once it is imported, as a result of this statement. In order to set up the expected attributes, let mylib/a/b/c/__init__.py be:

print "Executed: ", __file__
n0, n1, n2, n3, n4 = range(5)

where, as previously, the path down to mylib/a/b/c/m.py is prepared with __init__.py files to create packages a, a.b and a.b.c.

In a fresh interpreter session:

>>> import sys; sys.path[0] = 'mylib'
>>> import a.b.m5
Executed:  mylib\a\__init__$py.class
Executed:  mylib\a\b\__init__$py.class
Executed:  mylib\a\b\m5.py
Executed:  mylib\a\b\c\__init__$py.class
Executed:  mylib\a\b\c\m$py.class
<module 'a.b.c.m' from 'mylib\a\b\c\m$py.class'>
1 2

The import activity as far as invoking m5 is as already studied (in A Python module in a Python program), but as the body of m5 is executed, during createFromCode, we reach from a.b.c import m, n1, n2. We take up the action in import_module_level where name = "a.b.c", level = -1 and fromlist = ('m', 'n1', 'n2') (a tuple).

The relative import attempt

a.b.c may be an implicit relative import. get_parent works out that the package name of the importing module a.b.m5 is a.b. Therefore our first hypothesis is that we’re looking for a.b.a.b.c, and so our first call is to import_next with mod = "a.b" and name = "a". The action unfolds roughly as described in The relative import attempt, except the presence of the from-list is significant.

import_next looks for a.b.a in sys.modules, without success, then for a relative to a.b using PyModule.impAttr, involving:

  • look for a.b.a in sys.modules (again),
  • look for a within a.b via find_module, that is:
    • chack for sys.meta_path importers,
    • look for a.b.a as a built-in
    • look for mylib/a/b/a.py etc.
  • look for a.b.a as a Java package via PySystemState.packageManager.lookupName(“a.b.a”), which entails:
    • look for a as an attribute of the top-level unnamed package, that is:
      • look for ./a.
      • find mylib/a on sys.path (but Python code means it is not a Java package).
      • look for a as a Java class via __mgr__.findClass.

None of this succeeds, so PyModule.impAttr has not found a.b.a.

import_next makes another attempt via the call JavaImportHelper.tryAddPackage(outerFullName, fromlist), where outerFullName = "a.b.c" and fromlist = ('m', 'n1', 'n2').

Note

import_module_level called import_next to look for a (the first element of a.b.c`) relatively within a.b. But at this point Jython has stopped looking for a.b.a and is now working on a.b.c absolutely (albeit as a Java package). This is out of place in the context of the logic of import_module_level. We noted a similar problem during The relative import attempt in respect of sys.

tryAddPackage looks for each of a.b.c.m, a.b.c.n1 and a.b.c.n2 as a Java class, and then for a.b.c as a Java class, all unsuccessfully Finally, tryAddPackage builds a list (set actually) of all the packages known to the JVM, and their package ancestors, and searches for each of a.b.c.m, a.b.c.n1 and a.b.c.n2 as packages in it, all unsuccessfully. import_next thus returns null indicating that a.b has no module a.

The absolute import attempt

Having tried import_next first, we try import_first next.

This begins an attempt to import a.b.c as an absolute reference. The call is the equivalent of topMod = import_first("a", parentName, "a.b.c", ('m', 'n1', 'n2')), where parentName is an initially empty StringBuilder. This call returns quickly with the first in the chain a as an already-imported module, found by import_next. (The from-list seems superfluous but could used to explore Java class possibilities.)

Compare this with The absolute import attempt: this time the target module is not top-level and we have a from-list. Because here the target a.b.c is a dotted name, import_module_level will continue its descent by a call:

mod = import_logic(topMod, parentName, "b.c", "a.b.c", ("m", "n1", "n2"))

As we have seen, import_logic works by iterated calls to import_next.

In this example, the first such looks for a.b and finds it already imported.

The second looks for a.b.c which is not yet imported. This will go via PyModule.impAttr for the module instance a.b, in which imp.find_module will succeed in returning the module a.b.c. This is the point at which a.b gets a new attribute c referring to the module found.

Satisfying the from-list

The next interesting action rounds out import_module_level with a call to ensureFromList. Effectively this is called with mod = <module 'a.b.c'>, fromlist = ('m', 'n1', 'n2'), name = "a.b.c", recursive = false. It iterates the from-list to make sure, by a call to mod.__findattr__, that each name on it is actually an attribute of the module.

The first name on the list m, is not an attribute, so the call mod.__findattr__("m") ends up in PyModule.__findattr_ex__, which checks for a.b.c.m as a Java package or class via a call equivalent to attr = PySystemState.packageManager.lookupName("a.b.c.m"). If m had been a Java package or class it would have been imported automatically, but as it is, mod.__findattr__("m") returns null to ensureFromList. (Note that a Python module is not automatically imported by attribute access.) Since m was not found as an attribute, ensureFromList calls import_next in order to import it (via impAttr).

Having dealt with m, ensureFromList performs the same process for n1 and n2. These other two calls mod.__findattr__("n1") and mod.__findattr__("n2") find their targets as attributes in the dictionary of a.b.c.

The value finally returned by import_module_level, and hence by the import operation as a whole, is <module 'a.b.c' from 'mylib/a/b/c/__init__$py.class'>. Code generated in the caller m5.py receives this value into a temporary variable, then assigns module globals m, n1, and n2 from its attributes.

Execution of m5.py now continues as part of the import of that at the prompt, and the final state is illustrated by:

>>> a.b.m5.m, a.b.m5.n1, a.b.m5.n2
(<module 'a.b.c.m' from 'mylib\a\b\c\m$py.class'>, 1, 2)

A Python module by relative from-import

Suppose we have a module file mylib/a/b/m6.py like this:

print "Executed: ", __file__
from ..b.c import m, n1, n2
print repr(m)
print n1, n2

This import has the same meaning as that studied in A Python module by from-import, but expressed as a relative module reference. (We could have written .c instead of ..b.c.)

The expected attributes, are set up as before in mylib/a/b/c/__init__.py:

print "Executed: ", __file__
n0, n1, n2, n3, n4 = range(5)

and, as previously, the path down to mylib/a/b/c/m.py is prepared with __init__.py files to create packages a, a.b and a.b.c.

In a fresh interpreter session:

>>> import sys; sys.path[0] = 'mylib'
>>> import a.b.m6
Executed:  mylib\a\__init__$py.class
Executed:  mylib\a\b\__init__$py.class
Executed:  mylib\a\b\m6.py
Executed:  mylib\a\b\c\__init__$py.class
Executed:  mylib\a\b\c\m$py.class
<module 'a.b.c.m' from 'mylib\a\b\c\m$py.class'>
1 2

The first difference from A Python module by from-import occurs where we hit the relative import during execution of from ..b.c import m, n1, n2. The explicit relative import is going to save us searching for a relative to a.b.

We take up the action in import_module_level where name = "b.c", level = 2 and fromlist = ('m', 'n1', 'n2') (a tuple). Notice that name contains none of the leading dots, but they have been counted in level. This happens in get_parent, which deduces first that the __package__ of the importing module is a.b, and then takes off level-1 = 1 further elements to return a. This exists (as it must) in sys.modules, so in import_module_level we have pkgName = "a" and pkgMod = <module 'a'>.

The first import is accomplished by a call that is effectively topMod = import_next(<module 'a'>, parentName, "b", "b.c", ('m', 'n1', 'n2')), where the parentName buffer is initialised to pkgName = "a". a.b is in sys.modules, so that returns quickly with it. topMod = <module 'a.b'> and parentName = "a.b".

import_module_level will continue its descent by a call:

mod = import_logic(topMod, parentName,
         "c", "b.c", "a.b.c", ("m", "n1", "n2"))

Within import_logic, the first call to import_next resolves a.b.c through a call to PyModule.impAttr on <module 'a.b'>. We return into import_module_level with <module 'a.b.c'>, and the sequel (ensureFromList imports m) is the same as in A Python module by from-import.

A Java class in a Java package

Suppose we set up a Java class file mylib/jpkg/j/K.class, where the fully-qualified class name is jpkg.j.K, but no __init__.py (or its compiled form) exists on the path to K. In a fresh interpreter session try:

>>> import sys; sys.path[0] = 'mylib'
>>> from jpkg.j import K

This results in a variable K = <type 'jpkg.j.K'>. Let’s see how this comes about.

The import instruction is compiled into a call importFrom("jpkg.j", ["K"], <current frame>, -1) which goes via the overridable __builtin__.__import__ function, and we land as so often at import_module_level. The level = -1, and from the passed-in globals(), get_parent concludes that there is no package context to the import instruction.

First package

The first port of call is import_next, with mod = null, parentName = new StringBuilder(""), name = "jpkg", outerFullName = "jpkg.j", and fromlist = ("K",).

import_next adds “jpkg” to the parentName buffer, but it is not yet in sys.modules, so it calls find_module("jpkg", "jpkg", null) to look for it as a package. This will succeed, but (perhaps surprisingly) not at the first entry “mylib” on sys.path, because Jython only looks there for Python modules.

Note

When Jython finds that an apparent package in an import corresponds to a directory, but it contains no __init__.py (or compiled __init__$py.class), it issues a warning via the warnings package. This is required Python behaviour, although premature while the directory might still be a Java package.

The first time this happens, it results in a long complex sequence of imports, right in the middle of the behaviour we want to study, so in this experiment it has been disabled by comment markers in the tests described here.

find_module searches sys.path and eventually reaches the special entry __classpath__. For this entry, imp.getPathImporter produces an org.python.core.JavaImporter with its own find_module method, which imp.find_module calls. Note that this method must be a Python callable, and found by the attribute look-up importer.__getattr__("find_module"), because an importer may be defined in Python.

JavaImporter.find_module relies on SysPackageManager.lookupName. That method searches the package index built from the class path and other locations, which contains a tree of PyJavaPackage objects, connected through their Python attributes, and rooted in the nameless top-level package held by the SysPackageManager. (All PyJavaPackages also point back to the SysPackageManager that owns them.) The root <java package > has attributes com, java, org, etc., <java package org> has attributes python, junit, antlr, etc., <java package org.python> has attributes core, modules, etc., and so on.

If necessary SysPackageManager.lookupName would traverse this hierarchy, looking for its target, but the root <java package > does not currently have an attribute jpkg. An attempt to access it lands at PyJavaPackage.__findattr_ex__, which calls __mgr__.packageExists("", "jpkg"), where __mgr__ is the related SysPackageManager. The method packageExists looks first for ./jpkg, and then relative to the elements of sys.path (starting with ./mylib). Because ./mylib/jpkg does not contain any Python files, it is identified as a Java package, and added as a new <java package jpkg> to the __dict__ of the current (nameless) package.

Although jpkg has been added to the index of known Java packages, this is not the same as actually importing it. JavaImporter.find_module returns the importer object itself to imp.find_module, as loader (an allowable strategy in the Python specification). imp.find_module effectively calls loader.load_module("jpkg"), which in turn relies on a second call to SysPackageManager.lookupName. This time, however, <java package jpkg> is found immediately in the top-level, and becomes the module returned to import_next (and added there to sys.modules).

On return into import_module_level, topMod = <java package jpkg>.

Subsequent packages

There is another element to the package name jpkg.j, and so we have not finished. The next call is to import_logic, which as we know operates a loop around import_next to load successive modules. In this case, we are only looking for jpkg.j, so it will make a single call into import_next with mod = <java package jpkg>, parentName = "jpkg", name = "j", outerFullName = "jpkg.j", and fromlist = ("K",). "j" is appended to parentName and since the result jpkg.j is not in sys.modules, and mod != null, the import looks for "j" as an attribute by a call mod.impAttr("j").

This leads us again to PyJavaPackage.__findattr_ex__("j"), but this time the target of the call is not tho nameless top package but jpkg. It has no attribute j yet, so it calls __mgr__.packageExists("jpkg", "j"). As before SysPackageManager.packageExists tries for jpkg.j, first as ./jpkg/j, then along sys.path where it is found at mylib/jpkg/j.

Creation and return of <java package jpkg.j> into import_logic then happens much as described in First package.

An important difference from the first package is the use of impAttr rather than find_module during import_next, which occurs because the search for jpkg.j happens in the context of a known package jpkg. This avoids searching for the module in several places (sys.meta_path, built-ins), necessary when a top-level package is being sought. In particular, as sys.path is traversed, sys.path_hooks is not consulted, so there is no reflective importer.find_module call. Instead, the package manager that found jpkg in the first round, and referenced by that object’s __mgr__ attribute, is consulted directly.

Satisfying the from-list

On return into import_module_level, the remaining action is to process the from-list. This is handled by imp.ensureFromList. The pure Python version of this was described in A Python module by relative from-import, and is effectively a call to mod.__findattr__ for each name in the list (here just “K”). This time the target is <java package jpkg.j> rather than a PyModule.

Again we find ourselves in PyJavaPackage.__findattr_ex__. “K” is not yet a key in that module’s __dict__, so we try to find it as a sub-package by a call __mgr__.packageExists("jpkg.j", "K"). There is no directory ./jpkg/j/K or anywhere on sys.path, so PyJavaPackage.__findattr_ex__ fails to find a package.

PyJavaPackage.__findattr_ex__ then tries to add jpkg.j.K as a class, by a call __mgr__.findClass("jpkg.j", "K"), which relies (eventually) on Py.findClassInternal. That method chooses a particular loader (an instance of SyspathJavaLoader in this case), and loads, and Java-initialises, the class jpkg.j.K using it. PyJavaPackage.__findattr_ex__ then calls addClass to add it to the current PyJavaPackage’s __dict__. For this, the class it must be wrapped as a PyObject, that is, a PyType must be created for jpkg.j.K.

It is sufficient to have added K as an attribute of <java package jpkg.j>, and to return <java package jpkg.j> from import_module_level, because importFromAs will mine it for the values corresponding to the from-list. This is returned as an array (of length 1 here) into the code generated by the compiler, when it compiled the import statement, which will bind a local name K = <type 'jpkg.j.K'>.

A Java class in a Python package

Suppose we set up Java class files mylib/mix/J1.class and mylib/mix/b/K1.class, where the fully-qualified class names are mix.J1 and mix.b.K1, and also create mylib/mix/__init__.py and mylib/mix/b/__init__.py (or compiled forms). This makes mix and mix.b Python packages, but containing Java classes.

In a fresh interpreter session try:

>>> import sys; sys.path[0] = 'mylib'
>>> from mix.b import K1
Executed:  mylib\mix\__init__$py.class
Executed:  mylib\mix\b\__init__$py.class

This results in a variable K1 = <type 'mix.b.K1'>, by a sequence of events very similar to that in A Python module by from-import, where we traced from a.b.c import m, n1, n2.

The import of module mix.b is identical to that of a.b.c, where afterwards, in ensureFromList, the attribute m was added to the module a.b.c, by being discovered as a sub-module. Here, however, K1 becomes an attribute of mix.b by being discovered as a class. It is only this last part that breaks new ground for us.

We may satisfactorily take up the action in import_module_level where Jython calls the equivalent of:

ensureFromList(mod, ("K1",), "mix.b")

and mod = <module 'mix.b'>. Both mix and mix.b have been imported as Python modules.

At this stage, ensureFromList has no indication that K1 is not a simple attribute. It calls mod.__findattr__("K1"), that is, PyModule.__findattr_ex__, which fails to find "K1" in the module dictionary. PyModule.__findattr_ex__ then effectively calls PySystemState.packageManager.lookupName("mix.b.K1") to look for a Java package or class. This drives the creation of PyJavaPackage objects for mix and mix.b, through calls to packageExists in the Java package indexer.

Note

A problem is latent here in that mix is only recognised as a Java package because it contains J1.class. The critereon to recognise a directory as a Java package is that is contain a Java class or no Python. A more complete strategy in this case would be to look for the fully-qualified class first, then create all intervening packages. However, it would require a search to recognise a package as an import target when only the package is named and no contained class.

When lookupName reaches mix.b.K1, PyJavaPackage.__findattr_ex__, called on the Java package mix.b, fails to find a sub-package K1, and moves on to call c = __mgr__.findClass("mix.b", "K1"). This succeeds, going via the SysPathJavaLoader, which loads and Java-initialises class mix.b.K1, and having found it that way, adds it as an attribute to the Python module mix.b. Action then completes as in Satisfying the from-list.

A Java package in a Python package

Suppose we set up Java class files mylib/mix/J1.class and mylib/mix/j/K2.class, where the fully-qualified class names are mix.J1 and mix.j.K2, and also create mylib/mix/__init__.py (or its compiled form). This makes mix a Python package, while mix.j is a Java package.

In a fresh interpreter session try:

>>> import sys; sys.path[0] = 'mylib'
>>> import mix
Executed:  mylib\mix\__init__$py.class
>>> mix
<module 'mix' from 'mylib\mix\__init__$py.class'>
>>> mix.j, mix.j.K2
(<java package mix.j>, <type 'mix.j.K2'>)

We see operating here the automagical import of Java packages and classes, starting with a Python module. Let’s see how this comes about.

The import of mix to the interpreter session occurs as a simplified form of the case studied in A Python module in a Python program. The compiler has produced effectively importOne("mix.j", <current frame>, -1), and within import_module_level, the first call to import_next returns <module 'mix' from 'mylib/mix/__init__$py.class'>.

The first interesting process is the resolution of the expression mix.j. This compiles to a call like __getattr__("j"), and soon lands at PyModule.__findattr_ex__. There, a dictionary look-up fails, and so __findattr_ex__ looks for a package by the full name using PySystemState.packageManager.lookupName("mix.j"). That begins by consulting the nameless, top-level Java package for attribute mix, landing at PyJavaPackage.__findattr_ex__. "mix" is not a key in its dictionary, so __findattr_ex__ calls __mgr__.packageExists("", "mix"), which drives the addition of the missing PyJavaPackage("mix").

Note

mix is accepted as a Java package, despite previous recognition as a Python package, because mylib/mix contains a Java class file. We have two disconnected representations of mix.

The second iteration of lookupName looks up attribute j in Java package mix. Once more PyJavaPackage.__findattr_ex__ calls __mgr__.packageExists("", "mix"), which drives the addition of the missing PyJavaPackage("mix.j"), ultimately returned by __getattr__ to the console session. mix.j is accepted as a Java package, because mylib/mix/j contains no Python files.

In the resolution of mix.j.K2 differs only slightly. This time __getattr__ finds j as a PyJavaPackage within the dictionary of mix. Then within PyJavaPackage("mix.j") the search for K2 is completed by a call to findClass("mix.j.K2").