NtupleSchema

class atlas_schema.schema.NtupleSchema(base_form, version='latest')

Bases: BaseSchema

The schema for building ATLAS ntuples following the typical centralized formats.

This schema is built from all branches found in a tree in the supplied file, based on the naming pattern of the branches. This naming pattern is typically assumed to be

{collection:str}_{subcollection:str}_{systematic:str}
where:
  • collection is assumed to be a prefix with typical characters, following the regex [a-zA-Z][a-zA-Z0-9]*; that is, starting with an upper- or lowercase letter, followed by zero or more alphanumeric characters,

  • subcollection is assumed to be anything with typical characters (allowing for underscores), following the regex [a-zA-Z_][a-zA-Z0-9_]*; that is, starting with an upper- or lowercase letter or an underscore, followed by zero or more alphanumeric characters or underscores, and

  • systematic is assumed to be either NOSYS, to indicate a branch with potential systematic variations, or anything with typical characters (allowing for underscores) following the same regular expression as the subcollection.

Here, a collection refers to the top-level entry used to access an item: a collection called el will be accessible under the el attribute via events['el'] or events.el. A subcollection called pt will be accessible under that collection, such as events['el']['pt'] or events.el.pt. This is the power of the schema: it provides more user-friendly (and programmatic) access to the underlying branches.

The above logic means that the branches below will be categorized as follows:

branch                      collection     subcollection      systematic
'eventNumber'               'eventNumber'  None               None
'runNumber'                 'runNumber'    None               None
'el_pt_NOSYS'               'el'           'pt'               'NOSYS'
'jet_cleanTightBad_NOSYS'   'jet'          'cleanTightBad'    'NOSYS'
'jet_select_btag_NOSYS'     'jet'          'select_btag'      'NOSYS'
'jet_e_NOSYS'               'jet'          'e'                'NOSYS'
'truthel_phi'               'truthel'      'phi'              None
'truthel_pt'                'truthel'      'pt'               None
'ph_eta'                    'ph'           'eta'              None
'ph_phi_SCALE__1up'         'ph'           'phi'              'SCALE__1up'
'mu_TTVA_effSF_NOSYS'       'mu'           'TTVA_effSF'       'NOSYS'
'recojet_antikt4PFlow_pt'   'recojet'      'antikt4PFlow_pt'  'NOSYS'
'recojet_antikt10UFO_m'     'recojet'      'antikt10UFO_m'    None
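To make the naming pattern concrete, here is a minimal sketch of how a branch name could be split using the two regular expressions quoted above. It is not the library's implementation: split_branch is a hypothetical helper, and it assumes the set of systematic names is already known (by default only NOSYS), since a suffix such as SCALE__1up cannot be told apart from a subcollection by the regex alone.

```python
import re

# Regular expressions quoted in the description of the naming pattern.
COLLECTION = r"[a-zA-Z][a-zA-Z0-9]*"
SUBCOLLECTION = r"[a-zA-Z_][a-zA-Z0-9_]*"


def split_branch(name, systematics=("NOSYS",)):
    """Hypothetical helper: return (collection, subcollection, systematic)."""
    systematic = None
    # Peel off a known systematic suffix first, trying the longest names first.
    for candidate in sorted(systematics, key=len, reverse=True):
        suffix = "_" + candidate
        if name.endswith(suffix):
            name, systematic = name[: -len(suffix)], candidate
            break
    # Whatever remains is split at the first underscore by the two regexes.
    match = re.fullmatch(rf"({COLLECTION})_({SUBCOLLECTION})", name)
    if match is None:
        # No underscore left: the whole name acts as a top-level entry.
        return name, None, systematic
    return match.group(1), match.group(2), systematic


print(split_branch("el_pt_NOSYS"))
print(split_branch("jet_select_btag_NOSYS"))
print(split_branch("ph_phi_SCALE__1up", systematics=("NOSYS", "SCALE__1up")))
print(split_branch("eventNumber"))
```

Because the collection regex excludes underscores, everything after the first underscore ends up in the subcollection, which is why a branch such as recojet_antikt10UFO_m is grouped under recojet by default.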

Sometimes this logic is not what you want, and there are ways to teach NtupleSchema how to group some of these better for atypical cases. We can address these case-by-case.

Singletons

Sometimes you have particular branches that you don’t want to be treated as a collection (with subcollections), and you may see warnings about this (see FAQ). Some pre-defined singletons are stored under event_ids, and these will be lazily treated as singletons. For other cases where you add your own branches, you can extend this class to add your own singletons:

from atlas_schema.schema import NtupleSchema


class MySchema(NtupleSchema):
    singletons = {"RandomRunNumber"}

and use this schema in your analysis code. The rest of the logic will be handled for you, and you can access your singletons under events.RandomRunNumber as expected.

Mixins (collections, subcollections)

In more complicated scenarios, you might need to teach NtupleSchema how to handle collections that end up having underscores in their name, or other characters that make the grouping non-trivial. In other scenarios, you want to tell the schema to assign a certain set of behaviors to a collection, rather than the default atlas_schema.methods.Particle behavior. This is where mixins come in. Similar to how singletons are handled, you extend this schema to include your own mixins, pointing them at one of the behaviors defined in atlas_schema.methods.

Let’s demonstrate both cases. Imagine you want your truthel collections above to be treated as atlas_schema.methods.Electron; you would then extend the existing mixins:

from atlas_schema.schema import NtupleSchema


class MySchema(NtupleSchema):
    mixins = {"truthel": "Electron", **NtupleSchema.mixins}

Now, events.truthel will give you arrays zipped up with atlas_schema.methods.Electron behaviors.
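One detail of the dictionary-unpacking idiom is worth spelling out: in a Python dict literal, later entries win. The sketch below uses a plain dictionary, DEFAULT_MIXINS, copied from the default mixins value listed under Attributes, as a stand-in so that it runs without atlas_schema installed; the variable names are illustrative only.

```python
# Stand-in for NtupleSchema.mixins, copied from the documented default value.
DEFAULT_MIXINS = {
    "el": "Electron",
    "jet": "Jet",
    "met": "MissingET",
    "mu": "Muon",
    "pass": "Pass",
    "ph": "Photon",
    "trigPassed": "Trigger",
    "weight": "Weight",
}

# Adding a new collection: the position of the new key does not matter.
extended = {"truthel": "Electron", **DEFAULT_MIXINS}

# Replacing an existing entry: the defaults are unpacked last, so they win
# and "jet" keeps its default behavior.
shadowed = {"jet": "Particle", **DEFAULT_MIXINS}

# Unpack the defaults first if the intent is to override one of them.
overridden = {**DEFAULT_MIXINS, "jet": "Particle"}

print(extended["truthel"], shadowed["jet"], overridden["jet"])
```

So the form shown above is fine for adding new collections, but a key that already exists in the defaults should be placed after the unpacking if it is meant to take effect.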

If, instead, you run into problems with different branches being mixed into the same collection, because the default behavior of this schema described above is not smart enough to handle the atypical cases, you can explicitly fix this by defining your collections:

from atlas_schema.schema import NtupleSchema


class MySchema(NtupleSchema):
    mixins = {
        "recojet_antikt4PFlow": "Jet",
        "recojet_antikt10UFO": "Jet",
        **NtupleSchema.mixins,
    }

Now, events.recojet_antikt4PFlow and events.recojet_antikt10UFO will be separate collections, instead of a single events.recojet that incorrectly merged branches from each of these collections.
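One way to picture why explicitly named collections separate these branches is longest-prefix matching against the mixin keys. The sketch below is an illustration under that assumption, not the library's actual algorithm; assign_collection is a hypothetical helper.

```python
def assign_collection(branch, collection_names):
    """Hypothetical helper: return (collection, subcollection) for a branch.

    Prefers the longest explicitly named collection that prefixes the branch,
    and falls back to splitting at the first underscore.
    """
    candidates = [
        name for name in collection_names if branch.startswith(name + "_")
    ]
    if candidates:
        collection = max(candidates, key=len)
        return collection, branch[len(collection) + 1 :]
    collection, _, rest = branch.partition("_")
    return collection, rest or None


explicit = {"recojet_antikt4PFlow": "Jet", "recojet_antikt10UFO": "Jet"}

# Without explicit collections, both branches fall under "recojet".
print(assign_collection("recojet_antikt4PFlow_pt", {}))
print(assign_collection("recojet_antikt10UFO_m", {}))

# With explicit collections, each branch lands in its own collection.
print(assign_collection("recojet_antikt4PFlow_pt", explicit))
print(assign_collection("recojet_antikt10UFO_m", explicit))
```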

Parameters:

Attributes

default_behavior: ClassVar[str] = 'NanoCollection'

default behavior to use for any collection (default "NanoCollection", from coffea.nanoevents.methods.base.NanoCollection)

docstrings: ClassVar[dict[str, str]] = {'charge': 'charge', 'eta': 'pseudorapidity', 'mass': 'invariant mass [MeV]', 'met': 'missing transverse energy [MeV]', 'phi': 'azimuthal angle', 'pt': 'transverse momentum [MeV]'}

docstrings to assign for specific subcollections across the various collections identified by this schema

error_missing_event_ids: ClassVar[bool] = False

Treat missing event-level branches as error instead of warning (default is False)

event_ids: ClassVar[set[str]] = {'actualInteractionsPerCrossing', 'averageInteractionsPerCrossing', 'dataTakingYear', 'eventNumber', 'lumiBlock', 'mcChannelNumber', 'mcEventWeights', 'runNumber'}

all event IDs to expect in the dataset

event_ids_data: ClassVar[set[str]] = {'actualInteractionsPerCrossing', 'averageInteractionsPerCrossing', 'dataTakingYear', 'lumiBlock'}

event IDs to expect in data datasets

event_ids_mc: ClassVar[set[str]] = {'eventNumber', 'mcChannelNumber', 'mcEventWeights', 'runNumber'}

event IDs to expect in MC datasets
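The interplay between the expected event IDs and error_missing_event_ids can be pictured as a set difference followed by either a warning or an error. The sketch below is illustrative only: check_event_ids is a hypothetical helper, and the warning and exception types are assumptions rather than what the library raises.

```python
import warnings

# Copied from the documented event_ids_mc default.
EVENT_IDS_MC = {"eventNumber", "mcChannelNumber", "mcEventWeights", "runNumber"}


def check_event_ids(branches, expected, error=False):
    """Hypothetical helper: report expected event-level branches that are absent."""
    missing = sorted(set(expected) - set(branches))
    if not missing:
        return missing
    message = f"missing event-level branches: {missing}"
    if error:
        # Mirrors error_missing_event_ids = True (exception type is an assumption).
        raise RuntimeError(message)
    # Mirrors the default, error_missing_event_ids = False.
    warnings.warn(message, RuntimeWarning, stacklevel=2)
    return missing


print(check_event_ids(["eventNumber", "runNumber", "el_pt_NOSYS"], EVENT_IDS_MC))
```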

identify_closest_behavior: ClassVar[bool] = True

Determine closest behavior for a given branch or treat branch as default_behavior (default is True)

mixins: ClassVar[dict[str, str]] = {'el': 'Electron', 'jet': 'Jet', 'met': 'MissingET', 'mu': 'Muon', 'pass': 'Pass', 'ph': 'Photon', 'trigPassed': 'Trigger', 'weight': 'Weight'}

mixins defining the mapping from collection name to behavior to use for that collection

singletons: ClassVar[set[str]] = {}

additional branches to pass-through with no zipping or additional interpretation (such as those stored as length-1 vectors)

warn_missing_crossrefs: ClassVar[bool] = True

Methods

__init__(base_form, version='latest')
Parameters:
_build_collections(field_names, input_contents)
Parameters:
Return type:

tuple[KeysView[str], ValuesView[dict[str, Any]], list[str]]

_discover_systematics(branch_forms, collections, subcollections)

Extract systematic variations from branch names.

Returns:

Set of all systematic variation names found in branches

Return type:

set

Parameters:
classmethod behavior()

Behaviors necessary to implement this schema

Returns:

an awkward.behavior dictionary

Return type:

dict[str | tuple['*', str], type[awkward.Record]]

classmethod suggested_behavior(key, cutoff=0.4)

Suggest a behavior to use for a provided collection or branch name.

Default behavior: NanoCollection.

Note

If identify_closest_behavior is False, then this function will return the default behavior NanoCollection.

Warning

If no behavior is found above the cutoff score, then this function will return the default behavior.

Parameters:
  • key (str) – collection name to suggest a matching behavior for

  • cutoff (float) – optional similarity cutoff (default 0.4), a float in the range [0, 1]. Possibilities that don’t score at least that similar to key are ignored.

Returns:

suggested behavior to use by string

Return type:

str

Example

>>> from atlas_schema.schema import NtupleSchema
>>> NtupleSchema.suggested_behavior("truthjet")
'Jet'
>>> NtupleSchema.suggested_behavior("SignalElectron")
'Electron'
>>> NtupleSchema.suggested_behavior("generatorWeight")
'Weight'
>>> NtupleSchema.suggested_behavior("aVeryStrangelyNamedBranchWithNoMatch")
'NanoCollection'
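The role of cutoff can be illustrated with the standard library's difflib.get_close_matches, whose cutoff argument is described in the same terms. Whether suggested_behavior is built this way is an assumption; the suggest function, the lowercasing step, and the BEHAVIORS list (taken from the values of the default mixins) are illustrative choices, not the library's implementation.

```python
from difflib import get_close_matches

# Behavior names taken from the values of the documented default mixins.
BEHAVIORS = [
    "Electron", "Jet", "MissingET", "Muon",
    "Pass", "Photon", "Trigger", "Weight",
]


def suggest(key, behaviors=BEHAVIORS, cutoff=0.4, default="NanoCollection"):
    """Hypothetical helper: closest behavior name, or the default if none qualifies."""
    # Compare case-insensitively (an assumption made for this sketch).
    lowered = {name.lower(): name for name in behaviors}
    matches = get_close_matches(key.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else default


print(suggest("truthjet"))
print(suggest("zzzz"))
```

Raising cutoff makes the suggestion stricter, so more names fall back to the default; lowering it accepts weaker matches.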
classmethod v1(base_form)

Build the NtupleEvents

For example, one can use NanoEventsFactory.from_root("file.root", schemaclass=NtupleSchema.v1) to ensure NanoAODv7 compatibility.

Parameters:

base_form (dict[str, Any])

Return type:

Self