NtupleSchema
- class atlas_schema.schema.NtupleSchema(base_form, version='latest')
Bases: BaseSchema
The schema for building ATLAS ntuples following the typical centralized formats.
This schema is built from all branches found in a tree in the supplied file, based on the naming pattern of the branches. This naming pattern is typically assumed to be
{collection:str}_{subcollection:str}_{systematic:str}
where:
- collection is assumed to be a prefix with typical characters, following the regex [a-zA-Z][a-zA-Z0-9]*; that is, starting with a letter (upper or lower case) and followed by zero or more alphanumeric characters,
- subcollection is assumed to be anything with typical characters (allowing for underscores), following the regex [a-zA-Z_][a-zA-Z0-9_]*; that is, starting with a letter (upper or lower case) or an underscore and followed by zero or more alphanumeric characters or underscores, and
- systematic is assumed to be either NOSYS, to indicate a branch with potential systematic variations, or anything with typical characters (allowing for underscores) following the same regular expression as the subcollection.
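As a rough illustration of the naming pattern, the sketch below splits a branch name into its three parts. The helper split_branch and its systematics argument are hypothetical and not part of atlas_schema. The sketch assumes the systematic names are known up front, which sidesteps the ambiguity between underscores inside a subcollection and the underscore in front of a systematic; the schema's actual parsing may differ.

```python
import re

# Regular expressions quoted from the description of the naming pattern.
COLLECTION = r"[a-zA-Z][a-zA-Z0-9]*"
SUBCOLLECTION = r"[a-zA-Z_][a-zA-Z0-9_]*"


def split_branch(name, systematics=("NOSYS",)):
    """Hypothetical helper: return (collection, subcollection, systematic)."""
    systematic = None
    stem = name
    # Strip a known systematic suffix first (assumption: names are known).
    for syst in systematics:
        suffix = f"_{syst}"
        if name.endswith(suffix):
            systematic = syst
            stem = name[: -len(suffix)]
            break
    match = re.fullmatch(rf"({COLLECTION})_({SUBCOLLECTION})", stem)
    if match is None:
        # No underscore-separated subcollection: the name is the collection.
        return stem, None, systematic
    return match.group(1), match.group(2), systematic
```

With this sketch, split_branch("el_pt_NOSYS") gives ("el", "pt", "NOSYS"), while a branch without underscores such as "eventNumber" comes back as a collection with no subcollection and no systematic.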
Here, a collection refers to the top-level entry to access an item: a collection called el will be accessible under the el attribute via events['el'] or events.el. A subcollection called pt will be accessible under that collection, such as events['el']['pt'] or events.el.pt. This is the power of the schema: it provides more user-friendly (and programmatic) access to the underlying branches.

The above logic means that the branches below will be categorized as follows:
branch                     collection     subcollection      systematic
-------------------------  -------------  -----------------  ------------
'eventNumber'              'eventNumber'  None               None
'runNumber'                'runNumber'    None               None
'el_pt_NOSYS'              'el'           'pt'               'NOSYS'
'jet_cleanTightBad_NOSYS'  'jet'          'cleanTightBad'    'NOSYS'
'jet_select_btag_NOSYS'    'jet'          'select_btag'      'NOSYS'
'jet_e_NOSYS'              'jet'          'e'                'NOSYS'
'truthel_phi'              'truthel'      'phi'              None
'truthel_pt'               'truthel'      'pt'               None
'ph_eta'                   'ph'           'eta'              None
'ph_phi_SCALE__1up'        'ph'           'phi'              'SCALE__1up'
'mu_TTVA_effSF_NOSYS'      'mu'           'TTVA_effSF'       'NOSYS'
'recojet_antikt4PFlow_pt'  'recojet'      'antikt4PFlow_pt'  'NOSYS'
'recojet_antikt10UFO_m'    'recojet'      'antikt10UFO_m'    None

Sometimes this logic is not what you want, and there are ways to teach NtupleSchema how to group some of these better for atypical cases. We can address these case by case.

Singletons
Sometimes you have particular branches that you don’t want to be treated as a collection (with subcollections). And sometimes you will see warnings about this (see FAQ). There are some pre-defined singletons stored under event_ids, and these will be lazily treated as a singleton. For other cases where you add your own branches, you can additionally extend this class to add your own singletons:

    from atlas_schema.schema import NtupleSchema

    class MySchema(NtupleSchema):
        singletons = {"RandomRunNumber"}

and use this schema in your analysis code. The rest of the logic will be handled for you, and you can access your singletons under events.RandomRunNumber as expected.

Mixins (collections, subcollections)
In more complicated scenarios, you might need to teach NtupleSchema how to handle collections that end up having underscores in their name, or other characters that make the grouping non-trivial. In some other scenarios, you want to tell the schema to assign a certain set of behaviors to a collection, rather than the default atlas_schema.methods.Particle behavior. This is where mixins comes in. Similar to how singletons are handled, you extend this schema to include your own mixins, pointing them at one of the behaviors defined in atlas_schema.methods.

Let’s demonstrate both cases. Imagine you want to have your truthel collections above treated as atlas_schema.methods.Electron, then you would extend the existing mixins:

    from atlas_schema.schema import NtupleSchema

    class MySchema(NtupleSchema):
        mixins = {"truthel": "Electron", **NtupleSchema.mixins}

Now, events.truthel will give you arrays zipped up with atlas_schema.methods.Electron behaviors.

If instead you run into problems with mixing different branches in the same collection, because the default behavior of this schema described above is not smart enough to handle the atypical cases, you can explicitly fix this by defining your collections:

    from atlas_schema.schema import NtupleSchema

    class MySchema(NtupleSchema):
        mixins = {
            "recojet_antikt4PFlow": "Jet",
            "recojet_antikt10UFO": "Jet",
            **NtupleSchema.mixins,
        }

Now, events.recojet_antikt4PFlow and events.recojet_antikt10UFO will be separate collections, instead of a single events.recojet that incorrectly merged branches from each of these collections.

Attributes
- default_behavior: ClassVar[str] = 'NanoCollection'
  default behavior to use for any collection (default "NanoCollection", from coffea.nanoevents.methods.base.NanoCollection)
- docstrings: ClassVar[dict[str, str]] = {'charge': 'charge', 'eta': 'pseudorapidity', 'mass': 'invariant mass [MeV]', 'met': 'missing transverse energy [MeV]', 'phi': 'azimuthal angle', 'pt': 'transverse momentum [MeV]'}
  docstrings to assign for specific subcollections across the various collections identified by this schema
- error_missing_event_ids: ClassVar[bool] = False
  treat missing event-level branches as an error instead of a warning (default is False)
- event_ids: ClassVar[set[str]] = {'actualInteractionsPerCrossing', 'averageInteractionsPerCrossing', 'dataTakingYear', 'eventNumber', 'lumiBlock', 'mcChannelNumber', 'mcEventWeights', 'runNumber'}
  all event IDs to expect in the dataset
- event_ids_data: ClassVar[set[str]] = {'actualInteractionsPerCrossing', 'averageInteractionsPerCrossing', 'dataTakingYear', 'lumiBlock'}
  event IDs to expect in data datasets
- event_ids_mc: ClassVar[set[str]] = {'eventNumber', 'mcChannelNumber', 'mcEventWeights', 'runNumber'}
  event IDs to expect in MC datasets
- identify_closest_behavior: ClassVar[bool] = True
  determine the closest behavior for a given branch, or treat the branch as default_behavior (default is True)
- mixins: ClassVar[dict[str, str]] = {'el': 'Electron', 'jet': 'Jet', 'met': 'MissingET', 'mu': 'Muon', 'pass': 'Pass', 'ph': 'Photon', 'trigPassed': 'Trigger', 'weight': 'Weight'}
  mixins defining the mapping from collection name to the behavior to use for that collection
- singletons: ClassVar[set[str]] = {}
  additional branches to pass through with no zipping or additional interpretation (such as those stored as length-1 vectors)
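To make the relationship between the event ID attributes concrete, here is a minimal sketch that compares a list of branch names against the expected IDs. The set values are copied from the attributes above; the helper check_event_ids, its arguments, and the use of RuntimeError are assumptions made for illustration and are not taken from atlas_schema.

```python
import warnings

# Values copied from the class attributes documented above.
EVENT_IDS_DATA = {
    "actualInteractionsPerCrossing",
    "averageInteractionsPerCrossing",
    "dataTakingYear",
    "lumiBlock",
}
EVENT_IDS_MC = {"eventNumber", "mcChannelNumber", "mcEventWeights", "runNumber"}
# The documented event_ids value is the union of the two sets above.
EVENT_IDS = EVENT_IDS_DATA | EVENT_IDS_MC


def check_event_ids(branches, expected=EVENT_IDS, error_missing=False):
    """Hypothetical helper: report expected event IDs absent from branches."""
    missing = sorted(set(expected) - set(branches))
    if missing:
        message = f"missing event-level branches: {missing}"
        if error_missing:
            # Mirrors the idea of error_missing_event_ids = True;
            # the exception type is a placeholder.
            raise RuntimeError(message)
        warnings.warn(message, stacklevel=2)
    return missing
```

Passing error_missing=True plays the role of error_missing_event_ids in this sketch: the same condition that would otherwise only warn raises instead.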
Methods
- _discover_systematics(branch_forms, collections, subcollections)
Extract systematic variations from branch names.
- classmethod suggested_behavior(key, cutoff=0.4)
Suggest a behavior to use for a provided collection or branch name.
Default behavior: NanoCollection.
Note
If identify_closest_behavior is False, then this function will return the default behavior NanoCollection.
Warning
If no behavior is found above the cutoff score, then this function will return the default behavior.
- Parameters:
- Returns:
the suggested behavior to use, as a string
- Return type:
Example
    >>> from atlas_schema.schema import NtupleSchema
    >>> NtupleSchema.suggested_behavior("truthjet")
    'Jet'
    >>> NtupleSchema.suggested_behavior("SignalElectron")
    'Electron'
    >>> NtupleSchema.suggested_behavior("generatorWeight")
    'Weight'
    >>> NtupleSchema.suggested_behavior("aVeryStrangelyNamedBranchWithNoMatch")
    'NanoCollection'
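The cutoff parameter suggests a similarity score between the provided name and the known behaviors. The sketch below shows one way such a lookup could work, assuming the standard library's difflib and a case-insensitive comparison. The function suggest_behavior and its identify_closest argument are hypothetical stand-ins, not the library's implementation, so scores and results may differ from the real suggested_behavior.

```python
from difflib import get_close_matches

# Behavior names taken from the default mixins listed above.
BEHAVIORS = ["Electron", "Jet", "MissingET", "Muon", "Pass", "Photon", "Trigger", "Weight"]
DEFAULT_BEHAVIOR = "NanoCollection"


def suggest_behavior(key, cutoff=0.4, identify_closest=True):
    """Hypothetical stand-in for suggested_behavior, using difflib similarity."""
    if not identify_closest:
        # Mirrors identify_closest_behavior = False: always use the default.
        return DEFAULT_BEHAVIOR
    # Compare case-insensitively, then map back to the original spelling.
    by_lower = {name.lower(): name for name in BEHAVIORS}
    matches = get_close_matches(key.lower(), list(by_lower), n=1, cutoff=cutoff)
    # No candidate at or above the cutoff: fall back to the default behavior.
    return by_lower[matches[0]] if matches else DEFAULT_BEHAVIOR
```

Raising cutoff makes the lookup stricter, so more names fall back to the default behavior; lowering it accepts weaker matches.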