WHISK: An Uncore Architecture for Dynamic Information Flow Tracking in Heterogeneous Embedded SoCs*
Joël Porquet and Department of Computer Science, Columbia University, NY, NY 10027 E-mail:
In this paper, we describe for the first time how Dynamic Information Flow Tracking (DIFT) can be implemented for heterogeneous designs that contain one or more on-chip accelerators attached to a network-on-chip. We observe that implementing DIFT for such systems requires a holistic, platform-level view: designing the individual components of a heterogeneous system to support DIFT is necessary but not sufficient to correctly implement full-system DIFT. Based on this observation, we present a new system architecture for implementing DIFT, and we also describe wrappers that provide DIFT functionality for third-party IP components. Results show that our implementation minimally impacts the performance of programs that do not use DIFT, and that the price of security is constant for modest amounts of tagging and then increases sub-linearly with the amount of tagging.
Categories and Subject Descriptors
C.0 [General]: Hardware-Software Interfaces; D.4.6 [Operating Systems]: Security and protection—Information flow controls
General Terms
Security, Simulation, Evaluation, Performance
Keywords
Security, System-on-Chip, Network-on-Chip, Heterogeneous designs, Hardware accelerators, Dynamic Information Flow Tracking
*This work was supported by grants FA 99500910389 (AFOSR), FA 865011C7190 (DARPA), FA 87501020253 (DARPA), CCF/TC 1054844 (NSF), . Sloan fellowship, and gifts from Microsoft Research, WindRiver Corp, Xilinx and Synopsys Inc. Any opinions, findings, conclusions and recommendations do not reflect the views of the US Government or commercial entities.
International Conference on Hardware/Software Codesign and System Synthesis, September 29 – October 4, 2013, Montreal, Canada
1. INTRODUCTION
Dynamic Information Flow Tracking (DIFT) is a valuable system primitive that is widely used in security, privacy, and program analysis applications. For example, DIFT has been used to ensure that private data does not leave a smartphone, to detect security attacks such as SQL injection or buffer overflows, and to identify fault locations in programs when they fail [1, 2]. To support DIFT in a computing system, each data item in a program is extended with a tag that identifies some property of that data item. During program execution, as existing data items are modified their tags are updated accordingly, and as new data items are produced they receive new tags, according to some DIFT policy. The specific policy for creating and propagating tags depends on how DIFT is used: in a privacy application, for instance, data from the GPS receiver may be tagged as confidential, and it may be unsafe to let this data or its derivatives leave the phone through any network interface. The size of the tags used in DIFT varies widely with the application, ranging from 1-bit taint tags for security to multi-byte object tags that specify data type or object evanescence [3].
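To make the tag-per-datum idea concrete, here is a minimal Python sketch of the GPS privacy example, assuming tags are small integer bitmasks and the propagation policy is a simple union (bitwise OR) of operand tags. The names `Tagged`, `CONFIDENTIAL`, `UNTRUSTED` and `may_leave_device` are hypothetical and only illustrate the concept; they are not part of WHISK.

```python
from dataclasses import dataclass

# Hypothetical tag bits; each bit is one property tracked by the policy.
UNTAGGED = 0b00
CONFIDENTIAL = 0b01  # e.g., data read from the GPS receiver
UNTRUSTED = 0b10     # e.g., data received from the network


@dataclass(frozen=True)
class Tagged:
    """A data item paired with its DIFT tag."""
    value: int
    tag: int = UNTAGGED

    def __add__(self, other: "Tagged") -> "Tagged":
        # Union policy: the result carries every property of its operands.
        return Tagged(self.value + other.value, self.tag | other.tag)


def may_leave_device(item: Tagged) -> bool:
    """Check applied at a network interface: block confidential data."""
    return not (item.tag & CONFIDENTIAL)


latitude = Tagged(40_807, CONFIDENTIAL)  # tagged where it is produced
offset = Tagged(12)                      # untagged constant
derived = latitude + offset              # the derivative inherits the tag
print(derived.tag, may_leave_device(derived))  # prints: 1 False
```

A union policy is only one choice; other applications would swap in a different combining rule or wider tags.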
The central question we address in this paper is: What SoC platform architecture will allow us to easily integrate DIFT support?
The question of platform architecture for DIFT has not been addressed previously. So far, the main focus of DIFT research has been how to enhance processor architectures with DIFT [4, 5, 6, 7, 8]. In this paper we ask how the DIFT mechanism can be designed at the platform level so that third-party IP components, be they accelerators, controllers or even specialized cores, can be easily integrated into the SoC without intrusive changes. Towards this goal, we provide a set of recommendations for platform designers to implement DIFT as a general hardware service.
We will explain the capability we wish to provide with a simple but realistic example. Suppose we want to build a SoC with DIFT support, and that our SoC has only a general-purpose core and a controller, say a DMA engine. Suppose also that on this system the data and their tags are stored in different locations in DRAM (for efficiency reasons). If the DMA engine is unaware of the separation of tags and data, it will miss the tags associated with the data during copies and thus break information flow tracking. Clearly the DMA engine needs to be aware of the tag storage mechanism, i.e., it should know how to compute the address of the tags given the data address. Now suppose that, instead of a DMA engine, our simple SoC had a compression accelerator (or any other computational accelerator that modifies its input data). In addition to being aware of the tag storage, it should also be capable of propagating the tags through its internal datapath. In this paper, we show how to extend the platform architecture so that any SoC component can easily find the tags stored in memory; the issue of tag propagation is orthogonal and is not described in this paper. However, as will become clear later in the paper, for many common third-party IP components the implementation of tag propagation logic is straightforward, and in some cases may not even require modifications to the accelerator.
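The DMA pitfall above can be sketched in a few lines of Python. This is a toy model under stated assumptions: 32-bit words, one tag byte per word, and a flat tag region starting at a hypothetical `TAG_BASE`; `tag_addr`, `naive_dma` and `tag_aware_dma` are illustrative names only. The naive engine copies data and silently drops the tag, while the tag-aware one must know the data-to-tag address mapping.

```python
WORD = 4             # assumed 32-bit data words
TAG_BASE = 0x8000    # hypothetical start of the separate tag region
MEM_SIZE = 0x10000   # toy DRAM size in bytes


def tag_addr(data_addr: int) -> int:
    """Assumed layout: one tag byte per data word, in a separate region."""
    return TAG_BASE + data_addr // WORD


def naive_dma(mem: bytearray, src: int, dst: int, nwords: int) -> None:
    """Copies data only: destination tags are left stale (flow is lost)."""
    n = nwords * WORD
    mem[dst:dst + n] = mem[src:src + n]


def tag_aware_dma(mem: bytearray, src: int, dst: int, nwords: int) -> None:
    """Copies data, then computes each tag address and copies the tag."""
    naive_dma(mem, src, dst, nwords)
    for i in range(nwords):
        mem[tag_addr(dst + i * WORD)] = mem[tag_addr(src + i * WORD)]


mem = bytearray(MEM_SIZE)
mem[0x100:0x104] = (42).to_bytes(WORD, "little")  # one data word at 0x100
mem[tag_addr(0x100)] = 1                           # ...tagged as confidential
naive_dma(mem, 0x100, 0x200, 1)
tag_aware_dma(mem, 0x100, 0x300, 1)
print(mem[tag_addr(0x200)], mem[tag_addr(0x300)])  # prints: 0 1
```

Hard-coding `tag_addr` into every client is exactly the coupling that the architecture described next tries to avoid.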
To easily integrate third-party IP components in a DIFT aware platform we propose a new architecture called WHISK. In our architecture the data and tags are stored separately in memory to keep a low area overhead and improve flexibility. The salient features of our architecture are (Figure 1):
(a) Implicit Addressing of Tags and Data: We propose an architecture in which a NoC client on the SoC does not have to know anything about the tag layout or storage. Instead of sending a pair of addresses to access a data item and its associated tag, which would force clients to know the association mechanism between data and tags, WHISK lets clients send only the data address and automatically receive or send the requested data together with its associated tag in the same packet. This strategy lowers the complexity of adapting IP components to DIFT, since tags are automatically and transparently accessed with data. Further, since the tag address calculation is isolated from the clients, the system supports flexible tag layouts and storage in memory, allowing DIFT to be easily customized for different applications.
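A minimal Python sketch of implicit addressing, assuming a toy word-addressed memory: the hypothetical `ImplicitTagWrapper` owns the tag layout (passed in as a function), so the client code `client_copy` only ever handles data addresses and receives each tag with its data. Both layouts shown are invented for illustration; the point is that swapping them requires no change to the client.

```python
from typing import Callable, Dict, Tuple


class ImplicitTagWrapper:
    """Hypothetical memory-side wrapper: clients pass data addresses only."""

    def __init__(self, tag_locator: Callable[[int], int]):
        self._mem: Dict[int, int] = {}    # sparse, word-addressed toy memory
        self._locate = tag_locator        # tag layout is known only here

    def write(self, addr: int, data: int, tag: int) -> None:
        self._mem[addr] = data
        self._mem[self._locate(addr)] = tag

    def read(self, addr: int) -> Tuple[int, int]:
        # Data and tag come back together, in a single response.
        return self._mem.get(addr, 0), self._mem.get(self._locate(addr), 0)


def client_copy(mem: ImplicitTagWrapper, src: int, dst: int) -> None:
    """A client (e.g., a DMA engine) that never computes a tag address."""
    data, tag = mem.read(src)
    mem.write(dst, data, tag)


# Two assumed tag layouts; the client code is identical for both.
def flat_layout(addr: int) -> int:
    return 0x8000_0000 + addr // 4


def strided_layout(addr: int) -> int:
    return 0xC000_0000 + (addr // 4) * 2


for layout in (flat_layout, strided_layout):
    mem = ImplicitTagWrapper(layout)
    mem.write(0x100, data=7, tag=3)
    client_copy(mem, 0x100, 0x200)
    print(mem.read(0x200))  # prints: (7, 3)
```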
(b) Atomic transmission: While data and tags are stored separately in memory to keep area overhead low, they are transported together from memory through the interconnect, instead of being fetched separately as is done in single-processor DIFT implementations. This coupled, atomic transport makes it easier to adapt accelerators to DIFT by avoiding subtle memory coherence and consistency problems between tags and data.
(c) Pipelined transfer: In the WHISK NoC protocol, data is sent from/to memory one cycle after its tag. This has three main benefits. First, it reduces area overhead and design complexity, since data and tags can be sent on the same interconnect. Second, the tags are already available at the client when the data arrives, mitigating or completely avoiding serialization latencies during DIFT processing. Finally, since tags and data use the same interconnect, the tag can be arbitrarily large: it can be as large as the data or, if needed, even larger by sending the tag over multiple packets. This allows flexible implementations of DIFT policies.
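The ordering rule (tag first, then data, with wide tags spread over several transfer units) can be sketched as a serializer/deserializer pair. This Python model assumes 32-bit flits and represents each flit as a `(kind, payload)` pair; `FLIT_BITS`, `serialize` and `deserialize` are hypothetical names, and cycle timing is not modeled, only the order in which units travel.

```python
from typing import List, Tuple

FLIT_BITS = 32                      # assumed flit width
MASK = (1 << FLIT_BITS) - 1
Flit = Tuple[str, int]


def serialize(data: List[int], tag: int, tag_bits: int) -> List[Flit]:
    """Emit tag flits first (low chunk first), then the data flits."""
    n_tag_flits = -(-tag_bits // FLIT_BITS)  # ceiling division
    flits = [("TAG", (tag >> (i * FLIT_BITS)) & MASK)
             for i in range(n_tag_flits)]
    flits += [("DATA", word & MASK) for word in data]
    return flits


def deserialize(flits: List[Flit]) -> Tuple[int, List[int]]:
    """Rebuild the tag (available before any data) and the data words."""
    tag, shift, data = 0, 0, []
    for kind, payload in flits:
        if kind == "TAG":
            tag |= payload << shift
            shift += FLIT_BITS
        else:
            data.append(payload)
    return tag, data


flits = serialize([0xA, 0xB], tag=0x1122334455667788, tag_bits=64)
print([kind for kind, _ in flits])  # prints: ['TAG', 'TAG', 'DATA', 'DATA']
```

Because the receiver sees the tag before the data, it can start DIFT checks while the data is still in flight.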
(d) Configurable, multi-granular caching: In DIFT applications, large regions of nearby data items often share the same tag properties. This can be exploited to reduce the area overhead of tags by representing a common property for many addresses with one tag, instead of one tag per address. WHISK supports this multi-granular tag optimization. Further, WHISK allows clients to cache these tags, exploiting temporal reuse to avoid the latency overhead of tag accesses. These caches are also implicitly addressed with data.
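The storage benefit of multi-granular tags is simple arithmetic. The Python sketch below assumes 4 KiB pages, 32-bit words and 8-bit tags, and it ignores the small extra state (flags, pointers) a real first-level entry would need; the function names are hypothetical. The example numbers are illustrative, not measured results.

```python
PAGE_BYTES = 4096  # assumed page size
WORD_BYTES = 4     # assumed word size
TAG_BYTES = 1      # assumed 8-bit tags at both levels
WORDS_PER_PAGE = PAGE_BYTES // WORD_BYTES


def per_word_tag_bytes(n_pages: int) -> int:
    """Baseline: one tag for every word of every page."""
    return n_pages * WORDS_PER_PAGE * TAG_BYTES


def multigranular_tag_bytes(n_pages: int, n_non_uniform: int) -> int:
    """One tag per page, plus per-word tags only for non-uniform pages."""
    return (n_pages + n_non_uniform * WORDS_PER_PAGE) * TAG_BYTES


# 4 MiB of memory (1024 pages), of which only 16 pages mix tags.
print(per_word_tag_bytes(1024), multigranular_tag_bytes(1024, 16))
# prints: 1048576 17408
```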
(e) Standard wrapper for SoC clients: Finally, and perhaps most importantly, we show how all of the above features (implicit addressing, atomic transmission, pipelined transfer and tag caching) can be built so that these functions can be wrapped around existing clients in the SoC with minimal changes to the NoC or the SoC memory architecture (Figure 2). Our wrappers also handle OS interrupt processing. The wrappers are placed on the path between the NoC and the clients.
Figure 1: Overview of WHISK: red regions denote tag extensions. The callout boxes describe functionality (tag page table cache, tag caches, tag (de-)serialization).
To examine the practicality of WHISK, we developed a cycle-accurate SoC model in SystemC. We were able to integrate different types of DIFT-enabled accelerators into the system (e.g., compression, cryptography), and to boot an embedded operating system and run full applications. To evaluate DIFT as a service, we measure the impact of DIFT for different amounts of tagging by varying the fraction of the program's input data that can be tagged and the width of the tags. This differs from prior work, where the overheads of DIFT were measured for specific DIFT applications such as buffer overflow detection. Our experimental results with microbenchmarks show that WHISK exhibits security-proportionality: the performance overhead is roughly proportional to the amount of tagging in the system. When running full software applications, however, the performance overhead stays almost constant, i.e., it is less affected by the amount of tagging, because the cost of WHISK is amortized by microarchitectural optimizations and by tag aggregation and caching. Finally, when WHISK is active but unused, i.e., when nothing is tagged, its overhead is negligible.
2. BACKGROUND
There are generally two ways to store tags in DIFT architectures.
Coupled scheme.
In the coupled scheme, each data element is physically stored with its associated tag [5, 6] and they are always transmitted atomically throughout the system. This means that main memory, as well as every component along the memory path (the system bus, processor datapaths, caches and registers), is made wider to accommodate the tag. Although this scheme seems to be the easiest solution in terms of implementation complexity, it requires special adaptations, as highlighted by Venkataramani et al. [7]. For example, it requires non-standard memory bank sizes, and modifications to processor instructions so that they can access and manipulate the tag bits independently (e.g., for initialization). Further, tag storage is wasted when tagging is not needed. Consequently, the area overhead can be tremendous, especially given the need for large tags (from at least 8 bits [9] up to 32 bits [10, 11]), since the coupled scheme adds about 3% overhead for each additional tag bit (for a word-granularity policy).
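The per-bit figure follows from simple arithmetic, assuming 32-bit data words with one tag per word (the word size is our assumption, chosen because it matches the quoted 3%). With $t$ tag bits per word of width $w$:

```latex
\text{overhead}(t) = \frac{t}{w}, \qquad w = 32:\quad
\text{overhead}(1) = \tfrac{1}{32} \approx 3.1\%, \quad
\text{overhead}(8) = 25\%, \quad
\text{overhead}(32) = 100\%
```

So the 8-bit and 32-bit tag sizes cited above would enlarge every word-wide structure by a quarter and by a factor of two, respectively.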
Decoupled scheme.
The decoupled scheme consists of dedicating a portion of memory to storing tags separately from the data. Tags and data are thus both in memory, but in two separate regions. This separation requires an association algorithm to find the tag of a data item when reading or writing it. Some existing approaches [7, 12] store tags as a bitmap in a protected area of the application's virtual address space and associate data and tags via a simple index calculation. Another type of decoupled scheme covers the whole address space with tags by means of a multi-level page table [4, 13]. This exploits the fact that not all data needs fine-grained tags: for instance, code segments tend to share a single tag in many DIFT applications. The first level of this table typically covers the address space at page granularity: pages can therefore be considered fully tagged, untagged, or partially tagged. In the latter case, a second level in the tag page table allows a data page to be tagged at a finer granularity (e.g., word or byte level). Such multi-granular approaches can accommodate flexible tagging requirements at runtime, reducing the area overhead while still allowing fine granularity when needed.
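The "simple index calculation" of bitmap-based schemes can be sketched as follows. This Python example assumes one tag bit per 32-bit word and a bitmap held in a `bytearray`; it illustrates the general technique, not the exact layout used in [7, 12], and `bitmap_index`, `get_tag` and `set_tag` are hypothetical names.

```python
from typing import Tuple

WORD_BYTES = 4  # assumed word size: one tag bit per 32-bit word


def bitmap_index(addr: int) -> Tuple[int, int]:
    """Map a data address to (byte offset, bit position) in the bitmap."""
    word = addr // WORD_BYTES
    return word // 8, word % 8


def get_tag(bitmap: bytearray, addr: int) -> int:
    byte, bit = bitmap_index(addr)
    return (bitmap[byte] >> bit) & 1


def set_tag(bitmap: bytearray, addr: int, tag: int) -> None:
    byte, bit = bitmap_index(addr)
    if tag:
        bitmap[byte] |= 1 << bit
    else:
        bitmap[byte] &= ~(1 << bit) & 0xFF


bitmap = bytearray(1024)  # covers 1024 * 8 words = 32 KiB of data
set_tag(bitmap, 0x104, 1)
print(bitmap_index(0x104), get_tag(bitmap, 0x104), get_tag(bitmap, 0x100))
# prints: (8, 1) 1 0
```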
Discussion.
In prior implementations, the complexity of the decoupled scheme comes mostly from the fact that data and tags must be retrieved separately, since they are stored in different memory regions. The association algorithm is usually implemented in the processor core, and generally involves additional TLBs for tags and tag caches to improve performance. A software/hardware mechanism is also required to detect and handle changes in tagging granularity: for example, when a tagged data element is stored in an untagged page for the first time, the tagging granularity must be refined by creating a second level in the tag page table for that particular data page.
Using a decoupled scheme also raises the possibility of incoherence and inconsistency between tags and data in multicores. In proposals where tags are manipulated in the same pipeline stage as data [4, 6], this issue is avoided by ensuring that, when reading or writing a data item and its associated tag, both are available in their respective caches and the operation completes in one cycle. For proposals that manipulate tags in different pipeline stages than the stage reading or operating on data [7, 12], this issue is more challenging, but it was addressed by Venkataramani et al. [7]. Note that architectures implementing the coupled scheme are not affected by coherence and consistency issues, since data and tags are always paired together in the system.
In general, DIFT also requires system-level support to preserve tags when data leaves memory. Crandall et al. [5] store the tags in kernel memory when a data page is swapped out to disk, so that they can be restored when the page is swapped back in. Some software implementations [14, 15] modify the file system so that, when data is written to a file, its tags are stored as well.
DIFT in SoCs.
SoC design issues and requirements have never been considered in past DIFT proposals. Yet such systems are spreading, and they place a strong emphasis on low cost and low design complexity. Our approach accommodates these requirements by adopting a hybrid of the coupled and decoupled schemes, in which data and tags are transported together between memory and cores and are split just before being stored in memory. This hybrid scheme guarantees read and write atomicity, and thus consistency between data and tags. Tagging granularity refinements are detected as early as possible by placing small hardware modules in front of the clients. Further, many of these changes can be wrapped around existing clients, enabling easy integration of security functionality.
3. THE WHISK ARCHITECTURE
In this section we describe our DIFT architecture called WHISK.
Tag storage.
WHISK uses a two-level table for tag storage in DRAM. Both levels are indexed by physical memory address and return the tag associated with that address. The first level is a linear array containing one tag entry per physical memory page, indicating whether the page is fully untagged, fully (uniformly) tagged, or partially tagged. The second level holds tags for pages at word granularity, and can be allocated on demand when a page becomes partially (non-uniformly) tagged. The page table is statically allocated and guaranteed to be present in physical memory, which is realistic for many embedded system-on-chip designs.
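A lookup in such a two-level table can be sketched in Python as follows. The sketch assumes 4 KiB pages and 32-bit words, represents an untagged page as a page-level tag of 0, and models the second level as an optional per-page list; `PageEntry` and `TwoLevelTagTable` are hypothetical names, not WHISK's actual data structures.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_BYTES = 4096                          # assumed page size
WORD_BYTES = 4                             # assumed word size
WORDS_PER_PAGE = PAGE_BYTES // WORD_BYTES


@dataclass
class PageEntry:
    """First-level entry: a page-wide tag (0 = untagged) or word-level tags."""
    tag: int = 0
    words: Optional[List[int]] = None      # second level, allocated on demand


class TwoLevelTagTable:
    def __init__(self, mem_bytes: int):
        # First level: a linear array with one entry per physical page.
        self.pages = [PageEntry() for _ in range(mem_bytes // PAGE_BYTES)]

    def lookup(self, paddr: int) -> int:
        """Return the tag of the word at physical address paddr."""
        entry = self.pages[paddr // PAGE_BYTES]
        if entry.words is None:            # untagged or uniformly tagged
            return entry.tag
        return entry.words[(paddr % PAGE_BYTES) // WORD_BYTES]


table = TwoLevelTagTable(mem_bytes=3 * PAGE_BYTES)
table.pages[1] = PageEntry(tag=5)          # uniformly tagged page
words = [0] * WORDS_PER_PAGE
words[3] = 7
table.pages[2] = PageEntry(words=words)    # partially tagged page
print([table.lookup(a) for a in (0x0010, 0x1ABC, 0x200C, 0x2010)])
# prints: [0, 5, 7, 0]
```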
Hardware support for tag management.
In WHISK, the mechanism for associating data with tags is integrated into a small hardware module in front of each network client, such as an accelerator or a memory controller (see Figure 2). Conceptually, on a read operation from a client, this module retrieves the data at the specified address and accesses the tag page table to find the tag corresponding to that address, before returning both to the client that initiated the request. The same concept is reversed for write operations: if the tags have already been allocated, they are updated by the hardware module; otherwise, the module issues an interrupt to the OS to allocate them.
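The write-path decision can be sketched as a small Python function. We assume here (our assumption, generalizing the untagged-page case) that any write whose tag differs from a uniform page's tag would make the page non-uniform and therefore needs OS help; `PageKind`, `WriteAction` and `write_action` are hypothetical names.

```python
from enum import Enum


class PageKind(Enum):
    UNTAGGED = 0
    UNIFORM = 1
    PARTIAL = 2


class WriteAction(Enum):
    DATA_ONLY = "write data; the page-level tag already matches"
    DATA_AND_TAG = "write data and update the word-level tag"
    INTERRUPT_OS = "raise an interrupt so the OS allocates word-level tags"


def write_action(kind: PageKind, page_tag: int, new_tag: int) -> WriteAction:
    """Decide what the wrapper does for a write carrying tag new_tag."""
    if kind is PageKind.PARTIAL:
        return WriteAction.DATA_AND_TAG      # second level already allocated
    current = 0 if kind is PageKind.UNTAGGED else page_tag
    if new_tag == current:
        return WriteAction.DATA_ONLY         # page stays uniform
    return WriteAction.INTERRUPT_OS          # page would become non-uniform


print(write_action(PageKind.UNTAGGED, 0, 1).name)  # prints: INTERRUPT_OS
```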
Some clients may optionally include caching for page-level tags in their wrappers. For a given memory access, these Page Table Caches (PTCs) return whether the containing page is fully untagged, fully tagged (along with the tag value), or partially tagged. In the first two cases, the tags need not be explicitly accessed, since the tag of the requested data is already known. Only for partially tagged pages might tags need to be fetched from memory. Such caching can significantly reduce the traffic overhead. Since tag handling resides in the wrapper, the DIFT module in WHISK retrieves the tags implicitly when the data is accessed, i.e., the clients never see the address mapping between data and tags.
Figure 2: DIFT Wrapper (components: Implicit Translation Engine, Page Table Cache (optional), Network Serializer/Deserializer).
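A Page Table Cache can be sketched as a small least-recently-used cache of first-level entries. In this Python sketch, `fetch_entry` stands in for a read of the tag page table in DRAM, entries are `(kind, tag)` pairs, and LRU replacement is our assumption (the replacement policy is not specified here); `PageTableCache` and `tag_if_known` are hypothetical names. A result of `None` means the page is partially tagged and word-level tags must be fetched.

```python
from collections import OrderedDict
from typing import Callable, Optional, Tuple

Entry = Tuple[str, int]  # ("untagged" | "uniform" | "partial", tag)


class PageTableCache:
    """Hypothetical LRU cache of first-level tag page table entries."""

    def __init__(self, capacity: int, fetch_entry: Callable[[int], Entry],
                 page_bytes: int = 4096):
        self.capacity = capacity
        self.fetch_entry = fetch_entry   # models a tag page table read
        self.page_bytes = page_bytes
        self.entries: "OrderedDict[int, Entry]" = OrderedDict()
        self.misses = 0

    def lookup(self, addr: int) -> Entry:
        page = addr // self.page_bytes
        if page in self.entries:
            self.entries.move_to_end(page)        # mark most recently used
        else:
            self.misses += 1
            self.entries[page] = self.fetch_entry(page)
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return self.entries[page]

    def tag_if_known(self, addr: int) -> Optional[int]:
        """Tag for untagged/uniform pages; None means fetch word-level tags."""
        kind, tag = self.lookup(addr)
        return None if kind == "partial" else tag


table = {0: ("untagged", 0), 1: ("uniform", 4), 2: ("partial", 0)}
ptc = PageTableCache(capacity=2, fetch_entry=table.__getitem__)
print(ptc.tag_if_known(0x0010), ptc.tag_if_known(0x1010),
      ptc.tag_if_known(0x2010))  # prints: 0 4 None
```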
In WHISK, the communication interface is widened with an extra two-bit field that encodes three possible commands. The NONE command instructs the memory to return or expect only the data. The WITH command instructs the memory to return or expect a cache line along with its associated tag. The ONLY command instructs the memory to return only the tags associated with a given cache line. Since in most cases data is expected to reside in fully tagged or fully untagged pages, a large majority of read and write operations should cause only one memory access by using the NONE command, and their packets need not be extended with tags. This strategy reduces the traffic overhead on the interconnect and the number of memory accesses, lowering both energy consumption and performance overhead.
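The command field and the wrapper's choice of command can be sketched in Python. The bit values of the encoding, the placement of the field above a 32-bit address, and the use of ONLY for tag-cache refills are all assumptions for illustration; `TagCmd`, `choose_cmd`, `encode_header` and `decode_header` are hypothetical names.

```python
from enum import IntEnum
from typing import Tuple

ADDR_BITS = 32  # assumed address width; the command field sits above it
ADDR_MASK = (1 << ADDR_BITS) - 1


class TagCmd(IntEnum):
    """Assumed encoding of the two-bit field (the fourth code is unused)."""
    NONE = 0b00  # data only
    WITH = 0b01  # cache line plus its tag
    ONLY = 0b10  # tags of a cache line only


def choose_cmd(page_partial: bool, need_data: bool = True) -> TagCmd:
    """Pick a command from what the wrapper's PTC knows about the page."""
    if not need_data:
        return TagCmd.ONLY            # e.g., refilling a tag cache (assumed)
    # Untagged/uniform pages: the tag is implied, so only data travels.
    return TagCmd.WITH if page_partial else TagCmd.NONE


def encode_header(cmd: TagCmd, addr: int) -> int:
    return (int(cmd) << ADDR_BITS) | (addr & ADDR_MASK)


def decode_header(header: int) -> Tuple[TagCmd, int]:
    return TagCmd(header >> ADDR_BITS), header & ADDR_MASK


hdr = encode_header(choose_cmd(page_partial=False), 0x1000)
print(decode_header(hdr))
```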
Software support for tag management.
The OS has to perform three mandatory tasks. The first is to allocate the tag page table, usually during system boot. The second is to configure the different PTCs of the system, namely the register that holds the base address of the tag page table. The last concerns writes of tagged data to untagged pages: in this case, the OS must provide an exception/interrupt handler that processes those requests by creating, on demand, a second level in the tag page table.
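The refinement performed by that handler can be sketched in Python: on the first non-uniform write to a page, every word inherits the page's former tag and then the written word receives its new tag. This is a sketch under stated assumptions (4 KiB pages with 32-bit words, a tag of 0 meaning untagged, a dictionary standing in for second-level allocation); `TagPageTable` and `handle_tag_fault` are hypothetical names, not the paper's OS code.

```python
from typing import Dict, List

WORDS_PER_PAGE = 1024  # assumed: 4 KiB pages, 32-bit words


class TagPageTable:
    """Hypothetical OS-side view: page-level tags plus optional word tags."""

    def __init__(self, n_pages: int):
        self.page_tag: List[int] = [0] * n_pages     # first level, 0 = untagged
        self.word_tags: Dict[int, List[int]] = {}    # second level, on demand

    def handle_tag_fault(self, page: int, word: int, new_tag: int) -> None:
        """Interrupt handler: refine a uniform page to word granularity."""
        if page not in self.word_tags:
            # Every word inherits the page's former uniform tag.
            self.word_tags[page] = [self.page_tag[page]] * WORDS_PER_PAGE
        self.word_tags[page][word] = new_tag

    def tag_of(self, page: int, word: int) -> int:
        if page in self.word_tags:
            return self.word_tags[page][word]
        return self.page_tag[page]


tags = TagPageTable(n_pages=4)
tags.page_tag[2] = 3                 # page 2 uniformly tagged with 3
tags.handle_tag_fault(2, 10, 7)      # a write with tag 7 to word 10
print(tags.tag_of(2, 10), tags.tag_of(2, 11), tags.tag_of(1, 0))
# prints: 7 3 0
```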
IP Integration in WHISK.
From an implementation standpoint, WHISK introduces wrappers in front of NoC clients (IP blocks) that provide two functions. First, the wrapper abstracts away the decoupled nature of the tags by acting as a serializer/deserializer for write/read operations. Second, by hosting a PTC, the wrapper can avoid page table accesses when tags are already known (using the NONE command), and can recognize, as early as possible, when a page refinement is required.
Tag propagation and tagging policies.
Tag management provides the infrastructure for managing tags; another important part of DIFT is tag propagation, that is, how cores and accelerators compute and propagate tags through their internal storage locations, and the policies governing such propagation. While WHISK does not focus on that second part, it is compatible with existing mechanisms [16]. Our design is largely agnostic to those propagation policies as long as data and tags are conceptu-